# Learning Pandas
---

- **Pandas** is a powerful and popular Python library designed for **Data Manipulation** (cleaning, transforming and structuring data) and **Data Analysis** (finding patterns, trends and insights).

- It simplifies working with structured datasets like tables, spreadsheets or time-series data.

---

## Some Pandas key-concepts:

### Series: 

- A **Series** is a one-dimensional labeled array that can hold any data type (integers, floats, strings or even Python objects). Each element in the Series has a *`label`*, and the labels together form the ***Index***. Labels are usually unique, but Pandas does not require them to be.

- Often used to track changes or patterns over time, such as daily temperatures, stock prices or sales revenue.
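
As a minimal sketch of the idea above, the following builds a small Series with explicit labels. The temperature values and day labels are made up for illustration.

```python
import pandas as pd

# Hypothetical daily temperatures, labeled by day name
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"], name="temperature")

print(temps["Tue"])           # look up a value by its label
print(temps.index.tolist())   # the labels that make up the Index
print(temps.mean())           # operations apply to the whole Series at once
```

Looking values up by label rather than by position is what distinguishes a Series from a plain Python list.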


### DataFrames:

- A **DataFrame** is a **two-dimensional labeled data structure** in Pandas, similar to an Excel spreadsheet or a SQL table.

- It consists of `rows` and `columns` where:

  - Rows have indices (labels)

  - Columns have names (labels)
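
The row and column labels described above can be inspected directly. This is a small sketch with invented data; the name `df_example` and its contents are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical two-row table with explicit row labels "a" and "b"
df_example = pd.DataFrame(
    {"product": ["pen", "notebook"], "price": [1.5, 3.0]},
    index=["a", "b"],
)

print(df_example.index.tolist())     # row labels
print(df_example.columns.tolist())   # column names
print(df_example.loc["b", "price"])  # one cell, addressed by row label and column name
print(df_example.shape)              # (number of rows, number of columns)
```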

### Reading data from a CSV, Excel or JSON file into a DataFrame

- **CSV**: `pd.read_csv("file path")`
- **Excel**: `pd.read_excel("file path")`
- **JSON**: `pd.read_json("file path")`

`Note:` If a text file such as a CSV can't be read due to an `encoding error`, pass the encoding the file was actually saved with, for example *encoding="latin1"*. `pd.read_csv` assumes *encoding="utf-8"* by default.
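
One way to handle the encoding problem from the note is to try UTF-8 first and fall back to Latin-1. The sketch below writes its own small Latin-1 file to a temporary folder so it can run on its own; the file name and contents are made up.

```python
import os
import tempfile

import pandas as pd

# Write a small Latin-1 encoded CSV to a temporary folder (hypothetical data)
path = os.path.join(tempfile.mkdtemp(), "latin1_sample.csv")
with open(path, "w", encoding="latin1", newline="") as f:
    f.write("name,city\nJosé,Málaga\n")

try:
    df_enc = pd.read_csv(path, encoding="utf-8")
except UnicodeDecodeError:
    # The bytes are not valid UTF-8, so retry with Latin-1
    df_enc = pd.read_csv(path, encoding="latin1")

print(df_enc)
```

Keep in mind that Latin-1 can decode any byte sequence, so the fallback never raises an error. If the file was really saved in some other encoding, the text may load with wrong characters, so it is worth checking a few rows after reading.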

In [None]:
import pandas as pd

# CSV (pass the encoding the file was saved with, e.g. "utf-8" or "latin1")
df = pd.read_csv("sales_data_sample.csv", encoding="latin1")

# Excel (reading .xlsx files requires the openpyxl package)
df = pd.read_excel("SampleSuperstore.xlsx")

# JSON
df = pd.read_json("sample_Data.json")

# Each read above reassigns df, so this prints the data loaded last
print(df)

      Row ID        Order ID Order Date  Ship Date       Ship Mode  \
0          1  CA-2016-152156 2016-11-08 2016-11-11    Second Class   
1          2  CA-2016-152156 2016-11-08 2016-11-11    Second Class   
2          3  CA-2016-138688 2016-06-12 2016-06-16    Second Class   
3          4  US-2015-108966 2015-10-11 2015-10-18  Standard Class   
4          5  US-2015-108966 2015-10-11 2015-10-18  Standard Class   
...      ...             ...        ...        ...             ...   
9989    9990  CA-2014-110422 2014-01-21 2014-01-23    Second Class   
9990    9991  CA-2017-121258 2017-02-26 2017-03-03  Standard Class   
9991    9992  CA-2017-121258 2017-02-26 2017-03-03  Standard Class   
9992    9993  CA-2017-121258 2017-02-26 2017-03-03  Standard Class   
9993    9994  CA-2017-119914 2017-05-04 2017-05-09    Second Class   

     Customer ID     Customer Name    Segment        Country             City  \
0       CG-12520       Claire Gute   Consumer  United States        Henderson 

### Creating a DataFrame

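A DataFrame can be created from a dictionary: each key becomes a column name and each list of values becomes that column's data. Pandas `automatically` assigns a default integer index (0, 1, 2, ...) to the rows.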

In [None]:
data = {
    "Name": ["Harry", "Hermione", "Ron", "Malfoy"],
    "Age": [21, 22, 30, 25],
    "City": ["Edinburg", "Toronto", "Milan", "Helsinki"]
}

df = pd.DataFrame(data) # convert the dictionary into a DataFrame (a default integer index is added automatically)

print(df)

       Name  Age      City
0     Harry   21  Edinburg
1  Hermione   22   Toronto
2       Ron   30     Milan
3    Malfoy   25  Helsinki


### Saving the DataFrame in different formats

A DataFrame can be saved in different formats (such as `csv`, `excel` or `json`).

- **CSV:** `df.to_csv("filePathWithFileName")`
- **Excel:** `df.to_excel("filePathWithFileName")`
- **JSON:** `df.to_json("filePathWithFileName")`

By default the automatic index is written to the file as well. If we don't want it, we can leave it out by passing `index=False`.

In [None]:
# Saving in csv
df.to_csv("output.csv", index=False)

# Saving in excel
df.to_excel("output.xlsx", index=False)

# Saving in json
df.to_json("output.json", orient="records")  # orient="records" writes a list of row objects, which leaves out the index
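
To see what `index=False` and `orient="records"` actually change, the sketch below writes to strings instead of files (calling `to_csv` or `to_json` without a path returns the text). The name `people` and its data are made up for illustration.

```python
import io
import json

import pandas as pd

people = pd.DataFrame({"Name": ["Harry", "Ron"], "Age": [21, 30]})

with_index = people.to_csv()                # default: index written as an unnamed first column
without_index = people.to_csv(index=False)  # index column left out

print(with_index)
print(without_index)

# Reading the index-free text back gives the same columns and values
restored = pd.read_csv(io.StringIO(without_index))
print(restored)

# orient="records" produces one JSON object per row, with no index
records = json.loads(people.to_json(orient="records"))
print(records)
```

If a file saved with the index is read back with plain `pd.read_csv`, the old index shows up as an extra unnamed column, which is the usual reason for passing `index=False` when saving.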