### Pandas

#### Introduction

- What is Pandas?
  - It is a library built on top of NumPy for working with structured data.
  - It can handle a wide variety of structured data formats, such as CSV, Excel, and SQL databases.
  - Beyond basic data manipulation, it supports operations such as data cleaning, filtering, and grouping.
  - These operations make it an important tool for data preparation: the cleaned data can then be used to train AI models.
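
To make cleaning, filtering, and grouping concrete, here is a minimal sketch. The column names and values are made up for illustration; they do not come from a real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration only
raw = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi", None],
    "sales": [100.0, None, 250.0, 50.0, 75.0],
})

# Cleaning: drop every row that contains a missing value
clean = raw.dropna()

# Filtering: keep only the rows where sales are at least 100
high = clean[clean["sales"] >= 100]

# Grouping: total sales per city
totals = clean.groupby("city")["sales"].sum()

print(clean)
print(high)
print(totals)
```

Dropping rows is only one way to clean data; depending on the task, filling missing values with `fillna` may be a better fit.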

- What are the two fundamental data structures provided by pandas?
  - **DataFrame**: A two-dimensional, size-mutable table with labeled axes (i.e., labeled rows and columns), similar to an Excel sheet.
  - **Series**: A labeled one-dimensional array.
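
The two structures are closely related: selecting a single column of a DataFrame gives back a Series that shares the DataFrame's row labels. A short sketch with illustrative values:

```python
import pandas as pd

# Small illustrative table
df = pd.DataFrame({"Name": ["Matt", "John"], "Age": [34, 45]})

# Selecting one column returns a Series
ages = df["Age"]

print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series
print(df.shape)             # (2, 2): two rows, two columns
print(ages.shape)           # (2,): two elements
print(ages.index.equals(df.index))  # True: the row labels are shared
```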

#### Installation and setup
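- Install the pandas library with the command `py -m pip install pandas` (`py` is the Python launcher on Windows; on macOS or Linux, use `python3 -m pip install pandas`).
- Then import pandas into the Python program.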


In [96]:
import pandas as pd
print(pd.__version__)

2.2.3


#### Creating a Series
- A Series is a labeled one-dimensional array.
- It is created using the `Series` constructor.
- We can create it from a **list** or a **dictionary**.


- Example of creating a Series from a list

In [97]:
list1 = [10, 20, 30, 40]
s1 = pd.Series(list1)
print(s1)

0    10
1    20
2    30
3    40
dtype: int64



- Example of creating a Series from a list with custom indices (i.e., labels)


In [98]:
# The second positional argument of Series is the index; naming it is clearer
s2 = pd.Series(list1, index=['a', 'b', 'c', 'd'])
print(s2)

a    10
b    20
c    30
d    40
dtype: int64


- Example of creating a Series from a dictionary (the keys become the indices of the created Series)

In [99]:
dict1 = {
  'a': 10,
  'b': 20,
  'c': 30,
  'd': 40
}
s3 = pd.Series(dict1)
print(s3)

a    10
b    20
c    30
d    40
dtype: int64
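
Because the dictionary keys act as labels, passing an `index` argument together with a dictionary selects and orders the values by key. The label `'z'` below is a deliberately missing key, chosen for illustration; pandas fills it with `NaN`, which also turns the values into floats.

```python
import pandas as pd

dict1 = {"a": 10, "b": 20, "c": 30, "d": 40}

# index picks values by key, in the order given; 'z' is not a key in dict1
s = pd.Series(dict1, index=["b", "a", "z"])
print(s)
```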


##### Accessing elements in a series


In [100]:
print(s3['a'])

10
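
Square brackets with a label work as shown above. As a sketch of the more explicit alternatives, `loc` always looks up by label and `iloc` always looks up by position, which avoids ambiguity when the labels are themselves integers. The label `"z"` is a deliberately missing key, used only for illustration.

```python
import pandas as pd

s = pd.Series({"a": 10, "b": 20, "c": 30, "d": 40})

print(s.loc["a"])      # access by label
print(s.iloc[0])       # access by position (first element)
print(s.get("z", -1))  # get returns the default instead of raising KeyError
print(s[["a", "c"]])   # a list of labels returns a Series
```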


##### Slicing elements in a series
  - Integer slices are positional: `s3[1:3]` returns the elements at positions 1 and 2 (the end position is excluded), regardless of the labels.

In [101]:
print(s3)
print(s3[1:3])

a    10
b    20
c    30
d    40
dtype: int64
b    20
c    30
dtype: int64
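
Slicing also works with labels, with one difference worth noting: a label slice includes its end label, while a positional slice excludes its end position. A minimal sketch using the same values as `s3`:

```python
import pandas as pd

s = pd.Series({"a": 10, "b": 20, "c": 30, "d": 40})

by_position = s.iloc[1:3]  # positions 1 and 2; the end position is excluded
by_label = s.loc["b":"c"]  # labels 'b' through 'c'; the end label is included

print(by_position)
print(by_label)
print(by_position.equals(by_label))  # True: both select 'b' and 'c'
```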


 
#### Creating a DataFrame
- A DataFrame is a 2D table with labeled axes (i.e., labeled rows and columns), similar to a spreadsheet.
- We can create a DataFrame from:
  1. A dictionary of lists
  2. A NumPy array
  3. External structured data such as CSV or Excel files



- Example of creating a DataFrame using a dictionary of lists


In [102]:
# When using a dictionary, each key becomes a column name
data = {
  "Name": ["Matt", "John", "Kat"],
  "Age": [34,45,22]
}

df1 = pd.DataFrame(data)
print(df1)

# Row labels are optional; by default pandas assigns an integer index (0, 1, and so on).
# We can provide our own labels using the index argument.
df1 = pd.DataFrame(data, index=["p1","p2","p3"])
print(df1)

# We can also change the row labels later by assigning to the index attribute of the DataFrame.
df1.index = ["per1","per2","per3"]
print(df1)

   Name  Age
0  Matt   34
1  John   45
2   Kat   22
    Name  Age
p1  Matt   34
p2  John   45
p3   Kat   22
      Name  Age
per1  Matt   34
per2  John   45
per3   Kat   22


- Example of creating a DataFrame using a NumPy array