## Pandas
- Pandas is a Python library used for working with data sets
- It has functions for analyzing, cleaning, exploring, and manipulating data
- The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"
- Install it with __pip install pandas__

### What is Pandas?
- Pandas is the Python Data Analysis Library
- It is a widely used library for working with data sets
- It organizes data into rows and columns
- It provides functions to load data
- It is built on top of the NumPy package

### Importance of Pandas
- Pandas allows us to analyze large data sets and draw conclusions based on statistical theory
- Pandas can clean messy data sets and make them readable and relevant
- Relevant data is very important in data science.

### What is a DataFrame in Pandas?
- A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet
- DataFrames are one of the most common data structures used in modern data analytics
- They are a flexible and intuitive way of storing and working with data.
- __DataFrame = {"Column_name":[value]}__ <br>
  &emsp; DataFrame - the name of the dictionary that holds the data <br>
  &emsp; Column_name - the column name, written in quotation marks <br>
  &emsp; __:__ - separates the column name from its values <br>
  &emsp; value - the column values, written as a list inside '[]' and separated by '__,__' <br>
- Calling the DataFrame() function with no arguments creates an empty data frame

In [3]:
import pandas as pd

In [4]:
df = pd.DataFrame()
type(df)

pandas.core.frame.DataFrame

In [3]:
std_data = {"Roll No" : [101, 102, 103, 104, 105], 
          "Subject-1 Marks" : [68, 78, 91, 85, 66], 
          "Subject-2 Marks" : [83, 75, 84, 90, 92], 
          "Subject-3 Marks" : [84, 93, 75, 86, 84]}

In [4]:
df = pd.DataFrame(std_data)

In [5]:
df

Unnamed: 0,Roll No,Subject-1 Marks,Subject-2 Marks,Subject-3 Marks
0,101,68,83,84
1,102,78,75,93
2,103,91,84,75
3,104,85,90,86
4,105,66,92,84


In [6]:
product = {"Product No" : [201, 202, 203, 204, 205], 
          "Product Name" : ["Pen", "Pencil", "Scale", "Jel Pen", "Rounder"], 
          "Price" : [10, 5, 10, 15, 20], 
           "QTY" : [5, 1, 4, 3, 2], 
          "Manufacturing" : ["NOV-2023", "OCT-2023", "JUL-2023", "SEP-2023", "JUL-2023"]}

In [7]:
pdf = pd.DataFrame(product)

In [8]:
pdf

Unnamed: 0,Product No,Product Name,Price,QTY,Manufacturing
0,201,Pen,10,5,NOV-2023
1,202,Pencil,5,1,OCT-2023
2,203,Scale,10,4,JUL-2023
3,204,Jel Pen,15,3,SEP-2023
4,205,Rounder,20,2,JUL-2023


### head() function
- The head() function is primarily used to view the first n rows of a dataset
- It helps users quickly get an overview of the data and its structure
- The default value of n is 5, so head() with no argument returns the first 5 rows

In [8]:
df.head(2)

Unnamed: 0,Roll No,Subject-1 Marks,Subject-2 Marks,Subject-3 Marks
0,101,68,83,84
1,102,78,75,93


### tail() function
- The tail() function is primarily used to view the last n rows of a dataset
- It helps users quickly verify the data
- The default value of n is 5, so tail() with no argument returns the last 5 rows

In [10]:
df.tail(2)

Unnamed: 0,Roll No,Subject-1 Marks,Subject-2 Marks,Subject-3 Marks
3,104,85,90,86
4,105,66,92,84


### columns attribute
- columns is an attribute that provides access to the column labels of a data frame
- It returns an Index object representing the names of the columns in the DataFrame

In [11]:
df.columns

Index(['Roll No', 'Subject-1 Marks', 'Subject-2 Marks', 'Subject-3 Marks'], dtype='object')

##### Select the data of a particular column

In [12]:
df['Roll No']

0    101
1    102
2    103
3    104
4    105
Name: Roll No, dtype: int64

### dtypes attribute
- A data type object (dtype) describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted
- The dtypes attribute returns the dtype of each column

In [13]:
df.dtypes

Roll No            int64
Subject-1 Marks    int64
Subject-2 Marks    int64
Subject-3 Marks    int64
dtype: object

### values attribute
- The values attribute returns a NumPy representation of the DataFrame
- The result is a NumPy array that holds the data without the row and column labels
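
As a minimal sketch of this conversion, the snippet below builds a small hypothetical frame named `marks` (not the `df` used in this notebook) and compares `values` with the `to_numpy()` method, which returns the same kind of array.

```python
import pandas as pd

# Hypothetical two-row table used only for this illustration
marks = pd.DataFrame({"Roll No": [101, 102],
                      "Subject-1 Marks": [68, 78]})

arr = marks.values        # attribute: NumPy array of the data
same = marks.to_numpy()   # method form that returns an equivalent array

print(type(arr))          # the labels are gone, only the data remains
print(arr.shape)          # (rows, columns)
print((arr == same).all())
```

Only the cell values are carried over; the column names and the index stay behind in the DataFrame.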

In [14]:
df.values

array([[101,  68,  83,  84],
       [102,  78,  75,  93],
       [103,  91,  84,  75],
       [104,  85,  90,  86],
       [105,  66,  92,  84]], dtype=int64)

### sum() function
- The sum() method adds all values in each column and returns the sum for each column
- It returns the sum of the values along the axis requested by the user
- With the default index axis (axis=0), it adds up all the values in a column, and does the same for every column
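
The effect of the axis argument can be sketched on a reduced, hypothetical copy of the marks table (named `marks` here so it does not touch `df`): `axis=0` sums down each column, while `axis=1` sums across each row.

```python
import pandas as pd

# Hypothetical two-subject table used only for this illustration
marks = pd.DataFrame({"Subject-1 Marks": [68, 78, 91],
                      "Subject-2 Marks": [83, 75, 84]})

column_totals = marks.sum(axis=0)  # default: one total per column
row_totals = marks.sum(axis=1)     # one total per row, i.e. per student

print(column_totals)
print(row_totals)
```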

In [15]:
df.sum()

Roll No            515
Subject-1 Marks    388
Subject-2 Marks    424
Subject-3 Marks    422
dtype: int64

In [16]:
df['Subject-1 Marks'].sum()

388

### max() function
- max() method finds the maximum of the values in the object and returns it
- If the input is a series, the method will return a scalar which will be the maximum of the values in the series

In [17]:
df.max()

Roll No            105
Subject-1 Marks     91
Subject-2 Marks     92
Subject-3 Marks     93
dtype: int64

In [18]:
df["Subject-2 Marks"].max()

92

### min() function
- min() method returns a series with the minimum value of each column
- By specifying the column axis ( axis='columns' ), the min() method searches column-wise and returns the minimum value for each row
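
A short sketch of the two directions, using a hypothetical three-student table named `scores` rather than the notebook's `df`:

```python
import pandas as pd

# Hypothetical table used only for this illustration
scores = pd.DataFrame({"Subject-1 Marks": [68, 78, 91],
                       "Subject-2 Marks": [83, 75, 84],
                       "Subject-3 Marks": [84, 93, 75]})

lowest_per_subject = scores.min()                # minimum of each column
lowest_per_student = scores.min(axis="columns")  # minimum of each row

print(lowest_per_subject)
print(lowest_per_student)
```

The first result is indexed by column name, the second by row label.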

In [19]:
df.min()

Roll No            101
Subject-1 Marks     66
Subject-2 Marks     75
Subject-3 Marks     75
dtype: int64

In [20]:
df["Subject-3 Marks"].min()

75

### Insert column in table
- A single column is simply a pandas Series, that is, a 1D homogeneous array
- DataFrame.insert() lets us add a column at any position we like, not just at the end
- It also provides different options for inserting the column values.

#### add column at the end of an existing table

In [9]:
df["Subject-4 Marks"] = [86, 80, 92, 85, 95]

In [10]:
df

Unnamed: 0,Roll No,Subject-1 Marks,Subject-2 Marks,Subject-3 Marks,Subject-4 Marks
0,101,68,83,84,86
1,102,78,75,93,80
2,103,91,84,75,92
3,104,85,90,86,85
4,105,66,92,84,95


#### add column for each student's total across subjects in existing table

In [11]:
df['Total'] = df['Subject-1 Marks'] + df['Subject-2 Marks'] + df['Subject-3 Marks'] + df['Subject-4 Marks']

In [12]:
df

Unnamed: 0,Roll No,Subject-1 Marks,Subject-2 Marks,Subject-3 Marks,Subject-4 Marks,Total
0,101,68,83,84,86,321
1,102,78,75,93,80,326
2,103,91,84,75,92,342
3,104,85,90,86,85,346
4,105,66,92,84,95,337


#### add column for total percentage for student marks in existing table

In [13]:
df['Total Percentage'] = (df['Total'] * 100) / 400

In [14]:
df

Unnamed: 0,Roll No,Subject-1 Marks,Subject-2 Marks,Subject-3 Marks,Subject-4 Marks,Total,Total Percentage
0,101,68,83,84,86,321,80.25
1,102,78,75,93,80,326,81.5
2,103,91,84,75,92,342,85.5
3,104,85,90,86,85,346,86.5
4,105,66,92,84,95,337,84.25


#### add column in existing table between two columns

In [15]:
a = [98, 89, 79, 83, 84]
df.insert(5, "Subject-5 Marks", a)

In [16]:
df

Unnamed: 0,Roll No,Subject-1 Marks,Subject-2 Marks,Subject-3 Marks,Subject-4 Marks,Subject-5 Marks,Total,Total Percentage
0,101,68,83,84,86,98,321,80.25
1,102,78,75,93,80,89,326,81.5
2,103,91,84,75,92,79,342,85.5
3,104,85,90,86,85,83,346,86.5
4,105,66,92,84,95,84,337,84.25


### Remove column in table
- The drop() method is used to remove one or more rows or columns from a DataFrame
- It returns a new DataFrame with the specified rows or columns removed, unless inplace=True is passed
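
The difference between the returned copy and an in-place change can be sketched as follows; the frame name `table` is a hypothetical stand-in chosen for this illustration.

```python
import pandas as pd

# Hypothetical table used only for this illustration
table = pd.DataFrame({"Roll No": [101, 102],
                      "Subject-1 Marks": [68, 78],
                      "Subject-2 Marks": [83, 75]})

# Without inplace, drop() returns a new DataFrame and leaves `table` untouched
trimmed = table.drop(columns=["Subject-2 Marks"])
columns_before = list(table.columns)
print(columns_before)          # still three columns
print(list(trimmed.columns))   # two columns

# With inplace=True, `table` itself is modified and drop() returns None
result = table.drop(columns=["Subject-2 Marks"], inplace=True)
print(result)
print(list(table.columns))
```

Assigning the result back (`df = df.drop(...)`) and passing `inplace=True` reach the same end state; mixing the two, as in `df = df.drop(..., inplace=True)`, would replace the frame with `None`.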

#### remove column in existing table

In [29]:
df = df.drop(columns=['Subject-5 Marks'])

In [34]:
df

Unnamed: 0,Roll No,Subject-1 Marks,Subject-2 Marks,Subject-3 Marks,Subject-4 Marks,Total,Total Percentage
0,101,68,83,84,86,321,80.25
1,102,78,75,93,80,326,81.5
2,103,91,84,75,92,342,85.5
3,104,85,90,86,85,346,86.5
4,105,66,92,84,95,337,84.25


In [36]:
df.drop(columns=['Subject-4 Marks'], inplace=True)

In [37]:
df

Unnamed: 0,Roll No,Subject-1 Marks,Subject-2 Marks,Subject-3 Marks,Total,Total Percentage
0,101,68,83,84,321,80.25
1,102,78,75,93,326,81.5
2,103,91,84,75,342,85.5
3,104,85,90,86,346,86.5
4,105,66,92,84,337,84.25


### What is an index in pandas?
- The index of a DataFrame is a series of labels that identify each row
- The labels can be integers, strings, or any other hashable type
- The index is used for label-based access and alignment, and can be accessed or modified using this attribute
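
Label-based access and modification of the index can be sketched like this, assuming a small frame with string labels (the name `shapes` is chosen only for this illustration):

```python
import pandas as pd

shapes = pd.DataFrame({"angle": [0, 3, 4], "degrees": [360, 180, 360]},
                      index=["circle", "triangle", "rectangle"])

print(shapes.index)                    # the row labels
print(shapes.loc["triangle"])          # one row, selected by its label

cell = shapes.loc["triangle", "degrees"]  # a single cell: row label, column label
print(cell)

# Assigning to the attribute replaces the labels (one label per row)
shapes.index = ["c", "t", "r"]
print(shapes.index)
```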

In [18]:
df = pd.DataFrame({"angle" : [0, 3, 4], "degrees" : [360, 180, 360]}, 
                 index = ['circle', 'triangle', 'rectangle'])

In [19]:
df

Unnamed: 0,angle,degrees
circle,0,360
triangle,3,180
rectangle,4,360


#### write a program to count the words in a given string

In [2]:
string = "Gujarat Vidyapith"
res = len(string.split())  # split() breaks the string on whitespace
print(res)

2


## How to create a dataframe with NaN values?
- Create NaN values in a dataframe using numpy
- Use np.nan wherever a NaN value is needed in the dataframe
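
One side effect worth knowing: `np.nan` is a floating-point value, so an otherwise integer column that contains it is stored as `float64`. A minimal sketch, using a hypothetical frame named `ids`:

```python
import numpy as np
import pandas as pd

# Hypothetical single-column table used only for this illustration
ids = pd.DataFrame({"Roll No": [4001, np.nan, 4003]})

print(ids.dtypes)               # Roll No is float64 because NaN is a float
missing = ids["Roll No"].isna() # True where the value is NaN
print(missing)
```

This is why such a column displays values like 4001.0 instead of 4001.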

In [7]:
import numpy as np  # needed for np.nan

stddf = {"Roll No" : [4001, np.nan, 4003, 4004],
         "Name" : ['Keyuri', 'Yashvi', np.nan, 'Jainil']}

In [6]:
df = pd.DataFrame(stddf)

In [7]:
df

Unnamed: 0,Roll No,Name
0,4001.0,Keyuri
1,,Yashvi
2,4003.0,
3,4004.0,Jainil


#### Create a dataframe consisting of two columns, A and B. A has the values 2, 3, null, 4, 2, 1 and B has the values 2, 4, 2, 1, 3, null

In [8]:
data = {"A" : [2, 3, np.nan, 4, 2, 1], 
        "B" : [2, 4, 2, 1, 3, np.nan]}

In [9]:
df = pd.DataFrame(data)

In [10]:
df

Unnamed: 0,A,B
0,2.0,2.0
1,3.0,4.0
2,,2.0
3,4.0,1.0
4,2.0,3.0
5,1.0,


#### Create 3 dataframes and explain concat(), reset_index(), sort_values(), drop()

In [11]:
zx = {"a" : [0, 1, 2, 3, 4, 5], 
      "b" : ['a', 'd', 'g', 'j', 'm', 'p']}
dfzx = pd.DataFrame(zx)

In [12]:
dfzx

Unnamed: 0,a,b
0,0,a
1,1,d
2,2,g
3,3,j
4,4,m
5,5,p


In [13]:
zy = {"a" : [0, 1, 2, 3, 4, 5], 
      "b" : ['b', 'e', 'h', 'k', 'n', 'q']}
dfzy = pd.DataFrame(zy)

In [14]:
dfzy

Unnamed: 0,a,b
0,0,b
1,1,e
2,2,h
3,3,k
4,4,n
5,5,q


In [15]:
zz = {"a" : [0, 1, 2, 3, 4, 5], 
      "b" : ['c', 'f', 'i', 'l', 'o', 'r']}
dfzz = pd.DataFrame(zz)

In [16]:
dfzz

Unnamed: 0,a,b
0,0,c
1,1,f
2,2,i
3,3,l
4,4,o
5,5,r


#### concat() function
- The concat() function is used to concatenate two or more dataframes
- Pass the dataframes to pandas.concat() as a list or tuple
- For a clean result, the columns and their dtypes should be the same in each dataframe; a column that is missing from one dataframe is filled with NaN
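
By default `pd.concat()` keeps each frame's own index labels, so labels repeat in the result. A small sketch with two hypothetical frames (`first` and `second`) shows this, along with the `ignore_index=True` option that renumbers the rows:

```python
import pandas as pd

# Hypothetical frames used only for this illustration
first = pd.DataFrame({"a": [0, 1], "b": ["a", "d"]})
second = pd.DataFrame({"a": [0, 1], "b": ["b", "e"]})

stacked = pd.concat([first, second])                        # labels 0, 1, 0, 1
renumbered = pd.concat([first, second], ignore_index=True)  # labels 0, 1, 2, 3

print(stacked)
print(renumbered)
```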

In [17]:
pd.concat((dfzx, dfzy, dfzz))

Unnamed: 0,a,b
0,0,a
1,1,d
2,2,g
3,3,j
4,4,m
5,5,p
0,0,b
1,1,e
2,2,h
3,3,k


#### reset_index() function 
- The reset_index() method resets the index back to the default 0, 1, 2, ... labels
- By default it keeps the old index in a new column named index; to avoid this, pass drop=True
- It is useful when the index needs to become a column, or when the index is meaningless and should be reset to the default before further operations
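
The drop parameter can be sketched as follows, assuming a hypothetical concatenated frame named `parts` whose index labels repeat:

```python
import pandas as pd

# Hypothetical frame with repeated index labels (0, 1, 0, 1)
parts = pd.concat([pd.DataFrame({"a": [0, 1]}),
                   pd.DataFrame({"a": [2, 3]})])

kept = parts.reset_index()               # old labels saved in a new 'index' column
dropped = parts.reset_index(drop=True)   # old labels discarded

print(kept)
print(dropped)
```

Both results are new frames with a fresh 0, 1, 2, ... index; `parts` itself is unchanged.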

In [87]:
pd.concat((dfzx, dfzy, dfzz)).reset_index()

Unnamed: 0,index,a,b
0,0,0,a
1,1,1,d
2,2,2,g
3,3,3,j
4,4,4,m
5,5,5,p
6,0,0,b
7,1,1,e
8,2,2,h
9,3,3,k


#### sort_values() function
- The sort_values() function sorts a dataframe by the values along the selected axis; it returns the sorted dataframe, or None when inplace=True is passed
- The ignore_index parameter controls the index of the result: with ignore_index=True the rows are relabelled 0, 1, 2, ...
- By default, ignore_index = False, so each row keeps its original index label
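
A short sketch of these options on a hypothetical three-row frame named `letters`; the `ascending` parameter used in the second call is a standard `sort_values()` argument that is not covered above.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
letters = pd.DataFrame({"a": [2, 0, 1], "b": ["g", "a", "d"]})

# Default: rows are reordered and keep their original index labels
by_a = letters.sort_values("a")
print(by_a)

# Descending order, with the rows relabelled 0, 1, 2
by_a_desc = letters.sort_values("a", ascending=False, ignore_index=True)
print(by_a_desc)
```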

In [88]:
pd.concat((dfzx, dfzy, dfzz)).reset_index().sort_values('index', ignore_index=True)

Unnamed: 0,index,a,b
0,0,0,a
1,0,0,c
2,0,0,b
3,1,1,d
4,1,1,f
5,1,1,e
6,2,2,i
7,2,2,h
8,2,2,g
9,3,3,k


#### drop() function
- drop() is used to remove one or more rows or columns from a DataFrame
- axis=0 (or axis='index') refers to rows: the given labels are looked up in the index
- axis=1 (or axis='columns') refers to columns: the given labels are looked up in the column names
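
The two axis values can be compared on a small hypothetical frame named `grid`, which has a column called 'index' as well as a row labelled 0:

```python
import pandas as pd

# Hypothetical frame used only for this illustration
grid = pd.DataFrame({"index": [0, 1, 2],
                     "a": [0, 1, 2],
                     "b": ["a", "d", "g"]})

without_row = grid.drop(0, axis=0)           # removes the row labelled 0
without_column = grid.drop("index", axis=1)  # removes the column named 'index'

print(without_row)
print(without_column)
```

The keyword forms `grid.drop(index=0)` and `grid.drop(columns="index")` express the same two operations without an axis argument.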

In [89]:
pd.concat((dfzx, dfzy, dfzz)).reset_index().sort_values('index', ignore_index=True).drop('index', axis=1)

Unnamed: 0,a,b
0,0,a
1,0,c
2,0,b
3,1,d
4,1,f
5,1,e
6,2,i
7,2,h
8,2,g
9,3,k
