
# Introduction to `pandas`

## 1. The pandas Series
A `Series` is the `pandas` object designed to represent one-dimensional data. It is similar to a NumPy array but adds features such as a labelled index. A `Series` consists of two components:
* One-dimensional data (Values)
* Index

>Syntax

```{python}
pd.Series([values],name="column name")
```

Example:

```{python}
A=pd.Series([10,20,30,40],name="Marks")
```

In [None]:
# loading libraries
import numpy as np
import pandas as pd
pd.set_option("display.precision", 2)

In [None]:
ser1=pd.Series([85,90,70,80],name='Marks')

In [None]:
print(ser1)

0    85
1    90
2    70
3    80
Name: Marks, dtype: int64


**Series creation: Using a one-dimensional ndarray**
The following example creates a Series of the first five odd numbers.

In [None]:
odd_array=np.arange(1,10,2)
ser1=pd.Series(odd_array,name="odd_Number",dtype=int)
print(ser1)

In [None]:
print(odd_array)

[1 3 5 7 9]


In [None]:
print(ser1)

0    1
1    3
2    5
3    7
4    9
Name: odd_Number, dtype: int64


## Setting a custom index with the `index` parameter

In [None]:
odd_array=np.arange(1,5,2)
ser2=pd.Series(odd_array,name="Odd_numbers",index=['first term','second term'],dtype=int)
print(ser2)

first term     1
second term    3
Name: Odd_numbers, dtype: int64


To view the two arrays that make up this Series separately, use its `index` and `values` attributes.

In [None]:
ser2.index

Index(['first term', 'second term'], dtype='object')

In [None]:
print(ser2.values)

[1 3]
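Besides these two attributes, pandas provides two indexers for reading single elements: `.loc` selects by index label and `.iloc` selects by integer position. The sketch below re-creates `ser2` so it runs on its own.

```python
import pandas as pd

# Same data as ser2 above
ser2 = pd.Series([1, 3], name="Odd_numbers", index=["first term", "second term"])

# Label-based access uses the index labels
print(ser2.loc["second term"])   # 3

# Position-based access uses integer positions, whatever the labels are
print(ser2.iloc[0])              # 1

# The two components, retrieved separately
print(list(ser2.index))          # ['first term', 'second term']
print(ser2.to_numpy())           # [1 3]
```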


**Series creation: Using a Python list**
To create a `Series` from a Python list, pass the list to the `data` parameter of the `Series()` constructor. If you also pass an `index`, it must contain one label per value.

In [None]:
even_array=[2,4,6,8]
ser3=pd.Series(even_array,index=['first term','second term','third term','fourth term'],name="Marks")
ser3

first term     2
second term    4
third term     6
fourth term    8
Name: Marks, dtype: int64
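A quick check of the length rule: when the `index` list is shorter or longer than the data, pandas raises a `ValueError` rather than guessing which labels go with which values. The flag `mismatch_rejected` is only a helper for this sketch.

```python
import pandas as pd

even_list = [2, 4, 6, 8]

# One label per value: accepted
ser3 = pd.Series(even_list, index=["first term", "second term", "third term", "fourth term"])

# Three labels for four values: pandas raises instead of guessing
mismatch_rejected = False
try:
    pd.Series(even_list, index=["first term", "second term", "third term"])
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```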

**Series creation: Using a Python dictionary**

To create a `Series` from a Python dictionary, pass the dictionary to the `data` parameter of the `Series()` constructor. The dictionary's keys become the `index` and its values become the `values` of the Series.

In [None]:
ser4=pd.Series({'first':1,'second':2,'third':3},name="Numbers",dtype=float)
ser4

first     1.0
second    2.0
third     3.0
Name: Numbers, dtype: float64
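If a dictionary is combined with an explicit `index`, pandas uses the index to select and order the keys: keys not listed are dropped, and a label with no matching key gets `NaN` (which also forces a float dtype). A minimal sketch, with `ser5` as an illustrative name:

```python
import pandas as pd

scores = {"first": 1, "second": 2, "third": 3}

# 'second' is left out, the order follows the index,
# and 'fourth' has no matching key, so its value is NaN
ser5 = pd.Series(scores, index=["third", "first", "fourth"], name="Numbers")
print(ser5)
```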

## Slicing and mathematical operations on a `Series`

In [None]:
ser4[:2] # first two elements (positional slice)

first     1.0
second    2.0
Name: Numbers, dtype: float64

In [None]:
ser4.sum() # finding sum of values in the series

6.0

In [None]:
np.sqrt(np.var(ser4)) # population standard deviation (np.var uses ddof=0)

0.816496580927726
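Two details are worth knowing here. Arithmetic between two Series is aligned on index labels rather than positions, so a label present in only one operand produces `NaN`. Also, `np.var` uses `ddof=0` (population variance), whereas `Series.std()` defaults to `ddof=1` (sample standard deviation), so the two approaches give different answers. The `bonus` Series below is illustrative data.

```python
import numpy as np
import pandas as pd

ser4 = pd.Series({"first": 1, "second": 2, "third": 3}, name="Numbers", dtype=float)

# Element-wise addition, aligned on labels ('first' and 'fourth' have no partner -> NaN)
bonus = pd.Series({"second": 10, "third": 20, "fourth": 30})
total = ser4 + bonus
print(total)

# Population (ddof=0) vs sample (ddof=1) standard deviation
pop_sd = ser4.std(ddof=0)     # same value as np.sqrt(np.var(ser4)) above
sample_sd = ser4.std()        # pandas default: ddof=1
print(pop_sd, sample_sd)
print(np.isclose(pop_sd, np.std(ser4.to_numpy())))
```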

## The pandas DataFrame
A `DataFrame` is a two-dimensional data structure composed of rows and columns, much like a simple spreadsheet or a SQL table. Each column of a DataFrame is a `pandas` Series. All columns must have the same length, but they can hold different data types (float, int, bool, and so on). DataFrames are both value-mutable and size-mutable. A `Series`, by contrast, is only value-mutable: its values can be changed, but its length cannot. This means we can change the values held in a `DataFrame` and also add columns to it or delete columns from it.

A `DataFrame` consists of three components.

* Two-dimensional data (Values)
* Row index
* Column index
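The properties described above (one dtype per column, size mutability, value mutability) can be seen in a few lines. This is a small sketch with made-up data; the `passed` and `grade` columns are purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Sojan", "Litty"], "marks": [20, 35], "passed": [False, True]})
print(df.dtypes)                 # object, int64, bool: each column keeps its own dtype

# Size-mutable: add a column, then delete it again
df["grade"] = ["C", "A"]
df = df.drop(columns="grade")

# Value-mutable: change a single cell
df.loc[0, "marks"] = 25
print(df)
```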

>Syntax

```{python}
df=pd.DataFrame(two_dim_array,columns=[column names])
```

In [None]:
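# example
student_data=np.array([["Sojan","Litty"],[20,35]])
# .T (transpose) turns each inner list into a column instead of a row
df=pd.DataFrame(student_data.T,columns=["Name","Marks"])
# np.array stored every entry as a string, so convert Marks back to integers
df["Marks"]=df["Marks"].astype(int)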

In [None]:
print(df)

    Name Marks
0  Sojan    20
1  Litty    35
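Because a NumPy array holds a single dtype, mixing names and numbers in `np.array` turns every entry into a string, and the `Marks` column then needs converting before any arithmetic. A sketch of the alternative: build the frame from a dictionary, which keeps a separate dtype for each column.

```python
import numpy as np
import pandas as pd

# The mixed array from the example above: everything becomes a Unicode string
student_data = np.array([["Sojan", "Litty"], [20, 35]])
print(student_data.dtype.kind)      # 'U'

# A dict of lists keeps one dtype per column, so Marks stays numeric
df = pd.DataFrame({"Name": ["Sojan", "Litty"], "Marks": [20, 35]})
print(df.dtypes)
print(df["Marks"].mean())           # 27.5
```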


## The simplest way to create a `DataFrame`

**DataFrame creation: Using a dictionary of lists**

Suppose we have the maths, physics, and chemistry marks of three students, John, Adom, and Muhammad:

* Maths: 45, 46, 49
* Physics: 50, 49, 48
* Chemistry: 33, 45, 40

We can represent this data as a DataFrame with four columns (the name plus one column per subject) and an index, such as the admission number, that identifies each student. Two ways to build it follow: from a NumPy array, as in the previous example, and from a dictionary of lists.



In [None]:
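#method 1 using array
d1=np.array([["john","Adom","muhammad"],[45,46,49],[50,49,48],[33,45,40]])
df1=pd.DataFrame(d1.T,columns=["Names","Maths","physics","Chemistry"])
# np.array stores the marks as strings; restore an integer dtype for calculations
df1[["Maths","physics","Chemistry"]]=df1[["Maths","physics","Chemistry"]].astype(int)
df1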

|   | Names | Maths | physics | Chemistry |
|---|-------|-------|---------|-----------|
| 0 | john | 45 | 50 | 33 |
| 1 | Adom | 46 | 49 | 45 |
| 2 | muhammad | 49 | 48 | 40 |


In [None]:
#method-2 using dictionary
data={'name':['John', 'Adom','Muhammad'],'maths':[45, 46, 49],'physics':[50,49,48],'chemistry':[33, 45,40]}
marklist=pd.DataFrame(data,index=['MG01','MG02','MG03'])
marklist

|   | name | maths | physics | chemistry |
|---|------|-------|---------|-----------|
| MG01 | John | 45 | 50 | 33 |
| MG02 | Adom | 46 | 49 | 45 |
| MG03 | Muhammad | 49 | 48 | 40 |


In [None]:
# assigning to a row label that does not exist yet with .loc adds a new row
marklist.loc['MG04']=['Ravi',34,50,23]
marklist

|   | name | maths | physics | chemistry |
|---|------|-------|---------|-----------|
| MG01 | John | 45 | 50 | 33 |
| MG02 | Adom | 46 | 49 | 45 |
| MG03 | Muhammad | 49 | 48 | 40 |
| MG04 | Ravi | 34 | 50 | 23 |


## Adding a new column

In [None]:
marklist['Total Marks']=marklist['maths']+marklist['physics']+marklist['chemistry']
marklist

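|   | name | maths | physics | chemistry | Total Marks |
|---|------|-------|---------|-----------|-------------|
| MG01 | John | 45 | 50 | 33 | 128 |
| MG02 | Adom | 46 | 49 | 45 | 140 |
| MG03 | Muhammad | 49 | 48 | 40 | 137 |
| MG04 | Ravi | 34 | 50 | 23 | 107 |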
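When there are many subject columns, writing out every `+` gets tedious. A sketch of an equivalent approach: select the columns once and reduce across each row with `axis=1`. The `Average` column is an illustrative addition, not part of the original mark list.

```python
import pandas as pd

marklist = pd.DataFrame(
    {"name": ["John", "Adom", "Muhammad", "Ravi"],
     "maths": [45, 46, 49, 34], "physics": [50, 49, 48, 50], "chemistry": [33, 45, 40, 23]},
    index=["MG01", "MG02", "MG03", "MG04"],
)
subjects = ["maths", "physics", "chemistry"]

# axis=1 reduces across the columns of each row
marklist["Total Marks"] = marklist[subjects].sum(axis=1)
marklist["Average"] = marklist[subjects].mean(axis=1)
print(marklist)
```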


## Append a `DataFrame`

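`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so use `pd.concat` instead. Appending rows one at a time is also more expensive than a single concatenation, because each call builds a new DataFrame by copying the existing data. A better approach is to collect the new rows (or DataFrames) in a list and concatenate them with the original DataFrame all at once.

>Syntax: `pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, verify_integrity=False, sort=False)`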
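A minimal sketch of the collect-then-concatenate pattern, assuming new records arrive one at a time (the `incoming` list is hypothetical data). Rows are gathered in plain Python lists, which is cheap, and only one DataFrame is built and concatenated at the end. `verify_integrity=True` makes `concat` raise if an admission number appears twice.

```python
import pandas as pd

marklist = pd.DataFrame(
    {"name": ["John"], "maths": [45], "physics": [50], "chemistry": [33]},
    index=["MG01"],
)

# Hypothetical records arriving one at a time (e.g. from a form or a file)
incoming = [
    ("MG05", {"name": "Alice", "maths": 41, "physics": 50, "chemistry": 33}),
    ("MG06", {"name": "Bob", "maths": 43, "physics": 49, "chemistry": 45}),
]

# Cheap step: collect labels and rows in plain lists
labels, rows = [], []
for adm_no, record in incoming:
    labels.append(adm_no)
    rows.append(record)

# Done once: build a single DataFrame and concatenate
new_part = pd.DataFrame(rows, index=labels)
combined = pd.concat([marklist, new_part], verify_integrity=True)  # raises on duplicate labels
print(combined)
```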

In [None]:
# creating a new dataframe
data2={'name':['Alice', 'Bob','Charlie'],'maths':[41, 43, 40],'physics':[50,49,48],'chemistry':[33, 45,40]}
marklist2=pd.DataFrame(data2,index=['MG05','MG06','MG07'])
marklist2['Total Marks']=marklist2['maths']+marklist2['physics']+marklist2['chemistry']
marklist2

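|   | name | maths | physics | chemistry | Total Marks |
|---|------|-------|---------|-----------|-------------|
| MG05 | Alice | 41 | 50 | 33 | 124 |
| MG06 | Bob | 43 | 49 | 45 | 137 |
| MG07 | Charlie | 40 | 48 | 40 | 128 |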


In [None]:
# append the new dataframe (DataFrame.append no longer exists in pandas 2.x)
Marklist_A=pd.concat([marklist, marklist2])
# sort in descending order of Total Marks; this returns a new DataFrame and leaves Marklist_A unchanged
Marklist_A.sort_values(by="Total Marks", ascending=False)

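|   | name | maths | physics | chemistry | Total Marks |
|---|------|-------|---------|-----------|-------------|
| MG02 | Adom | 46 | 49 | 45 | 140 |
| MG03 | Muhammad | 49 | 48 | 40 | 137 |
| MG06 | Bob | 43 | 49 | 45 | 137 |
| MG01 | John | 45 | 50 | 33 | 128 |
| MG07 | Charlie | 40 | 48 | 40 | 128 |
| MG05 | Alice | 41 | 50 | 33 | 124 |
| MG04 | Ravi | 34 | 50 | 23 | 107 |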


In [None]:
Marklist_A


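|   | name | maths | physics | chemistry | Total Marks |
|---|------|-------|---------|-----------|-------------|
| MG01 | John | 45 | 50 | 33 | 128 |
| MG02 | Adom | 46 | 49 | 45 | 140 |
| MG03 | Muhammad | 49 | 48 | 40 | 137 |
| MG04 | Ravi | 34 | 50 | 23 | 107 |
| MG05 | Alice | 41 | 50 | 33 | 124 |
| MG06 | Bob | 43 | 49 | 45 | 137 |
| MG07 | Charlie | 40 | 48 | 40 | 128 |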
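In the sorted output, several students share the same total (137 and 128), so the order of tied rows is left to the sorting algorithm. A sketch of making it explicit: pass a list of columns to `by` and a matching list to `ascending`, here breaking ties alphabetically by name. The small frame below is a subset of `Marklist_A` re-created for illustration.

```python
import pandas as pd

marks = pd.DataFrame(
    {"name": ["John", "Adom", "Muhammad", "Bob", "Charlie"],
     "Total Marks": [128, 140, 137, 137, 128]},
    index=["MG01", "MG02", "MG03", "MG06", "MG07"],
)

# Highest total first; equal totals ordered by name (A to Z)
ranked = marks.sort_values(by=["Total Marks", "name"], ascending=[False, True])
print(ranked)

# sort_values returns a new DataFrame; the original order is untouched
print(marks.index.tolist())
```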


## Joining two `DataFrames`

>Syntax: `DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)`

`join` adds the columns of another DataFrame (`other`) to the calling DataFrame, matching rows either on the index or on a key column. Several DataFrames can be joined on their indexes at once by passing a list.

To join on a key column, pass its name to the `on` parameter. The key column of the calling DataFrame is always matched against the *index* of `other`, which is why `other` usually needs `set_index` first. The result keeps the calling DataFrame's index.
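Without `on`, `join` matches rows index to index. If both frames contain a column with the same name, pandas raises an error unless `lsuffix` and/or `rsuffix` are given to tell the columns apart. A small sketch; the `extra` table is hypothetical data keyed by the same admission numbers.

```python
import pandas as pd

marks = pd.DataFrame({"name": ["John", "Adom"], "maths": [45, 46]}, index=["MG01", "MG02"])
# Hypothetical second table indexed by the same admission numbers
extra = pd.DataFrame({"name": ["John", "Adom"], "internal": [12, 15]}, index=["MG01", "MG02"])

# Index-to-index join; both frames have 'name', so suffixes keep them apart
joined = marks.join(extra, lsuffix="_marks", rsuffix="_internal")
print(joined.columns.tolist())
```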

**Example:** Let's join the internal marks to the combined mark list `Marklist_A`.

In [None]:
# creating internal marks
internals={'internal':[12,15,10,23,10,25,21],'name':['John', 'Adom','Muhammad','Ravi','Alice', 'Bob','Charlie']}
internal_marks=pd.DataFrame(internals)
internal_marks

|   | internal | name |
|---|----------|------|
| 0 | 12 | John |
| 1 | 15 | Adom |
| 2 | 10 | Muhammad |
| 3 | 23 | Ravi |
| 4 | 10 | Alice |
| 5 | 25 | Bob |
| 6 | 21 | Charlie |


In [None]:
# method 1: join (internal_marks is re-indexed by name so its index can match the 'name' column)
Marklist_A.join(internal_marks.set_index('name'),on='name')

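|   | name | maths | physics | chemistry | Total Marks | internal |
|---|------|-------|---------|-----------|-------------|----------|
| MG01 | John | 45 | 50 | 33 | 128 | 12 |
| MG02 | Adom | 46 | 49 | 45 | 140 | 15 |
| MG03 | Muhammad | 49 | 48 | 40 | 137 | 10 |
| MG04 | Ravi | 34 | 50 | 23 | 107 | 23 |
| MG05 | Alice | 41 | 50 | 33 | 124 | 10 |
| MG06 | Bob | 43 | 49 | 45 | 137 | 25 |
| MG07 | Charlie | 40 | 48 | 40 | 128 | 21 |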


## Merging two `DataFrames`

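>Syntax: `DataFrame.merge(right, how='inner', on=None, left_on=None,right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)`

Unlike `join`, `merge` matches on columns by default. When both sides are matched on columns, the original index is discarded, so the result below gets a fresh integer index (0, 1, 2, ...) instead of the admission numbers.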

In [None]:
FM=Marklist_A.merge(internal_marks, how='left', on='name')
FM

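|   | name | maths | physics | chemistry | Total Marks | internal |
|---|------|-------|---------|-----------|-------------|----------|
| 0 | John | 45 | 50 | 33 | 128 | 12 |
| 1 | Adom | 46 | 49 | 45 | 140 | 15 |
| 2 | Muhammad | 49 | 48 | 40 | 137 | 10 |
| 3 | Ravi | 34 | 50 | 23 | 107 | 23 |
| 4 | Alice | 41 | 50 | 33 | 124 | 10 |
| 5 | Bob | 43 | 49 | 45 | 137 | 25 |
| 6 | Charlie | 40 | 48 | 40 | 128 | 21 |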
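Here every name has a match, so all `how` options give the same rows. When keys are missing on one side, the choice matters. A sketch with hypothetical data where Ravi has no internal mark and Zara has no main marks: `how='outer'` keeps every key from both sides, and `indicator=True` adds a `_merge` column recording where each row came from (`both`, `left_only`, or `right_only`). The default `how='inner'` keeps only the names present in both frames.

```python
import pandas as pd

marks = pd.DataFrame({"name": ["John", "Adom", "Ravi"], "maths": [45, 46, 34]})
# Hypothetical internal marks: Ravi is missing, Zara has no main marks
internal = pd.DataFrame({"name": ["John", "Adom", "Zara"], "internal": [12, 15, 20]})

# Keep keys from both sides and record the origin of each row
audit = marks.merge(internal, how="outer", on="name", indicator=True)
print(audit)

# Default inner merge: only names found in both frames
inner = marks.merge(internal, on="name")
print(inner)
```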
