Contents :
	1. Data structures: Series, DataFrame, Index
	2. Reading and Writing data: read_csv, read_excel, read_sql, to_csv, to_excel, to_sql
	3. Data selection and Filtering: loc, iloc, at, iat, boolean indexing, query
	4. Data manipulation: drop, dropna, fillna, replace, rename, sort_values, sort_index
	5. Grouping and aggregation: groupby, mean, sum, count, min, max, agg
	6. Merging and joining data: merge, join, concat
	7. Reshaping and pivoting data: stack, unstack, melt, pivot_table
	8. Handling missing data: isna, notna, interpolate, ffill, bfill
	9. Applying functions to data: apply, applymap, map
	10. Visualizing data: plot, hist, scatter, boxplot, heatmap
	11. Time series data: date_range, resample, rolling, shift, diff
	12. Categorical data: Categorical, cut, qcut
	13. Text data: str, contains, extract, replace, split
	14. MultiIndexing: creating, selecting, slicing, indexing, setting levels	
	15. Input and output functions: reading and writing data from/to files, databases, and web services
	16. Performance optimization: vectorization, broadcasting, Cython, Numba, Dask
	
*** 1. Data structures ***
Series: 	
	-> A Series is a one-dimensional labeled array that can hold any data type, such as integers, strings, floats, or even Python objects.
	-> A Series consists of two main parts: the data and the index.
	-> The data is a one-dimensional array of values, and the index is the set of labels for those values.
	-> The index can be a list of integers, strings, or any other datatype.
	-> You can create a Series by passing a list, tuple, or dictionary of values to the Series constructor.
	-> Example:
		import pandas as pd
		data = [1,2,3,4,6]
		s = pd.Series(data)
	-> By default, the index is a RangeIndex with the labels [0,1,2,3,4]. We can access the values and the index of a Series using the 'values' and 'index' attributes.
	-> It is also possible to perform operations such as indexing, slicing, filtering, sorting, and aggregating. 
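The sketch below shows the 'values' and 'index' attributes, plus one filter and one aggregation. It reuses the list from the example above:

```python
import pandas as pd

data = [1, 2, 3, 4, 6]
s = pd.Series(data)

print(s.values)        # underlying NumPy array holding the data
print(list(s.index))   # default labels 0..4 from the RangeIndex
print(s[s > 2])        # filtering: keep only values greater than 2
print(s.sum())         # aggregating: total of all values
```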

In [17]:
#Series

#Basic
import pandas as ps
dic={'Name':'Naveen','Age':23,'Skills':['Python','Java','SQL','Html','CSS'],'Language':['Telugu','English']}
s=ps.Series(dic)
print(s)



Name                                Naveen
Age                                     23
Skills      [Python, Java, SQL, Html, CSS]
Language                 [Telugu, English]
dtype: object


In [20]:
#Accessing values in a series 
import pandas as ps
l=[1,2,2,4,5,6]
t=ps.Series(l)
print(t[0]) #get the element with index label 0
print(t[2:4])   #get elements at positions 2 and 3 (the end of a slice is excluded)


1
2    2
3    4
dtype: int64


In [9]:
#Filtering values in a Series 
import pandas as ps
l=[1,2,4,4,5,6]
t=ps.Series(l)
print(t[t%2==0])

1    2
2    4
3    4
5    6
dtype: int64


In [13]:
# Sorting values in the series
import pandas as ps
l=[10,9,20,63,20,27,90]
t=ps.Series(l)
print(t.sort_values())
print(t)

1     9
0    10
2    20
4    20
5    27
3    63
6    90
dtype: int64
0    10
1     9
2    20
3    63
4    20
5    27
6    90
dtype: int64


In [16]:
# Change the index of series 
import pandas as ps 
data = [10,20,30,40,50]
index = ['a','b','c','d','e']
x=ps.Series(data,index=index)
print(x)

a    10
b    20
c    30
d    40
e    50
dtype: int64


In [4]:
# Arithmetic operations
import pandas as ps
s1=ps.Series([1,2,3,4,5])
s2=ps.Series([10,20,30,40])
s3=s1+s2
print(s3)

0    11.0
1    22.0
2    33.0
3    44.0
4     NaN
dtype: float64
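Label 4 comes out as NaN because pandas aligns the two Series on their index labels before adding, and s2 has no label 4. If a missing label should count as 0 instead, the 'add' method accepts a fill_value. A minimal sketch:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([10, 20, 30, 40])

# A label missing from one side is treated as 0 instead of producing NaN
s3 = s1.add(s2, fill_value=0)
print(s3)
```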


In [6]:
# Series of dates
import pandas as ps
s=ps.Series(['2023-08-9','2022-09-3','2001-08-21'],dtype='datetime64[ns]')
print(s)

0   2023-08-09
1   2022-09-03
2   2001-08-21
dtype: datetime64[ns]


In [7]:
# Series with custom name
import pandas as ps
s=ps.Series([1,2,3,4,6],name='Naveen')
print(s)


0    1
1    2
2    3
3    4
4    6
Name: Naveen, dtype: int64


DataFrame:
	-> A DataFrame in pandas is a two-dimensional, size-mutable, tabular data structure with labeled axes (rows and columns).
	-> It is similar to a spreadsheet in Excel or a SQL table, but with more powerful indexing and data manipulation capabilities.
	-> DataFrames can be created from a variety of sources, including CSV files, Excel files, SQL databases and Python dictionaries.
	-> Once you have a DataFrame, you can use a variety of methods to manipulate and analyze the data, such as filtering, sorting, grouping, and aggregating.
	-> Overall, dataframes are a key data structure in pandas and are widely used in data analysis and data science workflows.
	-> Example:
		import pandas as pd
		data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,22,19],'Score':[8,10,7,10]}
		df = pd.DataFrame(data)
		print(df)

In [15]:
# DataFrame basic example
import pandas as pd
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,22,19],'Score':[8,10,7,10]}
df = pd.DataFrame(data)
df=df.groupby('Name')['Age'].mean()
print(df)

Name
Krisha    22.0
Naveen    23.0
Ram       21.0
Tharun    19.0
Name: Age, dtype: float64


In [4]:
# Dataframe group by gender.
import pandas as pd
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,22,19],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
df=df.groupby('Gender')['Age'].mean()
print(df)

Gender
Female    20.0
Male      22.5
Name: Age, dtype: float64


Index : We can use the 'index' attribute to access the index of a DataFrame or Series. The index is a data structure that represents the row labels of the DataFrame or Series. A DataFrame's column labels are also stored as an Index, available through the 'columns' attribute.
Here's an example:

In [8]:
# Dataframe
import pandas as pd
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,22,19],'Gender':['Male','Female','Male','Female']}
df=pd.DataFrame(data)
print(df.index)
# create series
s = pd.Series(['a','b','c','d'],index=[10,20,30,40])
print(s.index) 

RangeIndex(start=0, stop=4, step=1)
Index([10, 20, 30, 40], dtype='int64')
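A DataFrame's column labels are an Index too, and any column can be promoted to the row index with set_index. A small sketch using the same data:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,22,19],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)

print(df.columns)             # column labels, stored as an Index object
df2 = df.set_index('Name')    # use the Name column as the row labels
print(df2.index)
print(df2.loc['Ram', 'Age'])  # rows can now be looked up by name
```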


*** 2. Reading and Writing data ***
In pandas, we can read and write data in several file formats, including CSV, Excel, SQL databases, and more.
Here are some examples for each file format.

In [2]:
# Reading and Writing data into CSV file.
import pandas as pd
df = pd.read_csv('data_csv.csv')
df.to_csv('Display_csv.csv',index=False) # Creates the CSV file and writes the data to it. index=False leaves the index column out of the file.
df1 = pd.read_csv('Display_csv.csv')
print(df1) 

      Name  Age  Gender
0   Naveen   23    Male
1  Krishna   22    Male
2    Sneha   22  Female
3      Sai   21  Female
4      Ram   24    Male
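The cell above needs a 'data_csv.csv' file on disk. To try the round trip without any file, to_csv can write into an in-memory text buffer and read_csv can read it back. This is a minimal sketch, and the names and ages are sample data rather than the contents of that file:

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Naveen', 'Krishna'], 'Age': [23, 22]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # write CSV text (without the index) into the buffer
buffer.seek(0)                  # rewind to the start before reading
df1 = pd.read_csv(buffer)
print(df1)
```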


In [13]:
# Reading and writing data into excel file.
import pandas as ps 
df = ps.DataFrame({'Column 1':[1,2,3,4,5],'Column 2':['A','B','C','D','E']})
df.to_excel('data_excel.xlsx',index=False)
dr = ps.read_excel('data_excel.xlsx')
print(dr)

   Column 1 Column 2
0         1        A
1         2        B
2         3        C
3         4        D
4         5        E


In [10]:
# Reading and writing data into SQL database.
import pandas as ps
import sqlite3 as sql
pass
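The SQL cell above is only a placeholder. A minimal sketch of the same idea follows. It assumes an in-memory SQLite database from Python's built-in sqlite3 module and a hypothetical table name 'people':

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'Name': ['Naveen', 'Ram', 'Krisha'], 'Age': [23, 21, 22]})

conn = sqlite3.connect(':memory:')       # temporary database that exists only in memory
df.to_sql('people', conn, index=False)   # 'people' is a hypothetical table name
result = pd.read_sql('SELECT * FROM people WHERE Age > 21', conn)
print(result)
conn.close()
```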


*** 3. Data selection and Filtering (loc, iloc, at, iat, boolean indexing, query) ***
-> In pandas, you can select and filter data using the 'loc', 'iloc', 'at' and 'iat' indexers, boolean indexing, and the 'query' method.
*** loc ***
In pandas, 'loc' is used to select data by label. We can use it to select rows and columns in a DataFrame.
Syntax :- df.loc[row_indexer,column_indexer]
Examples are shown below:

In [18]:
# Selecting a single row with loc (row label 1).
import pandas as pd
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,22,19],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
print(df.loc[1])

Name         Ram
Age           21
Gender    Female
Name: 1, dtype: object


In [24]:
# Selecting multiple rows and columns using loc. A label slice includes its end label, so :2 returns rows 0, 1 and 2.
import pandas as pd
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,22,19],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
print(df.loc[:2,['Age','Gender']])


   Age  Gender
0   23    Male
1   21  Female
2   22    Male


In [7]:
# Creating a new column based on an existing column, then filtering rows.
import pandas as pd
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
df.loc[:,'Age_group'] = pd.cut(df['Age'], bins=[0,20,30,40],labels=['<20','20-30','30-40'])
print(df)
print("Filtering data using isin() function")
print(df[df['Age'].isin(range(10,20))])


     Name  Age  Gender Age_group
0  Naveen   23    Male     20-30
1     Ram   21  Female     20-30
2  Krisha   18    Male       <20
3  Tharun   40  Female     30-40
Filtering data using isin() function
     Name  Age Gender Age_group
2  Krisha   18   Male       <20
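loc also accepts a boolean condition as the row indexer, so filtering and column selection happen in one step. A sketch with the same data:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)

# Rows where Age > 20, keeping only the Name and Age columns
adults = df.loc[df['Age'] > 20, ['Name', 'Age']]
print(adults)
```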


*** Iloc ***
'iloc' is an indexer in pandas that selects rows and columns by integer position. You give it the integer positions of the rows and columns you want. Unlike 'loc', the end position of a slice is excluded. Here's an example of how you can use 'iloc' to select a single value and a block of rows and columns in a pandas DataFrame.

In [23]:
import pandas as pd
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
value = df.iloc[1,2] #selects 2nd(1st index) row and 3rd(2nd index) column
print(value)
val = df.iloc[0:2,1:3] #selects 1st and 2nd row with 2nd and 3rd column
print(val)

Female
   Age  Gender
0   23    Male
1   21  Female
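iloc also accepts negative positions (counted from the end) and lists of positions. A short sketch:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)

last_row = df.iloc[-1]          # last row, returned as a Series
picked = df.iloc[[0, 2], [0]]   # rows at positions 0 and 2, first column only
print(last_row)
print(picked)
```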


*** at ***
'at' is used to get (or set) a single value for a row/column label pair. Here's the syntax,
Syntax :- DataFrame.at[row_label,column_label]
Example :- 

In [2]:
import pandas as pd
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
value = df.at[2,'Name']
print(value)

Krisha


*** iat ***
'iat' is used to get (or set) a single value for a row/column integer position pair. Here's the syntax,
Syntax :- DataFrame.iat[row_index,column_index]
Example :- 

In [4]:
import pandas as pd 
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
value = df.iat[2,0]
print(value)

Krisha
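Both at and iat can also assign a single value in place. A sketch that updates one cell by label and another by position (the new values are illustrative):

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)

df.at[2, 'Age'] = 19     # row label 2, column label 'Age'
df.iat[3, 2] = 'Male'    # row position 3, column position 2 (the Gender column)
print(df)
```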


*** Boolean Indexing ***
Boolean indexing in pandas is a way to filter a DataFrame or Series based on a condition. A boolean expression creates a mask (a Series of True/False values), which is then used to select only the rows that meet the condition.
Syntax :- 
    DataFrame[condition]
Example :-

In [5]:
import pandas as pd 
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
mask = df['Age']>20
df = df[mask]
print(df)

     Name  Age  Gender
0  Naveen   23    Male
1     Ram   21  Female
3  Tharun   40  Female
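Masks can be combined with & (and), | (or) and ~ (not). Each condition needs its own parentheses, because & and | bind more tightly than comparison operators such as > and ==. A sketch with the same data:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)

mask = (df['Age'] > 20) & (df['Gender'] == 'Female')  # both conditions must hold
print(df[mask])    # rows matching the mask
print(df[~mask])   # all remaining rows
```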


*** query ***
In pandas, 'query' is a method used to filter a DataFrame based on a condition. The condition is written as a string expression that refers to column names, and only the rows where it evaluates to True are returned.
Syntax :- 
    DataFrame.query('< condition in string format >')
Example :-

In [6]:
import pandas as pd 
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,26,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
df = df.query('Age>25')
print(df)

     Name  Age  Gender
1     Ram   26  Female
3  Tharun   40  Female
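Inside a query string, 'and'/'or' combine conditions, and a Python variable can be referenced by prefixing it with @. A sketch, where min_age is a hypothetical variable:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,26,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)

min_age = 20  # hypothetical threshold held in an ordinary Python variable
result = df.query('Age > @min_age and Gender == "Female"')
print(result)
```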


*** 4. Data manipulation ***
Particular rows, columns, and values are modified with the following methods in pandas.
1. drop
2. dropna
3. fillna
4. replace
5. rename
6. sort_values
7. sort_index 

*** Drop ***
In pandas, 'drop' is a method used to remove rows or columns from a DataFrame. You specify the labels of the rows or columns that you want to remove.
Syntax :-
    DataFrame.drop(<list of row or column labels>, axis = <0 or 1>)
    where axis parameter is,
        0 - row 
        1 - column
Note :- The drop method doesn't modify the original DataFrame. Instead, it returns a new DataFrame with the specified rows or columns removed.
    If you want to modify the original DataFrame, assign the result back to it (df = df.drop(...)) or pass inplace=True.
Example :-

In [13]:
import pandas as pd 
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,26,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
print("Dropped by columns:")
df1 = df.drop(['Name','Age'],axis=1)
print(df1)
print()
print("Dropped by rows:")
df2 = df.drop([1,2],axis=0)
print(df2)

Dropped by columns:
   Gender
0    Male
1  Female
2    Male
3  Female

Dropped by rows:
     Name  Age  Gender
0  Naveen   23    Male
3  Tharun   40  Female
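Instead of passing axis, drop also accepts the columns= and index= keywords, which make the intent explicit. A sketch equivalent to the cell above:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,26,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)

df1 = df.drop(columns=['Name', 'Age'])  # same as axis=1
df2 = df.drop(index=[1, 2])             # same as axis=0
print(df1)
print(df2)
```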


*** dropna ***
In pandas, 'dropna' is a method used to remove missing or null values from a DataFrame. By default, it removes every row that contains at least one missing value.
Syntax :- 
    DataFrame.dropna()  or  DataFrame.dropna(axis=< 0 for row or 1 for column>) or  DataFrame.dropna(thresh=<number>)
    Where number specifies the minimum number of non-null values a row or column must have to be kept. 
    For example, if you only want to keep rows that have at least 3 non-null values, you could use --
        DataFrame.dropna(thresh=3)
Note :- Like drop, dropna doesn't modify the original DataFrame unless inplace=True is passed.
Example :-
    

In [21]:
import pandas as pd
data = {'Name':['Naveen','Ram','Krishna','Tharun'],'Age':[23,None,18,40],'Gender':['Male',None,'Male',None]}
df = pd.DataFrame(data)
df1 = df.dropna()
print(df1) # Keeps only the rows with no missing values (rows 0 and 2).
print()
df2 = df.dropna(axis=1) # Keeps only the columns with no missing values (just Name).
print(df2) 
print()
df3 = df.dropna(thresh=2) # removes any rows with less than 2 non null values.
print(df3)

      Name   Age Gender
0   Naveen  23.0   Male
2  Krishna  18.0   Male

      Name
0   Naveen
1      Ram
2  Krishna
3   Tharun

      Name   Age Gender
0   Naveen  23.0   Male
2  Krishna  18.0   Male
3   Tharun  40.0   None
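dropna can also check only specific columns through its subset parameter. A sketch that drops a row only when Age is missing, ignoring gaps in Gender:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krishna','Tharun'],'Age':[23,None,18,40],'Gender':['Male',None,'Male',None]}
df = pd.DataFrame(data)

df1 = df.dropna(subset=['Age'])  # only missing values in Age cause a row to be dropped
print(df1)
```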


*** fillna ***
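This method is used to replace missing values in a DataFrame or Series with a specified value or method.
Syntax :- 
    DataFrame.fillna(<value to put in the missing places>,inplace=<True or False>)
Here,
    if inplace is True, the original DataFrame is modified; otherwise, a new DataFrame is returned and has to be stored in a variable.
We can also fill missing values from neighboring rows with the ffill and bfill methods,
where, 
    ffill - Forward fill: fills each missing value with the previous non-missing value in the column.
    bfill - Backward fill: fills each missing value with the next non-missing value in the column.
Note :- Older code writes these as fillna(method='ffill') and fillna(method='bfill'). That form is deprecated since pandas 2.1, so df.ffill() and df.bfill() are preferred.
Example :-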

In [29]:
import pandas as pd
data = {'Name':['Naveen','Ram','Krishna','Tharun'],'Age':[23,None,18,40],'Gender':['Male',None,'Male',None]}
df = pd.DataFrame(data)
df1 = df.ffill() # Forward filling: copies the previous valid value down.
print(df1)
print()
df2 = df.bfill() # Backward filling: copies the next valid value up.
print(df2)
df3 = df.fillna('Unknown',inplace=False) # fill all empty values with 'Unknown'
print()
print(df3)


      Name   Age Gender
0   Naveen  23.0   Male
1      Ram  23.0   Male
2  Krishna  18.0   Male
3   Tharun  40.0   Male

      Name   Age Gender
0   Naveen  23.0   Male
1      Ram  18.0   Male
2  Krishna  18.0   Male
3   Tharun  40.0   None

      Name      Age   Gender
0   Naveen     23.0     Male
1      Ram  Unknown  Unknown
2  Krishna     18.0     Male
3   Tharun     40.0  Unknown
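fillna also accepts a dictionary that maps each column to its own fill value. This keeps Age numeric instead of mixing in text. A sketch that fills Age with the column mean and Gender with 'Unknown' (both choices are illustrative):

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krishna','Tharun'],'Age':[23,None,18,40],'Gender':['Male',None,'Male',None]}
df = pd.DataFrame(data)

# Each column gets its own replacement value
df1 = df.fillna({'Age': df['Age'].mean(), 'Gender': 'Unknown'})
print(df1)
```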


*** Replace ***
It is used to replace specified values in a DataFrame.
Syntax :-
    DataFrame.replace(<old value>,<new value>,inplace=<boolean>)
Example :-

In [31]:
import pandas as pd
data = {'Name':['Naveen','Ram','Krishna','Tharun'],'Age':[23,27,18,40],'Gender':['M','M','M','M']}
df = pd.DataFrame(data)
df1 = df.replace('M','Male')
print(df1)

      Name  Age Gender
0   Naveen   23   Male
1      Ram   27   Male
2  Krishna   18   Male
3   Tharun   40   Male
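replace also accepts a nested dictionary of the form {column: {old: new}}. That way several values can be mapped at once, and only the named column is touched. A sketch in which the mix of 'M' and 'F' values is illustrative:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krishna','Tharun'],'Age':[23,27,18,40],'Gender':['M','F','M','F']}
df = pd.DataFrame(data)

# Only values in the Gender column are replaced
df1 = df.replace({'Gender': {'M': 'Male', 'F': 'Female'}})
print(df1)
```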


*** rename ***
It is used to rename columns or index (row) labels in a DataFrame.
Syntax :-
    DataFrame.rename(columns={<old name>:<new name>},inplace=<boolean>)
Note :- Row labels can be renamed the same way with index={<old label>:<new label>}.
Example :-

In [32]:
import pandas as pd
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,26,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)
df1 = df.rename(columns={'Name':'Username','Gender':'Sex'}) # changing columns
print(df1)

  Username  Age     Sex
0   Naveen   23    Male
1      Ram   26  Female
2   Krisha   18    Male
3   Tharun   40  Female
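Row labels are renamed the same way through the index= argument. A sketch in which the new labels 'first' and 'last' are illustrative:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,26,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)

df1 = df.rename(index={0: 'first', 3: 'last'})  # labels not listed are left unchanged
print(df1)
```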


*** Sort_values ***
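We can sort a DataFrame by one or more columns. Rows are ordered by the first column in the list, and ties are broken by the next one.
Syntax :- 
    DataFrame.sort_values(by=<list of columns>,ascending=<boolean or list of booleans>,inplace=<boolean>)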
Example :-


In [37]:
import pandas as pd 
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,26,18,40],'Gender':['Male','Female','Female','Female']}
df = pd.DataFrame(data)
df1 = df.sort_values(by=['Gender','Age'])
print(df1)

     Name  Age  Gender
2  Krisha   18  Female
1     Ram   26  Female
3  Tharun   40  Female
0  Naveen   23    Male
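The ascending argument can take one boolean per sort column. A sketch that keeps Gender in ascending order but sorts ages from oldest to youngest within each gender:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,26,18,40],'Gender':['Male','Female','Female','Female']}
df = pd.DataFrame(data)

# Gender ascending, then Age descending within each Gender
df1 = df.sort_values(by=['Gender', 'Age'], ascending=[True, False])
print(df1)
```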


*** sort_index ***
It is used to sort a DataFrame by its index labels.
Syntax :-
    DataFrame.sort_index(inplace=<Boolean>)
Example :-

In [38]:
import pandas as pd 
data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,26,18,40],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data,index=[2,3,0,1])
df1 = df.sort_index()
print(df1)

     Name  Age  Gender
0  Krisha   18    Male
1  Tharun   40  Female
2  Naveen   23    Male
3     Ram   26  Female


*** Head ***
Selects the first n rows of a DataFrame (n defaults to 5).
Syntax :-
    df.head(n) #where n is any number of rows.

*** Tail ***
Selects the last n rows of a DataFrame (n defaults to 5).
Syntax :-
    df.tail(n) 
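A short sketch of head and tail on a ten-row DataFrame, with the values 0 to 9 as sample data:

```python
import pandas as pd

df = pd.DataFrame({'Value': range(10)})  # ten sample rows holding 0..9

first = df.head(3)   # first three rows
last = df.tail(2)    # last two rows
print(first)
print(last)
print(df.head())     # without n, head returns five rows
```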

*** Drop ***
Removes rows or columns of a DataFrame. This method returns a new DataFrame without the dropped rows or columns.
Note :- The original DataFrame is not modified by the drop method.
The axis parameter specifies whether to drop a column (axis = 1) or a row (axis = 0).  
Syntax :-
    df.drop(<row or column label>,axis = <1/0>)

*** Fillna ***
Fillna is a method in pandas that allows you to fill missing or NaN (Not a Number) values in a DataFrame with a specified value or method.
Note :- It returns a new DataFrame with the missing values filled with the specified value or method.
Here, we have forward filling (filling an empty value with its previous value) and backward filling (filling an empty value with its next value).  
Syntax :-
    df.fillna(<value to be placed in missing value>) or
    df.ffill() / df.bfill()   (older form: df.fillna(method=<'ffill'/'bfill'>), deprecated since pandas 2.1)

*** Groupby ***
Groupby allows you to group data based on one or more columns and perform aggregate functions on each group.
Syntax :- 
    df.groupby(<column(s) to group by>)[<column(s) to aggregate>].<aggregation method>()
Some of the methods are - max(), mean(), min(), size(), sum(), count(), filter() etc.
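Several aggregations can be computed in one call with agg, which takes a list of function names. A sketch on the Name/Age/Gender data used earlier:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,22,19],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)

# One row per Gender, one column per aggregation of Age
summary = df.groupby('Gender')['Age'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```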

*** Pivot_table ***
It is a way of summarizing data in a tabular format. Pandas provides a 'pivot_table' function that allows you to create pivot tables from the dataframe.
Syntax :-
    pd.pivot_table(<dataframe>, values = <set of attributes to perform operation>, index = <set of attributes to fetch>, aggfunc = <any operation like mean>)
Example :-
    df = pd.pivot_table(df, values = 'salary', index = ['Gender','Age'], aggfunc = 'mean')
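The example above assumes a DataFrame with a 'salary' column, which these notes don't build. Here is a runnable sketch of the same call on the Name/Age/Gender data used earlier, averaging Age per Gender:

```python
import pandas as pd

data = {'Name':['Naveen','Ram','Krisha','Tharun'],'Age':[23,21,22,19],'Gender':['Male','Female','Male','Female']}
df = pd.DataFrame(data)

# One row per Gender, holding the mean Age of that group
table = pd.pivot_table(df, values='Age', index='Gender', aggfunc='mean')
print(table)
```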