# Pandas Deep Dive

## Outline


* Basic Data Exploration
* Missing Values
* Duplicates
* Selecting / Dropping Columns
* String operations on whole columns
* Column splits/transformations
* MultiIndex
* Querying a Dataframe
* `DataFrame.at` vs. `DataFrame.loc`
* Joins
* Sorting
* Grouping - Aggregation & Value Counts
* Resetting Index
* Pivot & Melt
* Column / Row Concatenation
* Looping through DataFrame

In [1]:
import pandas as pd
import numpy as np

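Read the CSV file from the local system. The **thousands=','** argument tells the parser to treat commas inside numbers as thousands separators.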

In [2]:
data = pd.read_csv('startup_funding.csv',thousands=',')

## Making a copy of the data

It is important to make a copy of the data first: if we corrupt the df by accident later on, we still have the original to fall back on.

In [3]:
#write code here
df = data.copy()

## Basic Data Exploration

**Getting Shape of the Data**   

dataframe.shape returns the size of the data frame along each dimension as a (rows, columns) tuple.

In [4]:
#write code here
df.shape

(2422, 11)

There are 2422 rows and 11 columns in the data

**Getting a list of all columns in the dataframe**   

dataframe.columns returns the names of all columns.

In [5]:
#write code here
df.columns

Index(['index', 'SNo', 'Date', 'StartupName', 'IndustryVertical',
       'SubVertical', 'CityLocation', 'InvestorsName', 'InvestmentType',
       'AmountInUSD', 'Remarks'],
      dtype='object')

**Checking data types of all columns**   

dataframe.dtypes returns the data type of each column.

In [6]:
#write code here
df.dtypes

index                 int64
SNo                   int64
Date                 object
StartupName          object
IndustryVertical     object
SubVertical          object
CityLocation         object
InvestorsName        object
InvestmentType       object
AmountInUSD         float64
Remarks              object
dtype: object

**Getting top/bottom 5 values**    

By default the **head()** will show the first five rows of the data frame.

In [7]:
#write code here
df.head()

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,
1,1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,,
2,2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,,
3,3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,
4,4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,


By default the **tail()** will show the last five rows of the data frame.

In [8]:
#write code here
df.tail()

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
2417,45,45,28/07/2017,Jhakaas,Consumer Internet,App-based Aggregator of Offline Businesses,Mumbai,Amen Dhyllon,Seed Funding,,
2418,46,46,28/07/2017,BigStylist,Consumer Internet,Beauty Services Marketplace,Mumbai,Info Edge (India) Ltd,Private Equity,1250000.0,
2419,47,47,28/07/2017,Gympik.com,Consumer Internet,online marketplace for discovering fitness cen...,bangalore,RoundGlass Partners,Seed Funding,,
2420,48,48,01/06/2017,Tripeur,Technology,Mobile based travel ERP platform,Bangalore,"Grace Grace Techno Ventures LLP, Rajul Garg & ...",Seed Funding,,
2421,49,49,02/06/2017,RentOnGo,eCommerce,"Online Marketplace for Renting Bikes, Electron...",Bangalore,TVS Motor Company,Private Equity,,


**Getting top N values**  

Calling head(N) returns the first N rows of the data frame, where N is the number of rows to show.

In [9]:
#write code here
df.head(10)

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,
1,1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,,
2,2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,,
3,3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,
4,4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,
5,5,5,01/07/2017,Billion Loans,Consumer Internet,Peer to Peer Lending platform,Bangalore,Reliance Corporate Advisory Services Ltd,Seed Funding,1000000.0,
6,6,6,03/07/2017,Ecolibriumenergy,Technology,Energy management solutions provider,Ahmedabad,"Infuse Ventures, JLL",Private Equity,2600000.0,
7,7,7,04/07/2017,Droom,eCommerce,Online marketplace for automobiles,Gurgaon,"Asset Management (Asia) Ltd, Digital Garage Inc",Private Equity,20000000.0,
8,8,8,05/07/2017,Jumbotail,eCommerce,online marketplace for food and grocery,Bangalore,"Kalaari Capital, Nexus India Capital Advisors",Private Equity,8500000.0,
9,9,9,05/07/2017,Moglix,eCommerce,B2B marketplace for Industrial products,Noida,"International Finance Corporation, Rocketship,...",Private Equity,12000000.0,


**Getting Bottom N values**  

Calling tail(N) returns the last N rows of the data frame, where N is the number of rows to show.

In [10]:
#write code here
df.tail(10)

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
2412,40,40,25/07/2017,Byju’s,Consumer Internet,Mobile Learning App,Bangalore,Tencent Holdings,Private Equity,35000000.0,
2413,41,41,26/07/2017,Creator’s Gurukul,Others,Co-Working Space Provider,New Delhi,Yuvraj Singh,Seed Funding,,
2414,42,42,26/07/2017,Fab Hotels,Consumer Internet,Budget hotels brand & Aggregator Platform,New Delhi,Goldman Sachs,Private Equity,25000000.0,
2415,43,43,26/07/2017,ThinkerBell,Consumer Internet,Assisted Learning Startup,Bangalore,"Indian Angel Network, Anand Mahindra",Seed Funding,200000.0,
2416,44,44,27/07/2017,1mg,eCommerce,Online Pharmacy,Gurgaon,"HBM Healthcare Investments, Maverick Capital V...",Private Equity,15000000.0,
2417,45,45,28/07/2017,Jhakaas,Consumer Internet,App-based Aggregator of Offline Businesses,Mumbai,Amen Dhyllon,Seed Funding,,
2418,46,46,28/07/2017,BigStylist,Consumer Internet,Beauty Services Marketplace,Mumbai,Info Edge (India) Ltd,Private Equity,1250000.0,
2419,47,47,28/07/2017,Gympik.com,Consumer Internet,online marketplace for discovering fitness cen...,bangalore,RoundGlass Partners,Seed Funding,,
2420,48,48,01/06/2017,Tripeur,Technology,Mobile based travel ERP platform,Bangalore,"Grace Grace Techno Ventures LLP, Rajul Garg & ...",Seed Funding,,
2421,49,49,02/06/2017,RentOnGo,eCommerce,"Online Marketplace for Renting Bikes, Electron...",Bangalore,TVS Motor Company,Private Equity,,


**Getting a summary of all columns**

By default, the describe() function displays basic statistics for the **numerical columns** in the data frame.

In [11]:
#write code here
df.describe()

Unnamed: 0,index,SNo,AmountInUSD
count,2422.0,2422.0,1553.0
mean,1161.532205,1161.532205,11923850.0
std,697.598259,697.598259,63466800.0
min,0.0,0.0,16000.0
25%,555.25,555.25,375000.0
50%,1160.5,1160.5,1100000.0
75%,1765.75,1765.75,6000000.0
max,2371.0,2371.0,1400000000.0


The **describe()** function with argument **include='O'** displays basic statistics for the **object type** columns in the data frame. 

In [12]:
#Write code here
df.describe( include='O')

Unnamed: 0,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,Remarks
count,2422,2422,2251,1486,2243,2413,2421,419
unique,701,2001,743,1364,71,1885,7,69
top,30/11/2016,Swiggy,Consumer Internet,Online Pharmacy,Bangalore,Undisclosed Investors,Seed Funding,Series A
freq,11,7,795,10,649,33,1300,177



The **describe()** function with argument **include='all'** displays basic statistics for all types of columns in the data frame. 

In [13]:
#Write code here
df.describe(include='all')

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
count,2422.0,2422.0,2422,2422,2251,1486,2243,2413,2421,1553.0,419
unique,,,701,2001,743,1364,71,1885,7,,69
top,,,30/11/2016,Swiggy,Consumer Internet,Online Pharmacy,Bangalore,Undisclosed Investors,Seed Funding,,Series A
freq,,,11,7,795,10,649,33,1300,,177
mean,1161.532205,1161.532205,,,,,,,,11923850.0,
std,697.598259,697.598259,,,,,,,,63466800.0,
min,0.0,0.0,,,,,,,,16000.0,
25%,555.25,555.25,,,,,,,,375000.0,
50%,1160.5,1160.5,,,,,,,,1100000.0,
75%,1765.75,1765.75,,,,,,,,6000000.0,


**Getting unique values of a single column**

Display the unique values of the **InvestmentType** column.
The unique() function returns all distinct values that are present in the column.

Note that **PrivateEquity** and **Private Equity** are counted as two distinct values because the strings differ.

In [14]:
#write code here
uniqueValues = df['InvestmentType'].unique()
uniqueValues

array(['Private Equity', 'Seed Funding', 'Debt Funding', nan,
       'SeedFunding', 'PrivateEquity', 'Crowd funding', 'Crowd Funding'],
      dtype=object)

## Missing Values

**Checking which columns have missing values and how many**

The **isnull()** function returns a boolean mask that is True wherever a value is missing, and **sum()** then counts the missing values in each column. 

In [15]:
#write code here
df.isnull().sum()

index                  0
SNo                    0
Date                   0
StartupName            0
IndustryVertical     171
SubVertical          936
CityLocation         179
InvestorsName          9
InvestmentType         1
AmountInUSD          869
Remarks             2003
dtype: int64

**Treating Missing Values of the AmountInUSD column (Series)**   
Hint: use the **fillna(0)** function 

In [18]:
#write code here
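# Assign the result back; calling fillna(inplace=True) on a selected column is deprecated in recent pandas
df['AmountInUSD'] = df['AmountInUSD'].fillna(0)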

After running the following command, you'll see that the missing values in AmountInUSD are now filled with 0, so the column no longer has any missing/null values.

As before, **isnull()** flags the missing values and **sum()** counts them per column. 

In [19]:
#write code here
df.isnull().sum()

index                  0
SNo                    0
Date                   0
StartupName            0
IndustryVertical     171
SubVertical          936
CityLocation         179
InvestorsName          9
InvestmentType         1
AmountInUSD            0
Remarks             2003
dtype: int64

**Treating all Missing Values at once**   
Hint: use **fillna('Others', inplace=True)**. With inplace=True the data frame is updated in place instead of a new one being returned.

In [20]:
#write code here
df.fillna('Others', inplace=True)

**Checking which columns have missing values and how many**


In [21]:
#write code here
df.isnull().sum()

index               0
SNo                 0
Date                0
StartupName         0
IndustryVertical    0
SubVertical         0
CityLocation        0
InvestorsName       0
InvestmentType      0
AmountInUSD         0
Remarks             0
dtype: int64

Now you can see that every column has **0** missing or **null values**.

## Duplicates

Checking for duplicates in the **StartupName** column   
**duplicated()** flags each value that repeats an earlier one, and **sum()** counts the flagged values in that column. 

In [None]:
#write code here


There are **421** StartupName instances that are exactly the same.

**Checking for whole row duplicates**

In [None]:
#write code here 


The above code returns 50, the number of rows that are duplicated.

In [None]:
df[df.duplicated()]

This shows that these are the rows that are duplicated. 
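
The two duplicate checks above can be sketched on a small stand-in table. The `toy` frame and its values are made up for illustration and are not taken from startup_funding.csv, so the counts differ from the 421 and 50 reported for the real data.

```python
import pandas as pd

# Hypothetical toy data standing in for the startup funding table
toy = pd.DataFrame({
    "StartupName": ["Swiggy", "Zepo", "Swiggy", "Droom", "Zepo"],
    "AmountInUSD": [100, 200, 100, 300, 250],
})

# duplicated() marks every repeat of an earlier value as True; sum() counts them
name_dupes = toy["StartupName"].duplicated().sum()

# On the whole frame, a row only counts when all of its values repeat
row_dupes = toy.duplicated().sum()

print(name_dupes, row_dupes)
```

Here two names repeat, but only one full row is an exact copy of an earlier row.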

**Deleting duplicates from a specific column**

In [None]:
df.shape

The following code deletes the duplicates present in **StartupName**. The **keep='first'** parameter keeps the first occurrence of each duplicated value and deletes the rest, so only unique instances are left in the result.

In [None]:
df.drop_duplicates(['StartupName'], keep='first').shape # Default is keep=first

In [None]:
df.drop_duplicates(['StartupName'], keep='last').shape #keep=last keeps the last instance and deletes the rest

In [None]:
df.drop_duplicates(['StartupName'], keep=False).shape # this deletes all the occurrences of the duplicates so none is left.


**NOTE:** In none of the calls above did we use the inplace=True parameter or assign the result back, so df itself is unchanged.
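
As a rough illustration of the three `keep` options, here is a sketch on a made-up four-row frame (the `toy` data is an assumption, not part of the notebook's dataset). It also shows that the original frame keeps its shape when the result is not assigned back.

```python
import pandas as pd

# Hypothetical toy data: "Swiggy" appears twice
toy = pd.DataFrame({
    "StartupName": ["Swiggy", "Zepo", "Swiggy", "Droom"],
    "AmountInUSD": [100, 200, 150, 300],
})

first = toy.drop_duplicates(["StartupName"], keep="first")      # keeps the row with 100
last = toy.drop_duplicates(["StartupName"], keep="last")        # keeps the row with 150
none_kept = toy.drop_duplicates(["StartupName"], keep=False)    # drops both Swiggy rows

print(first.shape, last.shape, none_kept.shape)
print(toy.shape)  # unchanged, because nothing was assigned back
```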

**Deleting whole row duplicates**   

In [None]:
df.drop_duplicates().shape

In [None]:
df.shape

Now we will use the inplace=True parameter to save the data frame

In [None]:
df.drop_duplicates(inplace=True)
df.shape

## Selecting / Dropping Columns

**Selecting Columns**

To select multiple columns, **'Date', 'CityLocation', 'AmountInUSD'**, we pass their names as a list inside the indexing brackets of the data frame.

In [None]:
#write code here

**Dropping Columns**

In [None]:
df.head()

**drop('index', axis=1)**: the drop() function drops the column passed as a parameter, here the column **index**. axis=1 stands for columns; the default, axis=0, stands for rows.

In [None]:
#write code here
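
One possible way to approach the selection and drop exercises is sketched below on a made-up two-row frame; the `toy` data is an assumption used only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical toy data with a leftover 'index' column
toy = pd.DataFrame({
    "index": [0, 1],
    "Date": ["01/08/2017", "02/08/2017"],
    "CityLocation": ["Bangalore", "Mumbai"],
    "AmountInUSD": [1300000.0, 500000.0],
})

# A list of names inside the brackets selects several columns at once
subset = toy[["Date", "CityLocation", "AmountInUSD"]]

# drop() returns a new frame; axis=1 targets columns, and toy itself is unchanged
without_index = toy.drop("index", axis=1)

print(subset.columns.tolist())
print(without_index.shape, toy.shape)
```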

In [None]:
df.shape

Use the inplace=True parameter to save the change in the data frame.

In [None]:
df.drop('index', axis=1,inplace=True)

In [None]:
df.shape

In [None]:
df.head()


## String Operations on whole columns

Display the unique values of column **InvestmentType** 

In [None]:
#write code here

The result will look like this:

array(['Private Equity', 'Seed Funding', 'Debt Funding', 'Others',
       'SeedFunding', 'PrivateEquity', 'Crowd funding', 'Crowd Funding'],
      dtype=object)


**String Replacement**

We want to replace *PrivateEquity*, which is written as one word, with *Private Equity*.

In [None]:
# the first parameter of replace() is the value we want to replace and
# the second one is the value we are replacing it with
df['InvestmentType'].replace('PrivateEquity', 'Private Equity').unique() 

In [None]:
df['InvestmentType'].unique()

As you can see, *PrivateEquity* is still in the df because we did not reassign the result. Do **NOT** rely on **inplace=True** on a selected column here; reassign the column manually.

In [None]:
# Assign the full replaced column; appending .unique() here would try to store a short array and fail
df['InvestmentType'] = df['InvestmentType'].replace('PrivateEquity', 'Private Equity')

In [None]:
df['InvestmentType'].unique()

Now replace **SeedFunding** with **Seed Funding** and **Crowd funding** with **Crowd Funding** in InvestmentType

check unique values of **InvestmentType**
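
A minimal sketch of this clean-up, assuming the goal is to merge the spelling variants seen earlier in the unique() output. The `types` Series is a made-up stand-in for the real column; passing a dict to replace() handles several values in one call.

```python
import pandas as pd

# Hypothetical stand-in for df['InvestmentType']
types = pd.Series(["Seed Funding", "SeedFunding", "Crowd funding", "Crowd Funding"])

# A dict maps each variant spelling to its canonical form
cleaned = types.replace({"SeedFunding": "Seed Funding", "Crowd funding": "Crowd Funding"})

print(cleaned.unique())
```

On the real data the result would be assigned back to the column, as in the reassignment shown earlier.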

**Capitalization**

Display the column **CityLocation** in lowercase.  
The **str.lower()** method returns a lowercased copy of each string in the column; it converts all uppercase characters to lowercase.

In [None]:
#write code here

**Checking if there is a substring in each column value**

In [None]:
df[df['InvestorsName'].str.contains('Khan')]

In [None]:
df[df['InvestorsName'].str.contains(' Khan')]

In [None]:
df['InvestorsName'].str.contains('Khan').sum()

In [None]:
df['InvestorsName'].str.contains('khan').sum()

**What do you observe?**
<br>The str.contains() function is case sensitive. If you pass Khan with a capital K, it only matches a capital K; if you pass a lowercase k, it only matches the lowercase spelling.
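
If a case-insensitive search is wanted, `str.contains` accepts `case=False`. The sketch below uses a made-up `investors` Series (an assumption, not the real column) and also passes `na=False` so that missing values count as non-matches.

```python
import pandas as pd

# Hypothetical investor names, including one missing value
investors = pd.Series(["Shah Rukh Khan", "khan ventures", "Kae Capital", None])

# Default search is case-sensitive: only the capital-K spelling matches
exact = investors.str.contains("Khan", na=False).sum()

# case=False matches both spellings
relaxed = investors.str.contains("khan", case=False, na=False).sum()

# str.lower() lowercases every string in the column
lowered = investors.str.lower()

print(exact, relaxed)
```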

## Column Transformations / Splits

**Changing Data types**

___1. Normal conversion___

Display the data types of all columns

In [None]:
#write code here


Convert the data type of column **AmountInUSD** from **float** to **int**    
**astype(np.int64)** casts a pandas object to an integer data type.

In [None]:
#write code here

Display the data types of all columns again

In [None]:
#write code here

___2. Dates Conversion___

This will raise an error because the column contains strings in unrecognized formats. The call below only works when every date is in a parseable format like 1/1/2001.

In [None]:
pd.to_datetime(df['Date'])

In [None]:
df['Date'].str.contains('//').sum()

In [None]:
df.head()

We saw that there are some anomalies in the data. For example, some dates use double slashes instead of single ones, so we have to replace them.
<br> The errors='coerce' parameter converts values that still cannot be parsed to NaT instead of raising an error, and converts the rest normally.

In [None]:
df['New_Date'] = pd.to_datetime(df['Date'].str.replace('//', '/'), dayfirst=True, errors='coerce')

In [None]:
df.dtypes

In [None]:
df.isnull().sum()

In [None]:
df[df['New_Date'].isnull()]

**Now replace both '//' and '.' with '/' in Date column**

In [None]:
#write code here
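
One way to do this replacement is sketched below on made-up date strings (the `dates` values are assumptions chosen to show each anomaly). Passing `regex=False` matters for the dot: as a regular expression, `.` would match any character rather than a literal dot.

```python
import pandas as pd

# Hypothetical date strings showing a dot, a double slash, and an unparseable value
dates = pd.Series(["01/08/2017", "12/05.2015", "13/04//2015", "not a date"])

# regex=False treats '//' and '.' as literal text
cleaned = dates.str.replace("//", "/", regex=False).str.replace(".", "/", regex=False)

# dayfirst=True reads 01/08/2017 as 1 August; errors='coerce' turns failures into NaT
parsed = pd.to_datetime(cleaned, dayfirst=True, errors="coerce")

print(cleaned.tolist())
print(parsed.isna().sum())
```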

In [None]:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')

In [None]:
df.dtypes

In [None]:
df.isnull().sum()

In [None]:
df.drop('New_Date',axis=1,inplace=True)

**Getting months/year/days as separate columns from a datetime column**

Use **DatetimeIndex.month** attribute to find the months present in the DatetimeIndex and assign to **df['month']** and same as for **day** and **year** as well.    


In [None]:
df['month'] = pd.DatetimeIndex(df['Date']).month

Display the first five entries through head().

**Exercise:** Extract day and year

In [None]:
#write code here
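
A possible solution sketch for the day and year exercise, using a made-up two-row frame (`toy` is an assumption). It follows the same `pd.DatetimeIndex` pattern as the month cell and also shows the `.dt` accessor, which gives the same fields directly on a datetime column.

```python
import pandas as pd

# Hypothetical frame with an already-parsed datetime column
toy = pd.DataFrame({"Date": pd.to_datetime(["2017-08-01", "2016-11-30"])})

toy["day"] = pd.DatetimeIndex(toy["Date"]).day
toy["year"] = pd.DatetimeIndex(toy["Date"]).year

# The .dt accessor exposes the same fields on the Series itself
toy["month"] = toy["Date"].dt.month

print(toy)
```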

Get the current date and time

In [None]:
import datetime

datetime.datetime.today()

**Extracting new columns from existing string columns**

Split the column **InvestmentType**   
Hint: the **str.split()** method returns, for each value, a list of strings obtained by breaking the string at the specified separator.

In [None]:
#write code here

Split the column **InvestmentType** and display the value at index 0 of each result   
Hint: **str[0]** 

In [None]:
#write code here
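
The two split exercises can be sketched as follows; the `types` Series is a made-up stand-in for the InvestmentType column.

```python
import pandas as pd

# Hypothetical stand-in for df['InvestmentType']
types = pd.Series(["Private Equity", "Seed Funding", "Debt Funding"])

# str.split turns each string into a list of words
parts = types.str.split(" ")

# str[0] picks the first item of each list
first_word = parts.str[0]

print(parts.tolist())
print(first_word.tolist())
```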

## Querying a Dataframe

**Querying based on a single value**

In [None]:
df['AmountInUSD'] < 100000

In [None]:
df[df['AmountInUSD'] < 100000]

**Querying based on multiple columns**   
where **AmountInUSD** is greater than **100000** and **CityLocation** is **New Delhi**

In [None]:
#write code here

**Querying based on a list of values**

In [None]:
df['CityLocation'].isin(['New Delhi', 'Bangalore', 'Mumbai'])

Query on **CityLocation** being one of the values **'New Delhi', 'Bangalore', 'Mumbai'**, combined with **AmountInUSD** being greater than **100000**

In [None]:
#write code here
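
A minimal sketch of both multi-condition queries, assuming a small made-up frame (`toy` and `metros` are illustrative names). Conditions are combined with `&`, and each comparison needs its own parentheses because `&` binds more tightly than `>` or `==`.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "CityLocation": ["New Delhi", "Mumbai", "New Delhi", "Pune"],
    "AmountInUSD": [50000, 500000, 250000, 900000],
})

# Amount above 100000 AND city equal to New Delhi
delhi_big = toy[(toy["AmountInUSD"] > 100000) & (toy["CityLocation"] == "New Delhi")]

# City in a list of values AND amount above 100000
metros = ["New Delhi", "Bangalore", "Mumbai"]
metro_big = toy[toy["CityLocation"].isin(metros) & (toy["AmountInUSD"] > 100000)]

print(delhi_big)
print(metro_big)
```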

## `DataFrame.at` vs. `DataFrame.loc`

In [None]:
df.at[133,'StartupName']="Zubair Labs"

In [None]:
df.at[133,'StartupName']

In [None]:
df.loc[133,['StartupName', 'AmountInUSD']]

In [None]:
%timeit df.at[133,'StartupName']

In [None]:
%timeit df.loc[133,'StartupName']
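
For context on the cells above: `.at` is meant for reading or writing a single scalar by row and column label, while `.loc` also accepts lists and slices and can return a Series or DataFrame. The sketch below uses a made-up two-row frame indexed with labels 132 and 133 (an assumption) to show the difference without the timing step.

```python
import pandas as pd

# Hypothetical toy frame with explicit row labels
toy = pd.DataFrame(
    {"StartupName": ["TouchKin", "Ethinos"], "AmountInUSD": [1300000.0, 0.0]},
    index=[132, 133],
)

# .at reads or writes exactly one cell
toy.at[133, "StartupName"] = "Zubair Labs"
single = toy.at[133, "StartupName"]

# .loc with a list of columns returns a Series for that row
several = toy.loc[133, ["StartupName", "AmountInUSD"]]

print(single)
print(several)
```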

## Joins

In [None]:
rows1 = df[0:10]
rows2 = df[5:12]
print(rows1.shape)
rows1
table1 = rows1[['SNo', 'Date', 'CityLocation']]
table2 = rows2[['SNo', 'StartupName', 'IndustryVertical', 'InvestmentType']]
print(table1.shape)
table1

In [None]:
print(table2.shape)
table2

Merge table1 with table2:  
**merge(table, how='right', on='SNo')**  
Most simply, we can explicitly specify the name of the key column using the **on** keyword, which takes a column name or a list of column names.   
The **how** keyword accepts **'inner', 'outer', 'left', and 'right'**. A right join keeps every key from the right table (table2 here), while an outer join keeps the union of the keys from both tables; values missing on either side are filled with NaN. 

In [None]:
table1.merge(table2, how='right', on='SNo')
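
To compare the `how` options side by side, here is a sketch on two tiny made-up tables (their contents are assumptions; only the column names follow the notebook). The tables share the keys 1 and 2, while 0 exists only on the left and 3 only on the right.

```python
import pandas as pd

# Hypothetical tables sharing the SNo key
table1 = pd.DataFrame({"SNo": [0, 1, 2], "CityLocation": ["Bangalore", "Mumbai", "New Delhi"]})
table2 = pd.DataFrame({"SNo": [1, 2, 3], "StartupName": ["Ethinos", "Leverage Edu", "Zepo"]})

inner = table1.merge(table2, how="inner", on="SNo")   # keys present in both tables
right = table1.merge(table2, how="right", on="SNo")   # every key of table2
outer = table1.merge(table2, how="outer", on="SNo")   # union of keys from both tables

print(inner)
print(right)
print(outer)
```

In the right join, the row for SNo 3 has NaN in CityLocation because table1 has no matching key.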

## Sorting

In [None]:
df.sort_values(by=['CityLocation'])

Sort values of **AmountInUSD** in descending order 

In [None]:
#write code here
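
A short sketch of descending sort, using a made-up three-row frame (`toy` is an assumption). `sort_values` sorts ascending by default, so `ascending=False` reverses the order.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "StartupName": ["Zepo", "Droom", "Moglix"],
    "AmountInUSD": [500000, 20000000, 12000000],
})

# Largest amounts first; the original frame is left untouched
largest_first = toy.sort_values(by="AmountInUSD", ascending=False)

print(largest_first)
```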

## Grouping - Aggregation & Value Counts

**Without grouping**

sum of all values of **AmountInUSD**   
Hint: sum()

In [None]:
#write code here

Display all unique values of **IndustryVertical**   
Hint: unique()

In [None]:
#write code here

Count the values in each category of **InvestmentType**    
Hint: value_counts()

In [None]:
#write code here

**Grouping based on a single column**

Group the dataframe on column **InvestmentType** 

In [None]:
#write code here

Group the dataframe on column **InvestmentType** and then sum **AmountInUSD** for each group

Group the dataframe on column **CityLocation** and then count the values of **IndustryVertical** in each group

In [None]:
#write code here 

**Grouping w.r.t. multiple columns**

Group the dataframe on multiple columns **CityLocation, InvestmentType** and then sum **AmountInUSD** for each group

In [None]:
#write code here
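
The grouping exercises in this section could look roughly like the sketch below. The `toy` frame is made up for illustration; the column names follow the notebook.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "CityLocation": ["Mumbai", "Mumbai", "Bangalore", "Bangalore"],
    "InvestmentType": ["Seed Funding", "Private Equity", "Seed Funding", "Seed Funding"],
    "AmountInUSD": [100, 200, 300, 400],
})

# Count how often each category appears
counts = toy["InvestmentType"].value_counts()

# Group on one column, then sum another
by_type = toy.groupby("InvestmentType")["AmountInUSD"].sum()

# Group on several columns; the result is a Series with a two-level index
by_city_type = toy.groupby(["CityLocation", "InvestmentType"])["AmountInUSD"].sum()

print(counts)
print(by_type)
print(by_city_type)
```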

## Resetting Index

In [None]:
type(df.groupby(['CityLocation', 'InvestmentType'])['AmountInUSD'].sum())

Group the dataframe on multiple columns **CityLocation, InvestmentType**, sum **AmountInUSD** for each group, and then convert the resulting Series into a dataframe with the **reset_index** method  


In [None]:
#write code here

In [None]:
type(df.groupby(['CityLocation', 'InvestmentType'])['AmountInUSD'].sum().reset_index())

In [None]:
grouped=df.groupby(['CityLocation', 'InvestmentType'])
grouped

In [None]:
# Nested-dict renaming inside agg() was removed in pandas 1.0; pass a list of function names instead
grouped.agg({'AmountInUSD': ['min', 'max', 'mean']})
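
The reset_index step and the min/max/mean aggregation can be sketched on made-up data as follows. The `toy` frame and the output column names (`min_amount` and so on) are assumptions; the sketch uses named aggregation, which is available in pandas 0.25 and later, as one way to get flat, custom column names.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "CityLocation": ["Mumbai", "Mumbai", "Bangalore"],
    "InvestmentType": ["Seed Funding", "Seed Funding", "Seed Funding"],
    "AmountInUSD": [100, 300, 500],
})

grouped = toy.groupby(["CityLocation", "InvestmentType"])

# The grouped sum is a Series; reset_index turns it into a DataFrame
totals = grouped["AmountInUSD"].sum().reset_index()

# Named aggregation: output_column=(input_column, function)
stats = grouped.agg(
    min_amount=("AmountInUSD", "min"),
    max_amount=("AmountInUSD", "max"),
    mean_amount=("AmountInUSD", "mean"),
)

print(type(totals))
print(stats)
```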

## Multi Indexing

Group the dataframe on multiple columns **'IndustryVertical', 'SubVertical'** and assign the result to 'grouped'

In [None]:
#write code here

Then aggregate (mean, sum) these groups on **AmountInUSD** and assign the result to 'new' 

In [None]:
#write code here

In [None]:
new.head()

In [None]:
new.columns

In [None]:
new.columns=new.columns.droplevel(0)

In [None]:
new.columns

In [None]:
new.head()

In [None]:
new=new.reset_index()

In [None]:
new.head()

In [None]:
new.columns

#### You can slice a MultiIndex by providing multiple indexers.


You can use pandas.IndexSlice to facilitate a more natural slicing syntax using `:`.



In [None]:
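# Nested-dict renaming in agg() was removed in pandas 1.0; aggregate first, then rename the second column level
new = df.groupby(['IndustryVertical', 'SubVertical']).agg({'AmountInUSD': ['mean', 'sum']}).rename(columns={'mean': 'Mean', 'sum': 'Sum'}, level=1)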

In [None]:
new.head()

In [None]:
idx = pd.IndexSlice
new.loc[idx[['3D Printer Manufacturer'], :], idx[['AmountInUSD'], : ]]

In [None]:
df.head()

In [None]:
idx = pd.IndexSlice
new.loc[idx[:, ['AI Based Personal Assistant', 'App based cab aggregator']], idx[:, ['Sum']]]


Run this code before the next module.

Group the dataframe on multiple columns **CityLocation, InvestmentType**, sum **AmountInUSD** for each group, convert the resulting Series into a dataframe with the **reset_index** method, and assign it to 'new'  


In [None]:
#write code here

## Pivoting & Melting

**Pivoting**

In [None]:
pivoted=pd.pivot_table(new,index='InvestmentType', columns='CityLocation', values='AmountInUSD',aggfunc='sum',fill_value=0)


In [None]:
pivoted

In [None]:
a=pivoted.reset_index()

In [None]:
a

**Melting**

In [None]:
pd.melt(a, id_vars=['InvestmentType'], value_vars=['Agra','Ahmedabad'], value_name='Amount')
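
To see how pivoting and melting mirror each other, here is a sketch on a made-up three-row version of `new` (the values are assumptions). Pivoting spreads the cities into columns, and melting folds them back into rows; `var_name` is set explicitly so the folded column has a clear name.

```python
import pandas as pd

# Hypothetical stand-in for the grouped 'new' frame
new = pd.DataFrame({
    "CityLocation": ["Agra", "Agra", "Ahmedabad"],
    "InvestmentType": ["Seed Funding", "Private Equity", "Seed Funding"],
    "AmountInUSD": [100, 200, 300],
})

# Long to wide: one row per investment type, one column per city
wide = pd.pivot_table(new, index="InvestmentType", columns="CityLocation",
                      values="AmountInUSD", aggfunc="sum", fill_value=0).reset_index()

# Wide to long: city columns become rows again
long = pd.melt(wide, id_vars=["InvestmentType"], value_vars=["Agra", "Ahmedabad"],
               var_name="CityLocation", value_name="Amount")

print(wide)
print(long)
```

The melted frame has four rows rather than three because `fill_value=0` created an explicit zero for the missing Private Equity / Ahmedabad combination.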

## Column / Row Concatenation

In [None]:
df.columns

In [None]:
part1 = df[['SNo', 'Date', 'StartupName']]
part2 = df[['InvestorsName', 'InvestmentType']]
print(part1.shape)
part1.head()

In [None]:
print(part2.shape)
part2.head()

In [None]:
pd.concat([part1, part2], axis=1)

**Concatenation** 


The **pd.concat()** function concatenates two or more DataFrames or Series either row-wise or column-wise. By default it concatenates row-wise, i.e. axis=0.

**Concatenate row wise**
![Row-wise concatenation](http://www.datasciencemadesimple.com/wp-content/uploads/2017/11/rowbind-in-python-pandas-0.png)

**Concatenate Column wise**
![Column-wise concatenation](http://www.datasciencemadesimple.com/wp-content/uploads/2017/11/column-bind-in-python-pandas-1.png)

In [None]:
rows1 = df[:3]
rows2 = df[5:7]
print(rows1.shape)
rows1

In [None]:
print(rows2.shape)
rows2

Concatenate rows1 and rows2

In [None]:
#write code here
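
A possible sketch of the row-wise concatenation, using a made-up eight-row frame in place of df (`toy` is an assumption). By default `pd.concat` keeps the original row labels; `ignore_index=True` renumbers them.

```python
import pandas as pd

# Hypothetical stand-in for df
toy = pd.DataFrame({"SNo": range(8), "StartupName": list("ABCDEFGH")})
rows1 = toy[:3]
rows2 = toy[5:7]

# axis=0 (the default) stacks the rows and keeps labels 0, 1, 2, 5, 6
stacked = pd.concat([rows1, rows2])

# ignore_index=True gives a fresh 0..n-1 index
renumbered = pd.concat([rows1, rows2], ignore_index=True)

print(stacked)
print(renumbered)
```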

## Looping through DataFrame

#### Using Simple for loop

**enumerate()** adds a counter to an iterable and returns an enumerate object, which can be used directly in for loops or converted into a list of tuples with list(). Here it is applied to df.values, so each row is a NumPy array and the counter is a position, not the DataFrame's index label.

In [None]:
for index,row in enumerate(df.values):
    print(index,row)

#### Using the iterrows function

**iterrows()** is a generator that iterates over the rows of the dataframe and yields the index label of each row together with a Series containing the row itself.

In [None]:
for index, row in df.iterrows():
    print(index,row)
