Loading a csv in Pandas
---------

Description

Create a dataframe from the file ‘marks.csv’ and print the contents of the dataframe. Open the file from the link above and inspect the required elements in the file (header, separator, etc.). If the top row is a regular entry, do not load it as the column header.

Execution Time Limit

15 seconds

In [None]:
import numpy as np
import pandas as pd

In [None]:
url = 'https://media-doselect.s3.amazonaws.com/generic/A08MajL8qN4rq72EpVJbAP1Rw/marks_1.csv'
df = pd.read_csv(url, sep='|', header=None)
df

Unnamed: 0,0,1,2,3,4,5
0,1,Akshay,Mathematics,50,40,80
1,2,Mahima,English,40,33,83
2,3,Vikas,Mathematics,50,42,84
3,4,Abhinav,English,40,31,78
4,5,Mahima,Science,50,40,80
5,6,Akshay,Science,50,49,98
6,7,Abhinav,Mathematics,50,47,94
7,8,Vikas,Science,50,40,80
8,9,Abhinav,Science,50,47,94
9,10,Vikas,English,40,39,98
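
As a quick offline check of the same `read_csv` arguments, the sketch below parses a hypothetical two-row sample (not the real `marks_1.csv`) from an in-memory string. It assumes the file is pipe-separated with no header row, as in the cell above; the names `sample` and `marks` are illustrative only.

```python
import io
import pandas as pd

# Hypothetical sample in the same pipe-separated, headerless layout
sample = "1|Akshay|Mathematics|50|40|80\n2|Mahima|English|40|33|83\n"

# header=None stops pandas from consuming the first data row as column names
marks = pd.read_csv(io.StringIO(sample), sep='|', header=None)

print(marks.shape)          # (rows, columns)
print(list(marks.columns))  # default integer column labels
```

Without `header=None`, the first student's row would be used as column labels and silently dropped from the data.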


Loading a csv with index
-----------

Description

Using the file ‘marks.csv’, create a dataframe as shown below.

![ss](https://media-doselect.s3.amazonaws.com/generic/0rjOooeKe4RQwnebLP8pzOaPV/01.%20Coding%20Question.PNG)

You must make the first column of the file the index and name it 'S.No.'.

Also, the columns must be renamed as shown in the image.

In [None]:
df = pd.read_csv(url, sep='|', header=None, index_col=0)
df.index.name = 'S.No.'
df.columns = ['Name', 'Subject', 'Maximum Marks', 'Marks Obtained', 'Percentage'] 
df

Unnamed: 0_level_0,Name,Subject,Maximum Marks,Marks Obtained,Percentage
S.No.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1,Akshay,Mathematics,50,40,80
2,Mahima,English,40,33,83
3,Vikas,Mathematics,50,42,84
4,Abhinav,English,40,31,78
5,Mahima,Science,50,40,80
6,Akshay,Science,50,49,98
7,Abhinav,Mathematics,50,47,94
8,Vikas,Science,50,40,80
9,Abhinav,Science,50,47,94
10,Vikas,English,40,39,98
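
The renaming and indexing steps can also be folded into the `read_csv` call itself. This is a minimal sketch on a hypothetical two-row sample, assuming the same column layout as the exercise; `col_names` and `sample` are illustrative names.

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as the exercise file
sample = "1|Akshay|Mathematics|50|40|80\n2|Mahima|English|40|33|83\n"
col_names = ['S.No.', 'Name', 'Subject', 'Maximum Marks',
             'Marks Obtained', 'Percentage']

# names= labels every column; index_col= then picks one of those labels as the index
marks = pd.read_csv(io.StringIO(sample), sep='|', header=None,
                    names=col_names, index_col='S.No.')
print(marks)
```

Naming the columns at load time avoids the separate `df.index.name` and `df.columns` assignments, at the cost of a longer call.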


Hierarchical Indexing
----------

Description

You are provided with the dataset of a company which has offices across three cities - Mumbai, Bangalore and New Delhi. The dataset contains the rating (out of 5) of all the employees from different departments (Finance, HR, Marketing and Sales). 



Create a hierarchical index based on two columns: Office and Department



Print the first 5 rows as the output. Refer to the image below for your reference.


![ss](https://media-doselect.s3.amazonaws.com/generic/g7aveJBgGJKbypd7pz97GMgXR/04.%20Hierarchical%20Indexing.PNG)




Note: You should not sort or modify values in other columns of the dataframe. Use sort_index(inplace=True) to club the same locations together.

Execution Time Limit

15 seconds

In [None]:
import numpy as np
import pandas as pd

df = pd.read_csv('https://media-doselect.s3.amazonaws.com/generic/NMgEjwkAEGGQZBoNYGr9Ld7w0/rating.csv',
                 index_col=['Office', 'Department'])

df.sort_index(inplace=True)
df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,ID,Rating
Office,Department,Unnamed: 2_level_1,Unnamed: 3_level_1
Bangalore,Finance,U2F53,2.7
Bangalore,Finance,U1F53,3.7
Bangalore,Finance,U1F28,3.2
Bangalore,Finance,U1F15,3.3
Bangalore,Finance,U1F14,2.9
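
Once the hierarchical index is in place, rows can be selected by one or both levels with `.loc`. The sketch below uses a small made-up frame (the IDs and ratings are hypothetical, not taken from `rating.csv`).

```python
import pandas as pd

# Hypothetical ratings in the same column layout as the exercise
ratings = pd.DataFrame({
    'ID': ['U1', 'U2', 'U3', 'U4'],
    'Department': ['Finance', 'HR', 'Finance', 'Sales'],
    'Office': ['Mumbai', 'Bangalore', 'Bangalore', 'Mumbai'],
    'Rating': [3.4, 4.1, 2.9, 3.8],
})

indexed = ratings.set_index(['Office', 'Department']).sort_index()

# Outer level only: every Bangalore row, indexed by Department
bangalore = indexed.loc['Bangalore']

# Both levels as a tuple, plus a column label
blr_finance_rating = indexed.loc[('Bangalore', 'Finance'), 'Rating']

print(bangalore)
print(blr_finance_rating)
```

Sorting the index first keeps rows for the same office together and lets pandas look up the keys efficiently.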


DataFrames
--------

Description

Given a dataframe 'df', use the following commands and analyse the results.

    describe() 
    columns
    shape

Execution Time Limit

20 seconds

In [None]:
import numpy as np
import pandas as pd
df = pd.read_csv('https://cdn.upgrad.com/uploads/production/b3467ba4-4e13-44e9-8087-4d7e94cc7586/forestfires.csv')
df

Unnamed: 0,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
0,7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0.00
1,7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0.00
2,7,4,oct,sat,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,0.00
3,8,6,mar,fri,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2,0.00
4,8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
512,4,3,aug,sun,81.6,56.7,665.6,1.9,27.8,32,2.7,0.0,6.44
513,2,4,aug,sun,81.6,56.7,665.6,1.9,21.9,71,5.8,0.0,54.29
514,7,4,aug,sun,81.6,56.7,665.6,1.9,21.2,70,6.7,0.0,11.16
515,1,4,aug,sat,94.4,146.0,614.7,11.3,25.6,42,4.0,0.0,0.00


In [None]:
df.describe()

Unnamed: 0,X,Y,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
count,517.0,517.0,517.0,517.0,517.0,517.0,517.0,517.0,517.0,517.0,517.0
mean,4.669246,4.299807,90.644681,110.87234,547.940039,9.021663,18.889168,44.288201,4.017602,0.021663,12.847292
std,2.313778,1.2299,5.520111,64.046482,248.066192,4.559477,5.806625,16.317469,1.791653,0.295959,63.655818
min,1.0,2.0,18.7,1.1,7.9,0.0,2.2,15.0,0.4,0.0,0.0
25%,3.0,4.0,90.2,68.6,437.7,6.5,15.5,33.0,2.7,0.0,0.0
50%,4.0,4.0,91.6,108.3,664.2,8.4,19.3,42.0,4.0,0.0,0.52
75%,7.0,5.0,92.9,142.4,713.9,10.8,22.8,53.0,4.9,0.0,6.57
max,9.0,9.0,96.2,291.3,860.6,56.1,33.3,100.0,9.4,6.4,1090.84


In [None]:
df.columns

Index(['X', 'Y', 'month', 'day', 'FFMC', 'DMC', 'DC', 'ISI', 'temp', 'RH',
       'wind', 'rain', 'area'],
      dtype='object')

In [None]:
df.shape

(517, 13)
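
Note that `describe()` lists 11 columns above while `shape` reports 13: by default only numeric columns are summarised, so 'month' and 'day' are left out. A minimal sketch on a made-up three-row frame (the values are illustrative):

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns
fires = pd.DataFrame({
    'month': ['mar', 'oct', 'oct'],
    'temp': [8.2, 18.0, 14.6],
    'area': [0.0, 0.0, 0.0],
})

numeric_summary = fires.describe()            # numeric columns only
full_summary = fires.describe(include='all')  # adds count/unique/top/freq for text

print(list(numeric_summary.columns))
print(list(full_summary.columns))
```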

Selecting Columns of a Dataframe
--------

Description

Print out the columns 'month', 'day', 'temp', 'area' from the dataframe 'df'.

In [None]:
import pandas as pd
df = pd.read_csv('https://cdn.upgrad.com/uploads/production/b3467ba4-4e13-44e9-8087-4d7e94cc7586/forestfires.csv')
df[['month', 'day', 'temp', 'area']]

Unnamed: 0,month,day,temp,area
0,mar,fri,8.2,0.00
1,oct,tue,18.0,0.00
2,oct,sat,14.6,0.00
3,mar,fri,8.3,0.00
4,mar,sun,11.4,0.00
...,...,...,...,...
512,aug,sun,27.8,6.44
513,aug,sun,21.9,54.29
514,aug,sun,21.2,11.16
515,aug,sat,25.6,0.00
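
The double brackets matter: a list of labels returns a DataFrame, while a single label returns a Series. A short sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical rows using a few of the forest-fire column names
fires = pd.DataFrame({
    'month': ['mar', 'oct'],
    'day': ['fri', 'tue'],
    'temp': [8.2, 18.0],
})

one_column = fires['temp']                 # single label -> Series
several_columns = fires[['month', 'temp']] # list of labels -> DataFrame

print(type(one_column).__name__)
print(type(several_columns).__name__)
```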


Indexing Dataframes
---------

Description

Print only the even-numbered rows of the dataframe 'df'.

Note: Don't include the row indexed zero.

In [None]:
import pandas as pd
df = pd.read_csv('https://cdn.upgrad.com/uploads/production/b3467ba4-4e13-44e9-8087-4d7e94cc7586/forestfires.csv')
df[2::2]

Unnamed: 0,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
2,7,4,oct,sat,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,0.00
4,8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,0.00
6,8,6,aug,mon,92.3,88.9,495.6,8.5,24.1,27,3.1,0.0,0.00
8,8,6,sep,tue,91.0,129.5,692.6,7.0,13.1,63,5.4,0.0,0.00
10,7,5,sep,sat,92.5,88.0,698.6,7.1,17.8,51,7.2,0.0,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
508,1,2,aug,fri,91.0,166.9,752.6,7.1,25.9,41,3.6,0.0,0.00
510,6,5,aug,fri,91.0,166.9,752.6,7.1,18.2,62,5.4,0.0,0.43
512,4,3,aug,sun,81.6,56.7,665.6,1.9,27.8,32,2.7,0.0,6.44
514,7,4,aug,sun,81.6,56.7,665.6,1.9,21.2,70,6.7,0.0,11.16
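
The slice `df[2::2]` works by position: start at the third row and take every second row. Writing it with `.iloc` makes that positional intent explicit. A minimal sketch on a hypothetical seven-row frame:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..6
frame = pd.DataFrame({'value': range(10, 17)})

# start=2 skips rows 0 and 1, step=2 keeps every second row after that
even_rows = frame.iloc[2::2]

print(list(even_rows.index))
```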


Applying Conditions on Dataframes
----

Description

Print all the columns and the rows where 'area' is greater than 0, 'wind' is greater than 1 and the 'temp' is greater than 15.


In [None]:
import pandas as pd
df = pd.read_csv('https://cdn.upgrad.com/uploads/production/b3467ba4-4e13-44e9-8087-4d7e94cc7586/forestfires.csv')
df[(df['area'] > 0) & (df['wind'] > 1) & (df['temp'] > 15)]

Unnamed: 0,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
138,9,9,jul,tue,85.8,48.3,313.4,3.9,18.0,42,2.7,0.0,0.36
139,1,4,sep,tue,91.0,129.5,692.6,7.0,21.7,38,2.2,0.0,0.43
140,2,5,sep,mon,90.9,126.5,686.5,7.0,21.9,39,1.8,0.0,0.47
141,1,2,aug,wed,95.5,99.9,513.3,13.2,23.3,31,4.5,0.0,0.55
142,8,6,aug,fri,90.1,108.0,529.8,12.5,21.2,51,8.9,0.0,0.61
...,...,...,...,...,...,...,...,...,...,...,...,...,...
509,5,4,aug,fri,91.0,166.9,752.6,7.1,21.1,71,7.6,1.4,2.17
510,6,5,aug,fri,91.0,166.9,752.6,7.1,18.2,62,5.4,0.0,0.43
512,4,3,aug,sun,81.6,56.7,665.6,1.9,27.8,32,2.7,0.0,6.44
513,2,4,aug,sun,81.6,56.7,665.6,1.9,21.9,71,5.8,0.0,54.29
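
The same filter can be written with `DataFrame.query`, which some readers find easier to scan than chained boolean masks. The sketch below compares both forms on made-up values:

```python
import pandas as pd

# Hypothetical rows; only the second one satisfies all three conditions
fires = pd.DataFrame({
    'area': [0.0, 0.36, 2.0, 5.0],
    'wind': [2.7, 2.7, 0.9, 4.0],
    'temp': [18.0, 18.0, 20.0, 12.0],
})

# Boolean masks: each comparison needs its own parentheses because & binds tightly
by_mask = fires[(fires['area'] > 0) & (fires['wind'] > 1) & (fires['temp'] > 15)]

# Equivalent expression as a query string
by_query = fires.query('area > 0 and wind > 1 and temp > 15')

print(by_query)
```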


Employee Training
-------

Description

You are provided with the dataset of a company which has offices across three cities - Mumbai, Bangalore and New Delhi. The dataset contains the rating (out of 5) of all the employees from different departments (Finance, HR, Marketing and Sales). 



The company has come up with a new policy that any individual with a rating equal to or below 3.5 needs to attend a training. Using dataframes, load the dataset and then derive the column ‘Training’ which shows ‘Yes’ for people who require training and ‘No’ for those who do not.



Print the first 5 rows as the output. Refer to the image below for your reference.


![ss](https://media-doselect.s3.amazonaws.com/generic/QRJ27vaKx7XVW3B2nag54Bepv/03.%20Employee%20Rating.PNG)


Note: You should not sort or modify values in other columns of the dataframe.

In [None]:
import numpy as np
import pandas as pd

df = pd.read_csv('https://media-doselect.s3.amazonaws.com/generic/NMgEjwkAEGGQZBoNYGr9Ld7w0/rating.csv')
df

Unnamed: 0,ID,Department,Office,Rating
0,U2F26,Finance,New Delhi,3.4
1,U2M61,Marketing,New Delhi,3.9
2,U1S15,Sales,New Delhi,2.8
3,U1H87,HR,Mumbai,2.1
4,U1S51,Sales,New Delhi,4.6
...,...,...,...,...
528,U3S44,Sales,New Delhi,4.8
529,U2M11,Marketing,Bangalore,2.5
530,U3F53,Finance,Bangalore,3.2
531,U3S46,Sales,Bangalore,2.9


In [None]:
df['Training'] = df['Rating'].apply(lambda x : 'Yes' if x <= 3.5 else 'No')
df.head()

Unnamed: 0,ID,Department,Office,Rating,Training
0,U2F26,Finance,New Delhi,3.4,Yes
1,U2M61,Marketing,New Delhi,3.9,No
2,U1S15,Sales,New Delhi,2.8,Yes
3,U1H87,HR,Mumbai,2.1,Yes
4,U1S51,Sales,New Delhi,4.6,No
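
As an alternative to `apply` with a lambda, the same column can be derived in one vectorised step with `numpy.where`. A minimal sketch, assuming the same 3.5 threshold and using hypothetical IDs:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings, including one exactly on the threshold
ratings = pd.DataFrame({
    'ID': ['A1', 'A2', 'A3'],
    'Rating': [3.4, 3.9, 3.5],
})

# np.where(condition, value_if_true, value_if_false) works on the whole column at once
ratings['Training'] = np.where(ratings['Rating'] <= 3.5, 'Yes', 'No')

print(ratings)
```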


Dataframe
--------

In the dataframe created above, find the department that has the most efficient team (the team with minimum percentage of employees who need training).



In [None]:
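# Percentage of employees in each department who do NOT need training.
# The highest value marks the team with the smallest share needing training.
for i in ['Finance', 'HR', 'Sales', 'Marketing']:
    print(i, len(df[(df['Training'] == 'No') & (df['Department'] == i)]) / len(df[df['Department'] == i]) * 100)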

Finance 50.0
HR 57.25190839694656
Sales 49.23076923076923
Marketing 46.3768115942029
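
The loop above can also be expressed as a single grouped computation. The sketch below is one possible approach on a hypothetical five-row frame: the mean of a boolean column is the share of `True` values, so grouping it by department gives the percentage needing training directly.

```python
import pandas as pd

# Hypothetical staff list with the derived 'Training' column
staff = pd.DataFrame({
    'Department': ['Finance', 'Finance', 'HR', 'HR', 'HR'],
    'Training': ['Yes', 'No', 'No', 'No', 'Yes'],
})

# Share of employees per department who need training, in percent
needs_training_pct = (
    staff['Training'].eq('Yes').groupby(staff['Department']).mean() * 100
)

# The most efficient team has the smallest share needing training
best_department = needs_training_pct.idxmin()

print(needs_training_pct)
print(best_department)
```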


Groupby function
---------

What does the function: `df.groupby()` return without any aggregate function?


In [None]:
df.groupby(by=['Training'])

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f1e4239e2c0>

------------

- A Pandas object is created

`Feedback:`

It returns a Pandas `DataFrameGroupBy` object, which holds the grouping and can then be used to apply the desired aggregation functions.
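
To see what the returned object holds before any aggregation, it can be inspected and iterated. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical frame with a 'Training' flag
staff = pd.DataFrame({
    'Training': ['Yes', 'No', 'Yes'],
    'Rating': [3.4, 3.9, 2.8],
})

grouped = staff.groupby('Training')  # nothing is computed yet

print(type(grouped).__name__)
print(grouped.ngroups)

# Iterating yields (group key, sub-dataframe) pairs
for key, part in grouped:
    print(key, len(part))

# An aggregation turns the grouping into an ordinary result
mean_rating = grouped['Rating'].mean()
print(mean_rating)
```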

------

Dataframe grouping
-------------

Description

Group the dataframe 'df' by 'month' and 'day' and find the mean value for column 'rain' and 'wind'.

In [None]:
import pandas as pd
df = pd.read_csv('https://cdn.upgrad.com/uploads/production/b3467ba4-4e13-44e9-8087-4d7e94cc7586/forestfires.csv')

df_1 = df.groupby(by=['month', 'day'])[['rain','wind']].mean(numeric_only=True)
df_1.head(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,rain,wind
month,day,Unnamed: 2_level_1,Unnamed: 3_level_1
apr,fri,0.0,3.1
apr,mon,0.0,3.1
apr,sat,0.0,4.5
apr,sun,0.0,5.666667
apr,thu,0.0,5.8
apr,wed,0.0,2.7
aug,fri,0.066667,4.766667
aug,mon,0.0,2.873333
aug,sat,0.0,4.310345
aug,sun,0.025,4.4175
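
When the columns need different aggregations, `agg` accepts a mapping from column name to function. The sketch below uses made-up weather rows; the choice of `'max'` for wind is only for illustration and is not part of the exercise.

```python
import pandas as pd

# Hypothetical observations: two for (aug, fri), one for (sep, fri)
weather = pd.DataFrame({
    'month': ['aug', 'aug', 'sep'],
    'day': ['fri', 'fri', 'fri'],
    'rain': [0.0, 0.2, 1.0],
    'wind': [2.0, 4.0, 5.0],
})

# Mean rain but maximum wind for each (month, day) pair
summary = weather.groupby(['month', 'day']).agg({'rain': 'mean', 'wind': 'max'})

print(summary)
```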


Merges are of the following types:

    left: Keeps every key from the first dataframe; columns from the second are filled with NaN where there is no match.
    right: Keeps every key from the second dataframe; columns from the first are filled with NaN where there is no match.
    outer: Takes the union of the keys from both dataframes.
    inner: Takes the intersection of the keys from both dataframes (the default).

Depending on the situation, you can use the appropriate method to merge the two DataFrames.
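
The four merge types can be compared side by side on two tiny frames. This is a sketch with hypothetical data in which keys 2 and 3 are shared, key 1 appears only on the left, and key 4 only on the right.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'l': ['a', 'b', 'c']})
right = pd.DataFrame({'key': [2, 3, 4], 'r': ['x', 'y', 'z']})

# Number of rows produced by each merge type
row_counts = {
    how: len(left.merge(right, on='key', how=how))
    for how in ['left', 'right', 'outer', 'inner']
}

print(row_counts)
```

The outer merge has one row per distinct key (four), and the inner merge keeps only the two shared keys.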

Merging Dataframes
----------

Suppose you are provided with two dataframes:

![ss](https://images.upgrad.com/d6fe6dfc-7cfd-4572-a39b-66273da32bb8-DF_quiz.PNG)

From the two dataframes above, how will you generate the following dataframe?

![ss](https://images.upgrad.com/1742ed65-47c1-4b89-8486-2b5ec956b2ab-resultant_df.PNG)

-----

`df_1.merge(df_2, how = 'inner')`

This is the correct answer. The `inner` option is useful when you want those elements that are common in both the dataframes.

Dataframes Merge
----

Description

Perform an inner merge on the two dataframes df_1 and df_2 on 'unique_id' and print the combined dataframe.

In [None]:
import pandas as pd

df_1 = pd.read_csv('https://cdn.upgrad.com/uploads/production/1ed2840b-f083-44fe-9eb4-acda009ac620/restaurant-1.csv')
df_2 = pd.read_csv('https://cdn.upgrad.com/uploads/production/c87b99b4-467c-4128-8729-85bff0b0acbe/restaurant-2.csv')

In [None]:
df_1

Unnamed: 0,name,address,city,cuisine,unique_id
0,arnie morton's of chicago,"""435 s. la cienega blvd.""","""los angeles""","""steakhouses""",'0'
1,art's deli,"""12224 ventura blvd.""","""studio city""","""delis""",'1'
2,bel-air hotel,"""701 stone canyon rd.""","""bel air""","""californian""",'2'
3,cafe bizou,"""14016 ventura blvd.""","""sherman oaks""","""french bistro""",'3'
4,campanile,"""624 s. la brea ave.""","""los angeles""","""californian""",'4'
...,...,...,...,...,...
107,mifune,"""1737 post st.""","""san francisco""","""japanese""",'107'
108,plumpjack cafe,"""3127 fillmore st.""","""san francisco""","""american (new)""",'108'
109,postrio,"""545 post st.""","""san francisco""","""californian""",'109'
110,ritz-carlton dining room (san francisco),"""600 stockton st.""","""san francisco""","""french (new)""",'110'


In [None]:
df_2

Unnamed: 0,name_2,address_2,city_2,cuisine_2,unique_id
0,arnie morton's of chicago,"""435 s. la cienega blv.""","""los angeles""","""american""",'0'
1,art's delicatessen,"""12224 ventura blvd.""","""studio city""","""american""",'1'
2,hotel bel-air,"""701 stone canyon rd.""","""bel air""","""californian""",'2'
3,cafe bizou,"""14016 ventura blvd.""","""sherman oaks""","""french""",'3'
4,campanile,"""624 s. la brea ave.""","""los angeles""","""american""",'4'
...,...,...,...,...,...
747,ti couz,"""3108 16th st.""","""san francisco""","""french""",'748'
748,trio cafe,"""1870 fillmore st.""","""san francisco""","""american""",'749'
749,tu lan,"""8 sixth st.""","""san francisco""","""vietnamese""",'750'
750,vicolo pizzeria,"""201 ivy st.""","""san francisco""","""pizza""",'751'


In [None]:
df_3 = df_1.merge(df_2, on=['unique_id'], how = 'inner')
df_3.head(20)

Unnamed: 0,name,address,city,cuisine,unique_id,name_2,address_2,city_2,cuisine_2
0,arnie morton's of chicago,"""435 s. la cienega blvd.""","""los angeles""","""steakhouses""",'0',arnie morton's of chicago,"""435 s. la cienega blv.""","""los angeles""","""american"""
1,art's deli,"""12224 ventura blvd.""","""studio city""","""delis""",'1',art's delicatessen,"""12224 ventura blvd.""","""studio city""","""american"""
2,bel-air hotel,"""701 stone canyon rd.""","""bel air""","""californian""",'2',hotel bel-air,"""701 stone canyon rd.""","""bel air""","""californian"""
3,cafe bizou,"""14016 ventura blvd.""","""sherman oaks""","""french bistro""",'3',cafe bizou,"""14016 ventura blvd.""","""sherman oaks""","""french"""
4,campanile,"""624 s. la brea ave.""","""los angeles""","""californian""",'4',campanile,"""624 s. la brea ave.""","""los angeles""","""american"""
5,chinois on main,"""2709 main st.""","""santa monica""","""pacific new wave""",'5',chinois on main,"""2709 main st.""","""santa monica""","""french"""
6,citrus,"""6703 melrose ave.""","""los angeles""","""californian""",'6',citrus,"""6703 melrose ave.""","""los angeles""","""californian"""
7,fenix at the argyle,"""8358 sunset blvd.""","""w. hollywood""","""french (new)""",'7',fenix,"""8358 sunset blvd. west""","""hollywood""","""american"""
8,granita,"""23725 w. malibu rd.""","""malibu""","""californian""",'8',granita,"""23725 w. malibu rd.""","""malibu""","""californian"""
9,grill the,"""9560 dayton way""","""beverly hills""","""american (traditional)""",'9',grill on the alley,"""9560 dayton way""","""los angeles""","""american"""


# import warnings

In [None]:
import pandas as pd
import warnings

# warnings.simplefilter("ignore")
df = pd.read_csv('https://cdn.upgrad.com/uploads/production/b3467ba4-4e13-44e9-8087-4d7e94cc7586/forestfires.csv')

# Selecting several columns with a bare tuple, ['rain','wind'], raised a
# FutureWarning in pandas 1.x and is an error in pandas 2.x; pass a list instead.
df_1 = df.groupby(by=['month', 'day'])[['rain','wind']].mean()
df_1.head(20)


Unnamed: 0_level_0,Unnamed: 1_level_0,rain,wind
month,day,Unnamed: 2_level_1,Unnamed: 3_level_1
apr,fri,0.0,3.1
apr,mon,0.0,3.1
apr,sat,0.0,4.5
apr,sun,0.0,5.666667
apr,thu,0.0,5.8
apr,wed,0.0,2.7
aug,fri,0.066667,4.766667
aug,mon,0.0,2.873333
aug,sat,0.0,4.310345
aug,sun,0.025,4.4175


Dataframe Append
------------

Description

Append two datasets df_1 and df_2, and print the combined dataframe.

In [None]:
import pandas as pd
df_1 = pd.read_csv('https://cdn.upgrad.com/uploads/production/1ed2840b-f083-44fe-9eb4-acda009ac620/restaurant-1.csv')
df_2 = pd.read_csv('https://cdn.upgrad.com/uploads/production/c87b99b4-467c-4128-8729-85bff0b0acbe/restaurant-2.csv')

# DataFrame.append() was removed in pandas 2.0; pd.concat() stacks the rows the same way
df_3 = pd.concat([df_1, df_2])

df_3.head()

Unnamed: 0,name,address,city,cuisine,unique_id,name_2,address_2,city_2,cuisine_2
0,arnie morton's of chicago,"""435 s. la cienega blvd.""","""los angeles""","""steakhouses""",'0',,,,
1,art's deli,"""12224 ventura blvd.""","""studio city""","""delis""",'1',,,,
2,bel-air hotel,"""701 stone canyon rd.""","""bel air""","""californian""",'2',,,,
3,cafe bizou,"""14016 ventura blvd.""","""sherman oaks""","""french bistro""",'3',,,,
4,campanile,"""624 s. la brea ave.""","""los angeles""","""californian""",'4',,,,
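
The NaN columns in the output come from the column names not matching (`name` versus `name_2`, and so on): stacking rows aligns on column labels and fills the gaps. The sketch below shows this with `pd.concat` on two hypothetical frames, and uses `ignore_index=True` to renumber the rows instead of repeating each frame's own index.

```python
import pandas as pd

# Hypothetical frames that share only the 'unique_id' column
first = pd.DataFrame({'name': ['a', 'b'], 'unique_id': [0, 1]})
second = pd.DataFrame({'name_2': ['c'], 'unique_id': [2]})

# Rows are stacked; columns missing from one frame are filled with NaN
stacked = pd.concat([first, second], ignore_index=True)

print(stacked)
```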


Operations on multiple dataframes
------

Description

Given three data frames containing the number of gold, silver, and bronze Olympic medals won by some countries, determine the total number of medals won by each country. 

Note: 

The three data frames do not all contain the same countries, so use the ‘fill_value’ argument (set to zero) to avoid NaN values. Also sort the final data frame by total medal count in descending order, and make sure the results are integers.

In [None]:
import numpy as np 
import pandas as pd

# Defining the three dataframes indicating the gold, silver, and bronze medal counts
# of different countries
gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                         'Medals': [15, 13, 9]}
                    )
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],
                        'Medals': [29, 20, 16]}
                    )
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],
                        'Medals': [40, 28, 27]}
                    )

In [None]:
gold.set_index('Country', inplace = True)
silver.set_index('Country', inplace = True) 
bronze.set_index('Country', inplace = True)

In [None]:
gold

Unnamed: 0_level_0,Medals
Country,Unnamed: 1_level_1
USA,15
France,13
Russia,9


In [None]:
silver

Unnamed: 0_level_0,Medals
Country,Unnamed: 1_level_1
USA,29
Germany,20
Russia,16


In [None]:
bronze

Unnamed: 0_level_0,Medals
Country,Unnamed: 1_level_1
France,40
USA,28
UK,27


In [None]:
total = gold.add(silver, fill_value = 0)
total

Unnamed: 0_level_0,Medals
Country,Unnamed: 1_level_1
France,13.0
Germany,20.0
Russia,25.0
USA,44.0


In [None]:
total = total.add(bronze, fill_value = 0)
total

Unnamed: 0_level_0,Medals
Country,Unnamed: 1_level_1
France,53.0
Germany,20.0
Russia,25.0
UK,27.0
USA,72.0


In [None]:
# The aligned additions produced floats, so cast back to integers as the task requires
total = total.sort_values(by = 'Medals', ascending = False).astype(int)
total

Unnamed: 0_level_0,Medals
Country,Unnamed: 1_level_1
USA,72
France,53
UK,27
Russia,25
Germany,20
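
A different route to the same totals, shown here only as an alternative sketch and not as the exercise's intended `add(..., fill_value=0)` method, is to stack the three frames and sum per country. Because no missing values are ever created, the counts stay integers.

```python
import pandas as pd

gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'], 'Medals': [15, 13, 9]})
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'], 'Medals': [29, 20, 16]})
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'], 'Medals': [40, 28, 27]})

# Stack all rows, then add up the medals for each country
total = (
    pd.concat([gold, silver, bronze])
      .groupby('Country')['Medals']
      .sum()
      .sort_values(ascending=False)
)

print(total)
```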


# Pandas DataFrame.set_index()

In [None]:
import pandas as pd

students = [
            ['jack',    34, 'Sydeny',    'Australia', 85.96],
            ['Riti',    30, 'Delhi',     'India',     95.20],
            ['Vansh',   31, 'Delhi',     'India',     85.25],
            ['Nanyu',   32, 'Tokyo',     'Japan',     74.21],
            ['Maychan', 16, 'New York',  'US',        99.63],
            ['Mike',    17, 'las vegas', 'US',        47.28],
           ]

df = pd.DataFrame(students,
                  columns = ['Name', 'Age', 'City', 'Country','Agg_Marks'],
                  index   = ['a', 'b', 'c', 'd', 'e', 'f'],
                 )

df.set_index(['Name', 'Age'], 
              inplace = True,     # True modifies df in place; default is False
              append  = True,     # True keeps the existing index as a level; default False replaces it
              drop    = False,    # False keeps 'Name' and 'Age' as columns too; default True drops them
              )

df

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Name,Age,City,Country,Agg_Marks
Unnamed: 0_level_1,Name,Age,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
a,jack,34,jack,34,Sydeny,Australia,85.96
b,Riti,30,Riti,30,Delhi,India,95.2
c,Vansh,31,Vansh,31,Delhi,India,85.25
d,Nanyu,32,Nanyu,32,Tokyo,Japan,74.21
e,Maychan,16,Maychan,16,New York,US,99.63
f,Mike,17,Mike,17,las vegas,US,47.28
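
With the default arguments, `set_index` moves the chosen columns into the index, and `reset_index` reverses the step. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical subset of the students table
students = pd.DataFrame({
    'Name': ['Riti', 'Vansh'],
    'Age': [30, 31],
    'City': ['Delhi', 'Delhi'],
})

# Defaults: drop=True, append=False, inplace=False (a new frame is returned)
indexed = students.set_index(['Name', 'Age'])

# reset_index turns the index levels back into ordinary columns
restored = indexed.reset_index()

print(indexed)
print(restored)
```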


------------

# What is the difference between the pivot_table and the groupby? 

[Answer](https://stackoverflow.com/a/34702851/11493297)
-------

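- Both `groupby` and `pivot_table` aggregate a dataframe; they differ mainly in the shape of the result.

- `groupby` returns the groups in long form, with the grouping keys as the (multi-level) index, which is generally enough for `two-dimensional` operations,

- while `pivot_table` can also spread a key across the columns, which suits `multi-dimensional` grouping operations.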

-----------------

[Pandas_Cheat_Sheet.pdf](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)    
------------

# Code :
    
    df.pivot(columns='grouping_variable_col', values='value_to_aggregate', index='grouping_variable_row')

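# Meaning :

- Reshape the data so that each `index` value becomes a row
- and each `columns` value becomes a column,
- with the cells filled from the `values` column.
- `pivot` itself does not aggregate; `pivot_table` (below) applies an `aggregate` function to the `values` for each index/column pair.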


    df.pivot_table(values, index, aggfunc={'value_1': np.mean,'value_2': [min, max, np.mean]})
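
The practical difference shows up when an index/column pair occurs more than once. In the sketch below (hypothetical sales data), `pivot` refuses to reshape because the pair ('N', 'a') is duplicated, while `pivot_table` averages the duplicates.

```python
import pandas as pd

# Hypothetical data: region 'N' has two rows for product 'a'
sales = pd.DataFrame({
    'region': ['N', 'N', 'S', 'S'],
    'product': ['a', 'a', 'a', 'b'],
    'units': [1, 3, 5, 7],
})

# pivot cannot decide which duplicate to keep and raises ValueError
try:
    sales.pivot(index='region', columns='product', values='units')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table resolves duplicates with the aggregation function
wide = sales.pivot_table(values='units', index='region',
                         columns='product', aggfunc='mean')

print(pivot_failed)
print(wide)
```

Combinations that never occur, such as region 'N' with product 'b', appear as NaN in the wide table.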

[Pandas.pivot_table()](https://www.geeksforgeeks.org/python-pandas-pivot_table/)
-----------------

In [None]:
# Create a simple dataframe
   
# importing pandas as pd
import pandas as pd
import numpy as np
   
# creating a dataframe
df = pd.DataFrame({'A': ['John', 'Boby', 'Mina', 'Peter', 'Nicky'],
      'B': ['Masters', 'Graduate', 'Graduate', 'Masters', 'Graduate'],
      'C': [27, 23, 21, 23, 24]})
   
df

Unnamed: 0,A,B,C
0,John,Masters,27
1,Boby,Graduate,23
2,Mina,Graduate,21
3,Peter,Masters,23
4,Nicky,Graduate,24


In [None]:
table = pd.pivot_table(df, index = ['A', 'B'])
table

Unnamed: 0_level_0,Unnamed: 1_level_0,C
A,B,Unnamed: 2_level_1
Boby,Graduate,23
John,Masters,27
Mina,Graduate,21
Nicky,Graduate,24
Peter,Masters,23


In [None]:
table = pd.pivot_table(df, 
                       values = 'A', 
                       index = ['B', 'C'], 
                       columns = ['B'], 
                       aggfunc = 'sum',   # string alias; newer pandas versions warn when np.sum is passed
                       )
table

Unnamed: 0_level_0,B,Graduate,Masters
B,C,Unnamed: 2_level_1,Unnamed: 3_level_1
Graduate,21,Mina,
Graduate,23,Boby,
Graduate,24,Nicky,
Masters,23,,Peter
Masters,27,,John


Dataframe Pivot Table
----------

Description

Group the data 'df' by 'month' and 'day' and find the mean value for column 'rain' and 'wind' using the pivot table command.


In [3]:
import numpy as np
import pandas as pd

url = 'https://cdn.upgrad.com/uploads/production/b3467ba4-4e13-44e9-8087-4d7e94cc7586/forestfires.csv'
df = pd.read_csv(url)

# Restricting `values` up front aggregates only the two columns that are needed
df_1 = df.pivot_table(values = ['rain', 'wind'], index = ['month', 'day'], aggfunc = 'mean')

df_1.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,rain,wind
month,day,Unnamed: 2_level_1,Unnamed: 3_level_1
apr,fri,0.0,3.1
apr,mon,0.0,3.1
apr,sat,0.0,4.5
apr,sun,0.0,5.666667
apr,thu,0.0,5.8
