# Pivot tables

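Pivot tables reorganise data: the values of one column become the new index, the unique values of another column become the new column headers, and the cells are filled from a third column.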

**SIMILAR TO GROUPBY()**

- .pivot() method reshapes and categorises data based on column values, without aggregating
- .pivot_table() method performs the same reshaping and also applies an aggregation function to the organised data
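The difference between the two methods shows up when an index/column pair occurs more than once. The sketch below uses a small hypothetical frame (`toy`, not the CRM file used in this notebook) to illustrate it; the company names and numbers are made up.

```python
import pandas as pd

# Hypothetical toy data: the pair ("Acme", "Analytics") appears twice.
toy = pd.DataFrame({
    "Company": ["Acme", "Acme", "Zeta"],
    "Product": ["Analytics", "Analytics", "Tracking"],
    "Licenses": [100, 50, 300],
})

# .pivot() only reshapes, so a duplicated index/column pair raises ValueError.
try:
    toy.pivot(index="Company", columns="Product", values="Licenses")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# .pivot_table() aggregates the duplicates instead (here with sum).
summed = toy.pivot_table(index="Company", columns="Product",
                         values="Licenses", aggfunc="sum")

print(pivot_failed)
print(summed)
```

In this sketch the two Acme/Analytics rows are combined into a single cell by `pivot_table`, which is the behaviour that makes it comparable to `groupby()`.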

In [2]:
import pandas as pd
import numpy as np

In [40]:
df = pd.read_csv('Sales_Funnel_CRM.csv')
df

,Account Number,Company,Contact,Account Manager,Product,Licenses,Sale Price,Status
0,2123398,Google,Larry Pager,Edward Thorp,Analytics,150,2100000,Presented
1,2123398,Google,Larry Pager,Edward Thorp,Prediction,150,700000,Presented
2,2123398,Google,Larry Pager,Edward Thorp,Tracking,300,350000,Under Review
3,2192650,BOBO,Larry Pager,Edward Thorp,Analytics,150,2450000,Lost
4,420496,IKEA,Elon Tusk,Edward Thorp,Analytics,300,4550000,Won
5,636685,Tesla Inc.,Elon Tusk,Edward Thorp,Analytics,300,2800000,Under Review
6,636685,Tesla Inc.,Elon Tusk,Edward Thorp,Prediction,150,700000,Presented
7,1216870,Microsoft,Will Grates,Edward Thorp,Tracking,300,350000,Under Review
8,2200450,Walmart,Will Grates,Edward Thorp,Analytics,150,2450000,Lost
9,405886,Apple,Cindy Phoner,Claude Shannon,Analytics,300,4550000,Won


----

## .pivot() method

### Using a pivot table to check how many licenses Google purchased for each product type


In [8]:
# STEP 1: Creating new dataframe with only the columns we need
licenses = df[['Company','Product','Licenses']]
licenses

,Company,Product,Licenses
0,Google,Analytics,150
1,Google,Prediction,150
2,Google,Tracking,300
3,BOBO,Analytics,150
4,IKEA,Analytics,300
5,Tesla Inc.,Analytics,300
6,Tesla Inc.,Prediction,150
7,Microsoft,Tracking,300
8,Walmart,Analytics,150
9,Apple,Analytics,300


In [10]:
# STEP 2: using .pivot() method

# PARAMETERS
# index = column whose values become the row labels (values may repeat)
# columns = column whose unique values become the new column headers
# values = column whose data fills the cells
# Note: each index/columns pair must be unique, otherwise .pivot() raises a ValueError

product_lic = pd.pivot(data = licenses, index = 'Company',
                       columns = 'Product', values = 'Licenses')
product_lic

Product,Analytics,GPS Positioning,Prediction,Tracking
Company,,,,
Google,150.0,,150.0,300.0
ATT,,,150.0,150.0
Apple,300.0,,,
BOBO,150.0,,,
CVS Health,,,,450.0
Cisco,300.0,300.0,,
Exxon Mobile,150.0,,,
IKEA,300.0,,,
Microsoft,,,,300.0
Salesforce,750.0,,,
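To make the roles of `index`, `columns` and `values` concrete, here is a minimal sketch on a hypothetical three-row frame (`toy` is an assumption for illustration, not the notebook's data). It also spells out the same reshape with `set_index()` followed by `unstack()`, which is one way to think about what `.pivot()` does.

```python
import pandas as pd

# Hypothetical toy data, not the CRM file used above.
toy = pd.DataFrame({
    "Company": ["Acme", "Acme", "Zeta"],
    "Product": ["Analytics", "Tracking", "Tracking"],
    "Licenses": [100, 50, 300],
})

# Reshape with .pivot(): rows from Company, columns from Product.
wide = toy.pivot(index="Company", columns="Product", values="Licenses")

# The same reshape spelled out: index by both columns, then unstack Product.
alt = toy.set_index(["Company", "Product"])["Licenses"].unstack("Product")

# Zeta has no Analytics row, so that cell is NaN in both results.
print(wide)
print(alt)
```

Because missing combinations are filled with NaN, the integer license counts are shown as floats, which matches the `150.0` style values in the output above.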


In [19]:
# STEP 3: checking names of labeled index

product_lic.index

# Note the extra leading space in ' Google'; the next step cleans the index labels

Index([' Google', 'ATT', 'Apple', 'BOBO', 'CVS Health', 'Cisco',
       'Exxon Mobile', 'IKEA', 'Microsoft', 'Salesforce', 'Tesla Inc.',
       'Walmart'],
      dtype='object', name='Company')

In [24]:
# STEP 4: cleaning the index labels by stripping surrounding whitespace

product_lic.index = product_lic.index.str.strip()
product_lic.index

Index(['Google', 'ATT', 'Apple', 'BOBO', 'CVS Health', 'Cisco', 'Exxon Mobile',
       'IKEA', 'Microsoft', 'Salesforce', 'Tesla Inc.', 'Walmart'],
      dtype='object', name='Company')
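An alternative worth considering is to strip the whitespace in the source column before pivoting. The sketch below assumes a hypothetical frame `raw` in which the same company appears both with and without a leading space; the data is invented for illustration.

```python
import pandas as pd

# Hypothetical toy data with the same kind of stray leading space.
raw = pd.DataFrame({
    "Company": [" Google", "Google", "ATT"],
    "Product": ["Analytics", "Tracking", "Tracking"],
    "Licenses": [150, 300, 150],
})

# Strip whitespace in the source column before pivoting.
raw["Company"] = raw["Company"].str.strip()

clean = raw.pivot(index="Company", columns="Product", values="Licenses")
print(clean.index.tolist())  # ['ATT', 'Google']
```

Cleaning first means ' Google' and 'Google' end up in one row. Stripping only the index after the pivot would, in a case like this one, leave two separate rows carrying the same label.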

In [27]:
# STEP 5 (final step): selecting the row for Google

product_lic.loc['Google']

Product
Analytics          150.0
GPS Positioning      NaN
Prediction         150.0
Tracking           300.0
Name: Google, dtype: float64

In [38]:
# Selecting the companies we sold GPS Positioning to

product_lic[product_lic['GPS Positioning'].notnull()]

Product,Analytics,GPS Positioning,Prediction,Tracking
Company,,,,
Cisco,300.0,300.0,,
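The boolean mask above can also be written with `dropna(subset=...)`. The sketch below rebuilds a small hypothetical pivoted frame (`wide`, with invented values shaped like `product_lic`) so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical pivoted frame shaped like product_lic above.
wide = pd.DataFrame(
    {
        "Analytics": [300.0, np.nan],
        "GPS Positioning": [300.0, np.nan],
        "Tracking": [np.nan, 450.0],
    },
    index=pd.Index(["Cisco", "CVS Health"], name="Company"),
)

# Boolean mask, as in the cell above.
by_mask = wide[wide["GPS Positioning"].notnull()]

# Equivalent: drop rows whose GPS Positioning cell is missing.
by_dropna = wide.dropna(subset=["GPS Positioning"])

# Just the company names, as a plain list.
buyers = by_dropna.index.tolist()
print(buyers)  # ['Cisco']
```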


----

## .pivot_table() method

Used when the reorganised data also needs an aggregate function (for example sum or mean), because several rows can fall into the same cell

In [41]:
df.head()

,Account Number,Company,Contact,Account Manager,Product,Licenses,Sale Price,Status
0,2123398,Google,Larry Pager,Edward Thorp,Analytics,150,2100000,Presented
1,2123398,Google,Larry Pager,Edward Thorp,Prediction,150,700000,Presented
2,2123398,Google,Larry Pager,Edward Thorp,Tracking,300,350000,Under Review
3,2192650,BOBO,Larry Pager,Edward Thorp,Analytics,150,2450000,Lost
4,420496,IKEA,Elon Tusk,Edward Thorp,Analytics,300,4550000,Won


In [63]:
# Returning the total number of licenses sold and the total sales made for each company

pd.pivot_table(data=df, values = ['Licenses','Sale Price'],
               index = 'Company', aggfunc = 'sum')

,Licenses,Sale Price
Company,,
Google,600,3150000
ATT,300,1050000
Apple,300,4550000
BOBO,150,2450000
CVS Health,450,490000
Cisco,600,4900000
Exxon Mobile,150,2100000
IKEA,300,4550000
Microsoft,300,350000
Salesforce,750,7000000


In [70]:
# Same result as above using groupby(): select the columns first, then sum

df.groupby('Company')[['Licenses','Sale Price']].sum()

,Licenses,Sale Price
Company,,
Google,600,3150000
ATT,300,1050000
Apple,300,4550000
BOBO,150,2450000
CVS Health,450,490000
Cisco,600,4900000
Exxon Mobile,150,2100000
IKEA,300,4550000
Microsoft,300,350000
Salesforce,750,7000000
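The equivalence of the two approaches can be checked on a small example. The frame `sales` below is hypothetical (invented companies and numbers) and only mirrors the column names used above.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the CRM file.
sales = pd.DataFrame({
    "Company": ["Acme", "Acme", "Zeta"],
    "Licenses": [100, 50, 300],
    "Sale Price": [1000, 500, 3000],
})

# Aggregation through pivot_table ...
by_pivot = pd.pivot_table(sales, values=["Licenses", "Sale Price"],
                          index="Company", aggfunc="sum")

# ... and the same aggregation through groupby.
by_group = sales.groupby("Company")[["Licenses", "Sale Price"]].sum()

print(by_pivot)
print(by_group)
```

For a single index column and no `columns` argument, the two calls are interchangeable here; `pivot_table` becomes more convenient once a `columns` argument is added.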


----

**Multilevel indexing**

**i.e., setting more than one column as the index**

In [76]:
# Finding total sales by account manager for every contact
# i.e., indexing by account manager, sub-indexing by contact and
# setting the aggregate function to sum

pd.pivot_table(df, values = ['Sale Price'],
               index = ['Account Manager','Contact'], 
               aggfunc = 'sum')

,,Sale Price
Account Manager,Contact,
Claude Shannon,Cindy Phoner,7700000
Claude Shannon,Emma Gordian,12390000
Edward Thorp,Elon Tusk,8050000
Edward Thorp,Larry Pager,5600000
Edward Thorp,Will Grates,2800000
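A result with a multilevel index can be queried level by level. The sketch below uses a hypothetical frame (`sales`, with placeholder manager and contact names) to show three common selections; it is an illustration under those assumptions, not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy data; names are placeholders.
sales = pd.DataFrame({
    "Account Manager": ["Ann", "Ann", "Ann", "Bob"],
    "Contact": ["C1", "C1", "C2", "C3"],
    "Sale Price": [100, 200, 50, 400],
})

by_mgr = pd.pivot_table(sales, values=["Sale Price"],
                        index=["Account Manager", "Contact"],
                        aggfunc="sum")

# All contacts of one manager: select on the outer index level.
ann_rows = by_mgr.loc["Ann"]

# One specific (manager, contact) pair: pass a tuple.
c1_total = by_mgr.loc[("Ann", "C1"), "Sale Price"]

# Select on the inner level only, whatever the manager is.
c3_rows = by_mgr.xs("C3", level="Contact")

print(ann_rows)
print(c1_total)
print(c3_rows)
```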


----

In [77]:
# Breaking down the above result into total sales per product
# this is done with the parameter 'columns'
# also, replace the null values with zero using 'fill_value'

pd.pivot_table(df, values = ['Sale Price'],
              index = ['Account Manager','Contact'],
              columns = ['Product'], aggfunc = 'sum',
              fill_value = 0)

,,Sale Price,Sale Price,Sale Price,Sale Price
,Product,Analytics,GPS Positioning,Prediction,Tracking
Account Manager,Contact,,,,
Claude Shannon,Cindy Phoner,6650000,0,700000,350000
Claude Shannon,Emma Gordian,11550000,350000,0,490000
Edward Thorp,Elon Tusk,7350000,0,700000,0
Edward Thorp,Larry Pager,4550000,0,700000,350000
Edward Thorp,Will Grates,2450000,0,0,350000
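The effect of `fill_value` is easiest to see side by side. The sketch below uses a hypothetical frame (`sales`, invented values) and also shows that passing `values` as a plain string instead of a list avoids the extra 'Sale Price' level on the columns.

```python
import pandas as pd

# Hypothetical toy data: contact C2 never bought Tracking.
sales = pd.DataFrame({
    "Contact": ["C1", "C1", "C2"],
    "Product": ["Analytics", "Tracking", "Analytics"],
    "Sale Price": [100, 50, 400],
})

# Without fill_value, the missing C2/Tracking combination is NaN.
with_nan = pd.pivot_table(sales, values="Sale Price", index="Contact",
                          columns="Product", aggfunc="sum")

# With fill_value=0, the same cell holds 0.
with_zero = pd.pivot_table(sales, values="Sale Price", index="Contact",
                           columns="Product", aggfunc="sum", fill_value=0)

# Passing values as a list keeps an extra 'Sale Price' level on the columns.
nested = pd.pivot_table(sales, values=["Sale Price"], index="Contact",
                        columns="Product", aggfunc="sum", fill_value=0)

print(with_nan)
print(with_zero)
print(nested.columns.nlevels, with_zero.columns.nlevels)
```

Filling with 0 is reasonable for sums, where "no sale" and "zero sales" mean the same thing; for other aggregates it may be safer to leave the NaN in place.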


----

In [87]:
# CALLING MORE THAN ONE AGGREGATE FUNCTION

pd.pivot_table(df, values = ['Sale Price'],
               index = ['Account Manager','Contact'], 
               columns = ['Product'], 
               aggfunc = ['sum','mean'],
               fill_value = 0 )

,,sum,sum,sum,sum,mean,mean,mean,mean
,,Sale Price,Sale Price,Sale Price,Sale Price,Sale Price,Sale Price,Sale Price,Sale Price
,Product,Analytics,GPS Positioning,Prediction,Tracking,Analytics,GPS Positioning,Prediction,Tracking
Account Manager,Contact,,,,,,,,
Claude Shannon,Cindy Phoner,6650000,0,700000,350000,3325000,0,700000,350000
Claude Shannon,Emma Gordian,11550000,350000,0,490000,5775000,350000,0,490000
Edward Thorp,Elon Tusk,7350000,0,700000,0,3675000,0,700000,0
Edward Thorp,Larry Pager,4550000,0,700000,350000,2275000,0,700000,350000
Edward Thorp,Will Grates,2450000,0,0,350000,2450000,0,0,350000
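When `aggfunc` is a list, the function name becomes the outermost column level, so individual results are addressed with a tuple. `aggfunc` also accepts a dictionary that maps each value column to its own function. The sketch below shows both on a hypothetical frame (`sales`, invented values).

```python
import pandas as pd

# Hypothetical toy data.
sales = pd.DataFrame({
    "Contact": ["C1", "C1", "C2"],
    "Licenses": [10, 20, 5],
    "Sale Price": [100, 300, 50],
})

# A list of functions: columns become (function, value column) pairs.
both = pd.pivot_table(sales, values=["Sale Price"], index="Contact",
                      aggfunc=["sum", "mean"])
mean_c1 = both.loc["C1", ("mean", "Sale Price")]
sum_c1 = both.loc["C1", ("sum", "Sale Price")]

# A dict: a different function for each value column.
per_col = pd.pivot_table(sales, values=["Licenses", "Sale Price"],
                         index="Contact",
                         aggfunc={"Licenses": "sum", "Sale Price": "mean"})

print(both)
print(per_col)
```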


----

In [88]:
# Same result as above, with Product moved from the columns to a third index level

pd.pivot_table(df, values = ['Sale Price'],
               index = ['Account Manager','Contact','Product'], 
               aggfunc = ['sum','mean'],
               fill_value = 0 )

,,,sum,mean
,,,Sale Price,Sale Price
Account Manager,Contact,Product,,
Claude Shannon,Cindy Phoner,Analytics,6650000,3325000
Claude Shannon,Cindy Phoner,Prediction,700000,700000
Claude Shannon,Cindy Phoner,Tracking,350000,350000
Claude Shannon,Emma Gordian,Analytics,11550000,5775000
Claude Shannon,Emma Gordian,GPS Positioning,350000,350000
Claude Shannon,Emma Gordian,Tracking,490000,490000
Edward Thorp,Elon Tusk,Analytics,7350000,3675000
Edward Thorp,Elon Tusk,Prediction,700000,700000
Edward Thorp,Larry Pager,Analytics,4550000,2275000
Edward Thorp,Larry Pager,Prediction,700000,700000
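The long layout (Product in the index) and the wide layout (Product in the columns) hold the same numbers, and `unstack()` converts one into the other. The sketch below demonstrates this on a hypothetical frame (`sales`, invented values).

```python
import pandas as pd

# Hypothetical toy data.
sales = pd.DataFrame({
    "Contact": ["C1", "C1", "C2"],
    "Product": ["Analytics", "Tracking", "Analytics"],
    "Sale Price": [100, 50, 400],
})

# Long layout: Product is the innermost index level.
long = pd.pivot_table(sales, values="Sale Price",
                      index=["Contact", "Product"], aggfunc="sum")

# Move Product from the index to the columns; missing pairs become 0.
wide = long["Sale Price"].unstack("Product", fill_value=0)

print(long)
print(wide)
```

The long layout only lists combinations that actually occur, which is why `fill_value` has nothing to fill there, while the wide layout needs a value for every Contact/Product cell.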


----

In [91]:
# Using margins = True to add a grand total (labelled 'All' by default) at the end of the pivot table

pd.pivot_table(df, values = ['Sale Price'],
               index = ['Account Manager','Contact','Product'], 
               aggfunc = ['sum','mean'], margins = True,
               fill_value = 0 )

,,,sum,mean
,,,Sale Price,Sale Price
Account Manager,Contact,Product,,
Claude Shannon,Cindy Phoner,Analytics,6650000,3325000.0
Claude Shannon,Cindy Phoner,Prediction,700000,700000.0
Claude Shannon,Cindy Phoner,Tracking,350000,350000.0
Claude Shannon,Emma Gordian,Analytics,11550000,5775000.0
Claude Shannon,Emma Gordian,GPS Positioning,350000,350000.0
Claude Shannon,Emma Gordian,Tracking,490000,490000.0
Edward Thorp,Elon Tusk,Analytics,7350000,3675000.0
Edward Thorp,Elon Tusk,Prediction,700000,700000.0
Edward Thorp,Larry Pager,Analytics,4550000,2275000.0
Edward Thorp,Larry Pager,Prediction,700000,700000.0
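A smaller sketch makes the extra row added by `margins` easier to see. It uses a hypothetical frame (`sales`, invented values), a single index level, and the `margins_name` parameter to rename the default 'All' label; the name 'Total' is just an illustrative choice.

```python
import pandas as pd

# Hypothetical toy data.
sales = pd.DataFrame({
    "Contact": ["C1", "C1", "C2"],
    "Sale Price": [100, 50, 400],
})

# margins=True appends a grand-total row; margins_name sets its label.
totals = pd.pivot_table(sales, values="Sale Price", index="Contact",
                        aggfunc="sum", margins=True, margins_name="Total")

print(totals)
```

The margin is computed with the same `aggfunc` as the rest of the table, so with `'sum'` it is a grand total and with `'mean'` it would be the overall mean.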
