# Pivot tables
* Pivot tables let you reorganize a DataFrame: the values of one column become the new index and the values of another become the new columns
* A DataFrame with repeated values can be pivoted to reorganize it and make it easier to read
* <b> We choose columns to define the new index, columns, and values </b>
    * Pass those column names to define which sets become the new index, the new columns, and the values
    * Notice that the columns chosen for index and columns should contain repeated values
    * df.pivot(index='foo', columns='bar', values='baz')
         * The <b>foo</b> column in the original DataFrame has repeated values; its unique values become the new index
         * The unique values of the <b>bar</b> column become the columns after the pivot
         * The <b>zoo</b> column is not selected, so all of its information is discarded
    * Each cell holds the value that matches one unique (index, column) pair
        * For example, the cell where <b>foo</b> is 'two' and <b>bar</b> is 'A' holds the <b>baz</b> value from that row
    * Because we are calling the pivot() method, not the pivot_table() method, no new information is computed; the data is merely reorganized
    * You should first go through this <b> checklist </b> <i> before </i> running pivot():
        * What question are you trying to answer?
        * What would a DataFrame that answers the question look like? Does it need a pivot()?
        * What do you want the resulting pivot to look like?
* Pandas also has a pivot_table() method that applies an aggregation function (such as sum or mean) when several rows share the same index/column pair
* The same aggregation can alternatively be done with a groupby() call
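A minimal sketch of the foo/bar/baz/zoo example described above. The column names come from the text; the data values are made up for illustration, and `toy`, `wide` and `dup` are hypothetical names. The second half shows the practical difference behind the last two bullets: pivot() raises a ValueError when an (index, column) pair repeats, while pivot_table() and groupby() aggregate the duplicates.

```python
import pandas as pd

# Toy frame using the foo/bar/baz/zoo column names from the notes (values are illustrative)
toy = pd.DataFrame({
    'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
    'baz': [1, 2, 3, 4, 5, 6],
    'zoo': ['x', 'y', 'z', 'q', 'w', 't'],
})

# foo's unique values become the index, bar's unique values become the columns,
# and each cell holds the baz value for that (foo, bar) pair
wide = toy.pivot(index='foo', columns='bar', values='baz')
print(wide)

# The value that lined up with foo == 'two' and bar == 'A'
print(wide.loc['two', 'A'])

# zoo was not selected, so it does not appear in the result
print('zoo' in wide.columns)

# pivot() refuses duplicate (index, column) pairs ...
dup = pd.DataFrame({'foo': ['one', 'one'], 'bar': ['A', 'A'], 'baz': [1, 2]})
try:
    dup.pivot(index='foo', columns='bar', values='baz')
except ValueError as err:
    print('pivot failed:', err)

# ... while pivot_table() aggregates them, and groupby() gives the same total in long form
print(dup.pivot_table(index='foo', columns='bar', values='baz', aggfunc='sum'))
print(dup.groupby(['foo', 'bar'])['baz'].sum())
```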

In [1]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv(r'C:\Users\admin\Desktop\Data Science\Course-2021\03-Pandas\Sales_Funnel_CRM.csv')

In [4]:
df

|   | Account Number | Company | Contact | Account Manager | Product | Licenses | Sale Price | Status |
|---|---|---|---|---|---|---|---|---|
| 0 | 2123398 | Google | Larry Pager | Edward Thorp | Analytics | 150 | 2100000 | Presented |
| 1 | 2123398 | Google | Larry Pager | Edward Thorp | Prediction | 150 | 700000 | Presented |
| 2 | 2123398 | Google | Larry Pager | Edward Thorp | Tracking | 300 | 350000 | Under Review |
| 3 | 2192650 | BOBO | Larry Pager | Edward Thorp | Analytics | 150 | 2450000 | Lost |
| 4 | 420496 | IKEA | Elon Tusk | Edward Thorp | Analytics | 300 | 4550000 | Won |
| 5 | 636685 | Tesla Inc. | Elon Tusk | Edward Thorp | Analytics | 300 | 2800000 | Under Review |
| 6 | 636685 | Tesla Inc. | Elon Tusk | Edward Thorp | Prediction | 150 | 700000 | Presented |
| 7 | 1216870 | Microsoft | Will Grates | Edward Thorp | Tracking | 300 | 350000 | Under Review |
| 8 | 2200450 | Walmart | Will Grates | Edward Thorp | Analytics | 150 | 2450000 | Lost |
| 9 | 405886 | Apple | Cindy Phoner | Claude Shannon | Analytics | 300 | 4550000 | Won |


In [6]:
# help(pd.pivot)

How many licenses of each product type did Google purchase?



In [10]:
# the columns you want to work with:
# the company
# the licenses
# the product

licenses = df[['Company', 'Product', 'Licenses']]

In [11]:
licenses

|   | Company | Product | Licenses |
|---|---|---|---|
| 0 | Google | Analytics | 150 |
| 1 | Google | Prediction | 150 |
| 2 | Google | Tracking | 300 |
| 3 | BOBO | Analytics | 150 |
| 4 | IKEA | Analytics | 300 |
| 5 | Tesla Inc. | Analytics | 300 |
| 6 | Tesla Inc. | Prediction | 150 |
| 7 | Microsoft | Tracking | 300 |
| 8 | Walmart | Analytics | 150 |
| 9 | Apple | Analytics | 300 |


In [12]:
# index   - Company
# columns - Product
# values  - Licenses
# answers the question: how many licenses of each product did each company buy?
pd.pivot(data=licenses, index='Company', columns='Product', values='Licenses')

| Company | Analytics | GPS Positioning | Prediction | Tracking |
|---|---|---|---|---|
| ATT | NaN | NaN | 150.0 | 150.0 |
| Apple | 300.0 | NaN | NaN | NaN |
| BOBO | 150.0 | NaN | NaN | NaN |
| CVS Health | NaN | NaN | NaN | 450.0 |
| Cisco | 300.0 | 300.0 | NaN | NaN |
| Exxon Mobile | 150.0 | NaN | NaN | NaN |
| Google | 150.0 | NaN | 150.0 | 300.0 |
| IKEA | 300.0 | NaN | NaN | NaN |
| Microsoft | NaN | NaN | NaN | 300.0 |
| Salesforce | 750.0 | NaN | NaN | NaN |

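To check the reshaping on a small scale, here is a sketch that rebuilds only the ten `licenses` rows displayed earlier (not the full CSV) and pivots them the same way; `by_company` is just an illustrative name. Products a company never bought show up as NaN, which is why the pivoted counts are floats. Selecting the Google row and dropping the NaNs answers the question posed above.

```python
import pandas as pd

# The ten Company/Product/Licenses rows displayed above (not the whole CSV)
licenses = pd.DataFrame({
    'Company': ['Google', 'Google', 'Google', 'BOBO', 'IKEA',
                'Tesla Inc.', 'Tesla Inc.', 'Microsoft', 'Walmart', 'Apple'],
    'Product': ['Analytics', 'Prediction', 'Tracking', 'Analytics', 'Analytics',
                'Analytics', 'Prediction', 'Tracking', 'Analytics', 'Analytics'],
    'Licenses': [150, 150, 300, 150, 300, 300, 150, 300, 150, 300],
})

# One row per company, one column per product; pairs that never occur become NaN
by_company = licenses.pivot(index='Company', columns='Product', values='Licenses')
print(by_company)

# Google's licenses per product, ignoring products it did not buy
print(by_company.loc['Google'].dropna())
```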

In [15]:
# perform an aggregation
# without values=, pivot_table aggregates every remaining numeric column
# to aggregate specific columns only, pass a list of them to values=
pd.pivot_table(df, index='Company', aggfunc='sum', values=['Licenses', 'Sale Price'])

| Company | Licenses | Sale Price |
|---|---|---|
| ATT | 300 | 1050000 |
| Apple | 300 | 4550000 |
| BOBO | 150 | 2450000 |
| CVS Health | 450 | 490000 |
| Cisco | 600 | 4900000 |
| Exxon Mobile | 150 | 2100000 |
| Google | 600 | 3150000 |
| IKEA | 300 | 4550000 |
| Microsoft | 300 | 350000 |
| Salesforce | 750 | 7000000 |

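Why the sum is needed: Google appears in three rows, so there is more than one Licenses value per company and a plain pivot() could not place them in one cell. A minimal sketch using the first four rows of the CRM table shown at the top (only the columns needed here; `rows` and `totals` are illustrative names) shows pivot_table() collapsing them with aggfunc='sum'. The totals agree with the Google and BOBO rows of the output above.

```python
import pandas as pd

# First four rows of the CRM data shown at the top (Google appears three times)
rows = pd.DataFrame({
    'Company': ['Google', 'Google', 'Google', 'BOBO'],
    'Licenses': [150, 150, 300, 150],
    'Sale Price': [2100000, 700000, 350000, 2450000],
})

# Rows sharing a Company are summed column by column
totals = pd.pivot_table(rows, index='Company', values=['Licenses', 'Sale Price'], aggfunc='sum')
print(totals)
```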

In [14]:
# the same totals with groupby (numeric_only=True skips text columns such as Product)
df.groupby('Company').sum(numeric_only=True)


| Company | Account Number | Licenses | Sale Price |
|---|---|---|---|
| ATT | 1396064 | 300 | 1050000 |
| Apple | 405886 | 300 | 4550000 |
| BOBO | 2192650 | 150 | 2450000 |
| CVS Health | 902797 | 450 | 490000 |
| Cisco | 4338998 | 600 | 4900000 |
| Exxon Mobile | 470248 | 150 | 2100000 |
| Google | 6370194 | 600 | 3150000 |
| IKEA | 420496 | 300 | 4550000 |
| Microsoft | 1216870 | 300 | 350000 |
| Salesforce | 2046943 | 750 | 7000000 |

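The Account Number column in the groupby() output is summed only because it is numeric; adding account numbers together is meaningless, which is a reason to select the columns explicitly. A sketch on the first four rows of the CRM table shown at the top (`via_pivot` and `via_groupby` are illustrative names) suggests that, once both are restricted to Licenses and Sale Price, pivot_table() and groupby() produce the same frame.

```python
import pandas as pd

# First four rows of the CRM data shown at the top (subset of columns)
rows = pd.DataFrame({
    'Account Number': [2123398, 2123398, 2123398, 2192650],
    'Company': ['Google', 'Google', 'Google', 'BOBO'],
    'Licenses': [150, 150, 300, 150],
    'Sale Price': [2100000, 700000, 350000, 2450000],
})

via_pivot = pd.pivot_table(rows, index='Company', values=['Licenses', 'Sale Price'], aggfunc='sum')

# Selecting the columns first keeps Account Number out of the sum
via_groupby = rows.groupby('Company')[['Licenses', 'Sale Price']].sum()

print(via_groupby)
# Compares index, columns, dtypes and values
print(via_pivot.equals(via_groupby))
```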

In [22]:
# pivot table with a two-level index:
# the outer index is Account Manager, the inner index is Contact
# shows the total (and mean) sale price each account manager made per contact, broken down by product
# columns= is optional
# fill_value= replaces the NaN values
# aggfunc= accepts a list to apply several aggregations at once
# margins=True adds an 'All' row and column with the grand totals
pd.pivot_table(df, index=['Account Manager', 'Contact'], columns=['Product'], values=['Sale Price'],
               aggfunc=['sum', 'mean'], fill_value=0, margins=True)


Every column below aggregates Sale Price:

| Account Manager | Contact | sum: Analytics | sum: GPS Positioning | sum: Prediction | sum: Tracking | sum: All | mean: Analytics | mean: GPS Positioning | mean: Prediction | mean: Tracking | mean: All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Shannon | Cindy Phoner | 6650000 | 0 | 700000 | 350000 | 7700000 | 3325000.0 | 0 | 700000 | 350000 | 1925000.0 |
| Claude Shannon | Emma Gordian | 11550000 | 350000 | 0 | 490000 | 12390000 | 5775000.0 | 350000 | 0 | 490000 | 3097500.0 |
| Edward Thorp | Elon Tusk | 7350000 | 0 | 700000 | 0 | 8050000 | 3675000.0 | 0 | 700000 | 0 | 2683333.0 |
| Edward Thorp | Larry Pager | 4550000 | 0 | 700000 | 350000 | 5600000 | 2275000.0 | 0 | 700000 | 350000 | 1400000.0 |
| Edward Thorp | Will Grates | 2450000 | 0 | 0 | 350000 | 2800000 | 2450000.0 | 0 | 0 | 350000 | 1400000.0 |
| All | | 32550000 | 350000 | 2100000 | 1540000 | 36540000 | 3616667.0 | 350000 | 700000 | 385000 | 2149412.0 |


We have two tiers for the row index and three tiers for the columns:
* Account Manager (outer) and Contact (inner) form the index
* The columns are the aggregation function (sum, mean), then Sale Price, then the Product type
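A scaled-down sketch of the call above, assuming only Edward Thorp's nine rows from the CRM table shown at the top (so the totals cover just his contacts and the Claude Shannon rows are absent; `sales` and `table` are illustrative names). It shows fill_value=0 filling contact/product pairs that never occur, margins=True appending the 'All' row and column, and how a tuple with one label per tier addresses a single cell. aggfunc is passed as the strings 'sum' and 'mean', which pandas accepts in place of np.sum and np.mean.

```python
import pandas as pd

# Edward Thorp's nine rows from the CRM table shown at the top (subset of columns)
sales = pd.DataFrame({
    'Account Manager': ['Edward Thorp'] * 9,
    'Contact': ['Larry Pager'] * 4 + ['Elon Tusk'] * 3 + ['Will Grates'] * 2,
    'Product': ['Analytics', 'Prediction', 'Tracking', 'Analytics',
                'Analytics', 'Analytics', 'Prediction',
                'Tracking', 'Analytics'],
    'Sale Price': [2100000, 700000, 350000, 2450000,
                   4550000, 2800000, 700000,
                   350000, 2450000],
})

table = pd.pivot_table(
    sales,
    index=['Account Manager', 'Contact'],  # two index tiers
    columns=['Product'],
    values=['Sale Price'],
    aggfunc=['sum', 'mean'],  # one block of columns per function
    fill_value=0,             # pairs that never occur show 0 instead of NaN
    margins=True,             # adds an 'All' row and an 'All' column of totals
)

print(table.index.nlevels)    # 2: Account Manager, Contact
print(table.columns.nlevels)  # 3: aggregation, Sale Price, Product

# One label per tier picks out a single cell: Elon Tusk's Analytics total
print(table.loc[('Edward Thorp', 'Elon Tusk'), ('sum', 'Sale Price', 'Analytics')])

# Will Grates bought no Prediction, so fill_value supplies 0
print(table.loc[('Edward Thorp', 'Will Grates'), ('sum', 'Sale Price', 'Prediction')])

# With two index tiers the margin row is labelled ('All', '')
print(table.loc[('All', ''), ('sum', 'Sale Price', 'All')])
```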



In [21]:
# another way to build the same summary:
# move Product into the index instead of the columns
pd.pivot_table(df, index=['Account Manager', 'Contact', 'Product'], values=['Sale Price'],
               aggfunc=['sum', 'mean'], fill_value=0)

| Account Manager | Contact | Product | sum: Sale Price | mean: Sale Price |
|---|---|---|---|---|
| Claude Shannon | Cindy Phoner | Analytics | 6650000 | 3325000 |
| Claude Shannon | Cindy Phoner | Prediction | 700000 | 700000 |
| Claude Shannon | Cindy Phoner | Tracking | 350000 | 350000 |
| Claude Shannon | Emma Gordian | Analytics | 11550000 | 5775000 |
| Claude Shannon | Emma Gordian | GPS Positioning | 350000 | 350000 |
| Claude Shannon | Emma Gordian | Tracking | 490000 | 490000 |
| Edward Thorp | Elon Tusk | Analytics | 7350000 | 3675000 |
| Edward Thorp | Elon Tusk | Prediction | 700000 | 700000 |
| Edward Thorp | Larry Pager | Analytics | 4550000 | 2275000 |
| Edward Thorp | Larry Pager | Prediction | 700000 | 700000 |
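The long layout (Product in the index) and the wide layout (Product in the columns) hold the same numbers. A minimal sketch, assuming Edward Thorp's nine rows from the CRM table shown at the top (`long_form`, `wide_form` and `reshaped` are illustrative names), moves the innermost index level into the columns with unstack() and compares the result with the wide pivot. check_dtype=False is used because, depending on the pandas version, the wide pivot may come back as floats rather than integers.

```python
import pandas as pd

# Edward Thorp's nine rows from the CRM table shown at the top (subset of columns)
sales = pd.DataFrame({
    'Account Manager': ['Edward Thorp'] * 9,
    'Contact': ['Larry Pager'] * 4 + ['Elon Tusk'] * 3 + ['Will Grates'] * 2,
    'Product': ['Analytics', 'Prediction', 'Tracking', 'Analytics',
                'Analytics', 'Analytics', 'Prediction',
                'Tracking', 'Analytics'],
    'Sale Price': [2100000, 700000, 350000, 2450000,
                   4550000, 2800000, 700000,
                   350000, 2450000],
})

# Long layout: Product is the innermost index level
long_form = pd.pivot_table(sales, index=['Account Manager', 'Contact', 'Product'],
                           values='Sale Price', aggfunc='sum')

# Wide layout: Product values become the columns
wide_form = pd.pivot_table(sales, index=['Account Manager', 'Contact'], columns='Product',
                           values='Sale Price', aggfunc='sum', fill_value=0)

# unstack() moves the Product level from the index into the columns
reshaped = long_form['Sale Price'].unstack('Product', fill_value=0)

# Raises AssertionError if the two layouts disagree
pd.testing.assert_frame_equal(reshaped, wide_form, check_dtype=False)
print(reshaped)
```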
