***
# 4.06 Pandas Apply, Map, Applymap, Pivot Table, and Contingency Table
- Pandas Documentation: https://pandas.pydata.org/
***
### Python 4.01. Series
### Python 4.02. Pandas DataFrame, Selection, and Indexing
### Python 4.03. Configuring Options, Data Type Conversion, Working with strings and dates, Missing Data
### Python 4.04. Groupby, Categorizing, and Labeling Data
### Python 4.05. Merging, Joining, and Concatenating
### Python 4.06. Pipe, Apply, Applymap, Map, Pivot Table, and Contingency Table
### Python 4.07. Data Input and Output
### Python 4.08. Data Visualization
### Python 4.09. Exploratory Data Analysis and Beyond
### Python 4.10. Breakout Group Exercise and Solution
***

## Table of Contents: Pandas Apply, Map, Applymap, Pivot Table and Contingency Table

### Section 1. `apply()` Introduction
### Section 2. `pipe()`, `apply()`, `map()`, and `applymap()`
### Section 3. pivot table: `pivot_table()`
### Section 4. contingency table: `crosstab()` 

In [51]:
import pandas as pd
import numpy as np
import math

In [52]:
# Example
df = pd.DataFrame({'col1':[2,5,6,2],
                   'col2':[22,55,66,22],
                   'col3':['a','d','g','x'],
                   'col4':[34,np.nan,21,np.nan]})
df.head()

Unnamed: 0,col1,col2,col3,col4
0,2,22,a,34.0
1,5,55,d,
2,6,66,g,21.0
3,2,22,x,


#### Info on Unique Values

In [53]:
df['col2'].unique()

array([22, 55, 66], dtype=int64)

In [54]:
len(df['col2'].unique())

3

In [55]:
df['col2'].nunique()

3

In [56]:
df['col4'].value_counts()

34.0    1
21.0    1
Name: col4, dtype: int64

In [57]:
df['col4'].value_counts(dropna=False)

NaN     2
34.0    1
21.0    1
Name: col4, dtype: int64

In [58]:
df['col1']>2

0    False
1     True
2     True
3    False
Name: col1, dtype: bool

In [59]:
#Select from DataFrame using criteria from multiple columns
newdf = df[(df['col1']==2) & (df['col2']==22)]

In [60]:
newdf

Unnamed: 0,col1,col2,col3,col4
0,2,22,a,34.0
3,2,22,x,


In [61]:
df.sort_values('col2')

Unnamed: 0,col1,col2,col3,col4
0,2,22,a,34.0
3,2,22,x,
1,5,55,d,
2,6,66,g,21.0


In [62]:
df.sort_values(by=['col3','col2']) # inplace=False by default

Unnamed: 0,col1,col2,col3,col4
0,2,22,a,34.0
1,5,55,d,
2,6,66,g,21.0
3,2,22,x,


## Section 1.  `apply()` Introduction

In [63]:
df['col1']>2

0    False
1     True
2     True
3    False
Name: col1, dtype: bool

In [64]:
df['col1']

0    2
1    5
2    6
3    2
Name: col1, dtype: int64

In [65]:
def times3(x):
    return x*3

In [66]:
df['col1'].apply(times3)

0     6
1    15
2    18
3     6
Name: col1, dtype: int64

In [67]:
df['col3']

0    a
1    d
2    g
3    x
Name: col3, dtype: object

In [68]:
df['col3'].apply(len)

0    1
1    1
2    1
3    1
Name: col3, dtype: int64

In [69]:
df['col1'].sum()

15

In [70]:
df[['col1','col2']].apply(np.mean, axis=1)

0    12.0
1    30.0
2    36.0
3    12.0
dtype: float64

In [71]:
df[['col1']].apply(np.log)

Unnamed: 0,col1
0,0.693147
1,1.609438
2,1.791759
3,0.693147


In [72]:
df['col2'].apply(lambda x: x*3)

0     66
1    165
2    198
3     66
Name: col2, dtype: int64

In [73]:
df[['col1','col2']].applymap(np.log)

Unnamed: 0,col1,col2
0,0.693147,3.091042
1,1.609438,4.007333
2,1.791759,4.189655
3,0.693147,3.091042


In [74]:
# Example (illustrative only: 'major_desc' and 'daydiff' are columns of a
# different dataset and do not exist in the small df defined above)
def short_desc(i):
    # use == for comparison; a single = is assignment and is a SyntaxError here
    if i == 'SERVICES': return 'Sevs'
    elif i == 'MANUFACTURING': return 'Manuf'
    else: return 'Others'
df['sic_div'] = df['major_desc'].str.strip().apply(short_desc)

def NTB12mo_Ind(i):
    if i <=365: return 'NTB_12mo-'
    elif i>365: return 'NTB_12mo+'
    else: return 'Others'
df['NTB_12mo_Ind'] = df['daydiff'].astype('int64').apply(NTB12mo_Ind)


## Section 2. `pipe()`,`apply()`, `applymap()` , and `map()`

To apply your own function or another library's function, pandas provides three important methods: `pipe()`, `apply()`, and `applymap()`. `map()` is included here as well because it looks similar:

- 1) Table Wise Function Application: pipe()
- 2) Row or Column Wise Function Application: apply()
- 3) Element Wise Function Application: applymap()
- 4) Element Wise Function Application on a Series: map()

### 1) Table wise Function Application: pipe()

- The `dataframe.pipe()` method passes the entire DataFrame as the first argument to a function and returns that function's result, so several table-level functions can be chained one after another.

- Syntax: dataframe.pipe(func, *args, **kwargs)

    - Parameters: 
        - func: The function to apply to the DataFrame.
        - args: Optional positional arguments passed on to func after the DataFrame.
        - kwargs: Optional keyword arguments passed on to func.
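
The parameter list above can be illustrated with a minimal sketch of chained `pipe()` calls. The helper functions `add_constant` and `scale` and the small `scores` frame are hypothetical names made up for this illustration; they are not part of pandas or of the course data.

```python
import pandas as pd

def add_constant(frame, amount):
    """Return a new DataFrame with `amount` added to every value."""
    return frame + amount

def scale(frame, factor=1.0):
    """Return a new DataFrame with every value multiplied by `factor`."""
    return frame * factor

# Hypothetical sample data
scores = pd.DataFrame({'Score_Math': [66, 57, 75],
                       'Score_Science': [89, 87, 67]})

# pipe() passes the DataFrame as the first argument of each function;
# 2 is forwarded positionally (args), factor=0.5 by keyword (kwargs).
result = scores.pipe(add_constant, 2).pipe(scale, factor=0.5)
print(result)
```

The chained form reads left to right and should give the same result as the nested call `scale(add_constant(scores, 2), factor=0.5)`.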

In [75]:
# Define a function, then use pipe() to add the value 2 to every element of the DataFrame
def adder(adder1,adder2):
    return adder1+adder2
#Create a Dictionary of series
d = {'Score_Math':pd.Series([66,57,75,44,31,67,85,33,42,62,51,47]),
     'Score_Science':pd.Series([89,87,67,55,47,72,76,79,44,92,93,69])}
df = pd.DataFrame(d)
df

Unnamed: 0,Score_Math,Score_Science
0,66,89
1,57,87
2,75,67
3,44,55
4,31,47
5,67,72
6,85,76
7,33,79
8,42,44
9,62,92
10,51,93
11,47,69


In [76]:
df.pipe(adder,2)

Unnamed: 0,Score_Math,Score_Science
0,68,91
1,59,89
2,77,69
3,46,57
4,33,49
5,69,74
6,87,78
7,35,81
8,44,46
9,64,94
10,53,95
11,49,71


In [77]:
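# applymap() takes na_action as its second positional parameter, so the 2 is not
# forwarded to adder the way pipe() forwards it; this call raises a ValueError
df.applymap(adder,2)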

ValueError: na_action must be 'ignore' or None. Got 2

### 2) Row or Column Wise Function Application: apply()
`df.apply()` applies a function along an axis of the DataFrame: to each row (`axis=1`) or to each column (`axis=0`). The examples below use `apply()` to find the mean of the values across rows and the mean of the values across columns.

#### Apply(): Row wise Function in python pandas

In [78]:
df

Unnamed: 0,Score_Math,Score_Science
0,66,89
1,57,87
2,75,67
3,44,55
4,31,47
5,67,72
6,85,76
7,33,79
8,42,44
9,62,92
10,51,93
11,47,69


In [79]:
# apply() Function to find the mean of values across rows - row wise mean
df.apply(np.mean,axis=1)

0     77.5
1     72.0
2     71.0
3     49.5
4     39.0
5     69.5
6     80.5
7     56.0
8     43.0
9     77.0
10    72.0
11    58.0
dtype: float64

#### Apply(): Column wise Function in python pandas

In [80]:
# apply() Function to find the mean of values across columns - column wise mean
df.apply(np.mean,axis=0)

Score_Math       55.0
Score_Science    72.5
dtype: float64
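
The same `axis` argument works with any function that reduces a Series to a single value, not only `np.mean`. The sketch below is a minimal illustration on a hypothetical three-row frame; `value_range` is a helper name invented for this example.

```python
import pandas as pd

# Hypothetical sample data
scores = pd.DataFrame({'Score_Math': [66, 57, 75],
                       'Score_Science': [89, 87, 67]})

def value_range(values):
    """Return the spread (max minus min) of a Series."""
    return values.max() - values.min()

# axis=0: the function receives one column at a time
range_per_column = scores.apply(value_range, axis=0)

# axis=1: the function receives one row at a time
range_per_row = scores.apply(value_range, axis=1)

print(range_per_column)
print(range_per_row)
```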

#### `apply()`: use a function to alter values along an axis of a DataFrame or Series

In [81]:
# Example
df = pd.DataFrame({'Region':['North','West','East','South','Noth','West','East','South'],
                   'Team':['One','One','One','One','Two','Two','Two','Two'],
                   'Squad':['A','B','C','D','E','F','G','H'],
                   'Revenue':[7500,5500,2750,6400,2300,3750,1900,575],
                   'Cost':[5200,5100,4400,5300,1250,1300,2100,50]})
df

Unnamed: 0,Region,Team,Squad,Revenue,Cost
0,North,One,A,7500,5200
1,West,One,B,5500,5100
2,East,One,C,2750,4400
3,South,One,D,6400,5300
4,Noth,Two,E,2300,1250
5,West,Two,F,3750,1300
6,East,Two,G,1900,2100
7,South,Two,H,575,50


#### Use `apply()` with a lambda function to alter values along an axis of a `DataFrame` or a `Series`

A lambda function lets you define the function inside the `apply()` call without having to create it in advance.

In [82]:
df['Profit'] = df.apply(lambda x: ('Profit' if x['Revenue'] > x['Cost'] else 'Loss'), axis = 1)
df

Unnamed: 0,Region,Team,Squad,Revenue,Cost,Profit
0,North,One,A,7500,5200,Profit
1,West,One,B,5500,5100,Profit
2,East,One,C,2750,4400,Loss
3,South,One,D,6400,5300,Profit
4,Noth,Two,E,2300,1250,Profit
5,West,Two,F,3750,1300,Profit
6,East,Two,G,1900,2100,Loss
7,South,Two,H,575,50,Profit
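
For a simple two-way condition like the one above, a vectorized comparison is a common alternative to a row-wise `apply()`. The following is a minimal sketch on a hypothetical `sales` frame, using `numpy.where`; it is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
sales = pd.DataFrame({'Revenue': [7500, 2750, 1900, 575],
                      'Cost': [5200, 4400, 2100, 50]})

# Row-wise apply(): the lambda is called once per row
by_apply = sales.apply(
    lambda row: 'Profit' if row['Revenue'] > row['Cost'] else 'Loss', axis=1)

# Vectorized alternative: compare whole columns at once
by_where = pd.Series(
    np.where(sales['Revenue'] > sales['Cost'], 'Profit', 'Loss'),
    index=sales.index)

print(by_apply.tolist())
print(by_where.tolist())
```

Both approaches should label the rows identically; the vectorized form avoids calling a Python function for every row.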


In [None]:
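# Example (illustrative: `training` is a DataFrame with 'age' and 'sex'
# columns that is not defined in this notebook)
def fill_age(data):
    age = data['age']
    sex = data['sex']

    if pd.isnull(age):
        # compare strings with ==; `is` tests object identity, not equality
        if sex == 'male':
            return 28
        else:
            return 25
    else:
        return age
training['age'] = training[['age','sex']].apply(fill_age, axis=1)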

### 3) Element wise Function Application in python pandas: applymap()
`applymap()` applies the specified function to every element of the DataFrame. In pandas 2.1 and later, `applymap()` is deprecated in favor of `DataFrame.map()`.

In [84]:
# multiply all the elements of the DataFrame by 2 (numbers are doubled, strings are repeated)
df.applymap(lambda x:x*2)

Unnamed: 0,Region,Team,Squad,Revenue,Cost,Profit
0,NorthNorth,OneOne,AA,15000,10400,ProfitProfit
1,WestWest,OneOne,BB,11000,10200,ProfitProfit
2,EastEast,OneOne,CC,5500,8800,LossLoss
3,SouthSouth,OneOne,DD,12800,10600,ProfitProfit
4,NothNoth,TwoTwo,EE,4600,2500,ProfitProfit
5,WestWest,TwoTwo,FF,7500,2600,ProfitProfit
6,EastEast,TwoTwo,GG,3800,4200,LossLoss
7,SouthSouth,TwoTwo,HH,1150,100,ProfitProfit


In [85]:
# find the square root of all the elements of dataframe 
df[['Revenue','Cost']].applymap(lambda x:math.sqrt(x))

Unnamed: 0,Revenue,Cost
0,86.60254,72.111026
1,74.161985,71.414284
2,52.440442,66.332496
3,80.0,72.801099
4,47.958315,35.355339
5,61.237244,36.055513
6,43.588989,45.825757
7,23.979158,7.071068


In [86]:
# find the length of all the elements of dataframe 
df.applymap(lambda x: len(str(x)))

Unnamed: 0,Region,Team,Squad,Revenue,Cost,Profit
0,5,3,1,4,4,6
1,4,3,1,4,4,6
2,4,3,1,4,4,4
3,5,3,1,4,4,6
4,4,3,1,4,4,6
5,4,3,1,4,4,6
6,4,3,1,4,4,4
7,5,3,1,3,2,6
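
Because element-wise application is spelled `applymap()` in older pandas and `DataFrame.map()` from pandas 2.1 onward, code that must run on both can pick whichever method is available. This is a minimal sketch on a hypothetical `nums` frame; the `hasattr` check is one possible way to handle the rename, not an official recommendation.

```python
import pandas as pd

# Hypothetical sample data
nums = pd.DataFrame({'Revenue': [7500, 5500], 'Cost': [5200, 5100]})

def double(x):
    """Double a single element."""
    return x * 2

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap
if hasattr(pd.DataFrame, 'map'):
    doubled = nums.map(double)
else:
    doubled = nums.applymap(double)

# For plain arithmetic, operating on the whole DataFrame gives the same values
doubled_vectorized = nums * 2

print(doubled)
```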


#### If all else fails, use a `for` loop

In [87]:
new_col =[]

for i in range(0,len(df)):
    rev = df['Revenue'][i] / (df[df['Region'] == df.loc[i,'Region']]['Revenue'].sum())
    new_col.append(rev)

In [88]:
df['Revenue Share of Region'] = new_col
df.sort_values(by = 'Region')

Unnamed: 0,Region,Team,Squad,Revenue,Cost,Profit,Revenue Share of Region
2,East,One,C,2750,4400,Loss,0.591398
6,East,Two,G,1900,2100,Loss,0.408602
0,North,One,A,7500,5200,Profit,1.0
4,Noth,Two,E,2300,1250,Profit,1.0
3,South,One,D,6400,5300,Profit,0.917563
7,South,Two,H,575,50,Profit,0.082437
1,West,One,B,5500,5100,Profit,0.594595
5,West,Two,F,3750,1300,Profit,0.405405
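
The revenue share computed with the `for` loop above can also be expressed with `groupby()` and `transform()`, which broadcasts each region's total back to its rows. This is a minimal sketch on a hypothetical four-row subset of the data (`df_rev` is a name chosen for this example).

```python
import pandas as pd

# Hypothetical subset of the sample data
df_rev = pd.DataFrame({'Region': ['East', 'East', 'West', 'West'],
                       'Revenue': [2750, 1900, 5500, 3750]})

# transform('sum') returns one value per original row: the total of that row's region
region_total = df_rev.groupby('Region')['Revenue'].transform('sum')

df_rev['Revenue Share of Region'] = df_rev['Revenue'] / region_total
print(df_rev)
```

The shares within each region should add up to 1, as they do in the loop version.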


### 4) Use `map()` to substitute each value in a `Series`, using either a function, dictionary, or Series

- `Series.map()` is the pandas counterpart of Python's built-in `map()` function, which executes a specified function for each item in an iterable. Each item is passed to the function as an argument.

- Syntax (built-in function): map(function, iterables)

In [89]:
# Example: Calculate the length of each word in the tuple:

def myfunc(n):
    return len(n)
x = map(myfunc, ('apple', 'banana', 'cherry'))
print(list(x))

[5, 6, 6]


In [90]:
# Example: Make new fruits by sending two iterable objects into the function:

def myfunc(a, b):
    return a + b

x = map(myfunc, ('apple', 'banana', 'cherry'), ('orange', 'lemon', 'pineapple'))
print(x)

<map object at 0x000002A078B3C640>


In [91]:
#convert the map into a list, for readability:
print(list(x))

['appleorange', 'bananalemon', 'cherrypineapple']


In [92]:
# Example
df = pd.DataFrame({'Region':['North','West','East','South','Noth','West','East','South'],
                   'Team':['One','One','One','One','Two','Two','Two','Two'],
                   'Squad':['A','B','C','D','E','F','G','H'],
                   'Revenue':[7500,5500,2750,6400,2300,3750,1900,575],
                   'Cost':[5200,5100,4400,5300,1250,1300,2100,50]})

# Create a team_map dictionary
team_map = {'One':'Red','Two':'Blue'}

# Use the team_map to create a new column 'Team Color' based on the existing column 'Team'
df['Team Color'] = df['Team'].map(team_map)  
df

Unnamed: 0,Region,Team,Squad,Revenue,Cost,Team Color
0,North,One,A,7500,5200,Red
1,West,One,B,5500,5100,Red
2,East,One,C,2750,4400,Red
3,South,One,D,6400,5300,Red
4,Noth,Two,E,2300,1250,Blue
5,West,Two,F,3750,1300,Blue
6,East,Two,G,1900,2100,Blue
7,South,Two,H,575,50,Blue
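
Two details of `Series.map()` are worth a quick sketch: values that have no key in the dictionary become `NaN`, and a function can be passed instead of a dictionary. The `teams` Series, including the extra value `'Three'`, is hypothetical and added only to show the missing-key case.

```python
import pandas as pd

# Hypothetical sample data; 'Three' deliberately has no entry in team_map
teams = pd.Series(['One', 'Two', 'Three'])
team_map = {'One': 'Red', 'Two': 'Blue'}

# Dictionary mapping: unmatched values become NaN
colors = teams.map(team_map)

# Replace the missing results with a placeholder label
colors_filled = colors.fillna('Unknown')

# Function mapping: len is applied to each element
lengths = teams.map(len)

print(colors_filled.tolist())
print(lengths.tolist())
```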


## Section 3. Pivot Table: `pivot_table()` 

In [93]:
data = {'A':['Index1A','Index1A','Index1A','Index1B','Index1B','Index1B'],
        'B':['Index2C','Index2C','Index2D','Index2D','Index2C','Index2C'],
        'C':['x','y','x','y','x','y'],
        'D':[1,3,2,5,4,1]}
df = pd.DataFrame(data)
df

Unnamed: 0,A,B,C,D
0,Index1A,Index2C,x,1
1,Index1A,Index2C,y,3
2,Index1A,Index2D,x,2
3,Index1B,Index2D,y,5
4,Index1B,Index2C,x,4
5,Index1B,Index2C,y,1


#### df.pivot_table()

In [94]:
df.pivot_table(values='D',index=['A', 'B'],columns=['C'])

Unnamed: 0_level_0,C,x,y
A,B,Unnamed: 2_level_1,Unnamed: 3_level_1
Index1A,Index2C,1.0,3.0
Index1A,Index2D,2.0,
Index1B,Index2C,4.0,1.0
Index1B,Index2D,,5.0


In [95]:
df.pivot_table(values='D',index=['B', 'A'],columns=['C'])

Unnamed: 0_level_0,C,x,y
B,A,Unnamed: 2_level_1,Unnamed: 3_level_1
Index2C,Index1A,1.0,3.0
Index2C,Index1B,4.0,1.0
Index2D,Index1A,2.0,
Index2D,Index1B,,5.0


In [96]:
df.pivot_table(values='D',index=['C', 'A'],columns=['B'])

Unnamed: 0_level_0,B,Index2C,Index2D
C,A,Unnamed: 2_level_1,Unnamed: 3_level_1
x,Index1A,1.0,2.0
x,Index1B,4.0,
y,Index1A,3.0,
y,Index1B,1.0,5.0
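
When several rows fall into the same cell, `pivot_table()` aggregates them, using the mean unless `aggfunc` says otherwise. The sketch below reuses the values of columns A, B, C, and D from the example above under the hypothetical name `pivot_data`, and shows `aggfunc='sum'` together with `fill_value=0` for empty cells.

```python
import pandas as pd

pivot_data = pd.DataFrame({
    'A': ['Index1A', 'Index1A', 'Index1A', 'Index1B', 'Index1B', 'Index1B'],
    'B': ['Index2C', 'Index2C', 'Index2D', 'Index2D', 'Index2C', 'Index2C'],
    'C': ['x', 'y', 'x', 'y', 'x', 'y'],
    'D': [1, 3, 2, 5, 4, 1]})

# Default aggregation is the mean: Index1A / x combines the values 1 and 2
mean_by_a = pivot_data.pivot_table(values='D', index='A', columns='C')

# Sum instead of mean, and show 0 where a combination has no rows
sum_filled = pivot_data.pivot_table(values='D', index=['A', 'B'], columns='C',
                                    aggfunc='sum', fill_value=0)

print(mean_by_a)
print(sum_filled)
```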


## Section 4. Contingency Table: `crosstab()` 

A contingency table summarizes the relationship between two categorical variables by counting how often each combination of categories occurs.

To create a contingency table in Python, we can use the `pandas.crosstab()` function, which uses the following syntax:

`pandas.crosstab(index, columns)`

- index: values to group by in the rows of the contingency table (for example, a DataFrame column)
- columns: values to group by in the columns of the contingency table (for example, another DataFrame column)

In [97]:
import pandas as pd

#create data
df = pd.DataFrame({'Order': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                            11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
                   'Product': ['TV', 'TV', 'Comp', 'TV', 'TV', 'Comp',
                               'Comp', 'Comp', 'TV', 'Radio', 'TV', 'Radio', 'Radio',
                               'Radio', 'Comp', 'Comp', 'TV', 'TV', 'Radio', 'TV'],
                   'Country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                               'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C']})

#view data
df.head()

Unnamed: 0,Order,Product,Country
0,1,TV,A
1,2,TV,A
2,3,Comp,A
3,4,TV,A
4,5,TV,B


In [98]:
#create contingency table
pd.crosstab(index=df['Country'], columns=df['Product'], dropna=False, margins=False)

Product,Comp,Radio,TV
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A,1,0,3
B,3,2,3
C,2,3,3


In [99]:
#add margins to contingency table
pd.crosstab(index=df['Country'], columns=df['Product'], margins=True)


Product,Comp,Radio,TV,All
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
A,1,0,3,4
B,3,2,3,8
C,2,3,3,8
All,6,5,9,20
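
Counts can also be turned into proportions with the `normalize` parameter of `pandas.crosstab()`. The sketch below uses a small hypothetical `orders` frame and `normalize='index'`, so each row of the table shows shares within one country.

```python
import pandas as pd

# Hypothetical sample data
orders = pd.DataFrame({'Product': ['TV', 'TV', 'Comp', 'TV', 'TV', 'Comp'],
                       'Country': ['A', 'A', 'A', 'A', 'B', 'B']})

# Raw counts per Country / Product combination
counts = pd.crosstab(index=orders['Country'], columns=orders['Product'])

# Row-wise proportions: each row sums to 1
row_share = pd.crosstab(index=orders['Country'], columns=orders['Product'],
                        normalize='index')

print(counts)
print(row_share)
```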


#### Note: The course materials are developed mainly based on personal experience and contributions from the Python learning community
Referenced books:
- Learning Python, 5th Edition, by Mark Lutz
- Python Data Science Handbook, by Jake VanderPlas
- Python for Data Analysis, by Wes McKinney

Copyright ©2023 Mei Najim. All rights reserved. 