## Apply and Lambda Transformation

In this notebook we learn how to transform column data using ```apply()``` together with ```lambda``` functions.

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

####  Load data

In [3]:
titanic = pd.read_csv('data/titanic.csv')
df1 = titanic.set_index('Name')
df1.head(2)

Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
"Braund, Mr. Owen Harris",1,0,3,male,22.0,1,0,A/5 21171,7.25,,S
"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C


#### 1. Using ```apply()``` with a ```lambda``` function

- Apply a ```lambda``` function to the ```Age``` column to compute the years remaining until age 100.

In [11]:
# 100 minus age, computed for every passenger (missing ages stay NaN)
df1['remaining-age'] = df1['Age'].apply(lambda x: 100 - x)
df1.head(5)

Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,remaining-age
"Braund, Mr. Owen Harris",1,0,3,male,22.0,1,0,A/5 21171,7.25,,S,78.0
"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C,62.0
"Heikkinen, Miss. Laina",3,1,3,female,26.0,0,0,STON/O2. 3101282,7.925,,S,74.0
"Futrelle, Mrs. Jacques Heath (Lily May Peel)",4,1,1,female,35.0,1,0,113803,53.1,C123,S,65.0
"Allen, Mr. William Henry",5,0,3,male,35.0,0,0,373450,8.05,,S,65.0

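The element-wise behaviour of ```apply()``` can be checked on a small example. The sketch below assumes a hypothetical Series ```ages``` (not read from the Titanic file) and compares the ```lambda``` version with the equivalent vectorized expression, which is usually the faster choice for simple arithmetic.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including one missing value (not taken from the Titanic file)
ages = pd.Series([22.0, 38.0, np.nan], index=['a', 'b', 'c'], name='Age')

# apply() calls the lambda once per element
remaining_apply = ages.apply(lambda x: 100 - x)

# The same arithmetic written as a vectorized expression
remaining_vectorized = 100 - ages

# Both approaches give the same values; NaN stays NaN in both
print(remaining_apply.equals(remaining_vectorized))
```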

- Apply a ```lambda``` function to the ```Fare``` column to transform it into a new value.

In [13]:
df1['Fare'].apply(lambda x: (10*x**2 + 2*x +4)/10).head(5)

Name
Braund, Mr. Owen Harris                                  54.412500
Cumings, Mrs. John Bradley (Florence Briggs Thayer)    5095.965519
Heikkinen, Miss. Laina                                   64.790625
Futrelle, Mrs. Jacques Heath (Lily May Peel)           2830.630000
Allen, Mr. William Henry                                 66.812500
Name: Fare, dtype: float64

- Alternatively, define a named function and pass it to ```apply()```.

In [14]:
def newfeature(x):
    return 10 + x/3 + x**2

In [15]:
df1['Fare'].apply(newfeature).head(4)

Name
Braund, Mr. Owen Harris                                  64.979167
Cumings, Mrs. John Bradley (Florence Briggs Thayer)    5115.069959
Heikkinen, Miss. Laina                                   75.447292
Futrelle, Mrs. Jacques Heath (Lily May Peel)           2847.310000
Name: Fare, dtype: float64
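
A named function can also take extra parameters, which ```Series.apply()``` forwards as keyword arguments. The sketch below is a variation on ```newfeature```: the ```offset``` parameter and the ```fares``` Series are hypothetical additions for illustration, not part of the original notebook.

```python
import pandas as pd

def newfeature(x, offset=10):
    """Return offset + x/3 + x**2 for a single value x (offset is a hypothetical parameter)."""
    return offset + x / 3 + x ** 2

# Hypothetical fares chosen so the arithmetic is easy to verify by hand
fares = pd.Series([3.0, 6.0], name='Fare')

# Uses the default offset of 10
default_offset = fares.apply(newfeature)

# Extra keyword arguments are passed through to the function
custom_offset = fares.apply(newfeature, offset=0)

print(default_offset.tolist())
print(custom_offset.tolist())
```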

#### 2. Column operations with ```lambda``` functions

- Let's create a new random dataframe to experiment with.

In [20]:
dates = pd.date_range('1/1/2000', periods=100)
df = pd.DataFrame(np.random.randn(100, 4),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.head()

,A,B,C,D
2000-01-01,-0.656738,-0.461095,-0.259647,0.890244
2000-01-02,0.652611,0.906148,-0.527606,-0.106089
2000-01-03,-0.067463,1.407429,1.414694,-1.266369
2000-01-04,0.301058,0.624163,-0.14419,-1.17769
2000-01-05,1.557796,-1.497422,0.545636,-1.006319


- Columns can be added, subtracted, multiplied, or divided directly when their data types are compatible, for example when all of them are numeric.

In [21]:
df['E'] = (df['A'] + df['B'])/df['C']
df.head()

,A,B,C,D,E
2000-01-01,-0.656738,-0.461095,-0.259647,0.890244,4.305207
2000-01-02,0.652611,0.906148,-0.527606,-0.106089,-2.954399
2000-01-03,-0.067463,1.407429,1.414694,-1.266369,0.947177
2000-01-04,0.301058,0.624163,-0.14419,-1.17769,-6.41666
2000-01-05,1.557796,-1.497422,0.545636,-1.006319,0.110649
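
As a rough illustration of what "compatible" means here, the sketch below uses a small hypothetical dataframe ```df_demo``` with an integer and a float column. It suggests that mixed numeric columns combine into a float result, and that dividing by a zero entry produces ```inf``` instead of raising an error.

```python
import pandas as pd

# Hypothetical data: A and C are integer columns, B is a float column
df_demo = pd.DataFrame({'A': [1, 2, 3],
                        'B': [0.5, 1.5, 2.5],
                        'C': [2, 4, 0]})

# Integer and float columns combine element-wise; the result is float64
df_demo['E'] = (df_demo['A'] + df_demo['B']) / df_demo['C']

# The last row divides by zero, which yields inf rather than an exception
print(df_demo['E'].tolist())
print(df_demo['E'].dtype)
```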


- ```lambda``` functions can also transform individual columns before they are combined.

In [27]:
df['F'] = df['A'].apply(lambda x: 10 + x) + df['E'].apply(lambda x: x + 20 if x > 0 else x)
df.head()

,A,B,C,D,E,F
2000-01-01,-0.656738,-0.461095,-0.259647,0.890244,4.305207,33.648469
2000-01-02,0.652611,0.906148,-0.527606,-0.106089,-2.954399,7.698212
2000-01-03,-0.067463,1.407429,1.414694,-1.266369,0.947177,30.879714
2000-01-04,0.301058,0.624163,-0.14419,-1.17769,-6.41666,3.884397
2000-01-05,1.557796,-1.497422,0.545636,-1.006319,0.110649,31.668445
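
The conditional ```lambda``` used on column ```E``` can also be written without ```apply()```. The sketch below assumes a small hypothetical Series ```e``` and uses ```np.where``` as one possible vectorized alternative; it is not the notebook's own approach.

```python
import numpy as np
import pandas as pd

# Hypothetical values covering positive, negative, and zero cases
e = pd.Series([4.0, -3.0, 0.0, 1.0], name='E')

# Element-wise conditional with apply(): add 20 only to positive values
shifted_apply = e.apply(lambda x: x + 20 if x > 0 else x)

# Vectorized alternative: np.where picks e + 20 where the condition holds, e otherwise
shifted_where = pd.Series(np.where(e > 0, e + 20, e), index=e.index, name=e.name)

print(shifted_apply.tolist())
print(shifted_apply.equals(shifted_where))
```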


### References:
1. [pandas User Guide](https://pandas.pydata.org/docs/user_guide/index.html#user-guide)