# Session 8 Functions

### Learning about apply is fundamental to the data cleaning process. It also draws on a key programming concept: writing functions. apply takes a function and “applies” it (i.e., runs it) across each row or column of a dataframe. If you’ve programmed before, the concept should be familiar: it is similar to writing a for loop over each row or column and calling the function, except that pandas handles the iteration for you. This is usually preferable to an explicit for loop in Python, because it is more concise and typically faster, although built-in vectorized pandas and NumPy operations are faster still when they exist.
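To make the loop analogy concrete, here is a minimal sketch that computes the same result with an explicit for loop and with apply. The small Series `values` and the helper `double` are made up for this illustration and are not part of the lab data.

```python
import pandas as pd

# Hypothetical data and helper, used only for illustration
values = pd.Series([1, 2, 3])

def double(x):
    return x * 2

# Explicit loop: call the function on each element in turn
looped = []
for v in values:
    looped.append(double(v))

# apply: pandas handles the iteration and returns a new Series
applied = values.apply(double)

print(applied.tolist())            # [2, 4, 6]
print(looped == applied.tolist())  # True
```

Both versions call `double` once per element; apply simply saves you from writing the loop and collecting the results yourself.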



## This lab will cover:

1. Functions in Python
2. Lambda functions
3. Applying functions on pandas dataframes

In [1]:
import pandas as pd

In [2]:
import numpy as np

# 1. Functions in Python

In [47]:
# The following function takes x as an argument, squares it and returns the result

def my_sqare_function(x):
    squared= x**2
    
    return squared

In [48]:
my_sqare_function(3)

9

In [49]:
my_sqare_function(45)

2025

In [50]:
# The following function takes x,y as arguments, computes the average and returns the result

def my_average_function(x,y):
    avg= (x+y)/2
    
    return avg

In [51]:
my_average_function(4,6)

5.0

In [52]:
my_average_function(15,45)

30.0

In [53]:
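# The following function takes a timestamp string and returns its first 10 characters (the YYYY-MM-DD date)

def my_date_extractor(input_string):
    date_part = input_string[0:10]
    return date_part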
    

In [54]:
date='2019-06-21 00:00:00+00:00'

In [55]:
my_date_extractor(date)

'2019-06-21'

# 2. Lambda functions in Python
Learn more about lambda functions: https://www.w3schools.com/python/python_lambda.asp

## Sometimes the function used in the apply method is simple enough that there is no need to create a separate function.

## Lambda functions are extremely useful for processing data in pandas-based environments.

## A lambda function is a small anonymous function.

## A lambda function can take any number of arguments, but can only have one expression.
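As a small sketch of what "anonymous" and "one expression" mean in practice, the example below writes the same logic as a regular function and as a lambda, then passes a lambda directly to `sorted`. The names `add_tax`, `add_tax_lambda` and `words` are illustrative only.

```python
# A regular function and its lambda equivalent (both names are illustrative)
def add_tax(price, rate):
    return price * (1 + rate)

add_tax_lambda = lambda price, rate: price * (1 + rate)

print(add_tax(100, 0.5))         # 150.0
print(add_tax_lambda(100, 0.5))  # 150.0

# Lambdas are most often passed directly to another function,
# here as the sort key (sort words by their length)
words = ["pandas", "apply", "lambda"]
by_length = sorted(words, key=lambda w: len(w))
print(by_length)                 # ['apply', 'pandas', 'lambda']
```

If the logic needs more than one expression (several statements, a loop, error handling), a regular `def` function is the better fit.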

In [56]:
lambda_add = lambda x, y: x + y

In [57]:
lambda_add(3,4)

7

In [58]:
lambda_add(5,7)

12

In [59]:
lambda_mean= lambda x,y:(x+y)/2

In [60]:
lambda_mean(0,10)

5.0

In [61]:
lambda_date_extractor = lambda date:date[0:10]

In [62]:
lambda_date_extractor('2019-06-21 00:00:00+00:00')

'2019-06-21'

# 3. Applying functions in pandas

## Now that we know how to write functions, how do we use them in pandas? When working with dataframes, you will usually want to run a function across the rows or columns of your data.

In [28]:
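# Note: parse_dates=True only tries to parse the index, so the date.utc column is still read as plain strings
air_quality_no2 = pd.read_csv('https://www.dropbox.com/s/70230oct6p0ovnv/air_quality_no2_long.csv?dl=1', parse_dates=True)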

In [29]:
air_quality_no2.head()

|   | city | country | date.utc | location | parameter | value | unit |
|---|------|---------|----------|----------|-----------|-------|------|
| 0 | Paris | FR | 2019-06-21 00:00:00+00:00 | FR04014 | no2 | 20.0 | µg/m³ |
| 1 | Paris | FR | 2019-06-20 23:00:00+00:00 | FR04014 | no2 | 21.8 | µg/m³ |
| 2 | Paris | FR | 2019-06-20 22:00:00+00:00 | FR04014 | no2 | 26.5 | µg/m³ |
| 3 | Paris | FR | 2019-06-20 21:00:00+00:00 | FR04014 | no2 | 24.9 | µg/m³ |
| 4 | Paris | FR | 2019-06-20 20:00:00+00:00 | FR04014 | no2 | 21.4 | µg/m³ |


## 3.1. Using apply with your own functions

### apply is useful for processing dataframes with your own functions

In [30]:
air_quality_no2['valuesquared']=air_quality_no2['value'].apply(my_sqare_function)

In [31]:
air_quality_no2.head()

|   | city | country | date.utc | location | parameter | value | unit | valuesquared |
|---|------|---------|----------|----------|-----------|-------|------|--------------|
| 0 | Paris | FR | 2019-06-21 00:00:00+00:00 | FR04014 | no2 | 20.0 | µg/m³ | 400.0 |
| 1 | Paris | FR | 2019-06-20 23:00:00+00:00 | FR04014 | no2 | 21.8 | µg/m³ | 475.24 |
| 2 | Paris | FR | 2019-06-20 22:00:00+00:00 | FR04014 | no2 | 26.5 | µg/m³ | 702.25 |
| 3 | Paris | FR | 2019-06-20 21:00:00+00:00 | FR04014 | no2 | 24.9 | µg/m³ | 620.01 |
| 4 | Paris | FR | 2019-06-20 20:00:00+00:00 | FR04014 | no2 | 21.4 | µg/m³ | 457.96 |
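apply can also forward extra arguments to your function, which helps when the function takes more than one parameter. A minimal sketch, assuming a hypothetical helper `my_power_function` and a small stand-in Series instead of the air quality data:

```python
import pandas as pd

# Hypothetical helper with a second parameter
def my_power_function(x, exponent):
    return x ** exponent

# Stand-in data, used only for illustration
values = pd.Series([2.0, 3.0, 4.0])

# Keyword arguments given to apply are passed on to the function
cubed = values.apply(my_power_function, exponent=3)

# Extra positional arguments go in the args tuple
squared = values.apply(my_power_function, args=(2,))

print(cubed.tolist())    # [8.0, 27.0, 64.0]
print(squared.tolist())  # [4.0, 9.0, 16.0]
```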


In [32]:
# For performance and code maintainability, it is advisable to use native pandas methods whenever possible
air_quality_no2['value'].pow(2).head()

0    400.00
1    475.24
2    702.25
3    620.01
4    457.96
Name: value, dtype: float64
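The sketch below checks that apply and the native alternatives give the same result on a tiny stand-in Series (the values are made up and chosen so that their squares are exact in floating point). On real data you could compare the speed of each version with `%timeit` in a notebook cell.

```python
import pandas as pd

# Stand-in data, used only for illustration
values = pd.Series([20.0, 21.0, 26.5])

via_apply = values.apply(lambda v: v ** 2)  # one Python function call per element
via_pow = values.pow(2)                     # native pandas method
via_operator = values ** 2                  # same operation, operator syntax

print(via_pow.tolist())                  # [400.0, 441.0, 702.25]
print(via_apply.equals(via_pow))         # True
print(via_pow.equals(via_operator))      # True
```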

In [33]:
air_quality_no2['date']=air_quality_no2['date.utc'].apply(my_date_extractor)

In [34]:
air_quality_no2.head()

|   | city | country | date.utc | location | parameter | value | unit | valuesquared | date |
|---|------|---------|----------|----------|-----------|-------|------|--------------|------|
| 0 | Paris | FR | 2019-06-21 00:00:00+00:00 | FR04014 | no2 | 20.0 | µg/m³ | 400.0 | 2019-06-21 |
| 1 | Paris | FR | 2019-06-20 23:00:00+00:00 | FR04014 | no2 | 21.8 | µg/m³ | 475.24 | 2019-06-20 |
| 2 | Paris | FR | 2019-06-20 22:00:00+00:00 | FR04014 | no2 | 26.5 | µg/m³ | 702.25 | 2019-06-20 |
| 3 | Paris | FR | 2019-06-20 21:00:00+00:00 | FR04014 | no2 | 24.9 | µg/m³ | 620.01 | 2019-06-20 |
| 4 | Paris | FR | 2019-06-20 20:00:00+00:00 | FR04014 | no2 | 21.4 | µg/m³ | 457.96 | 2019-06-20 |
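Slicing the string works because every timestamp has the same layout. As a hedged alternative, the sketch below shows two native routes on a small stand-in Series (not the full dataset): the `.str` accessor for slicing without apply, and `pd.to_datetime` for parsing the strings into real datetimes.

```python
import pandas as pd

# Stand-in data in the same format as the date.utc column
timestamps = pd.Series(['2019-06-21 00:00:00+00:00',
                        '2019-06-20 23:00:00+00:00'])

# String slicing without apply, via the .str accessor
date_strings = timestamps.str[:10]

# Parse to real datetimes, then format them or pull out components
parsed = pd.to_datetime(timestamps)
formatted = parsed.dt.strftime('%Y-%m-%d')
years = parsed.dt.year

print(date_strings.tolist())  # ['2019-06-21', '2019-06-20']
print(formatted.tolist())     # ['2019-06-21', '2019-06-20']
print(years.tolist())         # [2019, 2019]
```

Parsing is the more robust choice when you later need to filter, sort or group by date, since the result is a datetime column rather than text.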


## 3.2. Using apply with built-in functions

## apply also accepts existing functions, such as Python built-ins or NumPy functions like np.sqrt

In [35]:
air_quality_no2['value'].apply(np.sqrt).head()

0    4.472136
1    4.669047
2    5.147815
3    4.989990
4    4.626013
Name: value, dtype: float64

## 3.3. Using apply with lambda functions

In [36]:
# Let's extract the year, month and day (the first 10 characters) from the date.utc column
air_quality_no2['date.utc'].apply(lambda date:date[0:10]).head()

0    2019-06-21
1    2019-06-20
2    2019-06-20
3    2019-06-20
4    2019-06-20
Name: date.utc, dtype: object

In [37]:
air_quality_no2['value'].apply(lambda value:value**2).head()

0    400.00
1    475.24
2    702.25
3    620.01
4    457.96
Name: value, dtype: float64

## 3.4. Applying functions to rows and columns

In [41]:
df = pd.DataFrame([[4, 9,8]] * 3, columns=['A', 'B','C'])

In [42]:
df

|   | A | B | C |
|---|---|---|---|
| 0 | 4 | 9 | 8 |
| 1 | 4 | 9 | 8 |
| 2 | 4 | 9 | 8 |


In [43]:
# Using a NumPy universal function (in this case, the same as np.sqrt(df)):

df.apply(np.sqrt)

|   | A | B | C |
|---|---|---|---|
| 0 | 2.0 | 3.0 | 2.828427 |
| 1 | 2.0 | 3.0 | 2.828427 |
| 2 | 2.0 | 3.0 | 2.828427 |


In [44]:
# Using a reducing function with axis=0: np.sum is applied to each column (summing down the rows)
df.apply(np.sum, axis=0)

A    12
B    27
C    24
dtype: int64

In [45]:
# Using a reducing function with axis=1: np.sum is applied to each row (summing across the columns)
df.apply(np.sum, axis=1)

0    21
1    21
2    21
dtype: int64

In [46]:
# Using a lambda with axis=1 to combine selected columns (A and B) within each row
df.apply(lambda row:row['A']+row['B'], axis=1)

0    13
1    13
2    13
dtype: int64
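With axis=1 the function receives one row at a time as a Series, so columns can be looked up by name. The sketch below rebuilds the same small dataframe, uses a named function instead of a lambda (`a_plus_b` is a hypothetical helper), and compares the result with plain column arithmetic.

```python
import pandas as pd

df = pd.DataFrame([[4, 9, 8]] * 3, columns=['A', 'B', 'C'])

# Hypothetical helper: receives one row (a Series) at a time when axis=1
def a_plus_b(row):
    return row['A'] + row['B']

row_wise = df.apply(a_plus_b, axis=1)

# Column arithmetic gives the same result without a Python call per row
vectorized = df['A'] + df['B']

print(row_wise.tolist())            # [13, 13, 13]
print(row_wise.equals(vectorized))  # True
```

For simple arithmetic like this, the column version is usually the better choice; row-wise apply is more useful when the logic per row cannot be expressed with native operations.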