<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:200%;
           font-family:Arial;letter-spacing:0.5px">

<p width = 20%, style="padding: 10px;
              color:white;">
Lambda Functions and Pandas Transformations
              
</p>
</div>

Data Science Cohort Live NYC Feb 2022
<p>Phase 1: Topic 5</p>
<br>
<br>

<div align = "right">
<img src="Images/flatiron-school-logo.png" align = "right" width="200"/>
</div>
    
    

#### Lambda Functions

- Lambda functions: a simple way to write small, single-use functions.
- Often used as an argument to other functions:
    - E.g., the `.map()` or `.apply()` method of a pandas Series/DataFrame

Let's see lambda functions aiding us in a sort operation:

#### Lambda functions within the `sorted()` function
Sort this list by last name.


In [1]:
# Without a key
names = ['Miriam Marks','Sidney Baird','Elaine Barrera','Eddie Reeves','Marley Beard',
         'Jaiden Liu','Bethany Martin','Stephen Rios','Audrey Mayer','Kameron Davidson',
         'Carter Wong','Teagan Bennett']
sorted(names)

['Audrey Mayer',
 'Bethany Martin',
 'Carter Wong',
 'Eddie Reeves',
 'Elaine Barrera',
 'Jaiden Liu',
 'Kameron Davidson',
 'Marley Beard',
 'Miriam Marks',
 'Sidney Baird',
 'Stephen Rios',
 'Teagan Bennett']

Hmmm... it's sorting on the whole string, so effectively by first name.
- Lambda function as the `key` argument: return the last name as the sorting key

In [2]:
# Sorting by last name
names = ['Miriam Marks','Sidney Baird','Elaine Barrera','Eddie Reeves','Marley Beard',
         'Jaiden Liu','Bethany Martin','Stephen Rios','Audrey Mayer','Kameron Davidson',
         'Teagan Bennett']
sorted(names, key=lambda x: x.split()[1])


['Sidney Baird',
 'Elaine Barrera',
 'Marley Beard',
 'Teagan Bennett',
 'Kameron Davidson',
 'Jaiden Liu',
 'Miriam Marks',
 'Bethany Martin',
 'Audrey Mayer',
 'Eddie Reeves',
 'Stephen Rios']
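
The `key` function can also return a tuple, which gives a tie-breaker when two people share a last name. The following is a minimal sketch rather than part of the original lesson; the name `'Audrey Beard'` is a hypothetical entry added only to create a tie.

```python
# Small list with one deliberate tie on the last name.
# 'Audrey Beard' is hypothetical, added only for illustration.
people = ['Marley Beard', 'Sidney Baird', 'Audrey Beard', 'Jaiden Liu']

# Sort by last name first, then by first name to break ties.
by_last_then_first = sorted(
    people,
    key=lambda x: (x.split()[-1], x.split()[0]),
)
print(by_last_then_first)
```

Tuples are compared element by element, so the first name is only consulted when the last names are equal.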

A lambda expression can also be bound to a name and called like any other function:

In [3]:
f = lambda x: x**2

In [4]:
f(5)

25
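
For comparison, here is a short sketch of the same squaring function written with `def`, alongside the more typical inline use of a lambda. The name `square` is an assumption introduced for this illustration.

```python
# A lambda bound to a name, as in the cell above
f = lambda x: x**2

# The equivalent named function (the name 'square' is illustrative)
def square(x):
    return x**2

# Both give the same result for the same input
print(f(5), square(5))

# More commonly, a lambda is passed inline without ever being named
squares = list(map(lambda x: x**2, [1, 2, 3]))
print(squares)
```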

#### Lambda functions with pandas `.map()`
Let's take a look at using lambda expressions on a Yelp ratings dataset.

In [5]:
import pandas as pd
df = pd.read_csv('Data/Yelp_Reviews.csv', index_col=0).reset_index()
df.head(5)

FileNotFoundError: [Errno 2] No such file or directory: 'Data/Yelp_Reviews.csv'

Simple example: naively slice the year out of the date string rather than converting it to a datetime object.

In [6]:
df['date'].map(lambda x: x[:4]).head()

NameError: name 'df' is not defined
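
As a sketch of what this slicing does, the example below uses a small hypothetical Series in place of the `date` column. The values and the `YYYY-MM-DD` format are assumptions, not the actual Yelp data. It also shows the datetime-based alternative for comparison.

```python
import pandas as pd

# Hypothetical stand-in for the 'date' column (not the real Yelp data)
dates = pd.Series(['2012-05-02', '2014-11-23', '2011-02-25'])

# Naive approach: take the first four characters of each string
year_str = dates.map(lambda x: x[:4])

# Alternative: parse to datetime, then read the year as an integer
year_int = pd.to_datetime(dates).dt.year

print(year_str.tolist())
print(year_int.tolist())
```

The slice returns strings and silently depends on the year being the first four characters, while the datetime route returns integers and fails loudly on malformed dates.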

More realistic example:
- Get a list of the lengths of the words in each review.

In [7]:
df['text'][0]

NameError: name 'df' is not defined

In [8]:
df['text'].map(lambda text: [len(word) for word in text.split()]).head()

NameError: name 'df' is not defined

The variable name you use as the parameter in a `lambda` expression does not matter:

In [9]:
df['text'].map(lambda banana: [len(word) for word in banana.split()]).head()

NameError: name 'df' is not defined
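
To make the result concrete, here is a small sketch on two hypothetical review strings (assumed for illustration, not taken from the Yelp data). It runs the same word-length lambda with two different parameter names.

```python
import pandas as pd

# Hypothetical review texts standing in for df['text']
reviews = pd.Series(['Great food and service', 'Too slow'])

# Word lengths per review, with 'text' as the parameter name
lengths = reviews.map(lambda text: [len(word) for word in text.split()])

# Same logic with a different parameter name: identical result
same = reviews.map(lambda banana: [len(word) for word in banana.split()])

print(lengths.tolist())
print(lengths.tolist() == same.tolist())
```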

#### Lambda functions with conditionals
A lambda function can also contain a conditional expression (`a if condition else b`), here combined with `any()` over a list comprehension:

In [10]:
df['text'].map(lambda x: 'Good' if any([word in x.lower() for word in ['awesome', 'love', 'good', 'great']]) else 'Bad').head()

NameError: name 'df' is not defined

##### Note
This is ugly, un-Pythonic and not in line with [PEP 8](https://www.python.org/dev/peps/pep-0008/).
- Guideline for max characters in a line: 79 for code (72 for comments and docstrings)
- Above: 127 characters.
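
One way to tidy this up is to move the logic into a named function and pass that to `.map()`. The following is a sketch under assumptions: the names `POSITIVE_WORDS` and `label_review` and the two sample reviews are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Words treated as signs of a positive review (same list as above)
POSITIVE_WORDS = ['awesome', 'love', 'good', 'great']

def label_review(text):
    """Return 'Good' if the text contains any positive word, else 'Bad'."""
    lowered = text.lower()
    if any(word in lowered for word in POSITIVE_WORDS):
        return 'Good'
    return 'Bad'

# Hypothetical reviews standing in for df['text']
reviews = pd.Series(['I LOVE this place', 'Cold fries and slow service'])
labels = reviews.map(label_review)
print(labels.tolist())
```

Like the original one-liner, this checks for substrings, so a word such as "goodbye" would also count as a match.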

#### Lambda functions with pandas `.apply()`

Let's go back to our trusty cereal dataset!

In [11]:
cereal_df = pd.read_csv('Data/cereal.csv', index_col = 'name').drop(columns = ['shelf'])
cereal_df.head(2)

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 1.0 | 0.33 | 68.402973 |
| 100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 1.0 | 1.0 | 33.983679 |


Now we want to apply a standardization transformation to the numeric columns of this dataframe:
- For each column, subtract its mean and divide by its standard deviation: $$ \hat{x}_i^{col} = \frac{x_i^{col} - \mu^{col} }{s^{col}} $$

- `lambda` expression takes in a column (Series) of the DataFrame
- `.apply()`: applies it to each column in the DataFrame.


In [15]:
scaled_df = cereal_df.loc[:, 'calories':'rating'].apply(lambda col: (col - col.mean())/col.std(ddof = 1), axis = 0)
scaled_df.describe()

| | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 |
| mean | -6.091808e-17 | 4.830191e-17 | 1.090398e-16 | 3.316251e-17 | 1.348128e-16 | 2.162772e-18 | -4.397637e-17 | -9.227828e-17 | -1.730218e-17 | -2.011378e-16 | 1.492313e-16 | -8.651089e-18 |
| std | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| min | -2.919461 | -1.411645 | -1.006473 | -1.904699 | -0.9029037 | -3.645142 | -1.782291 | -1.361794 | -1.26426 | -3.519548 | -2.4538 | -1.752855 |
| 25% | -0.3532681 | -0.4982277 | -1.006473 | -0.3539844 | -0.4833286 | -0.6070178 | -0.88238 | -0.7866521 | -0.1453172 | -0.1967771 | -0.6490266 | -0.6756899 |
| 50% | 0.1599704 | 0.4151897 | -0.01290349 | 0.2424445 | -0.06375361 | -0.1396141 | 0.01753073 | -0.08526012 | -0.1453172 | -0.1967771 | -0.3052601 | -0.1612765 |
| 75% | 0.1599704 | 0.4151897 | 0.9806656 | 0.6003018 | 0.3558214 | 0.5614915 | 0.9174414 | 0.3355751 | -0.1453172 | -0.1967771 | 0.76901 | 0.5810863 |
| max | 2.726163 | 3.155442 | 3.961373 | 1.912445 | 4.971147 | 1.963703 | 1.817352 | 3.281421 | 3.211511 | 3.125994 | 2.91755 | 3.633385 |


This is a very important kind of transformation. We'll see it later in greater detail.
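
The effect of the transformation can be checked on a tiny example. The sketch below uses a hypothetical three-row DataFrame (the values are assumptions, not the cereal data) and confirms that each standardized column ends up with mean 0 and sample standard deviation 1.

```python
import pandas as pd

# Toy DataFrame with assumed values, standing in for the numeric columns
toy = pd.DataFrame({'calories': [70, 120, 110], 'sugars': [6, 8, 1]})

# Column-wise standardization with a lambda, as in the cell above
scaled = toy.apply(lambda col: (col - col.mean()) / col.std(ddof=1), axis=0)

# Equivalent vectorized form that needs no lambda
vectorized = (toy - toy.mean()) / toy.std(ddof=1)

print(scaled.mean().round(10).tolist())   # each column mean is ~0
print(scaled.std(ddof=1).round(10).tolist())  # each column std is ~1
```

pandas aligns `toy.mean()` and `toy.std()` with the columns automatically, so the vectorized form gives the same result; the `.apply()` version is shown because it generalizes to transformations that have no built-in equivalent.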

#### When to use lambda functions

- Single line of code
- Single use function
- Relatively easy to read.

#### When not to use lambda functions

- The logic needs several lines of code.
- The function contains multiple conditions, loops, etc.
- You want to reuse the function often.

If it's hard for you to read, it's even harder for anyone else.
