**`lambda`** functions are handy when you need a one-off function that you are not likely to need again, such as a particular conversion for one type of data. 
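
As a minimal sketch of the idea, the two definitions below do the same job for a single value. The names `to_celsius_named` and `to_celsius_lambda` are made up for illustration only.

```python
# A regular, named function that converts one Fahrenheit value to Celsius
def to_celsius_named(temp_f):
    return round((temp_f - 32) * 5 / 9)

# The same conversion written as a lambda expression.
# It is bound to a name here only so the two can be compared;
# a lambda is usually passed straight to another function instead.
to_celsius_lambda = lambda temp_f: round((temp_f - 32) * 5 / 9)

print(to_celsius_named(212))   # 100
print(to_celsius_lambda(212))  # 100
```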

Below is a list of July temperatures represented in Fahrenheit. 

In [None]:
july_temps = [92, 84, 87, 90, 80, 80, 76, 85]

We can create a Python function using the typical **`def`** syntax to convert each temperature in this list to Celsius...

In [58]:
# converting fahrenheit temperatures to celsius using a regular, named function
def f_to_c(temps_list):
    """takes in a list of temperatures represented in degrees Fahrenheit and converts each to Celsius"""
    # use a for loop to iterate over the list of temperatures
    for temp in temps_list:
        # run the conversion formula on each temp and round the result to the nearest whole number
        celsius_temp = round((temp - 32) * 5/9)
        # print each converted temperature
        print(celsius_temp)

f_to_c(july_temps)

33
29
31
32
27
27
24
29


...or we can convert and print these temperatures in a single line of code using **`lambda`** and the built-in function **`map`**. Here, **`map`** takes two arguments: 

1) a function (the change you want to make to individual items in an iterable object)  
2) the iterable object 

We need to set a lambda function (the Fahrenheit-to-Celsius conversion formula) as the first argument, and the original list as the second argument. 

In [57]:
# converting the same list of temperatures to celsius on one line of code using lambda and map

print(list(map(lambda x: round((x - 32) * 5/9), july_temps)))

[33, 29, 31, 32, 27, 27, 24, 29]
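
To see what the one-liner is doing, it may help to break it into steps. This is only a sketch; the names `convert`, `mapped`, and `celsius_temps` are assumptions added for illustration and do not appear elsewhere in the notebook.

```python
july_temps = [92, 84, 87, 90, 80, 80, 76, 85]

# Step 1: the lambda is the function argument (the change to make to each item)
convert = lambda x: round((x - 32) * 5 / 9)

# Step 2: map pairs the function with the iterable.
# It returns a lazy map object, not a list, so nothing is printed yet.
mapped = map(convert, july_temps)

# Step 3: list() runs the conversion on every item and collects the results
celsius_temps = list(mapped)
print(celsius_temps)  # [33, 29, 31, 32, 27, 27, 24, 29]
```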


**`lambda`** can also be used with the built-in function **`filter`** to iterate over lists. **`filter`** takes the same two arguments as **`map`**, but the function must return `True` or `False`, and only the items for which it returns `True` are kept. 

Here, the lambda function filters out all items in the list where the temperature was 85 degrees Fahrenheit or lower. 

In [60]:
# use lambda to test whether each temperature in the list is greater than 85 degrees Fahrenheit and use filter to 
# remove all items that do not meet this criterion

print(list(filter(lambda x: x > 85, july_temps)))

[92, 87, 90]
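
The sketch below shows the True/False test that **`filter`** relies on. The name `is_above_85` is a hypothetical label for the lambda, used only so its results can be printed. Note that 85 itself is not greater than 85, so it is dropped.

```python
july_temps = [92, 84, 87, 90, 80, 80, 76, 85]

# Hypothetical name for the test that filter applies to each item
is_above_85 = lambda x: x > 85

# The lambda returns True or False for each temperature
print([is_above_85(t) for t in july_temps])
# [True, False, True, True, False, False, False, False]

# filter keeps only the items for which the lambda returned True
hot_days = list(filter(is_above_85, july_temps))
print(hot_days)  # [92, 87, 90]
```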


In [None]:
# EXERCISE OPTION 1: For quick practice with lambda, take the below list of height measurements represented in inches 
# and write a lambda expression that converts the measurements to centimeters (1 inch = 2.54 cm).
# Then, pair lambda with the filter function to filter out some of the height measurements based on a given condition.

inches = [65, 73, 53, 48, 67, 70]

**`lambda`** can also supply the sort key when sorting string data. 

Below, **`lambda`** is used to sort a list of first and last names by the second name that appears in each string (the last name) rather than by the whole string, which begins with the first name and is what **`sort()`** compares by default. 

In [40]:
hobbits = ["Samwise Gamgee", "Bilbo Baggins", "Pippin Took", "Frodo Baggins", "Merry Brandybuck"]
hobbits.sort(key = lambda name: name.split()[-1])
print(hobbits)

['Bilbo Baggins', 'Frodo Baggins', 'Merry Brandybuck', 'Samwise Gamgee', 'Pippin Took']
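
To make the role of the key clearer, the sketch below prints what the lambda returns for each name before sorting. The names `last_name` and `by_last_name` are assumptions for illustration. It uses the built-in **`sorted()`**, which returns a new list and leaves the original untouched, instead of the in-place **`sort()`** used above.

```python
hobbits = ["Samwise Gamgee", "Bilbo Baggins", "Pippin Took", "Frodo Baggins", "Merry Brandybuck"]

# Hypothetical name for the key function: split on whitespace and take the last word
last_name = lambda name: name.split()[-1]

# These are the values the sort actually compares
print([last_name(h) for h in hobbits])
# ['Gamgee', 'Baggins', 'Took', 'Baggins', 'Brandybuck']

# sorted() returns a new list; names with the same key keep their original order
by_last_name = sorted(hobbits, key=last_name)
print(by_last_name)
# ['Bilbo Baggins', 'Frodo Baggins', 'Merry Brandybuck', 'Samwise Gamgee', 'Pippin Took']
```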


In [50]:
# EXERCISE OPTION 2: Use the built-in max and min functions paired with a lambda function to find the names that 
# come last and first alphabetically in the hobbits list above. 

# NOTE: max and min compare strings alphabetically, letter by letter, NOT by the numeric length of each string.
# You will need to set the lambda function as the key for the max/min function.
# You will also need to set all letters to lowercase so that uppercase and lowercase letters are compared consistently. 
# See page 186 of the Deitel textbook for help or my separate .ipynb file in this git repo for the answer key. 

In a pandas dataframe, a **`lambda`** function can be passed to **`df.apply()`** to apply it to each row (or column) of the dataframe, which lets us transform every value in a given column.

Below is the practice dataframe we worked through in class last week. 

In [15]:
import numpy as np
import pandas as pd

# use a dictionary to represent "columns"/ series of data as the keys
# and row data contained in a list-like structure as the values

recalls = {
            'tot_recalls':[34,67,89,120,56],
            'severe_recalls':[13,40,67,None,40],
            #'model':['focus', 'ranger', 'f-150', None, None]
          }
year_index = [1999, 2000, 2001, 2002, 2003]
# send our data into the pandas dataframe constructor and get a simple table
recall_df = pd.DataFrame(data=recalls, index=year_index)

print(recall_df.head())

      tot_recalls  severe_recalls
1999           34            13.0
2000           67            40.0
2001           89            67.0
2002          120             NaN
2003           56            40.0


If for some reason we wanted to apply the same formula to each value in one or more columns of that dataframe, we could write a lambda function paired with **`.apply()`** to do so. Say you received new information that the old recall data was incorrect and all of the total recall values should actually be doubled (not likely, but this will help to clearly demo how lambda can work on a dataframe)...

In [19]:
recall_df['new_recalls'] = recall_df.apply(lambda x: x['tot_recalls'] * 2, axis=1)

print(recall_df.head())

      tot_recalls  severe_recalls  new_recalls
1999           34            13.0         68.0
2000           67            40.0        134.0
2001           89            67.0        178.0
2002          120             NaN        240.0
2003           56            40.0        112.0


The above line of code adds a new column to the dataframe, where each value in the new column equals the total recalls recorded in that row multiplied by 2. Setting `axis=1` tells **`.apply()`** to pass each row to the lambda function. 
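
When the formula only needs one column, a lambda can also be applied to that column on its own. The sketch below rebuilds a one-column version of the recall data (an assumption, to keep the example self-contained) and calls **`.apply()`** on the column, so the lambda receives one value at a time instead of a whole row.

```python
import pandas as pd

# Rebuild a small, one-column version of the recall data for this sketch
recall_df = pd.DataFrame(
    {'tot_recalls': [34, 67, 89, 120, 56]},
    index=[1999, 2000, 2001, 2002, 2003],
)

# Calling .apply() on a single column passes each value to the lambda
recall_df['new_recalls'] = recall_df['tot_recalls'].apply(lambda x: x * 2)

print(recall_df['new_recalls'].tolist())  # [68, 134, 178, 240, 112]
```

Because only the integer column is passed to the lambda, the doubled values stay integers here.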

Below is a more useful example of applying a lambda function to a dataframe. 

In [2]:
import pandas as pd
from matplotlib import pyplot as plt


def get_df():
    """reads in a csv file from data.wprdc.org, cleans it up, and stores it to a 
    pandas dataframe"""
    # read in the csv via URL to get the most current version for analysis 
    # (source: p. 346 of Deitel textbook)
    election_data = pd.read_csv(
        "https://data.wprdc.org/datastore/dump/988b8b2a-4fce-45bc-aba6-438ca78e92f1", 
        index_col=False)
    # create dataframe with just the six columns needed for this analysis
    election_df = election_data[['contest_name', 'choice_name', 'party_name',
                                 'total_votes', 'percent_of_votes', 'registered_voters']]
    
    return election_df

In [3]:
df = get_df()
print(df.head())

                         contest_name                           choice_name  \
0  Presidential Electors (Vote For 1)  DEM Joseph R. Biden/Kamala D. Harris   
1  Presidential Electors (Vote For 1)     REP Donald J. Trump/Mike R. Pence   
2  Presidential Electors (Vote For 1)   LIB Jo Jorgensen/Jeremy Spike Cohen   
3  Presidential Electors (Vote For 1)                              Write-in   
4       Attorney General (Vote For 1)                      DEM Josh Shapiro   

  party_name  total_votes  percent_of_votes  registered_voters  
0        DEM       430759             59.43             942851  
1        REP       282913             39.03             942851  
2        LIB         8361              1.15             942851  
3        NON         2767              0.38             942851  
4        DEM       443166             62.18             942851  


The dataframe above (imported from a dataset available via the Western PA Regional Data Center) includes the total votes for each candidate in the 2020 general election, the percentage of votes each received, and the total registered voters for each geographic voting area. 

In [61]:
# EXERCISE OPTION 3: Using the .apply() function on a pandas dataframe, add a new column to the dataframe that 
# calculates the total votes for each 2020 presidential candidate as a percentage of all registered voters in 
# Allegheny County (total_votes / 942851)

# Rerun print(df.head()) to make sure it worked! 