# Lambda Functions - Lab

## Introduction

In this lab, you'll get some hands-on practice creating and using lambda functions.

## Objectives

In this lab you will:

* Create lambda functions to use as arguments of other functions   
* Use the `.map()` or `.apply()` method to apply a function to a pandas Series or DataFrame
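As a quick illustration of both objectives, the sketch below passes a lambda to the built-in `sorted()`, to `Series.map()`, and to `DataFrame.apply()`. The toy data and the names `toy`, `by_last_letter`, `doubled`, and `row_sums` are made up for illustration and are not part of the lab dataset.

```python
import pandas as pd

# Hypothetical toy data, not the Yelp dataset
toy = pd.DataFrame({'a': [3, 1, 2], 'b': [10, 20, 30]})

# A lambda passed as the `key` argument of a built-in function
by_last_letter = sorted(['pear', 'fig', 'kiwi'], key=lambda s: s[-1])

# Series.map applies the lambda to each element of a single column
doubled = toy['a'].map(lambda x: x * 2)

# DataFrame.apply with axis=1 passes each row to the lambda
row_sums = toy.apply(lambda row: row['a'] + row['b'], axis=1)

print(by_last_letter)
print(doubled.tolist())
print(row_sums.tolist())
```

For a single column, `.map()` and `.apply()` behave the same way with a simple lambda; `axis=1` on a DataFrame is the usual choice when the function needs values from several columns of the same row.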

## Lambda Functions

In [1]:
import pandas as pd
df = pd.read_csv('C:/Users/svijayaraghavan/Downloads/Yelp_Reviews.csv', index_col=0)
df.head(2)

Unnamed: 0,business_id,cool,date,funny,review_id,stars,text,useful,user_id
1,pomGBqfbxcqPv14c3XH-ZQ,0,2012-11-13,0,dDl8zu1vWPdKGihJrwQbpw,5,I love this place! My fiance And I go here atl...,0,msQe1u7Z_XuqjGoqhB0J5g
2,jtQARsP6P-LbkyjbO1qNGg,1,2014-10-23,1,LZp4UX5zK3e-c5ZGSeo3kA,1,Terrible. Dry corn bread. Rib tips were all fa...,3,msQe1u7Z_XuqjGoqhB0J5g


## Simple arithmetic

Use a lambda function to create a new column called `'stars_squared'` by squaring the stars column.

In [2]:
# Your code here
# Step 1: Create a sample DataFrame
data = {'stars': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Step 2: Define and apply the lambda function to create 'stars_squared'
df['stars_squared'] = df['stars'].apply(lambda x: x ** 2)

# Display the DataFrame
print(df)

   stars  stars_squared
0      1              1
1      2              4
2      3              9
3      4             16
4      5             25
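For simple arithmetic like this, pandas can also operate on the whole column at once, with no lambda at all. The sketch below reuses the same sample values and checks that a lambda passed to `.map()` and the vectorized `**` operator give the same result; `stars_df`, `squared_map`, and `squared_vec` are illustrative names.

```python
import pandas as pd

# Same sample values as the cell above
stars_df = pd.DataFrame({'stars': [1, 2, 3, 4, 5]})

# Element-by-element with a lambda
squared_map = stars_df['stars'].map(lambda x: x ** 2)

# Vectorized: the operator is applied to the whole Series at once
squared_vec = stars_df['stars'] ** 2

# Compare the two results
print(squared_map.tolist() == squared_vec.tolist())
```

The lambda version is what the exercise asks for; the vectorized form is usually shorter and faster on large columns.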


## Dates
Select the month from the date string using a lambda function.

In [3]:
# Your code here
# Step 1: Create a sample DataFrame
data = {'date': ['2023-01-15', '2023-02-20', '2023-03-25', '2023-04-30', '2023-05-10']}
df = pd.DataFrame(data)

# Step 2: Define and apply the lambda function to extract the month
df['month'] = df['date'].apply(lambda x: x.split('-')[1])

# Display the DataFrame
print(df)

# Alternative without a lambda: convert the date column to datetime objects
df['date'] = pd.to_datetime(df['date'])

# Extract the month using the .dt accessor
df['month'] = df['date'].dt.month

# Display the DataFrame
print(df)

         date month
0  2023-01-15    01
1  2023-02-20    02
2  2023-03-25    03
3  2023-04-30    04
4  2023-05-10    05
        date  month
0 2023-01-15      1
1 2023-02-20      2
2 2023-03-25      3
3 2023-04-30      4
4 2023-05-10      5


## What is the average number of words for a Yelp review?
Do this with a single line of code.

In [9]:
# Your code here
# Sample DataFrame with a 'text' column, the column that holds the review text in the Yelp data
data = {'text': [
    "Great place to eat!",
    "The food was amazing and the service was excellent.",
    "Not worth the price.",
    "I had a wonderful time here, will definitely come back.",
    "Average experience, nothing special."
]}
df = pd.DataFrame(data)

# Check the column names
print(df.columns)

# Single line: split each review on whitespace, count the words, then average the counts
average_words = df['text'].apply(lambda x: len(x.split())).mean()

print(f"The average number of words per Yelp review is: {average_words}")

Index(['text'], dtype='object')
The average number of words per Yelp review is: 6.2
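The same one-liner can be written without a lambda by using the pandas `.str` accessor. The sketch below runs both versions on the sample reviews from the cell above; `reviews`, `via_lambda`, and `via_str` are illustrative names.

```python
import pandas as pd

# Same sample reviews as the cell above
reviews = pd.Series([
    "Great place to eat!",
    "The food was amazing and the service was excellent.",
    "Not worth the price.",
    "I had a wonderful time here, will definitely come back.",
    "Average experience, nothing special.",
])

# Lambda version: split each review on whitespace and count the pieces
via_lambda = reviews.map(lambda text: len(text.split())).mean()

# Accessor version: .str.split() yields lists, .str.len() counts their items
via_str = reviews.str.split().str.len().mean()

print(via_lambda, via_str)
```

Both approaches count whitespace-separated tokens, so punctuation attached to a word (such as `eat!`) is counted as part of that word.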


## Create a new column for the number of words in the review

In [10]:
# Your code here
# Step 1: Create a sample DataFrame
data = {'review': [
    "Great place to eat!",
    "The food was amazing and the service was excellent.",
    "Not worth the price.",
    "I had a wonderful time here, will definitely come back.",
    "Average experience, nothing special."
]}
df = pd.DataFrame(data)

# Step 2: Define and apply the lambda function to create 'word_count'
df['word_count'] = df['review'].apply(lambda x: len(x.split()))

# Display the DataFrame
print(df)

                                              review  word_count
0                                Great place to eat!           4
1  The food was amazing and the service was excel...           9
2                               Not worth the price.           4
3  I had a wonderful time here, will definitely c...          10
4               Average experience, nothing special.           4


## Rewrite the following as a lambda function

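Create a new column `'Review_length'` by applying this lambda function. The instructions refer to a `'Review_num_words'` column, but the sample DataFrame below has only a `'review'` column, so the lambda is applied to the review text and categorizes each review by its character count.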

In [11]:
# Sample DataFrame
data = {'review': [
    "Great place to eat!",
    "The food was amazing and the service was excellent.",
    "Not worth the price.",
    "I had a wonderful time here, will definitely come back.",
    "Average experience, nothing special."
]}
df = pd.DataFrame(data)

# Step 1: Define the lambda function
length_category = lambda value: 'Short' if len(value) < 50 else ('Medium' if len(value) < 80 else 'Long')

# Step 2: Apply the lambda function to create 'Review_length'
df['Review_length'] = df['review'].apply(length_category)

# Display the DataFrame
print(df)


                                              review Review_length
0                                Great place to eat!         Short
1  The food was amazing and the service was excel...        Medium
2                               Not worth the price.         Short
3  I had a wonderful time here, will definitely c...        Medium
4               Average experience, nothing special.         Short
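The function that this exercise asks you to rewrite is not shown in this document. As a hedged reconstruction, the sketch below defines a named function with the same thresholds as the lambda above and checks that both agree on a few boundary cases. The name `length_category_def` and the test strings are hypothetical.

```python
import pandas as pd

def length_category_def(value):
    """Hypothetical named equivalent of the lambda: bucket a string by character count."""
    if len(value) < 50:
        return 'Short'
    elif len(value) < 80:
        return 'Medium'
    else:
        return 'Long'

# The lambda from the cell above
length_category = lambda value: 'Short' if len(value) < 50 else ('Medium' if len(value) < 80 else 'Long')

# Strings of 10, 50, 79, and 80 characters to probe the thresholds
samples = pd.Series(['x' * 10, 'x' * 50, 'x' * 79, 'x' * 80])

from_def = samples.map(length_category_def).tolist()
from_lambda = samples.map(length_category).tolist()

print(from_def)
print(from_lambda)
```

Once a conditional needs more than one branch, the named function tends to be easier to read than a nested conditional expression inside a lambda.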


## Level Up: Dates Advanced
<img src="images/world_map.png" width="600">  

Print the first five rows of the `'date'` column. 

In [12]:
# Your code here
# Sample DataFrame with a 'date' column
data = {'date': [
    '2023-01-15', '2023-02-20', '2023-03-25', '2023-04-30', '2023-05-10',
    '2023-06-15', '2023-07-20', '2023-08-25', '2023-09-30', '2023-10-10'
]}
df = pd.DataFrame(data)

# Print the first five rows of the 'date' column
print(df['date'].head())

0    2023-01-15
1    2023-02-20
2    2023-03-25
3    2023-04-30
4    2023-05-10
Name: date, dtype: object


Overwrite the `'date'` column by reordering each date from `YYYY-MM-DD` to `DD-MM-YYYY`, so that the day comes first and the year last. Try to do this using a lambda function.

In [13]:
# Your code here
# Step 1: Create a sample DataFrame
data = {'date': [
    '2023-01-15', '2023-02-20', '2023-03-25', '2023-04-30', '2023-05-10',
    '2023-06-15', '2023-07-20', '2023-08-25', '2023-09-30', '2023-10-10'
]}
df = pd.DataFrame(data)

# Step 2: Define the lambda function to reorder the date format
# Split once on '-', reverse the parts (year, month, day -> day, month, year), and rejoin
reorder_date = lambda date: '-'.join(date.split('-')[::-1])

# Step 3: Apply the lambda function to overwrite the 'date' column
df['date'] = df['date'].apply(reorder_date)

# Print the DataFrame to see the changes
print(df)

         date
0  15-01-2023
1  20-02-2023
2  25-03-2023
3  30-04-2023
4  10-05-2023
5  15-06-2023
6  20-07-2023
7  25-08-2023
8  30-09-2023
9  10-10-2023
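If the strings are parsed as real dates, the same reordering can be done with `pd.to_datetime` and `.dt.strftime` instead of string splitting. This is a minimal sketch on two of the sample dates; `dates`, `via_lambda`, and `via_strftime` are illustrative names.

```python
import pandas as pd

# Two of the sample dates from the cell above
dates = pd.Series(['2023-01-15', '2023-10-10'])

# Lambda version: split on '-', reverse the parts, and rejoin
via_lambda = dates.map(lambda d: '-'.join(d.split('-')[::-1]))

# Datetime version: parse with an explicit format, then write out day-month-year
via_strftime = pd.to_datetime(dates, format='%Y-%m-%d').dt.strftime('%d-%m-%Y')

print(via_lambda.tolist())
print(via_strftime.tolist())
```

The datetime route raises an error on malformed input by default, whereas the string-splitting lambda reverses whatever parts it finds without checking them.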


## Summary

Hopefully, you're getting the hang of lambda functions now! It's important not to overuse them: it often makes more sense to define a named function so that it's reusable elsewhere. But whenever you need to quickly apply some simple processing to a collection of data, you now have a technique for doing just that. Knowing lambdas also helps when you're reading someone else's code that happens to use them.