# Lesson 3C: Data Manipulation

## 🎯 Learning Objectives
1. Create new calculated columns for data analysis.
2. Encode categorical data as numbers using label encoding.
3. Convert dates to datetime format and extract date components (year, month, day, day of the week) into new columns.


## ▶️ Getting Started
We will start by loading pandas and the cleaned restaurant transactions dataset, `Cleaned_Restaurant_Transactions_Dataset.csv`.

In [3]:
```python
import pandas as pd

# Load the dataset
file_path = "Cleaned_Restaurant_Transactions_Dataset.csv"
df = pd.read_csv(file_path)

# Display basic info
df.info()

# Display the first few rows
df.head()
```


```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 989 entries, 0 to 988
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Customer_ID     989 non-null    int64  
 1   Food_Item       989 non-null    object 
 2   Category        989 non-null    object 
 3   Date_of_Visit   989 non-null    object 
 4   Time            989 non-null    object 
 5   Weather         989 non-null    object 
 6   Price           989 non-null    float64
 7   Weekend         989 non-null    object 
 8   Public_Holiday  989 non-null    object 
dtypes: float64(1), int64(1), object(7)
memory usage: 69.7+ KB
```


| | Customer_ID | Food_Item | Category | Date_of_Visit | Time | Weather | Price | Weekend | Public_Holiday |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1075 | Smoothie | Cold | 24/03/2023 | 12:30 | Sunny | 14.88 | No | No |
| 1 | 1030 | Soup | Hot | 19/03/2023 | 14:30 | Raining | 6.7176 | Yes | No |
| 2 | 1055 | Ice Cream | Cold | 24/03/2023 | 10:30 | Sunny | 14.27 | No | No |
| 3 | 1058 | Ice Cream | Cold | 05/03/2023 | 22:00 | Sunny | 14.688 | Yes | No |
| 4 | 1084 | Smoothie | Cold | 29/03/2023 | 16:30 | Sunny | 8.844 | No | Yes |


## 1️⃣ Creating a new calculated column
When preparing data, you often need to create new calculated columns. pandas makes this easy: assigning a value to `df['COLUMN_NAME']` creates the column if it does not exist yet.

In the example below, we create a new column called `Discounted_Price` by applying a 10% discount to every price and rounding the result to two decimal places.

In [8]:
```python
# Apply a 10% discount and store the result in a new column 'Discounted_Price'
df['Discounted_Price'] = (df['Price'] * 0.90).round(2)

# Display the updated DataFrame
df.head()
```


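| | Customer_ID | Food_Item | Category | Date_of_Visit | Time | Weather | Price | Weekend | Public_Holiday | Year | Month | Day | Day_of_Week | Discounted_Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1075 | Smoothie | Cold | 2023-03-24 | 12:30 | Sunny | 14.88 | No | No | 2023 | 3 | 24 | Friday | 13.39 |
| 1 | 1030 | Soup | Hot | 2023-03-19 | 14:30 | Raining | 6.7176 | Yes | No | 2023 | 3 | 19 | Sunday | 6.05 |
| 2 | 1055 | Ice Cream | Cold | 2023-03-24 | 10:30 | Sunny | 14.27 | No | No | 2023 | 3 | 24 | Friday | 12.84 |
| 3 | 1058 | Ice Cream | Cold | 2023-03-05 | 22:00 | Sunny | 14.688 | Yes | No | 2023 | 3 | 5 | Sunday | 13.22 |
| 4 | 1084 | Smoothie | Cold | 2023-03-29 | 16:30 | Sunny | 8.844 | No | Yes | 2023 | 3 | 29 | Wednesday | 7.96 |

Note: the notebook cells were run out of order (the date cell, In [6], ran before this one), so this output already shows the converted `Date_of_Visit` and the `Year`, `Month`, `Day` and `Day_of_Week` columns from section 3. If you run the cells top to bottom, those columns will not appear yet.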
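A calculated column does not have to apply the same rule to every row. As a minimal sketch, assuming a hypothetical promotion that gives 10% off only on weekends, `numpy.where` can pick between two vectorized results row by row. The `Weekend_Price` column and the weekend-only rule are illustrative assumptions, not part of the lesson's dataset; the sample rows are copied from the first table above.

```python
import numpy as np
import pandas as pd

# A few rows copied from the lesson's dataset (only the columns needed here)
sample = pd.DataFrame({
    "Food_Item": ["Smoothie", "Soup", "Ice Cream"],
    "Price": [14.88, 6.7176, 14.688],
    "Weekend": ["No", "Yes", "Yes"],
})

# Hypothetical rule: 10% off only when Weekend == "Yes", full price otherwise.
# np.where checks the condition for every row and takes the value from the
# first array where it is True and from the second array where it is False.
sample["Weekend_Price"] = np.where(
    sample["Weekend"] == "Yes",
    sample["Price"] * 0.90,
    sample["Price"],
).round(2)

print(sample)
```

Both branches are computed for the whole column at once, so this stays vectorized; for more than two cases, `numpy.select` follows the same pattern.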


## 2️⃣ Encoding Categorical Data into Numerical Data  

Many machine learning algorithms work only with numbers, so text categories such as 'Cold' and 'Hot' often need to be converted into numeric codes. Label encoding does this by assigning each category an integer.

In the example below, we use a custom mapping with `Series.map()` to convert 'Cold' into 0 and 'Hot' into 1.  


In [9]:
```python
# Define your custom mapping for categories
category_mapping = {
    'Cold': 0,
    'Hot': 1
}

# Apply the mapping
df['Category_Encoded'] = df['Category'].map(category_mapping)

# Display the result
df.head()
```


| | Customer_ID | Food_Item | Category | Date_of_Visit | Time | Weather | Price | Weekend | Public_Holiday | Year | Month | Day | Day_of_Week | Discounted_Price | Category_Encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1075 | Smoothie | Cold | 2023-03-24 | 12:30 | Sunny | 14.88 | No | No | 2023 | 3 | 24 | Friday | 13.39 | 0 |
| 1 | 1030 | Soup | Hot | 2023-03-19 | 14:30 | Raining | 6.7176 | Yes | No | 2023 | 3 | 19 | Sunday | 6.05 | 1 |
| 2 | 1055 | Ice Cream | Cold | 2023-03-24 | 10:30 | Sunny | 14.27 | No | No | 2023 | 3 | 24 | Friday | 12.84 | 0 |
| 3 | 1058 | Ice Cream | Cold | 2023-03-05 | 22:00 | Sunny | 14.688 | Yes | No | 2023 | 3 | 5 | Sunday | 13.22 | 0 |
| 4 | 1084 | Smoothie | Cold | 2023-03-29 | 16:30 | Sunny | 8.844 | No | Yes | 2023 | 3 | 29 | Wednesday | 7.96 | 0 |
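A custom dictionary gives you explicit control over which number each category gets, but `Series.map()` silently returns `NaN` for any value that is missing from the dictionary. The sketch below uses a small hypothetical Series that includes a 'Warm' value (not in the lesson's data) to show how to spot such gaps, and compares the result with the automatic codes pandas assigns through the `category` dtype.

```python
import pandas as pd

category_mapping = {"Cold": 0, "Hot": 1}

# Hypothetical data: "Warm" is NOT covered by the mapping
categories = pd.Series(["Cold", "Hot", "Warm", "Cold"])

# Series.map returns NaN for values missing from the dictionary,
# which also turns the result into a float column
encoded = categories.map(category_mapping)
print(encoded.tolist())

# Check for values the mapping missed before using the column
unmapped = categories[encoded.isna()].unique()
print("Unmapped categories:", list(unmapped))

# Alternative: let pandas assign integer codes automatically.
# Codes follow the sorted order of the categories: Cold=0, Hot=1, Warm=2
auto_codes = categories.astype("category").cat.codes
print(auto_codes.tolist())
```

For 'Cold' and 'Hot' the automatic codes happen to match the lesson's mapping because they sort alphabetically; a custom dictionary is the safer choice when the numbers must stay fixed.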


## 3️⃣ Extract Features from the Date Column  

Dates hold **valuable insights**! Instead of using raw dates, we will first convert `Date_of_Visit` from text to a datetime type and then **extract features** such as:  
✔ **Year** – Tracks long-term trends  
✔ **Month** – Identifies seasonal patterns  
✔ **Day** – The day of the month  
✔ **Day of the Week** – Captures weekday vs. weekend trends  


In [6]:
```python
import pandas as pd

# Convert 'Date_of_Visit' from DD/MM/YYYY text to datetime.
# An explicit format prevents dates such as 05/03/2023 from being read month-first.
df['Date_of_Visit'] = pd.to_datetime(df['Date_of_Visit'], format='%d/%m/%Y')

# Extract useful date-based features
df['Year'] = df['Date_of_Visit'].dt.year
df['Month'] = df['Date_of_Visit'].dt.month
df['Day'] = df['Date_of_Visit'].dt.day
df['Day_of_Week'] = df['Date_of_Visit'].dt.day_name()

# Display the transformed dataset
df[['Date_of_Visit', 'Year', 'Month', 'Day', 'Day_of_Week']].head()
```


| | Date_of_Visit | Year | Month | Day | Day_of_Week |
|---|---|---|---|---|---|
| 0 | 2023-03-24 | 2023 | 3 | 24 | Friday |
| 1 | 2023-03-19 | 2023 | 3 | 19 | Sunday |
| 2 | 2023-03-24 | 2023 | 3 | 24 | Friday |
| 3 | 2023-03-05 | 2023 | 3 | 5 | Sunday |
| 4 | 2023-03-29 | 2023 | 3 | 29 | Wednesday |
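The raw `Date_of_Visit` values are stored as day/month/year text (for example `05/03/2023` for 5 March 2023). As a minimal sketch, the code below parses a few of those strings with an explicit format and extracts the same kinds of features. The `Is_Weekend` column is a hypothetical extra feature built from `dt.dayofweek`; for these three dates it agrees with the `Weekend` values in the first table (No, Yes, Yes).

```python
import pandas as pd

# Raw strings in the same DD/MM/YYYY layout as the lesson's CSV
raw = pd.Series(["24/03/2023", "19/03/2023", "05/03/2023"])

# Explicit format: "05/03/2023" is read as 5 March, not 3 May
dates = pd.to_datetime(raw, format="%d/%m/%Y")

features = pd.DataFrame({
    "Date_of_Visit": dates,
    "Month": dates.dt.month,
    "Day": dates.dt.day,
    "Day_of_Week": dates.dt.day_name(),
    # dt.dayofweek numbers days Monday=0 ... Sunday=6, so >= 5 means Saturday or Sunday
    "Is_Weekend": dates.dt.dayofweek >= 5,
})
print(features)
```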


## 4️⃣ Final Check & Save Processed Data  

Let’s review our transformed dataset and save it for future use! 🚀  


In [10]:
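```python
# Save the processed dataset (index=False keeps the row index out of the file)
df.to_csv("restaurant_transaction_processed_01.csv", index=False)

# Display the first few rows of the transformed dataset
# (a notebook only displays the last expression in a cell, so head() goes last)
df.head()
```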
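One thing to keep in mind when reloading the saved file: CSV is plain text, so the datetime type of `Date_of_Visit` is not stored in it. A minimal sketch, using a small in-memory DataFrame named `demo` as a stand-in for the processed dataset and `io.StringIO` instead of a real file, shows the difference `parse_dates` makes when reading it back.

```python
import io

import pandas as pd

# Small stand-in for the processed dataset (named demo so it does not overwrite df)
demo = pd.DataFrame({
    "Customer_ID": [1075, 1030],
    "Date_of_Visit": pd.to_datetime(["2023-03-24", "2023-03-19"]),
    "Category_Encoded": [0, 1],
})

# to_csv with no path returns the CSV text instead of writing a file
csv_text = demo.to_csv(index=False)

# Reading back without parse_dates: the dates come back as plain text
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.dtypes)

# Reading back with parse_dates: the column is datetime again
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date_of_Visit"])
print(parsed.dtypes)
```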


## Summary  

In this guided practice, we:   
✔ **Created a new column** with the discounted price

✔ **Encoded categorical variables** with label encoding

✔ **Extracted meaningful features** from the date column  

✔ **Saved the processed dataset** to a new CSV file

