# Machine Learning Practice Week 1
Practice Assignment

# 📝 Flight Price Prediction

## 📌 Notes:
- ✅ **Expected dataset shape**: `(9450, 13)`  
  > _If you're not getting this shape, your data may not have uploaded correctly to Colab._
- 🎯 **Target variable**: `'Price'`
- 🧪 **Use `random_state = 42`** wherever applicable for reproducibility.
- ⚠️ **Ignore all warnings** while executing the code.
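
As a rough illustration of why a fixed seed matters, the sketch below samples a small made-up frame twice with `random_state=42` and checks that both draws match. The frame `toy` and its values are invented for this example and are not part of the assignment dataset.

```python
import pandas as pd

# Toy frame, invented for illustration only.
toy = pd.DataFrame({"Price": [5678, 6724, 7408, 10577, 14924]})

# The same seed yields the same sample on every run.
first = toy.sample(n=3, random_state=42)
second = toy.sample(n=3, random_state=42)

print(first.equals(second))
```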

---

## 📂 Dataset Metadata:

| Column Name       | Description                                                    |
|-------------------|----------------------------------------------------------------|
| **Airline**       | The name of the airline                                        |
| **Source**        | The source from which the service begins                       |
| **Destination**   | The destination where the service ends                         |
| **Route**         | Route the flight took                                          |
| **Dep_Time**      | Time when the journey starts from the source                   |
| **Arrival_Time**  | Time of arrival at the destination                             |
| **Duration**      | Total duration of the flight                                   |
| **Total_Stops**   | Total stops between source and destination                     |
| **Additional_Info** | Additional information about the flight                     |
| **Price**         | 💰 **Target variable** – the price of the ticket               |
| **Month**         | Month of the journey                                           |
| **WeekDay**       | Day on which the journey started                               |
| **Day**           | Date of the start of the journey                               |

---

🔗 **[Click here to view the dataset](https://drive.google.com/file/u/2/d/1zP38WCdZQ9StsAHTTLk72763QJQkhWrG/view?usp=drive_link)**  


In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("/content/Preprocessing1.csv")

In [3]:
df.shape

(9450, 13)

In [None]:
df.head()

,Airline,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price,Month,WeekDay,Day
0,Jet Airways,Delhi,Banglore,DEL → BOM → COK,20:00,04:25 10 Jun,26h 35m,1 stop,In-flight meal not included,14924,6,Thursday,6.0
1,Jet Airways,Delhi,Cochin,DEL → BOM → COK,16:00,19:00 10 Jun,27h,1 stop,In-flight meal not included,10577,6,Sunday,9.0
2,Jet Airways,Mumbai,Hyderabad,BOM → HYD,19:35,21:05,1h 30m,non-stop,No info,5678,3,Friday,15.0
3,Multiple carriers,Delhi,Banglore,DEL → BOM → COK,18:55,01:30 16 Jun,15h 10m,1 stop,In-flight meal not included,7408,5,Monday,6.0
4,Air India,Delhi,Cochin,DEL → COK,17:10,17:55,8h 20m,non-stop,No info,6724,6,Monday,24.0


What is the average flight ticket price? Give your answer correct to two decimal places.

In [None]:
df["Price"].mean()

np.float64(9027.895555555555)
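
The question asks for two decimal places, so the raw mean still needs rounding. A minimal sketch of that step, with the mean from the output above copied in as a literal (the variable names are illustrative):

```python
# Mean reported by df["Price"].mean() in the output above, copied as a literal.
mean_price = 9027.895555555555

# Round to two decimal places for the final answer.
rounded_price = round(mean_price, 2)

# Format with exactly two decimals so a trailing zero is not dropped.
print(f"{mean_price:.2f}")
```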

During which month did the highest number of flights occur? Months are represented by numerical codes, with January corresponding to 1, February to 2, and so forth.

In [None]:
df["Month"].value_counts()

Month,count
5,3092
6,3044
3,2388
4,926


Is the average price of flight tickets higher on weekends (Saturday and Sunday) or on weekdays (the remaining five days)?

In [None]:
weekend_price = df[df["WeekDay"].isin(["Saturday","Sunday"])]["Price"].mean()
weekday_price = df[df["WeekDay"].isin(["Monday","Tuesday","Wednesday","Thursday","Friday"])]["Price"].mean()
if weekday_price > weekend_price:
  print("Weekdays")
else:
  print("Weekends")

Weekends
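
An alternative is to label each row as weekend or weekday and let `groupby` compute both averages at once. This is only a sketch on made-up rows: the frame `toy` and its prices are invented, so the numbers say nothing about the real dataset.

```python
import pandas as pd

# Toy data, invented for illustration; not taken from the assignment dataset.
toy = pd.DataFrame({
    "WeekDay": ["Saturday", "Sunday", "Monday", "Wednesday", "Friday"],
    "Price": [9000, 11000, 8000, 7000, 9000],
})

# Label each row as "Weekends" or "Weekdays".
group = toy["WeekDay"].isin(["Saturday", "Sunday"]).map(
    {True: "Weekends", False: "Weekdays"}
)

# Average price per label, then pick the label with the higher average.
avg_by_group = toy.groupby(group)["Price"].mean()
higher = avg_by_group.idxmax()
print(higher)
```

If the two averages tie, `idxmax()` returns the first label in the grouped index, which may differ from the `if`/`else` above.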



Two of the entries in the 'Additional_Info' column are 'No info' and 'No Info'. Replace all occurrences of 'No Info' with 'No info'. How many flights fall under airline 'IndiGo' and have 'No info' as additional information?


In [None]:
df["Additional_Info"] = df["Additional_Info"].replace("No Info", "No info")
df[(df["Airline"]=="IndiGo") & (df["Additional_Info"]=="No info")].shape[0]

1650


Convert the values of 'Duration' into seconds. Enter the average duration (in seconds) of a flight, correct to two decimal places.

In [None]:
def convert_to_seconds(duration):
    hours = 0
    minutes = 0
    parts = duration.split()
    for part in parts:
        if 'h' in part:
            hours = int(part[:-1])
        elif 'm' in part:
            minutes = int(part[:-1])
    return hours * 3600 + minutes * 60

df['Duration_seconds'] = df['Duration'].apply(convert_to_seconds)
average_duration = df['Duration_seconds'].mean()
average_duration

np.float64(38957.93650793651)
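
The same conversion can also be done without a Python-level loop, using a regular expression over the whole column. This is a sketch on a few sample strings. The minutes-only value `"45m"` is a hypothetical case added to show how a missing hour part is handled.

```python
import pandas as pd

# Sample strings in the same "Xh Ym" style as the Duration column.
durations = pd.Series(["26h 35m", "27h", "1h 30m", "45m"])

# Capture the optional hour and minute numbers; a missing part becomes NaN.
parts = durations.str.extract(r"(?:(\d+)h)?\s*(?:(\d+)m)?")
hours = pd.to_numeric(parts[0]).fillna(0)
minutes = pd.to_numeric(parts[1]).fillna(0)

# Combine into whole seconds.
seconds = (hours * 3600 + minutes * 60).astype(int)
print(seconds.tolist())
```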

## 🕐 Time Column Transformation Instructions

### 🔧 Task:
Apply the following transformations to the columns **`Dep_Time`** and **`Arrival_Time`**:

1. **Extract Hour Component:**
   - Convert the time values to represent only the **hour** part.
   - Example:  
     - `'10:05'` → `10`  
     - `'22:45'` → `22`

2. **Create Time of Day Categories:**
   Based on the extracted hour, categorize the time into one of the following:

   | Time Range           | Category   |
   |----------------------|------------|
   | `5 <= hour < 12`     | Morning ☀️ |
   | `12 <= hour < 17`    | Afternoon 🌤️ |
   | `17 <= hour < 20`    | Evening 🌆 |
   | `hour >= 20` or `hour < 5` | Night 🌙 |

---

📌 **Note:**  
Make sure these transformations are done **directly within the dataset**, and use this **modified dataset** for all the upcoming tasks.


In [None]:
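def category(time_str):
    """Return the time-of-day label for a value such as '10:05' or '04:25 10 Jun'."""
    try:
        # The hour is the text before the first colon.
        hour = int(str(time_str).split(':')[0])
    except (ValueError, TypeError):
        # Missing or malformed values have no parsable hour.
        return "Unknown"

    if 5 <= hour < 12:
        return "Morning"
    elif 12 <= hour < 17:
        return "Afternoon"
    elif 17 <= hour < 20:
        return "Evening"
    else:
        # Covers hour >= 20 and hour < 5.
        return "Night"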
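
The note above asks for the transformation to be applied directly within the dataset. One way this might look is sketched below on a small made-up frame: each time column is overwritten first with its hour and then with its category. The helper `hour_to_category` and the frame `toy` are illustrative names, not part of the assignment.

```python
import pandas as pd

def hour_to_category(hour):
    """Map an hour (0-23) to a time-of-day label per the table above."""
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 20:
        return "Evening"
    return "Night"  # hour >= 20 or hour < 5

# Toy rows in the same format as Dep_Time / Arrival_Time.
toy = pd.DataFrame({
    "Dep_Time": ["20:00", "16:00", "09:35"],
    "Arrival_Time": ["04:25 10 Jun", "19:00 10 Jun", "11:05"],
})

for col in ["Dep_Time", "Arrival_Time"]:
    # Step 1: keep only the hour (text before the first colon).
    hours = toy[col].str.split(":").str[0].astype(int)
    # Step 2: replace the column with its time-of-day category.
    toy[col] = hours.map(hour_to_category)

print(toy)
```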


How many flights started in the Morning and arrived at the destination in the Evening?


In [None]:
df.columns = df.columns.str.strip()
count = df[
    (df["Dep_Time"].apply(category) == "Morning") &
    (df["Arrival_Time"].apply(category) == "Evening")
].shape[0]

count

922

Encode the values of column 'WeekDay' as follows:
- Weekends (Sunday, Saturday) = 1
- All remaining five days = 0

What is the most frequent (mode) WeekDay?

In [None]:
df['WeekDay_Encoded'] = df['WeekDay'].apply(lambda x: 1 if x in ['Saturday', 'Sunday'] else 0)


In [None]:
most_frequent_day = df['WeekDay'].mode()[0]

print("Most frequent WeekDay:", most_frequent_day)


Most frequent WeekDay: Wednesday
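
The cell above reports the mode of the original day names. If the question is instead read as asking for the mode of the encoded values, the calculation might look like the sketch below. The day list is invented for illustration, so its result is not the answer for the real dataset.

```python
import pandas as pd

# Toy day names, invented for illustration only.
toy_days = pd.Series(["Wednesday", "Saturday", "Wednesday", "Sunday", "Monday"])

# Encode weekends as 1 and all other days as 0.
encoded = toy_days.isin(["Saturday", "Sunday"]).astype(int)

# mode() returns a Series of the most frequent values; take the first.
encoded_mode = encoded.mode()[0]
print(encoded_mode)
```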
