# ✈️ Flight Price Prediction — EDA & Feature Engineering

---

## 🧠 Introduction

In this section, we will perform **Exploratory Data Analysis (EDA)** and **Feature Engineering**  
on the **Flight Price Prediction dataset** available on **Kaggle**.  

The goal is to understand how various features such as **airline, source, destination, date, duration, and stops**  
influence the final **flight ticket price** — and to prepare clean, meaningful features for model training.

---

## 📂 Dataset Overview

The dataset contains flight details for multiple airlines operating on different routes.  
Typical features include:

| Feature | Description |
|----------|--------------|
| Airline | Name of the airline company |
| Date_of_Journey | Date on which the flight departs |
| Source | Starting city of the flight |
| Destination | Final city of the flight |
| Route | Route of the flight (stop-wise path) |
| Dep_Time | Departure time |
| Arrival_Time | Arrival time |
| Duration | Total flight duration |
| Total_Stops | Number of stops between source and destination |
| Price | Flight price (target variable) |

---

## 🎯 Objective

We aim to:
1. Explore the data and identify patterns, trends, and anomalies.  
2. Handle missing values, outliers, and duplicates.  
3. Extract useful information from date and time features (Feature Engineering).  
4. Encode categorical features like `Airline`, `Source`, and `Destination`.  
5. Prepare a clean dataset ready for **Machine Learning model training**.
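
As a minimal sketch of objectives 2 and 4, the snippet below removes duplicate rows, drops rows with a missing `Total_Stops`, and one-hot encodes the nominal columns with `pd.get_dummies`. The toy rows are invented for illustration, and dropping (rather than imputing) missing values is an assumption, not necessarily the approach used later in the analysis.

```python
import pandas as pd

# Hypothetical toy rows; the values are invented for illustration only.
df = pd.DataFrame({
    "Airline": ["IndiGo", "Air India", "IndiGo", "SpiceJet", "IndiGo"],
    "Source": ["Delhi", "Kolkata", "Delhi", "Mumbai", "Delhi"],
    "Destination": ["Cochin", "Banglore", "Cochin", "Hyderabad", "Cochin"],
    "Total_Stops": ["non-stop", "2 stops", "non-stop", None, "non-stop"],
    "Price": [3897, 7662, 3897, 4100, 3897],
})

# Count missing values per column before changing anything.
n_missing = df.isna().sum()

# Remove exact duplicate rows, then drop rows where Total_Stops is missing.
# (Assumption: dropping is acceptable; imputation is another option.)
clean = (
    df.drop_duplicates()
      .dropna(subset=["Total_Stops"])
      .reset_index(drop=True)
)

# One-hot encode the nominal columns; Price and Total_Stops are left as-is.
encoded = pd.get_dummies(
    clean, columns=["Airline", "Source", "Destination"], dtype=int
)

print(n_missing)
print(encoded.columns.tolist())
```

One-hot encoding avoids implying an order between airlines or cities, at the cost of one extra column per category.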

Formally, we are trying to model:

$$
f(\text{Airline, Source, Destination, Duration, Stops, Time}) \rightarrow \text{Price}
$$

---

## 🔍 Key Analytical Questions

1. Which airlines are the most and least expensive?  
2. How does flight duration affect price?  
3. Does the number of stops significantly influence ticket cost?  
4. Which routes or cities have the highest flight fares?  
5. How do prices vary by time of day, weekday, and month?
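
Most of these questions reduce to grouping by a feature and summarizing `Price`. The sketch below shows the pattern on invented toy data; the numeric `Total_Stops` column and the choice of median and mean as summaries are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data; Total_Stops is assumed to be numeric already.
df = pd.DataFrame({
    "Airline": ["A", "A", "B", "B", "C"],
    "Total_Stops": [0, 1, 0, 2, 1],
    "Price": [3000, 5000, 4000, 9000, 6000],
})

# Question 1: rank airlines by median price (robust to a few extreme fares).
median_by_airline = (
    df.groupby("Airline")["Price"].median().sort_values(ascending=False)
)

# Question 3: average price for each number of stops.
mean_by_stops = df.groupby("Total_Stops")["Price"].mean()

print(median_by_airline)
print(mean_by_stops)
```

The same `groupby` pattern applies to routes, weekdays, or months once those columns exist.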

---

## ⚙️ EDA & Feature Engineering Plan

1. **Data Loading and Cleaning**  
   - Handle missing values and duplicates.  
   - Check column data types.  

2. **Feature Engineering**  
   - Extract day, month, and weekday from `Date_of_Journey`.  
   - Extract hours and minutes from `Dep_Time` and `Arrival_Time`.  
   - Convert `Duration` into total minutes.  
   - Encode categorical variables: One-Hot Encoding for nominal features such as `Airline`, and Label (ordinal) Encoding for ordered features such as `Total_Stops`.

3. **Univariate & Bivariate Analysis**  
   - Analyze feature distributions using histograms and boxplots.  
   - Study correlations between features and `Price`.

4. **Outlier Detection & Transformation**  
   - Identify high-priced outliers and apply log transformation if needed.
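
The feature engineering and transformation steps in this plan can be sketched as follows. The string formats (`dd/mm/yyyy` dates, `HH:MM` times, durations such as `2h 50m`) are assumptions about how the raw columns look, the derived column names and the helper `duration_to_minutes` are hypothetical, and `np.log1p` is one possible choice of log transformation.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical toy rows; the string formats are assumed, not confirmed.
df = pd.DataFrame({
    "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/06/2019"],
    "Dep_Time": ["22:20", "05:50", "09:25"],
    "Duration": ["2h 50m", "7h", "45m"],
    "Price": [3897, 7662, 13882],
})

# Day, month, and weekday (Monday=0 ... Sunday=6) from the journey date.
dates = pd.to_datetime(df["Date_of_Journey"], format="%d/%m/%Y")
df["Journey_Day"] = dates.dt.day
df["Journey_Month"] = dates.dt.month
df["Journey_Weekday"] = dates.dt.dayofweek

# Hour and minute from the departure time.
dep = pd.to_datetime(df["Dep_Time"], format="%H:%M")
df["Dep_Hour"] = dep.dt.hour
df["Dep_Minute"] = dep.dt.minute


def duration_to_minutes(text: str) -> int:
    """Convert strings like '2h 50m', '7h', or '45m' to total minutes."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


df["Duration_Minutes"] = df["Duration"].map(duration_to_minutes)

# log(1 + Price) compresses the long right tail of high fares.
df["Log_Price"] = np.log1p(df["Price"])

print(df.drop(columns=["Date_of_Journey", "Dep_Time", "Duration"]))
```

`Arrival_Time` can be handled the same way as `Dep_Time`, provided any trailing date text is stripped first.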

---

## 📊 Mathematical Note: Correlation with Target

We will measure the strength of the linear relationship between each numerical feature and price using the Pearson correlation coefficient:

$$
r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}
{\sqrt{\sum (x_i - \bar{x})^2} \sqrt{\sum (y_i - \bar{y})^2}}
$$

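Where:
- $x_i$ = Feature values (e.g., Duration), with mean $\bar{x}$
- $y_i$ = Target variable values (Price), with mean $\bar{y}$

The coefficient $r_{xy}$ ranges from $-1$ to $1$; values near $0$ indicate little linear relationship, although a non-linear one may still exist.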
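
To make the formula concrete, the sketch below evaluates it term by term and compares the result with pandas' built-in `Series.corr`, which uses Pearson correlation by default. The duration and price values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values: duration in minutes and the matching ticket price.
duration = pd.Series([60, 120, 180, 240, 300], name="Duration_Minutes")
price = pd.Series([3000, 4200, 5100, 6900, 7800], name="Price")

# Deviations from the mean: (x_i - x_bar) and (y_i - y_bar).
dx = duration - duration.mean()
dy = price - price.mean()

# Numerator and denominator of the Pearson formula.
r_manual = (dx * dy).sum() / (np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum()))

# pandas computes the same quantity (method="pearson" is the default).
r_pandas = duration.corr(price)

print(round(r_manual, 4), round(r_pandas, 4))
```

On a full DataFrame, `df.corr(numeric_only=True)["Price"]` gives the same coefficient for every numeric column at once.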

---

## 🌐 See More

For the original dataset and its description, visit:  
🔗 [Flight Price Prediction — Kaggle Dataset](https://www.kaggle.com/datasets/shubhambathwal/flight-price-prediction)

---

✅ **Next Step:**  
Let’s start by loading the dataset, checking for missing values, and getting a basic overview using `.info()`, `.describe()`, and `.shape`.
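
A minimal sketch of that first look is shown below. To keep it self-contained it reads a small in-memory CSV through `io.StringIO`; the rows are invented, and in practice the argument to `pd.read_csv` (or `pd.read_excel`) would be the path to the downloaded file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file; rows are invented.
csv_text = """Airline,Source,Destination,Price
IndiGo,Delhi,Cochin,3897
Air India,Kolkata,Banglore,7662
SpiceJet,Mumbai,,4100
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)         # (rows, columns)
df.info()               # column dtypes and non-null counts
print(df.describe())    # summary statistics for numeric columns
print(df.isna().sum())  # missing values per column
```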
