This repository contains a comprehensive data analysis and machine learning project built around airline passenger data.
The notebook combines Python and SQL techniques to explore, clean, analyze, and model customer satisfaction with the airline service.
The aim of this project is to:
- Analyze airline customer satisfaction and behavior.
- Perform Exploratory Data Analysis (EDA) to identify trends.
- Preprocess and clean raw data for better model performance.
- Build and evaluate predictive machine learning models.
- Integrate SQL queries for data handling and reporting.
- Visualize insights with clear and interactive charts.
| File/Folder | Description |
|---|---|
| `Airline Data ML and Data Analysis using Python,Sql.ipynb` | Main Jupyter notebook with analysis and ML models. |
| `README.md` | This file: full project description. |
| `requirements.txt` (optional) | Python dependencies for easy setup. |
- Source: Airline passenger satisfaction dataset (Kaggle or company-provided).
- Key features include: `Gender`, `Customer Type`, `Age`, `Type of Travel`, `Class`, `Flight Distance`, `Inflight wifi service`, `Seat comfort`, `Inflight entertainment`, `On-board service`, `Leg room service`, `Baggage handling`, etc.
- Target variable: `Satisfaction` (Satisfied / Neutral or Dissatisfied).
- Programming Languages: Python, SQL
- Python Libraries:
  - Data analysis: `pandas`, `numpy`
  - Visualization: `matplotlib`, `seaborn`
  - Machine Learning: `scikit-learn`
  - Web scraping/ETL (if included): `BeautifulSoup`, `scrapy`
- SQL Integration: `sqlite3` / MySQL connector (for queries within the notebook)
- Jupyter Notebook for interactive analysis.
- Data Loading & Cleaning
  - Import CSV or database table.
  - Handle missing values, outliers, and data types.
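The loading and cleaning step could look roughly like the sketch below. It is not the notebook's actual code: the sample rows, the `Arrival Delay in Minutes` column, the helper name `load_and_clean`, and the choice of median filling and IQR capping are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real notebook would read the dataset CSV instead.
RAW_CSV = """Gender,Age,Class,Flight Distance,Arrival Delay in Minutes,Satisfaction
Male,34,Business,1200,5,Satisfied
Female,27,Eco,450,,Neutral or Dissatisfied
Female,51,Eco Plus,980,12,Satisfied
Male,45,Business,25000,0,Satisfied
Male,19,Eco,300,,Neutral or Dissatisfied
"""


def load_and_clean(source) -> pd.DataFrame:
    """Read the CSV, fix dtypes, fill missing delays, and cap distance outliers."""
    df = pd.read_csv(source)

    # Store low-cardinality text columns as categories.
    for col in ["Gender", "Class", "Satisfaction"]:
        df[col] = df[col].astype("category")

    # Fill missing delays with the median, which is robust to very long delays.
    delay = "Arrival Delay in Minutes"
    df[delay] = df[delay].fillna(df[delay].median())

    # Cap extreme distances at the upper IQR fence instead of dropping rows.
    dist = "Flight Distance"
    q1, q3 = df[dist].quantile([0.25, 0.75])
    upper_fence = q3 + 1.5 * (q3 - q1)
    df[dist] = df[dist].clip(upper=upper_fence)
    return df


clean = load_and_clean(io.StringIO(RAW_CSV))
print(clean.dtypes)
print(clean)
```

Passing a file path instead of the `io.StringIO` object reads a CSV from disk in the same way.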
- Exploratory Data Analysis (EDA)
  - Univariate & bivariate analysis.
  - Count plots, pie charts, and correlation heatmaps.
  - Grouping & aggregation using SQL and pandas.
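As a minimal sketch of the pandas side of this step, the snippet below computes label counts, a per-class aggregation, and a correlation matrix on a handful of made-up rows. The rows and the helper column `is_satisfied` are assumptions for illustration, not data or names from the notebook.

```python
import pandas as pd

# Hypothetical sample rows standing in for the passenger table.
df = pd.DataFrame({
    "Class": ["Business", "Eco", "Eco", "Business", "Eco Plus", "Eco"],
    "Seat comfort": [5, 2, 3, 4, 3, 1],
    "Satisfaction": [
        "Satisfied", "Neutral or Dissatisfied", "Neutral or Dissatisfied",
        "Satisfied", "Satisfied", "Neutral or Dissatisfied",
    ],
})

# Univariate: how often each satisfaction label occurs.
label_counts = df["Satisfaction"].value_counts()

# Bivariate: passenger count, satisfied share, and mean seat comfort per class.
df["is_satisfied"] = df["Satisfaction"].eq("Satisfied")
by_class = df.groupby("Class").agg(
    passengers=("is_satisfied", "size"),
    satisfied_rate=("is_satisfied", "mean"),
    avg_seat_comfort=("Seat comfort", "mean"),
)

# Correlation matrix of numeric columns, the usual input to a heatmap.
corr = df[["Seat comfort", "is_satisfied"]].astype(float).corr()

print(label_counts)
print(by_class)
print(corr)
```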
- Feature Engineering
  - Encoding categorical variables.
  - Scaling numerical features.
  - Creating flight distance and age groups.
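One possible way to carry out these three steps with `pandas` and `scikit-learn` is sketched below. The bin edges, group labels, and sample rows are illustrative assumptions; the notebook may use different cut-offs or encoders.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample rows.
df = pd.DataFrame({
    "Age": [19, 27, 34, 45, 63],
    "Flight Distance": [300, 450, 1200, 2500, 3900],
    "Class": ["Eco", "Eco", "Business", "Business", "Eco Plus"],
})

# Bin edges and labels are assumptions chosen for illustration.
# pd.cut uses right-inclusive intervals by default, e.g. (0, 25].
df["Age Group"] = pd.cut(
    df["Age"],
    bins=[0, 25, 40, 60, 120],
    labels=["<=25", "26-40", "41-60", ">60"],
)
df["Distance Group"] = pd.cut(
    df["Flight Distance"],
    bins=[0, 500, 2000, 10000],
    labels=["short", "medium", "long"],
)

# One-hot encode the categorical columns.
encoded = pd.get_dummies(df, columns=["Class", "Age Group", "Distance Group"])

# Standardize numeric columns to zero mean and unit variance.
num_cols = ["Age", "Flight Distance"]
encoded[num_cols] = StandardScaler().fit_transform(encoded[num_cols])

print(encoded.head())
```

In a full pipeline the scaler would normally be fitted on the training split only and then applied to the test split, so that test data does not leak into the scaling statistics.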
- Model Building
  - Train-test split.
  - Model selection (e.g., RandomForestClassifier).
  - Evaluation metrics (accuracy, classification report, confusion matrix).
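The model-building steps can be sketched as follows. The data here is synthetic (two random 1-5 ratings and a toy labelling rule), so the printed scores say nothing about the real dataset; the hyperparameters are also assumptions rather than the notebook's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared feature matrix: two service ratings (1-5).
rng = np.random.default_rng(42)
X = rng.integers(1, 6, size=(400, 2))
# Toy labelling rule: "satisfied" (1) when the two ratings sum to more than 6.
y = (X.sum(axis=1) > 6).astype(int)

# Hold out 25% of the rows, keeping the class balance the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"accuracy = {accuracy:.3f}")
print(cm)
print(report)
```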
- Visualization
  - Professional charts with seaborn/matplotlib.
  - Clear labeling and annotations.
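A labeled and annotated chart might be built along the lines below, using plain `matplotlib`. The satisfaction rates are made-up placeholder numbers, not results from this project, and the output filename is hypothetical.

```python
import matplotlib.pyplot as plt

# Placeholder values for illustration only; not results from the dataset.
rates = {"Business": 0.7, "Eco Plus": 0.3, "Eco": 0.2}

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(list(rates.keys()), list(rates.values()), color="steelblue")

# Annotate each bar with its value as a percentage.
ax.bar_label(bars, labels=[f"{v:.0%}" for v in rates.values()], padding=3)

# Title, axis labels, and a fixed y-range keep the chart self-explanatory.
ax.set_title("Satisfaction rate by travel class (illustrative data)")
ax.set_xlabel("Class")
ax.set_ylabel("Share of satisfied passengers")
ax.set_ylim(0, 1)

fig.tight_layout()
fig.savefig("satisfaction_by_class.png", dpi=150)  # hypothetical output path
```

Inside a Jupyter notebook the figure is displayed inline, so the `savefig` call is only needed when the chart should also be exported.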
- SQL Queries
  - Example queries for insights.
  - Integration of SQL outputs into Python workflows.
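To illustrate how a SQL result can feed a Python workflow, the sketch below runs an aggregate query against an in-memory SQLite database and loads the result into a DataFrame. The table name `passengers`, its snake_case column names, and the rows are hypothetical; a MySQL connection would need its own connector and setup.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical passengers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE passengers (customer_type TEXT, class TEXT, satisfaction TEXT)"
)
conn.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [
        ("Loyal Customer", "Business", "Satisfied"),
        ("Loyal Customer", "Eco", "Neutral or Dissatisfied"),
        ("Disloyal Customer", "Eco", "Neutral or Dissatisfied"),
        ("Loyal Customer", "Business", "Satisfied"),
        ("Disloyal Customer", "Business", "Neutral or Dissatisfied"),
    ],
)

# In SQLite a comparison evaluates to 0 or 1, so AVG gives the satisfied share.
query = """
    SELECT class,
           COUNT(*) AS passengers,
           AVG(satisfaction = 'Satisfied') AS satisfied_rate
    FROM passengers
    GROUP BY class
    ORDER BY class
"""

# The query result arrives as a DataFrame, ready for plotting or modelling.
summary = pd.read_sql_query(query, conn)
conn.close()

print(summary)
```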
- Built a high-accuracy predictive model for customer satisfaction.
- Created dashboards and charts showing key service drivers.
- Automated repetitive analysis tasks with Python & SQL integration.