A machine learning project that predicts product sales based on advertising spend across TV, Radio, and Newspaper channels.
Built as part of my CodeAlpha Data Science Internship.
| Detail | Description |
|---|---|
| Objective | Predict sales based on advertising budget allocation |
| Dataset | 200 entries, 4 columns (3 advertising features plus the Sales target) |
| Models Used | Linear Regression, Random Forest, Gradient Boosting |
| Best Model | Gradient Boosting / Random Forest (tied for highest R², ~0.97) |
| Tools | Python, Pandas, Scikit-learn, Matplotlib, Seaborn |
├── Sales_Prediction.ipynb # Main Jupyter Notebook
├── Advertising.csv # Dataset
├── requirements.txt # Python dependencies
└── README.md # Project documentation
| Feature | Description |
|---|---|
| TV | Advertising spend on TV (in $) |
| Radio | Advertising spend on Radio (in $) |
| Newspaper | Advertising spend on Newspaper (in $) |
| Sales | Product sales (target variable) |
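
As a minimal sketch of how these columns might be loaded and checked with Pandas: the helper name `load_advertising` and the constant `EXPECTED_COLUMNS` are hypothetical, and the two sample rows are made up for illustration. The notebook itself may load the data differently.

```python
import io

import pandas as pd

# Column names taken from the feature table above.
EXPECTED_COLUMNS = ["TV", "Radio", "Newspaper", "Sales"]


def load_advertising(source):
    """Read the advertising data and keep only the four documented columns.

    `source` can be a file path (e.g. "Advertising.csv") or a file-like object.
    """
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Dropping any extra columns (such as an exported index) is an assumption.
    return df[EXPECTED_COLUMNS]


# Made-up rows for illustration only, not values from the real dataset.
sample_csv = io.StringIO(
    "TV,Radio,Newspaper,Sales\n"
    "100.0,20.0,30.0,12.5\n"
    "200.0,10.0,5.0,15.0\n"
)
sample = load_advertising(sample_csv)
print(sample.shape)
```

With the real file, the call would be `load_advertising("Advertising.csv")`.
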
- Data Loading — Read CSV using Pandas
- Exploratory Data Analysis — Distribution plots, correlation heatmap, scatter plots
- Feature Analysis — Investigated impact of each advertising channel on sales
- Model Training — Trained Linear Regression, Random Forest, and Gradient Boosting models
- Model Evaluation — Compared using MAE, RMSE, and R² Score
- Business Insights — Identified most effective advertising channels
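
The training and evaluation steps above can be sketched roughly as follows. This is not the notebook's exact code: the data here is a synthetic stand-in with the same column names (so the snippet runs without `Advertising.csv`), and the 80/20 split, `random_state` values, and default hyperparameters are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Advertising.csv (NOT the real data): 200 rows, same columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Newspaper": rng.uniform(0, 100, 200),
})
df["Sales"] = 3 + 0.045 * df["TV"] + 0.19 * df["Radio"] + rng.normal(0, 1.5, 200)

X = df[["TV", "Radio", "Newspaper"]]
y = df["Sales"]

# Hold out 20% of the rows for evaluation (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# Fit each model and score it on the held-out rows.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, pred)),
        "R2": r2_score(y_test, pred),
    }

print(pd.DataFrame(results).T.round(2))
```

Because the data is synthetic, the printed scores will not match the results reported for the real dataset.
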
| Model | MAE | RMSE | R² Score |
|---|---|---|---|
| Linear Regression | ~1.2 | ~1.6 | ~0.90 |
| Random Forest | ~0.6 | ~0.9 | ~0.97 |
| Gradient Boosting | ~0.7 | ~0.9 | ~0.97 |
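
For readers unfamiliar with the three metrics in this table, here is a small sketch that computes them by hand with NumPy. The function name `regression_metrics` and the toy numbers are illustrative assumptions. The notebook presumably uses Scikit-learn's built-in metric functions instead.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R² from their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))         # mean absolute error
    rmse = np.sqrt(np.mean(residuals ** 2))  # root mean squared error
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot               # share of variation explained
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Toy values, not taken from the dataset.
metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 11])
print(metrics)
```

For these toy values the MAE is 1.0, the RMSE is about 1.22, and R² is 0.7. Lower MAE and RMSE are better, while R² closer to 1 is better.
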
- TV advertising has the strongest impact on sales (correlation ~0.78)
- Radio advertising has a moderate positive impact (correlation ~0.58)
- Newspaper advertising has the weakest correlation with sales (~0.23), suggesting it is the least cost-effective channel
- Recommendation: Prioritize TV and Radio budgets, reduce Newspaper spend
- Advertising spend explains roughly 90% or more of the variation in sales (R² of ~0.90 to ~0.97 across the models)
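
The channel correlations quoted above are the kind of figure Pandas can produce directly. Below is a hedged sketch: `channel_correlations` is a hypothetical helper, and the data is synthetic, so its output will not reproduce the 0.78 / 0.58 / 0.23 values.

```python
import numpy as np
import pandas as pd


def channel_correlations(df, target="Sales"):
    """Pearson correlation of each column with the target, strongest first.

    Assumes every column in `df` is numeric.
    """
    corr = df.corr()[target].drop(target)
    return corr.sort_values(ascending=False)


# Synthetic data for illustration only: Sales depends strongly on TV,
# weakly on Radio, and not at all on Newspaper.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Newspaper": rng.uniform(0, 100, 200),
})
demo["Sales"] = 0.05 * demo["TV"] + 0.1 * demo["Radio"] + rng.normal(0, 1.0, 200)

ranked = channel_correlations(demo)
print(ranked.round(2))
```

Note that correlation measures linear association only. It does not by itself show that spending more on a channel causes higher sales.
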
- Clone the repository

      git clone https://github.com/kinzaemannn/CodeAlpha-Sales-Prediction.git
      cd CodeAlpha-Sales-Prediction

- Install dependencies

      pip install -r requirements.txt

- Open the notebook

      jupyter notebook Sales_Prediction.ipynb

- Python 3.x
- Pandas — Data manipulation
- NumPy — Numerical operations
- Matplotlib & Seaborn — Data visualization
- Scikit-learn — ML models and evaluation
This project is for educational purposes as part of the CodeAlpha Data Science Internship.