# Customer Retail Purchase Analysis & Prediction (2023)

## Problem Statement
Using the *Customer Retail Purchase Data (2023)*, we build a **Linear Regression model** that predicts the **Total Amount** a customer spends in a single transaction. The model uses customer demographics and purchase behavior as inputs.

---

## Objective
- Predict `Total Amount` from the following features:
  - Gender
  - Age
  - Product Category
  - Quantity
  - Price per Unit

---

## Dataset Description

| Column Name       | Description                                                        |
|-------------------|--------------------------------------------------------------------|
| `Date`            | Date of the transaction                                            |
| `Gender`          | Gender of the customer                                             |
| `Age`             | Age of the customer                                                |
| `Product Category` | Category of the purchased item (e.g., Beauty, Clothing, Electronics) |
| `Quantity`        | Number of units purchased                                          |
| `Price per Unit`  | Cost of one unit of the item                                       |
| `Total Amount`    | Total spending = Quantity × Price per Unit                         |
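
Because `Total Amount` is defined as a product of two other columns, it is worth confirming that the data actually follows that rule. The following is a minimal sketch, assuming a pandas DataFrame with the column names from the table above; the three sample rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample rows using the documented column names.
df = pd.DataFrame({
    "Quantity": [3, 2, 1],
    "Price per Unit": [50, 500, 30],
    "Total Amount": [150, 1000, 30],
})

# Recompute the total from its definition and collect rows that disagree.
expected_total = df["Quantity"] * df["Price per Unit"]
mismatches = df[df["Total Amount"] != expected_total]

print(f"Rows where Total Amount != Quantity x Price per Unit: {len(mismatches)}")
```

Any rows that show up in `mismatches` on the real data would be candidates for closer inspection during cleaning.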

---

## Features and Label

- **Features (X)**:
  - `Gender` (encoded: Male = 0, Female = 1)
  - `Age`
  - `Product Category` (encoded: Beauty = 0, Clothing = 1, Electronics = 2)
  - `Quantity`
  - `Price per Unit`

- **Label (y)**:
  - `Total Amount`
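
The encoding listed above could be sketched as follows. This assumes a pandas DataFrame with the documented column names; the sample rows and the names `gender_map` and `category_map` are illustrative only.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is assumed to share these column names.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [34, 26, 50, 41],
    "Product Category": ["Beauty", "Clothing", "Electronics", "Clothing"],
    "Quantity": [3, 2, 1, 4],
    "Price per Unit": [50, 500, 30, 25],
})
df["Total Amount"] = df["Quantity"] * df["Price per Unit"]

# Integer codes exactly as listed in the feature description.
gender_map = {"Male": 0, "Female": 1}
category_map = {"Beauty": 0, "Clothing": 1, "Electronics": 2}

# Features (X): copy so the raw DataFrame keeps its original string labels.
X = df[["Gender", "Age", "Product Category", "Quantity", "Price per Unit"]].copy()
X["Gender"] = X["Gender"].map(gender_map)
X["Product Category"] = X["Product Category"].map(category_map)

# Label (y)
y = df["Total Amount"]
```

Integer codes imply an ordering among categories (Beauty < Clothing < Electronics) that a linear model will treat as meaningful. One-hot encoding with `pd.get_dummies` is a common alternative if that ordering is not intended.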

---

## Step 1: Data Cleaning & Preprocessing

- Check for **null/missing values**
- Check for **duplicate entries**
- Detect and handle **outliers**
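
One way to carry out these three checks is sketched below. The sample data is invented, and the 1.5 × IQR rule in the hypothetical helper `iqr_outlier_mask` is only one common outlier convention; the project may well choose a different one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value and one duplicated row.
df = pd.DataFrame({
    "Age": [34, 26, np.nan, 41, 41, 29],
    "Quantity": [3, 2, 1, 4, 4, 2],
    "Price per Unit": [50, 500, 30, 25, 25, 300],
})

# 1. Null/missing values per column.
missing_per_column = df.isnull().sum()

# 2. Fully duplicated rows (the first occurrence is not counted).
n_duplicates = int(df.duplicated().sum())

# Drop rows with missing values, then drop duplicates.
cleaned = df.dropna().drop_duplicates().reset_index(drop=True)


def iqr_outlier_mask(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# 3. Flag outliers in a numeric column.
price_outliers = iqr_outlier_mask(cleaned["Price per Unit"])
```

Whether flagged rows are dropped, capped, or kept depends on whether they look like data errors or genuine high-value purchases.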

---

## Step 2: Exploratory Data Analysis (EDA)

### Distribution Plots
- Age distribution
- Price per Unit distribution
- Quantity and Total Amount distributions

### Gender-wise Analysis
- Total Amount by Gender
- Average Quantity purchased by Gender

### Category Trends
- Most popular Product Categories
- Monthly or seasonal trends (using the month extracted from `Date`)
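
The gender-wise and category summaries above can be computed with pandas aggregations before any plotting. This is a sketch on made-up rows, assuming the documented column names and ISO-formatted dates; the actual date format in the dataset may differ.

```python
import pandas as pd

# Hypothetical sample rows using the documented column names.
df = pd.DataFrame({
    "Date": ["2023-01-05", "2023-01-20", "2023-02-11", "2023-02-14", "2023-03-02"],
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Product Category": ["Beauty", "Clothing", "Clothing", "Electronics", "Beauty"],
    "Quantity": [3, 2, 1, 4, 2],
    "Total Amount": [150, 1000, 30, 100, 600],
})
df["Date"] = pd.to_datetime(df["Date"])

# Gender-wise analysis.
total_by_gender = df.groupby("Gender")["Total Amount"].sum()
avg_qty_by_gender = df.groupby("Gender")["Quantity"].mean()

# Category popularity: number of transactions per category.
category_counts = df["Product Category"].value_counts()

# Monthly trend: total spending grouped by the month extracted from Date.
monthly_total = df.groupby(df["Date"].dt.month)["Total Amount"].sum()
```

Each resulting Series can be passed to `.plot(kind="bar")` if matplotlib is installed, which covers most of the charts listed in this step.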

---

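## Step 3: Feature Scaling

- Apply `StandardScaler` to the numerical features, fitting it on the training split only (see Step 4) so that test-set statistics do not leak into the model: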
  - Age
  - Quantity
  - Price per Unit
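
A minimal sketch of the scaling step, assuming the train/test split from Step 4 has already been made. The arrays hold made-up values for Age, Quantity, and Price per Unit, in that order.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features: columns are Age, Quantity, Price per Unit.
X_train = np.array([[34, 3, 50], [26, 2, 500], [50, 1, 30], [41, 4, 25]], dtype=float)
X_test = np.array([[29, 2, 300]], dtype=float)

scaler = StandardScaler()

# Learn mean and standard deviation from the training rows only...
X_train_scaled = scaler.fit_transform(X_train)

# ...and reuse those same statistics on the test rows.
X_test_scaled = scaler.transform(X_test)
```

Plain linear regression gives the same predictions with or without scaling, but Lasso and Ridge penalize coefficient size, so scaling matters for the regularized models in Step 7.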

---

## Step 4: Train-Test Split

- Split data into:
  - Training set (e.g., 80%)
  - Testing set (e.g., 20%)
- Use `train_test_split` with `random_state=42`
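
The 80/20 split described above might look like this. The placeholder arrays stand in for the real feature matrix and label.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 rows with 2 features each (not the real dataset).
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# test_size=0.2 holds out 20% of the rows; random_state=42 makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```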

---

## Step 5: Linear Regression Model

- Fit the **Linear Regression** model on the training data
- Predict on the test set
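
The fit-and-predict step can be sketched as below. The data is synthetic: it is generated so that the label equals Quantity × Price per Unit, mirroring the dataset description, but the value ranges are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (value ranges are assumptions).
rng = np.random.default_rng(42)
n = 200
age = rng.integers(18, 65, size=n)
quantity = rng.integers(1, 5, size=n)
price = rng.choice([25, 30, 50, 300, 500], size=n)

X = np.column_stack([age, quantity, price]).astype(float)
y = (quantity * price).astype(float)  # Total Amount = Quantity x Price per Unit

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training data, then predict on the held-out test set.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

Since the label is a product of two features, a model that is linear in `Quantity` and `Price per Unit` can only approximate it. Some residual error is therefore expected even on clean data.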

---

## Step 6: Model Evaluation

Calculate and compare the following metrics:
- R² Score
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
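
All four metrics are available through `sklearn.metrics`, with RMSE taken as the square root of MSE. The values in `y_true` and `y_pred` below are made up to keep the arithmetic easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted totals.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 320.0, 380.0])

r2 = r2_score(y_true, y_pred)              # share of variance explained
mae = mean_absolute_error(y_true, y_pred)  # average absolute error
mse = mean_squared_error(y_true, y_pred)   # average squared error
rmse = np.sqrt(mse)                        # same units as Total Amount

print(f"R2={r2:.2f}  MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
# prints: R2=0.98  MAE=15.00  MSE=250.00  RMSE=15.81
```

MAE and RMSE are in the same units as `Total Amount`, which makes them easier to interpret than MSE.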

---

## Step 7: Regularization for Overfitting Check

### Lasso Regression
- Use L1 regularization
- Evaluate performance and check which coefficients are shrunk to zero (feature selection)

### Ridge Regression
- Use L2 regularization
- Compare training/testing performance
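
A sketch of both regularized models on synthetic data follows. The regularization strength `alpha=1.0` is scikit-learn's default and serves only as a starting point here; the data-generating ranges are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: columns are Age, Quantity, Price per Unit.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(18, 65, size=n),
    rng.integers(1, 5, size=n),
    rng.choice([25, 30, 50, 300, 500], size=n),
]).astype(float)
y = X[:, 1] * X[:, 2]  # Total Amount = Quantity x Price per Unit

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale with statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

lasso = Lasso(alpha=1.0).fit(X_train_s, y_train)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X_train_s, y_train)  # L2 penalty

# (train R2, test R2) for each model.
scores = {
    "Lasso": (lasso.score(X_train_s, y_train), lasso.score(X_test_s, y_test)),
    "Ridge": (ridge.score(X_train_s, y_train), ridge.score(X_test_s, y_test)),
}

# Features whose Lasso coefficient is exactly zero have been dropped by the L1 penalty.
lasso_zeroed = np.flatnonzero(lasso.coef_ == 0)
```

Trying several `alpha` values, for example with `LassoCV` or `RidgeCV`, would give a fairer comparison than a single fixed setting.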

---

## Step 8: Model Comparison

| Model Type        | Train R² | Test R² | MAE  | MSE  | RMSE |
|-------------------|----------|---------|------|------|------|
| Linear Regression |          |         |      |      |      |
| Lasso Regression  |          |         |      |      |      |
| Ridge Regression  |          |         |      |      |      |

- Comment on whether each model is overfitting or underfitting, based on the gap between its Train R² and Test R²
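
The comparison table can be filled in programmatically. The sketch below builds it as a DataFrame named `comparison` (a hypothetical name) from synthetic data, so the numbers it produces are not results for the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: columns are Age, Quantity, Price per Unit.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(18, 65, size=n),
    rng.integers(1, 5, size=n),
    rng.choice([25, 30, 50, 300, 500], size=n),
]).astype(float)
y = X[:, 1] * X[:, 2]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(alpha=1.0),
    "Ridge Regression": Ridge(alpha=1.0),
}

rows = []
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    mse = mean_squared_error(y_test, pred)
    rows.append({
        "Model Type": name,
        "Train R²": model.score(X_train_s, y_train),
        "Test R²": model.score(X_test_s, y_test),
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
    })

comparison = pd.DataFrame(rows).set_index("Model Type")
print(comparison.round(3))
```

A Train R² well above the Test R² suggests overfitting, while low values on both suggest underfitting.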

---

## Conclusion

- Summary of insights from the data
- Final selected model and performance
- Suggestions for improvement or future work (e.g., more features, advanced models like XGBoost, time series forecasting)
