# EDA – Portfolio Project

## Project Overview
You are a marketing analyst. The Chief Marketing Officer has told you that recent marketing campaigns have been less effective than expected. Analyze the dataset to understand the problem and propose data-driven solutions.

---

## Dataset Provided
**File:** `ifood_df.csv`  
The dataset consists of 2206 customers of XYZ company with data on:

- Customer profiles  
- Product preferences  
- Campaign successes/failures  
- Channel performance  

### Column Details:
- **ID**: Customer's Unique Identifier  
- **Year_Birth**: Customer's Birth Year  
- **Education**: Customer's education level  
- **Marital_Status**: Customer's marital status  
- **Income**: Customer's yearly household income  
- **Kidhome**: Number of children in customer's household  
- **Teenhome**: Number of teenagers in customer's household  
- **Dt_Customer**: Date of customer's enrollment with the company  
- **Recency**: Number of days since customer's last purchase  
- **MntWines**: Amount spent on wine in the last 2 years  
- **MntFruits**: Amount spent on fruits in the last 2 years  
- **MntMeatProducts**: Amount spent on meat in the last 2 years  
- **MntFishProducts**: Amount spent on fish in the last 2 years  
- **MntSweetProducts**: Amount spent on sweets in the last 2 years  
- **MntGoldProds**: Amount spent on gold in the last 2 years  
- **NumDealsPurchases**: Number of purchases made with a discount  
- **NumWebPurchases**: Number of purchases made through the company's website  
- **NumCatalogPurchases**: Number of purchases made using a catalogue  
- **NumStorePurchases**: Number of purchases made directly in stores  
- **NumWebVisitsMonth**: Number of visits to company's website in the last month  
- **AcceptedCmp1**: 1 if customer accepted the offer in the 1st campaign, 0 otherwise  
- **AcceptedCmp2**: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise  
- **AcceptedCmp3**: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise  
- **AcceptedCmp4**: 1 if customer accepted the offer in the 4th campaign, 0 otherwise  
- **AcceptedCmp5**: 1 if customer accepted the offer in the 5th campaign, 0 otherwise  
- **Response**: 1 if customer accepted the offer in the last campaign, 0 otherwise  
- **Complain**: 1 if customer complained in the last 2 years, 0 otherwise  
- **Country**: Customer's location  

---

## Objectives

### Section 1: Data Analysis and Preprocessing
1. **Null Values and Outliers**
   - Are there any null values or outliers? How will you handle them?  
     Examples:  
     - Removing rows with outliers  
     - Removing or imputing missing values with a constant value (e.g., mean, median)

2. **Type Transformations**
   - Are there any variables that require type transformations?

3. **Unique Values**
   - What are the unique values in each column?

4. **Feature Engineering**
   - Are there any useful variables that you can engineer with the given data?  
     Examples:  
     - `Age`: Replace `Year_Birth` with calculated age.  
     - `Revenue_Generated`: Total sum of the amount spent on the 6 product categories.  
     - `Total_Purchases`: Sum of all purchase-related features.  
     - `TotalAmount_Spent`: Sum of all `Mnt*` features for each customer.  
     - `Family`: Sum of `Kidhome` + `Teenhome` + the numeric value that `Marital_Status` maps to in the mapping below.  
     - `Marital_Status` mapping (status to numeric value):  
       `{ 'Divorced': 1, 'Single': 1, 'Married': 2, 'Together': 2, 'Widow': 1, 'YOLO': 1, 'Alone': 1, 'Absurd': 1 }`  
     - `Educational_Years`: Total number of years of education based on diploma.  
     - `TotalCampaignsAcc`: Total acceptance of advertising campaigns.

5. **Patterns or Anomalies**
   - Do you notice any patterns or anomalies in the data? Can you plot them?
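
The feature-engineering ideas listed above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the helper `engineer_features`, its `reference_year` parameter, and the toy values are hypothetical, and the choice of which columns count as "purchase-related" or as campaign acceptances is an assumption. `Educational_Years` is left out because the brief does not say how many years each diploma represents.

```python
import pandas as pd

# Mapping copied from the brief; the values presumably stand for adults in the household.
MARITAL_STATUS_MAP = {
    "Divorced": 1, "Single": 1, "Married": 2, "Together": 2,
    "Widow": 1, "YOLO": 1, "Alone": 1, "Absurd": 1,
}


def engineer_features(df: pd.DataFrame, reference_year: int) -> pd.DataFrame:
    """Return a copy of df with the engineered columns added (hypothetical helper)."""
    out = df.copy()
    # Age relative to a caller-chosen reference year (an assumption, not a dataset field).
    out["Age"] = reference_year - out["Year_Birth"]
    # Total spend: row-wise sum over every column whose name starts with "Mnt".
    out["TotalAmount_Spent"] = out.filter(regex=r"^Mnt").sum(axis=1)
    # Total purchases: row-wise sum over Num*Purchases columns (assumed definition).
    out["Total_Purchases"] = out.filter(regex=r"^Num.*Purchases$").sum(axis=1)
    # Family: children + teenagers + the mapped marital-status value.
    out["Family"] = (
        out["Kidhome"] + out["Teenhome"] + out["Marital_Status"].map(MARITAL_STATUS_MAP)
    )
    # Campaign acceptances: AcceptedCmp1..5 only; Response is excluded here (assumption).
    out["TotalCampaignsAcc"] = out.filter(regex=r"^AcceptedCmp\d$").sum(axis=1)
    # The brief asks for Age to replace Year_Birth, so the original column is dropped.
    return out.drop(columns="Year_Birth")


# Toy rows with made-up values, shaped like a subset of the documented columns.
toy = pd.DataFrame({
    "Year_Birth": [1970, 1989],
    "Marital_Status": ["Divorced", "Together"],
    "Kidhome": [0, 1],
    "Teenhome": [0, 0],
    "MntWines": [189, 6],
    "MntMeatProducts": [379, 24],
    "NumWebPurchases": [4, 2],
    "NumStorePurchases": [6, 2],
    "AcceptedCmp1": [0, 0],
    "AcceptedCmp3": [0, 1],
})
features = engineer_features(toy, reference_year=2014)
```

Selecting columns by name pattern keeps the sums correct if only some of the `Mnt*` or `AcceptedCmp*` columns are present, at the cost of silently including any future column that happens to match the pattern.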

---

### Section 2: Exploratory Data Analysis (EDA)
1. **Univariate Analysis**
   - Analyze individual variables to understand their distribution, central tendency, and variability.  
     Techniques include:  
     - Descriptive statistics  
     - Histogram  
     - Boxplot  

2. **Bivariate Analysis**
   - Analyze relationships between two variables to identify patterns and trends.  
     Techniques include:  
     - Scatter plot  
     - Correlation analysis  
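
As a rough sketch of the techniques named above, the snippet below computes descriptive statistics for one variable, draws a histogram and a boxplot for it, and then looks at a second variable with a scatter plot and a Pearson correlation. The two columns and their values are made up for illustration; on the real data any pair of numeric columns could be substituted.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for two numeric columns of the dataset.
toy = pd.DataFrame({
    "Income": [20000, 30000, 40000, 50000, 60000],
    "MntWines": [10, 20, 30, 40, 50],
})

# Univariate: count, mean, std, min, quartiles, and max for one variable.
summary = toy["Income"].describe()

fig, (ax_hist, ax_box, ax_scatter) = plt.subplots(1, 3, figsize=(12, 3.5))

# Univariate: the histogram shows the shape of the distribution.
ax_hist.hist(toy["Income"].to_numpy(), bins=5)
ax_hist.set(title="Income distribution", xlabel="Income", ylabel="Customers")

# Univariate: the boxplot shows median, spread, and potential outliers.
ax_box.boxplot(toy["Income"].to_numpy())
ax_box.set(title="Income boxplot", ylabel="Income")

# Bivariate: the scatter plot shows the relationship between two variables.
ax_scatter.scatter(toy["Income"], toy["MntWines"])
ax_scatter.set(title="Income vs. wine spending", xlabel="Income", ylabel="MntWines")
fig.tight_layout()

# Bivariate: Pearson correlation coefficient between the two columns.
pearson_r = toy["Income"].corr(toy["MntWines"])
```

For a whole frame, `toy.corr()` returns the full correlation matrix, which is often easier to scan as a heatmap.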

---

### Section 3: Data Visualization
1. Plot and visualize the answers to the following questions:
   - Which marketing campaign is most successful?  
   - Display the total amount spent by a customer in each product category.  
   - What is the average spending in each age group?  
   - Which products are performing best, and which are performing worst, in terms of revenue?  
     - Analyze and plot a graph to display revenue generated by different products.  
   - Which country has the greatest number of customers who accepted the last campaign?  

2. Bring together insights from **Sections 1-3** and provide **data-driven recommendations/suggestions**.
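
One possible way to get the numbers behind these plots is sketched below. All values are made up, the age-group bin edges are an assumption, and the `Age` column is assumed to have been engineered already. Campaign success is measured here as the share of customers who accepted each campaign, which is only one reasonable definition.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows with made-up values; only a few of the documented columns are included.
toy = pd.DataFrame({
    "Age": [25, 34, 47, 52, 68, 71],
    "Country": ["SP", "CA", "SP", "US", "SP", "CA"],
    "MntWines": [10, 200, 150, 300, 400, 50],
    "MntFruits": [5, 20, 10, 0, 30, 5],
    "AcceptedCmp1": [0, 1, 0, 0, 1, 0],
    "AcceptedCmp2": [0, 0, 0, 0, 1, 0],
    "Response": [1, 1, 0, 0, 1, 0],
})

# Most successful campaign: the mean of a 0/1 column is its acceptance rate.
campaign_cols = ["AcceptedCmp1", "AcceptedCmp2", "Response"]
acceptance_rate = toy[campaign_cols].mean().sort_values(ascending=False)

# Revenue by product category: column sums, best performer first.
product_cols = [col for col in toy.columns if col.startswith("Mnt")]
revenue_by_product = toy[product_cols].sum().sort_values(ascending=False)

# Average spending per age group; the bin edges are an illustrative assumption.
toy["TotalAmount_Spent"] = toy[product_cols].sum(axis=1)
toy["Age_Group"] = pd.cut(
    toy["Age"], bins=[17, 35, 50, 65, 120], labels=["18-35", "36-50", "51-65", "66+"]
)
avg_spend_by_age = toy.groupby("Age_Group", observed=True)["TotalAmount_Spent"].mean()

# Customers per country who accepted the last campaign (the Response column).
accepted_by_country = toy.groupby("Country")["Response"].sum().sort_values(ascending=False)

# One bar chart as an example; the other series can be plotted the same way.
fig, ax = plt.subplots()
revenue_by_product.plot.bar(ax=ax)
ax.set(title="Revenue by product category", xlabel="Product", ylabel="Amount spent")
fig.tight_layout()
```

Sorting each series before plotting puts the best and worst performers at the two ends of the chart, which answers the "best versus worst" questions at a glance.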


In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

In [3]:
df = pd.read_csv('ifood_df_raw.csv')
df.head()

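| Unnamed: 0 | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumStorePurchases | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Response | Complain | Country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1826 | 1970 | Graduation | Divorced | $84,835.00 | 0 | 0 | 6/16/14 | 0 | 189 | ... | 6 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | SP |
| 1 | 1 | 1961 | Graduation | Single | $57,091.00 | 0 | 0 | 6/15/14 | 0 | 464 | ... | 7 | 5 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | CA |
| 2 | 10476 | 1958 | Graduation | Married | $67,267.00 | 0 | 1 | 5/13/14 | 0 | 134 | ... | 5 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | US |
| 3 | 1386 | 1967 | Graduation | Together | $32,474.00 | 1 | 1 | 5/11/14 | 0 | 10 | ... | 2 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | AUS |
| 4 | 5371 | 1989 | Graduation | Single | $21,474.00 | 1 | 0 | 4/8/14 | 0 | 6 | ... | 2 | 7 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | SP |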
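
The preview suggests that `Income` is stored as a currency string and `Dt_Customer` as a month/day/two-digit-year string, so both likely need type transformations before any analysis. The sketch below shows one way to do that on a toy frame, followed by median imputation and an outlier check. The missing value and the extreme income are made up to exercise those steps, and the 1.5 × IQR rule is just one common convention for flagging outliers, not something the brief prescribes.

```python
import pandas as pd

# Toy frame shaped like the preview; the None and "$666,666.00" entries are made up.
raw = pd.DataFrame({
    "Income": ["$84,835.00", "$57,091.00", None, "$32,474.00", "$21,474.00", "$666,666.00"],
    "Dt_Customer": ["6/16/14", "6/15/14", "5/13/14", "5/11/14", "4/8/14", "4/8/14"],
})

clean = raw.copy()

# Strip the currency symbol and thousands separators, then convert to float.
clean["Income"] = pd.to_numeric(
    clean["Income"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)

# Parse month/day/two-digit-year strings into datetimes.
clean["Dt_Customer"] = pd.to_datetime(clean["Dt_Customer"], format="%m/%d/%y")

# Report nulls per column, then impute missing income with the median.
null_counts = clean.isna().sum()
clean["Income"] = clean["Income"].fillna(clean["Income"].median())

# Flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] as outliers.
q1, q3 = clean["Income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (clean["Income"] < q1 - 1.5 * iqr) | (clean["Income"] > q3 + 1.5 * iqr)

# Keep only the rows that are not flagged.
no_outliers = clean[~is_outlier]
```

The median is less sensitive to extreme incomes than the mean, which is why it is used for the imputation here; dropping the rows with missing income instead would be an equally defensible choice on a dataset of this size.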
