In [1]:
# Import the necessary libraries
import warnings
from math import sqrt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import mean_squared_error as MSE

# Suppress library warnings to keep the notebook output readable
warnings.simplefilter('ignore')

plt.style.use('ggplot')
%matplotlib inline

# Business Understanding


## Overview
As EURO 2024 approaches, Adidas aims to strategically engage football enthusiasts in Great Britain, a key market for sports merchandise. With excitement building, Adidas seeks to use data-driven insights to optimize its marketing strategies and product offerings for football fans in this region. By focusing on customer segmentation and a recommendation system, the objective is to understand consumer behavior, tailor marketing efforts, and enhance customer experiences, ultimately driving sales and brand loyalty during this high-profile event.

## Problem Statement 
As EURO 2024 approaches, Adidas faces steep competition from rivals that also seek to engage football enthusiasts during the tournament. To stay ahead, Adidas needs an informed way to allocate resources and plan marketing. That requires understanding the diverse characteristics and behaviors of its customer base, particularly football fans in Great Britain, so that marketing efforts and product offerings can be tailored to them.

## Challenges
- Data Quality and Availability:
  - Inconsistent or missing data could reduce the accuracy of the customer segmentation and recommendation models.
  - Limited data for new users (the cold start problem) may make it difficult to generate personalized recommendations.
- Customer Segmentation:
  - Determining the optimal number of customer segments is difficult, as too few or too many clusters may lead to poor segmentation.
  - The segments must be actionable and meaningful in a business context.
- Real-Time Recommendations:
  - The recommendation system must respond quickly and accurately in a real-time environment.
  - The system must scale well under varying loads, especially during high-traffic periods such as EURO 2024.
- Deployment and Integration:
  - The recommendation system must be linked to the dummy website in a way that accurately simulates real-world usage.
  - Dependencies must be managed so that all components work together, particularly in a containerized environment.
- User Experience and Adoption:
  - The interface must be user-friendly and showcase the recommendation system effectively.
  - End users must perceive the recommendations as relevant and helpful.
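
One common way to approach the cluster-count question raised under Customer Segmentation is to fit a clustering model for several candidate values of k and compare their silhouette scores. The sketch below is illustrative only: it assumes K-Means as the clustering algorithm and uses synthetic data from `make_blobs` in place of the real customer features. The helper name `silhouette_by_k` is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features (not the Adidas data):
# three well-separated groups in two dimensions.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=0.6,
    random_state=42,
)
X_scaled = StandardScaler().fit_transform(X)


def silhouette_by_k(features, k_values, random_state=42):
    """Fit KMeans for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


scores = silhouette_by_k(X_scaled, range(2, 7))
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print(f"Best k by silhouette: {best_k}")
```

The silhouette score is only one signal. A value of k that scores well still needs to produce segments that are actionable for marketing, so the final choice would typically combine this metric with a business review of the resulting clusters.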

## Proposed Solution
The solution starts by integrating and cleaning Great Britain-specific sales, customer, and engagement data to create a unified dataset.
We will perform demographic and engagement analysis to identify key customer segments through clustering techniques.
A recommendation system combining collaborative and content-based filtering will then be developed to deliver personalized product suggestions to these segments.
Continuous evaluation using metrics such as Silhouette Score, Precision, and Recall will ensure the effectiveness of the segmentation and recommendation system, ultimately driving increased sales, customer engagement, and satisfaction.
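
To make the Precision and Recall metrics concrete for a recommender, they are often computed over the top-k recommended items per user. The following is a minimal sketch under that assumption; the function name `precision_recall_at_k` and the product IDs are hypothetical and not taken from the project data.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for a single user.

    recommended: ranked list of product IDs, best first
    relevant:    set of product IDs the user actually purchased or engaged with
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k  # convention: divide by k even if fewer items were returned
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example for one user
recommended = ["boot_a", "jersey_b", "ball_c", "socks_d", "jersey_e"]
relevant = {"jersey_b", "jersey_e", "scarf_f"}

p, r = precision_recall_at_k(recommended, relevant, k=5)
print(f"precision@5={p:.2f}, recall@5={r:.2f}")
```

Averaging these per-user values across all evaluated users gives a single figure that can be tracked over time alongside the Silhouette Score for the segmentation.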

## Success Metrics 
- Model Accuracy: Maintain an overall model accuracy rate of 80% or higher in predicting user preferences.
- Functional Storefront: Ensure the website accurately simulates a real e-commerce platform, showcasing product recommendations with names and descriptions.
- Model Integration: Successfully integrate the recommendation model into the website, allowing it to dynamically generate and display personalized product suggestions for each user.

## Conclusion
By focusing on Great Britain and leveraging customer segmentation and recommendation systems, Adidas can better understand and engage football fans in this key market. The proposed solution will enable more personalized marketing strategies and product recommendations, driving higher sales and stronger customer loyalty in the lead-up to EURO 2024.



# Data Understanding

## Data sources 
Three datasets are used:
- `ConsTable_EU.csv`, which contains consumer information.
- `SalesTable_EU.csv`, which contains sales information.
- `EngagementTable_GB.csv`, which contains customer engagement data for Great Britain.
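
Since all three tables share the `acid` consumer identifier, one plausible way to build the unified Great Britain dataset is to filter sales to GB, aggregate to one row per consumer, and then join the other tables on `acid`. The sketch below uses small invented frames rather than the real files. The `'GB'` value in `country` is an assumption about how the market is encoded, and the aggregate column names `n_orders` and `items` are hypothetical.

```python
import pandas as pd

# Toy frames that mimic a few columns of the three tables; all values are invented.
cons = pd.DataFrame({
    "acid": ["a1", "a2", "a3"],
    "consumer_gender": ["F", "M", "F"],
})
sales = pd.DataFrame({
    "acid": ["a1", "a1", "a2", "a3"],
    "order_no": ["o1", "o2", "o3", "o4"],
    "country": ["GB", "GB", "DE", "GB"],  # 'GB' as the country code is an assumption
    "quantity_ordered": [1, 2, 1, 1],
})
engagement = pd.DataFrame({
    "acid": ["a1", "a3"],
    "freq_dotcom": [5, 2],
})

# Keep GB orders only, then summarise to one row per consumer.
gb_sales = sales[sales["country"] == "GB"]
orders_per_consumer = gb_sales.groupby("acid", as_index=False).agg(
    n_orders=("order_no", "nunique"),
    items=("quantity_ordered", "sum"),
)

# Aggregate engagement to one row per consumer as well.
eng_per_consumer = engagement.groupby("acid", as_index=False)["freq_dotcom"].sum()

# Left-join so every GB purchaser is kept; validate guards against duplicate keys.
unified = (
    orders_per_consumer
    .merge(cons, on="acid", how="left", validate="one_to_one")
    .merge(eng_per_consumer, on="acid", how="left", validate="one_to_one")
)
print(unified)
```

Aggregating before merging keeps the result at one row per consumer, which is the shape most clustering algorithms expect.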

In [3]:
# Load all datasets
cons_eu = pd.read_csv('data/ConsTable_EU.csv')
sales_eu = pd.read_csv('data/SalesTable_EU.csv')
engagement_gb = pd.read_csv('data/EngagementTable_GB.csv')


In [5]:
# Consumer Information
print('Consumer Information'.center(50, '-'))
print(f'Shape: {cons_eu.shape}')
print('Info:')
cons_eu.info()  # info() prints directly and returns None, so call it on its own
print(f'Description:\n{cons_eu.describe()}')

print('\n' + '-'*50 + '\n')

# Sales
print('Sales'.center(50, '-'))
print(f'Shape: {sales_eu.shape}')
print('Info:')
sales_eu.info()
print(f'Description:\n{sales_eu.describe()}')

print('\n' + '-'*50 + '\n')

# Engagement Data
print('Engagement Data'.center(50, '-'))
print(f'Shape: {engagement_gb.shape}')
print('Info:')
engagement_gb.info()
print(f'Description:\n{engagement_gb.describe()}')

---------------Consumer Information---------------
Shape: (355461, 8)
Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 355461 entries, 0 to 355460
Data columns (total 8 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   acid                       355461 non-null  object 
 1   loyalty_memberid           266450 non-null  object 
 2   birth_year                 133642 non-null  float64
 3   consumer_gender            355461 non-null  object 
 4   market_name                355461 non-null  object 
 5   first_signup_country_code  355461 non-null  object 
 6   member_latest_tier         266335 non-null  object 
 7   member_latest_points       266335 non-null  float64
dtypes: float64(2), object(6)
memory usage: 21.7+ MB
Description:
          birth_year  member_latest_points
count  133642.000000         266335.000000
mean     1987.942346            150.370961
std        13.421753           1241.023133
m

 1. **Consumer Information**
- **Shape**: (355,461 rows, 8 columns)
- **Key Columns**:
  - `acid`: Unique identifier for each consumer (non-null).
  - `loyalty_memberid`: Membership ID (missing for about 25% of consumers).
  - `birth_year`: Year of birth (available for about 38% of consumers).
  - `consumer_gender`, `market_name`, `first_signup_country_code`: Demographic and location data.
  - `member_latest_tier`, `member_latest_points`: Loyalty program data, available for around 75% of consumers.
- **Notable Statistics**:
  - `birth_year`: Average year of birth is ~1988, with a range from 1882 to 2009. The 1882 minimum is implausible and likely reflects a data-entry error.
  - `member_latest_points`: Points range widely, with some negative values and a max of 377,850.4 points.

 2. **Sales Data**
- **Shape**: (178,334 rows, 20 columns)
- **Key Columns**:
  - `acid`, `order_no`, `order_date`: Order identifiers and dates (non-null).
  - `market_name`, `country`: Geographic data.
  - `quantity_ordered`, `quantity_returned`, `quantity_cancelled`, `quantity_delivered`: Metrics on order fulfilment.
  - `exchange_rate_to_EUR`, `order_item_unit_price_net`: Financial data related to orders.
- **Notable Statistics**:
  - `quantity_ordered`: Average slightly above 1 item per order.
  - `quantity_returned`: 21% of items are returned on average.
  - `order_item_unit_price_net`: Prices range from -€45.76 to €14,628.10. The negative minimum and the very high maximum indicate anomalies that need investigating.

 3. **Engagement Data**
- **Shape**: (33,148 rows, 29 columns)
- **Key Columns**:
  - `acid`: Consumer ID.
  - `year`, `quarter_of_year`, `month_of_year`, `week_of_year`: Temporal data for tracking engagement.
  - Various `freq_*` columns: Metrics capturing the frequency of consumer interactions (e.g., signups, app usage, purchases).
- **Notable Statistics**:
  - `freq_signup`, `freq_sportsapp`, `freq_email`, etc.: Most engagement metrics have low averages, indicating that most consumers interact only sporadically.
  - `freq_dotcom`, `freq_flagshipapp`: These show more consistent engagement, with some consumers interacting very frequently (e.g., up to 399 times on the flagship app).
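
The missing-value shares and anomalies noted above can be checked with a few lines of pandas. The sketch below runs on a tiny invented sample that reuses the real column names. Only the 1882 birth year and the -45.76 price echo values reported in the summaries; the 1900 cutoff for birth years is an illustrative assumption, not a rule from the project.

```python
import numpy as np
import pandas as pd

# Tiny invented sample with the same column names as the consumer and sales tables.
cons = pd.DataFrame({
    "acid": ["a1", "a2", "a3", "a4"],
    "loyalty_memberid": ["m1", None, "m3", "m4"],
    "birth_year": [1990.0, np.nan, 1882.0, np.nan],
})
sales = pd.DataFrame({
    "order_no": ["o1", "o2", "o3"],
    "order_item_unit_price_net": [59.99, -45.76, 120.00],
})

# Share of missing values per column, as a percentage.
missing_pct = cons.isna().mean().mul(100).round(1)
print(missing_pct)

# Flag implausible birth years (NaN compares as False, so missing values are not flagged).
implausible_birth = cons["birth_year"] < 1900  # cutoff is an assumption

# Flag non-positive unit prices for manual review.
bad_price = sales["order_item_unit_price_net"] <= 0

print(f"Implausible birth years: {implausible_birth.sum()}")
print(f"Non-positive prices: {bad_price.sum()}")
```

Applied to `cons_eu` and `sales_eu`, the same checks would show how many rows are affected before deciding whether to drop, cap, or impute them.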
