# Task 2: Lookalike Model

This notebook implements a lookalike model that recommends similar customers based on features aggregated from their transaction history. For each of the first 20 customers (C0001–C0020), it reports the 3 most similar customers together with their similarity scores.

## Steps:
1. Data preparation and feature engineering.
2. Similarity computation.
3. Recommendation generation.


In [1]:
import pandas as pd

In [3]:
# Load datasets
customers = pd.read_csv('data/Customers.csv')
transactions = pd.read_csv('data/Transactions.csv')
products = pd.read_csv('data/Products.csv')

In [4]:
# Merge datasets to enrich transaction data with customer and product information
transactions = transactions.merge(customers, on='CustomerID', how='left')
transactions = transactions.merge(products, on='ProductID', how='left')
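
A left merge silently adds rows if a key is repeated in the lookup table. As a hedged sketch on invented toy data (the `toy_*` frames and the `Region` column are assumptions, not the real CSV contents), the `validate` argument of `merge` can assert that each `CustomerID` appears only once on the right-hand side:

```python
import pandas as pd

# Toy stand-ins for the real tables (values invented for illustration)
toy_transactions = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3"],
    "CustomerID": ["C0001", "C0001", "C0002"],
    "TotalValue": [100.0, 50.0, 75.0],
})
toy_customers = pd.DataFrame({
    "CustomerID": ["C0001", "C0002"],
    "Region": ["Asia", "Europe"],
})

# validate="many_to_one" raises pandas.errors.MergeError if CustomerID
# is duplicated in toy_customers, so the row count cannot silently grow
toy_merged = toy_transactions.merge(
    toy_customers, on="CustomerID", how="left", validate="many_to_one"
)
```

The same argument could be passed to the product merge on `ProductID`, assuming each product is listed once.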

In [5]:
# Feature engineering: aggregate transactions into one row per customer.
# Customers without any transactions do not appear in the result.
customer_features = transactions.groupby('CustomerID').agg(
    TotalSpend=('TotalValue', 'sum'),
    TotalTransactions=('TransactionID', 'count'),
    AverageTransactionValue=('TotalValue', 'mean'),
    MostPurchasedCategory=('Category', lambda x: x.mode()[0] if not x.mode().empty else None)
).reset_index()

In [6]:
print(customer_features.head())

  CustomerID  TotalSpend  TotalTransactions  AverageTransactionValue  \
0      C0001     3354.52                  5                  670.904   
1      C0002     1862.74                  4                  465.685   
2      C0003     2725.38                  4                  681.345   
3      C0004     5354.88                  8                  669.360   
4      C0005     2034.24                  3                  678.080   

  MostPurchasedCategory  
0           Electronics  
1              Clothing  
2            Home Decor  
3                 Books  
4           Electronics  
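
The `MostPurchasedCategory` column is categorical, so it cannot be passed to a numeric similarity measure as-is. One possible way to make it usable, shown here only as a sketch and not as the method this notebook uses, is one-hot encoding with `pd.get_dummies`. The toy frame reuses the first three rows printed above; the `Cat` prefix is an arbitrary choice.

```python
import pandas as pd

# First three rows from the output above (subset of columns)
toy_features = pd.DataFrame({
    "CustomerID": ["C0001", "C0002", "C0003"],
    "TotalSpend": [3354.52, 1862.74, 2725.38],
    "MostPurchasedCategory": ["Electronics", "Clothing", "Home Decor"],
})

# One 0/1 indicator column per category value
category_dummies = pd.get_dummies(
    toy_features["MostPurchasedCategory"], prefix="Cat", dtype=float
)

# Replace the text column with its indicator columns
toy_encoded = pd.concat(
    [toy_features.drop(columns="MostPurchasedCategory"), category_dummies],
    axis=1,
)
```

How much weight the indicator columns should carry relative to the scaled numeric features is a modelling decision that this sketch leaves open.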


In [7]:
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

In [8]:
# Standardize the numerical features (zero mean, unit variance);
# this overwrites the raw values in customer_features
scaler = StandardScaler()
numerical_features = ['TotalSpend', 'TotalTransactions', 'AverageTransactionValue']
customer_features[numerical_features] = scaler.fit_transform(customer_features[numerical_features])
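
`StandardScaler` rescales each column to z-scores, z = (x - mean) / std, using the population standard deviation (`ddof=0`). The sketch below reproduces that formula by hand on the five rows printed earlier. The numbers will differ from the notebook's scaled values, because the real scaler is fit on all customers rather than five.

```python
import pandas as pd

# First five customers from the printed feature table
toy = pd.DataFrame({
    "TotalSpend": [3354.52, 1862.74, 2725.38, 5354.88, 2034.24],
    "TotalTransactions": [5, 4, 4, 8, 3],
})

# z = (x - mean) / std, with the population std (ddof=0);
# pandas defaults to ddof=1, so it has to be set explicitly
toy_scaled = (toy - toy.mean()) / toy.std(ddof=0)
```

After scaling, every column has mean 0 and unit variance, so a feature measured in large units such as `TotalSpend` no longer dominates the similarity score.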


In [9]:
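# Compute the pairwise cosine similarity matrix from the three numerical features
# (MostPurchasedCategory is not part of the similarity)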
similarity_matrix = cosine_similarity(customer_features[numerical_features])
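
Cosine similarity between two feature vectors is their dot product divided by the product of their norms, so it measures direction and ignores magnitude. The following NumPy sketch (the helper `cosine_similarity_matrix` and the toy vectors are illustrative, not part of the notebook) computes the same quantity on rows chosen to show the extremes:

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_unit = X / norms  # assumes no all-zero rows
    return X_unit @ X_unit.T

toy_vectors = np.array([
    [1.0, 0.0],
    [2.0, 0.0],   # same direction as row 0, different magnitude
    [0.0, 3.0],   # orthogonal to row 0
    [-1.0, 0.0],  # opposite direction to row 0
])
toy_sim = cosine_similarity_matrix(toy_vectors)
```

Rows 0 and 1 score 1, rows 0 and 2 score 0, and rows 0 and 3 score -1. On standardized features this means two customers whose deviations from the average are proportional are treated as identical, even if one deviates far more than the other.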


In [10]:
# Store similarity scores in a DataFrame
similarity_df = pd.DataFrame(similarity_matrix, index=customer_features['CustomerID'], columns=customer_features['CustomerID'])

In [11]:
# Get top 3 similar customers for each customer
lookalike_results = {}

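for customer_id in customer_features['CustomerID']:
    # Drop the customer's own entry by label; slicing [1:4] after sorting
    # would keep it whenever another customer ties at the top score
    similar_customers = similarity_df[customer_id].drop(customer_id).nlargest(3)
    lookalike_results[customer_id] = list(zip(similar_customers.index, similar_customers.values))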

# Prepare the results for the first 20 customers
lookalike_output = {
    customer: lookalike_results[customer]
    for customer in customer_features['CustomerID'][:20]
}

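# Convert results to DataFrame for saving; cast scores to plain Python floats
# so the string does not contain NumPy scalar reprs such as np.float64(...)
lookalike_df = pd.DataFrame([
    {"CustomerID": cust, "Lookalikes": str([(cid, float(score)) for cid, score in lookalikes])}
    for cust, lookalikes in lookalike_output.items()
])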

# Save to CSV
lookalike_df.to_csv('Lookalike.csv', index=False)
print("Lookalike recommendations saved to Lookalike.csv.")


Lookalike recommendations saved to Lookalike.csv.
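
Selecting the top matches depends on excluding each customer's own entry from their similarity column. A minimal sketch on an invented toy matrix, using a hypothetical helper `top_k_lookalikes`, removes the self-entry by label, which stays correct even when another customer ties at a similarity of 1.0:

```python
import pandas as pd

ids = ["C0001", "C0002", "C0003", "C0004"]

# Invented symmetric similarity matrix; C0001 and C0002 are exact lookalikes
toy_similarity = pd.DataFrame(
    [
        [1.0, 1.0, 0.2, -0.5],
        [1.0, 1.0, 0.2, -0.5],
        [0.2, 0.2, 1.0, 0.1],
        [-0.5, -0.5, 0.1, 1.0],
    ],
    index=ids,
    columns=ids,
)

def top_k_lookalikes(similarity, customer_id, k=3):
    """Return the k most similar other customers as (id, score) pairs."""
    scores = similarity[customer_id].drop(customer_id)  # remove self by label
    top = scores.nlargest(k)                            # sorted, highest first
    return [(other, round(float(score), 4)) for other, score in top.items()]

toy_top = top_k_lookalikes(toy_similarity, "C0002", k=2)
```

Here `toy_top` contains `C0001` first, then `C0003`. The target `C0002` never appears in its own list, regardless of how the tie at 1.0 is ordered.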


The lookalike model recommends similar customers using cosine similarity over three standardized transaction features. Key conclusions include:

1. **Model Performance**:
   - The similarity computation produced the top 3 most similar customers, with similarity scores, for each target customer (C0001–C0020).
   - The recommendations rest on total spend, transaction count, and average transaction value. `MostPurchasedCategory` is computed but does not enter the similarity score, and recommendation quality was not evaluated quantitatively in this notebook.

2. **Business Value**:
   - This model can help the business identify potential customer groups for targeted marketing campaigns.
   - Similar customers could also be targeted with personalized product recommendations to improve engagement and sales.