**Task 2: Lookalike Model**

Build a Lookalike Model that takes a user's information as input and recommends 3 similar customers based on their profile and transaction history. The model should:

● Use both customer and product information.

● Assign a similarity score to each recommended customer.
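The notebook builds one text string per customer from the purchased product names, the categories, and the region. It weights the words with TF-IDF and compares customers by cosine similarity. Below is a minimal sketch of that idea on three customers. The profile strings are hypothetical and invented for illustration, not taken from the datasets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy profiles: purchased product names + categories + region, joined as text
profiles = {
    "A": "SoundWave Headphones Electronics Electronics Asia",
    "B": "SoundWave Speaker Electronics Electronics Asia",
    "C": "Cotton Shirt Clothing Clothing Europe",
}
ids = list(profiles)

# Each profile becomes an L2-normalised TF-IDF vector; cosine similarity compares their directions
vectors = TfidfVectorizer().fit_transform(list(profiles.values()))
scores = cosine_similarity(vectors)  # scores[i, j] lies in [0, 1] because TF-IDF weights are non-negative

for i, cust in enumerate(ids):
    # Rank every other customer by similarity, excluding the customer itself
    others = [(ids[j], round(float(scores[i, j]), 3)) for j in range(len(ids)) if j != i]
    print(cust, sorted(others, key=lambda x: x[1], reverse=True))
```

Customers A and B share the tokens `soundwave`, `electronics` and `asia`, so their score is above zero. A and C share no tokens and score exactly 0. Because every score falls between 0 and 1, the cosine value can be reported directly as the similarity score the task asks for.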

Deliverables:

● Give the top 3 lookalikes with their similarity scores for the first 20 customers (CustomerID: C0001 - C0020) in Customers.csv. Create a “Lookalike.csv” that contains just one map: Map<cust_id, List<cust_id, score>>

● A Jupyter Notebook/Python script explaining your model development.
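If the list of `(cust_id, score)` pairs is written straight into a CSV cell, the file stores the list's Python text representation. That text has to be re-parsed, for example with `ast.literal_eval`, when the file is read back. One alternative is to encode each list as JSON so the `Map<cust_id, List<cust_id, score>>` survives a round trip. The sketch below assumes a JSON-encoded column is acceptable for the deliverable. The IDs and scores are hypothetical, and an in-memory buffer stands in for Lookalike.csv.

```python
import io
import json

import pandas as pd

# Hypothetical map: cust_id -> list of (lookalike_id, score); values are illustrative only
lookalike_map = {
    "C0001": [("C0101", 0.91), ("C0102", 0.85), ("C0103", 0.80)],
    "C0002": [("C0104", 0.88), ("C0105", 0.84), ("C0106", 0.79)],
}

# Encode each list as JSON so the CSV cell can be parsed back without eval()
df = pd.DataFrame({
    "cust_id": list(lookalike_map.keys()),
    "lookalikes": [
        json.dumps([[cid, round(score, 4)] for cid, score in pairs])
        for pairs in lookalike_map.values()
    ],
})

buffer = io.StringIO()  # stands in for "Lookalike.csv" so the sketch writes no file
df.to_csv(buffer, index=False)
buffer.seek(0)

# Round trip: read the CSV back and decode the JSON column into the same map shape
restored = pd.read_csv(buffer)
restored_map = {
    row.cust_id: [(cid, score) for cid, score in json.loads(row.lookalikes)]
    for row in restored.itertuples(index=False)
}
print(restored_map == lookalike_map)
```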

Evaluation Criteria:

● Model accuracy and logic.

● Quality of recommendations and similarity scores.
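The data includes no ground-truth list of lookalikes, so "model accuracy" cannot be measured directly. One possible sanity check is to measure how many purchase categories a customer shares with each recommended lookalike (Jaccard overlap). This is a swapped-in heuristic, not part of the original model. The transactions and recommendations below are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy transactions already joined with product categories
tx = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2", "C2", "C3", "C3"],
    "Category": ["Books", "Electronics", "Books", "Electronics", "Clothing", "Home Decor"],
})

# Set of distinct purchased categories per customer
categories = tx.groupby("CustomerID")["Category"].apply(set).to_dict()


def jaccard(a: set, b: set) -> float:
    """Share of categories two customers have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical recommendations to audit: customer -> recommended lookalike IDs
recommendations = {"C1": ["C2", "C3"]}

for cust, recs in recommendations.items():
    overlaps = {r: jaccard(categories[cust], categories[r]) for r in recs}
    mean_overlap = sum(overlaps.values()) / len(overlaps)
    print(cust, overlaps, round(mean_overlap, 2))
```

To get a rough baseline, compare the mean overlap of the model's picks with the mean overlap of randomly chosen customers.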

In [None]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

# Load datasets
customers = pd.read_csv("/content/drive/MyDrive/ZEOTAP/Customers.csv")  # CustomerID, CustomerName, Region, SignupDate
products = pd.read_csv("/content/drive/MyDrive/ZEOTAP/Products.csv")    # ProductID, ProductName, Category, Price
transactions = pd.read_csv("/content/drive/MyDrive/ZEOTAP/Transactions.csv")  # TransactionID, CustomerID, ProductID, TransactionDate, Quantity, TotalValue, Price

# Data Preparation
# Merge Transactions with Products to get product details
transactions = transactions.merge(products, on='ProductID', how='left')

# Combine Transactions with Customers to get customer details
customer_data = transactions.merge(customers, on='CustomerID', how='left')

# Aggregate data to create one profile per customer
customer_profiles = customer_data.groupby('CustomerID').agg({
    'ProductName': lambda x: ' '.join(x.dropna()),  # Space-joined names of all purchased products (repeats kept)
    'Category': lambda x: ' '.join(x.dropna()),  # Space-joined categories of all purchases (repeats kept)
    'Region': 'first',  # Customer's region (identical on every row for a customer)
}).reset_index()

# Combine textual features for similarity calculation
customer_profiles['CombinedFeatures'] = (
    customer_profiles['ProductName'] + ' ' + customer_profiles['Category'] + ' ' + customer_profiles['Region']
)

# Calculate similarity using TF-IDF and cosine similarity
tfidf = TfidfVectorizer()
features_matrix = tfidf.fit_transform(customer_profiles['CombinedFeatures'])

similarity_matrix = cosine_similarity(features_matrix)

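# Generate a map of lookalike customers for the first 20 customers in Customers.csv (C0001 - C0020)
lookalike_map = {}
customer_ids = customer_profiles['CustomerID'].tolist()
id_to_index = {cust_id: idx for idx, cust_id in enumerate(customer_ids)}

for cust_id in customers['CustomerID'].iloc[:20]:
    if cust_id not in id_to_index:  # No transactions, so no profile to compare
        continue
    customer_index = id_to_index[cust_id]
    # Pair every other customer with its score, explicitly excluding the customer itself
    # (assuming self is ranked first breaks when another profile ties at 1.0)
    similarities = [
        (j, score) for j, score in enumerate(similarity_matrix[customer_index])
        if j != customer_index
    ]
    # Sort by similarity score in descending order
    similarities = sorted(similarities, key=lambda x: x[1], reverse=True)
    top_similar_customers = [
        (customer_ids[j], float(score)) for j, score in similarities[:3]  # Top 3 lookalikes
    ]
    lookalike_map[cust_id] = top_similar_customers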

# Save the lookalike map to "Lookalike.csv" (one row per customer, lookalikes as a list of (cust_id, score) pairs)
lookalike_df = pd.DataFrame({
    'cust_id': list(lookalike_map.keys()),
    'lookalike_map': list(lookalike_map.values()),
})
lookalike_df.to_csv('Lookalike.csv', index=False)

print("Lookalike model results saved to Lookalike.csv")
# Reload Lookalike.csv to check what was written (the lookalike_map column is read back as plain strings)
lookalike_df = pd.read_csv('Lookalike.csv')

# Print the results
print(lookalike_df)


Lookalike model results saved to Lookalike.csv
   cust_id                                      lookalike_map
0    C0001  [('C0197', 0.7636039102099085), ('C0026', 0.71...
1    C0002  [('C0133', 0.8369411571595793), ('C0173', 0.78...
2    C0003  [('C0181', 0.7467787056356553), ('C0085', 0.72...
3    C0004  [('C0118', 0.7779858254769464), ('C0008', 0.75...
4    C0005  [('C0128', 0.7410269556008633), ('C0096', 0.71...
5    C0006  [('C0187', 0.8055748712864338), ('C0191', 0.68...
6    C0007  [('C0045', 0.6930400365400461), ('C0181', 0.69...
7    C0008  [('C0057', 0.799599372597294), ('C0143', 0.773...
8    C0009  [('C0062', 0.7314050736916898), ('C0093', 0.67...
9    C0010  [('C0092', 0.7082586822002053), ('C0145', 0.67...
10   C0011  [('C0094', 0.6690893062065514), ('C0087', 0.63...
11   C0012  [('C0136', 0.8420870183486332), ('C0076', 0.82...
12   C0013  [('C0102', 0.7737204134263764), ('C0040', 0.73...
13   C0014  [('C0128', 0.814203857359771), ('C0086', 0.636...
14   C0015  [('C0185', 