##**Import Libraries and Load Data**

In [1]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler
import numpy as np

In [2]:
customers = pd.read_csv("/content/dataset/Customers.csv")
products = pd.read_csv("/content/dataset/Products.csv")
transactions = pd.read_csv("/content/dataset/Transactions.csv")

In [3]:
print("Customers:")
print(customers.head())
print("\nProducts:")
print(products.head())
print("\nTransactions:")
print(transactions.head())

Customers:
  CustomerID        CustomerName         Region  SignupDate
0      C0001    Lawrence Carroll  South America  2022-07-10
1      C0002      Elizabeth Lutz           Asia  2022-02-13
2      C0003      Michael Rivera  South America  2024-03-07
3      C0004  Kathleen Rodriguez  South America  2022-10-09
4      C0005         Laura Weber           Asia  2022-08-15

Products:
  ProductID              ProductName     Category   Price
0      P001     ActiveWear Biography        Books  169.30
1      P002    ActiveWear Smartwatch  Electronics  346.30
2      P003  ComfortLiving Biography        Books   44.12
3      P004            BookWorld Rug   Home Decor   95.69
4      P005          TechPro T-Shirt     Clothing  429.31

Transactions:
  TransactionID CustomerID ProductID      TransactionDate  Quantity  \
0        T00001      C0199      P067  2024-08-25 12:38:23         1   
1        T00112      C0146      P067  2024-05-27 22:23:54         1   
2        T00166      C0127      P067  2024

##**Data Preprocessing**

In [4]:
transactions = transactions.merge(products, on="ProductID", how="left")
transactions = transactions.merge(customers, on="CustomerID", how="left")
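
Left joins like the two above silently keep transactions whose keys have no match, and they duplicate rows if the lookup table repeats a key. The sketch below shows one way to check for both using the `validate` and `indicator` parameters of `DataFrame.merge`. The `*_demo` frames and their values are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Toy stand-ins for the real tables (hypothetical values).
transactions_demo = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3"],
    "CustomerID": ["C1", "C2", "C1"],
    "ProductID": ["P1", "P2", "P9"],  # P9 has no match in products_demo
})
products_demo = pd.DataFrame({
    "ProductID": ["P1", "P2"],
    "Category": ["Books", "Clothing"],
})

# validate="many_to_one" raises pandas.errors.MergeError if ProductID is
# duplicated in products_demo; indicator=True adds a "_merge" column that
# marks rows found only on the left side.
merged_demo = transactions_demo.merge(
    products_demo,
    on="ProductID",
    how="left",
    validate="many_to_one",
    indicator=True,
)

unmatched = merged_demo[merged_demo["_merge"] == "left_only"]
print(len(merged_demo), "rows,", len(unmatched), "unmatched")
```

If the unmatched count is nonzero on the real data, the affected rows carry missing product or customer attributes into the later steps.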

In [5]:
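# One row per customer: spend, volume, product variety, and activity span.
customer_features = transactions.groupby("CustomerID").agg({
    'TotalValue': ['sum', 'mean'],
    'Quantity': 'sum',
    'ProductID': 'nunique',
    # Days between the customer's first and last purchase (named 'Recency' below,
    # although it measures activity span rather than time since the last purchase).
    'TransactionDate': lambda x: (pd.to_datetime(x).max() - pd.to_datetime(x).min()).days
}).reset_index()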

In [6]:
customer_features.columns = ['CustomerID', 'TotalSpent', 'AvgTransactionValue', 'TotalQuantity', 'UniqueProducts', 'Recency']
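
The dictionary-style aggregation produces multi-level column labels, which is why the columns are renamed by position afterwards. As an alternative sketch, pandas named aggregation assigns the final names directly, so the order of names cannot drift from the order of aggregations. The toy frame `tx_demo` and the column name `PurchaseSpanDays` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy transactions (hypothetical values).
tx_demo = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2"],
    "ProductID": ["P1", "P2", "P1"],
    "Quantity": [1, 2, 3],
    "TotalValue": [10.0, 30.0, 15.0],
    "TransactionDate": ["2024-01-01", "2024-01-11", "2024-03-05"],
})
# Parse dates once instead of inside the aggregation.
tx_demo["TransactionDate"] = pd.to_datetime(tx_demo["TransactionDate"])

# Named aggregation: output_name=(input_column, aggregation).
features_demo = tx_demo.groupby("CustomerID").agg(
    TotalSpent=("TotalValue", "sum"),
    AvgTransactionValue=("TotalValue", "mean"),
    TotalQuantity=("Quantity", "sum"),
    UniqueProducts=("ProductID", "nunique"),
    FirstPurchase=("TransactionDate", "min"),
    LastPurchase=("TransactionDate", "max"),
).reset_index()

# Days between first and last purchase, the same quantity the lambda computes.
features_demo["PurchaseSpanDays"] = (
    features_demo["LastPurchase"] - features_demo["FirstPurchase"]
).dt.days
features_demo = features_demo.drop(columns=["FirstPurchase", "LastPurchase"])

print(features_demo)
```

A customer with a single transaction gets a span of 0 days, so this feature mostly separates one-off buyers from repeat buyers.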

In [7]:
region_dummies = pd.get_dummies(customers[['CustomerID', 'Region']], columns=['Region'], drop_first=True)
customer_features = customer_features.merge(region_dummies, on="CustomerID", how="left")

In [8]:
scaler = StandardScaler()
scaled_features = scaler.fit_transform(customer_features.drop(columns=['CustomerID']))
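
Standardisation matters here because the features live on very different scales: total spend is in the hundreds or thousands, while the region dummies are 0 or 1. The following sketch, on a made-up array `X_demo`, shows that `StandardScaler` is equivalent to subtracting each column's mean and dividing by its population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two columns on very different scales (hypothetical values).
X_demo = np.array([
    [100.0, 1.0],
    [200.0, 2.0],
    [300.0, 3.0],
])

scaled_demo = StandardScaler().fit_transform(X_demo)

# Manual equivalent: z = (x - column mean) / column std, with ddof=0.
manual_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(scaled_demo, manual_demo))
print(scaled_demo.mean(axis=0), scaled_demo.std(axis=0))
```

After scaling, both columns contribute comparably to any distance or similarity measure computed on the rows.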

##**Computation of Similarities**

In [9]:
similarity_matrix = cosine_similarity(scaled_features)
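
Cosine similarity compares the direction of two feature vectors and ignores their length: cos(a, b) = (a · b) / (‖a‖ ‖b‖). The sketch below uses a made-up array `X_demo` to show that normalising each row to unit length and taking dot products gives the same matrix as `cosine_similarity`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Four toy feature vectors (hypothetical values).
X_demo = np.array([
    [1.0, 0.0],
    [2.0, 0.0],    # same direction as row 0, different length
    [0.0, 3.0],    # orthogonal to row 0
    [-1.0, 0.0],   # opposite direction to row 0
])

# Manual version: scale every row to unit length, then take all dot products.
unit_rows = X_demo / np.linalg.norm(X_demo, axis=1, keepdims=True)
manual_sim = unit_rows @ unit_rows.T

sk_sim = cosine_similarity(X_demo)

print(np.allclose(manual_sim, sk_sim))
print(sk_sim[0])  # similarities of row 0 to rows 0..3
```

Row 0 scores 1 against the longer vector pointing the same way, 0 against the orthogonal one, and -1 against the opposite one. On standardised features, a high score therefore means two customers deviate from the average customer in the same direction, not that their raw values are close.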

In [10]:
customer_indices = {id_: idx for idx, id_ in enumerate(customer_features['CustomerID'])}

##**Generating Recommendations**

In [11]:
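def get_top_lookalikes(customer_id, top_n=3):
    """Return the top_n most similar customers as (CustomerID, score) pairs."""
    idx = customer_indices[customer_id]
    similarity_scores = similarity_matrix[idx]
    # Drop the customer's own index explicitly; with tied scores the self-match
    # is not guaranteed to sort first.
    ranked = [i for i in similarity_scores.argsort()[::-1] if i != idx][:top_n]
    return [(customer_features['CustomerID'].iloc[i], similarity_scores[i]) for i in ranked]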
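
A ranking that simply skips the first position of a descending sort relies on the customer's own score sorting first, which may not hold when another customer has an identical feature vector. The sketch below masks the customer's own entry before ranking. The helper name `top_n_excluding_self` and the toy matrix `sim_demo` are hypothetical and not part of the notebook.

```python
import numpy as np

def top_n_excluding_self(sim_matrix, ids, idx, top_n=3):
    """Return (id, score) pairs for the top_n rows most similar to row idx."""
    scores = sim_matrix[idx].astype(float).copy()
    scores[idx] = -np.inf  # the customer can never be their own lookalike
    order = np.argsort(scores)[::-1][:top_n]
    return [(ids[i], float(sim_matrix[idx, i])) for i in order]

ids_demo = ["A", "B", "C", "D"]
sim_demo = np.array([
    [1.0, 1.0, 0.2, 0.9],  # B is an exact duplicate of A
    [1.0, 1.0, 0.2, 0.9],
    [0.2, 0.2, 1.0, 0.1],
    [0.9, 0.9, 0.1, 1.0],
])

result = top_n_excluding_self(sim_demo, ids_demo, idx=0, top_n=2)
print(result)
```

For customer A this returns B and D, and A never appears in its own list even though B ties with A's self-similarity of 1.0.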

In [12]:
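lookalike_map = {}
for customer_id in customers['CustomerID'][:20]:
    # Customers without any transactions have no feature row, so skip them.
    if customer_id in customer_indices:
        lookalike_map[customer_id] = get_top_lookalikes(customer_id)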

##**Saving Output**

In [13]:
output_data = []

In [14]:
for cust_id, lookalikes in lookalike_map.items():
    for lookalike_id, score in lookalikes:
        output_data.append((cust_id, lookalike_id, score))

In [15]:
lookalike_df = pd.DataFrame(output_data, columns=['CustomerID', 'LookalikeID', 'SimilarityScore'])
lookalike_df.to_csv("/content/dataset/Ragavkishore_DM_Lookalike.csv", index=False)

In [16]:
print("Top lookalikes:")
print(lookalike_df.head(20))

Top lookalikes:
   CustomerID LookalikeID  SimilarityScore
0       C0001       C0152         0.984164
1       C0001       C0137         0.979203
2       C0001       C0011         0.963812
3       C0002       C0142         0.977554
4       C0002       C0088         0.833603
5       C0002       C0027         0.810909
6       C0003       C0190         0.934538
7       C0003       C0052         0.880136
8       C0003       C0191         0.877310
9       C0004       C0113         0.979667
10      C0004       C0099         0.946891
11      C0004       C0169         0.937131
12      C0005       C0159         0.980546
13      C0005       C0178         0.953050
14      C0005       C0146         0.943515
15      C0006       C0168         0.969564
16      C0006       C0158         0.948784
17      C0006       C0187         0.942694
18      C0007       C0140         0.920227
19      C0007       C0078         0.879623
