<a href="https://colab.research.google.com/github/kalyaanrr/ECommerceTransactionDataset/blob/main/Kalyaan_Mahendar_Lookalike.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The method used for this task is **Cosine Similarity**.

*   **Definition**: Cosine similarity is a metric that measures the cosine of the angle between two vectors in a multidimensional space.

*   **Formula**: $\text{Cosine Similarity} = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
             
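Here **A** and **B** are the feature vectors of two customers; the numerator is their dot product, and the denominator is the product of their Euclidean norms.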
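
As a small worked illustration of the formula, the sketch below computes the score directly with NumPy. The function name `cosine_sim` and the three toy vectors are hypothetical and chosen only for illustration; they are not taken from the dataset. Returning 0 for a zero vector is a convention assumed here, since the formula is undefined in that case.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity (A.B) / (||A|| ||B||) of two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumed convention for this sketch: a zero vector has no direction.
        return 0.0
    return float(np.dot(a, b) / denom)

# Hypothetical spend of three customers on three products.
customer_a = [1.0, 2.0, 0.0]
customer_b = [2.0, 4.0, 0.0]  # same buying pattern as A, twice the spend
customer_c = [0.0, 0.0, 5.0]  # no products in common with A

print(cosine_sim(customer_a, customer_b))  # very close to 1: same direction
print(cosine_sim(customer_a, customer_c))  # 0.0: orthogonal vectors
```

Customers A and B score close to 1 even though B spends twice as much, because only the direction of the vectors matters, not their length.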

**Steps to use Cosine Similarity**

*   Convert each customer's data into a feature vector.

*   Calculate cosine similarity scores for all customer pairs.

*   For each customer, rank all other customers based on their similarity score.

*   Recommend the top 3 similar customers along with their scores.
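
The four steps above can be sketched on a toy table before applying them to the real data. Everything in this example is an assumption made for illustration: the customer IDs, the product names, the spend values, and the helper `top_lookalikes`.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: one feature vector per customer (hypothetical spend per product).
features = pd.DataFrame(
    {
        "Laptop": [900, 850, 0, 0, 400],
        "Headphones": [100, 120, 0, 50, 0],
        "Novel": [0, 0, 30, 25, 0],
        "Cookbook": [0, 10, 40, 35, 0],
    },
    index=["C1", "C2", "C3", "C4", "C5"],
)

# Step 2: pairwise cosine similarity between all customers.
scores = pd.DataFrame(
    cosine_similarity(features), index=features.index, columns=features.index
)

def top_lookalikes(customer_id, k=3):
    """Steps 3 and 4: rank the other customers and keep the k best."""
    ranked = scores[customer_id].drop(customer_id).sort_values(ascending=False)
    return [(other, float(score)) for other, score in ranked.head(k).items()]

for customer_id in features.index:
    print(customer_id, top_lookalikes(customer_id))
```

Dropping the customer's own ID before ranking avoids relying on the self-similarity of 1.0 always sorting into first place.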

**Why is Cosine Similarity used?**


*   **Scalability**: Computationally efficient, and works well with sparse or high-dimensional data such as transaction history.
*   **Interpretability**: For non-negative feature vectors, as used here, the scores fall between 0 and 1, making them easy to interpret.
*   **Relevance**: Compares the pattern of features rather than their overall magnitude, which aligns well with the objective of finding similar customers.
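
The properties listed above can be checked on synthetic data. The sketch below uses a randomly generated purchase matrix (hypothetical, not the real transactions) and assumes SciPy is available alongside scikit-learn. Note that the 0-to-1 range relies on the features being non-negative; the last line shows a negative score for vectors pointing in opposite directions.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Hypothetical purchase matrix: 6 customers x 50 products, mostly zeros.
spend = rng.random((6, 50))
spend[spend < 0.8] = 0.0
spend[:, 0] = 1.0  # guarantee that no customer row is entirely zero

# Sparse and dense inputs give the same scores.
dense_scores = cosine_similarity(spend)
sparse_scores = cosine_similarity(sparse.csr_matrix(spend))
print(np.allclose(dense_scores, sparse_scores))

# Non-negative features keep every score inside [0, 1] (up to rounding).
print(dense_scores.min() >= 0.0, dense_scores.max() <= 1.0 + 1e-9)

# Multiplying one customer's spend by 10 leaves the scores unchanged.
scaled = spend.copy()
scaled[0] *= 10
print(np.allclose(cosine_similarity(scaled), dense_scores))

# With negative features the score can fall below 0.
opposite = cosine_similarity([[1.0, 0.0]], [[-1.0, 0.0]])[0, 0]
print(opposite)
```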





In [4]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

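# Load the three source tables.
customers = pd.read_csv('Customers.csv')
products = pd.read_csv('Products.csv')
transactions = pd.read_csv('Transactions.csv')

# Attach customer and product attributes to every transaction.
merged_data = pd.merge(transactions, customers, on='CustomerID')
merged_data = pd.merge(merged_data, products, on='ProductID')
# pandas suffixes the overlapping 'Price' column: '_x' comes from the left
# frame (transactions) and '_y' from the right frame (products).
merged_data.rename(columns={'Price_x': 'TransactionPrice', 'Price_y': 'ProductPrice'}, inplace=True)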

In [3]:
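from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Customer x product matrix: total spend per customer on each product (0 if never bought).
customer_product = merged_data.pivot_table(
    index='CustomerID', columns='ProductName', values='TotalValue', aggfunc='sum', fill_value=0
)

# Scale each product column to [0, 1] so that high-value products do not dominate.
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(customer_product)

# Pairwise cosine similarity between all customers.
similarity_score = cosine_similarity(normalized_data)
similarity_df = pd.DataFrame(similarity_score, index=customer_product.index, columns=customer_product.index)

# Top 3 lookalikes for the first 20 customer IDs in the index.
lookalike = {}
for customer in customer_product.index[:20]:
    # Drop the customer's own entry explicitly instead of assuming it sorts first.
    similar_customers = similarity_df[customer].drop(customer).sort_values(ascending=False).head(3)
    # Cast to plain float so the string form is the same across NumPy versions.
    lookalike[customer] = [(idx, float(score)) for idx, score in similar_customers.items()]

lookalike_df = pd.DataFrame([
    {'CustomerID': cust, 'Lookalikes': str(data)}
    for cust, data in lookalike.items()
])
lookalike_df.to_csv("Kalyaan_Mahendar_Lookalike.csv", index=False)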
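
Because the `Lookalikes` column is written as the string form of a list of tuples, it has to be parsed when the file is read back. The sketch below shows one possible way, using `ast.literal_eval`. It runs on a hypothetical two-row sample held in memory rather than on the real output file, and it assumes the scores were written as plain floats; the column names `LookalikeID` and `Score` are illustrative.

```python
import ast
import io

import pandas as pd

# Hypothetical sample in the same layout as the exported CSV.
sample_csv = io.StringIO('''CustomerID,Lookalikes
C0001,"[('C0002', 0.9), ('C0003', 0.5), ('C0004', 0.25)]"
C0002,"[('C0001', 0.9), ('C0004', 0.4), ('C0003', 0.1)]"
''')

lookalikes = pd.read_csv(sample_csv)

# Turn each string back into a list of (customer_id, score) tuples.
lookalikes["Lookalikes"] = lookalikes["Lookalikes"].apply(ast.literal_eval)

# Flatten to one row per (customer, lookalike) pair for easier filtering.
long_form = pd.DataFrame(
    [
        {"CustomerID": cid, "LookalikeID": other, "Score": score}
        for cid, pairs in zip(lookalikes["CustomerID"], lookalikes["Lookalikes"])
        for other, score in pairs
    ]
)
print(long_form)
```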