<a href="https://colab.research.google.com/github/manasiwaghmare18/eCommerce-Transactions-Dataset-Analysis/blob/main/Mansi_Waghmare_Lookalike.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Process Flow:**

**1. Data Loading:** The code starts by loading data from three CSV files: Customers.csv, Products.csv, and Transactions.csv. This data contains information about customers, products, and their purchase history.

**2. Data Preparation:** The ECommerceAnalyzer class is initialized with the loaded data. It performs data cleaning, feature engineering, and data transformation steps to prepare the data for analysis. This involves:

- Converting dates to datetime objects.
- Creating customer features like total transactions, total spend, and average transaction value.
- Creating combined features that join each customer's profile with the quantity purchased in each product category.
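
As a rough illustration of these preparation steps, the sketch below builds the same kind of feature table on a tiny made-up dataset. The column names mirror those used in the notebook, but the values and the variable names (`transactions`, `per_customer`, `category_matrix`, `features`) are assumptions for the example only.

```python
import pandas as pd

# Toy transactions; values are invented for illustration.
transactions = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3", "T4"],
    "CustomerID": ["C1", "C1", "C2", "C2"],
    "Category": ["Books", "Toys", "Books", "Books"],
    "Quantity": [1, 2, 3, 1],
    "TotalValue": [10.0, 30.0, 45.0, 15.0],
    "TransactionDate": ["2024-01-05", "2024-02-10", "2024-01-20", "2024-03-01"],
})

# Step 1: convert date strings to datetime objects.
transactions["TransactionDate"] = pd.to_datetime(transactions["TransactionDate"])

# Step 2: per-customer aggregates.
per_customer = transactions.groupby("CustomerID").agg(
    total_transactions=("TransactionID", "count"),
    total_spend=("TotalValue", "sum"),
)
per_customer["avg_transaction_value"] = (
    per_customer["total_spend"] / per_customer["total_transactions"]
)

# Step 3: customer-by-category quantity matrix, with 0 for categories never bought.
category_matrix = transactions.pivot_table(
    index="CustomerID", columns="Category", values="Quantity",
    aggfunc="sum", fill_value=0,
)

# Combined features: aggregates joined with the category matrix.
features = per_customer.join(category_matrix)
print(features)
```

Each row of `features` describes one customer, which is the shape the similarity step expects.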

**3. Lookalike Identification:**
The core logic lies in the find_lookalikes method. It standardizes the numeric customer features, computes the pairwise cosine similarity between customers, and returns the most similar other customers (lookalikes) for a given customer ID, three by default, together with their similarity scores.
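
To make the similarity step concrete, here is a minimal sketch on a hand-made feature matrix. The feature values and the names `X`, `query`, and `top` are hypothetical; only `StandardScaler` and `cosine_similarity` come from the notebook.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Hypothetical feature rows: [total_spend, avg_transaction_value, books_qty]
X = np.array([
    [100.0, 50.0, 2.0],   # customer 0
    [110.0, 55.0, 2.0],   # customer 1
    [900.0, 30.0, 40.0],  # customer 2 (very different profile)
    [95.0, 48.0, 3.0],    # customer 3
])

# Standardize so that large-valued columns do not dominate the angle.
scaled = StandardScaler().fit_transform(X)

# sim[i, j] is the cosine similarity between customers i and j.
sim = cosine_similarity(scaled)

# Rank the other customers for customer 0, excluding customer 0 itself.
query = 0
scores = sim[query].copy()
scores[query] = -np.inf
top = np.argsort(scores)[::-1][:2]
print(top, scores[top])
```

Customers 1 and 3 have profiles close to customer 0, so they rank ahead of customer 2. Masking the query row explicitly avoids relying on the customer always being its own top match.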

**4. Output Generation:** The generate_lookalikes_csv method takes the first num_customers customers (20 by default), finds the lookalikes for each, and writes a CSV file named Mansi_Waghmare_Lookalike.csv with one row per customer: the customer ID and a list of its lookalikes with their similarity scores.
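
The notebook stores all lookalikes of a customer in a single CSV cell. If a layout that is easier to read back is preferred, one possible alternative is a long format with one row per (customer, lookalike) pair. The sketch below assumes results in the same shape that find_lookalikes returns. The customer IDs and scores are invented, and an in-memory buffer stands in for the output file.

```python
import io

import pandas as pd

# Hypothetical results, shaped like {cust_id: [ {customer_id, similarity_score}, ... ]}.
lookalikes = {
    "C0001": [
        {"customer_id": "C0005", "similarity_score": 0.91},
        {"customer_id": "C0009", "similarity_score": 0.88},
    ],
    "C0002": [
        {"customer_id": "C0007", "similarity_score": 0.95},
    ],
}

# Flatten to one row per (customer, lookalike) pair.
rows = [
    {
        "cust_id": cust_id,
        "lookalike_id": rec["customer_id"],
        "similarity_score": rec["similarity_score"],
    }
    for cust_id, recs in lookalikes.items()
    for rec in recs
]
flat = pd.DataFrame(rows)

# Write and read back through an in-memory buffer instead of a real file.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

The long format keeps every value in its own typed column, so no string parsing is needed when the file is loaded again.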

**Lookalike Modeling:**

Lookalike modeling is a technique used in marketing and data science to identify new potential customers who share similar characteristics with your existing high-value customers. It is a powerful tool for:

- **Customer Acquisition:** By targeting lookalike audiences, you can expand your customer base by reaching individuals who are likely to be interested in your products or services.
- **Targeted Marketing:** Lookalike modeling enables you to tailor your marketing campaigns to specific audience segments with higher conversion rates.
- **Personalized Recommendations:** You can leverage lookalike information to provide more relevant and personalized product recommendations to your customers.

**How it works:**

**Lookalike modeling typically involves the following steps:**

- Define your seed audience: This is your group of existing high-value customers.
- Data Collection and Feature Engineering: Gather data about your seed audience and create relevant features that represent their characteristics.
- Model Training: Fit a model, or choose a similarity measure, that captures what the seed audience has in common. This notebook uses cosine similarity on standardized features, which requires no training beyond fitting the scaler.
- Lookalike Audience Generation: Apply the trained model to a larger pool of potential customers to identify those who share similar features with your seed audience.
- Targeting and Evaluation: Use the identified lookalike audience for targeted marketing and track the performance of your campaigns.
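
The steps above can be sketched as follows, under the assumption that the seed audience is summarized by its mean feature vector and that prospects are scored against that profile. This differs from the notebook, which compares existing customers with each other. All feature values and names (`seed`, `pool`, `seed_profile`, `lookalike_idx`) are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Hypothetical features: [total_spend, total_transactions]
seed = np.array([[500.0, 10.0], [550.0, 12.0], [480.0, 9.0]])  # high-value customers
pool = np.array([
    [520.0, 11.0],  # prospect 0
    [40.0, 1.0],    # prospect 1
    [60.0, 2.0],    # prospect 2
    [510.0, 10.0],  # prospect 3
])

# Fit one scaler on seed and pool together so both share the same feature space.
scaler = StandardScaler().fit(np.vstack([seed, pool]))
seed_scaled = scaler.transform(seed)
pool_scaled = scaler.transform(pool)

# Summarize the seed audience by its mean standardized feature vector.
seed_profile = seed_scaled.mean(axis=0, keepdims=True)

# Score every prospect against the seed profile and keep the two best.
scores = cosine_similarity(pool_scaled, seed_profile).ravel()
lookalike_idx = np.argsort(scores)[::-1][:2]
print(lookalike_idx, scores[lookalike_idx])
```

Prospects 0 and 3 resemble the seed customers and score highest, while the low-spend prospects point in the opposite direction after standardization and receive negative scores.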

In [None]:
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler



In [None]:
class ECommerceAnalyzer:
    def __init__(self, customers_df, products_df, transactions_df):
        self.customers_df = customers_df.copy()
        self.products_df = products_df.copy()
        self.transactions_df = transactions_df.copy()
        self.prepare_data()

    def prepare_data(self):
        # Convert dates
        self.customers_df['SignupDate'] = pd.to_datetime(self.customers_df['SignupDate'])
        self.transactions_df['TransactionDate'] = pd.to_datetime(self.transactions_df['TransactionDate'])

        # Create customer features
        self.customer_features = self.create_customer_features()
        self.combined_features = self.create_combined_features()

    def create_customer_features(self):
        # Aggregate transaction data by customer
        customer_aggs = self.transactions_df.groupby('CustomerID').agg({
            'TransactionID': 'count',
            'Quantity': 'sum',
            'TotalValue': 'sum'
        }).rename(columns={
            'TransactionID': 'total_transactions',
            'Quantity': 'total_items',
            'TotalValue': 'total_spend'
        })

        # Add average transaction value
        customer_aggs['avg_transaction_value'] = (
            customer_aggs['total_spend'] / customer_aggs['total_transactions']
        )

        # Merge with customer data
        customer_features = self.customers_df.merge(
            customer_aggs,
            how='left',
            left_on='CustomerID',
            right_index=True
        )

        # Fill NaN values for customers with no transactions
        customer_features = customer_features.fillna({
            'total_transactions': 0,
            'total_items': 0,
            'total_spend': 0,
            'avg_transaction_value': 0
        })

        return customer_features

    def create_combined_features(self):
        # Merge transactions with products to get product categories
        transactions_with_products = self.transactions_df.merge(self.products_df[['ProductID', 'Category']], on='ProductID', how='left')

        # Create customer-product features
        customer_product_features = transactions_with_products.groupby(['CustomerID', 'Category'])['Quantity'].sum().reset_index()

        # Pivot the table to create a customer-category matrix
        customer_product_matrix = customer_product_features.pivot(index='CustomerID', columns='Category', values='Quantity').fillna(0)

        # Join customer features with customer-product matrix
        combined_features = self.customer_features.merge(customer_product_matrix, left_on='CustomerID', right_index=True, how='left').fillna(0)

        return combined_features

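    def find_lookalikes(self, customer_id, n_recommendations=3):
        # Use every numeric feature column (aggregates and per-category quantities).
        # select_dtypes already excludes non-numeric columns such as SignupDate;
        # CustomerID is dropped explicitly in case it was parsed as a number.
        features = (
            self.combined_features.select_dtypes(include=np.number)
            .columns.drop('CustomerID', errors='ignore')
        )

        # Standardize features so that no single column dominates the similarity
        scaler = StandardScaler()
        scaled_features = scaler.fit_transform(self.combined_features[features])

        # Pairwise cosine similarity between all customers
        similarity_matrix = cosine_similarity(scaled_features)

        # Positional row of the requested customer (independent of index labels)
        is_customer = (self.combined_features['CustomerID'] == customer_id).to_numpy()
        matches = np.flatnonzero(is_customer)
        if matches.size == 0:
            raise ValueError(f"Unknown CustomerID: {customer_id}")
        customer_idx = matches[0]

        # Rank the other customers by similarity, excluding the customer itself
        similar_scores = similarity_matrix[customer_idx]
        ranked = [idx for idx in np.argsort(similar_scores)[::-1] if idx != customer_idx]
        similar_indices = ranked[:n_recommendations]

        recommendations = []
        for idx in similar_indices:
            customer = self.combined_features.iloc[idx]
            recommendations.append({
                'customer_id': customer['CustomerID'],
                # Plain float keeps the CSV free of NumPy scalar wrappers
                'similarity_score': float(similar_scores[idx])
            })

        return recommendations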

    def generate_lookalikes_csv(self, num_customers=20, output_file="Mansi_Waghmare_Lookalike.csv"):
        lookalikes = {}
        for customer_id in self.customers_df['CustomerID'][:num_customers]:
            lookalikes[customer_id] = self.find_lookalikes(customer_id)

        # Create a DataFrame for lookalikes
        lookalikes_df = pd.DataFrame([(k, v) for k, v in lookalikes.items()],
                                     columns=['cust_id', 'lookalikes'])

        # Explode the lookalikes list
        lookalikes_df = lookalikes_df.explode('lookalikes')

        # Extract customer_id and similarity_score
        lookalikes_df[['lookalike_id', 'similarity_score']] = lookalikes_df['lookalikes'].apply(pd.Series)

        # Drop the original lookalikes column
        lookalikes_df = lookalikes_df.drop(columns=['lookalikes'])

        # Group by cust_id and aggregate lookalikes
        # Use a list to select multiple columns instead of a tuple
        lookalikes_grouped = lookalikes_df.groupby('cust_id')[['lookalike_id', 'similarity_score']].apply(lambda x: x.to_dict('records')).reset_index(name='lookalikes')

        # Save to CSV
        lookalikes_grouped.to_csv(output_file, index=False)
        print(f"Lookalike data saved to {output_file}")



In [None]:
# Load data
customers_df = pd.read_csv('/content/Customers.csv')
products_df = pd.read_csv('/content/Products.csv')
transactions_df = pd.read_csv('/content/Transactions.csv')

# Initialize analyzer
analyzer = ECommerceAnalyzer(customers_df, products_df, transactions_df)

# Generate Lookalike.csv
analyzer.generate_lookalikes_csv()

Lookalike data saved to Mansi_Waghmare_Lookalike.csv
