## **BUSINESS CASE 3: Recheio Recommendation System**  


## 🎓 Master’s Program in Data Science & Advanced Analytics 
**Nova IMS** | March 2025   
**Course:** Business Cases with Data Science

## 👥 Team **Group A**  
- **Alice Viegas** | 20240572  
- **Bernardo Faria** | 20240579  
- **Dinis Pinto** | 20240612  
- **Daan van Holten** | 20240681
- **Philippe Dutranoit** | 20240518

## 📊 Project Overview  
This notebook uses the `Case3_Recheio_2025 (1).xlsx` dataset to build a recommendation system that helps Recheio suggest more relevant products to existing customers.

It addresses two key challenges:

- How to enrich customer data for more accurate recommendations.
- How to deliver relevant suggestions across available sales channels.

## 📊 Goal of the notebook

In this notebook, we develop product recommendations for the checkout stage. These recommendations address the "Did You Forget?" problem by suggesting additional products based on the items already in the customer's basket, using product similarity and historical purchasing habits.

# Imports

In [78]:
import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances

In [79]:
transactions = pd.read_csv('../Data/df.csv')   
clients_with_transactions = pd.read_csv('../Data/clients_with_transactions.csv')
clients_without_transactions = pd.read_csv('../Data/clients_without_transactions.csv')
products = pd.read_csv("../Data/products_fixed.csv")

# 'Did you forget?' pipeline 

The solution to the "Did You Forget?" problem is designed for the checkout stage of the customer journey, when the client already has products in their basket and is about to pay. For this, we created a function that behaves differently depending on whether the client has transaction history:

- For **clients with past transactions**, we use item-based collaborative filtering with the Dice similarity metric. The function analyzes the client's previous purchases to identify products frequently bought together. Based on the items currently in the basket, it recommends additional products that tend to co-occur with them.

- For **clients without past transactions**, the function compares the current basket to the purchase histories of existing clients, again using the Dice similarity metric. It identifies the five most similar clients and recommends the products they buy most often that are not already in the basket.

In both cases, own-brand products receive a 50% score boost, because increasing sales of these items is a priority for the company.
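
To make the similarity metric concrete, the following minimal sketch computes the Dice coefficient, 2|A ∩ B| / (|A| + |B|), for products represented by the sets of transactions in which they appear. The helper name `dice_similarity` and the toy transaction IDs are illustrative assumptions and are not part of the notebook's pipeline.

```python
def dice_similarity(a, b):
    """Dice coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical data: the transactions in which each product appears
milk = {"t1", "t2", "t3"}
sugar = {"t2", "t3", "t4"}
oil = {"t5"}

print(dice_similarity(milk, sugar))  # two shared transactions out of 3 + 3
print(dice_similarity(milk, oil))    # no shared transactions
```

Products that always appear together score 1, and products that never share a transaction score 0.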

In [80]:
def dyf_with_history(client_id, current_basket, df=transactions):
    """
    Generate 'Did You Forget?' product recommendations at checkout for clients with past transactions.

    This function uses item-based collaborative filtering with the Dice similarity metric. It boosts
    recommendations for products that are own brand by 50%. 
    It identifies products frequently bought together in a client’s transaction history and 
    recommends items related to those currently in the basket.

    Parameters:
    - client_id: ID of the client
    - current_basket: List of product IDs currently in the basket
    - df: DataFrame of past transactions (default is 'transactions')

    Returns:
    - List of up to 5 recommended product IDs
    """
    
    # Get client's past transactions
    client_data = df[df['Client ID'] == client_id]
    
    # Create a binary matrix: rows = transactions, cols = products
    transaction_product = pd.crosstab(client_data['Date'], client_data['ID Product'])
    # Binarize counts (vectorized; avoids the deprecated DataFrame.applymap)
    transaction_product = (transaction_product > 0).astype(int)

    if transaction_product.shape[1] <= 1:
        return []  # Not enough data to compute similarities

    # Transpose for item-item similarity
    product_matrix = transaction_product.T

    # Compute Dice similarity
    similarity = 1 - pairwise_distances(product_matrix.values, metric='dice')
    similarity_df = pd.DataFrame(similarity, index=product_matrix.index, columns=product_matrix.index)

    # Get whether each product is own brand
    own_brand_map = df[['ID Product', 'Own Brand']].drop_duplicates().set_index('ID Product')['Own Brand'].to_dict()

    # Aggregate similarity scores for products in current basket
    scored_items = {}
    for item in current_basket:
        if item in similarity_df.columns:
            similar_scores = similarity_df[item]
            for prod, score in similar_scores.items():
                if prod not in current_basket and prod != item:
                    boost = 1.5 if own_brand_map.get(prod, 0) == 1 else 1                   # 50% boost for own brand
                    scored_items[prod] = scored_items.get(prod, 0) + score * boost

    # Sort and return top 5 items
    sorted_items = sorted(scored_items.items(), key=lambda x: x[1], reverse=True)
    recommendations = [item for item, score in sorted_items[:5]]
    return recommendations
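
The aggregation and boosting step can be illustrated on a toy example. The sketch below assumes a hypothetical similarity table `sim` and hypothetical own-brand flags `own_brand`; all values are invented for illustration and do not come from the Recheio data.

```python
# Hypothetical similarities between basket items (A, B) and candidates (C, D)
sim = {
    ("A", "C"): 0.50, ("A", "D"): 0.25,
    ("B", "C"): 0.20, ("B", "D"): 0.25,
}
own_brand = {"C": 0, "D": 1}  # hypothetical flags: only D is own brand

basket = ["A", "B"]
scores = {}
for (in_basket, candidate), s in sim.items():
    if in_basket in basket and candidate not in basket:
        # Own-brand candidates get a 1.5x multiplier, others 1.0x
        boost = 1.5 if own_brand.get(candidate, 0) == 1 else 1.0
        scores[candidate] = scores.get(candidate, 0.0) + s * boost

ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

In this toy case, D would rank below C on raw similarity alone (0.50 against 0.70), and the own-brand multiplier lifts it to 0.75, which moves it to first place.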



In [81]:
def dyf_without_history(current_basket, df=transactions):
    """
    Generate 'Did You Forget?' product recommendations for new clients with no transaction history.

    This function uses user-based collaborative filtering based on the current basket.
    It compares the basket to the purchase histories of existing clients using the Dice similarity metric,
    finds the five most similar clients, and recommends popular items from their purchases
    that are not yet in the current basket. It also boosts own-brand items by 50%.

    Parameters:
    - current_basket: List of product IDs currently in the basket
    - df: DataFrame of past transactions (default is 'transactions')

    Returns:
    - List of up to 5 recommended product IDs
    """
    # Create client-product binary matrix
    client_product = pd.crosstab(df['Client ID'], df['ID Product'])
    # Binarize counts (vectorized; avoids the deprecated DataFrame.applymap)
    client_product = (client_product > 0).astype(int)

    # Create a binary vector for the current basket
    all_products = client_product.columns
    basket_vector = pd.Series(0, index=all_products)
    for prod in current_basket:
        if prod in basket_vector.index:
            basket_vector[prod] = 1
    basket_vector = basket_vector.values.reshape(1, -1)
    
    # Compute Dice similarity to all existing clients
    distances = pairwise_distances(basket_vector, client_product.values, metric='dice')[0]
    most_similar_indices = distances.argsort()[:5]
    similar_clients = client_product.index[most_similar_indices]
    
    # Aggregate products bought by similar clients
    similar_purchases = df[df['Client ID'].isin(similar_clients)]['ID Product']
    # Filter out items already in basket
    recommendations = similar_purchases[~similar_purchases.isin(current_basket)]

    # Merge with own brand info
    merged = pd.DataFrame(recommendations, columns=['ID Product'])
    merged = merged.merge(df[['ID Product', 'Own Brand']].drop_duplicates(), on='ID Product', how='left')

    # Calculate scores with a boost for own brand
    # Each purchase row scores 1.0, or 1.5 if the product is own brand (50% boost)
    merged['score'] = 1 + 0.5 * merged['Own Brand'].fillna(0)
    top_recommendations = (
        merged.groupby('ID Product')['score']
        .sum()
        .sort_values(ascending=False)
        .head(5)
        .index
        .tolist()
    )

    return top_recommendations
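
The nearest-client lookup can be sketched on a small example. This is a minimal illustration using plain NumPy rather than `pairwise_distances`; the client IDs, the matrix, and the choice of two neighbours are hypothetical.

```python
import numpy as np

# Hypothetical client-product matrix: rows = clients, columns = products P0..P3
client_ids = np.array([101, 102, 103])
client_product = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
], dtype=bool)

basket = np.array([1, 1, 0, 0], dtype=bool)  # current basket holds P0 and P1

# Dice distance = 1 - 2|A ∩ B| / (|A| + |B|), computed for every client row
overlap = (client_product & basket).sum(axis=1)
sizes = client_product.sum(axis=1) + basket.sum()
dice_distance = 1 - 2 * overlap / sizes

# Keep the two clients with the smallest distance to the basket
nearest = client_ids[np.argsort(dice_distance, kind="stable")[:2]]
print(dice_distance)
print(nearest)
```

Client 101 bought exactly the basket's products, so its distance is 0; client 103 shares nothing with the basket, so its distance is 1 and it is not selected.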


In [82]:
def did_you_forget(client_id, current_basket, df=transactions):
    """
    Main 'Did You Forget?' recommendation function that returns a DataFrame
    with product metadata instead of just product IDs.
    """
    if client_id in clients_with_transactions['Client ID'].values:
        recommended_ids = dyf_with_history(client_id, current_basket, df)
    elif client_id in clients_without_transactions['Client ID'].values:
        recommended_ids = dyf_without_history(current_basket, df)
    else:
        return pd.DataFrame()  # Return empty DataFrame if client is unknown

    # Filter the products DataFrame to get info for the recommended IDs
    recommended_df = products[products['ID Product'].isin(recommended_ids)]

    return recommended_df
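
Note that filtering with `isin` returns rows in the order of the `products` table, not in ranking order. If the ranking should be preserved for display, one possible approach is sketched below; the `catalogue` DataFrame and `ranked_ids` list are hypothetical stand-ins, and this is not the notebook's implementation.

```python
import pandas as pd

# Hypothetical catalogue and ranked recommendation list (best first)
catalogue = pd.DataFrame({
    "ID Product": [10, 20, 30, 40],
    "Product Description": ["rice", "oil", "sugar", "milk"],
})
ranked_ids = [30, 10, 40]

# isin() keeps the catalogue's row order
by_isin = catalogue[catalogue["ID Product"].isin(ranked_ids)]

# Indexing by product ID and selecting with .loc keeps the ranked order
by_rank = catalogue.set_index("ID Product").loc[ranked_ids].reset_index()

print(by_isin["ID Product"].tolist())
print(by_rank["ID Product"].tolist())
```

The `.loc` lookup raises a `KeyError` if a recommended ID is missing from the catalogue, so this variant assumes every recommended product has a metadata row.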



## Examples of usage

In [85]:
# Client with history:

current_basket = [621958, 906800, 56019, 224780, 46551]
did_you_forget(210100263, current_basket)



| Unnamed: 0 | ID Product | Product Description | ID Product Category | Own Brand |
|---|---|---|---|---|
| 49 | 26322 | LEITE UHT MGORDO PARMALAT 1LT | LEITE UHT REGULAR | 0 |
| 250 | 101583 | CALDOS CARNE KNORR 1KG | CONDIMENTOS | 0 |
| 1087 | 662082 | CARAMELO LÍQUIDO MCHEF 1,3KG | PRODUTOS PARA DOÇARIA | 1 |
| 1127 | 667277 | VINAGRE MCHEF V.BRANCO PET 2LT | VINAGRES | 1 |
| 1525 | 735588 | MOLHO PIZZA GULOSO SACO 3KG -BRIX 10/12 | CONSERVAS VEGETAIS | 0 |


In [86]:
# Client without history:

did_you_forget(210100016, current_basket) 



| Unnamed: 0 | ID Product | Product Description | ID Product Category | Own Brand |
|---|---|---|---|---|
| 252 | 101590 | CALDOS CARNE KNORR FRASCO 960GR | CONDIMENTOS | 0 |
| 282 | 104671 | ARROZ CAÇAROLA AGULHA 1 KG | ARROZ | 0 |
| 907 | 578318 | OLEO ALIMENTAR MCHEF 10 LT | ÓLEOS | 1 |
| 1659 | 757619 | GEL.CAT.GOURMES BAUNILHA 4,5LT | GELADOS CATERING | 1 |
| 2452 | 879894 | ACUCAR AMANH BCO PAP KG | AÇÚCAR | 0 |
