# **BUSINESS CASE 3: Recheio Recommendation System**  


## 🎓 Master’s Program in Data Science & Advanced Analytics 
**Nova IMS** | March 2025   
**Course:** Business Cases with Data Science

## 👥 Team **Group A**  
- **Alice Viegas** | 20240572  
- **Bernardo Faria** | 20240579  
- **Dinis Pinto** | 20240612  
- **Daan van Holten** | 20240681
- **Philippe Dutranoit** | 20240518

## 📊 Project Overview  
This notebook uses the following dataset:  
- Case3_Recheio_2025 (1).xlsx <br>

The goal of the project is to design a recommendation system so that the company can propose better products to existing customers.

## 📊 Goal of the notebook

In this notebook we make recommendations for the clients for which we have no transaction information. <br>

**Table of Contents** <br>
* [1. Initial Setup and Data Loading](#setup)
* [2. Best Seller Recommendations](#best)
* [3. Export](#export)

<hr>
<a class="anchor" id="setup">

 ## 1. Initial Setup and Data Loading
 </a>

In [39]:
#Packages
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

In [None]:
clients_no_transactions = pd.read_csv("../Data/clients_without_transactions.csv")
df_transactions = pd.read_csv("../Data/df.csv")
products = pd.read_csv("../Data/products_fixed.csv")

<hr>
<a class="anchor" id="best">

 ## 2. Best Seller Recommendations
 </a>

In [42]:
# Exploring some popularity rankings for 'Best Selling Products'


df = df_transactions.copy()
df['Date'] = pd.to_datetime(df['Date'])

# Define time reference
today = df['Date'].max()
df['days_ago'] = (today - df['Date']).dt.days

# Create masks for time windows
mask_7d = df['days_ago'] <= 7
mask_14d = df['days_ago'] <= 14
mask_30d = df['days_ago'] <= 30

# Count appearances per product in each period
sales_7d = df[mask_7d]['ID Product'].value_counts().rename('sales_7d')
sales_14d = df[mask_14d]['ID Product'].value_counts().rename('sales_14d')
sales_30d = df[mask_30d]['ID Product'].value_counts().rename('sales_30d')

# Combine
ranking_df = pd.concat([sales_7d, sales_14d, sales_30d], axis=1).fillna(0)

# Add weekly/monthly hybrid count
ranking_df['hybrid_7_30'] = ranking_df['sales_7d'] * 0.6 + ranking_df['sales_30d'] * 0.4

# Testing some rankings of products
ranking_df['rank_14d'] = ranking_df['sales_14d'].rank(ascending=False, method='min')
ranking_df['rank_hybrid'] = ranking_df['hybrid_7_30'].rank(ascending=False, method='min')


print(ranking_df.sort_values('rank_hybrid').head(10))


            sales_7d  sales_14d  sales_30d  hybrid_7_30  rank_14d  rank_hybrid
ID Product                                                                    
621958         234.0      470.0       1009        544.0       1.0          1.0
879894         176.0      351.0        709        389.2       2.0          2.0
733725          72.0      170.0        344        180.8       3.0          3.0
890937          64.0      147.0        344        176.0       6.0          4.0
578318          77.0      148.0        314        171.8       4.0          5.0
915238          74.0      145.0        303        165.6       7.0          6.0
749441          69.0      148.0        310        165.4       4.0          7.0
659149          66.0      141.0        312        164.4       8.0          8.0
665738          65.0      131.0        298        158.2      11.0          9.0
370149          62.0      133.0        288        152.4      10.0         10.0


Having compared a pure 14-day window with a ***hybrid 7-/30-day approach***, we opted for the latter. 

Weighting 7-day sales at 0.6 lets us capture fast-moving items and short-term spikes (driven by promotions, seasonality, or weather), while the 0.4 weight on 30-day sales smooths out volatility and highlights consistently popular products. Because the 30-day window also contains the last 7 days, recent sales count in both terms, which tilts the score further towards recency. This blend keeps our recommendations both responsive and reliable: they react to emerging trends without sacrificing the stability of proven best-sellers, which suits a supermarket with varied, rapidly changing demand. 
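To make the effect of the blend concrete, here is a minimal sketch on made-up toy data. The helper name `hybrid_scores` and the toy transactions are illustrative assumptions, not part of the project data; the weights and column names mirror the cell above.

```python
import pandas as pd

def hybrid_scores(transactions, w_recent=0.6, w_month=0.4):
    """Per-product 7-day and 30-day counts plus their weighted blend.

    Hypothetical helper: it mirrors the logic of the cell above, but the
    function itself is not part of the original notebook.
    """
    df = transactions.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    # Days elapsed relative to the most recent transaction in the data
    days_ago = (df["Date"].max() - df["Date"]).dt.days
    sales_7d = df.loc[days_ago <= 7, "ID Product"].value_counts().rename("sales_7d")
    sales_30d = df.loc[days_ago <= 30, "ID Product"].value_counts().rename("sales_30d")
    out = pd.concat([sales_7d, sales_30d], axis=1).fillna(0)
    out["hybrid_7_30"] = w_recent * out["sales_7d"] + w_month * out["sales_30d"]
    return out.sort_values("hybrid_7_30", ascending=False)

# Toy data (invented for illustration): product 1 sells steadily over the
# month, product 2 only appears in the last two days.
toy = pd.DataFrame({
    "Date": ["2025-01-01", "2025-01-10", "2025-01-20", "2025-01-28",
             "2025-01-29", "2025-01-30", "2025-01-30"],
    "ID Product": [1, 1, 1, 1, 2, 2, 2],
})
scores = hybrid_scores(toy)
print(scores)
```

In this toy case product 1 has more sales over 30 days (4 against 3), yet product 2 ends up with the higher hybrid score because its sales are concentrated in the last week. A pure 30-day count would rank them the other way round.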

In [None]:
# Sorting products by the hybrid ranking and getting the top 10 
ranking_df.sort_values('rank_hybrid', ascending=True, inplace=True)
top10_IDs = ranking_df.head(10).index

In [47]:
top10 = products[products['ID Product'].isin(top10_IDs)]
top10

Unnamed: 0,ID Product,Product Description,ID Product Category
7,370149,CENOURA SC10KG (CAL25/40) RCH,LEGUMES FRESCOS
10,621958,LEITE MCHEF UHT M/GORDO LT,LEITE UHT REGULAR
26,578318,OLEO ALIMENTAR MCHEF 10 LT,ÓLEOS
52,879894,ACUCAR AMANH BCO PAP KG,AÇÚCAR
104,915238,DET LOICA MCHEF 10LT,DETERGENTE LOIÇA
157,665738,VINAGRE AMANHECER DE VINHO BRANCO 1000M,VINAGRES
170,890937,LIXIVIA MCHEF TRADICIONAL 5LT,LIXÍVIAS TRADICIONAIS
289,659149,FARINHA AMANH S/FERMENTO 1KG,FARINHAS
1454,749441,AGUA MINERAL CALDAS DE PENACOVA 5LT,ÁGUAS LISAS
3708,733725,OVO AGRO OVO M 15DUZ IND,OVOS GAIOLA


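For clients without any transaction history, these will be the top 10 recommended products based on overall popularity. Because the time windows are anchored to the most recent transaction date in the data, re-running the notebook on updated transactions refreshes the list to reflect the latest trends.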
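One possible way to turn this list into per-client recommendations is sketched below. It rests on assumptions: the client identifier column is called `ID Client` here (the actual column name in `clients_without_transactions.csv` is not shown in this notebook), and `recommend_for_cold_start` is a hypothetical helper. Unlike filtering with `isin`, which returns rows in the order of the product table, this version keeps the ranking order through an explicit `rank` column.

```python
import pandas as pd

def recommend_for_cold_start(clients, ranked_product_ids, products,
                             client_col="ID Client"):
    """Give every client the same ranked product list.

    Hypothetical helper; `client_col` is an assumed column name.
    """
    # Keep the ranking order explicit so it survives merges and exports
    recs = pd.DataFrame({"ID Product": list(ranked_product_ids)})
    recs["rank"] = range(1, len(recs) + 1)
    recs = recs.merge(
        products[["ID Product", "Product Description"]],
        on="ID Product", how="left",
    )
    # Cross join: one row per (client, recommended product); needs pandas >= 1.2
    out = clients[[client_col]].drop_duplicates().merge(recs, how="cross")
    return out.sort_values([client_col, "rank"]).reset_index(drop=True)

# Toy inputs: the client IDs are invented; the three products are the first
# three of the hybrid ranking shown above.
toy_clients = pd.DataFrame({"ID Client": [101, 102]})
toy_products = pd.DataFrame({
    "ID Product": [733725, 621958, 879894],
    "Product Description": [
        "OVO AGRO OVO M 15DUZ IND",
        "LEITE MCHEF UHT M/GORDO LT",
        "ACUCAR AMANH BCO PAP KG",
    ],
})
toy_ranked_ids = [621958, 879894, 733725]

client_recs = recommend_for_cold_start(toy_clients, toy_ranked_ids, toy_products)
print(client_recs)
```

In the notebook itself, the same call could take `clients_no_transactions`, `top10_IDs`, and `products` as inputs, provided the client column name is adjusted to the real one.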

<hr>
<a class="anchor" id="export">

## 3. Export
</a>

In [48]:
top10.to_csv('../Data/recommendations_clients_no_transactions.csv', index=False)