# Baseline Models

In this notebook we implement the project's baseline models and record their metrics on the same dataset that will be used for the main model, so that they can serve as a benchmark.

## Loading the data

In [51]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from utils.recomender_metrics import prepare_ground_truth, evaluate_recommendations, print_evaluation_results
from collections import defaultdict
from utils.UKnn import UserKNN, calculate_mae, calculate_rmse

In [52]:
data_folder = '../data/raw/ml-100k/'

In [53]:
r_cols = ['user_id', 'item_id', 'rating', 'timestamp']

# Read the training and testing sets
train_df = pd.read_csv(f'{data_folder}u1.base', sep='\t', names=r_cols, encoding='latin-1')
test_df = pd.read_csv(f'{data_folder}u1.test', sep='\t', names=r_cols, encoding='latin-1')

# Store ratings as (nullable) integers; a single conversion is enough
train_df['rating'] = pd.to_numeric(train_df['rating'], errors='coerce').astype('Int64')
test_df['rating'] = pd.to_numeric(test_df['rating'], errors='coerce').astype('Int64')

print("Training Data Head:")
print(train_df.head())

Training Data Head:
   user_id  item_id  rating  timestamp
0        1        1       5  874965758
1        1        2       3  876893171
2        1        3       4  878542960
3        1        4       3  876893119
4        1        5       3  889751712


In [54]:
i_cols = [
    'item_id', 'title', 'release_date', 'video_release_date', 'IMDb_URL',
    'unknown', 'Action', 'Adventure', 'Animation', 'Children\'s', 'Comedy',
    'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
    'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western'
]

movies_df = pd.read_csv(f'{data_folder}u.item', sep='|', names=i_cols, encoding='latin-1')

print("\nMovie Data Head:")
print(movies_df.head())


Movie Data Head:
   item_id              title release_date  video_release_date  \
0        1   Toy Story (1995)  01-Jan-1995                 NaN   
1        2   GoldenEye (1995)  01-Jan-1995                 NaN   
2        3  Four Rooms (1995)  01-Jan-1995                 NaN   
3        4  Get Shorty (1995)  01-Jan-1995                 NaN   
4        5     Copycat (1995)  01-Jan-1995                 NaN   

                                            IMDb_URL  unknown  Action  \
0  http://us.imdb.com/M/title-exact?Toy%20Story%2...        0       0   
1  http://us.imdb.com/M/title-exact?GoldenEye%20(...        0       1   
2  http://us.imdb.com/M/title-exact?Four%20Rooms%...        0       0   
3  http://us.imdb.com/M/title-exact?Get%20Shorty%...        0       1   
4  http://us.imdb.com/M/title-exact?Copycat%20(1995)        0       0   

   Adventure  Animation  Children's  ...  Fantasy  Film-Noir  Horror  Musical  \
0          0          1           1  ...        0          0     

In [55]:
u_cols = ['user_id', 'age', 'gender', 'occupation', 'zip_code']

users_df = pd.read_csv(f'{data_folder}u.user', sep='|', names=u_cols, encoding='latin-1')

# Display the first few rows of the user data
print("\nUser Data Head:")
print(users_df.head())


User Data Head:
   user_id  age gender  occupation zip_code
0        1   24      M  technician    85711
1        2   53      F       other    94043
2        3   23      M      writer    32067
3        4   24      M  technician    43537
4        5   33      F       other    15213


## Structures for the metrics

To compute the metrics we need some auxiliary structures, such as the `ground truth` (the relevant test items of each user) and the popularity of each item.

In [56]:
# Ground truth: for each test user, the set of items rated strictly above 4 (i.e. 5 stars)
ground_truth = test_df[test_df['rating'] > 4].groupby('user_id')['item_id'].apply(set).to_dict()

k_values = [10, 20, 50]

# Popularity is the sum of ratings, which gives more weight to well-rated items
item_popularity = train_df.groupby('item_id')['rating'].sum().to_dict()

# Build the item -> set-of-genres dictionary by iterating over the movies DataFrame
genre_cols = i_cols[5:]  # only the genre flags, not item_id/title/dates/URL
item_features = {}
for index, row in movies_df.iterrows():
    item_id = row['item_id']
    # Set of genre column names whose flag is 1
    genres = {genre for genre in genre_cols if row[genre] == 1}
    item_features[item_id] = genres

all_items = set(movies_df['item_id'])
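The cell above encodes two choices that are easy to miss: only 5-star test ratings (`rating > 4`) count as relevant, and popularity is the *sum* of ratings rather than the number of ratings. A minimal sketch on hypothetical toy data (the `train`/`test` frames are made up for illustration) shows both effects: a user without any 5-star test rating drops out of the ground truth, and an item with a single 5-star rating can outrank an item rated twice with low scores.

```python
import pandas as pd

# Hypothetical toy data in the same (user_id, item_id, rating) layout as u1.base / u1.test
train = pd.DataFrame({"user_id": [1, 1, 2, 2, 3],
                      "item_id": [10, 20, 10, 30, 20],
                      "rating":  [5, 1, 4, 5, 2]})
test = pd.DataFrame({"user_id": [1, 1, 2, 3],
                     "item_id": [30, 40, 20, 10],
                     "rating":  [5, 3, 5, 4]})

# Relevant items: only test ratings strictly above 4 (5 stars); user 3 has none and is dropped
ground_truth = test[test["rating"] > 4].groupby("user_id")["item_id"].apply(set).to_dict()

# Popularity as sum of ratings (the notebook's choice) vs. as number of ratings
pop_sum = train.groupby("item_id")["rating"].sum().to_dict()
pop_count = train.groupby("item_id")["rating"].count().to_dict()

print(ground_truth)
# Under the sum, item 30 (a single 5-star) ranks above item 20 (ratings 1 and 2)
print(sorted(pop_sum, key=pop_sum.get, reverse=True))
print(sorted(pop_count, key=pop_count.get, reverse=True))
```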

## Random model

For each user we randomly generate a list of up to 50 recommendations (excluding items already seen in training), so that the metrics can be evaluated at different cutoffs.

In [57]:
# Get all unique movie IDs
all_movie_ids = movies_df['item_id'].unique().tolist()

# Build the per-user dictionary of seen items (training data ONLY)
user_seen_items = train_df.groupby('user_id')['item_id'].apply(set).to_dict()

# List of users for whom we will generate recommendations
users_in_train = train_df['user_id'].unique().tolist()

In [58]:
print(f"Total movies: {len(all_movie_ids)}")
print(f"Total users in the training set: {len(users_in_train)}")
print(f"Movies seen by user 1: {user_seen_items[1]}") 

Total movies: 1682
Total users in the training set: 943
Movies seen by user 1: {1, 2, 3, 4, 5, 7, 8, 9, 11, 13, 15, 16, 18, 19, 21, 22, 25, 26, 28, 29, 30, 32, 34, 35, 37, 38, 40, 41, 42, 43, 45, 46, 48, 50, 52, 55, 57, 58, 59, 63, 66, 68, 71, 75, 77, 79, 83, 87, 88, 89, 93, 94, 95, 99, 101, 105, 106, 109, 110, 111, 115, 116, 119, 122, 123, 124, 126, 127, 131, 133, 135, 136, 137, 138, 139, 141, 142, 144, 146, 147, 149, 152, 153, 156, 158, 162, 165, 166, 167, 168, 169, 172, 173, 176, 178, 179, 181, 182, 187, 191, 192, 194, 195, 197, 198, 199, 203, 204, 205, 207, 211, 216, 217, 220, 223, 231, 234, 237, 238, 239, 240, 244, 245, 246, 247, 249, 251, 256, 257, 261, 263, 268, 269, 270, 271}


In [59]:
print(len(train_df), len(test_df))

80000 20000


In [60]:
def generate_random_recommendations(users_to_recommend, all_movie_ids, user_seen_items, max_recommendations=50, random_state=42):
    """
    Generate random recommendations for a list of users,
    making sure not to recommend items they have already seen.
    """
    np.random.seed(random_state)
    all_items_set = set(all_movie_ids)  # build the catalogue set once, not once per user
    recommendations = {}
    
    for user_id in users_to_recommend:
        # Items the user has already seen (from the training dictionary)
        seen_items = user_seen_items.get(user_id, set())
        
        # Candidate items: the whole catalogue minus what the user has already seen
        candidate_items = list(all_items_set - seen_items)
        
        # Number of recommendations to generate
        n_recommendations = min(max_recommendations, len(candidate_items))
        
        # If there are candidates, sample them at random without replacement
        if n_recommendations > 0:
            recommended_items = np.random.choice(candidate_items, size=n_recommendations, replace=False).tolist()
            recommendations[user_id] = recommended_items
        else:
            # Unlikely edge case: the user has already seen every item
            recommendations[user_id] = []
            
    return recommendations

In [61]:
random_recs = generate_random_recommendations(
    users_to_recommend=users_in_train,
    all_movie_ids=all_movie_ids,
    user_seen_items=user_seen_items,
    max_recommendations=50, # Generate up to 50 recommendations per user
    random_state=42
)

In [62]:
# Run evaluation
results = evaluate_recommendations(
    recommendations=random_recs,
    ground_truth=ground_truth,
    k_values=k_values,
    item_popularity=item_popularity,
    all_items=all_items,
    item_features=item_features
)

print_evaluation_results(results)


RECOMMENDATION EVALUATION RESULTS

CATALOG COVERAGE:
  @10: 0.9970
  @20: 1.0000
  @50: 1.0000

F1:
  @10: 0.0057
  @20: 0.0069
  @50: 0.0097

INTRA LIST SIMILARITY:
  @10: 0.1807
  @20: 0.1807
  @50: 0.1820

MAP:
  @10: 0.0020
  @20: 0.0017
  @50: 0.0023

MRR:
  @10: 0.0171
  @20: 0.0204
  @50: 0.0240

NDCG:
  @10: 0.4288
  @20: 0.3621
  @50: 0.2803

NOVELTY:
  @10: 12.4810
  @20: 12.4729
  @50: 12.4791

PRECISION:
  @10: 0.0069
  @20: 0.0062
  @50: 0.0060

RECALL:
  @10: 0.0048
  @20: 0.0078
  @50: 0.0247
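`utils.recomender_metrics` is not shown in this notebook, so the exact formulas behind `evaluate_recommendations` are not visible here. As a reference for reading these tables, the following sketch implements common binary-relevance definitions of the per-user ranking metrics. It is an assumption, not the project's code: details such as how AP is normalized or which users are averaged over may differ. Under these definitions the reported values would be means of the per-user scores.

```python
import math

def precision_at_k(recs, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recs[:k] if item in relevant)
    return hits / k

def recall_at_k(recs, relevant, k):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recs[:k] if item in relevant)
    return hits / len(relevant)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def average_precision_at_k(recs, relevant, k):
    """Mean of precision@rank over the ranks where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recs[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0

def reciprocal_rank_at_k(recs, relevant, k):
    """1 / rank of the first relevant item in the top-k (0 if there is none)."""
    for rank, item in enumerate(recs[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(recs, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the best achievable DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recs[:k], start=1) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical example: one user's ranked list and relevant set
recs = [3, 1, 7, 5]
relevant = {1, 5, 9}
p = precision_at_k(recs, relevant, 4)   # 2 hits out of 4 recommendations
r = recall_at_k(recs, relevant, 4)      # 2 of the 3 relevant items retrieved
print(p, r, f1_score(p, r))
print(average_precision_at_k(recs, relevant, 4), reciprocal_rank_at_k(recs, relevant, 4))
print(round(ndcg_at_k(recs, relevant, 4), 4))
```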



## Most popular items

In this case every user receives the same recommendations: the 50 most popular movies, where popularity is the sum of training ratings. Note that items a user already rated in training are not filtered out of their list.

In [63]:
# Sort the popularity dictionary and keep the 50 most popular movies
most_popular_items = sorted(item_popularity, key=item_popularity.get, reverse=True)
most_popular_items = most_popular_items[:50]

# Give every user the same list of movies
pop_recs = {user_id: most_popular_items for user_id in users_in_train}

In [64]:
# Run evaluation
results = evaluate_recommendations(
    recommendations=pop_recs,
    ground_truth=ground_truth,
    k_values=k_values,
    item_popularity=item_popularity,
    all_items=all_items,
    item_features=item_features
)

print_evaluation_results(results)


RECOMMENDATION EVALUATION RESULTS

CATALOG COVERAGE:
  @10: 0.0059
  @20: 0.0119
  @50: 0.0297

F1:
  @10: 0.0882
  @20: 0.1005
  @50: 0.0978

INTRA LIST SIMILARITY:
  @10: 0.1497
  @20: 0.1935
  @50: 0.1776

MAP:
  @10: 0.0548
  @20: 0.0562
  @50: 0.0660

MRR:
  @10: 0.2258
  @20: 0.2377
  @50: 0.2420

NDCG:
  @10: 0.5616
  @20: 0.4781
  @50: 0.4241

NOVELTY:
  @10: 7.5304
  @20: 7.7083
  @50: 8.0123

PRECISION:
  @10: 0.0770
  @20: 0.0698
  @50: 0.0578

RECALL:
  @10: 0.1032
  @20: 0.1793
  @50: 0.3175
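Catalog coverage can be sanity-checked by hand for this baseline. Assuming coverage is the fraction of the 1682-item catalog that appears in at least one user's top-k list, recommending the same 50 items to all 943 users gives 10/1682, 20/1682 and 50/1682, which matches the 0.0059, 0.0119 and 0.0297 reported above. A small sketch (the function name and stand-in data are hypothetical):

```python
def catalog_coverage_at_k(recommendations, all_items, k):
    """Fraction of the catalogue that appears in at least one user's top-k list."""
    catalog = set(all_items)
    recommended = set()
    for items in recommendations.values():
        recommended.update(items[:k])
    return len(recommended & catalog) / len(catalog)

catalog = range(1, 1683)                      # 1682 item ids, as in ml-100k
same_top_50 = list(range(1, 51))              # stand-in for most_popular_items
pop_like_recs = {user_id: same_top_50 for user_id in range(1, 944)}  # 943 users

for k in (10, 20, 50):
    print(k, round(catalog_coverage_at_k(pop_like_recs, catalog, k), 4))
# 10 0.0059
# 20 0.0119
# 50 0.0297
```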



## User-based KNN

As our informed baseline we choose user-based KNN, under the assumption that we already have plenty of data about our users and that our goal is now to recommend relevant movies to them.
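The `UserKNN` class lives in `utils.UKnn` and is not shown here. The following is a minimal sketch of the usual user-based KNN rating prediction, assuming cosine similarity over rating vectors (missing ratings treated as 0), the `k` most similar users who rated the target item, and a fallback to the user's mean rating when no neighbour rated the item (the same fallback that the "Item … unknown, returning user mean" log lines describe). Function names and toy data are hypothetical.

```python
import math

def cosine_similarity(u_ratings, v_ratings):
    """Cosine similarity between two users' rating dicts (missing ratings count as 0)."""
    common = set(u_ratings) & set(v_ratings)
    dot = sum(u_ratings[i] * v_ratings[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in u_ratings.values()))
    norm_v = math.sqrt(sum(r * r for r in v_ratings.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_rating(ratings, user, item, k=7):
    """Similarity-weighted average of the k most similar users who rated `item`."""
    user_ratings = ratings[user]
    user_mean = sum(user_ratings.values()) / len(user_ratings)
    # Candidate neighbours: every other user who has rated the target item
    candidates = [
        (cosine_similarity(user_ratings, other), other[item])
        for other_id, other in ratings.items()
        if other_id != user and item in other
    ]
    if not candidates:
        return user_mean  # unknown item: fall back to the user's mean rating
    neighbours = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    weight_sum = sum(abs(sim) for sim, _ in neighbours)
    if weight_sum == 0:
        return user_mean
    return sum(sim * r for sim, r in neighbours) / weight_sum

# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "b": 3, "c": 4},
    "u3": {"a": 1, "c": 2},
}
print(predict_rating(ratings, "u1", "c", k=2))  # between u3's 2 and u2's 4, closer to u2
print(predict_rating(ratings, "u1", "z"))       # unknown item -> u1's mean rating, 4.0
```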

In [65]:
train_df_str = train_df.copy()
train_df_str['user_id'] = train_df_str['user_id'].astype(str)
train_df_str['item_id'] = train_df_str['item_id'].astype(str)

test_df_str = test_df.copy()
test_df_str['user_id'] = test_df_str['user_id'].astype(str)
test_df_str['item_id'] = test_df_str['item_id'].astype(str)

trainset = [tuple(row) for row in train_df_str[['user_id', 'item_id', 'rating']].values]
testset = [tuple(row) for row in test_df_str[['user_id', 'item_id', 'rating']].values]

In [66]:
myUserKnn = UserKNN(k=7, similarity='cosine')
myUserKnn.fit(trainset)

In [67]:
# Predictions for the whole test set
predictions = myUserKnn.predict_all(testset)
rmse = calculate_rmse(predictions)
mae = calculate_mae(predictions)
print(f"RMSE: {rmse:.4f}, MAE: {mae:.4f}")

Item 599 unknown, returning user mean.
Item 711 unknown, returning user mean.
Item 814 unknown, returning user mean.
Item 830 unknown, returning user mean.
Item 852 unknown, returning user mean.
Item 857 unknown, returning user mean.


Item 1156 unknown, returning user mean.
Item 1236 unknown, returning user mean.
Item 1309 unknown, returning user mean.
Item 1310 unknown, returning user mean.
Item 1320 unknown, returning user mean.
Item 1343 unknown, returning user mean.
Item 1348 unknown, returning user mean.
Item 1364 unknown, returning user mean.
Item 1373 unknown, returning user mean.
Item 1457 unknown, returning user mean.
Item 1458 unknown, returning user mean.
Item 1492 unknown, returning user mean.
Item 1493 unknown, returning user mean.
Item 1498 unknown, returning user mean.
Item 1505 unknown, returning user mean.
Item 1520 unknown, returning user mean.
Item 1533 unknown, returning user mean.
Item 1536 unknown, returning user mean.
Item 1543 unknown, returning user mean.
Item 1557 unknown, returning user mean.
Item 1561 unknown, returning user mean.
Item 1562 unknown, returning user mean.
Item 1563 unknown, returning user mean.
Item 1565 unknown, returning user mean.
Item 1582 unknown, returning user mean.
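For reference, `calculate_rmse` and `calculate_mae` presumably follow the standard definitions. Since their input format is not shown, this sketch assumes a plain list of `(true_rating, predicted_rating)` pairs. RMSE squares the errors before averaging, so it penalizes large misses more heavily and is always at least as large as MAE.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true, predicted) pairs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true, predicted) pairs."""
    return sum(abs(t - p) for t, p in pairs) / len(pairs)

pairs = [(4, 3.5), (2, 3.0), (5, 5.0), (3, 4.0)]  # hypothetical predictions
print(rmse(pairs), mae(pairs))  # 0.75 0.625
```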


In [None]:
users_in_train = train_df['user_id'].astype(str).unique().tolist()
top_n_all = myUserKnn.get_top_n(user_ids=users_in_train, n=50)

In [None]:
# Convert top_n to the expected format (item_ids only) and keys to int
uknn_recs = {
    int(uid): [int(item_id) for item_id, _ in items]
    for uid, items in top_n_all.items()
}


In [None]:
results = evaluate_recommendations(
    recommendations=uknn_recs,
    ground_truth=ground_truth,
    k_values=k_values,
    item_popularity=item_popularity,
    all_items=all_items,
    item_features=item_features
)

print_evaluation_results(results)


RECOMMENDATION EVALUATION RESULTS

CATALOG COVERAGE:
  @10: 0.1367
  @20: 0.2111
  @50: 0.3502

F1:
  @10: 0.0180
  @20: 0.0303
  @50: 0.0710

INTRA LIST SIMILARITY:
  @10: 0.3709
  @20: 0.3400
  @50: 0.2392

MAP:
  @10: 0.0158
  @20: 0.0126
  @50: 0.0142

MRR:
  @10: 0.1080
  @20: 0.1198
  @50: 0.1300

NDCG:
  @10: 0.5182
  @20: 0.4280
  @50: 0.3553

NOVELTY:
  @10: 13.2394
  @20: 13.1858
  @50: 12.2810

PRECISION:
  @10: 0.0410
  @20: 0.0442
  @50: 0.0608

RECALL:
  @10: 0.0115
  @20: 0.0231
  @50: 0.0853



## DeepFM

As a hybrid model we use DeepFM, which combines Factorization Machines (FM) and Deep Neural Networks (DNN). The FM component captures simple (pairwise) interactions between users and items, while the DNN learns complex, non-linear patterns. Because both components share the same embeddings, DeepFM models both levels (simple and complex patterns), which should help limit overfitting on a dataset like ours.
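The FM component's pairwise term, the sum over field pairs of the dot products of their embeddings, can be computed in linear time with the identity sum_{i<j} <v_i, v_j> = 0.5 * sum_f [(sum_i v_if)^2 - sum_i v_if^2]. The numpy sketch below checks that identity against the explicit double loop, assuming one active (one-hot) feature per field, e.g. user, item and a genre field, so each field contributes a single embedding vector. It is not the `utils.DeepFM` implementation; only `embedding_dim=16` is taken from the notebook, everything else is illustrative. The last lines show the shared-embedding idea: the deep branch consumes the same vectors, concatenated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fields, embedding_dim = 3, 16  # e.g. user, item and a genre field (illustrative)

# One embedding vector per field for a single example.
# With one-hot inputs x_i = 1, so each row already equals v_i * x_i.
V = rng.normal(scale=0.1, size=(n_fields, embedding_dim))

def fm_pairwise_fast(V):
    """sum_{i<j} <v_i, v_j> via 0.5 * sum_f[(sum_i v_if)^2 - sum_i v_if^2], O(n*k)."""
    square_of_sum = V.sum(axis=0) ** 2
    sum_of_squares = (V ** 2).sum(axis=0)
    return 0.5 * float((square_of_sum - sum_of_squares).sum())

def fm_pairwise_naive(V):
    """Explicit sum of dot products over all field pairs, O(n^2*k)."""
    n = V.shape[0]
    return float(sum(V[i] @ V[j] for i in range(n) for j in range(i + 1, n)))

print(np.isclose(fm_pairwise_fast(V), fm_pairwise_naive(V)))  # True

# Shared embeddings: the DNN branch takes the same vectors, concatenated
dnn_input = V.reshape(-1)
print(dnn_input.shape)  # (48,)
```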


In [None]:
from utils.DeepFM import DeepFM


In [None]:
deepfm_model = DeepFM(
    embedding_dim=16,
    dnn_hidden_units=(128, 64),
    dnn_dropout=0.2,
    learning_rate=0.001,
    epochs=10,
    batch_size=2048,
)


In [None]:
# Keep only test rows whose item also appears in train
train_items = set(train_df['item_id'].unique())
test_df_filtered = test_df[test_df['item_id'].isin(train_items)].copy()

history = deepfm_model.fit(train_df, val_df=test_df_filtered)



cpu
Train on 80000 samples, validate on 19968 samples, 40 steps per epoch
Epoch 1/10
1s - loss:  0.6845 - auc:  0.6259 - val_auc:  0.7576
Epoch 2/10
0s - loss:  0.6461 - auc:  0.7674 - val_auc:  0.7630
Epoch 3/10
0s - loss:  0.5911 - auc:  0.7794 - val_auc:  0.7706
Epoch 4/10
0s - loss:  0.5702 - auc:  0.7894 - val_auc:  0.7742
Epoch 5/10
0s - loss:  0.5589 - auc:  0.7934 - val_auc:  0.7750
Epoch 6/10
0s - loss:  0.5524 - auc:  0.7945 - val_auc:  0.7767
Epoch 7/10
1s - loss:  0.5484 - auc:  0.7968 - val_auc:  0.7772
Epoch 8/10
1s - loss:  0.5459 - auc:  0.7979 - val_auc:  0.7773
Epoch 9/10
1s - loss:  0.5441 - auc:  0.7979 - val_auc:  0.7779
Epoch 10/10
0s - loss:  0.5425 - auc:  0.7964 - val_auc:  0.7791


In [None]:
# Convert user_ids to strings for consistency
users_for_deepfm = train_df['user_id'].astype(str).unique().tolist()

# Build a DataFrame with every unseen user-item combination
user_seen_items_int = train_df.groupby('user_id')['item_id'].apply(set).to_dict()
all_items_list = train_df['item_id'].unique().tolist()

test_pairs = []
for user_id in train_df['user_id'].unique():
    seen = user_seen_items_int.get(user_id, set())
    unseen = [item for item in all_items_list if item not in seen]
    for item_id in unseen:
        test_pairs.append({'user_id': user_id, 'item_id': item_id, 'rating': 0})

test_for_pred = pd.DataFrame(test_pairs)

# Get the top-N recommendations
deepfm_top_n = deepfm_model.get_top_n(test_for_pred, n=50)
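The double loop above creates one Python dict per unseen (user, item) pair, which is slow for 943 users times the number of training items. An equivalent vectorized sketch with pandas, assuming pandas 1.2 or later for `how="cross"`: build every user-item pair, then keep only those absent from training via an anti-join with `indicator=True`. It is shown on hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy training data
train = pd.DataFrame({"user_id": [1, 1, 2], "item_id": [10, 20, 10], "rating": [5, 3, 4]})

users = pd.DataFrame({"user_id": train["user_id"].unique()})
items = pd.DataFrame({"item_id": train["item_id"].unique()})

# Every (user, item) combination (requires pandas >= 1.2 for how="cross")
all_pairs = users.merge(items, how="cross")

# Anti-join: keep only the pairs that never appear in training
seen = train[["user_id", "item_id"]].drop_duplicates()
marked = all_pairs.merge(seen, on=["user_id", "item_id"], how="left", indicator=True)
test_for_pred = (
    marked.loc[marked["_merge"] == "left_only", ["user_id", "item_id"]]
    .reset_index(drop=True)
    .assign(rating=0)  # dummy rating column, as in the loop version
)
print(test_for_pred)
```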



In [None]:
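# Keep only the item IDs (deepfm_top_n holds (item_id, score) pairs) and convert keys to int
deepfm_recs = {
    int(uid): [int(float(item_id)) for item_id, _ in items]
    for uid, items in deepfm_top_n.items()
}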


In [None]:
results_deepfm = evaluate_recommendations(
    recommendations=deepfm_recs,
    ground_truth=ground_truth,
    k_values=k_values,
    item_popularity=item_popularity,
    all_items=all_items,
    item_features=item_features
)

print_evaluation_results(results_deepfm)



RECOMMENDATION EVALUATION RESULTS

CATALOG COVERAGE:
  @10: 0.0161
  @20: 0.0297
  @50: 0.0672

F1:
  @10: 0.0234
  @20: 0.0473
  @50: 0.0748

INTRA LIST SIMILARITY:
  @10: 0.2713
  @20: 0.2677
  @50: 0.2192

MAP:
  @10: 0.0142
  @20: 0.0162
  @50: 0.0149

MRR:
  @10: 0.0659
  @20: 0.0776
  @50: 0.0858

NDCG:
  @10: 0.4088
  @20: 0.3864
  @50: 0.3432

NOVELTY:
  @10: 11.0536
  @20: 10.7322
  @50: 11.1388

PRECISION:
  @10: 0.0566
  @20: 0.0696
  @50: 0.0617

RECALL:
  @10: 0.0147
  @20: 0.0358
  @50: 0.0950




## Hybrid ensemble

We implement a hybrid ensemble that combines the User-KNN and DeepFM recommendations by weighting their scores. User-KNN contributes interpretability and diversity, while DeepFM contributes precision by capturing non-linear interactions. With a 70/30 weighting (70% DeepFM, 30% User-KNN) we seek a balance between coverage and accuracy, building on the individual strengths of each model.
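`HybridEnsemble` (from `utils.ensemble`) is not shown, so how its `'weighted'` strategy combines scores is not visible here. One common approach, sketched below as an assumption, is to min-max normalize each model's scores per user (the two models' scores need not share a scale: User-KNN predicts ratings, while this DeepFM appears to be trained with a binary objective, judging by its AUC log), take the weighted sum, give 0 to items missing from one model's list, and keep the top `n`. Function names and data are hypothetical.

```python
def minmax_normalize(scored_items):
    """Map a list of (item, score) pairs to {item: score scaled to [0, 1]}."""
    if not scored_items:
        return {}
    scores = [s for _, s in scored_items]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return {item: 1.0 for item, _ in scored_items}
    return {item: (s - lo) / (hi - lo) for item, s in scored_items}

def weighted_fusion(recs_a, recs_b, weights=(0.7, 0.3), n=50):
    """Weighted sum of per-user normalized scores; a missing item scores 0 in that model."""
    fused = {}
    for user in set(recs_a) | set(recs_b):
        a = minmax_normalize(recs_a.get(user, []))
        b = minmax_normalize(recs_b.get(user, []))
        combined = {
            item: weights[0] * a.get(item, 0.0) + weights[1] * b.get(item, 0.0)
            for item in set(a) | set(b)
        }
        # Sort by descending score, breaking ties by item id for determinism
        fused[user] = sorted(combined, key=lambda item: (-combined[item], item))[:n]
    return fused

# Hypothetical scores: DeepFM-like probabilities and User-KNN-like predicted ratings
deepfm_like = {1: [(10, 0.9), (20, 0.6), (30, 0.3)]}
uknn_like = {1: [(30, 4.8), (40, 4.2), (10, 3.6)]}

print(weighted_fusion(deepfm_like, uknn_like, weights=(0.7, 0.3))[1])  # [10, 20, 30, 40]
print(weighted_fusion(deepfm_like, uknn_like, weights=(0.3, 0.7))[1])  # [30, 40, 10, 20]
```

Swapping the weights reorders the list toward the User-KNN favourites, which is the lever the 70/30 choice controls.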


In [None]:
from utils.ensemble import HybridEnsemble


In [None]:
# User-KNN already has scores in top_n_all
uknn_with_scores = {
    int(uid): [(int(item_id), score) for item_id, score in items]
    for uid, items in top_n_all.items()
}

# DeepFM also has scores in deepfm_top_n
deepfm_with_scores = {
    int(uid): [(int(float(item_id)), score) for item_id, score in items]
    for uid, items in deepfm_top_n.items()
}


In [None]:
ensemble_weighted = HybridEnsemble(strategy='weighted', weights=[0.7, 0.3])
weighted_recs = ensemble_weighted.combine(deepfm_with_scores, uknn_with_scores, n=50)

results_weighted = evaluate_recommendations(
    recommendations=weighted_recs,
    ground_truth=ground_truth,
    k_values=k_values,
    item_popularity=item_popularity,
    all_items=all_items,
    item_features=item_features
)

print("=== ENSEMBLE WEIGHTED (70% DeepFM, 30% User-KNN) ===")
print_evaluation_results(results_weighted)


=== ENSEMBLE WEIGHTED (70% DeepFM, 30% User-KNN) ===

RECOMMENDATION EVALUATION RESULTS

CATALOG COVERAGE:
  @10: 0.0404
  @20: 0.1439
  @50: 0.3502

F1:
  @10: 0.0287
  @20: 0.0532
  @50: 0.0710

INTRA LIST SIMILARITY:
  @10: 0.2868
  @20: 0.2693
  @50: 0.2392

MAP:
  @10: 0.0177
  @20: 0.0173
  @50: 0.0153

MRR:
  @10: 0.1045
  @20: 0.1196
  @50: 0.1238

NDCG:
  @10: 0.4332
  @20: 0.3909
  @50: 0.3689

NOVELTY:
  @10: 11.7159
  @20: 11.6239
  @50: 12.2810

PRECISION:
  @10: 0.0638
  @20: 0.0708
  @50: 0.0608

RECALL:
  @10: 0.0185
  @20: 0.0426
  @50: 0.0853

