# Third Deliverable

Team members:
- Araoz, Tania
- Bajo, Pablo
- Barrera, Manuel

## Loading the libraries

In [59]:
import pandas as pd
from datetime import datetime
from scipy.sparse import csr_matrix
from lightfm import LightFM
import numpy as np
from sklearn.model_selection import GridSearchCV

## Loading the datasets

In [60]:
movies = pd.read_csv("../data/ml-latest/movies.csv")
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


Ratings dataset with the user-movie interactions

In [61]:
ratings = pd.read_csv("../data/ml-latest/ratings.csv").sample(300000, random_state=42).sort_values(by='timestamp', ascending=True).reset_index(drop=True)
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,86847,34,5.0,822873600
1,268919,23,2.0,823185233
2,25445,52,4.0,823255313
3,80271,70,3.0,823264587
4,311037,74,4.0,823867612


> The dataset is very large, so we cut it down to 300,000 records. We use the pandas `sample` method with a fixed `random_state` so that we always get the same subset.

In [62]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   userId     300000 non-null  int64  
 1   movieId    300000 non-null  int64  
 2   rating     300000 non-null  float64
 3   timestamp  300000 non-null  int64  
dtypes: float64(1), int64(3)
memory usage: 9.2 MB


> The dataset contains 300,000 interactions.

> The timestamp is stored as int64, so it has to be converted to a date format before we can work with it.

In [63]:
ratings.isna().sum()

userId       0
movieId      0
rating       0
timestamp    0
dtype: int64

> There are no null values.

In [64]:
ratings['userId'].nunique()

119409

> The dataset has 119,409 users.

In [65]:
ratings['movieId'].nunique()

17006

> The dataset contains ratings for 17,006 movies.

In [66]:
ratings['rating'].sort_values(ascending=True).unique()

array([0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])

> Ratings range from 0.5 to 5 in steps of 0.5.

## Preprocessing

We convert the numeric timestamp to a date format.

In [67]:
ratings["timestamp"] = ratings["timestamp"].apply(lambda x: datetime.utcfromtimestamp(x).strftime('%Y/%m/%d'))

In [68]:
ratings["timestamp"]

0         1996/01/29
1         1996/02/01
2         1996/02/02
3         1996/02/02
4         1996/02/09
             ...    
299995    2023/07/19
299996    2023/07/19
299997    2023/07/20
299998    2023/07/20
299999    2023/07/20
Name: timestamp, Length: 300000, dtype: object

> The values look like dates, but the column dtype is still object.

Using pandas, we convert the column to a datetime type that supports filtering.

In [69]:
ratings["timestamp"] = pd.to_datetime(ratings['timestamp'], format='%Y/%m/%d')

In [70]:
ratings["timestamp"]

0        1996-01-29
1        1996-02-01
2        1996-02-02
3        1996-02-02
4        1996-02-09
            ...    
299995   2023-07-19
299996   2023-07-19
299997   2023-07-20
299998   2023-07-20
299999   2023-07-20
Name: timestamp, Length: 300000, dtype: datetime64[ns]

> The column now has dtype datetime64.
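
The two conversion steps above (integer to string, then string to datetime) can probably be collapsed into a single vectorized call. This is a minimal sketch, assuming the timestamps are Unix seconds in UTC, as the use of `utcfromtimestamp` suggests. The `raw` series is built only for illustration from the first rows shown earlier.

```python
import pandas as pd

# First three raw timestamps from the ratings sample (assumed to be Unix seconds)
raw = pd.Series([822873600, 823185233, 823255313])

# unit="s" reads the integers as seconds since 1970-01-01 UTC;
# normalize() drops the time of day, matching the '%Y/%m/%d' truncation
dates = pd.to_datetime(raw, unit="s").dt.normalize()
print(dates)
```

The result is already a datetime64 column, so the intermediate string column is not needed.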

In [71]:
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,86847,34,5.0,1996-01-29
1,268919,23,2.0,1996-02-01
2,25445,52,4.0,1996-02-02
3,80271,70,3.0,1996-02-02
4,311037,74,4.0,1996-02-09


> We end up with a dataset whose date column is easy to manipulate.

We check the date range of the dataset.

In [72]:
ratings.timestamp.min()

Timestamp('1996-01-29 00:00:00')

In [73]:
ratings.timestamp.max()

Timestamp('2023-07-20 00:00:00')

> The dates range from 1996-01-29 to 2023-07-20.

### Splitting the dataset into train, test and validation
We look at the number of ratings per year.

In [74]:
plot_df = ratings.copy()
plot_df["year"] = ratings.timestamp.dt.year
plot_df = plot_df.groupby("year", as_index=False).count()[["year", "userId"]]
plot_df.columns = ["year", "reviews_count"]
plot_df.head(25)

Unnamed: 0,year,reviews_count
0,1996,15406
1,1997,6824
2,1998,2927
3,1999,10944
4,2000,18405
5,2001,10987
6,2002,8038
7,2003,9622
8,2004,10693
9,2005,16230


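> We split the dataset into train and test by date, cutting at 2017-01-01 and aiming for roughly 80/20. With this cutoff the actual proportion is about 72/28 (217,239 vs. 82,761 interactions).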
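
If the goal is a split closer to 80/20, one option is to derive the cutoff from the data instead of fixing a calendar date. The sketch below uses a hypothetical helper, `temporal_split`, and a toy frame. It is an illustration only, not the split used in this notebook.

```python
import pandas as pd

def temporal_split(df, train_share=0.8, time_col="timestamp"):
    """Split df by time so that roughly train_share of the rows fall before the cutoff."""
    cutoff = df[time_col].quantile(train_share)
    before = df[df[time_col] < cutoff]
    after = df[df[time_col] >= cutoff]
    return before, after, cutoff

# Toy data: ten ratings on ten consecutive days
toy = pd.DataFrame({
    "userId": range(10),
    "timestamp": pd.date_range("2020-01-01", periods=10, freq="D"),
})
toy_train, toy_test, cutoff = temporal_split(toy)
print(cutoff, len(toy_train) / len(toy))
```

When many ratings share the same day, the shares will only approximate the requested proportion.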

In [75]:
train = ratings[(ratings.timestamp < datetime(year=2017, month=1, day=1))]
train.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,86847,34,5.0,1996-01-29
1,268919,23,2.0,1996-02-01
2,25445,52,4.0,1996-02-02
3,80271,70,3.0,1996-02-02
4,311037,74,4.0,1996-02-09


In [76]:
train.shape

(217239, 4)

> The train dataset is left with 217,239 interactions.

In [77]:
test = ratings[ratings.timestamp >= datetime(year=2017, month=1, day=1)]
test.head()

Unnamed: 0,userId,movieId,rating,timestamp
217239,291055,5218,4.0,2017-01-01
217240,291055,135887,4.5,2017-01-01
217241,291055,7451,2.0,2017-01-01
217242,291055,5444,2.5,2017-01-01
217243,178800,40339,2.5,2017-01-01


In [78]:
test.shape

(82761, 4)

In [79]:
test.userId.nunique()

30230

> The test dataset has 30,230 users.

In [80]:
test.movieId.nunique()

11954

> The test dataset contains ratings for 11,954 movies.

In [81]:
plot_df = train.copy()
plot_df["year"] = train.timestamp.dt.year
plot_df = plot_df.groupby("year", as_index=False).count()[["year", "userId"]]
plot_df.columns = ["year", "reviews_count"]
plot_df.head(25)

Unnamed: 0,year,reviews_count
0,1996,15406
1,1997,6824
2,1998,2927
3,1999,10944
4,2000,18405
5,2001,10987
6,2002,8038
7,2003,9622
8,2004,10693
9,2005,16230


> We carve the validation set out of the training set, cutting at 2015-01-01 and aiming for roughly 90/10. The actual proportion is about 84/16 (182,910 vs. 34,329 interactions).

In [82]:
validation = train[train.timestamp >= datetime(year=2015, month=1, day=1)]
validation.head()

Unnamed: 0,userId,movieId,rating,timestamp
182910,201998,4701,3.5,2015-01-01
182911,61766,63859,3.0,2015-01-01
182912,61766,74685,0.5,2015-01-01
182913,215460,5218,2.5,2015-01-01
182914,188012,112552,5.0,2015-01-01


In [83]:
validation.shape

(34329, 4)

In [84]:
validation.userId.nunique()

14731

> The validation dataset has 14,731 users.

In [85]:
validation.movieId.nunique()

6010

> The validation dataset contains ratings for 6,010 movies.

In [86]:
plot_df = validation.copy()
plot_df["year"] = validation.timestamp.dt.year
plot_df = plot_df.groupby("year", as_index=False).count()[["year", "userId"]]
plot_df.columns = ["year", "reviews_count"]
plot_df.head(25)

Unnamed: 0,year,reviews_count
0,2015,16411
1,2016,17918


We redefine the training set.

In [87]:
train = train[(train.timestamp < datetime(year=2015, month=1, day=1))]
train.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,86847,34,5.0,1996-01-29
1,268919,23,2.0,1996-02-01
2,25445,52,4.0,1996-02-02
3,80271,70,3.0,1996-02-02
4,311037,74,4.0,1996-02-09


In [88]:
plot_df = train.copy()
plot_df["year"] = train.timestamp.dt.year
plot_df = plot_df.groupby("year", as_index=False).count()[["year", "userId"]]
plot_df.columns = ["year", "reviews_count"]
plot_df.head(25)

Unnamed: 0,year,reviews_count
0,1996,15406
1,1997,6824
2,1998,2927
3,1999,10944
4,2000,18405
5,2001,10987
6,2002,8038
7,2003,9622
8,2004,10693
9,2005,16230


In [89]:
train.shape

(182910, 4)

In [90]:
train.userId.nunique()

78748

> The train dataset has 78,748 users.

In [91]:
train.movieId.nunique()

9542

> The train dataset contains ratings for 9,542 movies.

> The resulting matrix should have 78,748 rows, one per user, and 9,542 columns, one per movie.

### Cold Start
We analyze the cold-start cases in each dataset.

In [92]:
test[~test.userId.isin(train.userId.unique())].userId.nunique()

28749

> There are 28,749 users in the test dataset that do not appear in train, about 95% of the 30,230 test users.

In [93]:
validation[~validation.userId.isin(train.userId.unique())].userId.nunique()

13275

> There are 13,275 users in the validation dataset that do not appear in train, about 90% of the 14,731 validation users.
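
The counts above cover users only, and the same check may be worth running for movies, since a movie never seen in train cannot be scored by the model either. This sketch uses a hypothetical helper, `cold_start_share`, on toy frames.

```python
import pandas as pd

def cold_start_share(train_df, eval_df, col="userId"):
    """Fraction of distinct values of col in eval_df that never appear in train_df."""
    known = set(train_df[col].unique())
    eval_ids = eval_df[col].unique()
    unseen = [x for x in eval_ids if x not in known]
    return len(unseen) / len(eval_ids)

toy_train = pd.DataFrame({"userId": [1, 1, 2, 3], "movieId": [10, 11, 10, 12]})
toy_eval = pd.DataFrame({"userId": [2, 4, 5, 5], "movieId": [11, 10, 13, 12]})

print(cold_start_share(toy_train, toy_eval, "userId"))   # users 4 and 5 are new
print(cold_start_share(toy_train, toy_eval, "movieId"))  # movie 13 is new
```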

### Interaction Matrix
To build the interaction matrix we keep only the columns we are going to use.

In [94]:
interactions_train = train[["userId", "movieId", "rating"]].copy()
interactions_train.head()

Unnamed: 0,userId,movieId,rating
0,86847,34,5.0
1,268919,23,2.0
2,25445,52,4.0
3,80271,70,3.0
4,311037,74,4.0


We build the matrix with the pandas `pivot` method.

In [95]:
interactions_matrix = interactions_train.pivot(index="userId", columns="movieId", values="rating")

In [96]:
interactions_matrix.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,117865,118101,118198,118344,118492,118696,118866,119141,119804,120128
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
6,,,,,,,,,,,...,,,,,,,,,,
7,,,,,,,,,,,...,,,,,,,,,,
14,,,,,,,,,,,...,,,,,,,,,,
21,,,,,,,,,,,...,,,,,,,,,,
24,,,,,,,,,,,...,,,,,,,,,,


We fill the null values with 0.

In [97]:
interactions_matrix = interactions_matrix.fillna(0)

MemoryError: Unable to allocate 5.60 GiB for an array with shape (78748, 9542) and data type float64

In [None]:
interactions_matrix.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,117865,118101,118198,118344,118492,118696,118866,119141,119804,120128
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
21,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
24,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
interactions_matrix.shape

(78748, 9542)

> We get a matrix of 78,748 users by 9,542 movies, which matches what we saw earlier.

We convert the matrix to a sparse format to reduce its memory footprint so the model can be trained.

In [None]:
interactions_matrix_csr = csr_matrix(interactions_matrix.values)
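
The dense pivot is the expensive step here: the `fillna` call above reports a MemoryError of 5.60 GiB for the (78748, 9542) float64 array. One way to avoid the dense intermediate is to build the sparse matrix directly from the long-format ratings. This is a sketch with a hypothetical helper, `build_interactions`, and it assumes each (userId, movieId) pair appears at most once, because repeated pairs would be summed.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

def build_interactions(df):
    """Build a CSR user x movie matrix straight from long-format ratings."""
    # np.unique returns the sorted ids plus, for each row, the position of its id
    user_ids, user_idx = np.unique(df["userId"].to_numpy(), return_inverse=True)
    movie_ids, movie_idx = np.unique(df["movieId"].to_numpy(), return_inverse=True)
    matrix = csr_matrix(
        (df["rating"].to_numpy(), (user_idx, movie_idx)),
        shape=(len(user_ids), len(movie_ids)),
    )
    return matrix, user_ids, movie_ids

toy = pd.DataFrame({
    "userId": [7, 6, 6, 14],
    "movieId": [34, 23, 34, 52],
    "rating": [5.0, 2.0, 4.0, 3.0],
})
matrix, user_ids, movie_ids = build_interactions(toy)
print(matrix.toarray())
```

The returned `user_ids` and `movie_ids` arrays are sorted, so row and column positions follow the same ascending id order that the pivoted frame shows.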

We create a dictionary that maps each user id to its row index in the matrix.

In [None]:
user_ids = list(interactions_matrix.index)
# Map each user id to its row position in the matrix
user_map = {user_id: position for position, user_id in enumerate(user_ids)}

In [None]:
user_map

{6: 0,
 7: 1,
 14: 2,
 21: 3,
 24: 4,
 25: 5,
 31: 6,
 35: 7,
 39: 8,
 41: 9,
 ...

## Model Training
We use `GridSearchCV` to find the best hyperparameters.

In [None]:
%%time
model = LightFM()

param_grid = {
    'loss': ['logistic'],
    'no_components': [30, 50, 70, 90],
    'learning_rate': [0.03, 0.05, 0.1, 0.2],
}

grid_search = GridSearchCV(model, param_grid, cv=5, scoring='precision', n_jobs=-1)

grid_search.fit(interactions_matrix_csr, epochs=10)

best_params = grid_search.best_params_



Wall time: 7min 53s


We check which hyperparameters were selected.

In [None]:
best_params

{'learning_rate': 0.03, 'loss': 'logistic', 'no_components': 30}
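
The selected values are the first combination in the grid. LightFM is not a scikit-learn estimator in the usual sense (its `predict` takes user and item ids rather than a feature matrix), so it may be worth cross-checking the result with an explicit loop that scores each combination on the validation interactions. The sketch below is generic. `fit_and_score` is a hypothetical callable supplied by the caller, and the toy scorer exists only so the example runs. The names `train_csr` and `validation_csr` in the comment are also assumptions: two matrices built with the same user and movie indices.

```python
from itertools import product

def grid_search(param_grid, fit_and_score):
    """Try every combination in param_grid and return (best_params, best_score, all_results)."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, fit_and_score(params)))
    best_params, best_score = max(results, key=lambda pair: pair[1])
    return best_params, best_score, results

param_grid = {
    "loss": ["logistic"],
    "no_components": [30, 50, 70, 90],
    "learning_rate": [0.03, 0.05, 0.1, 0.2],
}

# Toy stand-in scorer (an assumption for illustration only). With LightFM it could be:
#   model = LightFM(random_state=100, **params).fit(train_csr, epochs=10)
#   return precision_at_k(model, validation_csr, k=20).mean()
def toy_score(params):
    return -abs(params["no_components"] - 50) - abs(params["learning_rate"] - 0.1)

best_params, best_score, results = grid_search(param_grid, toy_score)
print(best_params, len(results))
```

Keeping the full `results` list makes it easy to see whether the scores actually differ between combinations.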

#### Model training
We train the model with the best hyperparameters.

In [None]:
%%time
model = LightFM(no_components=30, random_state=100, learning_rate=0.03, loss='logistic')

Wall time: 0 ns


In [None]:
%%time
model = model.fit(interactions_matrix_csr, epochs=50)

Wall time: 1min 40s


In [None]:
model

<lightfm.lightfm.LightFM at 0x15e048f68d0>

### Helper methods
Method to obtain recommendations in the cold-start case

In [None]:
def getColdStarRecomm(dataset, no_recom): 
    '''
    Takes the dataset and returns a list of size no_recom with the most popular movies
    '''
    recomm = dataset.groupby("movieId", as_index=False).agg({"userId":"nunique"}).sort_values(by="userId", ascending=False)
    recomm.columns=["movieId", "popularity"]
    return recomm.movieId.values[:no_recom]

In [None]:
def getColdStarRandomRecomm(dataset, no_recom):
    '''
    Takes the dataset and returns a list of up to no_recom randomly chosen movies
    '''
    # Note: the sample size is fixed at 20 rows, so at most 20 ids are returned
    # and the same movie can appear more than once
    recomm = dataset[['movieId', 'userId']].sample(20, random_state=42)
    return recomm.movieId.values[:no_recom]

Method to obtain the scores from the model

In [None]:
def predict(user):
    '''
    Takes a user id and returns the list of scores for every movie in the matrix
    '''
    user_predic = user_map[user]
    n_users, n_items = interactions_matrix.shape
    item_ids = np.arange(n_items)
    return model.predict(user_ids=user_predic, item_ids = item_ids)

Method to obtain the recommendations in ranked order

In [None]:
def getOrderedMoviesId(preds, no_recom):
    '''
    Takes the predicted scores and returns the list of movie ids ordered from highest to lowest score
    '''
    recomm = pd.Series(preds)
    recomm.index = interactions_matrix.columns
    return list(pd.Series(recomm.sort_values(ascending=False).index))[:no_recom]

Method to remove already-watched movies from the recommendations

In [None]:
def getNotWatchedMovieId(user, recomm):
    '''
    Takes the recommendations and returns them without the movies the user has already watched
    '''
    watched = set(train[train.userId == user].movieId.unique())
    # Note: the result is capped at 20 items
    return [x for x in recomm if x not in watched][:20]

In [None]:
def recomm(user, no_recom, strategy):
    '''
    Returns recommendations for a user: model-based if the user was seen in train,
    otherwise a cold-start fallback (strategy 1 = most popular, strategy 2 = random)
    '''
    result = None
    if user in interactions_matrix.index:
        score = predict(user)
        ordered = getOrderedMoviesId(score, no_recom)
        result = getNotWatchedMovieId(user, ordered)
    elif strategy == 1:
        result = getColdStarRecomm(train, 20)
    elif strategy == 2:
        result = getColdStarRandomRecomm(train, 20)
    return result

### Recommendations
We generate recommendations for all users in validation, varying the cold-start strategy.

First we use the most-popular cold-start strategy.

In [None]:
def recommAll_Validation(no_recom, strategy):
    recomms_dict = {
        'user_id': [],
        'recomms': []
    }
    for user in validation.userId.unique():
        recomms_dict['user_id'].append(user)
        recomms_dict['recomms'].append(recomm(user, no_recom, strategy))
    return pd.DataFrame(recomms_dict)


In [None]:
%%time
pd.options.display.max_colwidth = None
df_popular = recommAll_Validation(100, 1)  # Popular
df_popular.head(3)

Recommendations with the random cold-start strategy

In [None]:
%%time
df_random = recommAll_Validation(100, 2)  # Random
df_random.head(3)

### Comparison
First we generate the ideal recommendations from the validation set.

In [None]:
interactions_validation = validation[["userId", "movieId", "rating"]].copy()
interactions_validation.head()

In [None]:
ideal_recomms = interactions_validation.sort_values(by=["userId", "rating"], ascending=False)\
                  .groupby("userId", as_index=False)\
                  .agg({"movieId": "unique"})
ideal_recomms.head(5)

The next step is to build new dataframes that combine the recommended lists with the ideal lists so they can be compared.

First, the recommendations that use the popular-movies cold-start strategy

In [None]:
merged_df_popular = pd.merge(df_popular, ideal_recomms, left_on='user_id', right_on='userId', how='inner')
merged_df_popular = merged_df_popular.rename(columns={'recomms': 'recomms_df', 'movieId': 'recomms_ideal'})
merged_df_popular = merged_df_popular.drop(columns=['userId'])
merged_df_popular.head()

We do the same with the recommendations that use the random-movies cold-start strategy.

In [None]:
merged_df_random = pd.merge(df_random, ideal_recomms, left_on='user_id', right_on='userId', how='inner')
merged_df_random = merged_df_random.rename(columns={'recomms': 'recomms_df', 'movieId': 'recomms_ideal'})
merged_df_random = merged_df_random.drop(columns=['userId'])
merged_df_random.head()

## MAP - Metrics
We use two implementations of the MAP (Mean Average Precision) metric.

In [None]:
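def map(dataframe):
  '''
  Returns the average precision of each row; the mean of the list is the MAP
  '''
  aps = []
  # pred is the ranked list of recommendations, label the user's relevant (ideal) movies
  for pred, label in dataframe[["recomms_df", "recomms_ideal"]].values:
    n = len(pred)
    arange = np.arange(n, dtype=np.int32) + 1.
    rel_k = np.isin(pred[:n], label)
    tp = np.ones(rel_k.sum(), dtype=np.int32).cumsum()
    denom = arange[rel_k]
    ap = (tp / denom).sum() / len(label)
    aps.append(ap)
  return aps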

Alternative algorithm we looked into

In [None]:
def calculate_map(recommendations, test):
    '''
    Takes the recommended lists and the relevant (ideal) lists, aligned row by row, and returns the MAP
    '''
    average_precisions = []
    for actual, predicted in zip(test, recommendations):
        # Relevant items are the movie ids themselves, not their positions in the list
        relevant_items = set(actual)
        if len(relevant_items) == 0:
            continue
        precision_sum = 0.0
        num_hits = 0
        for j, item in enumerate(predicted):
            if item in relevant_items:
                num_hits += 1
                precision_sum += num_hits / (j + 1)
        average_precision = precision_sum / len(relevant_items)
        average_precisions.append(average_precision)
    return np.mean(average_precisions)
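
As a sanity check for either implementation, average precision can be worked by hand on a tiny case. The sketch below uses a hypothetical helper, `average_precision`, that follows the usual definition: the sum of precision@k over the ranks k where a relevant movie appears, divided by the number of relevant movies.

```python
def average_precision(recommended, relevant):
    """Sum of precision@k at the ranks that hit a relevant item, divided by the number of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)

# Hits at ranks 1 and 3, with 3 relevant movies: (1/1 + 2/3) / 3
ap = average_precision([34, 23, 52, 70], [34, 52, 74])
print(round(ap, 4))
```

MAP is then the mean of this value over all users.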

We apply both algorithms to the popular-movies cold-start strategy.

In [None]:
MAP_1 = np.mean(map(merged_df_popular))
print(f'Mean Average Precision = {round(MAP_1, 5)}')

In [None]:
MAP_2 = calculate_map(merged_df_popular["recomms_df"], merged_df_popular["recomms_ideal"])
print(f'Mean Average Precision = {round(MAP_2, 5)}')

We do the same for the random-movies cold-start strategy.

In [None]:
MAP_1 = np.mean(map(merged_df_random))
print(f'Mean Average Precision = {round(MAP_1, 5)}')

In [None]:
MAP_2 = calculate_map(merged_df_random["recomms_df"], merged_df_random["recomms_ideal"])
print(f'Mean Average Precision = {round(MAP_2, 5)}')

> The popular-movies cold-start strategy gives better metrics.

### Evaluating the metrics on the test dataset
Since the popular-movies cold-start strategy performed best, we use it to evaluate on the test dataset.

In [None]:
def recommAll_test(no_recom, strategy):
    recomms_dict = {
        'user_id': [],
        'recomms': []
    }
    for user in test.userId.unique():
        recomms_dict['user_id'].append(user)
        recomms_dict['recomms'].append(recomm(user, no_recom, strategy))
    return pd.DataFrame(recomms_dict)

We generate the recommendations.

In [None]:
%%time
pd.options.display.max_colwidth = None
df_test = recommAll_test(100, 1)  # Popular
df_test

Ideal recommendations from the test set

In [None]:
interactions_test = test[["userId", "movieId", "rating"]].copy()
interactions_test.head()

In [None]:
ideal_recomms2 = interactions_test.sort_values(by=["userId", "rating"], ascending=False)\
                  .groupby("userId", as_index=False)\
                  .agg({"movieId": "unique"})
ideal_recomms2

We build a dataframe that combines the recommended lists with the ideal lists so they can be compared.

In [None]:
merged_df_test = pd.merge(df_test, ideal_recomms2, left_on='user_id', right_on='userId', how='inner')
merged_df_test = merged_df_test.rename(columns={'recomms': 'recomms_df', 'movieId': 'recomms_ideal'})
merged_df_test = merged_df_test.drop(columns=['userId'])

merged_df_test.head()

We compute the metrics.

In [None]:
MAP_1 = np.mean(map(merged_df_test))
print(f'Mean Average Precision = {round(MAP_1, 5)}')

In [None]:
MAP_2 = calculate_map(merged_df_test["recomms_df"], merged_df_test["recomms_ideal"])
print(f'Mean Average Precision = {round(MAP_2, 5)}')