In [1]:
import pandas as pd
import numpy as np
import KNN.similarities as sim
import KNN.utils as utils
from tqdm import tqdm

# Computing similarity matrices

In this notebook we compute the similarity matrices used to evaluate the system. Each matrix is saved to its own CSV file.

## Movie property similarity matrix

We build a movie-to-movie similarity matrix based on the movies' properties, using the *Jaccard* similarity function.
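The Jaccard similarity of two movies is the number of properties they share divided by the number of properties that either of them has, |A ∩ B| / |A ∪ B|. The following is a minimal sketch of how such a matrix could be computed from a binary movie-by-property table. It is not the implementation of `sim.get_binary_similarity_matrix`: the function name `jaccard_similarity_matrix` and the input layout (one row per movie, one 0/1 column per property) are assumptions. Because the output of `matrix_jaccard.head()` has 0.0 on its diagonal, the sketch also zeroes self-similarity by default.

```python
import numpy as np
import pandas as pd


def jaccard_similarity_matrix(features: pd.DataFrame, zero_diagonal: bool = True) -> pd.DataFrame:
    """Pairwise Jaccard similarity between the rows of a binary (0/1) feature table.

    Hypothetical helper: assumes one row per movie and one 0/1 column per property.
    """
    X = features.to_numpy(dtype=bool).astype(int)
    # |A ∩ B| for every pair of rows, via a matrix product of the 0/1 indicators
    intersection = X @ X.T
    # |A ∪ B| = |A| + |B| - |A ∩ B|
    sizes = X.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - intersection
    # Pairs with no properties at all (empty union) get similarity 0
    with np.errstate(divide='ignore', invalid='ignore'):
        similarity = np.where(union > 0, intersection / union, 0.0)
    if zero_diagonal:
        # Mirrors the 0.0 diagonal seen in matrix_jaccard (assumed convention)
        np.fill_diagonal(similarity, 0.0)
    return pd.DataFrame(similarity, index=features.index, columns=features.index)


# Toy example: three movies described by four hypothetical properties
movies = pd.DataFrame({
    'action': [1, 1, 0],
    'comedy': [0, 1, 1],
    'drama':  [1, 0, 1],
    'scifi':  [1, 1, 0],
})
print(jaccard_similarity_matrix(movies).round(2))
```

Movies 0 and 1 share two of the four properties present in either of them, so their similarity is 2/4 = 0.5.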

In [2]:
matrix_jaccard = sim.get_binary_similarity_matrix()
matrix_jaccard.head()

,0,1,2,3,4,5,6,7,8,9,...,154,155,156,157,158,159,160,161,162,163
0,0.0,0.428571,0.117647,0.125,0.176471,0.166667,0.125,0.125,0.117647,0.176471,...,0.125,0.117647,0.117647,0.133333,0.1875,0.125,0.125,0.111111,0.125,0.125
1,0.428571,0.0,0.117647,0.125,0.176471,0.166667,0.125,0.125,0.117647,0.176471,...,0.125,0.117647,0.117647,0.133333,0.1875,0.125,0.125,0.111111,0.125,0.125
2,0.117647,0.117647,0.0,0.133333,0.266667,0.111111,0.133333,0.307692,0.285714,0.117647,...,0.133333,0.285714,0.125,0.142857,0.125,0.133333,0.133333,0.117647,0.133333,0.214286
3,0.125,0.125,0.133333,0.0,0.125,0.055556,0.142857,0.142857,0.214286,0.2,...,0.230769,0.133333,0.307692,0.153846,0.214286,0.142857,0.230769,0.2,0.230769,0.230769
4,0.176471,0.176471,0.266667,0.125,0.0,0.235294,0.285714,0.285714,0.1875,0.176471,...,0.125,0.266667,0.117647,0.133333,0.1875,0.125,0.125,0.111111,0.125,0.2


In [3]:
matrix_jaccard.to_csv('data/similarity_data/jaccard_matrix.csv', index=False)
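Since the matrix is written with `index=False`, the row labels are not stored. When the file is read back, the rows get a fresh positional index and the column labels come back as strings (`'0'`, `'1'`, ...). The round-trip sketch below uses a toy 3×3 matrix and a temporary directory purely for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy 3x3 similarity matrix with integer row/column labels 0..2
toy = pd.DataFrame(np.array([
    [0.0, 0.5, 0.25],
    [0.5, 0.0, 0.25],
    [0.25, 0.25, 0.0],
]))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_matrix.csv')
    toy.to_csv(path, index=False)  # row labels are not written
    loaded = pd.read_csv(path)

string_columns = list(loaded.columns)
print(string_columns)  # ['0', '1', '2']

# Positional access still works, because rows were written in order
print(loaded.iloc[0, 1])  # 0.5

# Restore integer column labels if later code looks columns up by number
loaded.columns = loaded.columns.astype(int)
```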

## Similarity matrices over Q

Next, we compute a similarity matrix for each user by applying that user's Q matrix. We use the `euclidean`, `cosine` and `manhattan` similarity functions.
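For reference, cosine similarity compares two vectors as a·b / (‖a‖‖b‖), while the Euclidean and Manhattan distances have to be turned into a similarity. The sketch below assumes the conversion 1 / (1 + d), which may differ from what `sim.euclidean_sim` and `sim.manhattan_sim` actually do, and assumes each row of Q is the vector of one movie. `pairwise_similarity` is a hypothetical helper, not `sim.get_similarity_matrix`; it leaves the diagonal at 0 to match the convention seen in the Jaccard matrix.

```python
import numpy as np
import pandas as pd


def euclidean_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Assumption: Euclidean distance d mapped to a similarity as 1 / (1 + d)
    return float(1.0 / (1.0 + np.linalg.norm(a - b)))


def manhattan_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Assumption: Manhattan (L1) distance d mapped to a similarity as 1 / (1 + d)
    return float(1.0 / (1.0 + np.abs(a - b).sum()))


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between a and b; 0 if either vector is all zeros
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def pairwise_similarity(q: pd.DataFrame, sim_fn) -> pd.DataFrame:
    """Hypothetical helper: symmetric row-by-row similarity matrix of Q."""
    values = q.to_numpy(dtype=float)
    n = len(values)
    out = np.zeros((n, n))  # diagonal stays 0 (assumed convention)
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = sim_fn(values[i], values[j])
    return pd.DataFrame(out, index=q.index, columns=q.index)


# Toy Q with three movies described by two hypothetical features
q = pd.DataFrame([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
for fn in (euclidean_sim, cosine_sim, manhattan_sim):
    print(fn.__name__)
    print(pairwise_similarity(q, fn).round(3))
```

The double loop only fills the upper triangle and mirrors it, since all three measures are symmetric.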

In [4]:
# Load the users from the training set (sorted for a deterministic processing order)
ratings_DF = pd.read_csv('data/experiment_data/trainset.csv')
users = sorted(set(ratings_DF['userId']))
functions = ['euclidean', 'cosine', 'manhattan']

In [5]:
# Map each name in `functions` to its similarity function in KNN.similarities
sim_functions = {
    'euclidean': sim.euclidean_sim,
    'cosine': sim.cosine_sim,
    'manhattan': sim.manhattan_sim,
}

for user in tqdm(users):
    for name in functions:
        # Compute the user's filtered similarity matrix and save it to CSV
        matrix = sim.get_similarity_matrix(user, sim_functions[name])
        matrix.to_csv('data/similarity_data/sim_{}_user_{}.csv'.format(name, user), index=False)

100%|██████████| 584/584 [05:59<00:00,  1.61it/s]
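This step writes one CSV per user and function (584 users × 3 functions = 1,752 files in the run above), named `sim_{function}_user_{userId}.csv`. A later notebook can rebuild the same path to load a matrix. The helpers below, `similarity_path` and `load_user_similarity`, are hypothetical convenience functions, not part of `KNN.utils` or `KNN.similarities`.

```python
import os

import pandas as pd

SIMILARITY_DIR = 'data/similarity_data'
KNOWN_FUNCTIONS = ('euclidean', 'cosine', 'manhattan')


def similarity_path(function: str, user_id, base_dir: str = SIMILARITY_DIR) -> str:
    # Same naming pattern as the loop above: sim_<function>_user_<userId>.csv
    return os.path.join(base_dir, 'sim_{}_user_{}.csv'.format(function, user_id))


def load_user_similarity(function: str, user_id, base_dir: str = SIMILARITY_DIR) -> pd.DataFrame:
    """Load one user's similarity matrix saved by the loop above (hypothetical helper)."""
    if function not in KNOWN_FUNCTIONS:
        raise ValueError('unknown similarity function: {}'.format(function))
    # The files were saved with index=False, so rows come back with a positional index
    return pd.read_csv(similarity_path(function, user_id, base_dir))
```

Validating the function name up front turns a typo into a clear `ValueError` instead of a `FileNotFoundError` about a path that was never meant to exist.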


## Next notebook

The similarity matrices are now ready. The next step is to evaluate the system: run the `evaluar_KNN` notebook.