# ALS Tutorial with Implicit I

MAN 3160 - Recommender Systems

In this tutorial we use the Python library [implicit](https://implicit.readthedocs.io/en/latest/quickstart.html) to build an ALS recommender system.


## Import Libraries

In [1]:
# Install libraries for downloading and unzipping files.

!pip install wget
!pip install zipfile36

In [3]:
!pip3 install implicit --upgrade

Collecting implicit
  Downloading implicit-0.7.2-cp311-cp311-macosx_11_0_arm64.whl (761 kB)
Installing collected packages: implicit
Successfully installed implicit-0.7.2


In [4]:
import pandas as pd
import zipfile
import numpy as np
import implicit
import scipy.sparse as sparse

## Downloading the dataset

As in Assignment 1, this tutorial uses the MovieLens-100k dataset provided by [the University of Minnesota](https://grouplens.org/).

In [5]:
!wget http://files.grouplens.org/datasets/movielens/ml-100k.zip

--2024-04-06 19:09:51--  http://files.grouplens.org/datasets/movielens/ml-100k.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)[128.101.65.152]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4924029 (4.7M) [application/zip]
Saving to: 'ml-100k.zip'

2024-04-06 19:09:53 (2.56 MB/s) - 'ml-100k.zip' saved [4924029/4924029]



In [6]:
with zipfile.ZipFile("ml-100k.zip", 'r') as zip_ref:
    zip_ref.extractall(".")

In [7]:
dir_train = 'ml-100k'

# Column names for the items file (u.item).

columns = ['movieid', 'title', 'release_date', 'video_release_date',
           'IMDb_URL', 'unknown', 'Action', 'Adventure', 'Animation',
           'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
           'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi',
           'Thriller', 'War', 'Western']

In [8]:
# First, load the training ratings into a DataFrame
df_train = pd.read_csv(f'{dir_train}/u2.base',
                         sep='\t',
                         names=['userid', 'itemid', 'rating', 'timestamp'],
                         header=None)

In [9]:
df_train.head()

| | userid | itemid | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 3 | 4 | 878542960 |
| 1 | 1 | 4 | 3 | 876893119 |
| 2 | 1 | 5 | 3 | 889751712 |
| 3 | 1 | 6 | 5 | 887431973 |
| 4 | 1 | 7 | 4 | 875071561 |


### Transforming the dataset to implicit feedback

The MovieLens dataset contains the ratings that users gave to each movie, which makes it an explicit-feedback dataset. The ALS variant implemented in implicit is designed for implicit feedback, so it is not directly appropriate here. However, we can process the data to simulate an implicit-feedback dataset from MovieLens.

In [10]:
# A rating >= 3 counts as relevant (1); a rating below 3 counts as not relevant (0)
df_train.rating = [1 if x >=3 else 0 for x in df_train.rating ]

In [11]:
df_train.head(10)

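| | userid | itemid | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 3 | 1 | 878542960 |
| 1 | 1 | 4 | 1 | 876893119 |
| 2 | 1 | 5 | 1 | 889751712 |
| 3 | 1 | 6 | 1 | 887431973 |
| 4 | 1 | 7 | 1 | 875071561 |
| 5 | 1 | 10 | 1 | 875693118 |
| 6 | 1 | 11 | 0 | 875072262 |
| 7 | 1 | 12 | 1 | 878542960 |
| 8 | 1 | 13 | 1 | 875071805 |
| 9 | 1 | 14 | 1 | 874965706 |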
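
The thresholding above can also be written as a vectorized pandas expression. The following is a minimal sketch on a small hypothetical DataFrame (`toy` and its `relevant` column are made up for illustration and are not MovieLens data), assuming the same rule that a rating of 3 or more counts as relevant.

```python
import pandas as pd

# Hypothetical toy ratings, only for illustration (not MovieLens data).
toy = pd.DataFrame({
    "userid": [1, 1, 2, 2, 3],
    "itemid": [10, 11, 10, 12, 11],
    "rating": [5, 2, 3, 1, 4],
})

# Vectorized equivalent of the list comprehension: 1 if rating >= 3, else 0.
toy["relevant"] = (toy["rating"] >= 3).astype(int)

# Share of interactions that are kept as positive implicit feedback.
positive_share = toy["relevant"].mean()
print(toy)
print(f"positive share: {positive_share:.2f}")
```

Checking the share of positives is a quick way to see how much signal the chosen threshold keeps.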


In [12]:
# Load the item (movie) metadata
df_items = pd.read_csv(f'{dir_train}/u.item',
                        sep='|',
                        index_col=0,
                        names = columns,
                        header=None,
                        encoding='latin-1')

In [13]:
df_items.head()

| movieid | title | release_date | video_release_date | IMDb_URL | unknown | Action | Adventure | Animation | Children | Comedy | ... | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Toy Story (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | GoldenEye (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | Four Rooms (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | Get Shorty (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Copycat (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |


### Preprocessing the data into sparse format

In [14]:
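user_items = {}   # raw user id -> list of item ids with positive feedback
itemset = set()   # every item id with at least one positive interaction

for row in df_train.itertuples():
    # Keep only relevant interactions (rating == 1 after binarization).
    if row.rating:
        if row.userid not in user_items:
            user_items[row.userid] = []

        user_items[row.userid].append(row.itemid)
        itemset.add(row.itemid)

# Sorted array of item ids; the position of an id is its column index.
itemset = np.sort(list(itemset))

# Despite its name, this is a dense users x items array of 0/1 flags.
sparse_matrix = np.zeros((len(user_items), len(itemset)))

for i, items in enumerate(user_items.values()):
    sparse_matrix[i] = np.isin(itemset, items, assume_unique=True).astype(int)

# Convert to the CSR format that implicit expects.
user_item_matrix = sparse.csr_matrix(sparse_matrix)

# Lookup tables from raw ids to matrix row and column indices.
user_ids = {key: i for i, key in enumerate(user_items.keys())}
items_ids = {key: i for i, key in enumerate(itemset)}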

In [15]:
sparse_matrix.shape

(943, 1550)
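
The preprocessing cell first fills a dense array and then converts it to CSR. That works at this scale, but for larger datasets it is common to build the CSR matrix directly from (row, column, value) triplets. The following is a minimal sketch of that alternative on hypothetical data (`toy_users`, `toy_items` and `toy_matrix` are illustrative names, not part of the tutorial's pipeline).

```python
import numpy as np
import scipy.sparse as sparse

# Hypothetical positive interactions as parallel arrays of raw ids.
toy_users = np.array([7, 7, 9, 12, 12, 12])
toy_items = np.array([100, 300, 100, 200, 300, 400])

# np.unique with return_inverse maps raw ids to contiguous 0-based indices.
unique_users, row_idx = np.unique(toy_users, return_inverse=True)
unique_items, col_idx = np.unique(toy_items, return_inverse=True)

# Build the CSR matrix straight from the triplets, without a dense intermediate.
values = np.ones(len(row_idx), dtype=np.float32)
toy_matrix = sparse.csr_matrix(
    (values, (row_idx, col_idx)),
    shape=(len(unique_users), len(unique_items)),
)

# Density = stored entries / total number of cells.
density = toy_matrix.nnz / (toy_matrix.shape[0] * toy_matrix.shape[1])
print(toy_matrix.toarray())
print(f"density: {density:.2f}")
```

Here `unique_users` and `unique_items` play the same role as `user_ids` and `itemset` in the tutorial: they translate between raw ids and matrix indices.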

## ALS (Alternating Least Squares)

### Training the model

In [16]:
# Define and train the model with ALS optimization
model_als = implicit.als.AlternatingLeastSquares()
model_als.fit(user_item_matrix)


In [17]:
user_item_matrix

<943x1550 sparse matrix of type '<class 'numpy.float64'>'
	with 65962 stored elements in Compressed Sparse Row format>
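
To give some intuition for what `fit` is optimizing, here is a minimal NumPy sketch of alternating least squares for implicit feedback on a tiny hypothetical matrix. It assumes the usual formulation with a binary preference and a confidence weight of `1 + alpha * interaction`. The helper names (`als_step`, `weighted_loss`) and the hyperparameter values are illustrative, and this is not how implicit implements its solver, which is optimized and differs in its details.

```python
import numpy as np

def als_step(fixed, confidence, preference, reg):
    """Solve one regularized weighted least-squares problem per row."""
    n_factors = fixed.shape[1]
    solved = np.zeros((confidence.shape[0], n_factors))
    for u in range(confidence.shape[0]):
        weighted = fixed.T * confidence[u]          # scale each column by its confidence
        a = weighted @ fixed + reg * np.eye(n_factors)
        b = weighted @ preference[u]
        solved[u] = np.linalg.solve(a, b)
    return solved

def weighted_loss(users, items, confidence, preference, reg):
    """Confidence-weighted squared error plus L2 regularization."""
    err = preference - users @ items.T
    penalty = (users ** 2).sum() + (items ** 2).sum()
    return (confidence * err ** 2).sum() + reg * penalty

# Hypothetical binary interaction matrix (4 users x 5 items).
interactions = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

alpha, reg, n_factors = 10.0, 0.1, 2                # illustrative values
preference = (interactions > 0).astype(float)
confidence = 1.0 + alpha * interactions

rng = np.random.default_rng(0)
user_factors = rng.normal(scale=0.1, size=(4, n_factors))
item_factors = rng.normal(scale=0.1, size=(5, n_factors))

losses = []
for _ in range(10):
    # Alternate: fix the items and solve for users, then fix users and solve for items.
    user_factors = als_step(item_factors, confidence, preference, reg)
    item_factors = als_step(user_factors, confidence.T, preference.T, reg)
    losses.append(weighted_loss(user_factors, item_factors, confidence, preference, reg))

print(np.round(losses, 4))
```

Because each half-step solves its subproblem exactly, the recorded loss should not increase from one iteration to the next.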

### Generating recommendations

In [18]:
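def show_recommendations(model, user, n):
    # Map the raw user id to its matrix row; recommend() returns (item indices, scores).
    user_index = user_ids[user]
    item_indices, _ = model.recommend(userid=user_index, user_items=user_item_matrix[user_index], N=n)
    # Map column indices back to movie ids and look up their titles.
    return df_items.loc[itemset[item_indices]]['title']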

Example of generating recommendations and finding similar items with the trained latent factors:

In [19]:
show_recommendations(model_als, user=61, n=10)

movieid
313                  Titanic (1997)
307    Devil's Advocate, The (1997)
302        L.A. Confidential (1997)
271        Starship Troopers (1997)
245         Devil's Own, The (1997)
315                Apt Pupil (1998)
300            Air Force One (1997)
322           Murder at 1600 (1997)
272        Good Will Hunting (1997)
316       As Good As It Gets (1997)
Name: title, dtype: object
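
Conceptually, a matrix-factorization model ranks items for a user by the inner product between that user's latent vector and each item's latent vector, usually after excluding items the user has already interacted with. The sketch below illustrates that idea with random hypothetical factors. The `top_n` helper is illustrative, not the library's `recommend` implementation.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical latent factors: 3 users and 6 items in 4 dimensions.
user_factors = rng.normal(size=(3, 4))
item_factors = rng.normal(size=(6, 4))
# Hypothetical items that user 0 has already interacted with.
seen = np.array([1, 4])

def top_n(user_index, seen_items, n):
    """Return the n best-scoring unseen item indices and their scores."""
    scores = item_factors @ user_factors[user_index]
    scores[seen_items] = -np.inf                    # never recommend seen items
    top = np.argpartition(-scores, n - 1)[:n]       # unordered top n
    top = top[np.argsort(-scores[top])]             # order them by score
    return top, scores[top]

ids, scores = top_n(0, seen, 3)
print(ids, np.round(scores, 3))
```

The returned indices are matrix columns, so in the tutorial they still have to be mapped back to movie ids through `itemset`, as `show_recommendations` does.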

### Viewing similar titles

In [22]:
def show_similar_movies(model, item, n=10):
    print(f"Similar items to {df_items.loc[item]['title']}")
    sim_items = model.similar_items(items_ids[item], n)[0]
    sim_items = [itemset[i] for i in sim_items]
    return df_items.loc[sim_items]['title']

In [23]:
show_similar_movies(model_als, 313, 10)

Similar items to Titanic (1997)


movieid
313                       Titanic (1997)
1293                     Star Kid (1997)
272             Good Will Hunting (1997)
315                     Apt Pupil (1998)
751           Tomorrow Never Dies (1997)
1612             Leading Man, The (1996)
354           Wedding Singer, The (1998)
1236    Other Voices, Other Rooms (1997)
1294    Ayn Rand: A Sense of Life (1997)
895                      Scream 2 (1997)
Name: title, dtype: object
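
A common way to find similar items from latent factors is to rank them by cosine similarity with the query item's vector. The sketch below shows this on random hypothetical factors. The `most_similar` helper is illustrative, and whether `similar_items` uses exactly this measure is an assumption here. Note that the query item is always its own nearest neighbor, which is consistent with Titanic appearing first in the output above.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical item factors: 8 items in 4 latent dimensions.
item_factors = rng.normal(size=(8, 4))

def most_similar(item_index, n):
    """Return the n items with the highest cosine similarity to item_index."""
    norms = np.linalg.norm(item_factors, axis=1)
    sims = (item_factors @ item_factors[item_index]) / (norms * norms[item_index])
    order = np.argsort(-sims)[:n]
    return order, sims[order]

ids, sims = most_similar(2, 3)
print(ids, np.round(sims, 3))
```

To exclude the query item itself from the results, request `n + 1` neighbors and drop the first one.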