# Recommender systems with LightFM
This notebook trains the different kinds of models (collaborative, content-based and hybrid) on the MovieLens dataset and on an anime ratings dataset, using the LightFM library.

In [32]:
# Import everything needed
import numpy as np
import pandas as pd
import scipy as sp
from lightfm import LightFM
from lightfm.evaluation import precision_at_k
from lightfm.data import Dataset

## Recommendations

In [33]:
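# Small helper to display the recommendations
def sample_recommendation(model, data, user_ids, items_df):
    """Print three known positives and the top three recommendations per user.

    `user_ids` and the column positions of `data` are LightFM's internal
    indices; the rows of `items_df` are assumed to follow the same order.
    """
    n_users, n_items = data.shape
    # Convert once so each user's row can be sliced cheaply
    interactions_csr = data.tocsr()

    for user_id in user_ids:
        known_positives = items_df['Título'][interactions_csr[user_id].indices]
        # Score every item for this user and sort from highest to lowest
        scores = model.predict(user_id, np.arange(n_items))
        top_items = items_df['Título'][np.argsort(-scores)]

        print("User %s" % user_id)
        print("    Known positives:")

        for x in known_positives[:3]:
            print("        %s" % x)

        print("    Recommended:")

        for x in top_items[:3]:
            print("         %s" % x)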
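
`sample_recommendation` looks up titles in `items_df['Título']` using the internal indices of the interaction matrix, which only yields the right titles when those indices line up with the dataframe rows. The sketch below illustrates the idea of translating internal indices back to raw ids before the lookup. It is pure Python on toy data: `build_index` and `titles_for` are hypothetical helpers, and the first-appearance ordering is an assumption about how an id mapping might be built, not LightFM's actual code.

```python
def build_index(raw_ids):
    """Assign consecutive internal indices in order of first appearance."""
    index = {}
    for raw_id in raw_ids:
        # Only ids not seen before receive a new index
        index.setdefault(raw_id, len(index))
    return index


def titles_for(internal_indices, item_index, title_by_raw_id):
    """Translate internal item indices back to titles via the raw ids."""
    raw_by_internal = {internal: raw for raw, internal in item_index.items()}
    return [title_by_raw_id[raw_by_internal[i]] for i in internal_indices]


# Toy data (made up): item ids in the order they appear in a ratings file
rated_item_ids = [242, 302, 242, 51]
item_index = build_index(rated_item_ids)

# Toy catalogue keyed by raw id, in a different order than the ratings
title_by_raw_id = {51: "Movie C", 242: "Movie A", 302: "Movie B"}

print(item_index)
print(titles_for([0, 2], item_index, title_by_raw_id))
```

In LightFM itself, `Dataset.mapping()` should expose the corresponding user and item maps, so the same inversion can be applied to the real data.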

## MovieLens

### Loading the dataframes

In [34]:
# Load the ml_data dataframe (whitespace-separated; `sep=r'\s+'` replaces the deprecated `delim_whitespace=True`)
ml_data_df = pd.read_csv('data/movielens/ml_data.csv', sep=r'\s+', names=['Id Usuario','Id Película','Valoración','Fecha'])

# Uncomment to check that the dataframe was loaded correctly
#ml_data_df

# Load the user dataframe
ml_user_df = pd.read_csv('data/movielens/user.csv', sep='|', names=['Id Usuario', 'Edad', 'Género', 'Ocupación', 'Código Postal'])

# Uncomment to check that the dataframe was loaded correctly
#ml_user_df

# Load the ml_items dataframe
ml_items_df = pd.read_csv('data/movielens/ml_items.csv', sep='|',
    names=['Id Película','Título','Fecha de estreno','Fecha DVD','iMDB','Género desconocido','Acción','Aventura','Animación','Infantil','Comedia', 'Crimen','Documental','Drama','Fantasía','Cine negro','Horror','Musical','Misterio','Romance','Ciencia ficción','Thriller','Bélico','Western'],
    encoding='latin-1')

# Uncomment to check that the dataframe was loaded correctly
#ml_items_df

### Building the dataset and the matrices
The dataframes are converted into the data structures LightFM needs, so that the matrices can be built and passed to its recommender.
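
As a rough illustration of that conversion, the sketch below turns `(user, item, rating)` triples into two coordinate-style lists: one that marks that an interaction happened, and one that carries the rating as a weight. This mirrors the pair that the notebook unpacks from `build_interactions`, but it is a pure-Python toy: `toy_build_interactions` is a hypothetical name, the data is made up, and real LightFM returns SciPy sparse matrices rather than lists.

```python
def toy_build_interactions(triples, user_index, item_index):
    """Return (interactions, weights) as lists of (row, col, value) entries."""
    interactions, weights = [], []
    for user_id, item_id, rating in triples:
        row, col = user_index[user_id], item_index[item_id]
        interactions.append((row, col, 1))    # an interaction exists
        weights.append((row, col, rating))    # how strong it was
    return interactions, weights


# Toy ratings (made up): (user id, item id, rating)
ratings = [(196, 242, 3), (186, 302, 3), (196, 51, 5)]
user_index = {196: 0, 186: 1}
item_index = {242: 0, 302: 1, 51: 2}

interactions, weights = toy_build_interactions(ratings, user_index, item_index)
print(interactions)
print(weights)
```

Keeping the two structures separate lets the model learn from which items a user touched, while the ratings only scale how much each example counts when passed as `sample_weight`.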

In [35]:
# Build the dataset
ml_dataset = Dataset()
ml_dataset.fit(ml_data_df['Id Usuario'], ml_data_df['Id Película'])
ml_dataset.fit_partial(users=ml_user_df['Id Usuario'], items=ml_items_df['Id Película'],
                    user_features=ml_user_df['Género'], item_features=ml_items_df['Título'])

#num_users, num_items = ml_dataset.interactions_shape()
#print('Num users: {}, num_items {}.'.format(num_users, num_items))

# Build the matrices
(ml_interactions, ml_weights) = ml_dataset.build_interactions((row['Id Usuario'], row['Id Película'], row['Valoración']) for index, row in ml_data_df.iterrows())
ml_item_features = ml_dataset.build_item_features((row['Id Película'], [row['Título']]) for index, row in ml_items_df.iterrows())
ml_user_features = ml_dataset.build_user_features((row['Id Usuario'], [row['Género']]) for index, row in ml_user_df.iterrows())

### Training the models

#### Collaborative model

In [36]:
ml_collab_model = LightFM(loss='warp')
ml_collab_model.fit(ml_interactions, sample_weight=ml_weights, epochs=30, num_threads=2)

<lightfm.lightfm.LightFM at 0x1a1da95f28>

In [37]:
sample_recommendation(ml_collab_model, ml_interactions, [3, 25, 450], ml_items_df)

User 3
    Known positives:
        Get Shorty (1995)
        Twelve Monkeys (1995)
        Dead Man Walking (1995)
    Recommended:
         Searching for Bobby Fischer (1993)
         Free Willy (1993)
         Grosse Pointe Blank (1997)
User 25
    Known positives:
        Babe (1995)
        Dead Man Walking (1995)
        Seven (Se7en) (1995)
    Recommended:
         My Life as a Dog (Mitt liv som hund) (1985)
         Ace Ventura: When Nature Calls (1995)
         Young Guns (1988)
User 450
    Known positives:
        Twelve Monkeys (1995)
        Babe (1995)
        Seven (Se7en) (1995)
    Recommended:
         Free Willy (1993)
         Spawn (1997)
         Star Wars (1977)
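
The notebook imports `precision_at_k` but only inspects the recommendations by eye. As a hedged sketch of what that metric measures, the pure-Python function below computes the fraction of the top `k` recommended items that appear in a set of held-out positives. `precision_at_k_sketch` is a hypothetical helper and the data is made up; it is not LightFM's implementation, which works on sparse interaction matrices.

```python
def precision_at_k_sketch(ranked_items, relevant_items, k):
    """Fraction of the first k ranked items that are relevant."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k


# Toy example (made up): a ranking and the items the user liked in held-out data
ranked = ["Free Willy", "Spawn", "Star Wars", "Babe"]
held_out = {"Star Wars", "Babe"}

precision = precision_at_k_sketch(ranked, held_out, k=3)
print(round(precision, 3))
```

A meaningful score needs interactions that the model did not see during training; the models in this notebook are fitted on the full interaction matrix, so a train/test split (for instance with `lightfm.cross_validation.random_train_test_split`) would have to come first.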


#### Hybrid model
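
In a feature-based model of this kind, an item is represented as a weighted sum of the embeddings of its features, and a score is the dot product between the user and item representations. The sketch below shows that arithmetic in pure Python with made-up two-dimensional embeddings. `represent` and `dot` are hypothetical helpers, biases are left out, and the 0.5 weights assume that each item has an identity feature plus one title feature with the row normalised to sum to one.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def represent(feature_weights, embeddings):
    """Weighted sum of the embeddings of the features an entity has."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for feature, weight in feature_weights.items():
        for d in range(dim):
            total[d] += weight * embeddings[feature][d]
    return total


# Made-up embeddings for one item's identity feature and its title feature
item_feature_embeddings = {
    "id:1": [1.0, 0.0],
    "title:Get Shorty (1995)": [0.0, 2.0],
}
user_embedding = [0.5, 0.5]

item_vector = represent(
    {"id:1": 0.5, "title:Get Shorty (1995)": 0.5}, item_feature_embeddings
)
score = dot(user_embedding, item_vector)
print(item_vector, score)
```

Because each title belongs to a single film, a title feature is not shared between items, so it may add little beyond the item's own identity; features shared across items, such as the genre columns, would let items borrow information from each other.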

In [38]:
ml_hybrid_model = LightFM(loss='warp')
ml_hybrid_model.fit(ml_interactions, item_features=ml_item_features, sample_weight=ml_weights, epochs=30, num_threads=2)

<lightfm.lightfm.LightFM at 0x1a1da95668>

In [39]:
sample_recommendation(ml_hybrid_model, ml_interactions, [3, 25, 450], ml_items_df)

User 3
    Known positives:
        Get Shorty (1995)
        Twelve Monkeys (1995)
        Dead Man Walking (1995)
    Recommended:
         Spawn (1997)
         Grosse Pointe Blank (1997)
         Star Wars (1977)
User 25
    Known positives:
        Babe (1995)
        Dead Man Walking (1995)
        Seven (Se7en) (1995)
    Recommended:
         Young Guns (1988)
         Free Willy 3: The Rescue (1997)
         Trainspotting (1996)
User 450
    Known positives:
        Twelve Monkeys (1995)
        Babe (1995)
        Seven (Se7en) (1995)
    Recommended:
         Spawn (1997)
         Free Willy (1993)
         Weekend at Bernie's (1989)


#### Content-based model

In [40]:
ml_content_model = LightFM(loss='warp')
ml_content_model.fit(ml_interactions, user_features=ml_user_features, item_features=ml_item_features, sample_weight=ml_weights, epochs=30, num_threads=2)

<lightfm.lightfm.LightFM at 0x1a31f4a780>

In [41]:
sample_recommendation(ml_content_model, ml_interactions, [3, 25, 450], ml_items_df)

User 3
    Known positives:
        Get Shorty (1995)
        Twelve Monkeys (1995)
        Dead Man Walking (1995)
    Recommended:
         My Life as a Dog (Mitt liv som hund) (1985)
         Doom Generation, The (1995)
         Liar Liar (1997)
User 25
    Known positives:
        Babe (1995)
        Dead Man Walking (1995)
        Seven (Se7en) (1995)
    Recommended:
         Gay Divorcee, The (1934)
         Black Sheep (1996)
         Ace Ventura: When Nature Calls (1995)
User 450
    Known positives:
        Twelve Monkeys (1995)
        Babe (1995)
        Seven (Se7en) (1995)
    Recommended:
         Free Willy (1993)
         Spawn (1997)
         Birdcage, The (1996)


## Anime

### Loading the dataframes

In [42]:
# Load the anime dataframe
anime_items_df = pd.read_csv('data/anime/anime.csv', sep=',',
    names=['Id Anime', 'Título', 'Género', 'Tipo', 'Episodios', 'Valoración Media', 'Miembros'])

# Uncomment to check that the dataframe was loaded correctly
#anime_items_df

# Load the anime ratings, which are split across four files, and concatenate them
anime_data1_df = pd.read_csv('data/anime/ratings1.csv', sep=',', names=['Id Usuario', 'Id Anime', 'Valoración'], low_memory=False)
anime_data2_df = pd.read_csv('data/anime/ratings2.csv', sep=',', names=['Id Usuario', 'Id Anime', 'Valoración'], low_memory=False)
anime_data3_df = pd.read_csv('data/anime/ratings3.csv', sep=',', names=['Id Usuario', 'Id Anime', 'Valoración'], low_memory=False)
anime_data4_df = pd.read_csv('data/anime/ratings4.csv', sep=',', names=['Id Usuario', 'Id Anime', 'Valoración'], low_memory=False)
anime_data_df = pd.concat([anime_data1_df, anime_data2_df, anime_data3_df, anime_data4_df])

# Uncomment to check that the dataframe was loaded correctly
#anime_data_df

### Building the dataset and the matrices
The dataframes are converted into the data structures LightFM needs, so that the matrices can be built and passed to its recommender.

In [43]:
# Build the dataset
anime_dataset = Dataset()
anime_dataset.fit(anime_data_df['Id Usuario'], anime_data_df['Id Anime'])
anime_dataset.fit_partial(items=anime_items_df['Id Anime'], item_features=anime_items_df['Título'])

#num_users, num_items = anime_dataset.interactions_shape()
#print('Num users: {}, num_items {}.'.format(num_users, num_items))

# Build the matrices
(anime_interactions, anime_weights) = anime_dataset.build_interactions((row['Id Usuario'], row['Id Anime'], row['Valoración']) for index, row in anime_data_df.iterrows())
anime_item_features = anime_dataset.build_item_features((row['Id Anime'], [row['Título']]) for index, row in anime_items_df.iterrows())

### Training the models

#### Collaborative model

In [44]:
anime_collab_model = LightFM(loss='warp')
anime_collab_model.fit(anime_interactions, sample_weight=anime_weights, epochs=30, num_threads=2)

<lightfm.lightfm.LightFM at 0x1a1f45a550>

In [45]:
sample_recommendation(anime_collab_model, anime_interactions, [3, 25, 450], anime_items_df)

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self.loc[key]


User 3
    Known positives:
        Steins;Gate
        Sen to Chihiro no Kamikakushi
        Ookami Kodomo no Ame to Yuki
    Recommended:
         Ton-Ton Atta to Niigata no Mukashibanashi
         Spoon-hime no Swing Kitchen
         Doubutsu Kankyou Kaigi
User 25
    Known positives:
        Ookami Kodomo no Ame to Yuki
        Monogatari Series: Second Season
        Fate/Zero 2nd Season
    Recommended:
         Ton-Ton Atta to Niigata no Mukashibanashi
         Spoon-hime no Swing Kitchen
         Doubutsu Kankyou Kaigi
User 450
    Known positives:
        Kimi no Na wa.
        Fullmetal Alchemist: Brotherhood
        Ginga Eiyuu Densetsu
    Recommended:
         Gintama: Shiroyasha Koutan
         Rose of Versailles
         Steins;Gate


#### Hybrid model

In [None]:
anime_hybrid_model = LightFM(loss='warp')
anime_hybrid_model.fit(anime_interactions, item_features=anime_item_features, sample_weight=anime_weights, epochs=30, num_threads=2)

In [None]:
sample_recommendation(anime_hybrid_model, anime_interactions, [3, 25, 450], anime_items_df)