
## Part 3: Building a Recommender System with Implicit Feedback


In [1]:
pip install implicit

Collecting implicit
  Downloading implicit-0.7.2-cp37-cp37m-macosx_10_9_x86_64.whl.metadata (6.1 kB)
Collecting threadpoolctl (from implicit)
  Downloading threadpoolctl-3.1.0-py3-none-any.whl.metadata (9.2 kB)
Downloading implicit-0.7.2-cp37-cp37m-macosx_10_9_x86_64.whl (802 kB)
Downloading threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Installing collected packages: threadpoolctl, implicit
Successfully installed implicit-0.7.2 threadpoolctl-3.1.0
Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

import implicit

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

### Step 2: Loading the data


In [3]:
ratings = pd.read_csv("reviews_with_user_id.csv")
hotels = pd.read_csv("hotel_info_eu.csv")

In [5]:
# Remove columns from the DataFrame
#ratings = ratings.drop(columns=['Unnamed: 0', 'temp_id'])
ratings.head()

Unnamed: 0,user_id,hotel_id,Reviewer_Nationality,Negative_Review,Positive_Review,Reviewer_Score,Review_Date,given_reviews
0,1,0,Russia,I am so angry that i made this post available...,Only the park outside of the hotel was beauti...,2.9,8/3/2017,7
1,2,0,Ireland,No Negative,No real complaints the hotel was great great ...,7.5,8/3/2017,7
2,3,0,Australia,Rooms are nice but for elderly a bit difficul...,Location was good and staff were ok It is cut...,7.1,7/31/2017,9
3,4,0,United Kingdom,My room was dirty and I was afraid to walk ba...,Great location in nice surroundings the bar a...,3.8,7/31/2017,1
4,5,0,New Zealand,You When I booked with your company on line y...,Amazing location and building Romantic setting,6.7,7/24/2017,3


In [7]:
#hotels = hotels.drop(columns=['Unnamed: 0'])
hotels.head()

Unnamed: 0,hotel_id,hotel_country,Hotel_Name,Hotel_Address
0,0,Netherlands,Hotel Arena,s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
1,1,United Kingdom,K K Hotel George,1 15 Templeton Place Earl s Court Kensington a...
2,2,United Kingdom,Apex Temple Court Hotel,1 2 Serjeant s Inn Fleet Street City of London...
3,3,United Kingdom,The Park Grand London Paddington,1 3 Queens Garden Westminster Borough London W...
4,4,France,Monhotel Lounge SPA,1 3 Rue d Argentine 16th arr 75116 Paris France


### Step 3: Transforming the data


In [10]:
def create_X(df: pd.DataFrame):
    """
    Generates a sparse hotel-by-user matrix from the ratings dataframe.
    
    Args:
        df: pandas dataframe with user_id, hotel_id and Reviewer_Score columns
    
    Returns:
        X: sparse matrix of shape (n_hotels, n_users)
        user_mapper: dict that maps user ids to user indices
        hotel_mapper: dict that maps hotel ids to hotel indices
        user_inv_mapper: dict that maps user indices to user ids
        hotel_inv_mapper: dict that maps hotel indices to hotel ids
    """
    N = df['user_id'].nunique()   # number of distinct users
    M = df['hotel_id'].nunique()  # number of distinct hotels

    # Map raw ids to contiguous 0-based indices (and back)
    user_mapper = dict(zip(np.unique(df["user_id"]), list(range(N))))
    hotel_mapper = dict(zip(np.unique(df["hotel_id"]), list(range(M))))
    
    user_inv_mapper = dict(zip(list(range(N)), np.unique(df["user_id"])))
    hotel_inv_mapper = dict(zip(list(range(M)), np.unique(df["hotel_id"])))
    
    user_index = [user_mapper[i] for i in df['user_id']]
    hotel_index = [hotel_mapper[i] for i in df['hotel_id']]

    # Rows are hotels, columns are users
    X = csr_matrix((df["Reviewer_Score"], (hotel_index, user_index)), shape=(M, N))
    
    return X, user_mapper, hotel_mapper, user_inv_mapper, hotel_inv_mapper

In [11]:
X, user_mapper, hotel_mapper, user_inv_mapper, hotel_inv_mapper = create_X(ratings)
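
To make the layout of `X` concrete, here is a minimal sketch on a made-up five-row dataframe (the ids and scores are illustrative, not taken from the dataset). It builds the same hotel-by-user matrix and shows one behavior worth knowing: when SciPy builds a CSR matrix from `(data, (row, col))` triplets, repeated `(row, col)` pairs are summed, so two reviews by the same user for the same hotel end up as a single, larger entry.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy ratings (hypothetical values): user 30 reviewed hotel 7 twice.
toy = pd.DataFrame({
    "user_id": [10, 10, 20, 30, 30],
    "hotel_id": [5, 7, 5, 7, 7],
    "Reviewer_Score": [8.0, 6.5, 9.0, 4.0, 5.0],
})

# np.unique returns the sorted ids plus, for each row, the index of its id.
user_ids, user_index = np.unique(toy["user_id"], return_inverse=True)
hotel_ids, hotel_index = np.unique(toy["hotel_id"], return_inverse=True)

# Rows are hotels, columns are users, as in create_X.
X_toy = csr_matrix(
    (toy["Reviewer_Score"].to_numpy(), (hotel_index, user_index)),
    shape=(len(hotel_ids), len(user_ids)),
)

dense = X_toy.toarray()
print(dense)
# The two reviews of hotel 7 by user 30 (4.0 and 5.0) were summed into 9.0.
```

If repeated reviews should not add up, one option is to aggregate first, for example with `groupby(['user_id', 'hotel_id'])['Reviewer_Score'].mean()`, before building the matrix.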

In [12]:
pip install fuzzywuzzy

Note: you may need to restart the kernel to use updated packages.


In [14]:
from fuzzywuzzy import process

def hotel_finder(title):
    all_titles = hotels['Hotel_Name'].tolist()
    closest_match = process.extractOne(title, all_titles)
    return closest_match[0]

hotel_title_mapper = dict(zip(hotels['Hotel_Name'], hotels['hotel_id']))
hotel_title_inv_mapper = dict(zip(hotels['hotel_id'], hotels['Hotel_Name']))

def get_hotel_index(title):
    # Fuzzy-match the title, then map name -> hotel id -> matrix index
    fuzzy_title = hotel_finder(title)
    hotel_id = hotel_title_mapper[fuzzy_title]
    hotel_idx = hotel_mapper[hotel_id]
    return hotel_idx

def get_hotel_title(hotel_idx):
    # Map matrix index -> hotel id -> name
    hotel_id = hotel_inv_mapper[hotel_idx]
    title = hotel_title_inv_mapper[hotel_id]
    return title

In [19]:
hotel_index = get_hotel_index('Monhotel')
hotel_name = get_hotel_title(hotel_index)
print(hotel_name)

Monhotel Lounge SPA
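
Note that `process.extractOne` returns the best-scoring candidate even when the score is low (unless a `score_cutoff` is passed), so `hotel_finder` will always return some hotel, even for a query that matches nothing. The sketch below illustrates the idea of rejecting weak matches. It swaps in the standard library's `difflib` instead of fuzzywuzzy so that it runs without extra dependencies; `find_hotel`, the `0.5` cutoff and the short name list are assumptions made for the illustration.

```python
from difflib import get_close_matches

# A few names copied from the hotels table shown earlier.
names = [
    "Hotel Arena",
    "K K Hotel George",
    "Apex Temple Court Hotel",
    "Monhotel Lounge SPA",
]

def find_hotel(query, names, cutoff=0.5):
    """Return the closest hotel name, or None if nothing is similar enough.

    Hypothetical helper: uses difflib's similarity ratio, which is not the
    same scorer fuzzywuzzy uses, so scores are not comparable between the two.
    """
    # Compare case-insensitively but return the original spelling.
    lowered = {name.lower(): name for name in names}
    matches = get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

print(find_hotel("Monhotel", names))
print(find_hotel("zzzz", names))
```

With fuzzywuzzy itself, a similar guard can be obtained by passing `score_cutoff` to `process.extractOne` and handling the `None` it returns when no candidate reaches the cutoff.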


### Step 4: Building the implicit feedback recommendation model

In [17]:
model = implicit.als.AlternatingLeastSquares(factors=50)



In [18]:
model.fit(X.T.tocsr())

100%|██████████| 15/15 [00:09<00:00,  1.56it/s]


In [20]:
hotel_of_interest = 'Monhotel'

hotel_index = get_hotel_index(hotel_of_interest)
related = model.similar_items(hotel_index)
related

(array([ 4,  5,  1,  2, 10, 11,  9,  0, 12,  8], dtype=int32),
 array([1.        , 0.98986036, 0.9251524 , 0.9019392 , 0.88701075,
        0.8862056 , 0.8759067 , 0.87510616, 0.8668109 , 0.8657092 ],
       dtype=float32))
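
The first array holds hotel indices and the second their similarity scores. The query hotel (index 4) comes first with a score of 1.0, which is what one would expect if the scores are cosine similarities between the learned item factor vectors. The following sketch reproduces that ranking on a made-up factor matrix using plain NumPy; `toy_similar_items` and the numbers are illustrative assumptions, not the library's internal code.

```python
import numpy as np

# Toy item-factor matrix: 4 items, 3 latent factors (made-up numbers).
item_factors = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
])

def toy_similar_items(item_idx, factors, top_n=3):
    """Rank items by cosine similarity to factors[item_idx] (illustrative)."""
    norms = np.linalg.norm(factors, axis=1)
    # Dot product with the query vector, divided by both vector lengths.
    scores = factors @ factors[item_idx] / (norms * norms[item_idx])
    # Stable sort on negated scores: highest similarity first.
    order = np.argsort(-scores, kind="stable")[:top_n]
    return order, scores[order]

ids, scores = toy_similar_items(0, item_factors)
print(ids, scores)
```

Item 1 points in exactly the same direction as item 0, so it also scores 1.0 despite being twice as long; cosine similarity ignores vector length and compares direction only.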

In [21]:
source_hotel = hotel_finder(hotel_of_interest)
print(f"Because you stayed at {source_hotel}, you may be interested in the following hotels:")
for t in related[0]:
    recommended_hotel = get_hotel_title(t)
    # Skip the query hotel itself, which is always its own closest match
    if recommended_hotel != source_hotel:
        print(recommended_hotel)

Because you stayed at Monhotel Lounge SPA, you may be interested in the following hotels:
Kube Hotel Ice Bar
K K Hotel George
Apex Temple Court Hotel
Hotel Trianon Rive Gauche
InterContinental London Park Lane
Splendid Etoile
Hotel Arena
Novotel Suites Paris Nord 18 me
One Aldwych


### Step 5: Generating recommendations for a user

In [22]:
user_id = 90

In [24]:
user_ratings = ratings[ratings['user_id']==user_id].merge(hotels[['hotel_id', 'Hotel_Name']], on='hotel_id')
user_ratings = user_ratings.sort_values('Reviewer_Score', ascending=False)
print(f"Number of hotels rated by user {user_id}: {user_ratings['hotel_id'].nunique()}")

Number of hotels rated by user 90: 10


In [26]:
# user_ratings was already built and sorted by Reviewer_Score (descending) in the previous cell
top_5 = user_ratings.head()
top_5

Unnamed: 0,user_id,hotel_id,Reviewer_Nationality,Negative_Review,Positive_Review,Reviewer_Score,Review_Date,given_reviews,Hotel_Name
3,90,35,Slovenia,,Super quiet room well designed good wifi nesp...,10.0,2/16/2017,31,The Nadler Soho
9,90,47,Slovenia,No Negative,The location was great The breakfast was also...,10.0,2/17/2017,15,Le Senat
10,90,51,Slovenia,Normal rooms are a bit small but that is Pari...,The location of the hotel is great Close by i...,10.0,8/11/2016,17,Les Plumes Hotel
7,90,41,Slovenia,Bed was to small queen size not king size Air...,Friendly staff fluent in English excellent co...,9.2,8/9/2015,9,Crowne Plaza Paris R publique
1,90,8,Slovenia,I would prefer a breakfast buffet Quality of ...,No Positive,8.8,11/30/2015,3,One Aldwych


In [28]:
bottom_5 = user_ratings[user_ratings['Reviewer_Score']<8].tail()
bottom_5

Unnamed: 0,user_id,hotel_id,Reviewer_Nationality,Negative_Review,Positive_Review,Reviewer_Score,Review_Date,given_reviews,Hotel_Name
0,90,0,Slovenia,The rooms could be more sound secure Lots of ...,The front desk where very kind and helpful,7.5,10/3/2016,12,Hotel Arena
2,90,23,Slovenia,Bathroom was old and uncomfortable Loud const...,Very good breakfast and lovelly breakfast are...,7.5,9/24/2016,4,Novotel London West
5,90,39,Slovenia,Small room and small bathroom not better than...,Clean and excellent location,7.5,3/5/2016,1,The Nadler Victoria
4,90,36,Slovenia,Staff students with no energie in hotel bar a...,Nice room with a lot of equiptment You get a ...,7.1,12/26/2015,1,Grange St Paul s Hotel
11,90,56,Slovenia,The staff did not feel very relaxed and welco...,Comfortable bed,6.7,12/15/2015,4,Pullman London St Pancras


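The ratings above hint at user 90's preferences: the three 10.0 scores went to The Nadler Soho, Le Senat and Les Plumes Hotel, while the lowest score (6.7) went to Pullman London St Pancras. Let's see which recommendations the model generates for this user.

We'll use `recommend()`, which takes the user's index and that user's row of the user-item matrix (the transpose of `X`, since `X` is stored as hotels by users).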

In [29]:
X_t = X.T.tocsr()
user_idx = user_mapper[user_id]
recommendations = model.recommend(user_idx, X_t[user_idx])
recommendations

(array([37, 40, 54,  7, 58, 31, 32, 26, 27, 57], dtype=int32),
 array([0.87365365, 0.8357782 , 0.74676883, 0.6822851 , 0.6731199 ,
        0.67127436, 0.6454601 , 0.6310118 , 0.62272614, 0.6072975 ],
       dtype=float32))
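
As with `similar_items()`, the result is a pair of arrays: hotel indices and their scores, highest first. Conceptually, a matrix factorization model scores each item for a user as the dot product of the user's factor vector with the item's factor vector, and by default `recommend()` leaves out items the user has already interacted with. The sketch below shows that idea with NumPy on made-up factors; `toy_recommend` and the numbers are illustrative assumptions, not the library's implementation.

```python
import numpy as np

# Made-up latent factors: one user vector and four item vectors.
user_vec = np.array([1.0, 0.5])
item_factors = np.array([
    [0.9, 0.1],   # item 0
    [0.2, 0.8],   # item 1
    [0.7, 0.7],   # item 2
    [0.0, 0.1],   # item 3
])
already_seen = np.array([0])  # items the user has already rated

def toy_recommend(user_vec, item_factors, seen, top_n=2):
    """Score items by dot product and drop the ones already seen (illustrative)."""
    scores = item_factors @ user_vec
    # Push seen items to the bottom so they are never recommended.
    scores[seen] = -np.inf
    order = np.argsort(-scores)[:top_n]
    return order, scores[order]

ids, scores = toy_recommend(user_vec, item_factors, already_seen)
print(ids, scores)
```

Item 0 would have scored second highest here, but it is filtered out because the user has already rated it, which is why none of user 90's ten reviewed hotels appear in the output above.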

Let's convert the recommended indices into hotel names.

In [31]:
for t, r in zip(recommendations[0], recommendations[1]):
    recommended_hotel = get_hotel_title(t)
    print(recommended_hotel)

The Ampersand Hotel
Novotel London Tower Bridge
W London Leicester Square
Park Plaza County Hall London
Doubletree by Hilton London Kensington
Knightsbridge Hotel
The London EDITION
H tel Juliana Paris
Hotel L Antoine
Crowne Plaza London Kensington
