# Models Part 2 : Hybrids, NLP and More...

In this notebook we will be building a few different models for our recommendation system.

In our first models we tried some baseline models with only the original user reviews. In this notebook we will go deeper using our reviews to get suggestions (using TF-IDF). By the end of this, we need to combine our content based models and our collaborative-filtering model to get a hybrid model.

**The steps are as follows:** 
1. Import Train and Test data
2. TFIDF (fit_transform() on our training data)
3. TFIDF test (transform() on test data)
4. Calcualte the cosin similarity between objects using tfidf

#### Import libraries/modules below:

In [1]:
import pickle
import re
import pandas as pd
import surprise
from surprise import KNNWithMeans
from surprise import SVD
from surprise import SVDpp
from surprise import NMF
from surprise import NormalPredictor
from surprise import KNNBaseline
from surprise import KNNBasic
from surprise import KNNWithZScore
from surprise import BaselineOnly
from surprise import CoClustering
from surprise import SlopeOne
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split
from surprise.model_selection import cross_validate
from surprise.model_selection import GridSearchCV
from surprise.reader import Reader
from surprise.model_selection import KFold
from surprise.model_selection import GridSearchCV
from Mod_5_functions import pickle_file,open_pickle

**Import test and train data:**

In [2]:
user_reviews_train = open_pickle('Data/train_data')
user_reviews_test = open_pickle('Data/test_data')

**TFIDF:**

In [9]:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(ngram_range=(0,3))
tfidf_matrix = vectorizer.fit_transform(user_reviews_train.rev_comp_reviews_corrections_new)

**Cosine Similarity:**

In [10]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_similarities = cosine_similarity(tfidf_matrix,Y=None,dense_output=False)

In [46]:
results = {} 
for idx, row in user_reviews_train.iterrows(): #iterates through all the rows
    similar_zip = sorted(list(zip(cosine_similarities[idx].indices,cosine_similarities[idx].data)),
                         key=lambda tup: tup[1], reverse=True)
    similar_indices = [i[0] for i in sorted(similar_zip,key = lambda tupe: tupe[1], reverse=True)][1:]
    
    similar_items = [user_reviews_train['rev_company_name'][i] for i in similar_indices]
    results[row['rev_company_name']] = similar_items
    

In [47]:
pickle_file(results,'Data/Cosine_similarity_train')

'Pickled object!'

In [48]:
results['Peloton']

['Equinox East 63rd Street',
 'Greenhouse Holistic - CLOSED',
 'Pure Barre - New York Union Square',
 'Sal Anthony’s Movement Salon - CLOSED',
 'Physique 57',
 'SWERVE Fitness',
 'Planet Fitness - Manhattan - Harlem Malcolm X Blvd. - CLOSED',
 'Coban’s Muay Thai Camp - MOVED',
 'Tiger Schulmann’s Mixed Martial Arts',
 'Blink Fitness - Noho',
 '92nd Street Y',
 'Equinox East 63rd Street',
 'Equinox East 44th Street',
 'Aerial Arts NYC',
 'Bodhi Fitness Center',
 'Iyengar Yoga Institute of New York',
 'Y7 Studio Union Square',
 'Rumble Boxing',
 'Peloton',
 'Equinox Brooklyn Heights',
 'Metropolitan Pool and Recreation Center',
 'Chelsea Piers Fitness - Chelsea',
 '24 Hour Fitness - Kew Gardens',
 'Sivananda Ashram Yoga Ranch',
 'Church Street Boxing Gym',
 'Body & Pole',
 'Willspace',
 'New York Health & Racquet Club',
 'iLoveKickboxing- Chelsea NYC',
 'Alycea Ungaro’s Real Pilates',
 'Recreation Center 54',
 'Y7 Studio Flatiron',
 'Flywheel Sports Upper West Side',
 'Alvin Ailey Americ

In [49]:
cosine_df = pd.DataFrame(results)

In [50]:
cosine_df.head()

Unnamed: 0,14th Street Y,21 Pilates,212 Pilates,22nd Street Synergy Fitness - CLOSED,24 Hour Fitness - 53rd St,24 Hour Fitness - Bay Shore,24 Hour Fitness - Bensonhurst,24 Hour Fitness - Broadway & Houston Ultra - CLOSED,24 Hour Fitness - Jamaica Center - CLOSED,24 Hour Fitness - Kew Gardens,...,iLoveKickboxing- Carle Place,iLoveKickboxing- Chelsea NYC,iLoveKickboxing- Financial District NYC,iLoveKickboxing- Harlem,iLoveKickboxing- Lake Grove,inSHAPE Fitness,lululemon athletica,modelFIT,re:AB Pilates - CLOSED,trampoLEAN
0,Planet Fitness - Manhattan - Harlem Malcolm X ...,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Church Street Boxing Gym,Equinox East 63rd Street,...,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Tiger Schulmann’s Mixed Martial Arts,Equinox East 63rd Street
1,Equinox East 63rd Street,Greenhouse Holistic - CLOSED,Tiger Schulmann’s Mixed Martial Arts,Planet Fitness - Manhattan - Harlem Malcolm X ...,Blink Fitness - Noho,Planet Fitness - Manhattan - Harlem Malcolm X ...,Planet Fitness - Manhattan - Harlem Malcolm X ...,Planet Fitness - Manhattan - Harlem Malcolm X ...,Blink Fitness - Noho,Planet Fitness - Manhattan - Harlem Malcolm X ...,...,Greenhouse Holistic - CLOSED,Planet Fitness - Manhattan - Harlem Malcolm X ...,Greenhouse Holistic - CLOSED,Coban’s Muay Thai Camp - MOVED,Greenhouse Holistic - CLOSED,Planet Fitness - Manhattan - Harlem Malcolm X ...,Planet Fitness - Manhattan - Harlem Malcolm X ...,Adam Sanford Fitness - CLOSED,Equinox East 63rd Street,Planet Fitness - Manhattan - Harlem Malcolm X ...
2,24 Hour Fitness - Kew Gardens,Loom Yoga,Coban’s Muay Thai Camp - MOVED,Blink Fitness - Noho,Planet Fitness - Manhattan - Harlem Malcolm X ...,Greenhouse Holistic - CLOSED,Blink Fitness - Noho,Blink Fitness - Noho,Equinox East 63rd Street,Blink Fitness - Noho,...,Planet Fitness - Manhattan - Harlem Malcolm X ...,Equinox East 63rd Street,Planet Fitness - Manhattan - Harlem Malcolm X ...,Blink Fitness - Noho,Planet Fitness - Manhattan - Harlem Malcolm X ...,Pure Barre - New York Union Square,Greenhouse Holistic - CLOSED,Greenhouse Holistic - CLOSED,Greenhouse Holistic - CLOSED,Greenhouse Holistic - CLOSED
3,92nd Street Y,Equinox East 63rd Street,Pure Barre - New York Union Square,Coban’s Muay Thai Camp - MOVED,Synergy Fitness Clubs - CLOSED,Tiger Schulmann’s Mixed Martial Arts,Equinox East 63rd Street,Equinox East 63rd Street,Planet Fitness - Manhattan - Harlem Malcolm X ...,Greenhouse Holistic - CLOSED,...,Blink Fitness - Noho,Blink Fitness - Noho,Coban’s Muay Thai Camp - MOVED,Planet Fitness - Manhattan - Harlem Malcolm X ...,Tiger Schulmann’s Mixed Martial Arts,Equinox East 63rd Street,Coban’s Muay Thai Camp - MOVED,Pure Barre - New York Union Square,Sal Anthony’s Movement Salon - CLOSED,Coban’s Muay Thai Camp - MOVED
4,Greenhouse Holistic - CLOSED,Iyengar Yoga Institute of New York,Greenhouse Holistic - CLOSED,92nd Street Y,Church Street Boxing Gym,Equinox East 44th Street,92nd Street Y,Sivananda Ashram Yoga Ranch,92nd Street Y,Coban’s Muay Thai Camp - MOVED,...,Tiger Schulmann’s Mixed Martial Arts,Tiger Schulmann’s Mixed Martial Arts,Tiger Schulmann’s Mixed Martial Arts,Greenhouse Holistic - CLOSED,Blink Fitness - Noho,Body & Pole,Equinox East 63rd Street,Planet Fitness - Manhattan - Harlem Malcolm X ...,Physique 57,24 Hour Fitness - Kew Gardens


In [51]:
benchmark = []
# Iterate over all algorithms
for algorithm in [SVD(), SVDpp(), SlopeOne(), NMF(), NormalPredictor(), KNNBaseline(), KNNBasic(), KNNWithMeans(), KNNWithZScore(), BaselineOnly(), CoClustering()]:
    # Perform cross validation
    results = cross_validate(algorithm, tfidf_matrix, measures=['RMSE'], cv=3, verbose=False)
    
    # Get results & append algorithm name
    tmp = pd.DataFrame.from_dict(results).mean(axis=0)
    tmp = tmp.append(pd.Series([str(algorithm).split(' ')[0].split('.')[-1]], index=['Algorithm']))
    benchmark.append(tmp)
    
pd.DataFrame(benchmark).set_index('Algorithm').sort_values('test_rmse') 

AttributeError: raw_ratings not found