# Models Part 2: Hybrids, NLP and More...

In this notebook we will be building a few different models for our recommendation system.

Our first models were baselines built only on the original user reviews. In this notebook we go deeper and use the review text itself (via TF-IDF) to generate suggestions. By the end, we need to combine our content-based models with our collaborative-filtering model to get a hybrid model.

**The steps are as follows:** 
1. Import Train and Test data
2. TF-IDF (`fit_transform()` on our training data)
3. TF-IDF test (`transform()` on test data)
4. Calculate the cosine similarity between items using the TF-IDF vectors
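
The steps above can be sketched on toy data as follows. This is a minimal illustration, not the notebook's actual run: the documents and variable names (`train_docs`, `test_docs`, `train_sim`) are made up, and the vectorizer uses its default settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the review text (hypothetical, not the real data).
train_docs = [
    "great yoga studio with calm teachers",
    "boxing gym with heavy bags",
    "calm yoga and pilates studio",
]
test_docs = ["friendly yoga teachers"]

vectorizer = TfidfVectorizer()
train_tfidf = vectorizer.fit_transform(train_docs)  # learns the vocabulary and IDF weights
test_tfidf = vectorizer.transform(test_docs)        # reuses them; nothing is learned from test

# Pairwise similarity between all training documents, kept sparse.
train_sim = cosine_similarity(train_tfidf, dense_output=False)
print(train_sim.shape)
print(test_tfidf.shape[1] == train_tfidf.shape[1])  # same feature space for train and test
```

Calling `transform()` rather than `fit_transform()` on the test data keeps both matrices in the same feature space, so rows from either one can be compared directly.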

#### Import libraries/modules below:

In [1]:
import pickle
import re
import pandas as pd
import surprise
from surprise import KNNWithMeans
from surprise import SVD
from surprise import SVDpp
from surprise import NMF
from surprise import NormalPredictor
from surprise import KNNBaseline
from surprise import KNNBasic
from surprise import KNNWithZScore
from surprise import BaselineOnly
from surprise import CoClustering
from surprise import SlopeOne
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split
from surprise.model_selection import cross_validate
from surprise.model_selection import GridSearchCV
from surprise.reader import Reader
from surprise.model_selection import KFold
from Mod_5_functions import pickle_file,open_pickle

**Import test and train data:**

In [2]:
user_reviews_train = open_pickle('Data/train_data')
user_reviews_test = open_pickle('Data/test_data')

**TFIDF:**

In [9]:
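from sklearn.feature_extraction.text import TfidfVectorizer
# NOTE: n-gram ranges usually start at 1; the lower bound of 0 is kept here as in the original run
vectorizer = TfidfVectorizer(ngram_range=(0,3))
# Learn the vocabulary and IDF weights from the training reviews only
tfidf_matrix = vectorizer.fit_transform(user_reviews_train.rev_comp_reviews_corrections_new)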

**Cosine Similarity:**

In [10]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_similarities = cosine_similarity(tfidf_matrix,Y=None,dense_output=False)

In [46]:
results = {}
# Assumes a default 0..n-1 index, so idx is also the row position in the similarity matrix
for idx, row in user_reviews_train.iterrows():
    sim_row = cosine_similarities[idx]
    # Pair each neighbour's row position with its score, highest score first
    similar_zip = sorted(zip(sim_row.indices, sim_row.data), key=lambda tup: tup[1], reverse=True)
    # Skip the first entry, which is normally the item itself
    similar_indices = [i for i, _ in similar_zip][1:]

    similar_items = [user_reviews_train['rev_company_name'][i] for i in similar_indices]
    results[row['rev_company_name']] = similar_items
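
The loop drops the first ranked entry on the assumption that it is the item itself. If two rows tie at the top (for example, identical review text), a different row could be ranked first. A minimal sketch of a helper that excludes the item by position instead is shown below; `top_k_similar` and the toy matrix are hypothetical and not part of the notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix

def top_k_similar(sim_matrix, row, k=5):
    """Return positions of the k rows most similar to `row`, excluding `row` itself."""
    sim_row = sim_matrix.getrow(row)
    order = np.argsort(-sim_row.data, kind="stable")  # highest score first
    neighbours = sim_row.indices[order]
    neighbours = neighbours[neighbours != row]        # drop self by position, not by rank
    return neighbours[:k].tolist()

# Toy 3x3 similarity matrix with made-up values.
toy = csr_matrix(np.array([[1.0, 0.2, 0.9],
                           [0.2, 1.0, 0.0],
                           [0.9, 0.0, 1.0]]))
print(top_k_similar(toy, 0, k=2))
```

Passing `k` also avoids storing the full ranked list for every item when only the top few suggestions are needed.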
    

In [47]:
pickle_file(results,'Data/Cosine_similarity_train')

'Pickled object!'

In [None]:
results['Peloton']

In [49]:
cosine_df = pd.DataFrame(results)

In [50]:
cosine_df.head()

Unnamed: 0,14th Street Y,21 Pilates,212 Pilates,22nd Street Synergy Fitness - CLOSED,24 Hour Fitness - 53rd St,24 Hour Fitness - Bay Shore,24 Hour Fitness - Bensonhurst,24 Hour Fitness - Broadway & Houston Ultra - CLOSED,24 Hour Fitness - Jamaica Center - CLOSED,24 Hour Fitness - Kew Gardens,...,iLoveKickboxing- Carle Place,iLoveKickboxing- Chelsea NYC,iLoveKickboxing- Financial District NYC,iLoveKickboxing- Harlem,iLoveKickboxing- Lake Grove,inSHAPE Fitness,lululemon athletica,modelFIT,re:AB Pilates - CLOSED,trampoLEAN
0,Planet Fitness - Manhattan - Harlem Malcolm X ...,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Church Street Boxing Gym,Equinox East 63rd Street,...,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Equinox East 63rd Street,Tiger Schulmann’s Mixed Martial Arts,Equinox East 63rd Street
1,Equinox East 63rd Street,Greenhouse Holistic - CLOSED,Tiger Schulmann’s Mixed Martial Arts,Planet Fitness - Manhattan - Harlem Malcolm X ...,Blink Fitness - Noho,Planet Fitness - Manhattan - Harlem Malcolm X ...,Planet Fitness - Manhattan - Harlem Malcolm X ...,Planet Fitness - Manhattan - Harlem Malcolm X ...,Blink Fitness - Noho,Planet Fitness - Manhattan - Harlem Malcolm X ...,...,Greenhouse Holistic - CLOSED,Planet Fitness - Manhattan - Harlem Malcolm X ...,Greenhouse Holistic - CLOSED,Coban’s Muay Thai Camp - MOVED,Greenhouse Holistic - CLOSED,Planet Fitness - Manhattan - Harlem Malcolm X ...,Planet Fitness - Manhattan - Harlem Malcolm X ...,Adam Sanford Fitness - CLOSED,Equinox East 63rd Street,Planet Fitness - Manhattan - Harlem Malcolm X ...
2,24 Hour Fitness - Kew Gardens,Loom Yoga,Coban’s Muay Thai Camp - MOVED,Blink Fitness - Noho,Planet Fitness - Manhattan - Harlem Malcolm X ...,Greenhouse Holistic - CLOSED,Blink Fitness - Noho,Blink Fitness - Noho,Equinox East 63rd Street,Blink Fitness - Noho,...,Planet Fitness - Manhattan - Harlem Malcolm X ...,Equinox East 63rd Street,Planet Fitness - Manhattan - Harlem Malcolm X ...,Blink Fitness - Noho,Planet Fitness - Manhattan - Harlem Malcolm X ...,Pure Barre - New York Union Square,Greenhouse Holistic - CLOSED,Greenhouse Holistic - CLOSED,Greenhouse Holistic - CLOSED,Greenhouse Holistic - CLOSED
3,92nd Street Y,Equinox East 63rd Street,Pure Barre - New York Union Square,Coban’s Muay Thai Camp - MOVED,Synergy Fitness Clubs - CLOSED,Tiger Schulmann’s Mixed Martial Arts,Equinox East 63rd Street,Equinox East 63rd Street,Planet Fitness - Manhattan - Harlem Malcolm X ...,Greenhouse Holistic - CLOSED,...,Blink Fitness - Noho,Blink Fitness - Noho,Coban’s Muay Thai Camp - MOVED,Planet Fitness - Manhattan - Harlem Malcolm X ...,Tiger Schulmann’s Mixed Martial Arts,Equinox East 63rd Street,Coban’s Muay Thai Camp - MOVED,Pure Barre - New York Union Square,Sal Anthony’s Movement Salon - CLOSED,Coban’s Muay Thai Camp - MOVED
4,Greenhouse Holistic - CLOSED,Iyengar Yoga Institute of New York,Greenhouse Holistic - CLOSED,92nd Street Y,Church Street Boxing Gym,Equinox East 44th Street,92nd Street Y,Sivananda Ashram Yoga Ranch,92nd Street Y,Coban’s Muay Thai Camp - MOVED,...,Tiger Schulmann’s Mixed Martial Arts,Tiger Schulmann’s Mixed Martial Arts,Tiger Schulmann’s Mixed Martial Arts,Greenhouse Holistic - CLOSED,Blink Fitness - Noho,Body & Pole,Equinox East 63rd Street,Planet Fitness - Manhattan - Harlem Malcolm X ...,Physique 57,24 Hour Fitness - Kew Gardens


**With test data:**

In [52]:
tfidf_matrix_test = vectorizer.transform(user_reviews_test.rev_comp_reviews_corrections_new)

In [54]:
cosine_similarities_test = cosine_similarity(tfidf_matrix_test,Y=None,dense_output=False)
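
The cell above compares test reviews with each other. If the goal were instead to find the training items closest to each held-out review, `cosine_similarity` also accepts a second matrix. The sketch below uses toy documents and hypothetical names (`cross_sim`, `nearest_train`); it is one possible alternative, not what this notebook does.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy documents (hypothetical).
train_docs = ["yoga studio calm teachers", "boxing gym heavy bags"]
test_docs = ["calm yoga class", "boxing bags and gloves"]

vec = TfidfVectorizer()
train_tfidf = vec.fit_transform(train_docs)
test_tfidf = vec.transform(test_docs)

# Rows are test documents, columns are training documents.
cross_sim = cosine_similarity(test_tfidf, train_tfidf, dense_output=False)
print(cross_sim.shape)

# Position of the most similar training document for each test document.
nearest_train = cross_sim.toarray().argmax(axis=1)
print(nearest_train.tolist())
```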

In [55]:
results_test = {}
# Assumes a default 0..n-1 index, so idx is also the row position in the similarity matrix
for idx, row in user_reviews_test.iterrows():
    sim_row = cosine_similarities_test[idx]
    # Pair each neighbour's row position with its score, highest score first
    similar_zip = sorted(zip(sim_row.indices, sim_row.data), key=lambda tup: tup[1], reverse=True)
    # Skip the first entry, which is normally the item itself
    similar_indices = [i for i, _ in similar_zip][1:]

    similar_items = [user_reviews_test['rev_company_name'][i] for i in similar_indices]
    results_test[row['rev_company_name']] = similar_items

In [56]:
cosine_df_test = pd.DataFrame(results_test)
cosine_df_test.head()

Unnamed: 0,21 Pilates,22nd Street Synergy Fitness - CLOSED,24 Hour Fitness - 53rd St,24 Hour Fitness - Bensonhurst,24 Hour Fitness - Broadway & Houston Ultra - CLOSED,24 Hour Fitness - Madison Square Park Ultra,24 Hour Fitness - Riverdale,24 Hour Fitness - Sheepshead Bay,24 Hour Fitness - Tilden,2nd Story Pilates and Yoga,...,iLoveKickboxing- Carle Place,iLoveKickboxing- Chelsea NYC,iLoveKickboxing- Forest Hills,iLoveKickboxing- Harlem,iLoveKickboxing- Park Slope,iLoveKickboxing- Richmond Hill,iLoveKickboxing.com - CLOSED,lululemon athletica,modelFIT,trampoLEAN
0,Equinox Flatiron,Equinox Flatiron,Equinox Flatiron,The Gym 105,Planet Fitness - Manhattan,Equinox Flatiron,The Gym 105,The Gym 105,The Gym 105,Upper West Side Yoga and Wellness,...,iLoveKickboxing- Chelsea NYC,iLoveKickboxing- Chelsea NYC,iLoveKickboxing- Chelsea NYC,iLoveKickboxing- Chelsea NYC,Planet Fitness - Manhattan,iLoveKickboxing- Chelsea NYC,iLoveKickboxing- Chelsea NYC,Planet Fitness - Manhattan,The Gym 105,Equinox Flatiron
1,The Gym 105,The Gym 105,The Gym 105,Equinox Flatiron,The Gym 105,Planet Fitness - Manhattan,Planet Fitness - Manhattan,Planet Fitness - Manhattan,Equinox Flatiron,Equinox West 92nd Street,...,iLoveKickboxing- Astoria,iLoveKickboxing- Astoria,iLoveKickboxing- Astoria,iLoveKickboxing- Astoria,The Gym 105,iLoveKickboxing- Astoria,iLoveKickboxing- Astoria,The Gym 105,Equinox Flatiron,Planet Fitness - Manhattan
2,Planet Fitness - Manhattan,Planet Fitness - Manhattan,Planet Fitness - Manhattan,Precision Athlete,Equinox Flatiron,The Gym 105,Equinox Flatiron,Equinox Flatiron,Planet Fitness - Manhattan,Equinox Flatiron,...,iLoveKickboxing- Bayside,iLoveKickboxing- Bayside,iLoveKickboxing- Bayside,iLoveKickboxing- Bayside,Equinox Flatiron,iLoveKickboxing- Bayside,iLoveKickboxing- Bayside,Equinox Flatiron,Planet Fitness - Manhattan,The Gym 105
3,Equinox West 92nd Street,Planet Fitness - Manhattan - Washington Heights,Planet Fitness - Manhattan - Washington Heights,Planet Fitness - Manhattan,Planet Fitness - Manhattan - Washington Heights,Planet Fitness - Manhattan - Washington Heights,Life Time Athletic,24 Hour Fitness - Tilden,Equinox West 92nd Street,The Gym 105,...,iLoveKickboxing- Forest Hills,iLoveKickboxing- Forest Hills,iLoveKickboxing- Forest Hills,iLoveKickboxing- Forest Hills,Planet Fitness - Manhattan - Washington Heights,iLoveKickboxing- Forest Hills,iLoveKickboxing- Forest Hills,Upper West Side Yoga and Wellness,Upper West Side Yoga and Wellness,Uplift Studios
4,Planet Fitness - Manhattan - Washington Heights,24 Hour Fitness - Tilden,Equinox West 92nd Street,TITLE Boxing Club,Equinox East 44th Street,Equinox West 92nd Street,Planet Fitness - Manhattan - Washington Heights,Equinox West 92nd Street,Planet Fitness - Manhattan - Washington Heights,Planet Fitness - Manhattan,...,iLoveKickboxing- Richmond Hill,iLoveKickboxing- Richmond Hill,iLoveKickboxing- Richmond Hill,iLoveKickboxing- Richmond Hill,Upper West Side Yoga and Wellness,iLoveKickboxing- Richmond Hill,iLoveKickboxing- Richmond Hill,Equinox West 92nd Street,Planet Fitness - Manhattan - Washington Heights,Planet Fitness - Manhattan - Washington Heights


In [122]:
cosine_df_test.shape

(2393, 931)

**With business categories instead:**

In [100]:
user_reviews_cats_train = open_pickle('Data/training_df_with_categories')

In [101]:
user_reviews_cats_train.dropna(axis=0,inplace=True)

In [102]:
user_reviews_cats_train.reset_index(drop=True,inplace=True)

In [103]:
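from tqdm.notebook import tqdm_notebook

# Register progress_apply on pandas objects
tqdm_notebook.pandas(desc="Progress: ")

# Join each business's list of categories into one space-separated string for TF-IDF
user_reviews_cats_train['categories'] = user_reviews_cats_train['categories'].progress_apply(lambda cats: ' '.join(cats))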


In [None]:
set_user_cats_train = user_reviews_cats_train[['rev_comp_url','rev_company_name','categories']].drop_duplicates()
set_user_cats_train.reset_index(inplace=True,drop=True)

In [144]:
vec = TfidfVectorizer(ngram_range=(0,3))
tfidf_matrix_2 = vec.fit_transform(set_user_cats_train.categories)

In [145]:
cosine_similarities_2 = cosine_similarity(tfidf_matrix_2,Y=None,dense_output=False)

In [146]:
results_2 = {}
# The index was reset above, so idx is also the row position in the similarity matrix
for idx, row in set_user_cats_train.iterrows():
    sim_row = cosine_similarities_2[idx]
    # Pair each neighbour's row position with its score, highest score first
    similar_zip = sorted(zip(sim_row.indices, sim_row.data), key=lambda tup: tup[1], reverse=True)
    # Skip the first entry, which is normally the item itself
    similar_indices = [i for i, _ in similar_zip][1:]

    similar_items = [set_user_cats_train['rev_company_name'][i] for i in similar_indices]
    results_2[row['rev_company_name']] = similar_items

In [147]:
cosine_df_2 = pd.DataFrame(results_2)
cosine_df_2.head()

Unnamed: 0,14th Street Y,21 Pilates,212 Pilates,22nd Street Synergy Fitness - CLOSED,24 Hour Fitness - 53rd St,24 Hour Fitness - Bay Shore,24 Hour Fitness - Bensonhurst,24 Hour Fitness - Broadway & Houston Ultra - CLOSED,24 Hour Fitness - Jamaica Center - CLOSED,24 Hour Fitness - Kew Gardens,...,iLoveKickboxing- Carle Place,iLoveKickboxing- Chelsea NYC,iLoveKickboxing- Financial District NYC,iLoveKickboxing- Harlem,iLoveKickboxing- Lake Grove,inSHAPE Fitness,lululemon athletica,modelFIT,re:AB Pilates - CLOSED,trampoLEAN
0,Athletic & Swim Club,Balanced Pilates,Balanced Pilates,LA Fitness,One 2 One New York Gym - CLOSED,One 2 One New York Gym - CLOSED,One 2 One New York Gym - CLOSED,One 2 One New York Gym - CLOSED,One 2 One New York Gym - CLOSED,One 2 One New York Gym - CLOSED,...,iLoveKickboxing- Carle Place,iLoveKickboxing- Carle Place,iLoveKickboxing- Bronx,iLoveKickboxing- Carle Place,iLoveKickboxing- Carle Place,Doyle Fitness Personal Training - CLOSED,lululemon athletica,South Brooklyn Weightlifting Club,Balanced Pilates,Doyle Fitness Personal Training - CLOSED
1,CompleteBody,Core Zone Pilates,Core Zone Pilates,LA Fitness,Bev Francis Powerhouse Gym,Bev Francis Powerhouse Gym,Bev Francis Powerhouse Gym,Bev Francis Powerhouse Gym,Bev Francis Powerhouse Gym,Bev Francis Powerhouse Gym,...,CKO Kickboxing,CKO Kickboxing,CKO Kickboxing,CKO Kickboxing,CKO Kickboxing,Abby Sweitzer - CLOSED,lululemon athletica,Phyt NYC,Core Zone Pilates,Abby Sweitzer - CLOSED
2,Downtown Fitness Club - CLOSED,ALine Pilates,ALine Pilates,Gold’s Gym - CLOSED,New York Sports Club - CLOSED,New York Sports Club - CLOSED,New York Sports Club - CLOSED,New York Sports Club - CLOSED,New York Sports Club - CLOSED,New York Sports Club - CLOSED,...,iLoveKickboxing- Bayside,iLoveKickboxing- Bayside,CKO Kickboxing,iLoveKickboxing- Bayside,iLoveKickboxing- Bayside,Yossi Solan Fitness,lululemon athletica,Bally Total Fitness - CLOSED,ALine Pilates,Yossi Solan Fitness
3,New York Health & Racquet Club,Bodycraft Pilates Studio,Bodycraft Pilates Studio,LA Fitness,Planet Fitness - Albany - Crossgates Commons,Planet Fitness - Albany - Crossgates Commons,Planet Fitness - Albany - Crossgates Commons,Planet Fitness - Albany - Crossgates Commons,Planet Fitness - Albany - Crossgates Commons,Planet Fitness - Albany - Crossgates Commons,...,iLoveKickboxing- Harlem,iLoveKickboxing- Harlem,CKO Kickboxing - CLOSED,iLoveKickboxing- Harlem,iLoveKickboxing- Harlem,Hard Boiled Holistics - CLOSED,Lululemon Athletica,Planet Fitness - Bronx - Gun Hill Rd,Bodycraft Pilates Studio,Hard Boiled Holistics - CLOSED
4,New York Health & Racquet Club,The Swan Brooklyn - MOVED,The Swan Brooklyn - MOVED,Crunch - The Hub - CLOSED,New York Sports Clubs,New York Sports Clubs,New York Sports Clubs,New York Sports Clubs,New York Sports Clubs,New York Sports Clubs,...,iLoveKickboxing- Astoria,iLoveKickboxing- Astoria,CKO Kickboxing Gramercy,iLoveKickboxing- Astoria,iLoveKickboxing- Astoria,Oss Life,Lolë - CLOSED,Mark Fisher Fitness,The Swan Brooklyn - MOVED,Oss Life


In [155]:
cosine_df_2['Progressive Krav Maga'][:5]

0     Krav Maga Experts - Upper West Side
1                        Gotham Jiu Jitsu
2                      Eizan Ryu Ju Jitsu
3    Tiger Schulmann’s Mixed Martial Arts
4                       Krav Maga Experts
Name: Progressive Krav Maga, dtype: object

Tomorrow to-do:
- look into LightFM and see how I could use it here
- add the content-based model to the simple ratings model
- fix the sentiment analysis part and see if it improves the other model
- try to add categories and important words of each review into the cosine similarity
- choose the best model by EOD: either Django or just do 
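
One simple way to approach the "add the content-based model to the ratings" item is a weighted blend of the two scores. The sketch below is only an assumption about how that could look: `hybrid_score`, `alpha`, and the 1 to 5 rating scale are hypothetical choices, not something this notebook has settled on.

```python
def hybrid_score(predicted_rating, content_similarity, alpha=0.7, rating_scale=(1.0, 5.0)):
    """Blend a collaborative-filtering rating with a content similarity in [0, 1].

    alpha is the weight given to the collaborative-filtering part (hypothetical default).
    """
    low, high = rating_scale
    cf_score = (predicted_rating - low) / (high - low)  # rescale the rating to [0, 1]
    return alpha * cf_score + (1 - alpha) * content_similarity

# Example: a predicted rating of 4.0 and a cosine similarity of 0.5.
print(hybrid_score(4.0, 0.5))
```

Rescaling the rating first keeps both inputs on the same 0 to 1 range, so `alpha` can be read directly as the share of the final score that comes from collaborative filtering.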

In [163]:
def reccomend(place, num):
    """Print the `num` places most similar to `place`, based on business categories."""
    recs = list(cosine_df_2[place][:num])
    print(f'If you like {place}, you may also like these {num} places:')
    for rec in recs:
        print(rec)
    

In [184]:
reccomend('Mark Fisher Fitness',5)

If you like Mark Fisher Fitness, you may also like these 5 places:
South Brooklyn Weightlifting Club
Phyt NYC
Bally Total Fitness - CLOSED
Planet Fitness - Bronx - Gun Hill Rd
Mark Fisher Fitness


In [171]:
reccomend('CKO Kickboxing',10)

If you like CKO Kickboxing, you may also like these 10 places:
CKO Kickboxing
CKO Kickboxing - CLOSED
CKO Kickboxing Gramercy
Kick Fever Fitness
NY Best Kickboxing
CKO Kickboxing
Kettlebell Kickboxing
iLoveKickboxing- Lake Grove
iLoveKickboxing- Carle Place
CKO Kickboxing
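
The output above repeats `CKO Kickboxing` several times and includes the queried place itself. A minimal sketch of a variant that skips the query and any repeated names follows; `recommend_unique` and the toy DataFrame are hypothetical, with names copied from the output above for illustration.

```python
import pandas as pd

def recommend_unique(similar_df, place, num):
    """Return up to `num` distinct names from similar_df[place], excluding `place` itself."""
    seen = {place}
    recs = []
    for name in similar_df[place].dropna():
        if name not in seen:
            seen.add(name)
            recs.append(name)
        if len(recs) == num:
            break
    return recs

# Toy column mimicking the duplicated output above.
toy_df = pd.DataFrame({"CKO Kickboxing": [
    "CKO Kickboxing", "CKO Kickboxing - CLOSED", "Kick Fever Fitness",
    "CKO Kickboxing", "NY Best Kickboxing",
]})
print(recommend_unique(toy_df, "CKO Kickboxing", 3))
```

This treats rows with the same name as the same place, which may hide separate branches that share a name.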


In [169]:
print(cosine_similarities_2)

  (0, 1897)	0.708536799694944
  (0, 1896)	0.2280358663564938
  (0, 1895)	0.39066105188621963
  (0, 1894)	0.6901046196478208
  (0, 1893)	0.6901046196478208
  (0, 1892)	0.6901046196478208
  (0, 1891)	0.39066105188621963
  (0, 1890)	0.3466738341149852
  (0, 1889)	0.21991442298356306
  (0, 1888)	0.39066105188621963
  (0, 1887)	0.1511233478308044
  (0, 1886)	0.39066105188621963
  (0, 1885)	0.1586399571310062
  (0, 1884)	0.1321267380052649
  (0, 1883)	0.2893681798100932
  (0, 1882)	0.5705536922921387
  (0, 1881)	0.2238481888516384
  (0, 1880)	0.1436480279834579
  (0, 1879)	0.6901046196478208
  (0, 1878)	0.39066105188621963
  (0, 1877)	0.3153422048776035
  (0, 1876)	0.2762228800413501
  (0, 1875)	0.13266261960385484
  (0, 1874)	0.5248569502771481
  (0, 1873)	0.708536799694944
  :	:
  (1897, 24)	1.0000000000000002
  (1897, 23)	0.19360311550304438
  (1897, 22)	0.26545795450317633
  (1897, 21)	0.6007085198907258
  (1897, 20)	0.3653980963815884
  (1897, 19)	0.29374314797532747
  (1897, 18)	0.4526

In [173]:
print(cosine_similarities[0])

  (0, 2357)	0.32213523483741396
  (0, 2347)	0.25486607332266853
  (0, 2344)	0.32621827631671474
  (0, 2321)	0.21850891914574852
  (0, 2315)	0.3686322021146715
  (0, 2290)	0.2260328640531859
  (0, 2259)	0.4252696466930319
  (0, 2193)	0.2931312774076619
  (0, 2189)	0.24583030903097783
  (0, 2185)	0.25474980574278033
  (0, 2173)	0.29781489220229784
  (0, 2078)	0.4246406899464351
  (0, 1902)	0.3873633985495266
  (0, 1832)	0.29352354802469127
  (0, 1754)	0.36655797584737765
  (0, 1751)	0.37497031642161294
  (0, 1658)	0.2600469213209326
  (0, 1617)	0.36020791221649273
  (0, 1595)	0.34057929562031336
  (0, 1550)	0.33204004908398027
  (0, 1547)	0.24244641782723475
  (0, 1514)	0.3898946167361514
  (0, 1392)	0.24579681189220234
  (0, 1391)	0.17990830436651528
  (0, 1346)	0.27387281583933465
  :	:
  (0, 43)	0.5470423993304431
  (0, 42)	0.5584090883376334
  (0, 40)	0.5193150353799271
  (0, 36)	0.5520818969088089
  (0, 34)	0.5221902584613852
  (0, 30)	0.5284684401341293
  (0, 29)	0.4806420622621551

In [175]:
set_user_cats_train['rev_company_name'][0]

'Planet Fitness - Manhattan - Canal St - NY'

In [183]:
set_user_cats_train[set_user_cats_train['rev_company_name'] == 'Peloton'].index.values

array([ 4, 37])
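
`Peloton` appears at two row positions, so a dictionary keyed by `rev_company_name` keeps only the last row's neighbours for that name. The sketch below shows how such duplicates could be listed, and how keying by `rev_comp_url` would keep one entry per row; the toy frame and its URL values are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "rev_comp_url": ["/biz/peloton-a", "/biz/peloton-b", "/biz/some-gym"],  # hypothetical URLs
    "rev_company_name": ["Peloton", "Peloton", "Some Gym"],
})

# Names that occur on more than one row.
dup_mask = toy["rev_company_name"].duplicated(keep=False)
dup_names = toy.loc[dup_mask, "rev_company_name"].unique().tolist()
print(dup_names)

# Keying by name silently overwrites earlier rows; keying by URL does not.
by_name = {row["rev_company_name"]: idx for idx, row in toy.iterrows()}
by_url = {row["rev_comp_url"]: idx for idx, row in toy.iterrows()}
print(len(by_name), len(by_url))
```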