# Recommender System

In this notebook, a recommender system is created based on the space embeddings obtained from the discrete space embeddings model. The purpose of the recommender system is to recommend areas in either London or New York based on a user's experience in either of the cities. The recommender system will make recommendations based on space embeddings for an area weighted by the average rating a user has given to establishments in the area. The similarity between embeddings is calculated using cosine similarity, where only areas with cosine similarity above 0.5 are considered valid recommendations.

The recommender system is hosted as an API through the AI platform Grace by 2021.ai. The API receives establishment IDs and corresponding ratings as inputs, returning cosine similarity for all areas in the city the user has no experience in. The areas that are considered valid recommendations are visualized on a map and colored according to their calculated similarity. To demonstrate how the API works and to assess the quality of the recommendations obtained, the recommender system is tested on three personas with different preferences. 

# Table of Contents
* [Recommender System](#0-bullet)

* [Personas](#1-bullet) 
    
    * [Charles - Middle-Aged Man from London](#2-bullet)

    * [Tess - Hipster in 20's from London](#3-bullet)

    * [Jack - Working Class Guy from London](#4-bullet)
    
    * [New York version of Tess](#5-bullet)

* [Conclusion](#6-bullet)


    




In [64]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import requests
import pickle
from tensorflow import keras
plt.rcParams["figure.figsize"] = (12,7)
sns.set_style("darkgrid")

# Helper functions for gridline plots
from helperfunctions import get_country_specific_information, get_geojson_grid, generateBaseMap, save_to_png
NY_location, L_location, NY_num, L_num = get_country_specific_information()

# Recommender System <a class="anchor" id="0-bullet"></a>

In this section, the recommender system is built on the embeddings obtained from the discrete space embeddings model. The relevant data is loaded below: the dataframe `places_final.csv`, which contains information about each establishment, and the pickle file `discrete_embeddings.pkl`, which contains the space embeddings.

In [65]:
# Load data used for modelling
#df = pd.read_csv("data/space_embedding_data_NLP.csv", index_col=[0])
places = pd.read_csv("data/places_final.csv")

In [66]:
# Get embeddings from model
embeddings=pd.read_pickle("deployment/serialized/discrete_embeddings.pkl")
embeddings=pd.DataFrame(embeddings).T
embeddings.head()

|      | 0 | 1 | 2 | 3 | 4 | 5 |
|------|---|---|---|---|---|---|
| L0 | 0.001433 | -0.012677 | 0.006213 | -0.000165 | 0.008206 | 0.011099 |
| L1 | 0.269152 | 3.460942 | -1.458247 | -2.876282 | -1.317084 | -5.724437 |
| L10 | -0.002178 | 0.019951 | 0.0365 | -0.011969 | 0.013564 | 0.027723 |
| L100 | -0.032761 | -0.034071 | -0.022443 | -0.023177 | -0.04993 | -0.015488 |
| L101 | 0.00777 | -0.049722 | 0.018263 | -0.047544 | 0.024548 | -0.039448 |


The space embeddings are 6-dimensional vectors (columns 0 to 5 in the output above), comprising the information of the 11 targets predicted in the space embedding model. 

The embeddings are essential to the recommender system, since cosine similarity is calculated between them and the user input. To use the recommender system, a POST request must be sent to the API on which it is deployed. The recommender system is deployed using Grace, and the deployed code is shown below. It takes `x` and `feature_names` as input, where the former is a matrix containing establishment IDs and their corresponding user-given ratings and the latter holds the feature names `IDs` and `Rating`. First, the ratings are averaged per grid cell and used to weight the embeddings of the visited cells, which yields a single rating-weighted embedding for the user. Afterward, the cosine similarity between this user embedding and each embedding in the other city is calculated. The recommender system then returns the areas in the city not covered by the input `x`, sorted by cosine similarity.

```python

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def predict(self, x, feature_names):
        
        # Define user df
        user = pd.DataFrame(data=x, columns=feature_names)
        IDs = user['IDs'].values
        # Save df of visited establishments
        visited = self.places.loc[self.places.gPlusPlaceId.isin(IDs)][['gPlusPlaceId','city','Grid']]
        # Add grid cell and city to user
        user = user.merge(visited, left_on='IDs', right_on='gPlusPlaceId')
        # Ensure rating is numeric
        user['Rating'] = pd.to_numeric(user['Rating'])
        # Save location user has experience from
        city = user.city.unique()[0]
        
        # Average the ratings the user has given within each grid cell
        avg_grid_rating = pd.DataFrame(user.groupby('Grid').Rating.mean())
        # Rating-weighted sum of the embeddings of the visited grid cells
        avg_user_rating = self.embeddings.loc[avg_grid_rating.index].values.T @ avg_grid_rating.values
        # Calculate the cosine similarity for embeddings in other city 
        recommendation = pd.DataFrame(index = [i for i in self.embeddings.index if city[0] not in i])
        for grid in recommendation.index:
            # cosine_similarity returns a (1, 1) array, so store its single scalar entry
            recommendation.loc[grid, 'cosine_similarity'] = cosine_similarity(avg_user_rating.reshape(1,-1),self.embeddings.loc[grid].values.reshape(1,-1))[0, 0]

        # Clean the indexes
        if city == "London":
            recommendation['clean_index'] = [int(i[2:]) for i in recommendation.index] 
        else:
            recommendation['clean_index'] = [int(i[1:]) for i in recommendation.index] 
        
        # Return the df with cosine similarity for each grid in other city
        return recommendation.sort_values(by="cosine_similarity", ascending=False).reset_index().values


```
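
To make the core computation in `predict` concrete, the sketch below reproduces it on toy data: a rating-weighted sum of visited embeddings, cosine similarity against candidate cells in the other city, and the 0.5 cutoff. The grid IDs, the 2-dimensional vectors, the ratings, and the helper `cosine` are all made up for illustration and are not taken from the deployed model.

```python
import numpy as np

# Hypothetical toy embeddings: two visited London cells and two New York candidates
london = {"L1": np.array([1.0, 0.0]), "L2": np.array([0.0, 1.0])}
new_york = {"NY1": np.array([3.0, 1.0]), "NY2": np.array([-1.0, 0.0])}

# Hypothetical average rating per visited cell
avg_rating = {"L1": 5.0, "L2": 1.0}

# Rating-weighted sum of the visited embeddings (the user vector)
user_vector = sum(avg_rating[cell] * london[cell] for cell in london)


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Score every candidate cell and keep those above the 0.5 threshold
scores = {cell: cosine(user_vector, vec) for cell, vec in new_york.items()}
recommended = sorted((c for c, s in scores.items() if s > 0.5),
                     key=scores.get, reverse=True)
print(scores, recommended)
```

Because cosine similarity ignores vector length, the weighted sum gives the same ranking as a weighted average would; only the direction of the user vector matters.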

The recommender system is hosted on the Grace API and can be called using a post request. If the request is successful, it returns the recommendations of the above-described recommender system.

In [67]:
def APICall(IDs, Ratings):
    
    feature_names = ['IDs', 'Rating']
    data = list(map(list, zip(IDs, Ratings)))

    # Construct data for endpoint
    endpoint = "https://models.grace-dtu.2021.services/seldon/project-spaceembeddings/recommender-system/api/v0.1/predictions"
    headers = {'Grace-Client-Secret': 'c0de6747-ffb6-4023-913f-53c8222435bb'}
    payload = {"data": {"names": feature_names,
                        "ndarray": data}}

    # Request response from endpoint
    response = requests.post(endpoint, json=payload, headers=headers)

    print(response.status_code)

    if response.status_code != 200:
        # Stop here: PushIndex cannot process a missing response
        raise RuntimeError("Recommender API returned status {}".format(response.status_code))

    recommendations = pd.DataFrame(response.json()['data']['ndarray'], columns=['Grid', 'cosine_similarity', 'clean_index']).set_index('Grid')

    return PushIndex(recommendations)
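
The request and response shapes used by `APICall` can be illustrated without touching the network. The sketch below is an assumption-laden example: `build_payload` and `parse_response` are hypothetical helpers, and the IDs, ratings, and response body are invented; only the `names`/`ndarray` layout and the three column names come from the notebook.

```python
def build_payload(ids, ratings):
    """Pair each establishment ID with its rating in the request body layout."""
    rows = [list(pair) for pair in zip(ids, ratings)]
    return {"data": {"names": ["IDs", "Rating"], "ndarray": rows}}


def parse_response(body):
    """Turn a response body into a dict of records keyed by grid ID."""
    columns = ["Grid", "cosine_similarity", "clean_index"]
    records = [dict(zip(columns, row)) for row in body["data"]["ndarray"]]
    return {rec["Grid"]: rec for rec in records}


# Made-up IDs, ratings, and response body, for illustration only
payload = build_payload(["id-a", "id-b"], [5, 2])
fake_body = {"data": {"ndarray": [["NY7", 0.91, 7], ["NY3", 0.42, 3]]}}
parsed = parse_response(fake_body)
print(payload)
print(parsed["NY7"]["cosine_similarity"])
```

Keeping payload construction and response parsing separate from the HTTP call makes both easy to check with a fake body before calling the live endpoint.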

When the recommendations are obtained, the grid cell indices need to be shifted so that the grid cells map correctly onto the original grid. The original grid has an outer edge that is not included when the grid cell IDs are initialized, so the indices must be corrected to account for this. The function below also ensures that only areas with cosine similarity above 0.5 are considered in the recommendation. These areas and their cosine similarities are visualized on a map. 

In [68]:
# Assumes that the user only has ratings in either London or NY
from sklearn.metrics.pairwise import cosine_similarity

def PushIndex(df):

    #0-18 = 22-40
    #19-37 = 43-61
    #38-56 = 64-82

    old_matrix = np.array([np.array([i for i in range(22,41)]) for j in range(19)])
    new_matrix = []
    col = 21

    for idx, x_i in enumerate(old_matrix):
        new_matrix.append(x_i + idx * col)

    mapping = dict(zip([i for i in range(0,21*21)], np.array(new_matrix).flatten().tolist()))
    df['new_index'] = df.clean_index.map(mapping)
    #similiarities = [1 if i in df.new_index.values else 0 for i in range(0,21*21)]
    similiarities = [df.loc[df.new_index==num]['cosine_similarity'].values[0] if num in df.new_index.values else 0 for num in range(0,21*21)]
    similiarities = [i if i>0.5 else 0 for i in similiarities]
    
    return similiarities, df
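
The lookup table built in `PushIndex` appears to be equivalent to a closed-form shift: a cell at (row, col) in the 19 x 19 inner grid lands at (row + 1, col + 1) in the 21 x 21 grid that includes the outer edge. The sketch below is a minimal illustration under that reading; `push_index` and `threshold` are hypothetical names, and the grid sizes are inferred from the ranges used in `PushIndex`.

```python
INNER = 19   # cells per row in the grid indexed by the API (inferred from PushIndex)
OUTER = 21   # cells per row in the plotted grid, including the outer edge


def push_index(clean_index):
    """Map an inner-grid index to its position in the padded grid."""
    row, col = divmod(clean_index, INNER)
    # Shift by one row and one column to skip the outer edge
    return (row + 1) * OUTER + (col + 1)


def threshold(similarity_by_index, cutoff=0.5):
    """Build the flat padded-grid list, leaving cells at or below the cutoff at zero."""
    cells = [0.0] * (OUTER * OUTER)
    for clean_index, sim in similarity_by_index.items():
        if sim > cutoff:
            cells[push_index(clean_index)] = sim
    return cells


# Matches the ranges noted in PushIndex: 0-18 -> 22-40, 19-37 -> 43-61, 38-56 -> 64-82
print([push_index(i) for i in (0, 18, 19, 37, 38, 56)])
cells = threshold({0: 0.9, 19: 0.3})
```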

The function below will be used to visualize the recommended areas of the personas created in the next section.

In [72]:
import json
import matplotlib as mpl
import folium

def Plot(city, similiarity_list, title):

    if city == "London":
        grid = get_geojson_grid("New York", n=NY_num)
        default_location = NY_location

    else:
        grid = get_geojson_grid("London", n=L_num)
        default_location = L_location


    m =  generateBaseMap(default_location)

    for i, box in enumerate(grid):
        geo_json = json.dumps(box)

        if similiarity_list[i] == 0:
            color = mpl.colors.to_hex("white")
            gj = folium.GeoJson(geo_json, style_function=lambda feature, color=color: {'color':"grey", 'weight': 0.6,'dashArray': '1, 1', 'fillOpacity': 0.1,})

        else:
            # Min-max scale against the recommended (non-zero) cells of the list passed in
            positive = np.array(similiarity_list)[np.array(similiarity_list) > 0]
            s_min, s_max = positive.min(), positive.max()
            # Guard against a zero range when all recommended cells share one value
            value = (similiarity_list[i] - s_min) / (s_max - s_min) if s_max > s_min else 1.0
            # Offset by 0.5 to skip the palest colors; values above 1 are clipped to the darkest color
            color = plt.cm.PuRd(value+0.5)
            color = mpl.colors.to_hex(color)

            gj = folium.GeoJson(geo_json,
                                style_function=lambda feature, color=color: {
                                                                                'fillColor': color,
                                                                                'color':"grey",
                                                                                'weight': 0.6,
                                                                                'dashArray': '1, 1',
                                                                                'fillOpacity': 0.9,
                                                                            })

        m.add_child(gj)

    title_html = '''
            <h3 align="center" style="font-size:16px"><b>{}</b></h3>
            '''.format(title)   
    m.get_root().html.add_child(folium.Element(title_html))   

    return m
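
The color scaling inside `Plot` is a min-max normalization over the recommended cells only. A minimal standalone sketch of that step is shown below; `scale_for_colormap` is a hypothetical helper, and the handling of a zero range (a single distinct similarity value) is an assumption about the desired behavior rather than something stated in the notebook.

```python
def scale_for_colormap(similarities):
    """Min-max scale the non-zero similarities to [0, 1]; zeros become None."""
    positive = [s for s in similarities if s > 0]
    if not positive:
        return [None] * len(similarities)
    s_min, s_max = min(positive), max(positive)
    span = s_max - s_min
    scaled = []
    for s in similarities:
        if s <= 0:
            scaled.append(None)              # not recommended: left uncolored
        elif span == 0:
            scaled.append(1.0)               # one distinct value: use the darkest shade
        else:
            scaled.append((s - s_min) / span)
    return scaled


print(scale_for_colormap([0, 0.5, 0.75, 1.0]))
```

In `Plot`, the scaled value is additionally offset by 0.5 before it is passed to the `PuRd` colormap, so recommended cells never receive the palest colors.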

# Personas <a class="anchor" id="1-bullet"></a>

To validate the recommender system, three personas are created. As we are more familiar with London, the personas are created from our knowledge of the areas and research of surrounding areas. Since establishments in London are used as input, the recommender system will output recommended areas in New York on a map. Here, darker shaded areas denote highly similar areas. 

The personas are created to fit three very different personalities and socio-economic classes, where the aim is to clearly distinguish the personas and their recommended areas. 

To get an understanding of the different areas in New York / Manhattan, an overview of the average income is used as an indicator of the socio-economic classes (disclaimer: the plot is not very colorblind-friendly):
* (1)	https://ny.curbed.com/2017/8/9/16119400/income-distribution-nyc-map

Furthermore, we assume that the areas listed in the articles below can be classified as ‘hip’:
* (2)	https://www.businessinsider.com/maps-where-the-hipsters-and-the-yuppies-live-in-new-york-city-2013-10?r=US&IR=T
* (3)	https://www.wimdu.com/blog/new-yorks-top-neighborhoods-part-1-manhattan

We also assume that the wealthy areas listed in the links below are representative of the upper class:
* (4)	https://ceoworld.biz/2021/04/04/the-most-wealthy-neighborhoods-in-new-york-city/
* (5)	https://ny.curbed.com/2017/6/27/15881706/nyc-richest-neighborhoods-manhattan-brooklyn 

### Charles - Middle-Aged Man from London  <a class="anchor" id="2-bullet"></a>

The first persona is an upper-class middle-aged man, named Charles. Charles enjoys a good bottle of wine, fine dining, a good steak and prefers places with a view. Charles is not fond of the poorer areas in London and dislikes shawarma as he perceives it as ‘junk’. Recently he discovered how little pleasure he got out of cheesy entertainment.

Charles has made the 5 Google ratings listed below:
1.	Chelsea Riverside Brasserie (Chelsea) – 4 stars 
2.	Sophie's Steakhouse and Bar (Fulham) – 5 stars 
3.	Santini (Belgravia) – 5 stars
4.	Golden Grill (Camberwell) – 1 star
5.	Network Theatre Waterloo (Waterloo) – 3 stars

In [85]:
#Charles - middle age man from London 
persona_L1 = ['117894493394086195117', '104063119900400467553', '109257478465059465681', '108826432794361150085', '108474869308709310130']
L1_ratings = [4, 5, 5, 1, 3]

IDs =  persona_L1
Ratings = L1_ratings
similiarities, df = APICall(IDs, Ratings)
m = Plot('London',similiarities, 'Persona 1 - London to New York')
#save_to_png("Persona1_NY",m)

200


![link](img/Persona1_NY.png)

Charles is recommended roughly five area clusters:
1.	Lincoln Square / Hell’s Kitchen
2.	West Village
3.	Turtle Bay 
4.	Gramercy/Union Square/Flatiron District
5.	Lower Manhattan 

Most of the recommended areas covered in the five clusters are listed as 'wealthy NYC areas' in articles (4) and (5). Therefore, one can argue that the recommended areas coincide with areas one would expect Charles to visit given his expensive preferences. On the other hand, it can be questioned whether all recommendations are equally valid. Cluster 5 is highly recommended, as shown by its dark shade. However, none of the surrounding areas are recommended, which makes it difficult to validate whether the area is a good fit for Charles or merely contains a few highly recommended places. 

The validation of the recommended areas is based on manual inspection of the match between grid cells and actual areas in NYC. With this in mind, the overall assessment of Charles' recommended areas in NYC is positive, despite some isolated areas. 

### Tess - Hipster in 20's from London <a class="anchor" id="3-bullet"></a>

The second persona is Tess, who is in her late twenties and would be classified as a hipster by most people. She enjoys natural wine, nightclubs, and streets buzzing with life. Two things she dislikes are capitalism and fast fashion!

Tess recently reviewed the 5 places below on Google:
1.	Blade Soho (Soho) – 5 stars
2.	Camden tattoo & piercings (Camden) – 4 stars
3.	Foxcroft and Ginger café (Soho) – 4 stars
4.	Yumemoki (Fulham) – 2 stars 
5.	Zara (Regent Street) – 1 star

In [84]:
# Tess - hipster in 20's from London
persona_L2 =  ['101414550408078459025', '101222111970032450108',  '110867761419823189329', '102526759936228897422', '105899272211447947388']
L2_ratings = [5, 4, 4, 2 ,1 ]

IDs =  persona_L2
Ratings = L2_ratings
similiarities, df = APICall(IDs, Ratings)
m = Plot('London',similiarities, 'Persona 2 - London to New York')
#save_to_png("Persona2_NY",m)

200


![link](img/Persona2_NY.png)

Tess is recommended more widespread areas in New York. Considering the recommended areas with darker shades, she should visit: 
1.	Upper East Side
2.	Greenwich Village/Chelsea/SoHo
3.	Lower East Side
4.	Midtown

All the places recommended match the ‘hip’ areas listed in articles (2) and (3). Again, the recommender system provides reasonable recommendations, since one would expect Tess to visit some of the areas recommended for her.

### Jack - Working Class Guy from London <a class="anchor" id="4-bullet"></a>

Finally, the third persona is Jack. He is a working-class guy who loves his football team, Tottenham Hotspur. He also loves a proper roast and a big pint after a long day of work. Generally, Jack dislikes everything expensive, except, of course, tickets for a football match. 

Jack has reviewed the 5 following places on Google:
1.	The Court (Tottenham) – 5 stars
2.	Katies (fish and chips) (Peckham) – 4 stars
3.	The Old Haberdasher (Peckham) – 4 stars
4.	Barts (Chelsea) – 1 star 
5.	Café de Paris (Soho) – 1 star

In [83]:
# Jack - working class guy from London
persona_L3  = ['117142684046048778874', '114202385542308064803', '111040899007592704996', '116048702018782286233', '101959599981305942585']
L3_ratings = [5, 4, 4, 1, 1]

IDs =  persona_L3
Ratings = L3_ratings
similiarities, df = APICall(IDs, Ratings)
m = Plot('London',similiarities, 'Persona 3 - London to New York')
#save_to_png("Persona3_NY",m)

200


![link](img/Persona3_NY.png)

The last persona, Jack, is mainly recommended two larger clusters: 
1.	Harlem / Manhattan Valley
2.	Lower Manhattan / Chinatown

The first recommendation for Jack is Harlem or Manhattan Valley. This recommendation is expected to be well received by Jack, since Harlem can be described as having [affordable housing, iconic restaurants, community "vibes", and proud history](https://www.common.com/blog/2021/01/living-in-harlem-ny-the-ultimate-guide/). Jack is, surprisingly, also recommended Lower Manhattan, which can be perceived as an expensive area. However, as the area can also be considered touristy, it might capture exactly what Jack is seeking. 


### New York version of Tess  <a class="anchor" id="5-bullet"></a>

The recommender system seems to provide reasonable recommendations when recommending areas in New York based on establishments in London. To assess if the recommender system also works the other way around, we have created a New York version of Tess the Hipster.

New Yorker Tess has rated the following on Google:
* The Harrison (Greenwich) - 5 stars
* Garden Court Café (Lenox Hill) - 4 stars
* Chuck E. Cheese's (Bronx) - 1 star
* FireBird (Hell's Kitchen) - 2 stars
* Zara (SoHo) - 1 star

In [76]:
#New Yorker version of the hipster
NY1 = ['113977789460220411709',  '101274427290837781011', '106861653242718638090', '103027305365171139191', '113219169863257979653']
Rating_N1 = [4, 5, 1, 2, 1]

In [79]:
IDs = NY1
Ratings = Rating_N1
similiarities, df = APICall(IDs, Ratings)
m = Plot('New York',similiarities, 'Persona 2 - New York to London')
#save_to_png("Persona2_L",m)

200


![link](img/Persona2_L.png) 

The recommendation created for the New York version of Tess does not give any clear indication of places she should visit. The areas are highly spread out, and it is difficult to justify that the recommendation is reasonable. Thus, the recommender system does not seem to work as well when going from New York to London. Therefore, further validation will not be conducted from New York to London.

# Conclusion <a class="anchor" id="6-bullet"></a>

To wrap up the final notebook documenting our work on space embeddings, we conclude that it is possible to create a recommender system based on space embeddings. The results are positive for personas coming from London to New York but less interpretable the other way around. As a proof of concept, the feasibility is confirmed, but several things should be reconsidered if the concept is to be realized. These will be highlighted in the discussion, which we urge you to read in the [README notebook](./README.ipynb).