# A (Data Science) Holiday Guide to London

As we know, besides being the place where the Azzurri won EURO2020, London represents a prominent industry hub for Data Science and Deep Learning.

In the code that follows, starting from Mario Levorato's [Airbnb Kaggle Dataset](https://www.kaggle.com/levorato/inside-airbnb-london), I'll try to build a content-based recommender that suggests new rooms whose descriptions are similar to that of a chosen one.

Enjoy!

![Millennium Bridge, Jhonatan Chng, via Unsplash](https://images.unsplash.com/photo-1532444143931-9f60a76242e7?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=1050&q=80)

In [None]:
#Kaggle default setup

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
#Import files
path = '/kaggle/input/inside-airbnb-london/listings.csv'
listings = pd.read_csv(path, usecols=['id','listing_url','last_scraped','name','description','neighborhood_overview','picture_url','host_location'])

# ML recommendation system

Since Kaggle notebooks have limited compute and memory, I slice the dataset to its first 10k rows.

In [None]:
#Slice listings to its first 10,000 rows (copy so later column assignments don't raise SettingWithCopyWarning)
df = listings.iloc[0:10000,:].copy()
df.shape
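
To get a feel for why the slice helps, here is a rough back-of-the-envelope sketch. It assumes the pairwise similarity matrix computed later is stored as a dense `float64` array of shape n x n. The helper `dense_similarity_gib` and the 50,000-row figure are purely illustrative and not taken from the dataset.

```python
import numpy as np

def dense_similarity_gib(n_rows, dtype=np.float64):
    """Approximate size in GiB of a dense n_rows x n_rows matrix (hypothetical helper)."""
    bytes_needed = n_rows * n_rows * np.dtype(dtype).itemsize
    return bytes_needed / 1024**3

# Compare the 10k slice with an illustrative larger row count
for n in (10_000, 50_000):
    print(f"{n:>6} rows -> {dense_similarity_gib(n):.2f} GiB")
```

The cost grows with the square of the number of rows, so doubling the rows roughly quadruples the memory needed.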

In [None]:
df.head()

In [None]:
#Import Vectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

In [None]:
#Create the vectorizer, ignoring English stop words
tfidf = TfidfVectorizer(stop_words='english')

In [None]:
#Replace missing descriptions with empty strings
df['description'] = df['description'].fillna('')

In [None]:
#Construct the required TF-IDF matrix by fitting and transforming the data
tfidf_matrix = tfidf.fit_transform(df['description'])

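Each row of the matrix now represents one description and each column one term of the vocabulary; the stored values are the TF-IDF weights of the terms that appear in that description. Printing the sparse matrix lists its non-zero `(row, column)` entries with their weights.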

In [None]:
print(tfidf_matrix)
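
The sparse printout is hard to read at full size, so here is a minimal sketch on three made-up descriptions (the `toy_*` names and texts are invented for illustration). It shows one row per description, one column per term, and an all-zero row for an empty description.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up descriptions; the empty string stands in for a missing description
toy_descriptions = [
    "Bright double room near the river",
    "Cosy double room close to the station",
    "",
]

toy_tfidf = TfidfVectorizer(stop_words='english')
toy_matrix = toy_tfidf.fit_transform(toy_descriptions)

# Rows = descriptions, columns = vocabulary terms left after stop-word removal
print(toy_matrix.shape)
# Terms in column order (the vocabulary is indexed alphabetically)
print(sorted(toy_tfidf.vocabulary_))
# Dense view of the TF-IDF weights, only sensible for tiny examples
print(toy_matrix.toarray().round(2))
```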

In [None]:
#Import linear kernel
from sklearn.metrics.pairwise import linear_kernel

In [None]:
# Compute the cosine similarity matrix (TF-IDF rows are L2-normalised by default, so the dot product equals the cosine)
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
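
Using `linear_kernel` for cosine similarity relies on `TfidfVectorizer` normalising each row to unit length (its default `norm='l2'`). A small check on made-up descriptions, as a sketch rather than a proof:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Made-up descriptions, for illustration only
toy_descriptions = [
    "Bright double room near the river",
    "Cosy double room close to the station",
    "Modern studio flat with river view",
]
toy_matrix = TfidfVectorizer(stop_words='english').fit_transform(toy_descriptions)

dot_products = linear_kernel(toy_matrix, toy_matrix)   # plain dot products
cosines = cosine_similarity(toy_matrix, toy_matrix)    # normalises, then takes dot products

# With unit-length rows the two matrices should agree up to floating-point error
print(np.allclose(dot_products, cosines))
```

Since the rows are already normalised, `linear_kernel` skips the extra normalisation step that `cosine_similarity` performs.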

In [None]:
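#Map each room name to its row index, keeping only the first occurrence of a duplicated name
indices = pd.Series(df.index, index=df['name']).loc[lambda s: ~s.index.duplicated()]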
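
Listing names are not guaranteed to be unique, and a name-to-index lookup behaves differently when a label repeats. A toy sketch (the names are invented) of what pandas does in that case:

```python
import pandas as pd

# Toy names with one duplicate, for illustration only
toy_names = pd.Series(['Room A', 'Room B', 'Room A'])
lookup = pd.Series(toy_names.index, index=toy_names)

# drop_duplicates() compares the values (row positions), which are already unique,
# so no entry is removed
print(len(lookup.drop_duplicates()))

# Looking up a repeated label returns a Series instead of a single position
print(type(lookup['Room A']))

# Filtering on duplicated index labels keeps one position per name
unique_lookup = lookup[~lookup.index.duplicated(keep='first')]
print(unique_lookup['Room A'])
```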

In [None]:
def get_recommendations(name, cosine_sim=cosine_sim):
    # Look up the row position of the chosen room
    idx = indices[name]

    # Get the pairwise similarity scores between the chosen room's description and all the others
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the rooms by similarity score, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Drop the chosen room itself and keep the 10 most similar rooms
    sim_scores = [score for score in sim_scores if score[0] != idx][:10]

    # Get the room indices
    room_indices = [i[0] for i in sim_scores]

    # Return the top 10 most similar rooms
    return df[['name','listing_url','host_location']].iloc[room_indices]
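
Skipping the first sorted entry only removes the chosen room if it happens to sort first. Rooms with identical descriptions tie with it, and a room with an empty description scores zero against everything, itself included. A minimal alternative sketch with NumPy, on an invented similarity row, that excludes the query by index instead:

```python
import numpy as np

# Invented similarity scores of room 0 against rooms 0..4
toy_scores = np.array([1.0, 0.2, 0.7, 0.0, 0.5])
query_idx = 0
top_k = 3

# Indices sorted by descending score; 'stable' keeps the original order among ties
ranked = np.argsort(-toy_scores, kind='stable')

# Remove the query room explicitly, then keep the top_k remaining rooms
ranked = ranked[ranked != query_idx][:top_k]
print(ranked.tolist())
```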

In [None]:
#Pick a random room (randrange excludes the upper bound, so the position is always valid)
import random
random_room = df.iloc[random.randrange(len(df))]['name']
print(random_room)

In [None]:
get_recommendations(random_room)

In [None]:
#Find recommendations for a specific room (the name must exist in the sliced data)
get_recommendations('Spacious two bedroom apartment')