# <p style="text-align:center;"><u>THE ASIAN TOURIST IN PARIS</u></p>

# Introduction

<font size=4>In the city of Paris, there are thousands of tourists who have come to see the city for its sheer beauty. These tourists come from different nations, each with its own cuisine. For example, Asian visitors may prefer to eat in restaurants that specialize in Asian cuisine rather than in restaurants that serve a general mix of cuisines. To this end, the problem I intend to solve is deciding the best AirBnB apartment locations where Asian tourists can stay during a visit to Paris.</font>
<br><br>

<font size=4>Using the Foursquare API and AirBnB private room listings data for the city of Paris, I can group the Asian restaurants into clusters and propose the best AirBnB private rooms based on the average price per night and the number of reviews each room has. By solving this problem, I can help Asian tourists coming to Paris choose an AirBnB apartment situated in proximity to Asian restaurants.</font>
<br><br>

<font size=4>In conclusion, the problem I intend to solve can be framed as the question: <br><br><b>Which AirBnB apartment should an Asian tourist choose to stay in during a visit to Paris in order to enjoy Asian cuisine nearby?</b></font>

# Data 

<font size=4>To solve the stated problem, I will use Foursquare's location data, accessed through its GET API. The data I will access includes:
<br>
<br>

<li>Restaurants in Paris city</li>
<li>Asian restaurants and other interesting places around these restaurants</li>


<br>
I will also use data from http://public.opendatasoft.com. This data comprises all AirBnB private room listings in the city of Paris, with their latitude and longitude, price per night, neighborhood, and some other columns that are not needed here.
With these data sets, I can determine the best location/AirBnB property for an Asian tourist to stay in during a visit to Paris by clustering the restaurant data and determining the cluster with the largest number of restaurants serving Asian cuisine. With the same data, I can also explore the AirBnB listings around the chosen sites and propose them as the accommodation of choice for Asian tourists coming to Paris.
<br>
<br>
The selection will take into consideration the price of the rooms per night and the number of reviews for each AirBnB listing.</font>

# Methodology

<font size=4>The methodology to adopt for this problem can be briefly described in the following phases:<br></font>
<br>
    <b>1. AirBnB Data collection from http://public.opendatasoft.com</b>
<br>
The AirBnB data will be accessed from OpenDataSoft, and the data set will be limited to private rooms among AirBnB's listings in the city of Paris. The collected data set will be stored in .csv format.
<br>    
    <b>2. AirBnB Data preprocessing.</b>
<br>
The AirBnB data will be loaded into a pandas dataframe, and preprocessing steps such as cleaning, trimming, and reshaping will be carried out to prepare it for analysis.
<br><br>
    <b>3. EDA</b>
<br>
Exploratory data analysis will be carried out on the data to describe it better and to gain some insight into it.
<br><br>
    <b>4. Utilization of Foursquare API search function</b>
<br>
The Foursquare API search function will then be used to find Asian restaurants within 500 m of each AirBnB private room in Paris. The JSON result will be cleaned and made ready for clustering.
<br><br>
    <b>5. Clustering of Asian restaurants</b>
<br>
To group the Asian restaurants with respect to the locations of the AirBnB private rooms, the k-means clustering technique will be used to partition the restaurant data into five (5) clusters.
<br>
<br>
<br>
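The clustering phase can be sketched as follows. This is only a minimal illustration: it assumes scikit-learn's `KMeans` estimator (the one imported in the dependency cell of this notebook), and the `frequencies` array is made-up toy data rather than anything returned by Foursquare.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: each row is a neighborhood, each column the share of
# one restaurant category among the venues found there (not real results).
frequencies = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
])

# k-means is unsupervised: it needs only the feature rows and the number of clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(frequencies)
labels = kmeans.labels_

print(labels)  # one cluster label per neighborhood
```

Rows with similar category shares receive the same label; the numeric value of a label carries no meaning beyond identifying the group.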


#### <u>AirBnB Data Preprocessing</u>

##### 1. Import necessary dependencies

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
#pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

#!pip install beautifulsoup4
#from bs4 import BeautifulSoup
import urllib.request
import csv


# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-3.2.0               |           py36_0         770 KB  conda-forge
    ca-certificates-2019.9.11  |       hecc5488_0         144 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    openssl-1.1.1c             |       h516909a_0         2.1 MB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.3 MB

The following NEW packages will be 

##### 2. Read the AirBnB Paris listing data into Pandas dataframe

In [2]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Room ID,Name,Host ID,Neighbourhood,Room type,Room Price,Minimum nights,Number of reviews,Date last review,Number of reviews per month,Rooms rent by the host,Availibility,Updated Date,City,Country,Coordinates,Location
0,27425430,Bâtiments et cartier chik,94000621,Observatoire,Private room,30,20,0,,,1,0,2019-07-09,Paris,France,"48.8324388237,2.32971312235","France, Paris, Observatoire"
1,28003325,Chambre Double Galilée B&B,211467698,Passy,Private room,125,3,12,2019-06-19,1.2,4,105,2019-07-09,Paris,France,"48.8709289603,2.29685938737","France, Paris, Passy"
2,28069177,Two Rooms perfect for families,135335766,Batignolles-Monceau,Private room,398,1,0,,,7,29,2019-07-09,Paris,France,"48.8817959345,2.31241028731","France, Paris, Batignolles-Monceau"
3,28115947,Accommodations for 3 next to the train station,126446227,Opéra,Private room,119,1,4,2019-06-08,0.49,6,317,2019-07-09,Paris,France,"48.8786312806,2.32969475795","France, Paris, Opéra"
4,28122260,"Jolie chambre 2p ds cosy F3, 5m du Parc Expos",184941981,Vaugirard,Private room,50,1,27,2019-06-21,4.53,2,295,2019-07-09,Paris,France,"48.8300155978,2.29669152579","France, Paris, Vaugirard"


In [3]:
# Because of running time, we'll be using just 1000 rows of data
airbnb = df.sample(n=1000, random_state=3) # Get 1000 random rows from the AirBnB data set
airbnb.reset_index(drop=True, inplace=True) # Reset the index
airbnb.shape # Output the shape of the selected data

(1000, 17)

##### 3. Modify the data by dropping unnecessary columns

In [4]:
# Drop the unneeded data columns
airbnb.drop(['Name', 'Host ID', 'Room type', 'Minimum nights', 'Date last review', 'Number of reviews per month', 'Rooms rent by the host', 'Availibility',
       'Updated Date', 'City', 'Country', 'Location'], axis=1, inplace=True)
# Display the modified dataframe shape
print(airbnb.shape)

airbnb.head() # Display the 1st 5 rows of the dataframe

(1000, 5)


Unnamed: 0,Room ID,Neighbourhood,Room Price,Number of reviews,Coordinates
0,12817090,Entrepôt,30,8,"48.8733818465,2.37345456201"
1,2181753,Reuilly,40,104,"48.8413143526,2.38339810857"
2,29895295,Reuilly,50,7,"48.8472675613,2.40041154285"
3,30728070,Popincourt,139,0,"48.8609207147,2.36660350692"
4,21862193,Entrepôt,40,1,"48.8790213314,2.37008084215"


In [5]:
# Split the data in the coordinates column into latitude and longitude
coords = airbnb['Coordinates'].str.split(',', n=1, expand=True) # one column per piece
airbnb['Latitude'], airbnb['Longitude'] = coords[0], coords[1]
# Convert the latitude and longitude to float datatype
airbnb['Latitude'] = airbnb['Latitude'].astype(float, copy=True)
airbnb['Longitude'] = airbnb['Longitude'].astype(float, copy=True)
# Drop the coordinate column
airbnb.drop(['Coordinates'], axis=1, inplace=True)
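
The coordinate split can be tried in isolation on a couple of values. The sketch below assumes only pandas; the two strings are copied from the `Coordinates` column shown earlier, and the `sample` and `parts` names are introduced here purely for illustration.

```python
import pandas as pd

# Two coordinate strings in the same "lat,lon" layout as the Coordinates column.
sample = pd.DataFrame({'Coordinates': ['48.8733818465,2.37345456201',
                                       '48.8413143526,2.38339810857']})

# expand=True returns one column per piece, so each part can be assigned separately.
parts = sample['Coordinates'].str.split(',', n=1, expand=True)
sample['Latitude'] = parts[0].astype(float)
sample['Longitude'] = parts[1].astype(float)

print(sample[['Latitude', 'Longitude']])
```

Converting to `float` right after the split keeps later distance and mapping steps from operating on strings.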

In [6]:
airbnb.rename(columns={'Neighbourhood': 'Neighborhood'}, inplace=True) # Rename the Neighbourhood column
airbnb.head()

Unnamed: 0,Room ID,Neighborhood,Room Price,Number of reviews,Latitude,Longitude
0,12817090,Entrepôt,30,8,48.873382,2.373455
1,2181753,Reuilly,40,104,48.841314,2.383398
2,29895295,Reuilly,50,7,48.847268,2.400412
3,30728070,Popincourt,139,0,48.860921,2.366604
4,21862193,Entrepôt,40,1,48.879021,2.370081


In [41]:
airbnb.describe() # call the method to get summary statistics of the numeric columns

In [7]:
# Output the number of neighborhoods where AirBnB listings are located in Paris
print('The total number of neighborhoods is', airbnb['Neighborhood'].nunique())

The total number of neighborhoods is 20


##### 4a. Use geopy to get the lat. and long. of Paris, France

In [8]:
address = 'Paris, France' # address to be inputted into the geolocator

geolocator = Nominatim(user_agent='Paris_Explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566101, 2.3514992.


##### 4b. Create a map of Paris with the AirBnB listings superimposed on top.

In [9]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, neighbourhood, ID in zip(airbnb['Latitude'], airbnb['Longitude'], airbnb['Neighborhood'], airbnb['Room ID']):
    label = '{}, Room ID:{}'.format(neighbourhood, ID) # Show neighborhood and Room ID as popup label
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_paris)
    
map_paris

#### <u>Foursquare API Utilization</u>

In [10]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # replace with your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # replace with your Foursquare Secret
VERSION = '20191002' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


##### Now, let's get the Asian restaurants within a radius of 500 meters of each AirBnB listing
This will be done using a defined function that will return the needed parameters from the Foursquare API JSON result.

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    limit = 20 # limit of number of venues returned by Foursquare API
    search_query = 'Asian Restaurant'
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the API request; requests URL-encodes the parameters
        url = 'https://api.foursquare.com/v2/venues/search'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'll': '{},{}'.format(lat, lng),
            'v': VERSION,
            'query': search_query,
            'radius': radius,
            'limit': limit,
        }

        # make the GET request
        results = requests.get(url, params=params).json()['response']['venues']
        
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng'],  
            v['categories']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude',
                  'Venue Category']
    
    return nearby_venues
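
The row-building step of the function above can be exercised without calling the API. The sketch below is an assumption-laden illustration: `extract_venue_rows` is a hypothetical helper, and `sample_payload` is a made-up dictionary shaped only after the fields this notebook reads (`response`, `venues`, `name`, `location`, `categories`), not a recorded Foursquare response.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten a venues-search style response into rows for one listing."""
    rows = []
    for v in payload.get('response', {}).get('venues', []):
        categories = v.get('categories') or []
        rows.append((
            name, lat, lng,
            v['name'],
            v['location']['lat'],
            v['location']['lng'],
            # first category name, or None when the venue has no category
            categories[0]['name'] if categories else None,
        ))
    return rows

# Hypothetical payload, shaped only after the fields the notebook accesses.
sample_payload = {
    'response': {
        'venues': [
            {'name': 'Asian Soupe',
             'location': {'lat': 48.873556, 'lng': 2.375092},
             'categories': [{'name': 'Vietnamese Restaurant'}]},
            {'name': 'Uncategorised place',
             'location': {'lat': 48.8740, 'lng': 2.3720},
             'categories': []},
        ]
    }
}

rows = extract_venue_rows(sample_payload, 'Entrepôt', 48.873382, 2.373455)
for row in rows:
    print(row)
```

Using `.get` with defaults means a response without a `venues` key yields an empty list instead of raising, which may be preferable when many requests are issued in a loop.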

In [12]:
# Use the defined function to get all the venues
asian_restaurants = getNearbyVenues(names=airbnb['Neighborhood'],
                                   latitudes=airbnb['Latitude'],
                                   longitudes=airbnb['Longitude']
                                  )

Entrepôt
Reuilly
Reuilly
Popincourt
Entrepôt
Élysée
Buttes-Montmartre
Observatoire
Gobelins
Reuilly
Buttes-Montmartre
Ménilmontant
Popincourt
Popincourt
Buttes-Montmartre
Popincourt
Hôtel-de-Ville
Popincourt
Observatoire
Vaugirard
Vaugirard
Luxembourg
Popincourt
Reuilly
Gobelins
Bourse
Panthéon
Opéra
Buttes-Chaumont
Gobelins
Élysée
Entrepôt
Bourse
Palais-Bourbon
Opéra
Buttes-Chaumont
Vaugirard
Opéra
Observatoire
Reuilly
Ménilmontant
Gobelins
Passy
Vaugirard
Buttes-Chaumont
Ménilmontant
Buttes-Chaumont
Passy
Buttes-Montmartre
Ménilmontant
Louvre
Entrepôt
Gobelins
Entrepôt
Ménilmontant
Entrepôt
Vaugirard
Louvre
Ménilmontant
Reuilly
Buttes-Montmartre
Entrepôt
Opéra
Buttes-Montmartre
Ménilmontant
Entrepôt
Élysée
Gobelins
Luxembourg
Luxembourg
Batignolles-Monceau
Bourse
Batignolles-Monceau
Entrepôt
Gobelins
Vaugirard
Luxembourg
Palais-Bourbon
Opéra
Opéra
Reuilly
Passy
Popincourt
Buttes-Montmartre
Passy
Vaugirard
Buttes-Montmartre
Vaugirard
Ménilmontant
Ménilmontant
Entrepôt
Entrepôt
Passy
B

In [13]:
print(asian_restaurants.shape)
asian_restaurants.head(20)

(13885, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Entrepôt,48.873382,2.373455,Asian Soupe,48.873556,2.375092,"[{'id': '4bf58dd8d48988d14a941735', 'name': 'V..."
1,Entrepôt,48.873382,2.373455,Asian Touch,48.872724,2.371047,"[{'id': '4bf58dd8d48988d149941735', 'name': 'T..."
2,Entrepôt,48.873382,2.373455,Gölbasi Restaurant,48.873567,2.370311,"[{'id': '4f04af1f2fb6e1c99f3db0bb', 'name': 'T..."
3,Entrepôt,48.873382,2.373455,Do Brasil,48.873884,2.371596,"[{'id': '4bf58dd8d48988d16b941735', 'name': 'B..."
4,Entrepôt,48.873382,2.373455,Barak,48.874663,2.372837,"[{'id': '4f04af1f2fb6e1c99f3db0bb', 'name': 'T..."
5,Entrepôt,48.873382,2.373455,Oliva,48.87311,2.375951,"[{'id': '55a5a1ebe4b013909087cb7f', 'name': 'T..."
6,Entrepôt,48.873382,2.373455,Restaurant Les 4 Frères,48.874568,2.373783,"[{'id': '4bf58dd8d48988d115941735', 'name': 'M..."
7,Entrepôt,48.873382,2.373455,Rôtisserie Sainte-Marthe,48.873507,2.370876,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R..."
8,Entrepôt,48.873382,2.373455,Restaurant Tai Yien,48.872311,2.377507,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C..."
9,Entrepôt,48.873382,2.373455,Le Firat,48.870307,2.372929,"[{'id': '5283c7b4e4b094cb91ec88d7', 'name': 'K..."
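
To sanity-check that a returned venue really lies within the 500 m search radius, the great-circle distance between a listing and a venue can be computed. The sketch below uses the haversine formula with a spherical-Earth radius of 6371 km; `haversine_m` is a hypothetical helper, not part of the notebook, and the coordinates are those of the first row in the table above.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, assuming a spherical Earth (radius 6371 km)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

# First listing and first venue from the table above.
distance = haversine_m(48.873382, 2.373455, 48.873556, 2.375092)
print(round(distance))
```

A check such as `distance <= 500` could then be applied to every row to confirm the radius filter behaved as expected.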


Define a function that will get the category type for each returned venue from the JSON result file.

In [14]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['Venue Category']
    except KeyError:
        categories_list = row['venues.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
asian_restaurants['categories'] = asian_restaurants.apply(get_category_type, axis=1)

In [15]:
asian_restaurants.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,categories
0,Entrepôt,48.873382,2.373455,Asian Soupe,48.873556,2.375092,"[{'id': '4bf58dd8d48988d14a941735', 'name': 'V...",Vietnamese Restaurant
1,Entrepôt,48.873382,2.373455,Asian Touch,48.872724,2.371047,"[{'id': '4bf58dd8d48988d149941735', 'name': 'T...",Thai Restaurant
2,Entrepôt,48.873382,2.373455,Gölbasi Restaurant,48.873567,2.370311,"[{'id': '4f04af1f2fb6e1c99f3db0bb', 'name': 'T...",Turkish Restaurant
3,Entrepôt,48.873382,2.373455,Do Brasil,48.873884,2.371596,"[{'id': '4bf58dd8d48988d16b941735', 'name': 'B...",Brazilian Restaurant
4,Entrepôt,48.873382,2.373455,Barak,48.874663,2.372837,"[{'id': '4f04af1f2fb6e1c99f3db0bb', 'name': 'T...",Turkish Restaurant


In [16]:
asian_restaurants.drop(['Venue Category'], inplace=True, axis=1) # Drop the initial Venue category column
asian_restaurants.rename(columns={'categories': "Venue Category"}, inplace=True) # Rename the new category as venue category
asian_restaurants.head(20)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Entrepôt,48.873382,2.373455,Asian Soupe,48.873556,2.375092,Vietnamese Restaurant
1,Entrepôt,48.873382,2.373455,Asian Touch,48.872724,2.371047,Thai Restaurant
2,Entrepôt,48.873382,2.373455,Gölbasi Restaurant,48.873567,2.370311,Turkish Restaurant
3,Entrepôt,48.873382,2.373455,Do Brasil,48.873884,2.371596,Brazilian Restaurant
4,Entrepôt,48.873382,2.373455,Barak,48.874663,2.372837,Turkish Restaurant
5,Entrepôt,48.873382,2.373455,Oliva,48.87311,2.375951,Trattoria/Osteria
6,Entrepôt,48.873382,2.373455,Restaurant Les 4 Frères,48.874568,2.373783,Middle Eastern Restaurant
7,Entrepôt,48.873382,2.373455,Rôtisserie Sainte-Marthe,48.873507,2.370876,Restaurant
8,Entrepôt,48.873382,2.373455,Restaurant Tai Yien,48.872311,2.377507,Chinese Restaurant
9,Entrepôt,48.873382,2.373455,Le Firat,48.870307,2.372929,Kebab Restaurant


In [17]:
print('The number of unique venue categories is:', len(asian_restaurants['Venue Category'].unique().tolist())) # Print the number of unique venue categories returned
asian_restaurants['Venue Category'].unique().tolist() # Print the list of unique venue categories

The number of unique venue categories is: 116


['Vietnamese Restaurant',
 'Thai Restaurant',
 'Turkish Restaurant',
 'Brazilian Restaurant',
 'Trattoria/Osteria',
 'Middle Eastern Restaurant',
 'Restaurant',
 'Chinese Restaurant',
 'Kebab Restaurant',
 'Asian Restaurant',
 'African Restaurant',
 'Greek Restaurant',
 'Cantonese Restaurant',
 'Vegetarian / Vegan Restaurant',
 'Spa',
 'Tattoo Parlor',
 'Fast Food Restaurant',
 'Cafeteria',
 'College Cafeteria',
 None,
 'Italian Restaurant',
 'Sushi Restaurant',
 'Lebanese Restaurant',
 'French Restaurant',
 'Falafel Restaurant',
 'Romanian Restaurant',
 'Moroccan Restaurant',
 'Indian Restaurant',
 'General Entertainment',
 'Bar',
 'Tibetan Restaurant',
 'Pizza Place',
 'Peruvian Restaurant',
 'Bistro',
 'Japanese Restaurant',
 'Ethiopian Restaurant',
 'Hotel',
 'Auto Dealership',
 'North Indian Restaurant',
 'Halal Restaurant',
 'Molecular Gastronomy Restaurant',
 'Donut Shop',
 'Buffet',
 'Pub',
 'Comfort Food Restaurant',
 'Hotel Bar',
 'Café',
 'Diner',
 'Sandwich Place',
 'Gastro

<b>NB:</b> From the output above, the venue categories include results that are not restaurants, restaurants of an unspecified type, and restaurants that are not of Asian origin. These rows will be dropped using the lines of code below.

In [18]:
# Create a list of unwanted venue categories and store in variable 'unwanted'
unwanted = ['Restaurant', 'Brazilian Restaurant', 'African Restaurant', 'Greek Restaurant', 'Fast Food Restaurant','Italian Restaurant','French Restaurant', 'Moroccan Restaurant', 'Romanian Restaurant',
 'Cafeteria', 'German Restaurant', 'Modern European Restaurant', 'South American Restaurant','Ethiopian Restaurant']

In [19]:
# Store rows in which the venue category is a restaurant in variable 'asian'
asian = asian_restaurants[asian_restaurants['Venue Category'].str.contains("Restaurant", na=False)] # na=False drops rows with no category

In [20]:
asian.shape # Display the new shape of the data set

(11363, 7)

In [21]:
asian = asian[~asian['Venue Category'].isin(unwanted)]
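
The two filtering steps can be checked on a small frame. This is a sketch under stated assumptions: the `venues` table and the shortened `unwanted` list below are invented for illustration and only mimic the layout of the real data.

```python
import pandas as pd

# Hypothetical miniature of the venues table, including a missing category.
venues = pd.DataFrame({
    'Venue': ['A', 'B', 'C', 'D', 'E'],
    'Venue Category': ['Thai Restaurant', 'Spa', None,
                       'French Restaurant', 'Restaurant'],
})
unwanted = ['Restaurant', 'French Restaurant']

# Step 1: keep rows whose category mentions "Restaurant"; na=False drops missing categories.
is_restaurant = venues['Venue Category'].str.contains('Restaurant', na=False)
# Step 2: remove categories on the exclusion list.
kept = venues[is_restaurant & ~venues['Venue Category'].isin(unwanted)]

print(kept)
```

Note that anything not named in `unwanted` survives the filter, so the quality of the result depends on how complete that exclusion list is.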

In [22]:
print(asian.shape)
asian.head(20)

(5805, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Entrepôt,48.873382,2.373455,Asian Soupe,48.873556,2.375092,Vietnamese Restaurant
1,Entrepôt,48.873382,2.373455,Asian Touch,48.872724,2.371047,Thai Restaurant
2,Entrepôt,48.873382,2.373455,Gölbasi Restaurant,48.873567,2.370311,Turkish Restaurant
4,Entrepôt,48.873382,2.373455,Barak,48.874663,2.372837,Turkish Restaurant
6,Entrepôt,48.873382,2.373455,Restaurant Les 4 Frères,48.874568,2.373783,Middle Eastern Restaurant
8,Entrepôt,48.873382,2.373455,Restaurant Tai Yien,48.872311,2.377507,Chinese Restaurant
9,Entrepôt,48.873382,2.373455,Le Firat,48.870307,2.372929,Kebab Restaurant
10,Entrepôt,48.873382,2.373455,Restaurant Le Yun,48.871185,2.377746,Asian Restaurant
12,Entrepôt,48.873382,2.373455,Restaurant Paradis,48.871745,2.37749,Asian Restaurant
16,Entrepôt,48.873382,2.373455,Restaurant Rapide Ben Long,48.869554,2.370224,Asian Restaurant


In [23]:
asian.groupby('Neighborhood').count() # Group the dataframe by neighborhood

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Batignolles-Monceau,325,325,325,325,325,325
Bourse,196,196,196,196,196,196
Buttes-Chaumont,316,316,316,316,316,316
Buttes-Montmartre,592,592,592,592,592,592
Entrepôt,960,960,960,960,960,960
Gobelins,300,300,300,300,300,300
Hôtel-de-Ville,129,129,129,129,129,129
Louvre,108,108,108,108,108,108
Luxembourg,162,162,162,162,162,162
Ménilmontant,225,225,225,225,225,225


In [24]:
print('There are {} unique categories.'.format(len(asian['Venue Category'].unique())))

There are 48 unique categories.


One-hot encoding converts the venue categories into numeric indicator columns that can be processed.

In [25]:
# one hot encoding
asian_onehot = pd.get_dummies(asian[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
asian_onehot['Neighborhood'] = asian['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [asian_onehot.columns[-1]] + list(asian_onehot.columns[:-1])
asian_onehot = asian_onehot[fixed_columns]

asian_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Alsatian Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,Australian Restaurant,Burgundian Restaurant,Cambodian Restaurant,Cantonese Restaurant,...,Sri Lankan Restaurant,Sushi Restaurant,Syrian Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Entrepôt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,Entrepôt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
2,Entrepôt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
4,Entrepôt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
6,Entrepôt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
asian_onehot.shape

(5805, 49)

In [27]:
asian_grouped = asian_onehot.groupby('Neighborhood').mean().reset_index() # Find the mean of the data set and reset the index
asian_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,Alsatian Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,Australian Restaurant,Burgundian Restaurant,Cambodian Restaurant,Cantonese Restaurant,...,Sri Lankan Restaurant,Sushi Restaurant,Syrian Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Batignolles-Monceau,0.0,0.0,0.046154,0.0,0.073846,0.0,0.0,0.0,0.0,...,0.0,0.098462,0.0,0.0,0.0,0.132308,0.0,0.070769,0.0,0.061538
1,Bourse,0.0,0.0,0.0,0.030612,0.015306,0.0,0.0,0.0,0.0,...,0.0,0.056122,0.0,0.066327,0.0,0.02551,0.010204,0.173469,0.010204,0.035714
2,Buttes-Chaumont,0.0,0.0,0.012658,0.0,0.113924,0.0,0.0,0.0,0.009494,...,0.0,0.0,0.0,0.0,0.0,0.072785,0.0,0.224684,0.0,0.031646
3,Buttes-Montmartre,0.0,0.0,0.0,0.0,0.032095,0.0,0.0,0.0,0.0,...,0.003378,0.011824,0.0,0.0,0.0,0.010135,0.037162,0.16723,0.0,0.023649
4,Entrepôt,0.0,0.0,0.0,0.0,0.167708,0.0,0.0,0.0,0.015625,...,0.015625,0.004167,0.0,0.026042,0.0,0.044792,0.0,0.222917,0.028125,0.011458
5,Gobelins,0.0,0.0,0.0,0.0,0.11,0.0,0.0,0.043333,0.016667,...,0.0,0.003333,0.003333,0.0,0.0,0.053333,0.0,0.076667,0.0,0.163333
6,Hôtel-de-Ville,0.0,0.0,0.0,0.007752,0.03876,0.0,0.108527,0.0,0.0,...,0.0,0.085271,0.0,0.0,0.0,0.077519,0.031008,0.0,0.093023,0.077519
7,Louvre,0.0,0.0,0.009259,0.101852,0.166667,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.037037
8,Luxembourg,0.0,0.0,0.098765,0.0,0.030864,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.006173,0.074074,0.0,0.049383,0.0,0.111111
9,Ménilmontant,0.0,0.0,0.0,0.0,0.088889,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.288889,0.0,0.017778
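
The values in the grouped table are category shares: averaging a 0/1 indicator column within a neighborhood gives the fraction of that neighborhood's venues in the category. A minimal sketch on invented data (the `toy` frame and neighborhood names X and Y are hypothetical):

```python
import pandas as pd

# Hypothetical venues: three in neighborhood X, one in neighborhood Y.
toy = pd.DataFrame({
    'Neighborhood': ['X', 'X', 'X', 'Y'],
    'Venue Category': ['Thai Restaurant', 'Thai Restaurant',
                       'Indian Restaurant', 'Indian Restaurant'],
})

# One indicator column per category.
onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# The mean of an indicator column is the fraction of venues in that category.
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each row of the grouped frame sums to 1 across the category columns, which makes neighborhoods with different venue counts directly comparable.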


Find the top 5 restaurant categories in each neighborhood

In [28]:
num_top_venues = 5

for hood in asian_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = asian_grouped[asian_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Batignolles-Monceau----
                 venue  freq
0   Seafood Restaurant  0.14
1      Thai Restaurant  0.13
2  Japanese Restaurant  0.12
3     Sushi Restaurant  0.10
4    Indian Restaurant  0.09


----Bourse----
                 venue  freq
0   Chinese Restaurant  0.17
1   Turkish Restaurant  0.17
2    Indian Restaurant  0.10
3  Szechuan Restaurant  0.07
4     Kebab Restaurant  0.07


----Buttes-Chaumont----
                venue  freq
0  Turkish Restaurant  0.22
1  Chinese Restaurant  0.13
2    Asian Restaurant  0.11
3   Indian Restaurant  0.09
4    Kebab Restaurant  0.09


----Buttes-Montmartre----
                       venue  freq
0          Indian Restaurant  0.22
1         Turkish Restaurant  0.17
2           Halal Restaurant  0.14
3           Kebab Restaurant  0.09
4  Middle Eastern Restaurant  0.05


----Entrepôt----
                 venue  freq
0   Turkish Restaurant  0.22
1    Indian Restaurant  0.17
2     Asian Restaurant  0.17
3  Japanese Restaurant  0.06
4   Chinese

In [29]:
# A function to return the most common venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
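
The ranking logic of this function can be tried on a single hand-made row. The sketch below restates it under a different, hypothetical name (`top_categories`) so that it runs on its own; the row values are invented.

```python
import pandas as pd

def top_categories(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and rank the remaining frequencies.
    frequencies = row.iloc[1:].astype(float)
    return frequencies.sort_values(ascending=False).index.values[:num_top_venues]

# Hypothetical row laid out like one row of asian_grouped.
row = pd.Series({'Neighborhood': 'X',
                 'Indian Restaurant': 0.2,
                 'Thai Restaurant': 0.5,
                 'Turkish Restaurant': 0.3})

top = top_categories(row, 2)
print(list(top))
```

The function returns category names rather than frequencies, so ties are broken by whatever order the sort happens to produce.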

In [30]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = asian_grouped['Neighborhood']

for ind in np.arange(asian_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(asian_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Batignolles-Monceau,Seafood Restaurant,Thai Restaurant,Japanese Restaurant,Sushi Restaurant,Indian Restaurant,Asian Restaurant,Turkish Restaurant,Vietnamese Restaurant,Kebab Restaurant,Shabu-Shabu Restaurant
1,Bourse,Turkish Restaurant,Chinese Restaurant,Indian Restaurant,Szechuan Restaurant,Lebanese Restaurant,Kebab Restaurant,Sushi Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant
2,Buttes-Chaumont,Turkish Restaurant,Chinese Restaurant,Asian Restaurant,Kebab Restaurant,Indian Restaurant,Thai Restaurant,Middle Eastern Restaurant,Seafood Restaurant,Lebanese Restaurant,Japanese Restaurant
3,Buttes-Montmartre,Indian Restaurant,Turkish Restaurant,Halal Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Doner Restaurant,Falafel Restaurant,Tibetan Restaurant,Asian Restaurant,Chinese Restaurant
4,Entrepôt,Turkish Restaurant,Indian Restaurant,Asian Restaurant,Japanese Restaurant,Chinese Restaurant,Thai Restaurant,Middle Eastern Restaurant,Kebab Restaurant,Vegetarian / Vegan Restaurant,Szechuan Restaurant


#### Clustering the Neighborhoods using k-means

In [31]:
# set number of clusters
kclusters = 5

asian_grouped_clustering = asian_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(asian_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 3, 3, 3, 0, 1, 1, 1, 3], dtype=int32)

In [32]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

asian_merged = airbnb

# attach each neighborhood's cluster label and top venue categories to the AirBnB listings
asian_merged = asian_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

asian_merged.head()

Unnamed: 0,Room ID,Neighborhood,Room Price,Number of reviews,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,12817090,Entrepôt,30,8,48.873382,2.373455,3,Turkish Restaurant,Indian Restaurant,Asian Restaurant,Japanese Restaurant,Chinese Restaurant,Thai Restaurant,Middle Eastern Restaurant,Kebab Restaurant,Vegetarian / Vegan Restaurant,Szechuan Restaurant
1,2181753,Reuilly,40,104,48.841314,2.383398,0,Turkish Restaurant,Kebab Restaurant,Sushi Restaurant,Japanese Restaurant,Vietnamese Restaurant,Lebanese Restaurant,Chinese Restaurant,Comfort Food Restaurant,Mediterranean Restaurant,Falafel Restaurant
2,29895295,Reuilly,50,7,48.847268,2.400412,0,Turkish Restaurant,Kebab Restaurant,Sushi Restaurant,Japanese Restaurant,Vietnamese Restaurant,Lebanese Restaurant,Chinese Restaurant,Comfort Food Restaurant,Mediterranean Restaurant,Falafel Restaurant
3,30728070,Popincourt,139,0,48.860921,2.366604,3,Asian Restaurant,Indian Restaurant,Turkish Restaurant,Kebab Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Mediterranean Restaurant,Thai Restaurant
4,21862193,Entrepôt,40,1,48.879021,2.370081,3,Turkish Restaurant,Indian Restaurant,Asian Restaurant,Japanese Restaurant,Chinese Restaurant,Thai Restaurant,Middle Eastern Restaurant,Kebab Restaurant,Vegetarian / Vegan Restaurant,Szechuan Restaurant
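
`DataFrame.join(..., on='Neighborhood')` is a left join by default, so a listing whose neighborhood has no row in `neighborhoods_venues_sorted` would end up with a missing cluster label. The sketch below reproduces that situation on two small frames. The frame names `listings` and `labels`, the third listing, and the 'Unknown' neighborhood are hypothetical; the first two rows reuse values shown in the output above.

```python
import pandas as pd

listings = pd.DataFrame({
    "Room ID": [12817090, 2181753, 999],          # 999 is a made-up listing
    "Neighborhood": ["Entrepôt", "Reuilly", "Unknown"],
    "Room Price": [30, 40, 55],
})
labels = pd.DataFrame({
    "Cluster Labels": [3, 0],
    "Neighborhood": ["Entrepôt", "Reuilly"],
})

# Left join on the neighborhood name, as in the cell above
merged = listings.join(labels.set_index("Neighborhood"), on="Neighborhood")

# Count listings that found no matching neighborhood
unmatched = merged["Cluster Labels"].isna().sum()
print(merged)
print("listings without a cluster label:", unmatched)

# Dropping unmatched rows lets the label column be cast back to integers
clean = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
```

Running a check like this before mapping or grouping helps catch neighborhood names that differ between the two data sources.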


# Results & Discussion

After clustering the neighborhoods with k-means, each AirBnB listing carries the cluster label of its neighborhood. The listings can now be superimposed on the map of Paris, colored by cluster, as depicted below.

In [33]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(asian_merged['Latitude'], asian_merged['Longitude'], asian_merged['Neighborhood'], asian_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

From the map above, there is clear overlap between the clusters, because similar venue categories are highly concentrated across the AirBnB listings.
To explain the clusters further, the data set is grouped by cluster label to see how the clustering relates to the average price of AirBnB apartments.

In [34]:
# average the numeric columns per cluster (text columns are skipped)
asian_merged.groupby('Cluster Labels').mean(numeric_only=True)

Cluster Labels,Room ID,Room Price,Number of reviews,Latitude,Longitude
0,19208980.0,124.703008,27.146617,48.850011,2.351896
1,20564840.0,162.981132,28.669811,48.860463,2.316343
2,23122640.0,347.761905,24.809524,48.855525,2.30863
3,19271570.0,99.283186,21.396018,48.87503,2.366437
4,23028620.0,146.040816,18.163265,48.858356,2.273812
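
A cluster mean can be pulled upward by a few very expensive listings; several rows printed in this section have a price of 1000. As an illustration only, the sketch below takes the ten listings displayed in this section for cluster labels 2 and 4 (a sample, not the full clusters) and compares the mean and median price.

```python
import pandas as pd

# Room prices copied from the ten rows printed for cluster labels 2 and 4
# in this notebook's output (a sample, not the full clusters).
sample = pd.DataFrame({
    "Cluster Labels": [2] * 10 + [4] * 10,
    "Room Price": [69, 50, 280, 42, 199, 70, 320, 120, 35, 64,
                   75, 119, 40, 30, 48, 65, 40, 58, 70, 1000],
})

# Mean and median price per cluster label
summary = sample.groupby("Cluster Labels")["Room Price"].agg(["mean", "median"])
print(summary)
```

For the label 4 sample, the single listing priced at 1000 lifts the mean to 154.5 while the median stays at 61.5, so the median may be the more representative figure when comparing clusters.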


From the result above, the average room price differs markedly between clusters, from about 99 for cluster label 3 to about 348 for cluster label 2. This can help an Asian tourist narrow the choice of room to a cluster that fits their budget.

In [35]:
# bar chart of the average room price per cluster
asian_bar = asian_merged.groupby('Cluster Labels').mean(numeric_only=True)
ax = asian_bar.plot.bar(y='Room Price', rot=0)

The dataframes below show the listings in each cluster. CLUSTER 1 through CLUSTER 5 correspond to cluster labels 0 through 4.

<b>CLUSTER 1</b>

In [36]:
asian_merged.loc[asian_merged['Cluster Labels'] == 0, asian_merged.columns[[0,1,2,3,4,5] + list(range(7, asian_merged.shape[1]))]]

Unnamed: 0,Room ID,Neighborhood,Room Price,Number of reviews,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,2181753,Reuilly,40,104,48.841314,2.383398,Turkish Restaurant,Kebab Restaurant,Sushi Restaurant,Japanese Restaurant,Vietnamese Restaurant,Lebanese Restaurant,Chinese Restaurant,Comfort Food Restaurant,Mediterranean Restaurant,Falafel Restaurant
2,29895295,Reuilly,50,7,48.847268,2.400412,Turkish Restaurant,Kebab Restaurant,Sushi Restaurant,Japanese Restaurant,Vietnamese Restaurant,Lebanese Restaurant,Chinese Restaurant,Comfort Food Restaurant,Mediterranean Restaurant,Falafel Restaurant
7,12912440,Observatoire,40,1,48.837924,2.335897,Kebab Restaurant,Chinese Restaurant,Japanese Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Cantonese Restaurant,Persian Restaurant,Caribbean Restaurant,Vietnamese Restaurant,Indian Restaurant
8,10902362,Gobelins,40,6,48.828511,2.368842,Chinese Restaurant,Vietnamese Restaurant,Asian Restaurant,Turkish Restaurant,Kebab Restaurant,Thai Restaurant,Middle Eastern Restaurant,Doner Restaurant,Halal Restaurant,Cambodian Restaurant
9,6275557,Reuilly,55,2,48.843485,2.381207,Turkish Restaurant,Kebab Restaurant,Sushi Restaurant,Japanese Restaurant,Vietnamese Restaurant,Lebanese Restaurant,Chinese Restaurant,Comfort Food Restaurant,Mediterranean Restaurant,Falafel Restaurant
18,24611716,Observatoire,40,0,48.834929,2.326319,Kebab Restaurant,Chinese Restaurant,Japanese Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Cantonese Restaurant,Persian Restaurant,Caribbean Restaurant,Vietnamese Restaurant,Indian Restaurant
23,27231809,Reuilly,30,6,48.846536,2.403962,Turkish Restaurant,Kebab Restaurant,Sushi Restaurant,Japanese Restaurant,Vietnamese Restaurant,Lebanese Restaurant,Chinese Restaurant,Comfort Food Restaurant,Mediterranean Restaurant,Falafel Restaurant
24,35149192,Gobelins,110,0,48.832656,2.354788,Chinese Restaurant,Vietnamese Restaurant,Asian Restaurant,Turkish Restaurant,Kebab Restaurant,Thai Restaurant,Middle Eastern Restaurant,Doner Restaurant,Halal Restaurant,Cambodian Restaurant
25,8492308,Bourse,150,129,48.866438,2.351878,Turkish Restaurant,Chinese Restaurant,Indian Restaurant,Szechuan Restaurant,Lebanese Restaurant,Kebab Restaurant,Sushi Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant
26,23244712,Panthéon,90,22,48.837211,2.348776,Chinese Restaurant,Kebab Restaurant,Indian Restaurant,Turkish Restaurant,Middle Eastern Restaurant,Halal Restaurant,Afghan Restaurant,Thai Restaurant,Syrian Restaurant,Japanese Restaurant


<b>CLUSTER 2</b>

In [37]:
asian_merged.loc[asian_merged['Cluster Labels'] == 1, asian_merged.columns[[0,1,2,3,4,5] + list(range(7, asian_merged.shape[1]))]]

Unnamed: 0,Room ID,Neighborhood,Room Price,Number of reviews,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,21585068,Hôtel-de-Ville,128,88,48.854200,2.351054,Burgundian Restaurant,Japanese Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Indian Restaurant,Vietnamese Restaurant,Thai Restaurant,Middle Eastern Restaurant,Falafel Restaurant,Mediterranean Restaurant
19,12318143,Vaugirard,50,13,48.846108,2.293216,Lebanese Restaurant,Korean Restaurant,Persian Restaurant,Indian Restaurant,Thai Restaurant,Mediterranean Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant,Doner Restaurant,Falafel Restaurant
20,6198789,Vaugirard,40,4,48.838193,2.308725,Lebanese Restaurant,Korean Restaurant,Persian Restaurant,Indian Restaurant,Thai Restaurant,Mediterranean Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant,Doner Restaurant,Falafel Restaurant
21,25371291,Luxembourg,100,49,48.856680,2.337738,Lebanese Restaurant,Indian Restaurant,Japanese Restaurant,Vietnamese Restaurant,American Restaurant,Middle Eastern Restaurant,Thai Restaurant,Indonesian Restaurant,Turkish Restaurant,Mediterranean Restaurant
36,18951169,Vaugirard,45,109,48.834085,2.286690,Lebanese Restaurant,Korean Restaurant,Persian Restaurant,Indian Restaurant,Thai Restaurant,Mediterranean Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant,Doner Restaurant,Falafel Restaurant
43,26983103,Vaugirard,500,5,48.842847,2.305724,Lebanese Restaurant,Korean Restaurant,Persian Restaurant,Indian Restaurant,Thai Restaurant,Mediterranean Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant,Doner Restaurant,Falafel Restaurant
50,21105126,Louvre,500,0,48.861768,2.343610,Asian Restaurant,Lebanese Restaurant,Korean Restaurant,Japanese Restaurant,Arepa Restaurant,Chinese Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Portuguese Restaurant,Vietnamese Restaurant
56,2846240,Vaugirard,80,1,48.831453,2.282518,Lebanese Restaurant,Korean Restaurant,Persian Restaurant,Indian Restaurant,Thai Restaurant,Mediterranean Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant,Doner Restaurant,Falafel Restaurant
57,25986935,Louvre,1000,0,48.867829,2.326000,Asian Restaurant,Lebanese Restaurant,Korean Restaurant,Japanese Restaurant,Arepa Restaurant,Chinese Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Portuguese Restaurant,Vietnamese Restaurant
68,27266740,Luxembourg,120,2,48.853699,2.334086,Lebanese Restaurant,Indian Restaurant,Japanese Restaurant,Vietnamese Restaurant,American Restaurant,Middle Eastern Restaurant,Thai Restaurant,Indonesian Restaurant,Turkish Restaurant,Mediterranean Restaurant


<b>CLUSTER 3</b>

In [38]:
asian_merged.loc[asian_merged['Cluster Labels'] == 2, asian_merged.columns[[0,1,2,3,4,5] + list(range(7, asian_merged.shape[1]))]]

Unnamed: 0,Room ID,Neighborhood,Room Price,Number of reviews,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,28095621,Palais-Bourbon,69,61,48.848568,2.309096,Korean Restaurant,Thai Restaurant,Indian Restaurant,Spanish Restaurant,Eastern European Restaurant,Tapas Restaurant,Mediterranean Restaurant,Lebanese Restaurant,American Restaurant,Middle Eastern Restaurant
77,15005210,Palais-Bourbon,50,2,48.857104,2.303614,Korean Restaurant,Thai Restaurant,Indian Restaurant,Spanish Restaurant,Eastern European Restaurant,Tapas Restaurant,Mediterranean Restaurant,Lebanese Restaurant,American Restaurant,Middle Eastern Restaurant
99,16624675,Palais-Bourbon,280,4,48.853831,2.309337,Korean Restaurant,Thai Restaurant,Indian Restaurant,Spanish Restaurant,Eastern European Restaurant,Tapas Restaurant,Mediterranean Restaurant,Lebanese Restaurant,American Restaurant,Middle Eastern Restaurant
160,21589739,Palais-Bourbon,42,9,48.856704,2.319156,Korean Restaurant,Thai Restaurant,Indian Restaurant,Spanish Restaurant,Eastern European Restaurant,Tapas Restaurant,Mediterranean Restaurant,Lebanese Restaurant,American Restaurant,Middle Eastern Restaurant
162,34191655,Palais-Bourbon,199,0,48.853945,2.308219,Korean Restaurant,Thai Restaurant,Indian Restaurant,Spanish Restaurant,Eastern European Restaurant,Tapas Restaurant,Mediterranean Restaurant,Lebanese Restaurant,American Restaurant,Middle Eastern Restaurant
183,15579358,Palais-Bourbon,70,174,48.854688,2.300159,Korean Restaurant,Thai Restaurant,Indian Restaurant,Spanish Restaurant,Eastern European Restaurant,Tapas Restaurant,Mediterranean Restaurant,Lebanese Restaurant,American Restaurant,Middle Eastern Restaurant
220,13012978,Palais-Bourbon,320,1,48.859396,2.298308,Korean Restaurant,Thai Restaurant,Indian Restaurant,Spanish Restaurant,Eastern European Restaurant,Tapas Restaurant,Mediterranean Restaurant,Lebanese Restaurant,American Restaurant,Middle Eastern Restaurant
227,10509789,Palais-Bourbon,120,26,48.851514,2.30028,Korean Restaurant,Thai Restaurant,Indian Restaurant,Spanish Restaurant,Eastern European Restaurant,Tapas Restaurant,Mediterranean Restaurant,Lebanese Restaurant,American Restaurant,Middle Eastern Restaurant
238,14806004,Palais-Bourbon,35,33,48.849329,2.314511,Korean Restaurant,Thai Restaurant,Indian Restaurant,Spanish Restaurant,Eastern European Restaurant,Tapas Restaurant,Mediterranean Restaurant,Lebanese Restaurant,American Restaurant,Middle Eastern Restaurant
246,9562142,Palais-Bourbon,64,0,48.855511,2.294054,Korean Restaurant,Thai Restaurant,Indian Restaurant,Spanish Restaurant,Eastern European Restaurant,Tapas Restaurant,Mediterranean Restaurant,Lebanese Restaurant,American Restaurant,Middle Eastern Restaurant


<b>CLUSTER 4</b>

In [39]:
asian_merged.loc[asian_merged['Cluster Labels'] == 3, asian_merged.columns[[0,1,2,3,4,5] + list(range(7, asian_merged.shape[1]))]]

Unnamed: 0,Room ID,Neighborhood,Room Price,Number of reviews,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,12817090,Entrepôt,30,8,48.873382,2.373455,Turkish Restaurant,Indian Restaurant,Asian Restaurant,Japanese Restaurant,Chinese Restaurant,Thai Restaurant,Middle Eastern Restaurant,Kebab Restaurant,Vegetarian / Vegan Restaurant,Szechuan Restaurant
3,30728070,Popincourt,139,0,48.860921,2.366604,Asian Restaurant,Indian Restaurant,Turkish Restaurant,Kebab Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Mediterranean Restaurant,Thai Restaurant
4,21862193,Entrepôt,40,1,48.879021,2.370081,Turkish Restaurant,Indian Restaurant,Asian Restaurant,Japanese Restaurant,Chinese Restaurant,Thai Restaurant,Middle Eastern Restaurant,Kebab Restaurant,Vegetarian / Vegan Restaurant,Szechuan Restaurant
5,26834043,Élysée,1000,0,48.869121,2.309637,Turkish Restaurant,Indian Restaurant,Persian Restaurant,North Indian Restaurant,Thai Restaurant,Chinese Restaurant,Asian Restaurant,Halal Restaurant,Russian Restaurant,Kebab Restaurant
6,34379249,Buttes-Montmartre,60,0,48.892996,2.360433,Indian Restaurant,Turkish Restaurant,Halal Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Doner Restaurant,Falafel Restaurant,Tibetan Restaurant,Asian Restaurant,Chinese Restaurant
10,26442567,Buttes-Montmartre,65,48,48.885455,2.334187,Indian Restaurant,Turkish Restaurant,Halal Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Doner Restaurant,Falafel Restaurant,Tibetan Restaurant,Asian Restaurant,Chinese Restaurant
11,21442642,Ménilmontant,150,2,48.875010,2.390250,Turkish Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Indian Restaurant,Asian Restaurant,Portuguese Restaurant,Chinese Restaurant,Japanese Restaurant,Dim Sum Restaurant,Vietnamese Restaurant
12,14785583,Popincourt,28,4,48.856686,2.391691,Asian Restaurant,Indian Restaurant,Turkish Restaurant,Kebab Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Mediterranean Restaurant,Thai Restaurant
13,36433345,Popincourt,35,0,48.869369,2.378698,Asian Restaurant,Indian Restaurant,Turkish Restaurant,Kebab Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Mediterranean Restaurant,Thai Restaurant
14,22840532,Buttes-Montmartre,75,12,48.886701,2.334159,Indian Restaurant,Turkish Restaurant,Halal Restaurant,Kebab Restaurant,Middle Eastern Restaurant,Doner Restaurant,Falafel Restaurant,Tibetan Restaurant,Asian Restaurant,Chinese Restaurant


<b>CLUSTER 5</b>

In [40]:
asian_merged.loc[asian_merged['Cluster Labels'] == 4, asian_merged.columns[[0,1,2,3,4,5] + list(range(7, asian_merged.shape[1]))]]

Unnamed: 0,Room ID,Neighborhood,Room Price,Number of reviews,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,35154311,Passy,75,2,48.872736,2.281648,Japanese Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Korean Restaurant,Thai Restaurant,Australian Restaurant,Lebanese Restaurant,Seafood Restaurant,Asian Restaurant,Persian Restaurant
47,28422062,Passy,119,0,48.848998,2.273493,Japanese Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Korean Restaurant,Thai Restaurant,Australian Restaurant,Lebanese Restaurant,Seafood Restaurant,Asian Restaurant,Persian Restaurant
81,21973329,Passy,40,17,48.868741,2.283,Japanese Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Korean Restaurant,Thai Restaurant,Australian Restaurant,Lebanese Restaurant,Seafood Restaurant,Asian Restaurant,Persian Restaurant
84,24908135,Passy,30,1,48.838085,2.258432,Japanese Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Korean Restaurant,Thai Restaurant,Australian Restaurant,Lebanese Restaurant,Seafood Restaurant,Asian Restaurant,Persian Restaurant
92,9325509,Passy,48,0,48.851644,2.269515,Japanese Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Korean Restaurant,Thai Restaurant,Australian Restaurant,Lebanese Restaurant,Seafood Restaurant,Asian Restaurant,Persian Restaurant
147,9376335,Passy,65,0,48.84167,2.260336,Japanese Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Korean Restaurant,Thai Restaurant,Australian Restaurant,Lebanese Restaurant,Seafood Restaurant,Asian Restaurant,Persian Restaurant
153,26350080,Passy,40,0,48.876441,2.28642,Japanese Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Korean Restaurant,Thai Restaurant,Australian Restaurant,Lebanese Restaurant,Seafood Restaurant,Asian Restaurant,Persian Restaurant
177,12240489,Passy,58,193,48.86399,2.289509,Japanese Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Korean Restaurant,Thai Restaurant,Australian Restaurant,Lebanese Restaurant,Seafood Restaurant,Asian Restaurant,Persian Restaurant
182,22509579,Passy,70,62,48.858324,2.277299,Japanese Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Korean Restaurant,Thai Restaurant,Australian Restaurant,Lebanese Restaurant,Seafood Restaurant,Asian Restaurant,Persian Restaurant
196,26320543,Passy,1000,0,48.875763,2.282561,Japanese Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Korean Restaurant,Thai Restaurant,Australian Restaurant,Lebanese Restaurant,Seafood Restaurant,Asian Restaurant,Persian Restaurant
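
The tables above list the listings in each cluster without ranking them. A minimal sketch of how a shortlist could be built from price and review count follows. The helper name `shortlist` and the thresholds `min_reviews=5` and `top_n=3` are assumptions, not part of the original analysis; the rows are a handful copied from the output for cluster label 3.

```python
import pandas as pd

# A few rows copied from the cluster label 3 output
rooms = pd.DataFrame({
    "Room ID": [12817090, 30728070, 21862193, 26442567, 14785583, 22840532],
    "Neighborhood": ["Entrepôt", "Popincourt", "Entrepôt",
                     "Buttes-Montmartre", "Popincourt", "Buttes-Montmartre"],
    "Room Price": [30, 139, 40, 65, 28, 75],
    "Number of reviews": [8, 0, 1, 48, 4, 12],
})


def shortlist(df, min_reviews=5, top_n=3):
    """Keep listings with at least min_reviews reviews, cheapest first."""
    reviewed = df[df["Number of reviews"] >= min_reviews]
    ranked = reviewed.sort_values(
        ["Room Price", "Number of reviews"], ascending=[True, False]
    )
    return ranked.head(top_n)


best = shortlist(rooms)
print(best[["Room ID", "Neighborhood", "Room Price", "Number of reviews"]])
```

Requiring a minimum number of reviews is one simple way to avoid recommending cheap listings that nobody has rated yet; other weightings of price and reviews are equally possible.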


# Conclusion & Recommendations

From the results, the k-means clustering algorithm proved to be a useful tool for grouping the Asian restaurants that are in proximity to AirBnB private room listings in the city of Paris.
<br>
<br>
<br>
As a recommendation, the outcome could be improved by considering the reviews of each restaurant, to help the Asian tourist identify the locations with the most positive reviews. The price of the rooms could also be included as a feature in the k-means model. In addition, the distance from each restaurant to the AirBnB private apartment could be computed to further improve the outcome of the model.
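
The distance suggestion above could be sketched with the haversine formula, which gives the great-circle distance between two latitude/longitude pairs. The listing coordinates below are taken from the first row of the merged table; the restaurant coordinates are a hypothetical point chosen only for illustration, and 6371 km is the commonly used mean Earth radius.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


listing = (48.873382, 2.373455)    # Room ID 12817090, Entrepôt
restaurant = (48.8700, 2.3700)     # hypothetical restaurant location

distance = haversine_km(*listing, *restaurant)
print(f"{distance:.2f} km")
```

Applied to every listing and restaurant pair, a function like this would allow listings to be ranked by how many Asian restaurants lie within a chosen walking distance.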