# Best location for your Business 

### Background and description

This project aims to find the best possible location for an Indian restaurant in London, United Kingdom. The approach is analytical: it relies on machine learning and data analysis techniques, specifically clustering, supported where useful by data visualization.

During the analysis, several data transformations will be performed in order to find the best data format for the machine learning model to ingest. Once the data is prepared, a modeling step will be carried out, and its results will point to the most promising places to locate the Indian restaurant.

The idea is to focus on the neighborhoods that host good Indian restaurants in terms of rating (here, a good restaurant has a rating greater than or equal to 9.0/10.0), and then look for similar neighborhoods that have no comparably rated restaurant, so that competition can be avoided.

### Data we need

The data used in this project comes from two sources:

1. The Foursquare API: This data will be accessed via Python and used to obtain the most common venues per neighborhood in the city of London. It gives a picture of how the city's venues are distributed and which places are most common for leisure, and more generally an idea of what people like.

2. The www.doogal.co.uk website: This site provides several data sources that are useful for this problem. The files are provided in CSV format and are intended for statistical use. The data contains up-to-date information about the postcodes and locations of boroughs and neighborhoods in London. It will be analyzed to determine the best location for a new venue (a restaurant or other business) based on similarities between neighborhoods and the ratings of different restaurants.

You can access the data via this <a href='https://www.doogal.co.uk/UKPostcodesCSV.ashx?region=E12000007'>link</a>.

#### Let's import the libraries needed and see what the data looks like

In [1]:
import requests # library to handle requests
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Let's now load the data from a CSV file.

In [2]:
London = pd.read_csv('London postcodes.csv')
London.columns

Index([u'Postcode', u'In Use?', u'Latitude', u'Longitude', u'Easting',
       u'Northing', u'Grid Ref', u'County', u'District', u'Ward',
       u'District Code', u'Ward Code', u'Country', u'County Code',
       u'Constituency', u'Introduced', u'Terminated', u'Parish',
       u'National Park', u'Population', u'Households', u'Built up area',
       u'Built up sub-division', u'Lower layer super output area',
       u'Rural/urban', u'Region', u'Altitude', u'London zone', u'LSOA Code',
       u'Local authority', u'MSOA Code', u'Middle layer super output area',
       u'Parish Code', u'Census output area', u'Constituency Code',
       u'Index of Multiple Deprivation', u'Quality', u'User Type',
       u'Last updated', u'Nearest station', u'Distance to station',
       u'Postcode area', u'Postcode district'],
      dtype='object')

In [3]:
London.head()

Unnamed: 0,Postcode,In Use?,Latitude,Longitude,Easting,Northing,Grid Ref,County,District,Ward,District Code,Ward Code,Country,County Code,Constituency,Introduced,Terminated,Parish,National Park,Population,Households,Built up area,Built up sub-division,Lower layer super output area,Rural/urban,Region,Altitude,London zone,LSOA Code,Local authority,MSOA Code,Middle layer super output area,Parish Code,Census output area,Constituency Code,Index of Multiple Deprivation,Quality,User Type,Last updated,Nearest station,Distance to station,Postcode area,Postcode district
0,BR1 1AA,Yes,51.401546,0.015415,540291,168873,TQ402688,Greater London,Bromley,Bromley Town,E09000006,E05000109,England,E11000009,Bromley and Chislehurst,2016-05-01,,"Bromley, unparished area",,,,Greater London,Bromley,Bromley 018B,Urban major conurbation,London,71,5,E01000675,,E02000144,Bromley 018,E43000196,E00003264,E14000604,20532,1,0,2018-11-15,Bromley South,0.218257,BR,BR1
1,BR1 1AB,Yes,51.406333,0.015208,540262,169405,TQ402694,Greater London,Bromley,Bromley Town,E09000006,E05000109,England,E11000009,Bromley and Chislehurst,2012-03-01,,"Bromley, unparished area",,,,Greater London,Bromley,Bromley 008B,Urban major conurbation,London,71,4,E01000676,,E02000134,Bromley 008,E43000196,E00003255,E14000604,10169,1,0,2018-11-15,Bromley North,0.253666,BR,BR1
2,BR1 1AD,No,51.400057,0.016715,540386,168710,TQ403687,Greater London,Bromley,Bromley Town,E09000006,E05000109,England,E11000009,Bromley and Chislehurst,2014-09-01,2017-09-01,"Bromley, unparished area",,,,Greater London,Bromley,Bromley 018B,Urban major conurbation,London,53,5,E01000675,,E02000144,Bromley 018,E43000196,E00003264,E14000604,20532,1,1,2018-11-15,Bromley South,0.044559,BR,BR1
3,BR1 1AE,Yes,51.404543,0.014195,540197,169204,TQ401692,Greater London,Bromley,Bromley Town,E09000006,E05000109,England,E11000009,Bromley and Chislehurst,2008-08-01,,"Bromley, unparished area",,34.0,21.0,Greater London,Bromley,Bromley 018C,Urban major conurbation,London,71,4,E01000677,,E02000144,Bromley 018,E43000196,E00003266,E14000604,19350,1,0,2018-11-15,Bromley North,0.462939,BR,BR1
4,BR1 1AF,Yes,51.401392,0.014948,540259,168855,TQ402688,Greater London,Bromley,Bromley Town,E09000006,E05000109,England,E11000009,Bromley and Chislehurst,2015-05-01,,"Bromley, unparished area",,,,Greater London,Bromley,Bromley 018B,Urban major conurbation,London,58,5,E01000675,,E02000144,Bromley 018,E43000196,E00003264,E14000604,20532,1,0,2018-11-15,Bromley South,0.227664,BR,BR1


We select the columns we need and group the result by *District* and *Ward*, averaging the postcode coordinates within each ward.

In [4]:
London = London[['District','Ward','Latitude','Longitude']]
London = London.groupby(['District','Ward']).mean().reset_index()
London.shape

(657, 4)
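
The grouping step above can be illustrated on a few rows. This is a minimal sketch with made-up coordinates that reuses the column names of the postcode file. The filter on `In Use?` is an assumption added for illustration (the notebook averages all postcodes, including terminated ones), and the names `postcodes`, `active` and `centroids` are hypothetical.

```python
import pandas as pd

# Toy rows using the same column names as the postcode file; values are made up
postcodes = pd.DataFrame({
    'District':  ['Bromley', 'Bromley', 'Bromley', 'Barnet'],
    'Ward':      ['Bromley Town', 'Bromley Town', 'Bromley Town', 'Hale'],
    'In Use?':   ['Yes', 'Yes', 'No', 'Yes'],
    'Latitude':  [51.40, 51.42, 51.50, 51.62],
    'Longitude': [0.01, 0.03, 0.10, -0.25],
})

# Optional step (an assumption, not done in the notebook): drop terminated postcodes
active = postcodes[postcodes['In Use?'] == 'Yes']

# One row per (District, Ward): the mean coordinate acts as the ward's centre
centroids = (active[['District', 'Ward', 'Latitude', 'Longitude']]
             .groupby(['District', 'Ward'])
             .mean()
             .reset_index())
print(centroids)
```

Whether terminated postcodes should be excluded depends on how much they shift the ward centres, which is worth checking on the real file before changing the pipeline.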

We rename the columns District to Borough and Ward to Neighborhood.

In [5]:
London = London.rename(columns = {'District' : 'Borough', 'Ward' : 'Neighborhood'})
London.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Barking and Dagenham,Abbey,51.539438,0.078646
1,Barking and Dagenham,Alibon,51.545854,0.150322
2,Barking and Dagenham,Becontree,51.554239,0.118916
3,Barking and Dagenham,Chadwell Heath,51.580459,0.136098
4,Barking and Dagenham,Eastbrook,51.555701,0.169529


##### Let's check these places on the map of London

In [6]:
geolocator = Nominatim(user_agent="Ldn-explorer")
location = geolocator.geocode('London, UK')
LDN_latitude = location.latitude
LDN_longitude = location.longitude
print('The geographical coordinates of London are: {}, {}'.format(LDN_latitude, LDN_longitude))

The geographical coordinates of London are: 51.5073219, -0.1276474


In [7]:
# create map of London using latitude and longitude values
map_London= folium.Map(location=[LDN_latitude, LDN_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(London['Latitude'], London['Longitude'], London['Borough'], London['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.6,
        parse_html=False).add_to(map_London)  
    
map_London

### Collect the data using Foursquare API

We will now use the Foursquare API to collect all the information needed. <br>
Let's first define our credentials.

In [7]:
#Credentials :
CLIENT_ID = 'XNMD2E4XZXZ4BIHWUA24BQ2HMP1A25MFIAT5GXITZIWMA0T4' # Foursquare ID
CLIENT_SECRET = '25KDYDWFLH2WIC0CXUSOAGGGME5A1MKFVC21V0G2USNYUAKJ' # Foursquare Secret
VERSION = '20180605' # Foursquare API version 
LIMIT = 100

Let's define a function that helps us collect venue information using the Foursquare API.

In [8]:
def getNearbyVenues(Boroughs, names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    i = 0
    for borough, name, lat, lng in zip( Boroughs, names, latitudes, longitudes):
        i = i +1
        print('{} : {} '.format(str(i),name))
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            borough,
            name, 
            lat, 
            lng,
            v['venue']['id'], 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
                  'Borough',
                  'Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude',
                  'Venue ID', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

We will use a radius of 500 meters (the default value).
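
The list comprehension in `getNearbyVenues` assumes that every response contains a `groups` entry and that every venue has at least one category; otherwise it raises an error partway through the run. Below is a minimal sketch of a more defensive parser, assuming the same `response` / `groups` / `items` layout that the function reads. The helper name `parse_explore_items` and the sample payload are hypothetical and only serve as illustration.

```python
def parse_explore_items(payload):
    """Return (id, name, lat, lng, category) tuples from an explore-style payload.

    Assumes the layout read by getNearbyVenues:
    payload['response']['groups'][0]['items'][i]['venue'].
    Venues with no category are kept with category None instead of raising.
    """
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []  # e.g. an error response or an area with no venues
    rows = []
    for item in groups[0].get('items', []):
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            venue['id'],
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Made-up payload for illustration only
sample = {
    'response': {'groups': [{'items': [
        {'venue': {'id': 'a1', 'name': 'Cafe A',
                   'location': {'lat': 51.5, 'lng': -0.1},
                   'categories': [{'name': 'Coffee Shop'}]}},
        {'venue': {'id': 'b2', 'name': 'Unlabelled Place',
                   'location': {'lat': 51.6, 'lng': -0.2},
                   'categories': []}},
    ]}]}
}
rows = parse_explore_items(sample)
print(rows)
```

Inside the loop of `getNearbyVenues`, such a helper could take the result of `requests.get(url).json()` so that one unusual response does not interrupt a run over several hundred neighborhoods.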

In [9]:
LDN_venues = getNearbyVenues(Boroughs = London['Borough'],
                                names= London['Neighborhood'],
                                latitudes= London['Latitude'],
                                longitudes=London['Longitude']
                                  )

1 : Abbey 
2 : Alibon 
3 : Becontree 
4 : Chadwell Heath 
5 : Eastbrook 
6 : Eastbury 
7 : Gascoigne 
8 : Goresbrook 
9 : Heath 
10 : Longbridge 
11 : Mayesbrook 
12 : Parsloes 
13 : River 
14 : Thames 
15 : Valence 
16 : Village 
17 : Whalebone 
18 : Brunswick Park 
19 : Burnt Oak 
20 : Childs Hill 
21 : Colindale 
22 : Coppetts 
23 : East Barnet 
24 : East Finchley 
25 : Edgware 
26 : Finchley Church End 
27 : Garden Suburb 
28 : Golders Green 
29 : Hale 
30 : Hendon 
31 : High Barnet 
32 : Mill Hill 
33 : Oakleigh 
34 : Totteridge 
35 : Underhill 
36 : West Finchley 
37 : West Hendon 
38 : Woodhouse 
39 : Barnehurst 
40 : Belvedere 
41 : Bexleyheath 
42 : Blackfen & Lamorbey 
43 : Blendon & Penhill 
44 : Crayford 
45 : Crook Log 
46 : East Wickham 
47 : Erith 
48 : Falconwood & Welling 
49 : Longlands 
50 : Northumberland Heath 
51 : Sidcup 
52 : Slade Green & Northend 
53 : St Mary's & St James 
54 : Thamesmead East 
55 : West Heath 
56 : Alperton 
57 : Barnhill 
58 : Brondesbury P

420 : Clapham Common 
421 : Clapham Town 
422 : Coldharbour 
423 : Ferndale 
424 : Gipsy Hill 
425 : Herne Hill 
426 : Knight's Hill 
427 : Larkhall 
428 : Oval 
429 : Prince's 
430 : St Leonard's 
431 : Stockwell 
432 : Streatham Hill 
433 : Streatham South 
434 : Streatham Wells 
435 : Thornton 
436 : Thurlow Park 
437 : Tulse Hill 
438 : Vassall 
439 : Bellingham 
440 : Blackheath 
441 : Brockley 
442 : Catford South 
443 : Crofton Park 
444 : Downham 
445 : Evelyn 
446 : Forest Hill 
447 : Grove Park 
448 : Ladywell 
449 : Lee Green 
450 : Lewisham Central 
451 : New Cross 
452 : Perry Vale 
453 : Rushey Green 
454 : Sydenham 
455 : Telegraph Hill 
456 : Whitefoot 
457 : Abbey 
458 : Cannon Hill 
459 : Colliers Wood 
460 : Cricket Green 
461 : Dundonald 
462 : Figge's Marsh 
463 : Graveney 
464 : Hillside 
465 : Lavender Fields 
466 : Longthornton 
467 : Lower Morden 
468 : Merton Park 
469 : Pollards Hill 
470 : Ravensbury 
471 : Raynes Park 
472 : St Helier 
473 : Trinity 
474 : 

In [10]:
LDN_venues.to_csv('data_LDN.csv', encoding='utf-8')

In [28]:
LDN_venues = pd.read_csv('data_LDN.csv',index_col = 0)

##### Let's see what the data collected looks like

In [29]:
LDN_venues.head()

Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue ID,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barking and Dagenham,Abbey,51.539438,0.078646,4d1235522e5837045532e2d1,Nando's,51.539655,0.081828,Portuguese Restaurant
1,Barking and Dagenham,Abbey,51.539438,0.078646,54f61cbc498e8bf6d9942aad,Cristina's,51.536523,0.076672,Steakhouse
2,Barking and Dagenham,Abbey,51.539438,0.078646,4bef953cea570f47c68f8fd2,Barking Abbey,51.535352,0.076054,Park
3,Barking and Dagenham,Abbey,51.539438,0.078646,5628014b498e3a8dc4613c3a,The Gym London Barking,51.536193,0.078601,Gym
4,Barking and Dagenham,Abbey,51.539438,0.078646,4bd6d06ccfa7b713581028da,Subway,51.538688,0.080788,Sandwich Place


### Top Indian Restaurants

Let's look for the Indian restaurants in this list of venues.

In [30]:
LDN_venues_Ind = LDN_venues[LDN_venues['Venue Category'] == 'Indian Restaurant'].reset_index(drop = True)

In [31]:
LDN_venues_Ind.head()

Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue ID,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barking and Dagenham,Chadwell Heath,51.580459,0.136098,4eda68c10e61d46ad8079062,Tandoori Hut,51.581113,0.138522,Indian Restaurant
1,Barnet,Coppetts,51.610247,-0.150876,4b4cebf1f964a5204bc426e3,Boulevard Club,51.614477,-0.149575,Indian Restaurant
2,Barnet,East Finchley,51.591907,-0.169203,4be6fc83d4f7c9b6af4b2720,Majjo's,51.589515,-0.163581,Indian Restaurant
3,Barnet,Finchley Church End,51.598672,-0.198558,4b2fdba4f964a52061f124e3,Sun and Sand,51.599815,-0.196658,Indian Restaurant
4,Barnet,Hendon,51.588959,-0.222478,4b4a5996f964a520d38426e3,Original Lahore Restaurant,51.58757,-0.22092,Indian Restaurant


In [32]:
CLIENT_ID = 'SSZNCBOH4JGHQGQJSSGZXXU4F3ADFNM33LI4T0OSAL20XTXA' # your Foursquare ID
CLIENT_SECRET = 'EM45KNIUAGG30OARQIGAGTF3OG41BBNKBGSBO0QHZRKRUWS2' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

The *getNearbyVenues* function doesn't return the rating of these restaurants, so we need to use their IDs to get more information about them. Let's define the function *get_rating*, which uses the Foursquare API to collect the rating of each restaurant and adds it to the DataFrame.

In [33]:
def get_rating(V):
    rating_list=[]
    i = 0
    for v_id in V['Venue ID']:
        # create the API request URL
        i = i+1
        if i%20 == 0 :
            print(i)
        url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(v_id,CLIENT_ID,CLIENT_SECRET,VERSION)
        # make the GET request
        results = requests.get(url).json()
        if ('venue' in results['response'].keys()) and ('rating' in results['response']['venue'].keys()):
            rating_list.append(results['response']['venue']['rating'])
        else :
            rating_list.append('None')
    V['Rating'] = rating_list
    return V
LDN_venues_Ind = get_rating(LDN_venues_Ind)

20
40
60
80
100
120
140
160
180
200
220
240
260
280
300
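
The membership test inside `get_rating` can be isolated into a small helper, which also makes it easy to store a real missing value rather than the string `'None'`. This is a sketch under the assumption that the venue-details payload has the `response` / `venue` / `rating` layout used above; `extract_rating` is a hypothetical helper name and the payloads are made up.

```python
def extract_rating(payload):
    """Return the rating as a float, or None when the payload has no rating."""
    venue = payload.get('response', {}).get('venue', {})
    rating = venue.get('rating')
    return float(rating) if rating is not None else None


# Made-up payloads for illustration only
with_rating = {'response': {'venue': {'id': 'x', 'rating': 7.5}}}
without_rating = {'response': {'venue': {'id': 'y'}}}
error_response = {'meta': {'code': 429}, 'response': {}}

ratings = [extract_rating(p) for p in (with_rating, without_rating, error_response)]
print(ratings)  # [7.5, None, None]
```

If the column were built from such a list, pandas would store the missing entries as `NaN`, and the filtering step would then use `dropna(subset=['Rating'])` instead of a comparison with `'None'`.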


In [34]:
LDN_venues_Ind

Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue ID,Venue,Venue Latitude,Venue Longitude,Venue Category,Rating
0,Barking and Dagenham,Chadwell Heath,51.580459,0.136098,4eda68c10e61d46ad8079062,Tandoori Hut,51.581113,0.138522,Indian Restaurant,
1,Barnet,Coppetts,51.610247,-0.150876,4b4cebf1f964a5204bc426e3,Boulevard Club,51.614477,-0.149575,Indian Restaurant,
2,Barnet,East Finchley,51.591907,-0.169203,4be6fc83d4f7c9b6af4b2720,Majjo's,51.589515,-0.163581,Indian Restaurant,7.5
3,Barnet,Finchley Church End,51.598672,-0.198558,4b2fdba4f964a52061f124e3,Sun and Sand,51.599815,-0.196658,Indian Restaurant,7.5
4,Barnet,Hendon,51.588959,-0.222478,4b4a5996f964a520d38426e3,Original Lahore Restaurant,51.58757,-0.22092,Indian Restaurant,
5,Barnet,West Finchley,51.60797,-0.18569,4e2b2ab718a80bb058565445,Rani,51.604449,-0.188331,Indian Restaurant,6.2
6,Barnet,West Finchley,51.60797,-0.18569,4c9e477a8afca0934947ff15,Meera's Xpress,51.604916,-0.182462,Indian Restaurant,
7,Barnet,West Finchley,51.60797,-0.18569,4ec55c42775b7dea87be4da1,Guru Indian Restaurant,51.604345,-0.188227,Indian Restaurant,
8,Brent,Alperton,51.540182,-0.293987,4bf67c50d4cdb713d15b84fe,Maru Bhajias,51.543873,-0.2972,Indian Restaurant,7.8
9,Brent,Alperton,51.540182,-0.293987,4b5a0519f964a5202ba828e3,The Clay Oven,51.54051,-0.29865,Indian Restaurant,


We filter our result by choosing just the Indian restaurants that have a rating.

In [35]:
LDN_venues_Ind = LDN_venues_Ind[LDN_venues_Ind['Rating'] != 'None'].reset_index(drop = True)

Let's take a look at the ratings of the Indian restaurants in our data:

In [36]:
LDN_venues_Ind['Rating']

0      7.5
1      7.5
2      6.2
3      7.8
4      6.8
5      7.1
6      5.6
7      7.7
8      7.1
9      7.6
10     7.9
11     7.7
12     7.5
13     6.8
14     6.1
15     7.7
16     7.3
17     7.9
18     7.1
19     9.5
20       8
21     7.3
22     5.9
23     8.1
24     7.9
25     8.1
26     9.2
27     7.9
28     7.6
29     7.9
30     7.6
31     7.8
32     7.9
33     7.6
34     7.9
35     7.6
36     7.5
37     7.9
38     7.6
39     7.8
40     7.8
41     7.5
42     7.5
43     8.1
44     8.1
45     9.2
46     7.5
47     7.6
48     7.2
49     8.1
50     7.9
51     7.6
52     8.1
53     7.6
54     7.9
55     7.6
56     7.9
57     7.6
58     6.1
59     7.8
60     7.6
61     6.4
62     7.6
63     7.4
64     9.6
65     8.1
66     5.9
67     6.1
68     7.4
69     7.7
70     6.7
71     6.5
72     8.2
73     7.7
74       7
75     8.2
76     8.4
77     7.7
78     7.7
79     6.6
80       7
81     7.1
82     7.1
83     7.1
84       7
85     7.6
86     6.5
87     7.4
88     7.4
89     7.5
90     6.6

In [37]:
LDN_venues_Ind.shape

(222, 10)

In [38]:
LDN_venues_Ind

Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue ID,Venue,Venue Latitude,Venue Longitude,Venue Category,Rating
0,Barnet,East Finchley,51.591907,-0.169203,4be6fc83d4f7c9b6af4b2720,Majjo's,51.589515,-0.163581,Indian Restaurant,7.5
1,Barnet,Finchley Church End,51.598672,-0.198558,4b2fdba4f964a52061f124e3,Sun and Sand,51.599815,-0.196658,Indian Restaurant,7.5
2,Barnet,West Finchley,51.60797,-0.18569,4e2b2ab718a80bb058565445,Rani,51.604449,-0.188331,Indian Restaurant,6.2
3,Brent,Alperton,51.540182,-0.293987,4bf67c50d4cdb713d15b84fe,Maru Bhajias,51.543873,-0.2972,Indian Restaurant,7.8
4,Brent,Northwick Park,51.566576,-0.316862,4bb2e75d14cfd13ac78615ab,Mumbai Junction,51.567265,-0.321747,Indian Restaurant,6.8
5,Brent,Queens Park,51.534667,-0.213917,4be5c301cf200f4701b0133c,Curry Nights,51.531931,-0.217563,Indian Restaurant,7.1
6,Brent,Tokyngton,51.555851,-0.279472,4bca092fb6c49c7456948f91,Bobby Moore Spice,51.558537,-0.278152,Indian Restaurant,5.6
7,Brent,Wembley Central,51.551152,-0.296438,4cb0c630db32f04d8bd6c24d,Palm Beach,51.551351,-0.298647,Indian Restaurant,7.7
8,Brent,Wembley Central,51.551152,-0.296438,4fbd3bede4b0d45a095fde4a,Saravana Bhavan,51.551316,-0.298753,Indian Restaurant,7.1
9,Brent,Willesden Green,51.547607,-0.232489,4bb870b598c7ef3b2c193102,Kadiris,51.54751,-0.226541,Indian Restaurant,7.6


##### Now we come to an interesting criterion: how do we define a good restaurant? Our answer is to choose a rating threshold and keep only the restaurants at or above it.
##### Our threshold will be 9/10, as we suppose the investor wants to open a very good restaurant and has the money to equip it to the standard of these top restaurants.

In [39]:
LDN_venues_Ind['Rating'] = LDN_venues_Ind['Rating'].astype(float)
Top_indian_restaurant = LDN_venues_Ind[LDN_venues_Ind['Rating'] >= 9.0][['Borough','Neighborhood','Venue ID','Rating']].reset_index(drop=True).groupby(['Borough','Neighborhood','Venue ID']).mean().reset_index()

In [40]:
Latitudes = []
Longitudes = []
for borough, neighborhood in zip(Top_indian_restaurant['Borough'],Top_indian_restaurant['Neighborhood']):
    df = London[London['Borough'] == borough]
    df = df[df['Neighborhood'] == neighborhood]
    Latitudes.append(df['Latitude'].values[0])
    Longitudes.append(df['Longitude'].values[0])


##### Now let's see the list of the best restaurants with their boroughs, neighborhoods, neighborhood latitude and longitude, and rating

In [41]:
Top_indian_restaurant['Latitude'] = Latitudes
Top_indian_restaurant['Longitude'] = Longitudes
Top_indian_restaurant

Unnamed: 0,Borough,Neighborhood,Venue ID,Rating,Latitude,Longitude
0,Camden,St Pancras and Somers Town,53cd09ab498e5a7d7b19b4c1,9.5,51.534992,-0.131719
1,City of London,Bishopsgate,56465da7498ec6f39fb6dee5,9.2,51.517772,-0.080899
2,City of London,Portsoken,56465da7498ec6f39fb6dee5,9.2,51.514568,-0.075636
3,Hackney,Hoxton East & Shoreditch,5040721ae4b01446aa41d438,9.6,51.527112,-0.08156
4,Kensington and Chelsea,Campden,5a08ab31dec1d60b246b1ad9,9.1,51.504697,-0.195304
5,Kensington and Chelsea,Queen's Gate,54bd8243498ecd38da686040,9.1,51.498521,-0.185496
6,Kensington and Chelsea,Queen's Gate,5a08ab31dec1d60b246b1ad9,9.1,51.498521,-0.185496
7,Lewisham,Crofton Park,4ac518e0f964a52030aa20e3,9.0,51.448641,-0.039665
8,Southwark,Rye Lane,4ae2be59f964a520658f21e3,9.1,51.469996,-0.070042
9,Tower Hamlets,Spitalfields & Banglatown,56465da7498ec6f39fb6dee5,9.2,51.519117,-0.071417


### Machine Learning : KNN

###### Let's get back to the data stored in our DataFrame LDN_venues and encode the category of every venue in one-hot format

In [42]:
# one hot encoding
LDN_onehot = pd.get_dummies(LDN_venues[['Venue Category']], prefix="", prefix_sep="")
# reset the index (the Borough and Neighborhood columns are added back in a later cell)
LDN_onehot = LDN_onehot.reset_index(drop=True)
LDN_onehot.head()

Unnamed: 0,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bar,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Betting Shop,Bike Rental / Bike Share,Bike Shop,Bistro,Boarding House,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Bulgarian Restaurant,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Canal,Canal Lock,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Carpet Store,Casino,Castle,Caucasian Restaurant,Cemetery,Chaat Place,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Auditorium,College Cafeteria,College Gym,College Quad,College Residence Hall,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cricket Ground,Cuban Restaurant,Cupcake Shop,Currency Exchange,Currywurst Joint,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Dive Shop,Doctor's Office,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,Forest,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grilled Meat Restaurant,Grocery Store,Gun Shop,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Herbs & Spices Store,Hill,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hunan Restaurant,Hungarian Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Intersection,Iraqi Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Malay Restaurant,Mamak Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Okonomiyaki Restaurant,Opera House,Optical Shop,Organic Grocery,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Outlet Mall,Outlet Store,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Physical Therapist,Piano Bar,Pie Shop,Pier,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Print Shop,Pub,Public Art,Racetrack,Radio Station,Rafting,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Reservoir,Residential Building (Apartment / Condo),Rest Area,Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Rugby Pitch,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Scottish Restaurant,Sculpture Garden,Seafood Restaurant,Shaanxi Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stables,Stadium,Stationery Store,Steakhouse,Street Art,Street Food Gathering,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Taxi Stand,Tea Room,Temple,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Toll Booth,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Tram Station,Travel & Transport,Tree,Tunnel,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vape Store,Vegetarian / Vegan Restaurant,Veneto Restaurant,Video Game Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waterfront,Whisky Bar,Windmill,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yakitori Restaurant,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [43]:
LDN_onehot['Borough'] = LDN_venues['Borough'] 
LDN_onehot['Neighborhood'] = LDN_venues['Neighborhood']

#### Important criterion

We group the data by borough and neighborhood, taking the mean of each one-hot column. We then remove from the candidate set every borough where one of the best Indian restaurants is located. We chose this criterion because we want to stay away from the competition of these good restaurants.

In [44]:
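# Average the one-hot venue indicators per (Borough, Neighborhood)
X = LDN_onehot.groupby(['Borough', 'Neighborhood']).mean().reset_index()

# S is True for every row that is NOT the neighborhood of a top Indian restaurant
S = pd.Series(True, index=X.index)
for borough, neighborhood in zip(Top_indian_restaurant['Borough'], Top_indian_restaurant['Neighborhood']):
    S = S & ~((X['Neighborhood'] == neighborhood) & (X['Borough'] == borough))

# Candidate places: drop the top neighborhoods, then every borough that hosts one of them
Y1 = X[S]
Y1 = Y1[~Y1['Borough'].isin(Top_indian_restaurant['Borough'])]

# Reference places: the neighborhoods of the top Indian restaurants
Y2 = X[~S]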

In [45]:
Y11 = Y1.drop(['Borough', 'Neighborhood'], axis = 1).values
Y21 = Y2.drop(['Borough', 'Neighborhood'], axis = 1).values
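
To make the grouping and filtering step concrete, here is a minimal sketch on made-up data. The frames `toy_onehot` and `toy_top` and all their values are assumptions for illustration only, not the project's data. It shows that averaging one-hot columns gives the share of each venue category per neighborhood, and how the reference set and the candidate set are separated.

```python
import pandas as pd

# Hypothetical one-hot venue table: one row per venue
toy_onehot = pd.DataFrame({
    'Borough':      ['A', 'A', 'A', 'B', 'B', 'C'],
    'Neighborhood': ['a1', 'a1', 'a2', 'b1', 'b1', 'c1'],
    'Cafe':         [1, 0, 1, 0, 0, 1],
    'Pub':          [0, 1, 0, 1, 1, 0],
})

# Hypothetical neighborhood hosting a top-rated restaurant
toy_top = pd.DataFrame({'Borough': ['A'], 'Neighborhood': ['a1']})

# Mean of the one-hot columns = frequency of each category per neighborhood
toy_X = toy_onehot.groupby(['Borough', 'Neighborhood']).mean().reset_index()

# Flag rows whose (Borough, Neighborhood) pair appears in toy_top
pairs = pd.MultiIndex.from_frame(toy_X[['Borough', 'Neighborhood']])
top_pairs = pd.MultiIndex.from_frame(toy_top)
is_top = pairs.isin(top_pairs)

# Reference set (plays the role of Y2) and candidate set (plays the role of Y1)
toy_reference = toy_X[is_top]
toy_candidates = toy_X[~toy_X['Borough'].isin(toy_top['Borough'])]
```

In this toy case neighborhood `a2` is excluded from the candidates even though it has no top restaurant, because it shares borough `A` with one.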

## KNN

We will apply the K-NN idea *manually* to find the K places most similar to those identified as good locations for Indian restaurants. First we compute the distances between the two arrays: Y21 (the places of the best Indian restaurants) and Y11 (the remaining candidate places).

In [46]:
from scipy.spatial import distance
Distances = distance.cdist(Y21, Y11, 'euclidean')

In [47]:
Distances

array([[0.26918272, 0.46067648, 0.45046698, ..., 0.46067648, 0.48044508,
        0.60905152],
       [0.27326493, 0.45343136, 0.46647615, ..., 0.4621688 , 0.47074409,
        0.6329297 ],
       [0.27609825, 0.45628938, 0.48600412, ..., 0.46497312, 0.47770284,
        0.64436015],
       ...,
       [0.27434473, 0.44362146, 0.4698936 , ..., 0.45254834, 0.46561787,
        0.63150614],
       [0.31711304, 0.49132474, 0.48311489, ..., 0.49132474, 0.49132474,
        0.6483826 ],
       [0.30549288, 0.47391982, 0.46968074, ..., 0.47812132, 0.46968074,
        0.63529521]])
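
As a small illustration of how to read this matrix, the sketch below runs `cdist` on made-up frequency vectors (the arrays `reference` and `candidates` are assumptions, standing in for Y21 and Y11). Each row of the result corresponds to one reference place and each column to one candidate place.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical venue-frequency vectors (rows = places, columns = venue categories)
reference = np.array([[0.5, 0.5],
                      [1.0, 0.0]])          # stands in for Y21
candidates = np.array([[0.0, 1.0],
                       [1.0, 0.0],
                       [0.6, 0.4]])         # stands in for Y11

# Pairwise Euclidean distances, shape (n_reference, n_candidates)
toy_dist = distance.cdist(reference, candidates, 'euclidean')

# Index of the closest candidate for each reference place
closest = toy_dist.argmin(axis=1)
```

A distance of 0 means the candidate has exactly the same venue profile as the reference place; smaller values mean more similar neighborhoods.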

We choose K = 3 to get, for each place that hosts a top Indian restaurant, the 3 most similar places among the candidates.
Let's see what the result looks like:

In [55]:
m,n = Distances.shape
venues_list = []
K = 3
for i in range(m) :
    indexes = np.argsort(Distances[i])
    for j in range(K):
        venue = Y1.iloc[indexes[j]]
        venue_details = London[(London['Borough']== venue['Borough']) & (London['Neighborhood']== venue['Neighborhood'])].reset_index(drop=True)
        venues_list.append([venue_details.loc[0,'Borough'],
            venue_details.loc[0,'Neighborhood'],
            venue_details.loc[0,'Latitude'], 
            venue_details.loc[0,'Longitude']
                       ])
best_places = pd.DataFrame(venues_list,columns=['Borough','Neighborhood','Latitude','Longitude'])
best_places.head(18)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Lambeth,Clapham Town,51.464214,-0.140251
1,Lambeth,Coldharbour,51.463543,-0.109225
2,Islington,St Peter's,51.535218,-0.098018
3,Islington,Bunhill,51.524003,-0.093838
4,Hammersmith and Fulham,Hammersmith Broadway,51.493197,-0.226935
5,Lambeth,Coldharbour,51.463543,-0.109225
6,Hammersmith and Fulham,Hammersmith Broadway,51.493197,-0.226935
7,Wandsworth,Thamesfield,51.464483,-0.216192
8,Islington,Bunhill,51.524003,-0.093838
9,Islington,Clerkenwell,51.524617,-0.110779


For each of the best Indian restaurants, we found the 3 places most similar to its location. We had 6 restaurants, so the resulting DataFrame has 18 = 6 * 3 rows. The same neighborhood can appear more than once when it is close to several of the reference places.
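
Since a neighborhood can be suggested for several restaurants, it may help to count how often each one appears. The following is a minimal sketch on a made-up distance matrix (`toy_dist` and `toy_names` are hypothetical, not the project's values); it selects the K nearest candidates per row in one vectorized step and tallies the repeats.

```python
import numpy as np
import pandas as pd

# Hypothetical distance matrix: 3 reference places x 4 candidate neighborhoods
toy_dist = np.array([
    [0.2, 0.5, 0.1, 0.9],
    [0.3, 0.1, 0.8, 0.7],
    [0.6, 0.2, 0.5, 0.3],
])
toy_names = np.array(['n0', 'n1', 'n2', 'n3'])  # hypothetical candidate labels
K = 2

# Column indices of the K smallest distances in each row
nearest = np.argsort(toy_dist, axis=1)[:, :K]

# One suggestion per (reference place, rank) pair, then count the repeats
suggestions = pd.Series(toy_names[nearest].ravel())
counts = suggestions.value_counts()
```

Candidates with a higher count are similar to several top-restaurant locations at once, which could be one way to rank the suggestions.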

### Results

##### Let's look at these places on the map:

In [56]:
# create map of London using latitude and longitude values
map_best_place = folium.Map(location=[LDN_latitude, LDN_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(best_places['Latitude'], best_places['Longitude'], best_places['Borough'], best_places['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.6,
        parse_html=False).add_to(map_best_place)  
    
map_best_place

##### As you can see, we found 18 suggestions (6 restaurants, 3 similar places each) that we can offer to an investor who wants to open a new Indian restaurant in a neighborhood similar to those where the best-rated ones are located.

It is now up to the investor to do further analysis and choose the best location among these suggestions.

#### NIZAR 