<h2>Capstone Project - The Battle of Neighborhoods </h2> <br>
*by Stephan Bartelheim*

**1 Introduction**  
Düsseldorf is home to the largest Japanese community in Germany. Apart from a large number of subsidiaries of major Japanese companies and banks, there is also a plethora of Japanese shops and restaurants. Our client, a recent arrival from Japan, wishes to open another restaurant in the city. Since he is unfamiliar with the place, he needs advice on where to open and what kind of food to offer. We therefore try to identify areas in which Japanese restaurants tend to thrive and then pick one where the competition is particularly weak. Lastly, we will choose a category of Japanese food (e.g. ramen, sushi, general) that is generally popular but again faces weak competition in our selected area. In the end, the client will be left with a very specific recommendation.

**2 Data**  
The area data can be accessed at https://www.dasoertliche.de/Themen/Postleitzahlen/D%C3%BCsseldorf.html. It comes from the German phone register provided by *Deutsche Telekom AG* and is free to use. Unlike in the previous exercises, we divide the city by postal codes and not by neighborhoods, since that is how the data is structured. We use the *geocoder* package to add coordinates and the FourSquare API *explore* call to retrieve data about the composition of the areas. This gives us a list of venues listed on FourSquare. Apart from businesses, these can for example be tourist sites or public transport stops. Lastly, details about the competing venues in the chosen area are retrieved from the FourSquare API to see how those businesses are rated by customers.

**3 Methodology**

In [61]:
%%capture
#install and import required packages

!pip install lxml beautifulsoup4 html5lib matplotlib -U
import pandas as pd
import numpy as np

import requests
import json

!pip install geocoder
import geocoder

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!pip install folium
import folium 


Requirement already up-to-date: lxml in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (4.6.2)
Requirement already up-to-date: beautifulsoup4 in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (4.9.3)
Requirement already up-to-date: html5lib in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (1.1)
Requirement already up-to-date: matplotlib in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (3.3.3)


In [2]:
# We scrape the neighborhood data from the phone register
list_of_dfs = pd.read_html('https://www.dasoertliche.de/Themen/Postleitzahlen/Duesseldorf.html')

Looking at the data, we see that there are 38 areas. There are several redundant columns, and the column labels are in German. We fix these problems and add coordinates using the geocoder package with the ArcGIS API.

In [3]:
DUS=list_of_dfs[0]
DUS

Unnamed: 0,PLZ,Ortsname,Ortsteil,Landkreis,Bundesland
0,40210,Düsseldorf,FriedrichstadtStadtmitte,Stadt Düsseldorf,Nordrhein-Westfalen
1,40211,Düsseldorf,PempelfortStadtmitte,Stadt Düsseldorf,Nordrhein-Westfalen
2,40212,Düsseldorf,FriedrichstadtStadtmitte,Stadt Düsseldorf,Nordrhein-Westfalen
3,40213,Düsseldorf,AltstadtCarlstadtFriedrichstadtPempelfortStadt...,Stadt Düsseldorf,Nordrhein-Westfalen
4,40215,Düsseldorf,Friedrichstadt,Stadt Düsseldorf,Nordrhein-Westfalen
5,40217,Düsseldorf,FriedrichstadtUnterbilk,Stadt Düsseldorf,Nordrhein-Westfalen
6,40219,Düsseldorf,HafenUnterbilk,Stadt Düsseldorf,Nordrhein-Westfalen
7,40221,Düsseldorf,BilkFleheHafenHammUnterbilkVolmerswerth,Stadt Düsseldorf,Nordrhein-Westfalen
8,40223,Düsseldorf,BilkFleheUnterbilk,Stadt Düsseldorf,Nordrhein-Westfalen
9,40225,Düsseldorf,BilkFleheOberbilkWersten,Stadt Düsseldorf,Nordrhein-Westfalen


In [4]:
# drop redundant columns and translate column names
DUS.drop(columns=['Landkreis','Bundesland','Ortsname'],inplace=True)
DUS.rename(columns={"PLZ":"Postal Code","Ortsteil":"Neighborhood"},inplace=True)
DUS.head()

In [7]:
# add coordinates for each postal code via the ArcGIS geocoder
latitude = []
longitude = []
for code in DUS['Postal Code']:
    query = '{}, Düsseldorf, Germany'.format(code)
    g = geocoder.arcgis(query)
    print(code, g.latlng)
    # the geocoder occasionally returns no result, so repeat the query until it does
    while g.latlng is None:
        g = geocoder.arcgis(query)
    latitude.append(g.latlng[0])
    longitude.append(g.latlng[1])

DUS['Latitude'], DUS['Longitude'] = latitude, longitude

40210 [51.22150000000005, 6.789191251000034]
40211 [51.22951000000006, 6.789158909000037]
40212 [51.223825317000035, 6.782230000000027]
40213 [51.224287497000034, 6.773790000000076]
40215 [51.213835000000074, 6.784225497000023]
40217 [51.21256632700005, 6.774075000000039]
40219 [51.214015000000074, 6.7626072300000715]
40221 [51.197482853000054, 6.7508150000000455]
40223 [51.19999500000006, 6.771754548000047]
40225 [51.195430000000044, 6.792440561000035]
40227 [51.21338000000003, 6.801196244000039]
40229 [51.19814000000008, 6.839687545000061]
40231 [51.21238500000004, 6.829771300000061]
40233 [51.22154500000005, 6.812466556000061]
40235 [51.23440159300003, 6.824325000000044]
40237 [51.23772500000007, 6.810427326000024]
40239 [51.24330000000003, 6.804985062000071]
40468 [51.27115500000008, 6.77094326200006]
40470 [51.25533500000006, 6.806955039000059]
40472 [51.27154000000007, 6.830336058000057]
40474 [51.26917000000003, 6.728711524000062]
40476 [51.24750000000006, 6.782821579000029]
404

Since some neighborhoods contain several postal codes, we perform the further analysis based on the unique postal codes instead of the neighborhoods as in the Toronto and Manhattan analyses. Next, we use the FourSquare API to retrieve lists of venues for our areas.
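A note on the "unique postal codes" mentioned above: the list of postal codes printed during venue retrieval shows 40489 and 40597 twice, which suggests that the scraped table may contain repeated rows. The following is a minimal sketch of how such rows could be collapsed with pandas before querying the API. The table `areas` and its neighborhood labels are invented for illustration and are not the original data.

```python
import pandas as pd

# Toy stand-in for the DUS table; the neighborhood labels are made up for illustration
areas = pd.DataFrame({
    'Postal Code': [40489, 40597, 40625, 40489, 40597],
    'Neighborhood': ['A', 'B', 'C', 'A2', 'B2'],
})

# Keep the first row per postal code so that every area is queried only once
unique_areas = areas.drop_duplicates(subset='Postal Code').reset_index(drop=True)
print(unique_areas['Postal Code'].tolist())
```

Whether to keep the first row or to merge the neighborhood labels of repeated rows is a design choice; the sketch simply keeps the first occurrence.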

In [11]:
# The code was removed by Watson Studio for sharing.
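
The hidden cell presumably defines the names `CLIENT_ID`, `CLIENT_SECRET`, `VERSION` and `LIMIT` that the function below relies on. A minimal sketch of such a cell, under stated assumptions: the environment variable names and the placeholder values are hypothetical, the version string is only an example of the `YYYYMMDD` format, and the limit of 100 is inferred from the cap of 100 venues per area visible in the counts further down.

```python
import os

# Hypothetical environment variable names; the original values were removed before sharing
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret')

# Assumed example of an API version date in YYYYMMDD format
VERSION = '20180605'

# Assumed; consistent with the maximum of 100 venues per area seen in the results
LIMIT = 100
```

Reading the credentials from the environment keeps them out of the shared notebook, which is the same goal the removed cell served.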

In [12]:
# define function to use the FourSquare API explore call
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['id'],
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code',
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue_ID',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
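
To make the nested indexing in the function above easier to follow, here is a minimal offline sketch that flattens a mock response with the same `response` / `groups` / `items` / `venue` layout. The venue names, IDs and coordinates are invented, and the helper `flatten_items` is hypothetical. Unlike the function above, it falls back to `None` when a venue has an empty category list, which is an assumption about a case the original code does not handle.

```python
# Mock of the nested structure used by the explore call (all values are invented)
mock_response = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'id': 'abc123', 'name': 'Example Ramen',
                           'location': {'lat': 51.2230, 'lng': 6.7890},
                           'categories': [{'name': 'Ramen Restaurant'}]}},
                {'venue': {'id': 'def456', 'name': 'Example Kiosk',
                           'location': {'lat': 51.2210, 'lng': 6.7880},
                           'categories': []}},
            ]
        }]
    }
}


def flatten_items(postal_code, lat, lng, response_json):
    """Turn one explore response into a list of flat row tuples."""
    rows = []
    for item in response_json['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None instead of raising IndexError on an empty category list
        category = categories[0]['name'] if categories else None
        rows.append((postal_code, lat, lng, venue['id'], venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


rows = flatten_items(40210, 51.2215, 6.789191, mock_response)
for row in rows:
    print(row)
```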

In [13]:
%%capture
# retrieve venues for postcode areas
dusseldorf_venues = getNearbyVenues(names=DUS['Postal Code'],
                                   latitudes=DUS['Latitude'],
                                   longitudes=DUS['Longitude']
                                  )

40210
40211
40212
40213
40215
40217
40219
40221
40223
40225
40227
40229
40231
40233
40235
40237
40239
40468
40470
40472
40474
40476
40477
40479
40489
40545
40547
40549
40589
40591
40593
40595
40597
40599
40625
40627
40629
40489
40597


Looking at the number of venues we were able to retrieve per postal code area, we see that there are some areas with very few venues. This could mean that there really are very few businesses in these areas, in which case it is probably not an area where a restaurant could thrive, or that we are simply lacking data, in which case we cannot say much about the characteristics of the area. Here more data would be required. For now, we exclude areas with 15 or fewer venues.

In [14]:
#numbers of venues per area
dusseldorf_venues.groupby('Postal Code').count()

Postal Code,Neighborhood Latitude,Neighborhood Longitude,Venue_ID,Venue,Venue Latitude,Venue Longitude,Venue Category
40210,100,100,100,100,100,100,100
40211,40,40,40,40,40,40,40
40212,100,100,100,100,100,100,100
40213,100,100,100,100,100,100,100
40215,65,65,65,65,65,65,65
40217,72,72,72,72,72,72,72
40219,81,81,81,81,81,81,81
40221,2,2,2,2,2,2,2
40223,21,21,21,21,21,21,21
40225,7,7,7,7,7,7,7


In [50]:
# filter out areas with sparse data
dus_venues_red=dusseldorf_venues.groupby('Postal Code').filter(lambda x:len(x)>15)

(904, 8)

Next we see that there are 904 venues left in total, coming from 174 categories. This would later result in a model with very high variance. We therefore group similar categories together and remove the remaining ones that appear fewer than 3 times. In the process, we lose only 31 rows and reduce the number of categories to 14.

In [None]:
dus_venues_red.shape

In [62]:
# number of venue categories
len(dus_venues_red['Venue Category'].unique())

145

In [16]:
# list of venue categories
dus_venues_red['Venue Category'].unique()

array(['Grocery Store', 'Japanese Restaurant', 'Ramen Restaurant',
       'Korean Restaurant', 'Italian Restaurant', 'Arts & Crafts Store',
       'Souvlaki Shop', 'Ice Cream Shop', 'Chinese Restaurant',
       'Pastry Shop', 'Café', 'Cocktail Bar', 'Hotel', 'Burger Joint',
       'Vegetarian / Vegan Restaurant', 'Bar', 'Sushi Restaurant',
       'Brewery', 'Bakery', 'Ethiopian Restaurant', 'Thai Restaurant',
       'Turkish Restaurant', 'Bubble Tea Shop', 'Breakfast Spot',
       'Greek Restaurant', 'Frozen Yogurt Shop', 'Coffee Shop',
       'Indie Movie Theater', 'Theater', 'Soba Restaurant', 'Bookstore',
       'North Indian Restaurant', 'Seafood Restaurant', 'Cigkofte Place',
       'Doner Restaurant', 'Salad Place', 'Bistro',
       'General Entertainment', 'Fast Food Restaurant', 'Pizza Place',
       'Pharmacy', 'Donut Shop', 'Drugstore', 'Advertising Agency',
       'Mexican Restaurant', 'Middle Eastern Restaurant',
       'Trattoria/Osteria', 'Sporting Goods Shop', 'Shoe Stor

In [53]:
# duplicate category column for simplification
dus_venues_red['Category simple']=dus_venues_red['Venue Category']

In [54]:
# group raw categories into simplified ones; the Japanese rule runs first so that
# 'Japanese Restaurant' is not swallowed by the generic restaurant rule below
simple = dus_venues_red['Category simple']
simple = simple.replace(regex=['^Ramen.*','^Sushi.*','^Japanese.*','^Soba.*'], value='Japanese')
simple = simple.replace(regex=['.*[Ss]hop.*','.*[Ss]tore.*','.*Boutique.*'], value='Shopping')
simple = simple.replace(regex=['.*Restaurant.*','^Pizza.*','^Burger.*','^Bistro.*','^Steak.*','^Trattoria.*','^BBQ.*','^Deli.*','^Sandwich.*','^Breakfast.*','^Soup.*'], value='Restaurant other')
simple = simple.replace(regex=['.*Bar.*','.*[Pp]ub.*','^Taverna.*','^Nightclub','^Brewery.*','^Rock.*'], value='Drinking Place')
simple = simple.replace(regex=['.*[Gg]ym.*','.*[Ss]occer.*','.*[Ss]port.*','.*Yoga.*','.*Hockey.*'], value='Sports Venue')
simple = simple.replace(regex=['.*[Mm]arket.*','.*[Gg]rocer.*'], value='Groceries')
simple = simple.replace(regex=['.*[Mm]useum.*','.*[Tt]heater.*','.*[Gg]allery.*','.*Site.*','^Opera.*'], value='Culture')
simple = simple.replace(regex=['.*[Ss]top.*','.*[Ss]tation.*','[Pp]latform.*'], value='Public Transport')
simple = simple.replace(regex=['.*Hostel.*'], value='Hotel')
simple = simple.replace(regex=['.*[Pp]ark.*','.*Playground.*','.*[Pp]laza.*','^Fountain.*'], value='Recreation')
simple = simple.replace(regex=['.*[Aa]gency.*','.*[Ss]ervice.*'], value='Business Services')
# assign the result back instead of relying on in-place edits of a column selection
dus_venues_red['Category simple'] = simple
len(dus_venues_red['Category simple'].unique())

43
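
The order of the grouping rules matters, because a name such as 'Ramen Restaurant' matches both the Japanese patterns and the generic restaurant pattern. The following is a minimal sketch of that idea using only the standard `re` module. It is a simplified first-match-wins variant with a subset of the rules, not the sequential `replace` calls used in the notebook, and the helper `simplify` is hypothetical.

```python
import re

# Ordered (pattern, label) rules; only a subset of the notebook's rules, for illustration
RULES = [
    (r'^Ramen.*|^Sushi.*|^Japanese.*|^Soba.*', 'Japanese'),
    (r'.*[Ss]hop.*|.*[Ss]tore.*|.*Boutique.*', 'Shopping'),
    (r'.*Restaurant.*|^Pizza.*|^Burger.*', 'Restaurant other'),
]


def simplify(category):
    """Return the label of the first rule that matches, else the category unchanged."""
    for pattern, label in RULES:
        if re.search(pattern, category):
            return label
    return category


for name in ['Ramen Restaurant', 'Italian Restaurant', 'Ice Cream Shop', 'Hotel']:
    print(name, '->', simplify(name))
```

Note that the shop pattern also captures food venues such as 'Ice Cream Shop', so the 'Shopping' group is broader than its name suggests.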

In [55]:
# filter out venues from rare categories
dus_venues_red=dus_venues_red.groupby('Category simple').filter(lambda x:len(x)>2)

In [56]:
dus_venues_red.shape

(873, 9)

In [21]:
# frequency of new categories
dus_venues_red.groupby('Category simple')['Venue'].count()

Category simple
Bakery                27
Bank                   6
Business Services      6
Café                  52
Culture               18
Drinking Place        78
Groceries             28
Hotel                 57
Japanese              46
Public Transport      26
Recreation            35
Restaurant other     305
Shopping             174
Sports Venue          15
Name: Venue, dtype: int64

We next create dummy variables from the simplified categories, calculate the mean value per area and perform k-means clustering. We do not, however, use the 'Japanese' dummy, since we want to know what characterises areas with many Japanese restaurants apart from the fact that there are many Japanese restaurants. Working with 7 clusters gives the nicest result here, with relatively balanced clusters and a geographical pattern in which areas on the same side of the city tend to fall into the same cluster, as do central and outlying areas.

In [23]:
# creating dummies for clustering

# one hot encoding
dus_onehot = pd.get_dummies(dus_venues_red[['Category simple']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dus_onehot.insert(loc=0,column='Postal Code', value=dus_venues_red['Postal Code'])

dus_grouped = dus_onehot.groupby('Postal Code').mean().reset_index()
dus_grouped

Unnamed: 0,Postal Code,Bakery,Bank,Business Services,Café,Culture,Drinking Place,Groceries,Hotel,Japanese,Public Transport,Recreation,Restaurant other,Shopping,Sports Venue
0,40210,0.020833,0.0,0.0,0.052083,0.041667,0.041667,0.0,0.09375,0.229167,0.0,0.0,0.354167,0.166667,0.0
1,40211,0.025641,0.0,0.025641,0.076923,0.025641,0.076923,0.0,0.153846,0.051282,0.025641,0.025641,0.410256,0.076923,0.025641
2,40212,0.020408,0.0,0.0,0.05102,0.020408,0.030612,0.0,0.071429,0.163265,0.0,0.030612,0.173469,0.438776,0.0
3,40213,0.020833,0.0,0.0,0.0625,0.052083,0.197917,0.020833,0.03125,0.0,0.0,0.072917,0.291667,0.25,0.0
4,40215,0.0625,0.0,0.046875,0.109375,0.015625,0.125,0.0,0.171875,0.015625,0.046875,0.015625,0.28125,0.109375,0.0
5,40217,0.014493,0.0,0.0,0.057971,0.014493,0.086957,0.057971,0.043478,0.014493,0.014493,0.072464,0.391304,0.202899,0.028986
6,40219,0.025974,0.012987,0.0,0.103896,0.0,0.077922,0.012987,0.0,0.012987,0.051948,0.051948,0.506494,0.142857,0.0
7,40223,0.1,0.0,0.0,0.0,0.05,0.0,0.15,0.05,0.0,0.0,0.05,0.15,0.35,0.1
8,40227,0.035714,0.0,0.0,0.0,0.0,0.107143,0.035714,0.107143,0.0,0.071429,0.0,0.357143,0.25,0.035714
9,40233,0.0,0.0,0.0,0.0,0.0,0.083333,0.041667,0.083333,0.0,0.0,0.041667,0.416667,0.25,0.083333
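
The values in the table above are shares of venues per category, because the mean of a 0/1 indicator column equals the fraction of rows where it is 1. A minimal sketch on invented toy data illustrates this; the venue list below is not taken from the original data.

```python
import pandas as pd

# Toy venue list (invented) with two postal code areas
venues = pd.DataFrame({
    'Postal Code': [40210, 40210, 40210, 40210, 40215, 40215],
    'Category simple': ['Japanese', 'Japanese', 'Hotel', 'Café', 'Hotel', 'Café'],
})

# One 0/1 indicator column per category
onehot = pd.get_dummies(venues['Category simple'], dtype=float)
onehot.insert(0, 'Postal Code', venues['Postal Code'])

# The mean of a 0/1 indicator is the share of venues in that category
shares = onehot.groupby('Postal Code').mean()
print(shares)
```

In the toy data, half of the venues in 40210 are Japanese, so the corresponding entry is 0.5, while 40215 has none and gets 0.0.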


In [24]:
# set number of clusters
kclusters = 7

dus_grouped_clustering = dus_grouped.drop(columns=['Postal Code','Japanese'])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dus_grouped_clustering)
kmeans.labels_[0:16]

array([0, 0, 2, 5, 0, 6, 4, 2, 6, 6, 1, 4, 4, 4, 5, 3], dtype=int32)

In [25]:
# add clustering labels
dus_grouped.insert(0, 'Cluster Labels', kmeans.labels_)

dus_merged = DUS

# merge labels on original dataframe
dus_merged = dus_merged.join(dus_grouped.set_index('Postal Code'), on='Postal Code',how='inner')

In [26]:
# create map
map_clusters = folium.Map(location=[51.2277, 6.7735], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

rainbow

['#8000ff', '#2c7ef7', '#2adddd', '#80ffb4', '#d4dd80', '#ff7e41', '#ff0000']

In [27]:
# add one marker per postal code area, colored by its cluster label
for lat, lon, poi, cluster in zip(dus_merged['Latitude'], dus_merged['Longitude'], dus_merged['Postal Code'], dus_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [28]:
dus_merged

Unnamed: 0,Postal Code,Neighborhood,Latitude,Longitude,Cluster Labels,Bakery,Bank,Business Services,Café,Culture,Drinking Place,Groceries,Hotel,Japanese,Public Transport,Recreation,Restaurant other,Shopping,Sports Venue
0,40210,FriedrichstadtStadtmitte,51.2215,6.789191,0,0.020833,0.0,0.0,0.052083,0.041667,0.041667,0.0,0.09375,0.229167,0.0,0.0,0.354167,0.166667,0.0
1,40211,PempelfortStadtmitte,51.22951,6.789159,0,0.025641,0.0,0.025641,0.076923,0.025641,0.076923,0.0,0.153846,0.051282,0.025641,0.025641,0.410256,0.076923,0.025641
2,40212,FriedrichstadtStadtmitte,51.223825,6.78223,2,0.020408,0.0,0.0,0.05102,0.020408,0.030612,0.0,0.071429,0.163265,0.0,0.030612,0.173469,0.438776,0.0
3,40213,AltstadtCarlstadtFriedrichstadtPempelfortStadt...,51.224287,6.77379,5,0.020833,0.0,0.0,0.0625,0.052083,0.197917,0.020833,0.03125,0.0,0.0,0.072917,0.291667,0.25,0.0
4,40215,Friedrichstadt,51.213835,6.784225,0,0.0625,0.0,0.046875,0.109375,0.015625,0.125,0.0,0.171875,0.015625,0.046875,0.015625,0.28125,0.109375,0.0
5,40217,FriedrichstadtUnterbilk,51.212566,6.774075,6,0.014493,0.0,0.0,0.057971,0.014493,0.086957,0.057971,0.043478,0.014493,0.014493,0.072464,0.391304,0.202899,0.028986
6,40219,HafenUnterbilk,51.214015,6.762607,4,0.025974,0.012987,0.0,0.103896,0.0,0.077922,0.012987,0.0,0.012987,0.051948,0.051948,0.506494,0.142857,0.0
8,40223,BilkFleheUnterbilk,51.199995,6.771755,2,0.1,0.0,0.0,0.0,0.05,0.0,0.15,0.05,0.0,0.0,0.05,0.15,0.35,0.1
10,40227,EllerOberbilk,51.21338,6.801196,6,0.035714,0.0,0.0,0.0,0.0,0.107143,0.035714,0.107143,0.0,0.071429,0.0,0.357143,0.25,0.035714
13,40233,Flingern NordFlingern SüdLierenfeldStadtmitte,51.221545,6.812467,6,0.0,0.0,0.0,0.0,0.0,0.083333,0.041667,0.083333,0.0,0.0,0.041667,0.416667,0.25,0.083333


We see that cluster 0, located center-east, has the highest share of Japanese restaurants (9.8% of all venues). So this seems to be an environment where Japanese restaurants thrive.

In [64]:
#mean values for clusters
dus_merged.groupby('Cluster Labels').mean()

Cluster Labels,Postal Code,Latitude,Longitude,Bakery,Bank,Business Services,Café,Culture,Drinking Place,Groceries,Hotel,Japanese,Public Transport,Recreation,Restaurant other,Shopping,Sports Venue
0,40212.0,51.221615,6.787525,0.036325,0.0,0.024172,0.07946,0.027644,0.081197,0.0,0.139824,0.098691,0.024172,0.013755,0.348558,0.117655,0.008547
1,40239.0,51.2433,6.804985,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.071429,0.0,0.285714,0.214286,0.0,0.071429,0.285714
2,40217.5,51.21191,6.776992,0.060204,0.0,0.0,0.02551,0.035204,0.015306,0.075,0.060714,0.081633,0.0,0.040306,0.161735,0.394388,0.05
3,40606.333333,51.18807,6.86842,0.020833,0.055921,0.035088,0.105263,0.0,0.076754,0.132675,0.020833,0.0,0.132675,0.0,0.202851,0.182018,0.035088
4,40412.75,51.233663,6.778073,0.042906,0.007484,0.0,0.053047,0.009732,0.076221,0.035422,0.042692,0.017608,0.028293,0.040845,0.504644,0.141107,0.0
5,40379.0,51.22702,6.762558,0.026042,0.015625,0.0,0.0625,0.041667,0.223958,0.026042,0.03125,0.0,0.015625,0.083333,0.302083,0.15625,0.015625
6,40225.666667,51.21583,6.795913,0.016736,0.0,0.0,0.019324,0.004831,0.092478,0.045117,0.077985,0.004831,0.02864,0.038043,0.388371,0.2343,0.049344


Naturally, we also want as little competition as possible for our client. So we next check in which postal code area within this promising cluster the share is lowest. In postal code area 40215, only 1.56% of venues are Japanese restaurants.

In [30]:
#postal code areas in the cluster
dus_merged[dus_merged['Cluster Labels']==0]

Unnamed: 0,Postal Code,Neighborhood,Latitude,Longitude,Cluster Labels,Bakery,Bank,Business Services,Café,Culture,Drinking Place,Groceries,Hotel,Japanese,Public Transport,Recreation,Restaurant other,Shopping,Sports Venue
0,40210,FriedrichstadtStadtmitte,51.2215,6.789191,0,0.020833,0.0,0.0,0.052083,0.041667,0.041667,0.0,0.09375,0.229167,0.0,0.0,0.354167,0.166667,0.0
1,40211,PempelfortStadtmitte,51.22951,6.789159,0,0.025641,0.0,0.025641,0.076923,0.025641,0.076923,0.0,0.153846,0.051282,0.025641,0.025641,0.410256,0.076923,0.025641
4,40215,Friedrichstadt,51.213835,6.784225,0,0.0625,0.0,0.046875,0.109375,0.015625,0.125,0.0,0.171875,0.015625,0.046875,0.015625,0.28125,0.109375,0.0
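
The figures quoted for cluster 0 can be re-derived from the three rows above. The following minimal sketch copies the 'Japanese' column by hand and recomputes the cluster mean and the area with the lowest share. The variable names are chosen for illustration only.

```python
# Share of Japanese venues per postal code area in cluster 0, copied from the table above
japanese_share = {40210: 0.229167, 40211: 0.051282, 40215: 0.015625}

# Unweighted mean over the three areas, as produced by groupby('Cluster Labels').mean()
cluster_mean = sum(japanese_share.values()) / len(japanese_share)

# Area with the lowest share, i.e. the weakest competition
weakest_area = min(japanese_share, key=japanese_share.get)

print(round(cluster_mean, 4))
print(weakest_area, f'{japanese_share[weakest_area]:.2%}')
```

Note that the cluster value is an unweighted mean of the area shares, so it can differ slightly from the share computed over all venues in the cluster pooled together.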


Looking in detail, we see that there is in fact only one Japanese restaurant in that area, the Tokyo Lounge, listed as a general Japanese restaurant with an unremarkable rating of 7.7. We also see that there are many ramen restaurants in the other areas, so they appear to be quite popular.

In [47]:
#japanese restaurants in the clusters
dus_venues_red[(dus_venues_red['Postal Code'].isin([40215,40211,40210])) & (dus_venues_red['Category simple']=='Japanese')]

Unnamed: 0,Postal Code,Neighborhood Latitude,Neighborhood Longitude,Venue_ID,Venue,Venue Latitude,Venue Longitude,Venue Category,Category simple
1,40210,51.2215,6.789191,4b3b8db9f964a520b27525e3,Kushi Tei of Tokyo,51.223275,6.789558,Japanese Restaurant,Japanese
2,40210,51.2215,6.789191,5053696ce4b08e1d3c985b79,Nagomi,51.221913,6.786502,Japanese Restaurant,Japanese
3,40210,51.2215,6.789191,4b448154f964a520bef525e3,Kagaya,51.22132,6.788232,Japanese Restaurant,Japanese
4,40210,51.2215,6.789191,4b3be8f6f964a520207e25e3,Takumi,51.223429,6.788531,Ramen Restaurant,Japanese
22,40210,51.2215,6.789191,4b7e729cf964a52072ed2fe3,Hyuga,51.224525,6.789297,Sushi Restaurant,Japanese
24,40210,51.2215,6.789191,4f115acee4b09e81d8909f8a,Waraku,51.223664,6.787536,Japanese Restaurant,Japanese
28,40210,51.2215,6.789191,53429155498e8dd5982f0e8e,Takezo Ramen Bar,51.222617,6.790507,Ramen Restaurant,Japanese
32,40210,51.2215,6.789191,4bb64cb06edc76b092b7301c,Naniwa,51.224915,6.788172,Ramen Restaurant,Japanese
33,40210,51.2215,6.789191,4b65c6a0f964a520f8fe2ae3,Yabase,51.224732,6.788633,Japanese Restaurant,Japanese
37,40210,51.2215,6.789191,59da041146e1b64f7c9cbe9f,Takumi 3rd Tori & Veggie,51.224532,6.788735,Ramen Restaurant,Japanese


In [46]:
#competitor ID

competitorID=dus_venues_red[(dus_venues_red['Postal Code']==40215) & (dus_venues_red['Category simple']=='Japanese')].Venue_ID.values[0]

In [36]:
#retrieve competitor information from FourSquare

url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(competitorID,CLIENT_ID,CLIENT_SECRET,VERSION)
         
competitor = requests.get(url).json()

In [60]:
#display the average rating

print(competitor['response']['venue']['rating'])

7.7
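
Direct indexing as in the cell above raises a `KeyError` if a key is absent. Assuming that the `rating` field can be missing for some venues (an assumption, not something the data above shows), a more defensive lookup could look like the following sketch. The mock responses and the helper `get_rating` are hypothetical.

```python
# Mock venue-detail responses (invented values); the second one has no rating field
rated = {'response': {'venue': {'name': 'Example A', 'rating': 7.7}}}
unrated = {'response': {'venue': {'name': 'Example B'}}}


def get_rating(venue_json):
    """Return the venue rating, or None if the response does not carry one."""
    return venue_json.get('response', {}).get('venue', {}).get('rating')


print(get_rating(rated))
print(get_rating(unrated))
```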


**5 Results**  
We have identified a cluster of three similarly structured, adjacent postal code areas in which Japanese restaurants appear widely popular, with an average density of Japanese restaurants of almost 10% of all listed venues. Within this cluster, however, we found one postal code area with only one Japanese restaurant, offering unspecified Japanese cuisine, with an average rating of 7.7. This seems to be a very promising area in which to open another restaurant. We further discovered that ramen restaurants are the next most common type after the generalists. We would therefore recommend that our client open a ramen restaurant to further avoid competition.

**6 Discussion**  
While the results of our analysis might be a good starting point, there are several things to consider. Firstly, we had to exclude several postal code areas from our analysis for lack of data. In some areas, the FourSquare API returned information for a mere two venues. This might be because there really are few venues in that area, because the venues are missing on FourSquare, or simply because the 500 metre radius is too small. This leads to the second problem: the area within a 500 metre radius around the centre of a postal code area might not capture that area very well, because the area is smaller, bigger or just not very circular. Here a different method of retrieving venues might be preferable. Lastly, we do not know whether the existing venue structure really predicts the success of a new business very well. Demographic data, for example, might be more useful.

**7 Conclusion**  
Our recommendation to open a ramen restaurant in postal code area 40215 looks sound considering the underlying data. However, with more data sources and some refinements to the analysis, we could gain more confidence in our advice.