# IBM Applied Data Science Capstone Course by Coursera
## Week 5 Final Report
_Bakeries in Budapest, Hungary_

Build a dataframe of the districts of Budapest, Hungary, by scraping a Wikipedia category page. Then get the geographical coordinates of each district, obtain venue data for the districts from the Foursquare API, cluster the districts by the share of their venues that are bakeries, and select the best cluster for a new bakery.

### Set up data

In [175]:
import numpy as np

import pandas as pd
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json

from geopy.geocoders import Nominatim 
import geocoder 

import requests 
from bs4 import BeautifulSoup 

from pandas import json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium

print("Libraries imported.")

Libraries imported.


In [36]:
data = requests.get("https://en.wikipedia.org/wiki/Category:Districts_of_Budapest").text

In [37]:
soup = BeautifulSoup(data, 'html.parser')

In [38]:
districtList = []

In [39]:
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    districtList.append(row.text)

In [40]:
bp_df = pd.DataFrame({"District": districtList})

bp_df.head()

|   | District |
|---|---|
| 0 | ► 2nd District of Budapest‎ (6 P) |
| 1 | ► 13th District of Budapest‎ (2 P) |
| 2 | ► 15th District of Budapest‎ (5 P) |
| 3 | ► 16th District of Budapest‎ (3 P) |
| 4 | ► Belváros-Lipótváros‎ (14 P) |


In [41]:
bp_df.shape

(23, 1)
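The scraped labels are not clean district names. Judging by the "►" bullets and the member counts in the output above, the first `mw-category` block on this category page lists subcategories rather than article pages, so each entry keeps a bullet, an invisible left-to-right mark (U+200E) and a trailing count such as "(6 P)" or "(1 C, 2 P)". These raw strings are what the rest of the notebook passes to the geocoder and shows in every table. A minimal sketch of a cleaning step follows, assuming the residue always matches the pattern visible in the output; `clean_district_name` is a hypothetical helper and was not used to produce the results in this report.

```python
import re

# Residue observed in the scraped labels (assumed to be the only kinds):
#   - a leading "►" category-tree bullet
#   - an invisible left-to-right mark, U+200E
#   - a trailing member count such as "(6 P)" or "(1 C, 2 P)"
_RESIDUE = re.compile(r"^\s*►\s*|\u200e|\s*\((?:\d+\s*[CP],?\s*)+\)\s*$")


def clean_district_name(raw: str) -> str:
    """Return the district name with category-tree residue removed."""
    return _RESIDUE.sub("", raw).strip()


print(clean_district_name("► 2nd District of Budapest\u200e (6 P)"))  # 2nd District of Budapest
print(clean_district_name("► Ferencváros\u200e (1 C, 2 P)"))  # Ferencváros
```

In the notebook this would be applied before geocoding, for example `bp_df["District"] = bp_df["District"].map(clean_district_name)`, so that the geocoder receives queries such as "Ferencváros, Budapest, Hungary".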

### Coordinates

In [47]:
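import time

def get_latlng(district, max_tries=5):
    # ArcGIS occasionally returns no match, so retry a few times
    # instead of looping forever as a bare `while` loop would
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, Budapest, Hungary'.format(district))
        if g.latlng is not None:
            return g.latlng
        time.sleep(1)  # short pause before the next attempt
    raise ValueError('Could not geocode {!r}'.format(district))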

In [48]:
coords = [ get_latlng(district) for district in bp_df["District"].tolist() ]

In [49]:
coords

[[47.50745000000006, 19.066640000000064],
 [47.510140000000035, 19.015030000000024],
 [47.49896000000007, 19.051270000000045],
 [47.53670000000005, 19.039910000000077],
 [47.50312000000008, 19.05066000000005],
 [47.472060014247916, 19.03251999012778],
 [47.43600001904926, 19.090220005759136],
 [47.50091000000003, 19.069360000000074],
 [47.47608000000008, 19.07710000000003],
 [47.49972000000008, 19.055080000000032],
 [47.55353001763743, 18.727300026792705],
 [47.41260000849729, 19.173109987365592],
 [47.597899999282276, 19.04389004178526],
 [47.54158000000007, 19.045010000000048],
 [47.43333000000007, 19.116670000000056],
 [47.47592998085355, 19.16086997797754],
 [47.371176591998136, 19.139721716606513],
 [47.365480008239615, 19.091260032305144],
 [47.50488000000007, 19.062820000000045],
 [47.371176591998136, 19.139721716606513],
 [47.261479993938224, 19.08129000993506],
 [47.39937088503967, 18.97133697021127],
 [47.6085300127084, 19.19398997872787]]
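Several of these coordinates look suspicious: Újpest comes back at latitude 47.26, well south of the city, Józsefváros at longitude 18.73, west of it, and Rákosmente and Újbuda receive identical coordinates. A minimal check that flags points outside an approximate bounding box and exact duplicates is sketched below. The box limits are a rough assumption, not the official city boundary, and a rectangle will miss points that fall inside it but outside the city; `flag_suspicious` is a hypothetical helper.

```python
from collections import Counter

# Rough bounding box around Budapest (an assumption for illustration,
# not the official administrative boundary).
LAT_RANGE = (47.34, 47.62)
LNG_RANGE = (18.92, 19.34)


def flag_suspicious(names, coords):
    """Return {name: reason} for points outside the box or sharing coordinates."""
    counts = Counter(tuple(c) for c in coords)
    flags = {}
    for name, (lat, lng) in zip(names, coords):
        reasons = []
        in_box = (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
                  and LNG_RANGE[0] <= lng <= LNG_RANGE[1])
        if not in_box:
            reasons.append("outside box")
        if counts[(lat, lng)] > 1:
            reasons.append("duplicate")
        if reasons:
            flags[name] = ", ".join(reasons)
    return flags


# A few of the geocoded points listed above
sample_names = ["Józsefváros", "Rákosmente", "Újbuda", "Újpest", "Terézváros"]
sample_coords = [
    [47.55353001763743, 18.727300026792705],
    [47.371176591998136, 19.139721716606513],
    [47.371176591998136, 19.139721716606513],
    [47.261479993938224, 19.08129000993506],
    [47.50488000000007, 19.062820000000045],
]
for name, reason in flag_suspicious(sample_names, sample_coords).items():
    print(name, "->", reason)
```

On the full data this would be called as `flag_suspicious(bp_df["District"], coords)`.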

In [50]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [51]:
bp_df['Latitude'] = df_coords['Latitude']
bp_df['Longitude'] = df_coords['Longitude']

In [52]:
print(bp_df.shape)
bp_df

(23, 3)


|   | District | Latitude | Longitude |
|---|---|---|---|
| 0 | ► 2nd District of Budapest‎ (6 P) | 47.50745 | 19.06664 |
| 1 | ► 13th District of Budapest‎ (2 P) | 47.51014 | 19.01503 |
| 2 | ► 15th District of Budapest‎ (5 P) | 47.49896 | 19.05127 |
| 3 | ► 16th District of Budapest‎ (3 P) | 47.5367 | 19.03991 |
| 4 | ► Belváros-Lipótváros‎ (14 P) | 47.50312 | 19.05066 |
| 5 | ► Budafok-Tétény‎ (1 P) | 47.47206 | 19.03252 |
| 6 | ► Csepel‎ (2 P) | 47.436 | 19.09022 |
| 7 | ► Erzsébetváros‎ (5 P) | 47.50091 | 19.06936 |
| 8 | ► Ferencváros‎ (1 C, 2 P) | 47.47608 | 19.0771 |
| 9 | ► Hegyvidék‎ (5 P) | 47.49972 | 19.05508 |


In [53]:
bp_df.to_csv("bp_df.csv", index=False)

### Districts of Budapest superimposed on top of map

In [101]:
address = 'Budapest, Hungary'

geolocator = Nominatim(user_agent='h*******l@********.edu')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Coordinates of Budapest, Hungary {}, {}.'.format(latitude, longitude))

Coordinates of Budapest, Hungary 47.48138955, 19.14607278448202.


In [123]:
map_bp = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, district in zip(bp_df['Latitude'], bp_df['Longitude'], bp_df['District']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_bp)  
    
map_bp

In [124]:
map_bp.save('map_bp.html')

### Foursquare API

In [125]:
CLIENT_ID = '*******************************'
CLIENT_SECRET = '*******************************'
VERSION = '20180605'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: *******************************
CLIENT_SECRET: *******************************


In [146]:
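radius = 50000  # metres; 50 km from any district centre covers the whole city
LIMIT = 500     # the counts below show exactly 100 venues per district, so results appear capped at 100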

venues = []

for lat, long, district in zip(bp_df['Latitude'], bp_df['Longitude'], bp_df['District']):
    
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            district,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
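Building the query string by hand works, but letting the standard library encode it avoids mistakes with special characters in keys or coordinates. The sketch below assumes the same endpoint and parameter names as the URL in the cell above; `build_explore_url` is a hypothetical helper, and passing a `params` dict to `requests.get(url, params=...)` would do the same encoding.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore request URL used above, with proper URL encoding."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; the coordinates are those of the first district
url = build_explore_url("ID", "SECRET", "20180605", 47.50745, 19.06664, 50000, 500)
print(url)
```

The comma in `ll` is percent-encoded as `%2C`, which the server decodes back to `47.50745,19.06664`.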

In [147]:
venues_df = pd.DataFrame(venues)

venues_df.columns = ['District', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2300, 7)


|   | District | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | ► 2nd District of Budapest‎ (6 P) | 47.50745 | 19.06664 | Corinthia Hotel Budapest | 47.502754 | 19.066858 | Hotel |
| 1 | ► 2nd District of Budapest‎ (6 P) | 47.50745 | 19.06664 | Művész ArtMozi | 47.506692 | 19.061197 | Indie Movie Theater |
| 2 | ► 2nd District of Budapest‎ (6 P) | 47.50745 | 19.06664 | Csak a jó sör! | 47.501792 | 19.065552 | Beer Bar |
| 3 | ► 2nd District of Budapest‎ (6 P) | 47.50745 | 19.06664 | Bors Gasztrobár | 47.496714 | 19.063659 | Soup Place |
| 4 | ► 2nd District of Budapest‎ (6 P) | 47.50745 | 19.06664 | Chez Dodo - Artisan Macarons & Café | 47.500022 | 19.052267 | Dessert Shop |


In [148]:
venues_df.groupby(["District"]).count()

| District | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| ► 13th District of Budapest‎ (2 P) | 100 | 100 | 100 | 100 | 100 | 100 |
| ► 15th District of Budapest‎ (5 P) | 100 | 100 | 100 | 100 | 100 | 100 |
| ► 16th District of Budapest‎ (3 P) | 100 | 100 | 100 | 100 | 100 | 100 |
| ► 2nd District of Budapest‎ (6 P) | 100 | 100 | 100 | 100 | 100 | 100 |
| ► Belváros-Lipótváros‎ (14 P) | 100 | 100 | 100 | 100 | 100 | 100 |
| ► Budafok-Tétény‎ (1 P) | 100 | 100 | 100 | 100 | 100 | 100 |
| ► Csepel‎ (2 P) | 100 | 100 | 100 | 100 | 100 | 100 |
| ► Erzsébetváros‎ (5 P) | 100 | 100 | 100 | 100 | 100 | 100 |
| ► Ferencváros‎ (1 C, 2 P) | 100 | 100 | 100 | 100 | 100 | 100 |
| ► Hegyvidék‎ (5 P) | 100 | 100 | 100 | 100 | 100 | 100 |


In [149]:
print(venues_df.shape)
venues_df.tail()

(2300, 7)


|   | District | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 2295 | ► Zugló‎ (1 C, 3 P) | 47.60853 | 19.19399 | Gellért-hegy | 47.486382 | 19.046946 | Mountain |
| 2296 | ► Zugló‎ (1 C, 3 P) | 47.60853 | 19.19399 | Büfé Đăng Mười | 47.488861 | 19.099514 | Vietnamese Restaurant |
| 2297 | ► Zugló‎ (1 C, 3 P) | 47.60853 | 19.19399 | Leonidas Gyros | 47.459305 | 19.141187 | Greek Restaurant |
| 2298 | ► Zugló‎ (1 C, 3 P) | 47.60853 | 19.19399 | Tamp & Pull Espresso Bar | 47.484482 | 19.061144 | Coffee Shop |
| 2299 | ► Zugló‎ (1 C, 3 P) | 47.60853 | 19.19399 | Ennmann Japán Étterem | 47.503142 | 19.039503 | Sushi Restaurant |
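The tail of the table credits Zugló with Gellért-hegy, a hill on the Buda bank of the Danube, across the river from Zugló. This suggests that the 50 km search radius pulls venues from across the whole city for every district. One way to check is the Jaccard similarity of the venue-name sets of two districts, where 1.0 means identical lists. A minimal sketch in plain Python; `jaccard` and the toy lists are illustrative only, not the real API results.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty lists are treated as identical
    return len(sa & sb) / len(sa | sb)


# Toy venue lists (illustrative, not the real API results)
district_a = ["Corinthia Hotel Budapest", "Művész ArtMozi", "Gellért-hegy"]
district_b = ["Gellért-hegy", "Művész ArtMozi", "Leonidas Gyros"]

print(jaccard(district_a, district_b))  # 0.5
```

On the real data, the sets could come from `venues_df.groupby("District")["VenueName"].apply(set)`; values close to 1.0 for most pairs would indicate that the districts are being compared on nearly the same venues.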


In [150]:
print('There are {} categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 85 categories.


In [151]:
venues_df['VenueCategory'].unique()[:85]

array(['Hotel', 'Indie Movie Theater', 'Beer Bar', 'Soup Place',
       'Dessert Shop', 'Church', 'Bakery', 'Coffee Shop', 'Theme Park',
       'Plaza', 'Pizza Place', 'Capitol Building', 'Park', 'Cocktail Bar',
       'Toy / Game Store', 'Theater', 'Castle', 'Island', 'Historic Site',
       'Thai Restaurant', 'Wine Shop', 'Zoo', 'Restaurant', 'Track',
       'Gourmet Shop', 'Mountain', 'Fountain', 'Garden', 'Spa',
       'Bookstore', 'Wine Bar', 'Sushi Restaurant',
       'Vietnamese Restaurant', 'Outdoor Sculpture', 'Music Venue',
       'Vegetarian / Vegan Restaurant', 'Mediterranean Restaurant',
       'Donut Shop', 'Field', 'Burger Joint', 'Jewish Restaurant',
       'Climbing Gym', 'Concert Hall', 'Gastropub', 'Trail',
       'Monument / Landmark', 'Tapas Restaurant', 'Poke Place',
       'Playground', 'Waterfront', 'Lake', 'Scenic Lookout',
       'Breakfast Spot', 'Café', 'Italian Restaurant',
       'Gym / Fitness Center', 'Hungarian Restaurant', 'Grocery Store',
       'Boar

In [154]:
"Bakery" in venues_df['VenueCategory'].unique()

True

### One-hot encode venue categories

In [155]:
bp_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

bp_onehot['District'] = venues_df['District'] 

fixed_columns = [bp_onehot.columns[-1]] + list(bp_onehot.columns[:-1])
bp_onehot = bp_onehot[fixed_columns]

print(bp_onehot.shape)
bp_onehot.head()

(2300, 86)


Unnamed: 0,District,Airport,Art Museum,Bakery,Bar,Beach,Beer Bar,Board Shop,Bookstore,Breakfast Spot,Brewery,Burger Joint,Café,Capitol Building,Castle,Church,Climbing Gym,Cocktail Bar,Coffee Shop,Concert Hall,Deli / Bodega,Dessert Shop,Donut Shop,Electronics Store,Fast Food Restaurant,Field,Flower Shop,Forest,Fountain,Furniture / Home Store,Garden,Gastropub,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym / Fitness Center,Historic Site,Hotel,Hungarian Restaurant,Ice Cream Shop,Indie Movie Theater,Island,Italian Restaurant,Jewish Restaurant,Lake,Mediterranean Restaurant,Monument / Landmark,Mountain,Music Venue,Other Great Outdoors,Outdoor Sculpture,Park,Pizza Place,Playground,Plaza,Poke Place,Pool,Racetrack,Restaurant,Salon / Barbershop,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Ski Chairlift,Snack Place,Soup Place,Spa,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Track,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Vineyard,Waterfront,Wine Bar,Wine Shop,Winery,Zoo
0,► 2nd District of Budapest‎ (6 P),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,► 2nd District of Budapest‎ (6 P),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,► 2nd District of Budapest‎ (6 P),0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,► 2nd District of Budapest‎ (6 P),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,► 2nd District of Budapest‎ (6 P),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [156]:
bp_grouped = bp_onehot.groupby(["District"]).mean().reset_index()

print(bp_grouped.shape)
bp_grouped

(23, 86)


Unnamed: 0,District,Airport,Art Museum,Bakery,Bar,Beach,Beer Bar,Board Shop,Bookstore,Breakfast Spot,Brewery,Burger Joint,Café,Capitol Building,Castle,Church,Climbing Gym,Cocktail Bar,Coffee Shop,Concert Hall,Deli / Bodega,Dessert Shop,Donut Shop,Electronics Store,Fast Food Restaurant,Field,Flower Shop,Forest,Fountain,Furniture / Home Store,Garden,Gastropub,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym / Fitness Center,Historic Site,Hotel,Hungarian Restaurant,Ice Cream Shop,Indie Movie Theater,Island,Italian Restaurant,Jewish Restaurant,Lake,Mediterranean Restaurant,Monument / Landmark,Mountain,Music Venue,Other Great Outdoors,Outdoor Sculpture,Park,Pizza Place,Playground,Plaza,Poke Place,Pool,Racetrack,Restaurant,Salon / Barbershop,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Ski Chairlift,Snack Place,Soup Place,Spa,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Track,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Vineyard,Waterfront,Wine Bar,Wine Shop,Winery,Zoo
0,► 13th District of Budapest‎ (2 P),0.0,0.01,0.07,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.1,0.0,0.0,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.03,0.0,0.01,0.03,0.02,0.04,0.01,0.01,0.03,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.02,0.06,0.02,0.02,0.02,0.01,0.01,0.0,0.01,0.0,0.0,0.07,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0
1,► 15th District of Budapest‎ (5 P),0.0,0.01,0.08,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.15,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.02,0.02,0.04,0.01,0.0,0.03,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.02,0.06,0.03,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.02,0.0,0.01,0.02,0.02,0.0,0.01
2,► 16th District of Budapest‎ (3 P),0.01,0.0,0.06,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.02,0.01,0.01,0.01,0.13,0.0,0.0,0.04,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.03,0.0,0.0,0.02,0.02,0.04,0.01,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.02,0.06,0.02,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01
3,► 2nd District of Budapest‎ (6 P),0.0,0.0,0.08,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.02,0.01,0.01,0.01,0.17,0.01,0.0,0.05,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.03,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.02,0.06,0.03,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.02,0.01,0.01,0.01,0.01,0.01,0.02,0.02,0.0,0.01,0.02,0.01,0.0,0.01
4,► Belváros-Lipótváros‎ (14 P),0.0,0.01,0.08,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.15,0.01,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.0,0.01,0.02,0.02,0.04,0.01,0.0,0.03,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.02,0.05,0.03,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.02,0.02,0.0,0.01,0.02,0.01,0.0,0.01
5,► Budafok-Tétény‎ (1 P),0.0,0.01,0.06,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.15,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.03,0.02,0.04,0.01,0.01,0.02,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.06,0.03,0.02,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.03,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.02,0.0,0.01,0.02,0.02,0.0,0.0
6,► Csepel‎ (2 P),0.0,0.0,0.05,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.17,0.01,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.02,0.01,0.04,0.02,0.04,0.0,0.01,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.04,0.03,0.01,0.02,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.04,0.0,0.01,0.02,0.02,0.0,0.0
7,► Erzsébetváros‎ (5 P),0.0,0.0,0.07,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.02,0.01,0.0,0.01,0.16,0.01,0.0,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.03,0.01,0.0,0.01,0.02,0.04,0.0,0.0,0.03,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.02,0.06,0.03,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.02,0.0,0.01,0.02,0.02,0.0,0.01
8,"► Ferencváros‎ (1 C, 2 P)",0.0,0.0,0.06,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.02,0.01,0.0,0.01,0.17,0.01,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.02,0.01,0.04,0.02,0.04,0.0,0.0,0.02,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.04,0.03,0.01,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.01,0.04,0.0,0.01,0.02,0.02,0.0,0.01
9,► Hegyvidék‎ (5 P),0.0,0.01,0.08,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.15,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.04,0.01,0.0,0.03,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.02,0.06,0.03,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.02,0.0,0.01,0.02,0.02,0.0,0.01
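Averaging the one-hot columns per district turns them into the share of that district's venues in each category, so the `Bakery` column reads as "bakeries per venue returned". With 100 venues per district, a value of 0.07 means 7 of them are bakeries. The same computation in plain Python, as a sketch on toy data; `category_share` is a hypothetical helper.

```python
from collections import defaultdict


def category_share(rows, category):
    """Share of venues in `category` for each district.

    `rows` is an iterable of (district, venue_category) pairs, mirroring the
    District and VenueCategory columns of venues_df. The result equals the
    per-district mean of the one-hot column for `category`.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for district, venue_category in rows:
        totals[district] += 1
        hits[district] += venue_category == category  # True counts as 1
    return {d: hits[d] / totals[d] for d in totals}


toy_rows = [
    ("Csepel", "Bakery"), ("Csepel", "Coffee Shop"),
    ("Csepel", "Park"), ("Csepel", "Hotel"),
    ("Zugló", "Bakery"), ("Zugló", "Bakery"),
]
print(category_share(toy_rows, "Bakery"))  # {'Csepel': 0.25, 'Zugló': 1.0}
```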


In [157]:
len(bp_grouped[bp_grouped["Bakery"] > 0])

23

### Bakery Dataframe

In [160]:
bp_bake = bp_grouped[["District","Bakery"]]

In [161]:
bp_bake.head()

|   | District | Bakery |
|---|---|---|
| 0 | ► 13th District of Budapest‎ (2 P) | 0.07 |
| 1 | ► 15th District of Budapest‎ (5 P) | 0.08 |
| 2 | ► 16th District of Budapest‎ (3 P) | 0.06 |
| 3 | ► 2nd District of Budapest‎ (6 P) | 0.08 |
| 4 | ► Belváros-Lipótváros‎ (14 P) | 0.08 |


### k-means

In [163]:
kclusters = 3

bp_clustering = bp_bake.drop(columns=["District"])

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bp_clustering)

kmeans.labels_[0:10]

array([0, 0, 2, 0, 0, 2, 1, 0, 2, 0])
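Because the clustering uses a single feature, the bakery share, k-means can only split the districts into bands of similar values. A plain-Python sketch of Lloyd's algorithm makes this visible. It starts from fixed centroids, an assumption made for reproducibility (scikit-learn's `KMeans` defaults to k-means++ initialisation), and uses the bakery shares reported for the 23 districts in the cluster tables of this report: seven at 0.05, seven at 0.06, four at 0.07 and five at 0.08. `kmeans_1d` is a hypothetical helper.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm for k-means on one feature.

    `centroids` gives the starting positions. Returns (labels, centroids).
    """
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid
        new_labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids


bakery_shares = [0.05] * 7 + [0.06] * 7 + [0.07] * 4 + [0.08] * 5
labels, centres = kmeans_1d(bakery_shares, [0.05, 0.06, 0.075])
print(labels)
print([round(c, 4) for c in centres])  # [0.05, 0.06, 0.0756]
```

Up to the numbering of the labels, this reproduces the grouping reported below: {0.05}, {0.06} and {0.07, 0.08}.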

In [164]:
bp_merged = bp_bake.copy()

bp_merged["Cluster Labels"] = kmeans.labels_

In [165]:
bp_merged.head()

|   | District | Bakery | Cluster Labels |
|---|---|---|---|
| 0 | ► 13th District of Budapest‎ (2 P) | 0.07 | 0 |
| 1 | ► 15th District of Budapest‎ (5 P) | 0.08 | 0 |
| 2 | ► 16th District of Budapest‎ (3 P) | 0.06 | 2 |
| 3 | ► 2nd District of Budapest‎ (6 P) | 0.08 | 0 |
| 4 | ► Belváros-Lipótváros‎ (14 P) | 0.08 | 0 |


In [166]:
bp_merged = bp_merged.join(bp_df.set_index("District"), on="District")

print(bp_merged.shape)
bp_merged.head()

(23, 5)


|   | District | Bakery | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | ► 13th District of Budapest‎ (2 P) | 0.07 | 0 | 47.51014 | 19.01503 |
| 1 | ► 15th District of Budapest‎ (5 P) | 0.08 | 0 | 47.49896 | 19.05127 |
| 2 | ► 16th District of Budapest‎ (3 P) | 0.06 | 2 | 47.5367 | 19.03991 |
| 3 | ► 2nd District of Budapest‎ (6 P) | 0.08 | 0 | 47.50745 | 19.06664 |
| 4 | ► Belváros-Lipótváros‎ (14 P) | 0.08 | 0 | 47.50312 | 19.05066 |


In [167]:
print(bp_merged.shape)
bp_merged.sort_values(["Cluster Labels"], inplace=True)
bp_merged

(23, 5)


|   | District | Bakery | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | ► 13th District of Budapest‎ (2 P) | 0.07 | 0 | 47.51014 | 19.01503 |
| 1 | ► 15th District of Budapest‎ (5 P) | 0.08 | 0 | 47.49896 | 19.05127 |
| 20 | ► Óbuda-Békásmegyer‎ (1 C, 3 P) | 0.07 | 0 | 47.54158 | 19.04501 |
| 3 | ► 2nd District of Budapest‎ (6 P) | 0.08 | 0 | 47.50745 | 19.06664 |
| 4 | ► Belváros-Lipótváros‎ (14 P) | 0.08 | 0 | 47.50312 | 19.05066 |
| 17 | ► Terézváros‎ (1 P) | 0.08 | 0 | 47.50488 | 19.06282 |
| 7 | ► Erzsébetváros‎ (5 P) | 0.07 | 0 | 47.50091 | 19.06936 |
| 9 | ► Hegyvidék‎ (5 P) | 0.08 | 0 | 47.49972 | 19.05508 |
| 12 | ► Kőbánya‎ (8 P) | 0.07 | 0 | 47.5979 | 19.04389 |
| 16 | ► Soroksár‎ (1 P) | 0.05 | 1 | 47.36548 | 19.09126 |


In [174]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# One colour per cluster, indexed directly by the 0-based cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(bp_merged['Latitude'], bp_merged['Longitude'], bp_merged['District'], bp_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [170]:
map_clusters.save('map_clusters.html')

### Clusters

#### First cluster (cluster 0)

In [171]:
bp_merged.loc[bp_merged['Cluster Labels'] == 0]

|   | District | Bakery | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | ► 13th District of Budapest‎ (2 P) | 0.07 | 0 | 47.51014 | 19.01503 |
| 1 | ► 15th District of Budapest‎ (5 P) | 0.08 | 0 | 47.49896 | 19.05127 |
| 20 | ► Óbuda-Békásmegyer‎ (1 C, 3 P) | 0.07 | 0 | 47.54158 | 19.04501 |
| 3 | ► 2nd District of Budapest‎ (6 P) | 0.08 | 0 | 47.50745 | 19.06664 |
| 4 | ► Belváros-Lipótváros‎ (14 P) | 0.08 | 0 | 47.50312 | 19.05066 |
| 17 | ► Terézváros‎ (1 P) | 0.08 | 0 | 47.50488 | 19.06282 |
| 7 | ► Erzsébetváros‎ (5 P) | 0.07 | 0 | 47.50091 | 19.06936 |
| 9 | ► Hegyvidék‎ (5 P) | 0.08 | 0 | 47.49972 | 19.05508 |
| 12 | ► Kőbánya‎ (8 P) | 0.07 | 0 | 47.5979 | 19.04389 |


#### Second cluster (cluster 1)

In [172]:
bp_merged.loc[bp_merged['Cluster Labels'] == 1]

|   | District | Bakery | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 16 | ► Soroksár‎ (1 P) | 0.05 | 1 | 47.36548 | 19.09126 |
| 15 | ► Rákosmente‎ (5 P) | 0.05 | 1 | 47.371177 | 19.139722 |
| 14 | ► Pestszentlőrinc-Pestszentimre‎ (3 P) | 0.05 | 1 | 47.47593 | 19.16087 |
| 13 | ► Pesterzsébet‎ (1 P) | 0.05 | 1 | 47.43333 | 19.11667 |
| 11 | ► Kispest‎ (3 P) | 0.05 | 1 | 47.4126 | 19.17311 |
| 6 | ► Csepel‎ (2 P) | 0.05 | 1 | 47.436 | 19.09022 |
| 21 | ► Újbuda‎ (5 P) | 0.05 | 1 | 47.371177 | 19.139722 |


#### Third cluster (cluster 2)

In [173]:
bp_merged.loc[bp_merged['Cluster Labels'] == 2]

|   | District | Bakery | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 10 | ► Józsefváros‎ (6 P) | 0.06 | 2 | 47.55353 | 18.7273 |
| 8 | ► Ferencváros‎ (1 C, 2 P) | 0.06 | 2 | 47.47608 | 19.0771 |
| 5 | ► Budafok-Tétény‎ (1 P) | 0.06 | 2 | 47.47206 | 19.03252 |
| 18 | ► Várkerület‎ (1 C, 15 P) | 0.06 | 2 | 47.399371 | 18.971337 |
| 19 | ► Zugló‎ (1 C, 3 P) | 0.06 | 2 | 47.60853 | 19.19399 |
| 2 | ► 16th District of Budapest‎ (3 P) | 0.06 | 2 | 47.5367 | 19.03991 |
| 22 | ► Újpest‎ (3 C, 4 P) | 0.06 | 2 | 47.26148 | 19.08129 |
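The ranking of the clusters in the discussion below can be checked by averaging the bakery share within each cluster; in pandas this is `bp_merged.groupby("Cluster Labels")["Bakery"].mean()`. A plain-Python equivalent, using the values from the three cluster tables above:

```python
from collections import defaultdict
from statistics import mean

# (cluster label, bakery share) pairs copied from the three cluster tables
rows = (
    [(0, 0.07)] * 4 + [(0, 0.08)] * 5
    + [(1, 0.05)] * 7
    + [(2, 0.06)] * 7
)

shares = defaultdict(list)
for cluster, share in rows:
    shares[cluster].append(share)

for cluster in sorted(shares):
    print(cluster, round(mean(shares[cluster]), 4))
# 0 0.0756
# 1 0.05
# 2 0.06
```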


### Bakeries in Budapest

Overall, bakeries are fairly evenly spread throughout Budapest: they make up between 5% and 8% of the venues returned for every district. The share is highest in the first cluster (cluster 0, 7 to 8%), with cluster 2 (6%) and cluster 1 (5%) not far behind. This suggests that the districts in the second cluster (cluster 1), where bakeries are least common, may offer some potential, but there is no sharp distinction between any of the clusters, or indeed between individual districts.
That said, there is still room in the market for another bakery. For example, Óbuda-Békásmegyer (cluster 0) and Csepel (cluster 1) are not very different as districts, nor in their distance from the city centre, so a district such as Csepel may well benefit from the addition of a distinctive bakery.
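To put a number on distance from the city centre, the great-circle (haversine) distance between the geocoded points in the cluster tables can be computed. This sketch assumes that the geocoded point for Belváros-Lipótváros, the inner-city district, stands in for the centre, and uses a mean Earth radius of 6371 km; both are choices made for illustration, and `haversine_km` is a hypothetical helper.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


centre = (47.50312, 19.05066)  # Belváros-Lipótváros, taken as the centre
districts = [("Óbuda-Békásmegyer", (47.54158, 19.04501)), ("Csepel", (47.436, 19.09022))]
for name, point in districts:
    print(name, round(haversine_km(*centre, *point), 1))
# Óbuda-Békásmegyer 4.3
# Csepel 8.0
```

With these geocoded points, Óbuda-Békásmegyer lies about 4.3 km and Csepel about 8.0 km from the assumed centre.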