# Capstone Project - The Battle of the Neighborhoods

This document provides the details of my final peer reviewed assignment for the IBM Data Science Professional Certificate program – Coursera Capstone.

## Introduction: Business Problem of knowledge of neighborhood

Everyone who wants to buy an apartment wonders in which district it would be best to live. The proximity of parks, interesting restaurants or sports halls make making decisions extremely difficult.


Warsaw is the capital of Poland and the Masovian Voivodeship, the largest city of the country, located in its central-eastern part. It is currently divided into 18 districts, which are characterized by their interesting places. Adjacent to Śródmieście Mokotów is a district of Warsaw, which is largely considered a prestigious location. If you need to move a lot and have all service outlets and institutions at hand, and you are not afraid of the hustle and bustle of the city, it's best to live in the city center. Ursynów is a good choice for active people - both professionally and physically. Praga Południe is a district of Warsaw that apartment owners are more and more willing to choose. Old buildings from the times of the Polish People's Republic, which are a large part of Wola, may not encourage you to live in this part of the city, but it is here that the easiest way to find a place for which - given the possibility of reaching the center of the capital in a few or several minutes - we will pay really little. Ochota is currently the most densely populated district of Warsaw. Białołęka can be called a district of young people, because mainly young families are moving here. In addition to Białołęka, dynamic landscape changes also apply to the city of Wilanów, located on the opposite, southern side. Old Bielany with the characteristic Confederation Square awaits fans of quiet streets built up with small houses and villas. If you love to look into small cafes with an almost home-like atmosphere, you dream about frequent walks or runs in the park, and in addition you feel like a fish in the water among the streets inhabited by villas with small gardens, take your steps towards Stary Żoliborz. Bemowo, like Białołęka, is eagerly chosen by young people, especially families with young children. Targówek is still not one of the most-chosen districts, but most likely it will change soon, as - as in the case of Bemów - the construction of further stations of the second metro line is planned here. In terms of area, Wawer ranks first in the capital. Low real estate prices in Ursynów will certainly encourage those who are not afraid of a large distance from the city center and taking about half an hour by rail. Praga Północ could be described as a district with a soul - it was here that one of the oldest buildings in Warsaw, spared during World War II, was preserved, and here you can still hear the old Warsaw dialect, and the inhabitants are often more open and direct than busy Warsaw residents from Śródmieście. Rembertów is the least populated district of Warsaw, which can also boast of a wealth of green areas.


The multitude of districts encourages them to be categorized and to identify homogeneous groups that will have common characteristics. <b>Let's do it!</b>

## Import libraries

In [1]:
import geocoder
from bs4 import BeautifulSoup
import requests
import pandas as pd

## Get data

Based on definition of our problem, factors that will help to categorize Boroughs are:

    All venues of neighborhood
    Top venue categoeries in neighborhood
    Basic statistics of the neighborhood like density etc.

Following data sources will be needed to generate the required information:

    Wikipedia page of Warsaw neighborhoods
    All venues or neigborhood area through Foursqueare API
    Geolocator to get coordinates of neighborhoods

We will use the explore function to get the most common venue categories in each neighborhood of Warsaw. We will also cluster neighborhoods to give similarity information.


In [2]:
http = "https://pl.wikipedia.org/wiki/Podzia%C5%82_administracyjny_Warszawy"
source_txt = requests.get(http).text
Warsaw_data = BeautifulSoup(source_txt, 'lxml')
content = Warsaw_data.find('div', class_='mw-parser-output')
main_table = content.table.tbody
main_table

<tbody><tr>
<th>Dzielnica
</th>
<th data-sort-type="number">Liczba mieszkańców<br/><small>(1.01.2019)</small><sup class="reference" id="cite_ref-GUS_2019_1-0"><a href="#cite_note-GUS_2019-1">[1]</a></sup>
</th>
<th data-sort-type="number">Gęstość zaludnienia<br/>[osób/km²]<small>(1.01.2019)</small><sup class="reference" id="cite_ref-GUS_2019_1-1"><a href="#cite_note-GUS_2019-1">[1]</a></sup>
</th>
<th data-sort-type="number">Powierzchnia<br/>[km²]<sup class="reference" id="cite_ref-GUS_2019_1-2"><a href="#cite_note-GUS_2019-1">[1]</a></sup>
</th></tr>
<tr>
<td><a href="/wiki/Mokot%C3%B3w" title="Mokotów">Mokotów</a></td>
<td>217 683</td>
<td>6146</td>
<td>35,42
</td></tr>
<tr>
<td><a href="/wiki/Praga-Po%C5%82udnie" title="Praga-Południe">Praga-Południe</a></td>
<td>179 836</td>
<td>8036</td>
<td>22,38
</td></tr>
<tr>
<td><a href="/wiki/Ursyn%C3%B3w" title="Ursynów">Ursynów</a></td>
<td>150 668</td>
<td>3441</td>
<td>43,79
</td></tr>
<tr>
<td><a href="/wiki/Wola_(Warszawa)" title="Wola

In [3]:
column_names = ['Borough','Residents','Density', 'Area']
warsaw = pd.DataFrame(columns = column_names)
Borough = 0
Residents = 0
Density = 0
Area = 0
warsaw

Unnamed: 0,Borough,Residents,Density,Area


In [4]:
for tr in main_table.find_all('tr'):
    i = 0
    for td in tr.find_all('td'):
        if i == 0:
            Borough = td.text
            i = i + 1
        elif i == 1:
            Residents = td.text
            i = i + 1
        elif i == 2: 
            Density = td.text.strip('\n').replace(']','')
            i = i + 1
        elif i == 3: 
            Area = td.text.strip('\n').replace(']','')            
            
    warsaw = warsaw.append({'Borough': Borough,'Residents': Residents,'Density': Density, 'Area': Area},ignore_index=True)
warsaw

Unnamed: 0,Borough,Residents,Density,Area
0,0,0,0,0
1,Mokotów,217 683,6146,3542
2,Praga-Południe,179 836,8036,2238
3,Ursynów,150 668,3441,4379
4,Wola,140 958,7319,1926
5,Bielany,131 910,4079,3234
6,Targówek,124 279,5131,2422
7,Bemowo,123 932,4967,2495
8,Śródmieście,115 395,7411,1557
9,Białołęka,124 125,1699,7304


## Clean dataframe

In [5]:
warsaw = warsaw[warsaw.Borough!='Not assigned']
warsaw = warsaw[warsaw.Borough!= 0]
warsaw.reset_index(drop = True, inplace = True)
i = 0
for i in range(0,warsaw.shape[0]):
    if warsaw.iloc[i][2] == 'Not assigned':
        warsaw.iloc[i][2] = warsaw.iloc[i][1]
        i = i+1

In [6]:
print(warsaw.shape)

(18, 4)


## Add the latitude and longitude coordinates to the dataframe

In [7]:
def get_latlng(place):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, '+place+', Poland'.format())
        print(g)
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [8]:
Boroughs = warsaw['Borough']
coords = [ get_latlng(Borough) for Borough in Boroughs.tolist() ]

<[OK] Arcgis - Geocode [Mokotów, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Praga-Południe, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Ursynów, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Wola, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Bielany, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Targówek, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Bemowo, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Śródmieście, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Białołeka, Masovian Voivodeship]>
<[OK] Arcgis - Geocode [Ochota, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Wawer, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Praga Północ, Masovian Voivodeship]>
<[OK] Arcgis - Geocode [Ursus, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Żoliborz, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Włochy, Warszawa, Woj. Mazowieckie]>
<[OK] Arcgis - Geocode [Wilanów, Warszawa, Woj. Mazowieckie]>
<[OK] Arcg

In [9]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
warsaw['Latitude'] = df_coords['Latitude']
warsaw['Longitude'] = df_coords['Longitude']

### Check result

In [10]:
warsaw[warsaw.Borough == 'Bemowo']

Unnamed: 0,Borough,Residents,Density,Area,Latitude,Longitude
6,Bemowo,123 932,4967,2495,52.25269,20.91244


In [11]:
df = warsaw.copy()
df.head(10)

Unnamed: 0,Borough,Residents,Density,Area,Latitude,Longitude
0,Mokotów,217 683,6146,3542,52.19539,21.0085
1,Praga-Południe,179 836,8036,2238,52.23633,21.0984
2,Ursynów,150 668,3441,4379,52.15418,21.03786
3,Wola,140 958,7319,1926,52.23903,20.97123
4,Bielany,131 910,4079,3234,52.27697,20.94778
5,Targówek,124 279,5131,2422,52.27726,21.06594
6,Bemowo,123 932,4967,2495,52.25269,20.91244
7,Śródmieście,115 395,7411,1557,52.2356,21.01037
8,Białołęka,124 125,1699,7304,52.32127,20.97204
9,Ochota,82 774,8516,972,52.21314,20.97069


## Explore and cluster the places in Warsaw

In [12]:
import numpy as np
import json # library to handle JSON files

In [13]:
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

In [14]:
address = 'Warsaw'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Warsaw are {}, {}.'.format(latitude, longitude))

# create map of Toronto using latitude and longitude values
map_warsaw = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough in zip(df['Latitude'], df['Longitude'], df['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_warsaw)  
    
map_warsaw

  This is separate from the ipykernel package so we can avoid doing imports until


The geograpical coordinate of Warsaw are 52.2337172, 21.0714111288323.


## Define functions

In [15]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [17]:
# Function for most common venue
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### CREDENTIALS 

In [18]:
CLIENT_ID = 
CLIENT_SECRET = 
VERSION = "20191222"

LIMIT = 1000000
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)

## Check the result

In [19]:
results = requests.get(url).json()
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(12)

Unnamed: 0,name,categories,lat,lng
0,Il Caminetto,Italian Restaurant,52.233311,21.067531
1,Olimpia Fitness Club,Gym,52.229818,21.068551
2,Auto Reduta - Serwis BMW,Racetrack,52.23389,21.067156


## Use function to all Boroughs in Warsaw

In [20]:
warsaw_venues = getNearbyVenues(names=df['Borough'],latitudes=df['Latitude'], longitudes=df['Longitude'])

Mokotów
Praga-Południe
Ursynów
Wola
Bielany
Targówek
Bemowo
Śródmieście
Białołęka
Ochota
Wawer
Praga-Północ
Ursus
Żoliborz
Włochy
Wilanów
Wesoła
Rembertów


In [21]:
warsaw_venues.head()

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mokotów,52.19539,21.0085,Bezownia,52.192392,21.01017,Dessert Shop
1,Mokotów,52.19539,21.0085,Tam i z powrotem,52.195179,21.004044,Café
2,Mokotów,52.19539,21.0085,Efes Kebab,52.196242,21.013851,Kebab Restaurant
3,Mokotów,52.19539,21.0085,II Ogród Jordanowski,52.196446,21.014244,Playground
4,Mokotów,52.19539,21.0085,Locale,52.192363,21.003873,Italian Restaurant


In [22]:
hood_venue = warsaw_venues[['Borough', 'Venue']].copy()

## Show most venues per Borough

In [23]:
hood_venues = hood_venue.groupby(['Borough']).size().reset_index(name='Venues')
hood_venues.sort_values(by=['Venues'])
hood_venues.plot.bar(x='Borough', y='Venues', rot=90,figsize=(20,10))

<matplotlib.axes._subplots.AxesSubplot at 0xcebb748>

In [24]:
df_merge = pd.merge(df, hood_venues, on='Borough')
df_merge

Unnamed: 0,Borough,Residents,Density,Area,Latitude,Longitude,Venues
0,Mokotów,217 683,6146,3542,52.19539,21.0085,16
1,Praga-Południe,179 836,8036,2238,52.23633,21.0984,9
2,Ursynów,150 668,3441,4379,52.15418,21.03786,30
3,Wola,140 958,7319,1926,52.23903,20.97123,9
4,Bielany,131 910,4079,3234,52.27697,20.94778,7
5,Targówek,124 279,5131,2422,52.27726,21.06594,5
6,Bemowo,123 932,4967,2495,52.25269,20.91244,6
7,Śródmieście,115 395,7411,1557,52.2356,21.01037,100
8,Białołęka,124 125,1699,7304,52.32127,20.97204,4
9,Ochota,82 774,8516,972,52.21314,20.97069,19


In [25]:
# one hot encoding
warsaw_onehot = pd.get_dummies(warsaw_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
warsaw_onehot['Borough'] = df['Borough'] 

# move neighborhood column to the first column
fixed_columns = [warsaw_onehot.columns[-1]] + list(warsaw_onehot.columns[:-1])
downtown_onehot = warsaw_onehot[fixed_columns]

warsaw_onehot.head()

Unnamed: 0,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beach,Beer Bar,...,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Zoo Exhibit,Borough
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Mokotów
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Praga-Południe
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Ursynów
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Wola
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Bielany


In [26]:
warsaw_grouped = warsaw_onehot.groupby('Borough').mean().reset_index()
warsaw_grouped.head()

Unnamed: 0,Borough,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beach,...,Theater,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Zoo Exhibit
0,Bemowo,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Białołęka,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bielany,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Mokotów,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ochota,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Define 5 top venues 5 top venues of neighborhood

In [27]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Borough'] = warsaw_grouped['Borough']

for ind in np.arange(warsaw_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(warsaw_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Bemowo,Farmers Market,Zoo Exhibit,Fast Food Restaurant,Czech Restaurant,Department Store
1,Białołęka,Modern European Restaurant,Zoo Exhibit,Farmers Market,Czech Restaurant,Department Store
2,Bielany,Italian Restaurant,Zoo Exhibit,Fast Food Restaurant,Czech Restaurant,Department Store
3,Mokotów,Dessert Shop,Zoo Exhibit,Fast Food Restaurant,Czech Restaurant,Department Store
4,Ochota,Café,Zoo Exhibit,Creperie,Czech Restaurant,Department Store


In [28]:
warsaw_grouped.shape

(18, 119)

In [29]:
warsaw_venues.shape

(334, 7)

## Cluster Neighborhoods

In [30]:
# set number of clusters
kclusters = 5

warsaw_grouped_clustering = warsaw_grouped.drop('Borough', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(warsaw_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 3, 0, 2, 2, 0, 4, 2, 0])

In [31]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

warsaw_merged = df

#add latitude/longitude for each neighborhood
warsaw_merged = warsaw_merged.join(neighborhoods_venues_sorted.set_index('Borough'), on='Borough')

In [32]:
warsaw_merged.head()

Unnamed: 0,Borough,Residents,Density,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Mokotów,217 683,6146,3542,52.19539,21.0085,0,Dessert Shop,Zoo Exhibit,Fast Food Restaurant,Czech Restaurant,Department Store
1,Praga-Południe,179 836,8036,2238,52.23633,21.0984,2,Café,Zoo Exhibit,Creperie,Czech Restaurant,Department Store
2,Ursynów,150 668,3441,4379,52.15418,21.03786,0,Kebab Restaurant,Zoo Exhibit,Fast Food Restaurant,Czech Restaurant,Department Store
3,Wola,140 958,7319,1926,52.23903,20.97123,0,Playground,Zoo Exhibit,Convenience Store,Cupcake Shop,Czech Restaurant
4,Bielany,131 910,4079,3234,52.27697,20.94778,3,Italian Restaurant,Zoo Exhibit,Fast Food Restaurant,Czech Restaurant,Department Store


In [33]:
most_common_in_cluster = warsaw_merged[['Cluster Labels', '1st Most Common Venue']].copy()
most_common_in_cluster = most_common_in_cluster.groupby(['Cluster Labels', '1st Most Common Venue' ]).size().reset_index(name='Venues')
most_common_in_cluster

Unnamed: 0,Cluster Labels,1st Most Common Venue,Venues
0,0,Bistro,1
1,0,Cupcake Shop,1
2,0,Dessert Shop,1
3,0,Farmers Market,1
4,0,Grocery Store,1
5,0,Indian Restaurant,1
6,0,Kebab Restaurant,1
7,0,Modern European Restaurant,1
8,0,Park,1
9,0,Playground,1


## Name clusters

In [34]:
most_common_in_cluster

Unnamed: 0,Cluster Labels,1st Most Common Venue,Venues
0,0,Bistro,1
1,0,Cupcake Shop,1
2,0,Dessert Shop,1
3,0,Farmers Market,1
4,0,Grocery Store,1
5,0,Indian Restaurant,1
6,0,Kebab Restaurant,1
7,0,Modern European Restaurant,1
8,0,Park,1
9,0,Playground,1


In [35]:
#Giving cluster names after analyzing data
cluster_name = {}
cluster_name[0] = "Free time"
cluster_name[1] = "Transport"
cluster_name[2] = "Cafe"
cluster_name[3] = "Italian food"
cluster_name[4] = "Sport"

## Results and discussion

In [36]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, den in zip(warsaw_merged['Latitude'], warsaw_merged['Longitude'], warsaw_merged['Borough'], warsaw_merged['Cluster Labels'],warsaw_merged['Density']):
    label = folium.Popup(str(poi) + ' Most of venue type: ' + str(cluster_name[cluster]) + '. Density: ' + str(den), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [37]:
warsaw_merged.set_index("Borough", inplace=True)
warsaw_merged.head()

Unnamed: 0_level_0,Residents,Density,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
Mokotów,217 683,6146,3542,52.19539,21.0085,0,Dessert Shop,Zoo Exhibit,Fast Food Restaurant,Czech Restaurant,Department Store
Praga-Południe,179 836,8036,2238,52.23633,21.0984,2,Café,Zoo Exhibit,Creperie,Czech Restaurant,Department Store
Ursynów,150 668,3441,4379,52.15418,21.03786,0,Kebab Restaurant,Zoo Exhibit,Fast Food Restaurant,Czech Restaurant,Department Store
Wola,140 958,7319,1926,52.23903,20.97123,0,Playground,Zoo Exhibit,Convenience Store,Cupcake Shop,Czech Restaurant
Bielany,131 910,4079,3234,52.27697,20.94778,3,Italian Restaurant,Zoo Exhibit,Fast Food Restaurant,Czech Restaurant,Department Store


In [38]:
warsaw_merged.loc['Ursynów']

Residents                             150 668
Density                                  3441
Area                                    43,79
Latitude                              52.1542
Longitude                             21.0379
Cluster Labels                              0
1st Most Common Venue        Kebab Restaurant
2nd Most Common Venue             Zoo Exhibit
3rd Most Common Venue    Fast Food Restaurant
4th Most Common Venue        Czech Restaurant
5th Most Common Venue        Department Store
Name: Ursynów, dtype: object

## Discussion

Warsaw is a big city, but for effective data analysis we would need even more places in Foursquare, which unfortunately is not enough. Many districts have a small number of different types of places. By adjusting the k-medium grouping, we could get even better results. We can distinguish a large cluster of districts where people can spend their free time.

## Conclusion

This kind of data analysis can be helpful to choose the neighborhood you want to live in. Data should be used from the application, and the Python code for this analysis should be provided as a microservice.

<i>Thank you.

Robert Kowalczyk<i/>

Created For: COURSERA IBM Applied Data Science Capstone Project