# Prague subway surroundings analysis

<b>First we import all required libraries. </b>

In [128]:
import pandas as pd
import numpy as np

import json
import requests
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt 
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline 

from bs4 import BeautifulSoup

print('All libraries are imported!')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

All libraries are imported!


<b>Let's define our credentials. We will use the most recent version of the Foursquare API.</b>

In [129]:
CLIENT_ID = 'ZC05O2WH3LQIQ2TECENIZJU1I24QKUQ432QP2XZSQQPY3RME' 
CLIENT_SECRET = 'G34BK2ZPI3NUEWE0UDJKQ2HXZMLCOGHCEWURQCO3CDAA3KUF' 
VERSION = '20200525' 

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentials:
CLIENT_ID: ZC05O2WH3LQIQ2TECENIZJU1I24QKUQ432QP2XZSQQPY3RME
CLIENT_SECRET:G34BK2ZPI3NUEWE0UDJKQ2HXZMLCOGHCEWURQCO3CDAA3KUF
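
Hard-coding and printing API secrets in a notebook makes them easy to leak. A minimal sketch of an alternative, assuming the credentials are exported as environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical, not part of the original notebook:

```python
import os


def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables (hypothetical names)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters so the full secret never reaches the output."""
    return secret[:visible] + '*' * (len(secret) - visible)


# Example with a stand-in mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456'}
demo_id, demo_secret = load_credentials(demo_env)
print('CLIENT_ID:', mask(demo_id))
```

In a real session `load_credentials()` would be called without arguments so that it reads from `os.environ`.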


<b>Since our place of interest is Prague, let's obtain its coordinates and create a map of Prague:</b>

In [130]:
address = 'Prague'
geolocator = Nominatim(user_agent='foursquare_agent')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Coordinates of Prague are: ', latitude, longitude)

Coordinates of Prague are:  50.0874654 14.4212535


In [131]:
map_prague = folium.Map(location=[latitude, longitude], zoom_start=12, min_zoom=10, max_zoom=15)
map_prague

It is quite small, isn't it? Yes, compared to New York and London, Prague is tiny.\
However, it has its own subway system.\
\
<i>Since I'm not American, I'd prefer to use the international word 'metro' instead of 'subway' or 'underground' in my further description of the process.</i>

### 1. Get and process the data

<b>We will need the list of Prague metro stations.</b>\
We can easily access it on Wikipedia using BeautifulSoup.

In [132]:
url = 'https://en.wikipedia.org/wiki/List_of_Prague_Metro_stations'
page = requests.get(url)

In [133]:
soup = BeautifulSoup(page.content, 'html.parser')
tbl = soup.find("table")

In [134]:
df_metro = pd.read_html(str(tbl))[0]
df_metro.head(10)

Unnamed: 0,Name,Photo,District,Named after,Line,Opened,Notes
0,Anděl(Moskevská),,Smíchov,"a building named ""U zlatého anděla"" (""At a gol...",B,"November 2, 1985","formerly Moskevská, after Moscow"
1,Bořislavka,,Červený vrch,the surrounding suburb,A,"April 6, 2015",
2,Budějovická,,Krč,the nearby square,C,"May 9, 1974",
3,Černý Most,,Černý Most,the surrounding district,B,"November 8, 1998",means Black Bridge in EnglishLocated above-gro...
4,Českomoravská,,Vysočany,—,B,"November 22, 1990",the original planned name was Zápotockého (or ...
5,Chodov (Budovatelů),,Jižní Město,the surrounding suburb,C,"November 11, 1980",formerly Budovatelů
6,Dejvická (Leninova),,Dejvice,the surrounding suburb,A,"August 12, 1978","formerly Leninova, after Vladimir Lenin"
7,Depo Hostivař,,Strašnice,the metro depot,A,"May 26, 2006",
8,Flora,,"Vinohrady, Žižkov",—,A,"December 19, 1980",
9,Florenc (Sokolovská),,Karlín,a nearby intersection,BC,"May 9, 1974 CNovember 2, 1985 B",formerly Sokolovskálocated at the central bus ...


As you can see, this list has a lot of information.\
Let's make it more useful for us.

<b>First, let's drop all columns that we don't need:</b>

In [135]:
df_metro.drop(['Photo', 'Named after', 'Opened', 'Notes'], axis=1, inplace=True)
df_metro.head()

Unnamed: 0,Name,District,Line
0,Anděl(Moskevská),Smíchov,B
1,Bořislavka,Červený vrch,A
2,Budějovická,Krč,C
3,Černý Most,Černý Most,B
4,Českomoravská,Vysočany,B


In [136]:
df_metro.shape

(58, 3)

Now we have a list of 58 stations, the districts they belong to, and the letter that specifies the metro line.\
<i>Small hint: Prague has only 3 metro lines: A, B, and C.</i>\
\
However, some of the stations are listed with their old names (the ones they had during the Soviet era).\
It's not convenient for us, since the geocoding service will not be able to find them.

<b>Let's get rid of the old names of the stations:</b>

In [137]:
df_metro['Name']=df_metro['Name'].apply(lambda x: str(x).split('(')[0])
df_metro.head()

Unnamed: 0,Name,District,Line
0,Anděl,Smíchov,B
1,Bořislavka,Červený vrch,A
2,Budějovická,Krč,C
3,Černý Most,Černý Most,B
4,Českomoravská,Vysočany,B
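
One detail of the split above: it keeps the space that preceded the parenthesis, so a name such as `Chodov (Budovatelů)` becomes `'Chodov '` with a trailing space. A small sketch of a whitespace-safe variant; `clean_station_name` is a hypothetical helper, not part of the original notebook:

```python
def clean_station_name(raw):
    """Drop a parenthesised former name and any surrounding whitespace."""
    return str(raw).split('(')[0].strip()


samples = ['Anděl(Moskevská)', 'Chodov (Budovatelů)', 'Bořislavka']
for raw in samples:
    # Left: plain split as in the cell above; right: split plus strip
    print(repr(str(raw).split('(')[0]), '->', repr(clean_station_name(raw)))
```

With pandas, the helper could be applied as `df_metro['Name'].apply(clean_station_name)`.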


And since some stations are named after the districts they are located in, let's specify that each one is really a station, not a district:

In [138]:
df_metro[['Name']]=df_metro[['Name']] + ' ' + '(metro), Prague'
df_metro.head()

Unnamed: 0,Name,District,Line
0,"Anděl (metro), Prague",Smíchov,B
1,"Bořislavka (metro), Prague",Červený vrch,A
2,"Budějovická (metro), Prague",Krč,C
3,"Černý Most (metro), Prague",Černý Most,B
4,"Českomoravská (metro), Prague",Vysočany,B


<b>Now let's obtain geocoordinates of our stations.</b>\
First, let's create a new empty dataframe to store our new data:

In [139]:
column_names = ['Name', 'Latitude', 'Longitude'] 
df_latlng = pd.DataFrame(columns=column_names)
df_latlng

Unnamed: 0,Name,Latitude,Longitude


Second, we write a loop that will look up the geocoordinates with the Nominatim geocoder (via geopy) and store them in the new dataframe:

In [140]:
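# Create the geocoder once instead of on every iteration
geolocator = Nominatim(user_agent='foursquare_agent')

rows = []
for name in zip(df_metro['Name']):  # zip() yields 1-tuples, hence the tuple-style names in the output
    location = geolocator.geocode(name)
    if location is None:
        lat, lng = 'NO LATITUDE', 'NO LONGITUDE'
    else:
        lat, lng = location.latitude, location.longitude
    # lat/lng are used here so the Prague-wide latitude/longitude stay intact for later maps
    rows.append({'Name': name, 'Latitude': lat, 'Longitude': lng})

# Build the frame in one step (DataFrame.append was removed in pandas 2.0)
df_latlng = pd.DataFrame(rows, columns=column_names)

df_latlng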

Unnamed: 0,Name,Latitude,Longitude
0,"(Anděl (metro), Prague,)",50.0685,14.4041
1,"(Bořislavka (metro), Prague,)",50.0986,14.3645
2,"(Budějovická (metro), Prague,)",50.044,14.4494
3,"(Černý Most (metro), Prague,)",50.1091,14.5776
4,"(Českomoravská (metro), Prague,)",50.1066,14.493
5,"(Chodov (metro), Prague,)",50.0309,14.4913
6,"(Dejvická (metro), Prague,)",50.1002,14.3917
7,"(Depo Hostivař (metro), Prague,)",NO LATITUDE,NO LONGITUDE
8,"(Flora (metro), Prague,)",50.0777,14.4613
9,"(Florenc (metro), Prague,)",50.0911,14.4397
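
The lookup loop can also be packaged as a small helper that collects rows in a list and keeps missing results explicit. This is only a sketch: `geocode_stations` is a hypothetical name, and `fake_geocode` stands in for `geolocator.geocode` so that the example runs without network access. The `delay` parameter reflects the assumption that a public geocoding service should not be queried in a tight loop.

```python
import time
from collections import namedtuple

# Stand-in for the result object returned by a real geocoder
Location = namedtuple('Location', ['latitude', 'longitude'])


def geocode_stations(names, geocode, delay=0.0):
    """Return (rows, missing): one dict per name, plus the names that were not found."""
    rows, missing = [], []
    for name in names:
        location = geocode(name)
        if location is None:
            missing.append(name)
            rows.append({'Name': name, 'Latitude': None, 'Longitude': None})
        else:
            rows.append({'Name': name,
                         'Latitude': location.latitude,
                         'Longitude': location.longitude})
        time.sleep(delay)  # pause between requests; 0 is fine for an offline stub
    return rows, missing


def fake_geocode(query):
    """Offline stub: knows one station and returns None for everything else."""
    known = {'Anděl (metro), Prague': Location(50.0685, 14.4041)}
    return known.get(query)


rows, missing = geocode_stations(
    ['Anděl (metro), Prague', 'Depo Hostivař (metro), Prague'], fake_geocode)
print(missing)  # names that still need coordinates
```

With geopy, `geocode` would be `geolocator.geocode`, and the collected rows can be passed straight to `pd.DataFrame(rows)`.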


Look! Some coordinates were not found! Let's add them manually (I used data from corresponding pages of Wikipedia):

In [141]:
# .loc avoids chained assignment, which pandas may apply to a temporary copy
df_latlng.loc[7, ['Latitude', 'Longitude']] = [50.0758, 14.5155]
df_latlng.loc[51, ['Latitude', 'Longitude']] = [50.0727, 14.4911]
df_latlng.head(10)

Unnamed: 0,Name,Latitude,Longitude
0,"(Anděl (metro), Prague,)",50.0685,14.4041
1,"(Bořislavka (metro), Prague,)",50.0986,14.3645
2,"(Budějovická (metro), Prague,)",50.044,14.4494
3,"(Černý Most (metro), Prague,)",50.1091,14.5776
4,"(Českomoravská (metro), Prague,)",50.1066,14.493
5,"(Chodov (metro), Prague,)",50.0309,14.4913
6,"(Dejvická (metro), Prague,)",50.1002,14.3917
7,"(Depo Hostivař (metro), Prague,)",50.0758,14.5155
8,"(Flora (metro), Prague,)",50.0777,14.4613
9,"(Florenc (metro), Prague,)",50.0911,14.4397
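
Patching rows by hard-coded position works, but it silently depends on row order. A sketch of looking the rows up by the placeholder value instead, on a toy frame; the `manual_coords` dictionary is a hypothetical structure, with the Depo Hostivař coordinates taken from the cell above:

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['Flora (metro), Prague', 'Depo Hostivař (metro), Prague'],
    'Latitude': [50.0777, 'NO LATITUDE'],
    'Longitude': [14.4613, 'NO LONGITUDE'],
})

# Find the rows where geocoding failed
missing_mask = toy['Latitude'] == 'NO LATITUDE'
print(toy.loc[missing_mask, 'Name'].tolist())

# Fill them in by name with .loc
manual_coords = {'Depo Hostivař (metro), Prague': (50.0758, 14.5155)}
for name, (lat, lng) in manual_coords.items():
    toy.loc[toy['Name'] == name, ['Latitude', 'Longitude']] = [lat, lng]

# Convert both columns to real numbers once nothing is missing
toy[['Latitude', 'Longitude']] = toy[['Latitude', 'Longitude']].astype(float)
print(toy)
```

Converting to floats at this point would also make the later `float(lat)` calls unnecessary.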


<b>Now let's unite both dataframes to have the full data:</b>

In [142]:
column_names = ['Name', 'District', 'Line', 'Latitude', 'Longitude'] 
df_metro_latlng = pd.DataFrame(columns=column_names)
df_metro_latlng

Unnamed: 0,Name,District,Line,Latitude,Longitude


In [143]:
df_metro_latlng['Name'] = df_metro['Name']
df_metro_latlng['District'] = df_metro['District']
df_metro_latlng['Line'] = df_metro['Line']
df_metro_latlng['Latitude'] = df_latlng['Latitude']
df_metro_latlng['Longitude'] = df_latlng['Longitude']
df_metro = df_metro_latlng
df_metro.head(10)

Unnamed: 0,Name,District,Line,Latitude,Longitude
0,"Anděl (metro), Prague",Smíchov,B,50.0685,14.4041
1,"Bořislavka (metro), Prague",Červený vrch,A,50.0986,14.3645
2,"Budějovická (metro), Prague",Krč,C,50.044,14.4494
3,"Černý Most (metro), Prague",Černý Most,B,50.1091,14.5776
4,"Českomoravská (metro), Prague",Vysočany,B,50.1066,14.493
5,"Chodov (metro), Prague",Jižní Město,C,50.0309,14.4913
6,"Dejvická (metro), Prague",Dejvice,A,50.1002,14.3917
7,"Depo Hostivař (metro), Prague",Strašnice,A,50.0758,14.5155
8,"Flora (metro), Prague","Vinohrady, Žižkov",A,50.0777,14.4613
9,"Florenc (metro), Prague",Karlín,BC,50.0911,14.4397


It seems that some stations belong to more than one district.\
This means that from these stations you can reach both districts within 5 minutes.\
So let's keep just the first district for each station:

In [144]:
df_metro['District']=df_metro['District'].apply(lambda x: str(x).split(',')[0])
df_metro.head(10)

Unnamed: 0,Name,District,Line,Latitude,Longitude
0,"Anděl (metro), Prague",Smíchov,B,50.0685,14.4041
1,"Bořislavka (metro), Prague",Červený vrch,A,50.0986,14.3645
2,"Budějovická (metro), Prague",Krč,C,50.044,14.4494
3,"Černý Most (metro), Prague",Černý Most,B,50.1091,14.5776
4,"Českomoravská (metro), Prague",Vysočany,B,50.1066,14.493
5,"Chodov (metro), Prague",Jižní Město,C,50.0309,14.4913
6,"Dejvická (metro), Prague",Dejvice,A,50.1002,14.3917
7,"Depo Hostivař (metro), Prague",Strašnice,A,50.0758,14.5155
8,"Flora (metro), Prague",Vinohrady,A,50.0777,14.4613
9,"Florenc (metro), Prague",Karlín,BC,50.0911,14.4397


<b>Prague metro map.</b>\
Now we can create a Prague metro map and paint different lines with different colors: 

In [145]:
for lat, lng, label, line in zip(df_metro['Latitude'], df_metro['Longitude'], df_metro['Name'], df_metro['Line']):
    label = folium.Popup(label, parse_html=True)
    if line == 'A' or line == 'AB':
        linecolor = 'green'
    elif line == 'B':
        linecolor = 'yellow'
    else:
        linecolor = 'red'
    folium.CircleMarker(
        [float(lat), float(lng)],
        radius=5,
        popup=label,
        color=linecolor,
        fill=True,
        fill_color='white',
        fill_opacity=0.7,
        parse_html=False).add_to(map_prague)  
    
map_prague
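
In the `if`/`elif` chain above, a transfer station coded `AB` is drawn green, while `AC` and `BC` fall through to the final `else` and are drawn red. A sketch of a lookup-based alternative that colors transfer stations separately; the `line_color` helper and the choice of black for transfers are assumptions for illustration, not part of the original notebook:

```python
LINE_COLORS = {'A': 'green', 'B': 'yellow', 'C': 'red'}


def line_color(line, transfer_color='black'):
    """Return the marker color for a line code such as 'A' or 'BC'."""
    line = str(line).strip()
    if len(line) > 1:
        return transfer_color             # station served by more than one line
    return LINE_COLORS.get(line, 'gray')  # gray for unexpected codes


for code in ['A', 'B', 'C', 'BC']:
    print(code, '->', line_color(code))
```

Inside the map loop, `linecolor = line_color(line)` would replace the branching.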

Since it looks just like the real city metro map, we can say that our previous work was correct.\
<i>Don't be misled by the red dot right in the path of the yellow line - yes, this is how Prague stations are really located, no mistake.</i>\
Also, using this map we can always decide which part of Prague we prefer geographically when choosing an apartment to rent.

<b>And we can also see which districts of Prague are the most "metroed":</b>

In [146]:
df_metro.groupby('District').count().sort_values('Line', ascending=False).head(6)

Unnamed: 0_level_0,Name,Line,Latitude,Longitude
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
New Town,6,6,6,6
Jižní Město,4,4,4,4
Vinohrady,4,4,4,4
Nusle,3,3,3,3
Karlín,3,3,3,3
Vysočany,3,3,3,3


### 2. Explore nearby venues

Now we can explore what we have around each station. \
\
<b>First, let's define a function that helps us get our nearby venues:</b>

In [147]:
def getNearbyVenues(names, latitudes, longitudes, radius=300, limit=50):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Name', 
                  'Station Latitude', 
                  'Station Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
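
The request URL in `getNearbyVenues` is assembled with string formatting. A sketch of building the same query string with the standard library, which also takes care of escaping; `build_explore_url` is a hypothetical helper, and the endpoint and parameter names are simply those used in the function above:

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'  # endpoint from the function above


def build_explore_url(client_id, client_secret, version, lat, lng, radius=300, limit=50):
    """Return the explore URL for one station."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes special characters, e.g. the comma inside 'll'
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


print(build_explore_url('ID', 'SECRET', '20200525', 50.0685, 14.4041))
```

With `requests`, the same dictionary could instead be passed as `requests.get(EXPLORE_ENDPOINT, params=params)`.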

<b>Second, we use it.</b>\
Our limit is only 50 venues and our radius is 300 meters, since, as I told you, Prague is tiny.\
<i>The closest distance between two stations is only 300 meters (Muzeum and Hlavní nádraží). So 500 meters is really a lot.</i>
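
To get a feel for what a 300-meter radius covers, the great-circle distance between two stations can be estimated with the haversine formula. A sketch using coordinates of Flora and Florenc from the table above; `haversine_m` is a hypothetical helper, and the Earth radius of 6,371 km is the usual mean-radius approximation:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters, an approximation


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


flora = (50.0777, 14.4613)
florenc = (50.0911, 14.4397)
print(round(haversine_m(*flora, *florenc)), 'meters between Flora and Florenc')
```

Applying the same function to every pair of stations would show how much neighboring search circles overlap for a given radius.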

In [148]:
limit=50

prague_venues = getNearbyVenues(names=df_metro['Name'],
                                   latitudes=df_metro['Latitude'],
                                   longitudes=df_metro['Longitude']
                                  )
prague_venues

Anděl (metro), Prague
Bořislavka (metro), Prague
Budějovická (metro), Prague
Černý Most (metro), Prague
Českomoravská (metro), Prague
Chodov  (metro), Prague
Dejvická  (metro), Prague
Depo Hostivař (metro), Prague
Flora (metro), Prague
Florenc  (metro), Prague
Háje  (metro), Prague
Hlavní nádraží (metro), Prague
Hloubětín (metro), Prague
Hradčanská (metro), Prague
Hůrka (metro), Prague
I.P.Pavlova (metro), Prague
Invalidovna (metro), Prague
Jinonice (metro), Prague
Jiřího z Poděbrad (metro), Prague
Kačerov (metro), Prague
Karlovo náměstí (metro), Prague
Kobylisy (metro), Prague
Kolbenova (metro), Prague
Křižíkova (metro), Prague
Ládví (metro), Prague
Letňany (metro), Prague
Luka (metro), Prague
Lužiny (metro), Prague
Malostranská (metro), Prague
Můstek (metro), Prague
Muzeum (metro), Prague
Nádraží Holešovice  (metro), Prague
Nádraží Veleslavín (metro), Prague
Náměstí Míru (metro), Prague
Náměstí Republiky (metro), Prague
Národní třída (metro), Prague
Nemocnice Motol (metro), Prague
No

Unnamed: 0,Name,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Anděl (metro), Prague",50.0685,14.4041,Taro,50.069704,14.405722,Vietnamese Restaurant
1,"Anděl (metro), Prague",50.0685,14.4041,Kavárna co hledá jméno,50.069694,14.403952,Café
2,"Anděl (metro), Prague",50.0685,14.4041,Pauwel Kwak Bierhuis,50.069041,14.405920,Beer Bar
3,"Anděl (metro), Prague",50.0685,14.4041,Dům jógy,50.068954,14.402089,Yoga Studio
4,"Anděl (metro), Prague",50.0685,14.4041,Pastva,50.069794,14.405476,Vegetarian / Vegan Restaurant
...,...,...,...,...,...,...,...
1376,"Zličín (metro), Prague",50.0538,14.2908,Cafe Livello,50.053730,14.287311,Café
1377,"Zličín (metro), Prague",50.0538,14.2908,Celio,50.053844,14.287279,Men's Store
1378,"Zličín (metro), Prague",50.0538,14.2908,Humanic,50.053703,14.288791,Shoe Store
1379,"Zličín (metro), Prague",50.0538,14.2908,Fruitisimo,50.053571,14.286947,Juice Bar


<b>Let's group our venues by station:</b>

In [149]:
prague_venues.groupby('Name').count().sort_values('Venue', ascending=False).head(10)

Unnamed: 0_level_0,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Národní třída (metro), Prague",50,50,50,50,50,50
"Náměstí Republiky (metro), Prague",50,50,50,50,50,50
"Jiřího z Poděbrad (metro), Prague",50,50,50,50,50,50
"Staroměstská (metro), Prague",50,50,50,50,50,50
"Můstek (metro), Prague",50,50,50,50,50,50
"Náměstí Míru (metro), Prague",50,50,50,50,50,50
"Karlovo náměstí (metro), Prague",50,50,50,50,50,50
"Pankrác (metro), Prague",50,50,50,50,50,50
"Chodov (metro), Prague",50,50,50,50,50,50
"Flora (metro), Prague",49,49,49,49,49,49


In [150]:
print('There are {} unique categories.'.format(len(prague_venues['Venue Category'].unique())))

There are 244 unique categories.


It's no surprise that central stations have a lot of venues.\
<i>But, of course, not everyone can afford to rent in the very center - in this respect, Prague is no different from other capital cities.</i>\
\
<b>Let's process our data further to find out which types of venues are most common for each station:</b>

In [151]:
prague_onehot = pd.get_dummies(prague_venues[['Venue Category']], prefix="", prefix_sep="")

prague_onehot['Name'] = prague_venues['Name'] 

fixed_columns = [prague_onehot.columns[-1]] + list(prague_onehot.columns[:-1])
prague_onehot = prague_onehot[fixed_columns]

prague_onehot.head()

Unnamed: 0,Name,ATM,Accessories Store,American Restaurant,Antique Shop,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Vineyard,Volleyball Court,Warehouse Store,Watch Shop,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,"Anděl (metro), Prague",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Anděl (metro), Prague",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Anděl (metro), Prague",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Anděl (metro), Prague",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
4,"Anděl (metro), Prague",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [152]:
prague_onehot.shape

(1381, 245)

In [153]:
prague_grouped = prague_onehot.groupby('Name').mean().reset_index()
prague_grouped

Unnamed: 0,Name,ATM,Accessories Store,American Restaurant,Antique Shop,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Vineyard,Volleyball Court,Warehouse Store,Watch Shop,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,"Anděl (metro), Prague",0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0
1,"Bořislavka (metro), Prague",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Budějovická (metro), Prague",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Chodov (metro), Prague",0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02
4,"Dejvická (metro), Prague",0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Depo Hostivař (metro), Prague",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Flora (metro), Prague",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0
7,"Florenc (metro), Prague",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,...,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0
8,"Hlavní nádraží (metro), Prague",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0
9,"Hloubětín (metro), Prague",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [154]:
prague_grouped.shape

(58, 245)
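
Why the mean? Each row of the one-hot table marks one venue with a 1 in its category column, so averaging per station gives the share of that station's venues in each category. A toy sketch with made-up venues (the station names `S1` and `S2` are placeholders):

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Name': ['S1', 'S1', 'S1', 'S1', 'S2', 'S2'],
    'Venue Category': ['Café', 'Café', 'Pub', 'Hotel', 'Pub', 'Pub'],
})

# One column per category, one row per venue
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']],
                            prefix='', prefix_sep='', dtype=int)
toy_onehot.insert(0, 'Name', toy_venues['Name'])

# Mean of 0/1 indicators per station = relative frequency of each category
toy_grouped = toy_onehot.groupby('Name').mean().reset_index()
print(toy_grouped)
```

Because each venue has exactly one category, every station's row sums to 1.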

<b>Now we can list the top 5 venue categories for each station:</b>

In [155]:
for element in prague_grouped['Name']:
    print("----"+element+"----")
    temp = prague_grouped[prague_grouped['Name'] == element].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(5))
    print('\n')

----Anděl (metro), Prague----
                   venue  freq
0                  Hotel  0.12
1                    Pub  0.09
2  Vietnamese Restaurant  0.09
3                   Café  0.06
4                Brewery  0.03


----Bořislavka (metro), Prague----
              venue  freq
0       Pizza Place  0.18
1  Czech Restaurant  0.09
2         Newsstand  0.09
3          Pharmacy  0.09
4     Metro Station  0.09


----Budějovická (metro), Prague----
               venue  freq
0               Café  0.08
1     Clothing Store  0.08
2  Electronics Store  0.08
3   Czech Restaurant  0.06
4          Drugstore  0.06


----Chodov  (metro), Prague----
                    venue  freq
0          Clothing Store  0.12
1             Coffee Shop  0.10
2          Cosmetics Shop  0.06
3  Furniture / Home Store  0.04
4                  Bakery  0.04


----Dejvická  (metro), Prague----
         venue  freq
0          ATM  0.08
1        Hotel  0.08
2  Coffee Shop  0.08
3        Plaza  0.04
4    Cafeteria  0.04


-

<i>Looking at it, one could already say where they'd like to live, and which places look... not so attractive.\
Just compare Depo Hostivař and, for example, Flora.</i>

<b>Let's create a dataframe with top 10 most common venues for each station:</b>

In [156]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [157]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

columns = ['Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

prague_venues_sorted = pd.DataFrame(columns=columns)
prague_venues_sorted['Name'] = prague_grouped['Name']

for ind in np.arange(prague_grouped.shape[0]):
    prague_venues_sorted.iloc[ind, 1:] = return_most_common_venues(prague_grouped.iloc[ind, :], num_top_venues)

prague_venues_sorted.head()

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Anděl (metro), Prague",Hotel,Vietnamese Restaurant,Pub,Café,Cocktail Bar,Sushi Restaurant,Other Nightlife,Bike Shop,Gastropub,Beer Store
1,"Bořislavka (metro), Prague",Pizza Place,Insurance Office,Pharmacy,Czech Restaurant,Bus Stop,Café,Metro Station,Bakery,Newsstand,Dessert Shop
2,"Budějovická (metro), Prague",Electronics Store,Clothing Store,Café,Drugstore,Czech Restaurant,Bakery,Hotel Bar,Health Food Store,Bookstore,Fast Food Restaurant
3,"Chodov (metro), Prague",Clothing Store,Coffee Shop,Cosmetics Shop,Shoe Store,Bakery,Gift Shop,Salad Place,Furniture / Home Store,Bookstore,Boutique
4,"Dejvická (metro), Prague",ATM,Hotel,Coffee Shop,Café,Electronics Store,Gourmet Shop,Bookstore,Drugstore,Food Stand,Tram Station
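
The column labels above rely on an exception to switch from 'st', 'nd', 'rd' to 'th'. A sketch of an explicit helper that also handles numbers such as 11 or 22; `ordinal` is a hypothetical name, not part of the original notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th, ...
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Name'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], '|', columns[4], '|', columns[10])
```

For `num_top_venues = 10` this produces the same labels as the loop above.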


<i>So, even looking at the first 5 stations we can - probably - notice that Anděl and Dejvická look more like stations where you'd rather get together with your friends, and Budějovická looks more like a station you'd want to go home to after a long business day, since it's highly likely that no bar visitors will disturb your sleep at night.</i>

### 3. Clustering

Now it's time to cluster our data.\
<b>We define the number of clusters and prepare our data for clustering:</b>

In [158]:
k_number = 5
prague_grouped_clustering = prague_grouped.drop(columns='Name')

kmeans = KMeans(n_clusters=k_number, random_state=0).fit(prague_grouped_clustering)

kmeans.labels_[0:10] 

array([1, 3, 1, 1, 1, 0, 1, 1, 1, 3])
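
The value `k_number = 5` is chosen by hand. One common way to sanity-check such a choice is to compare the within-cluster sum of squares (inertia) for several values of k. A sketch on synthetic data, assuming scikit-learn is available as in the rest of the notebook; the random matrix `X` is only a stand-in for `prague_grouped_clustering`:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# Stand-in data: three loose groups of points in a 4-dimensional feature space
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 4)) for c in (0.0, 1.0, 2.0)])

inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 2))
```

A visible bend (an "elbow") in these values suggests a reasonable k; on the real venue frequencies the bend may be much less pronounced.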

In [159]:
prague_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
prague_merged = df_metro

In [160]:
prague_merged = prague_merged.join(prague_venues_sorted.set_index('Name'), on='Name')
prague_merged.head()

Unnamed: 0,Name,District,Line,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Anděl (metro), Prague",Smíchov,B,50.0685,14.4041,1,Hotel,Vietnamese Restaurant,Pub,Café,Cocktail Bar,Sushi Restaurant,Other Nightlife,Bike Shop,Gastropub,Beer Store
1,"Bořislavka (metro), Prague",Červený vrch,A,50.0986,14.3645,3,Pizza Place,Insurance Office,Pharmacy,Czech Restaurant,Bus Stop,Café,Metro Station,Bakery,Newsstand,Dessert Shop
2,"Budějovická (metro), Prague",Krč,C,50.044,14.4494,1,Electronics Store,Clothing Store,Café,Drugstore,Czech Restaurant,Bakery,Hotel Bar,Health Food Store,Bookstore,Fast Food Restaurant
3,"Černý Most (metro), Prague",Černý Most,B,50.1091,14.5776,1,Smoke Shop,Italian Restaurant,Convenience Store,Gourmet Shop,Warehouse Store,Pet Store,Clothing Store,Zoo Exhibit,Exhibit,Farmers Market
4,"Českomoravská (metro), Prague",Vysočany,B,50.1066,14.493,3,Bridal Shop,Gym / Fitness Center,Czech Restaurant,Pizza Place,Restaurant,Roof Deck,Organic Grocery,Buffet,Music Store,Tea Room


<b>Now we can finally show our clusters on the map:</b>

In [161]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(k_number)
ys = [i + x + (i*x)**2 for i in range(k_number)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(prague_merged['Latitude'], prague_merged['Longitude'], prague_merged['Name'], prague_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lon)],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7
    ).add_to(map_clusters)
       
map_clusters
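
In the cell above the color is looked up with `rainbow[cluster-1]`, so cluster 0 wraps around to the last color. Every cluster still gets its own color, but indexing by the label itself is easier to follow. A standard-library sketch of building one hex color per label; `cluster_palette` is a hypothetical helper based on `colorsys`, not the matplotlib colormap used above:

```python
import colorsys


def cluster_palette(k):
    """Return k visually distinct hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)  # evenly spaced hues
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(r * 255), int(g * 255), int(b * 255)))
    return palette


palette = cluster_palette(5)
for label in range(5):
    print(label, palette[label])  # color for cluster `label`
```

In the map loop this would correspond to `color=palette[cluster]`.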

It's not obvious yet, but judging by the colors, you don't need to pay extra to live near a station surrounded by a lot of interesting places. So let's explore it!

### 4. Explore the clusters

<b>Cluster 1</b>\
It seems that here we have really quiet places.\
This is actually good for customers who work a lot, party downtown, and don't want to pay a lot for rent.

In [162]:
prague_merged.loc[prague_merged['Cluster Labels'] == 0, prague_merged.columns[[0] + list(range(5, prague_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,"Depo Hostivař (metro), Prague",0,Bus Stop,Bakery,Metro Station,Diner,Tram Station,Food & Drink Shop,Fast Food Restaurant,Fish Market,Flower Shop,Zoo Exhibit
17,"Jinonice (metro), Prague",0,Supermarket,Bus Stop,Gym / Fitness Center,Salad Place,Bakery,Auto Garage,Cafeteria,Coffee Shop,Hotel,Sushi Restaurant
25,"Letňany (metro), Prague",0,Bakery,Bus Stop,Wine Shop,Convenience Store,Coffee Shop,Farmers Market,Fast Food Restaurant,Fish Market,Flower Shop,Food & Drink Shop
36,"Nemocnice Motol (metro), Prague",0,Bus Stop,Coffee Shop,Bistro,Food & Drink Shop,Zoo Exhibit,Exhibit,Fried Chicken Joint,French Restaurant,Fountain,Food Truck
46,"Roztyly (metro), Prague",0,Coffee Shop,Restaurant,Fast Food Restaurant,Bakery,Mobile Phone Shop,Dog Run,Metro Station,Bus Stop,Food Truck,Curling Ice
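
The same selection expression is repeated for every cluster. A sketch of wrapping it in a helper, shown on a toy frame with the same column layout; `cluster_view` and its `first_venue_col` parameter are hypothetical, and the toy values are made up:

```python
import pandas as pd


def cluster_view(df, label, first_venue_col=5):
    """Rows of one cluster, keeping 'Name' plus the cluster and venue columns."""
    keep = [0] + list(range(first_venue_col, df.shape[1]))
    return df.loc[df['Cluster Labels'] == label, df.columns[keep]]


toy = pd.DataFrame({
    'Name': ['S1', 'S2', 'S3'],
    'District': ['D1', 'D2', 'D3'],
    'Line': ['A', 'B', 'C'],
    'Latitude': [50.0, 50.1, 50.2],
    'Longitude': [14.0, 14.1, 14.2],
    'Cluster Labels': [0, 1, 0],
    '1st Most Common Venue': ['Café', 'Pub', 'Bakery'],
})

print(cluster_view(toy, 0))
# Number of stations per cluster, useful for spotting very small clusters
print(toy['Cluster Labels'].value_counts().sort_index())
```

On the real data the call would be `cluster_view(prague_merged, 0)` for the first cluster, and so on.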


<b>Cluster 2</b>\
The biggest one.\
In fact, choosing one of these stations, a client will benefit a lot from having almost everything close at hand.\
However, they should be ready for the station to be very crowded, especially during peak hours.

In [163]:
prague_merged.loc[prague_merged['Cluster Labels'] == 1, prague_merged.columns[[0] + list(range(5, prague_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Anděl (metro), Prague",1,Hotel,Vietnamese Restaurant,Pub,Café,Cocktail Bar,Sushi Restaurant,Other Nightlife,Bike Shop,Gastropub,Beer Store
2,"Budějovická (metro), Prague",1,Electronics Store,Clothing Store,Café,Drugstore,Czech Restaurant,Bakery,Hotel Bar,Health Food Store,Bookstore,Fast Food Restaurant
3,"Černý Most (metro), Prague",1,Smoke Shop,Italian Restaurant,Convenience Store,Gourmet Shop,Warehouse Store,Pet Store,Clothing Store,Zoo Exhibit,Exhibit,Farmers Market
5,"Chodov (metro), Prague",1,Clothing Store,Coffee Shop,Cosmetics Shop,Shoe Store,Bakery,Gift Shop,Salad Place,Furniture / Home Store,Bookstore,Boutique
6,"Dejvická (metro), Prague",1,ATM,Hotel,Coffee Shop,Café,Electronics Store,Gourmet Shop,Bookstore,Drugstore,Food Stand,Tram Station
8,"Flora (metro), Prague",1,Café,Coffee Shop,Dessert Shop,Clothing Store,Pub,Mobile Phone Shop,Furniture / Home Store,Multiplex,Sushi Restaurant,Frozen Yogurt Shop
9,"Florenc (metro), Prague",1,Hotel,Vegetarian / Vegan Restaurant,Coffee Shop,Burger Joint,Vietnamese Restaurant,Bistro,Flower Shop,Gaming Cafe,Street Art,Breakfast Spot
11,"Hlavní nádraží (metro), Prague",1,Hotel,Indie Movie Theater,Design Studio,Clothing Store,Sporting Goods Shop,Cosmetics Shop,Wine Bar,Hungarian Restaurant,Fountain,Food Stand
13,"Hradčanská (metro), Prague",1,Café,Pub,Coffee Shop,Eastern European Restaurant,Italian Restaurant,Mexican Restaurant,Fried Chicken Joint,Dance Studio,Spa,Modern European Restaurant
15,"I.P.Pavlova (metro), Prague",1,Coffee Shop,Beer Bar,Café,Pub,Soup Place,Flower Shop,Candy Store,Middle Eastern Restaurant,Tea Room,Bistro


<b>Cluster 3</b>\
These stations are 'far from the madding crowd', but have a lot of pubs and restaurants. \
Very suitable for students: not too expensive, not too fancy, but with both places for partying and places for household goods.

In [164]:
prague_merged.loc[prague_merged['Cluster Labels'] == 2, prague_merged.columns[[0] + list(range(5, prague_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,"Rajská zahrada (metro), Prague",2,Bus Stop,Pub,Pizza Place,Farmers Market,Metro Station,Clothing Store,Hobby Shop,Gym / Fitness Center,Bridge,Ice Cream Shop
47,"Skalka (metro), Prague",2,Bus Stop,Kebab Restaurant,Chinese Restaurant,Pharmacy,Dog Run,Hotel,Pub,Furniture / Home Store,Food & Drink Shop,Electronics Store


<b>Cluster 4</b>\
These stations can offer a lot for everyone, but it seems like they are very suitable for young families with kids: parks, playgrounds, and sport venues are more common here than in other clusters. 
At the same time, you have pubs, coffee shops and everything an active person may need, whether they have kids or not.

In [165]:
prague_merged.loc[prague_merged['Cluster Labels'] == 3, prague_merged.columns[[0] + list(range(5, prague_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Bořislavka (metro), Prague",3,Pizza Place,Insurance Office,Pharmacy,Czech Restaurant,Bus Stop,Café,Metro Station,Bakery,Newsstand,Dessert Shop
4,"Českomoravská (metro), Prague",3,Bridal Shop,Gym / Fitness Center,Czech Restaurant,Pizza Place,Restaurant,Roof Deck,Organic Grocery,Buffet,Music Store,Tea Room
12,"Hloubětín (metro), Prague",3,Drugstore,Pub,Asian Restaurant,Shopping Mall,Fast Food Restaurant,Food & Drink Shop,Supermarket,Metro Station,Tram Station,Bed & Breakfast
14,"Hůrka (metro), Prague",3,Plaza,Indian Restaurant,Dessert Shop,Czech Restaurant,Bus Stop,Pet Store,Health Food Store,Pizza Place,Market,Playground
19,"Kačerov (metro), Prague",3,Restaurant,Pharmacy,Czech Restaurant,Fountain,Deli / Bodega,Farmers Market,Hotel,Bus Stop,Zoo Exhibit,Fast Food Restaurant
24,"Ládví (metro), Prague",3,Pizza Place,Asian Restaurant,Clothing Store,Coffee Shop,Café,Brewery,Shopping Mall,Farmers Market,Metro Station,Grocery Store
30,"Muzeum (metro), Prague",3,Chinese Restaurant,Electronics Store,Bar,Cocktail Bar,Restaurant,Grocery Store,Discount Store,Vietnamese Restaurant,Art Museum,Food & Drink Shop
32,"Nádraží Veleslavín (metro), Prague",3,ATM,Coffee Shop,Outdoor Supply Store,Metro Station,Shoe Repair,Tourist Information Center,Grocery Store,Tram Station,Bakery,Drugstore
37,"Nové Butovice (metro), Prague",3,Insurance Office,Arts & Crafts Store,Bistro,Metro Station,Electronics Store,Outdoor Sculpture,Drugstore,Coffee Shop,Asian Restaurant,Gastropub
38,"Opatov (metro), Prague",3,Bakery,Pizza Place,Print Shop,Metro Station,Grocery Store,Chinese Restaurant,Farmers Market,Bed & Breakfast,Shopping Plaza,Restaurant


<b>Cluster 5</b>\
Looks good for families who don't want their neighborhood to be very crowded. It's better for people who prefer a calm, comfortable life, and prefer weekend family restaurant lunches to beer and wine meet-ups with friends.

In [166]:
prague_merged.loc[prague_merged['Cluster Labels'] == 4, prague_merged.columns[[0] + list(range(5, prague_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,"Háje (metro), Prague",4,Electronics Store,Drugstore,Pizza Place,Grocery Store,Bakery,Pet Store,Gym,Turkish Restaurant,Restaurant,Dessert Shop
21,"Kobylisy (metro), Prague",4,Bakery,Drugstore,Restaurant,Gym,Pet Store,Pharmacy,Buffet,Italian Restaurant,Diner,Exhibit
26,"Luka (metro), Prague",4,Gym,Bakery,Casino,Italian Restaurant,Farmers Market,Fast Food Restaurant,Gastropub,Restaurant,Beer Store,Grocery Store
43,"Prosek (metro), Prague",4,Bakery,Drugstore,Restaurant,Gym,Pet Store,Pharmacy,Buffet,Italian Restaurant,Diner,Exhibit
