In this project I look at the city of Prague in the Cyech Republic.


In [330]:
import pandas as pd
import numpy as np

import requests # library to handle requests
from pandas.io.json import json_normalize # tranforming json file into a pandas dataframe library
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

import folium # plotting library

# for the geocoding I used Opencage
! pip install opencage
from opencage.geocoder import OpenCageGeocode
from pprint import pprint
key = '9162917515ad475792f22cd5ca637f95'
geocoder = OpenCageGeocode(key)

# to remove the accents
! pip install Unidecode
import unidecode 

# to calculate distances between coordinates
!pip install haversine
import haversine as hs

print("All packages loaded!")



First I retrieve a list of Prague's neighborhood from Wikipedia.

In [379]:
data_raw = pd.read_html('https://cs.wikipedia.org/wiki/Seznam_katastr%C3%A1ln%C3%ADch_%C3%BAzem%C3%AD_v_Praze', header = 0)[0]
data_raw.head()

Unnamed: 0,Pořadí,Katastrální území,Sčítání 2001,Evidence 2011,Evidence 2014[2],Rozloha (ha),hustota zalidnění (obyv/km²)
0,1,Stodůlky,52 101,59 711,61 105,962,6 351
1,2,Žižkov,55 401,55 691,56 829,544,10 443
2,3,Chodov,58 140,54 659,53 771,743,7 237
3,4,Vinohrady,54 516,50 720,50 751,379,13 401
4,5,Vršovice,36 345,37 066,35 930,293,12 243


I only need the name of the neighborhoods so I can drop the rest and rename the column. 

In [380]:
neighborhoods_cs = data_raw[['Katastrální území']]
neighborhoods_cs.rename(columns={'Katastrální území':'Neighborhoods_cs'}, inplace=True)
neighborhoods_cs.head()

Unnamed: 0,Neighborhoods_cs
0,Stodůlky
1,Žižkov
2,Chodov
3,Vinohrady
4,Vršovice


Now I request the coordinates for each neighborhoods using the Opencage API and add them into the dataframe.
I use the opportunity to "standardize" the names of the neighborhoods by removing the accents.

In [381]:
# I first add the columns 
ls_lat = [] # this list will become the column 'latitude'
ls_lng = [] # this list will become the column 'longitude'
city_parts = [] # this list will become the column 'Neighborhoods' with the names without diacritics

# I define the function to retrieve the coordinate and add them into the respective lists
def geocoding(neigh):
    results = geocoder.geocode(neigh + ", Praha")
    lat = results[0]['geometry']['lat']
    lng = results[0]['geometry']['lng']       
    ls_lat.append(lat)
    ls_lng.append(lng)
    city_parts.append(unidecode.unidecode(neigh))

# I call the function for all districts 
[geocoding(x) for x in neighborhoods_cs['Neighborhoods_cs']]

# finally I create the columns from the lists
neighborhoods_cs['Neighborhood'] = np.asarray(city_parts)
neighborhoods_cs['latitude'] = np.asarray(ls_lat)
neighborhoods_cs['longitude'] = np.asarray(ls_lng)

neighborhoods_cs.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,Neighborhoods_cs,Neighborhood,latitude,longitude
0,Stodůlky,Stodulky,50.048307,14.312404
1,Žižkov,Zizkov,50.081054,14.454917
2,Chodov,Chodov,50.032843,14.501643
3,Vinohrady,Vinohrady,50.075359,14.436394
4,Vršovice,Vrsovice,50.071885,14.472665


This looks good and I can now remove the first column.

In [383]:
neighborhoods = neighborhoods_cs.drop(columns='Neighborhoods_cs')

Now I retrieve the coordinate for the city and use them to visualize the Prague and its districts.

In [384]:
results = geocoder.geocode('Prague')
lat_p = results[0]['geometry']['lat']
lng_p = results[0]['geometry']['lng']  
print('The geograpical coordinates of Prague are {}, {}.'.format(lat_p, lng_p))

The geograpical coordinates of Prague are 50.0874654, 14.4212535.


In [385]:
# I use Folium for the map visualization
map_prague = folium.Map(location=[lat_p, lng_p], zoom_start=11)
for lat, lng, neigh in zip(neighborhoods['latitude'], neighborhoods['longitude'], neighborhoods['Neighborhood']):
    label = '{}'.format(neigh)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='cadetblue',
        fill_opacity=0.7,
        parse_html=False).add_to(map_prague)  
    
map_prague

*There is 1 small anomalie: the neighborhood "Tocna" got somehow located about 30km south-west of the city. I could fix this with a more specific request (i.e. "Tocna, Praha 12"), but since I will focus on the city center and Tocna is located about 14km away, I will just ignore it.*

While there are certainly excellent opportunities in the outer parts of the city, we will focus here on the larger center by setting a maximum distance of 5km from the center.

For this I start by calculating the distance from each neighborhood to the center. For this purpose the Haversine distance calculation seems to be the prefered method and an excellent library exists for this purpose.

In [386]:
distances = []
origin = (lat_p, lng_p)  # latitude and longitude from Prague
for lat, lng in zip(neighborhoods['latitude'], neighborhoods['longitude']):
    point = (lat, lng)
    distances.append(hs.haversine(origin, point))
neighborhoods['Distance[km]'] = np.asarray(distances)

neighborhoods.head()

Unnamed: 0,Neighborhood,latitude,longitude,Distance[km]
0,Stodulky,50.048307,14.312404,8.906001
1,Zizkov,50.081054,14.454917,2.505453
2,Chodov,50.032843,14.501643,8.356001
3,Vinohrady,50.075359,14.436394,1.726055
4,Vrsovice,50.071885,14.472665,4.057035


Now I select the row within the 5km perimeter.

In [396]:
prague = neighborhoods[neighborhoods['Distance[km]'] < 5].reset_index(drop=True)
prague

Unnamed: 0,Neighborhood,latitude,longitude,Distance[km]
0,Zizkov,50.081054,14.454917,2.505453
1,Vinohrady,50.075359,14.436394,1.726055
2,Vrsovice,50.071885,14.472665,4.057035
3,Holesovice,50.100616,14.437384,1.860683
4,Smichov,50.074946,14.404844,1.819018
5,Liben,50.106103,14.476626,4.460459
6,Nusle,50.05653,14.442035,3.745966
7,Nove Mesto,50.075938,14.419946,1.285223
8,Brevnov,50.085209,14.371788,3.538113
9,Dejvice,50.102556,14.391797,2.688993


That leaves us with 21 neighborhoods. Now, let's see the map again. I can now remove the Distance column.

In [397]:
prague.drop(columns='Distance[km]', inplace=True)

In [409]:
# I use Folium for the map visualization
map_prague = folium.Map(location=[lat_p, lng_p], zoom_start=13)
for lat, lng, neigh in zip(Prague['latitude'], Prague['longitude'], Prague['Neighborhood']):
    label = '{}'.format(neigh)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=30,
        popup=label,
        color='black',
        fill=True,
        fill_color='white',
        fill_opacity=0.3,
        parse_html=False).add_to(map_prague)  
    
map_prague

On this map it is easy to see that all the neighborhoods close to the center are represented and located properly.

Now I will analys the venues of these neighborhoods using the Foursquare API. 
For this I start by setting up my credentials.

In [390]:
CLIENT_ID = '1PMTYLQUAGOCHDVMVGB3ENLJUADY1QRTYGZK1FTAMBSNEJ4N' # your Foursquare ID
CLIENT_SECRET = '3MU3CON3UCSAXM3SEA4WXNZS1ZF30YSWC1J3W5VCHQQK1AVD' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

For the request, I decide to use a radius of 500m which is about the radius of the marker in the map above. This prevents from leaving out important parts especially of the old town. However, this means that I will certainly have to deal with some duplicates as it can be seen with the large overlapping areas.

In [410]:
# I define a function to create the get url
def built_url(neigh_lat,neigh_long,radius=500):
    LIMIT=100
    radius=500
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neigh_lat, 
    neigh_long, 
    radius, 
    LIMIT)
    return url

In [406]:
# I define the function to get the nearby venues of each neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [411]:
prague_venues = getNearbyVenues(names=prague['Neighborhood'],
                                   latitudes=prague['latitude'],
                                   longitudes=prague['longitude']
                                  )

Zizkov
Vinohrady
Vrsovice
Holesovice
Smichov
Liben
Nusle
Nove Mesto
Brevnov
Dejvice
Bubenec
Kosire
Troja
Podoli
Karlin
Stare Mesto
Stresovice
Mala Strana
Vysehrad
Hradcany
Radlice
Josefov


In [412]:
print(prague_venues.shape)
prague_venues.head()

(84, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Zizkov,50.081054,14.454917,Chilli & Lime,50.080432,14.454244,Asian Restaurant
1,Vinohrady,50.075359,14.436394,Náměstí Míru,50.075136,14.436554,Plaza
2,Vinohrady,50.075359,14.436394,The Craft: Food & Beers,50.076067,14.43589,Burger Joint
3,Vinohrady,50.075359,14.436394,Párek v rohlíku – Ladislav Červený,50.075042,14.437258,Hot Dog Joint
4,Vinohrady,50.075359,14.436394,Vanille,50.074865,14.436825,Ice Cream Shop


In [414]:
prague_venues[['Neighborhood', 'Venue']].groupby('Neighborhood').count()

Unnamed: 0_level_0,Venue
Neighborhood,Unnamed: 1_level_1
Bubenec,6
Dejvice,2
Holesovice,3
Hradcany,8
Josefov,1
Karlin,6
Kosire,3
Liben,2
Mala Strana,15
Nove Mesto,4


In [416]:
print('There are {} uniques categories.'.format(len(prague_venues['Venue Category'].unique())))

There are 55 uniques categories.


In [417]:
# one hot encoding
prague_onehot = pd.get_dummies(prague_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
prague_onehot['Neighborhood'] = prague_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [prague_onehot.columns[-1]] + list(prague_onehot.columns[:-1])
toronto_onehot = prague_onehot[fixed_columns]

prague_onehot.head()

Unnamed: 0,Art Museum,Arts & Crafts Store,Asian Restaurant,Austrian Restaurant,Badminton Court,Bakery,Bar,Beer Bar,Belgian Restaurant,Burger Joint,...,Scenic Lookout,Shoe Store,Snack Place,Tennis Court,Theater,Tram Station,Vietnamese Restaurant,Wine Bar,Wine Shop,Neighborhood
0,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Zizkov
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Vinohrady
2,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,Vinohrady
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Vinohrady
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Vinohrady


In [418]:
prague_onehot.shape

(84, 56)

In [419]:
prague_grouped = prague_onehot.groupby('Neighborhood').mean().reset_index()
prague_grouped

Unnamed: 0,Neighborhood,Art Museum,Arts & Crafts Store,Asian Restaurant,Austrian Restaurant,Badminton Court,Bakery,Bar,Beer Bar,Belgian Restaurant,...,Roof Deck,Scenic Lookout,Shoe Store,Snack Place,Tennis Court,Theater,Tram Station,Vietnamese Restaurant,Wine Bar,Wine Shop
0,Bubenec,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667
1,Dejvice,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Holesovice,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Hradcany,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Josefov,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Karlin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Kosire,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0
7,Liben,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
8,Mala Strana,0.0,0.066667,0.0,0.066667,0.0,0.066667,0.0,0.066667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0
9,Nove Mesto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0


In [422]:
prague_grouped.shape

(18, 56)

In [423]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [425]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = prague_grouped['Neighborhood']

for ind in np.arange(prague_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(prague_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bubenec,Wine Shop,Vietnamese Restaurant,French Restaurant,Bakery,Czech Restaurant,Plaza,Champagne Bar,Gift Shop,Gastropub,Food & Drink Shop
1,Dejvice,Café,Coffee Shop,Wine Shop,Champagne Bar,Gourmet Shop,Gift Shop,Gastropub,French Restaurant,Food & Drink Shop,Farmers Market
2,Holesovice,Farmers Market,Park,Café,Wine Shop,Coffee Shop,Gourmet Shop,Gift Shop,Gastropub,French Restaurant,Food & Drink Shop
3,Hradcany,Restaurant,Art Museum,Outdoor Sculpture,Coffee Shop,Plaza,Scenic Lookout,Pedestrian Plaza,Drugstore,Dog Run,Castle
4,Josefov,Belgian Restaurant,Wine Shop,Champagne Bar,Gourmet Shop,Gift Shop,Gastropub,French Restaurant,Food & Drink Shop,Farmers Market,Drugstore
5,Karlin,Pet Café,Italian Restaurant,Playground,Plaza,Restaurant,Café,Drugstore,Farmers Market,Castle,Czech Restaurant
6,Kosire,Tennis Court,Badminton Court,Modern European Restaurant,Wine Shop,Champagne Bar,Gift Shop,Gastropub,French Restaurant,Food & Drink Shop,Farmers Market
7,Liben,Indian Restaurant,Tennis Court,Wine Shop,Champagne Bar,Gift Shop,Gastropub,French Restaurant,Food & Drink Shop,Farmers Market,Drugstore
8,Mala Strana,Hotel,Historic Site,Gift Shop,Café,Pub,Restaurant,Beer Bar,Museum,Bakery,Wine Bar
9,Nove Mesto,Pizza Place,Tram Station,Drugstore,Plaza,Castle,Gift Shop,Gastropub,French Restaurant,Food & Drink Shop,Farmers Market
