# A Battle of Cities
Analyzing similarities between neighborhoods in two cities.  
This project reuses the core of the previous exercise, which grouped Toronto neighborhoods, and extends it to a particular use case: an employee moving from city A to city B who wants to know which neighborhoods in the destination city are similar to the neighborhoods in their city of origin.  

In this code I have used Darmstadt (Germany) as the city of origin and Santiago (Chile) as the destination. However, the code is written so that you can change the input cities to any you like, as long as they have an entry on the Geonames website.

Find a more detailed explanation in the final report here:  
https://github.com/raulcano/Coursera_Capstone/blob/master/Final%20Report.pdf

---
# Code

In [3]:
# we install the necessary packages
!pip install geocoder
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes

In [4]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
from IPython.display import display_html
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize was removed in pandas 2.0)

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')


Libraries imported.


## Obtain the country codes
The purpose of this step is to get the country codes exactly as the Geonames website uses them, so that we can build a query containing the right code.  
These codes are needed to fill in the input variables in the next step.

In [7]:
url = 'https://www.geonames.org/postal-codes/'
html = requests.get(url).content
soup = BeautifulSoup(html, 'lxml')
country_codes = soup.find('select', {"name" : "country"})

list_countries = country_codes.findAll('option')
country_codes_text = []
for l in list_countries:
    country_codes_text.append("Country: " + l.text + " | Code: " + l['value'])
print(country_codes_text[:10]) # we print the first 10 entries to see what the codes look like

['Country:  all countries | Code: ', 'Country:  Algeria | Code: DZ', 'Country:  American Samoa | Code: AS', 'Country:  Andorra | Code: AD', 'Country:  Argentina | Code: AR', 'Country:  Australia | Code: AU', 'Country:  Austria | Code: AT', 'Country:  Bangladesh | Code: BD', 'Country:  Belarus | Code: BY', 'Country:  Belgium | Code: BE']
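The printed strings are convenient to read, but a dictionary from country name to code is easier to look up when filling in `country_code` below. The following is a standard-library sketch (an alternative to the BeautifulSoup cell above, not the notebook's method), assuming, as that cell does, that the codes sit in the `value` attribute of the `<option>` tags inside `<select name="country">`. `CountryOptionParser` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class CountryOptionParser(HTMLParser):
    """Collect {country name: code} from the <option> tags of <select name="country">."""

    def __init__(self):
        super().__init__()
        self.in_select = False    # are we inside the country <select>?
        self.current_code = None  # value attribute of the open <option>, if any
        self.codes = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == "country":
            self.in_select = True
        elif tag == "option" and self.in_select:
            self.current_code = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self.current_code = None
        if tag == "select":
            self.in_select = False

    def handle_data(self, data):
        name = data.strip()
        # options with an empty code (the "all countries" entry) are skipped
        if self.current_code and name:
            self.codes[name] = self.current_code


# Hypothetical sample shaped like the Geonames country selector
sample = (
    '<select name="country">'
    '<option value="">all countries</option>'
    '<option value="DZ"> Algeria</option>'
    '<option value="DE">Germany</option>'
    '</select>'
)
parser = CountryOptionParser()
parser.feed(sample)
print(parser.codes)  # {'Algeria': 'DZ', 'Germany': 'DE'}
```

With `list_countries` from the cell above already available, the equivalent BeautifulSoup expression is `{o.text.strip(): o['value'] for o in list_countries if o['value']}`.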


## Prepare the input variables
The latitude and longitude are left empty here; they are filled in automatically later on. The country name is used, together with the city name, to look up those coordinates.

In [48]:
cities = [
    #{'city' : 'Cologne', 'country_code' : 'DE', 'country' : 'Germany', 'latitude' : '', 'longitude' : ''},
    {'city' : 'Darmstadt', 'country_code' : 'DE', 'country' : 'Germany', 'latitude' : '', 'longitude' : ''},
    {'city' : 'Santiago', 'country_code' : 'CL', 'country' : 'Chile', 'latitude' : '', 'longitude' : ''},
]

## Create a dataframe with postal codes, coordinates and neighborhoods
The dataframe combines the data from all the input cities.

In [49]:
from io import StringIO  # recent pandas expects a file-like object when read_html gets literal HTML

dataframes = []
for city in cities:
    # let requests percent-encode the query so city names with spaces or umlauts work
    url = 'https://www.geonames.org/postalcode-search.html'
    params = {'q': city['city'], 'country': city['country_code']}
    html = requests.get(url, params=params).content
    soup = BeautifulSoup(html, 'lxml')
    table = soup.find('table', class_='restable')  # the results table
    table_df = pd.read_html(StringIO(str(table)), header=0)[0]
    # add city name to the table
    table_df['City'] = city['city']
    dataframes.append(table_df)

df = pd.concat(dataframes, ignore_index=True)
df.head()


|   | Admin1 | Admin2 | Admin3 | Admin4 | City | Code | Country | Place | Unnamed: 0 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Hessen | Regierungsbezirk Darmstadt | Frankfurt am Main, Stadt | Frankfurt am Main, Stadt | Darmstadt | 60437.0 | Germany | Frankfurt am Main | 1.0 |
| 1 |  |  |  |  | Darmstadt |  |  | 50.192/8.675 |  |
| 2 | Hessen | Regierungsbezirk Darmstadt | Darmstadt, Wissenschaftsstadt | Darmstadt, Wissenschaftsstadt | Darmstadt | 64291.0 | Germany | Darmstadt | 2.0 |
| 3 |  |  |  |  | Darmstadt |  |  | 49.911/8.657 |  |
| 4 | Hessen | Regierungsbezirk Darmstadt | Darmstadt, Wissenschaftsstadt | Darmstadt, Wissenschaftsstadt | Darmstadt | 64297.0 | Germany | Darmstadt | 3.0 |


The columns we want in our dataframe are  
PostalCode, Borough, Neighborhood, Latitude, Longitude, City, Country

Therefore, we have to clean up the data accordingly.  
The mapping between the columns we get from the HTML table and the desired dataframe is as follows:  
  * __PostalCode__ <-- Code
  * __Borough__ <-- Place
  * __Neighborhood__ <-- Place (Geonames provides a single place name, so Borough and Neighborhood hold the same value)
  * __Latitude__ <-- the part of the coordinates cell before the slash (the coordinates are in the row below each postal code)
  * __Longitude__ <-- the part of the coordinates cell after the slash

City and Country are kept as they are.
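The mapping above depends on the Geonames table alternating between a postal-code row and a coordinates row. A vectorized sketch of the same reshaping follows; like the loop in the notebook, it assumes strict alternation, and it runs on a small hypothetical sample rather than live data. `tidy_geonames` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical sample mimicking the Geonames layout: every postal-code row is
# followed by a row whose "Place" cell holds "lat/lon".
raw = pd.DataFrame({
    "Code": [64291.0, None, 64297.0, None],
    "Place": ["Darmstadt", "49.911/8.657", "Darmstadt", "49.819/8.645"],
    "Country": ["Germany", None, "Germany", None],
})
raw["City"] = "Darmstadt"


def tidy_geonames(raw: pd.DataFrame) -> pd.DataFrame:
    """Pair each postal-code row with the coordinates row that follows it."""
    data = raw.iloc[0::2].reset_index(drop=True)          # postal-code rows
    coords = (raw["Place"].iloc[1::2]                     # "lat/lon" rows
              .str.split("/", expand=True)
              .astype(float)
              .reset_index(drop=True))
    return pd.DataFrame({
        "PostalCode": data["Code"].astype(int),
        "Borough": data["Place"],
        "Neighborhood": data["Place"],
        "Latitude": coords[0],
        "Longitude": coords[1],
        "City": data["City"],
        "Country": data["Country"],
    })


tidy = tidy_geonames(raw)
print(tidy)
```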

In [50]:
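# rename the columns to the target names
df = df.rename(columns={'Code': 'PostalCode', 'Place': 'Borough'})

# create a Neighborhood column (Geonames gives a single place name, so it mirrors Borough)
df['Neighborhood'] = df['Borough']

# add columns "Latitude" and "Longitude", initialized at 0
df['Latitude'] = 0.0
df['Longitude'] = 0.0

# coordinate rows have no postal code; fill them with 0 so the column can be cast to int
df['PostalCode'] = df['PostalCode'].fillna(0)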
convert_dict = {'PostalCode' : int, 'Latitude' : float, 'Longitude' : float, 'Borough' : str, 'Neighborhood' : str, 'City' : str, 'Country' : str}
df = df.astype(convert_dict) 

# change the order of the columns and retain only the necessary ones
df = df[['PostalCode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude', 'City', 'Country']]
df.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | City | Country |
|---|---|---|---|---|---|---|---|
| 0 | 60437 | Frankfurt am Main | Frankfurt am Main | 0.0 | 0.0 | Darmstadt | Germany |
| 1 | 0 | 50.192/8.675 | 50.192/8.675 | 0.0 | 0.0 | Darmstadt |  |
| 2 | 64291 | Darmstadt | Darmstadt | 0.0 | 0.0 | Darmstadt | Germany |
| 3 | 0 | 49.911/8.657 | 49.911/8.657 | 0.0 | 0.0 | Darmstadt |  |
| 4 | 64297 | Darmstadt | Darmstadt | 0.0 | 0.0 | Darmstadt | Germany |


In [51]:
df.shape

(464, 7)

In [52]:
# more data wrangling: the lat/long of each postal code is in the row that follows it
for i in range(0, len(df.index), 2):
    lat, lon = str(df.loc[i+1, 'Borough']).split('/')
    # store floats, since the Latitude/Longitude columns are float
    df.at[i, 'Latitude'] = float(lat)
    df.at[i, 'Longitude'] = float(lon)

df = df.iloc[::2].reset_index(drop=True)  # keep only the postal-code rows
df.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | City | Country |
|---|---|---|---|---|---|---|---|
| 0 | 60437 | Frankfurt am Main | Frankfurt am Main | 50.192 | 8.675 | Darmstadt | Germany |
| 1 | 64291 | Darmstadt | Darmstadt | 49.911 | 8.657 | Darmstadt | Germany |
| 2 | 64297 | Darmstadt | Darmstadt | 49.819 | 8.645 | Darmstadt | Germany |
| 3 | 64283 | Darmstadt | Darmstadt | 49.872 | 8.648 | Darmstadt | Germany |
| 4 | 64289 | Darmstadt | Darmstadt | 49.897 | 8.681 | Darmstadt | Germany |


## Use geopy library to get the latitude and longitude values of the cities
To create an instance of the geocoder, we need to define a user_agent. We name our agent *cities_explorer*, as shown below.

In [53]:
geolocator = Nominatim(user_agent="cities_explorer")
for city in cities:
    address = city['city'] + ', ' + city['country']
    location = geolocator.geocode(address)
    city['latitude'] = location.latitude
    city['longitude'] = location.longitude
    print('The geographical coordinates of {} are {}, {}.'.format(city['city'], city['latitude'], city['longitude']))

The geographical coordinates of Darmstadt are 49.872775, 8.651177.
The geographical coordinates of Santiago are -33.4377968, -70.6504451.
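`geolocator.geocode` returns `None` when Nominatim finds no match, in which case the loop above fails with an `AttributeError` on `location.latitude`. A minimal sketch that reports the failing address instead: `fill_coordinates` is a hypothetical helper that accepts any callable with the same contract, which also makes it testable offline with a stand-in lookup.

```python
from collections import namedtuple


def fill_coordinates(cities, geocode):
    """Fill 'latitude'/'longitude' of each city dict using geocode(address).

    geocode must return an object with .latitude/.longitude, or None when the
    address is not found (as geopy geocoders do).
    """
    for city in cities:
        address = '{}, {}'.format(city['city'], city['country'])
        location = geocode(address)
        if location is None:
            raise ValueError('Could not geocode {!r}'.format(address))
        city['latitude'] = location.latitude
        city['longitude'] = location.longitude
    return cities


# Offline usage with a hypothetical stand-in for geolocator.geocode
Location = namedtuple('Location', 'latitude longitude')
fake_lookup = {'Darmstadt, Germany': Location(49.872775, 8.651177)}
cities = fill_coordinates([{'city': 'Darmstadt', 'country': 'Germany'}], fake_lookup.get)
print(cities[0]['latitude'], cities[0]['longitude'])  # 49.872775 8.651177
```

In the notebook this would be called as `fill_coordinates(cities, geolocator.geocode)`. Nominatim's usage policy allows at most about one request per second; if many cities are looked up, geopy's `RateLimiter` from `geopy.extra.rate_limiter` can wrap the call, e.g. `RateLimiter(geolocator.geocode, min_delay_seconds=1)`.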


In [54]:
df.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | City | Country |
|---|---|---|---|---|---|---|---|
| 0 | 60437 | Frankfurt am Main | Frankfurt am Main | 50.192 | 8.675 | Darmstadt | Germany |
| 1 | 64291 | Darmstadt | Darmstadt | 49.911 | 8.657 | Darmstadt | Germany |
| 2 | 64297 | Darmstadt | Darmstadt | 49.819 | 8.645 | Darmstadt | Germany |
| 3 | 64283 | Darmstadt | Darmstadt | 49.872 | 8.648 | Darmstadt | Germany |
| 4 | 64289 | Darmstadt | Darmstadt | 49.897 | 8.681 | Darmstadt | Germany |


### Visualize each city's map with all neighborhoods superimposed

In [55]:
# give the dataframe a more descriptive name
neighborhoods = df
# neighborhoods = df.iloc[:3] # limit the amount of rows to test the code

# create one map per city, centered on its coordinates
maps = []
for city in cities:
    city_map = folium.Map(location=[city['latitude'], city['longitude']], zoom_start=15)
    # add a marker for every neighborhood (from all cities) to the map
    for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
        label = '{}, {}'.format(neighborhood, borough)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7).add_to(city_map)
    maps.append(city_map)

In [56]:
maps[0]

In [57]:
maps[1]

## Foursquare
Next, we use the Foursquare API to explore the neighborhoods and segment them.

In [85]:
# The code was removed by Watson Studio for sharing. It defined the Foursquare settings used below: CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT.

In [86]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened rows (e.g. from json_normalize) use the dotted key
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
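`get_category_type` tries two keys because a venue can come in two shapes: the nested dict from the raw JSON (key `categories`) or a row flattened by `json_normalize` (column `venue.categories`). The sketch below exercises both on hypothetical items shaped like a Foursquare explore response; the function is repeated so the block runs on its own.

```python
import pandas as pd


def get_category_type(row):
    """Return the first category name of a venue, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hypothetical items shaped like the 'items' list of an explore response
items = [
    {'venue': {'name': 'Venue A', 'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Venue B', 'categories': []}},
]

nested = get_category_type(items[0]['venue'])  # nested dict: 'categories' key
flat = pd.json_normalize(items)                # flattened: 'venue.categories' column
flat['category'] = flat.apply(get_category_type, axis=1)
print(nested)                                  # Bakery
print(flat[['venue.name', 'category']])
```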

Let's create a function that queries the Foursquare explore endpoint for every neighborhood passed as arguments and collects the returned venues into one dataframe.

In [87]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            get_category_type(v['venue'])) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
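Separating the JSON parsing from the HTTP request makes it possible to check the row layout without Foursquare credentials. A sketch of that split, assuming each item has the shape `getNearbyVenues` reads (`venue.name`, `venue.location.lat/lng`, `venue.categories`); `parse_explore_items` and the sample payload are hypothetical.

```python
import pandas as pd

COLUMNS = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']


def parse_explore_items(name, lat, lng, items):
    """Turn the 'items' of one explore response into one tuple per venue."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'] if categories else None,  # no category -> None
        ))
    return rows


# Hypothetical payload with the nesting used by getNearbyVenues:
# response -> groups[0] -> items
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A', 'location': {'lat': 49.91, 'lng': 8.65},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 49.92, 'lng': 8.66},
               'categories': []}},
]}]}}

items = payload['response']['groups'][0]['items']
venues = pd.DataFrame(parse_explore_items('Hood X', 49.911, 8.657, items), columns=COLUMNS)
print(venues)
```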

We run the function to get the venues in all neighborhoods in our dataframe

In [88]:
venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Frankfurt am Main
Darmstadt
Darmstadt
Darmstadt
Darmstadt
Darmstadt
Darmstadt
Frankfurt am Main
Mossautal
Griesheim
Oberursel
Frankfurt am Main
Darmstadt
Darmstadt
Butzbach
Friedrichsdorf
Dreieich
Bad Nauheim
Bensheim
Eschborn
Dieburg
Babenhausen
Münster
Modautal
Bickenbach
Otzberg
Reichelsheim
Wiesbaden
Wiesbaden
Wiesbaden
Wiesbaden
Wiesbaden
Hattersheim
Pfungstadt
Weiterstadt
Reinheim
Ober-Ramstadt
Groß-Umstadt
Fischbachtal
Frankfurt am Main
Offenbach
Offenbach
Offenbach
Offenbach
Langen
Rödermark
Hanau
Rodenbach
Altenstadt
Oestrich-Winkel
Trebur
Wiesbaden
Wiesbaden
Schlüchtern
Frankfurt am Main
Frankfurt am Main
Nidderau
Schöneck
Karben
Usingen
Kronberg im Taunus
Offenbach
Offenbach
Rodgau
Dietzenbach
Mühlheim
Obertshausen
Hanau
Hanau
Hanau
Hanau
Maintal
Gründau
Biebergemünd
Nidda
Lorsch
Fürth
Bischofsheim
Niedernhausen
Sulzbach
Frankfurt am Main
Groß-Zimmern
Messel
Schaafheim
Eppertshausen
Niddatal
Reichelsheim
Weilrod
Hasselroth
Rimbach
Einhausen
Lautertal
Breuberg
Heidenrod
Aarbe
...

Let's check the venues dataframe

In [89]:
print(venues.shape)
venues.head(10)

(1532, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Darmstadt | 49.911 | 8.657 | EDEKA Patschull | 49.910998 | 8.658345 | Supermarket |
| 1 | Darmstadt | 49.911 | 8.657 | dm-drogerie markt | 49.911357 | 8.656326 | Drugstore |
| 2 | Darmstadt | 49.911 | 8.657 | K&U Bäckerei | 49.910887 | 8.658263 | Bakery |
| 3 | Darmstadt | 49.911 | 8.657 | Zum goldnen Löwen | 49.911161 | 8.657482 | Italian Restaurant |
| 4 | Darmstadt | 49.911 | 8.657 | Arheilgen Pizza & Kebab | 49.911161 | 8.657469 | Doner Restaurant |
| 5 | Darmstadt | 49.911 | 8.657 | H Löwenplatz | 49.911162 | 8.657466 | Tram Station |
| 6 | Darmstadt | 49.911 | 8.657 | Eisboutique Da Carlo | 49.908332 | 8.657055 | Ice Cream Shop |
| 7 | Darmstadt | 49.911 | 8.657 | Weißer Schwan | 49.913647 | 8.654749 | Hotel |
| 8 | Darmstadt | 49.819 | 8.645 | Kaffeehaus | 49.816031 | 8.64442 | Café |
| 9 | Darmstadt | 49.819 | 8.645 | Bäckerei Hofmann | 49.819703 | 8.644215 | Bakery |


How many venues were returned per Neighborhood?

In [90]:
venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Alsbach-Hähnlein | 1 | 1 | 1 | 1 | 1 | 1 |
| Altenstadt | 4 | 4 | 4 | 4 | 4 | 4 |
| Babenhausen | 5 | 5 | 5 | 5 | 5 | 5 |
| Bad König | 3 | 3 | 3 | 3 | 3 | 3 |
| Bad Nauheim | 19 | 19 | 19 | 19 | 19 | 19 |
| Bad Orb | 4 | 4 | 4 | 4 | 4 | 4 |
| Bad Schwalbach | 6 | 6 | 6 | 6 | 6 | 6 |
| Bad Soden-Salmünster | 4 | 4 | 4 | 4 | 4 | 4 |
| Bad Vilbel | 13 | 13 | 13 | 13 | 13 | 13 |
| Beerfelden | 5 | 5 | 5 | 5 | 5 | 5 |
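`count()` repeats the same number in every column. When only the number of venues per neighborhood is needed, `groupby(...).size()` returns it as a single Series, and `nunique()` on the category column gives the number of distinct categories per neighborhood. A sketch on a hypothetical frame with the same column names as `venues`:

```python
import pandas as pd

# Hypothetical venues with the same column names as the notebook's dataframe
venues = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood A", "Hood A", "Hood B"],
    "Venue Category": ["Train Station", "Supermarket", "Train Station", "Spa"],
})

venues_per_hood = venues.groupby("Neighborhood").size()
categories_per_hood = venues.groupby("Neighborhood")["Venue Category"].nunique()
print(pd.DataFrame({"venues": venues_per_hood, "categories": categories_per_hood}))
```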


Let's find out how many unique categories appear among all the returned venues

In [91]:
print('There are {} unique categories.'.format(len(venues['Venue Category'].unique())))

There are 220 unique categories.


## Analyze Each Neighborhood

In [92]:
# one hot encoding
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
onehot['Neighborhood'] = venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [onehot.columns[-1]]  + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]

onehot.head()

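|   | Neighborhood | ATM | Afghan Restaurant | American Restaurant | Arcade | Art Gallery | Arts & Crafts Store | Asian Restaurant | Athletics & Sports | Australian Restaurant | ... | Turkish Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Vineyard | Water Park | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Darmstadt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Darmstadt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Darmstadt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Darmstadt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Darmstadt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |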


And let's examine the new dataframe size.

In [93]:
onehot.shape

(1532, 221)

Next, let's group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category in that neighborhood
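On a small hypothetical input, the two steps (one-hot encoding, then the per-neighborhood mean) turn lists of venues into rows of category frequencies that sum to 1. The `dtype=int` argument is an addition so that recent pandas, which otherwise returns booleans, produces 0/1 columns like the notebook output.

```python
import pandas as pd

# Hypothetical venues: Hood A has four venues, Hood B has two
venues = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood A", "Hood A", "Hood A", "Hood B", "Hood B"],
    "Venue Category": ["Café", "Café", "Plaza", "Bakery", "Bakery", "Hotel"],
})

# one column per category, named after the category itself
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# mean of the 0/1 columns = share of each category among a neighborhood's venues
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```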

In [94]:
grouped = onehot.groupby('Neighborhood').mean().reset_index()
grouped

|   | Neighborhood | ATM | Afghan Restaurant | American Restaurant | Arcade | Art Gallery | Arts & Crafts Store | Asian Restaurant | Athletics & Sports | Australian Restaurant | ... | Turkish Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Vineyard | Water Park | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alsbach-Hähnlein | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 1 | Altenstadt | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 2 | Babenhausen | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 3 | Bad König | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 4 | Bad Nauheim | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 5 | Bad Orb | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 6 | Bad Schwalbach | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 7 | Bad Soden-Salmünster | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 8 | Bad Vilbel | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.076923 | ... | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 9 | Beerfelden | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 |


Let's confirm the new size

In [95]:
grouped.shape

(165, 221)

Let's print each neighborhood along with the top 5 most common venues

In [96]:
num_top_venues = 5

for hood in grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = grouped[grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alsbach-Hähnlein----
           venue  freq
0     Playground   1.0
1            ATM   0.0
2         Palace   0.0
3       Mountain   0.0
4  Movie Theater   0.0


----Altenstadt----
                  venue  freq
0         Train Station  0.50
1           Supermarket  0.25
2           Gas Station  0.25
3                   ATM  0.00
4  Other Great Outdoors  0.00


----Babenhausen----
                venue  freq
0         Pizza Place   0.4
1  Italian Restaurant   0.2
2               Hotel   0.2
3   Food & Drink Shop   0.2
4                 ATM   0.0


----Bad König----
      venue  freq
0    Bakery  0.33
1     Hotel  0.33
2       Spa  0.33
3    Palace  0.00
4  Mountain  0.00


----Bad Nauheim----
                venue  freq
0                Café  0.16
1  Italian Restaurant  0.16
2                 Pub  0.11
3               Plaza  0.11
4        Cocktail Bar  0.05


----Bad Orb----
                  venue  freq
0                  Café  0.50
1                 Plaza  0.25
2           Gas Station  0.25
...

Let's put that into a *pandas* dataframe.  
First, let's write a function to sort the venues in descending order.

In [97]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
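`return_most_common_venues` always fills all `num_top_venues` slots, so a neighborhood with only a few venue categories gets filler categories whose frequency is 0 (the 0.0 rows in the top-5 listing above). A sketch of a variant that keeps only categories that actually occur and pads the remaining slots with `None`; `top_categories` is a hypothetical name and the input row mimics a row of `grouped`. If used in the notebook, the later `dropna()` after the join would then drop such rows, so that filter would need adjusting.

```python
import pandas as pd


def top_categories(row: pd.Series, n: int) -> list:
    """Up to n categories with non-zero frequency, most frequent first, padded with None."""
    freqs = row.iloc[1:].astype(float)  # skip the 'Neighborhood' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind="mergesort")
    names = list(freqs.index[:n])
    return names + [None] * (n - len(names))


# Hypothetical row of `grouped` (category frequencies for one neighborhood)
row = pd.Series({"Neighborhood": "Hood X", "Café": 0.5, "Plaza": 0.25,
                 "Gas Station": 0.25, "Hotel": 0.0, "Yoga Studio": 0.0})
print(top_categories(row, 5))
```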

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [98]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # from the 4th entry on, use the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = grouped['Neighborhood']

for ind in np.arange(grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(grouped.iloc[ind, :], num_top_venues)



In [99]:
neighborhoods_venues_sorted.head(20)

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alsbach-Hähnlein | Playground | Yoga Studio | Elementary School | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market |
| 1 | Altenstadt | Train Station | Supermarket | Gas Station | Yoga Studio | Event Space | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service |
| 2 | Babenhausen | Pizza Place | Hotel | Food & Drink Shop | Italian Restaurant | Yoga Studio | Event Space | Food | Flower Shop | Flea Market | Financial or Legal Service |
| 3 | Bad König | Bakery | Hotel | Spa | Yoga Studio | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 4 | Bad Nauheim | Italian Restaurant | Café | Plaza | Pub | Supermarket | Ice Cream Shop | Monument / Landmark | Drugstore | Park | Department Store |
| 5 | Bad Orb | Café | Plaza | Gas Station | Food Truck | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 6 | Bad Schwalbach | Hotel | Ice Cream Shop | Botanical Garden | Bus Station | Restaurant | Dry Cleaner | Yoga Studio | Exhibit | Food | Flower Shop |
| 7 | Bad Soden-Salmünster | Lottery Retailer | Gastropub | Drugstore | Bank | Yoga Studio | Food Court | Food & Drink Shop | Food | Flower Shop | Flea Market |
| 8 | Bad Vilbel | Miscellaneous Shop | Fried Chicken Joint | Falafel Restaurant | Bakery | Restaurant | Supermarket | Australian Restaurant | Café | Park | Breakfast Spot |
| 9 | Beerfelden | Bookstore | Supermarket | Kitchen Supply Store | Mattress Store | Farm | Yoga Studio | Falafel Restaurant | Food Court | Food & Drink Shop | Food |


## Cluster Neighborhoods
Run *k*-means to cluster the neighborhoods into 5 clusters.

In [100]:
# set number of clusters
kclusters = 5

grouped_clustering = grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 0, 4, 1, 4, 4, 4, 4, 4, 4], dtype=int32)
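The notebook fixes `kclusters = 5`. One common, though not definitive, way to compare candidate values of *k* is the mean silhouette score. The sketch below applies it to hypothetical data with an obvious three-group structure; `best_k_by_silhouette` is a hypothetical helper built on scikit-learn's `KMeans` and `silhouette_score`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Fit k-means for each k and return (best k, {k: mean silhouette score})."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical category-frequency vectors forming three tight, well-separated groups
X = np.array([
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.95, 0.05, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.0, 0.95, 0.05],
    [0.0, 0.0, 1.0], [0.0, 0.1, 0.9], [0.05, 0.0, 0.95],
])
best_k, scores = best_k_by_silhouette(X, range(2, 6))
print(best_k, {k: round(s, 2) for k, s in scores.items()})
```

With the real data this would be `best_k_by_silhouette(grouped_clustering, range(2, 11))`; the highest score is a hint to weigh against how interpretable the resulting clusters are.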

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [101]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

merged = neighborhoods

# join the cluster labels and top venues onto each neighborhood row (which already has its latitude/longitude)
merged = merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
merged = merged.dropna() # remove the rows where we got NaN after the merge

merged.head() # check the last columns!

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | City | Country | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60437 | Frankfurt am Main | Frankfurt am Main | 50.192 | 8.675 | Darmstadt | Germany | 4.0 | Supermarket | Clothing Store | Ice Cream Shop | Gas Station | Italian Restaurant | Café | Bus Stop | Light Rail Station | Convenience Store | Sushi Restaurant |
| 1 | 64291 | Darmstadt | Darmstadt | 49.911 | 8.657 | Darmstadt | Germany | 4.0 | Supermarket | Café | Italian Restaurant | Bakery | Ice Cream Shop | Bus Stop | Tram Station | Gas Station | Sushi Restaurant | Coffee Shop |
| 2 | 64297 | Darmstadt | Darmstadt | 49.819 | 8.645 | Darmstadt | Germany | 4.0 | Supermarket | Café | Italian Restaurant | Bakery | Ice Cream Shop | Bus Stop | Tram Station | Gas Station | Sushi Restaurant | Coffee Shop |
| 3 | 64283 | Darmstadt | Darmstadt | 49.872 | 8.648 | Darmstadt | Germany | 4.0 | Supermarket | Café | Italian Restaurant | Bakery | Ice Cream Shop | Bus Stop | Tram Station | Gas Station | Sushi Restaurant | Coffee Shop |
| 4 | 64289 | Darmstadt | Darmstadt | 49.897 | 8.681 | Darmstadt | Germany | 4.0 | Supermarket | Café | Italian Restaurant | Bakery | Ice Cream Shop | Bus Stop | Tram Station | Gas Station | Sushi Restaurant | Coffee Shop |
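`join` followed by `dropna()` removes neighborhoods for which Foursquare returned no venues, but `dropna()` also removes any row that is missing some other value, and the intermediate NaNs are why `Cluster Labels` appears as floats (4.0). A sketch on hypothetical data comparing that approach with an inner merge, which filters only on the join key. It also shows that every row sharing a neighborhood name receives the same profile, as the repeated Darmstadt rows above do.

```python
import pandas as pd

# Hypothetical inputs: one row per postal code, one profile row per neighborhood name
neighborhoods = pd.DataFrame({
    "Neighborhood": ["Town A", "Town A", "Town without venues"],
    "Latitude": [49.91, 49.82, 50.00],
    "Country": ["Germany", None, "Germany"],  # one unrelated missing value
})
profiles = pd.DataFrame({
    "Neighborhood": ["Town A"],
    "Cluster Labels": [4],
    "1st Most Common Venue": ["Supermarket"],
})

# notebook approach: join, then drop every row with any missing value
joined = neighborhoods.join(profiles.set_index("Neighborhood"), on="Neighborhood").dropna()

# inner merge: keep exactly the rows whose name has a profile; labels stay integers
merged = neighborhoods.merge(profiles, on="Neighborhood", how="inner")
print(joined)
print(merged)
```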


Finally, let's visualize the resulting clusters

In [102]:
# create maps with clusters
all_maps_clusters = []

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for city in cities:
    map_clusters = folium.Map(location=[city['latitude'], city['longitude']], zoom_start=11)

    # add markers to the map; cluster labels run from 0 to kclusters-1 and index the palette directly
    for lat, lon, poi, cluster in zip(merged['Latitude'], merged['Longitude'], merged['Neighborhood'], merged['Cluster Labels']):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster)],
            fill=True,
            fill_color=rainbow[int(cluster)],
            fill_opacity=0.7).add_to(map_clusters)

    all_maps_clusters.append(map_clusters)

We examine the map of clusters for the first city

In [103]:
all_maps_clusters[0]

Now, the map of clusters for the second city

In [104]:
all_maps_clusters[1]

## Examine Clusters
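The cells below select columns by position, and the selection differs from cluster to cluster (Borough for some, Neighborhood, Latitude or Longitude for others). A hypothetical helper that selects columns by name keeps every cluster table in the same layout; the sample frame is hypothetical and shortened to two venue columns.

```python
import pandas as pd


def cluster_table(merged: pd.DataFrame, label: int) -> pd.DataFrame:
    """Rows of one cluster with a fixed, name-based column selection."""
    venue_cols = [c for c in merged.columns if c.endswith("Most Common Venue")]
    cols = ["Neighborhood", "City", "Country", "Cluster Labels"] + venue_cols
    return merged.loc[merged["Cluster Labels"] == label, cols]


# Hypothetical merged frame using the notebook's column names
merged = pd.DataFrame({
    "Neighborhood": ["Town A", "Town A", "Town B"],
    "Latitude": [49.9, 49.8, 50.2],
    "Longitude": [8.6, 8.6, 8.7],
    "City": ["Darmstadt"] * 3,
    "Country": ["Germany"] * 3,
    "Cluster Labels": [4, 4, 3],
    "1st Most Common Venue": ["Supermarket", "Supermarket", "Playground"],
    "2nd Most Common Venue": ["Café", "Café", "Arcade"],
})
print(cluster_table(merged, 3))
```

In the notebook, `cluster_table(merged, 0)` through `cluster_table(merged, 4)` would replace the positional selections.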


### Cluster 1

In [105]:
merged.loc[merged['Cluster Labels'] == 0, merged.columns[[1] + list(range(5, merged.shape[1]))]]

|   | Borough | City | Country | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | Dreieich | Darmstadt | Germany | 0.0 | Supermarket | Hotel | German Restaurant | Italian Restaurant | Gastropub | Gas Station | Drugstore | Farmers Market | Farm | Yoga Studio |
| 20 | Dieburg | Darmstadt | Germany | 0.0 | Supermarket | Gas Station | Bank | Yoga Studio | Exhibit | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service |
| 33 | Pfungstadt | Darmstadt | Germany | 0.0 | Brewery | Supermarket | Fast Food Restaurant | Train Station | Event Space | Food Court | Food & Drink Shop | Food | Flower Shop | Flea Market |
| 48 | Altenstadt | Darmstadt | Germany | 0.0 | Train Station | Supermarket | Gas Station | Yoga Studio | Event Space | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service |
| 57 | Schöneck | Darmstadt | Germany | 0.0 | Supermarket | Yoga Studio | Football Stadium | Food Court | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 59 | Usingen | Darmstadt | Germany | 0.0 | Plaza | Supermarket | Gas Station | Café | Exhibit | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service |
| 74 | Nidda | Darmstadt | Germany | 0.0 | Supermarket | Drugstore | Plaza | Movie Theater | German Restaurant | Event Space | Food | Flower Shop | Flea Market | Financial or Legal Service |
| 76 | Fürth | Darmstadt | Germany | 0.0 | Chinese Restaurant | Supermarket | Train Station | Greek Restaurant | Yoga Studio | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service |
| 78 | Niedernhausen | Darmstadt | Germany | 0.0 | Supermarket | Bakery | Gas Station | Fast Food Restaurant | Train Station | Yoga Studio | Food & Drink Shop | Food | Flower Shop | Flea Market |
| 82 | Messel | Darmstadt | Germany | 0.0 | Supermarket | Soccer Field | Gastropub | Bank | Yoga Studio | Food Truck | Food & Drink Shop | Food | Flower Shop | Flea Market |


### Cluster 2

In [106]:
merged.loc[merged['Cluster Labels'] == 1, merged.columns[[1] + list(range(5, merged.shape[1]))]]

|   | Borough | City | Country | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | Friedrichsdorf | Darmstadt | Germany | 1.0 | Hotel | Supermarket | German Restaurant | Train Station | Drugstore | Bakery | Pet Store | Financial or Legal Service | Event Space | Farmers Market |
| 18 | Bensheim | Darmstadt | Germany | 1.0 | Hotel | Train Station | Drugstore | Café | Supermarket | Greek Restaurant | Plaza | Water Park | Hobby Shop | Electronics Store |
| 36 | Ober-Ramstadt | Darmstadt | Germany | 1.0 | Hotel | Gym / Fitness Center | German Restaurant | Train Station | Hobby Shop | Yoga Studio | Farm | Exhibit | Falafel Restaurant | Fast Food Restaurant |
| 37 | Groß-Umstadt | Darmstadt | Germany | 1.0 | Hotel | Vineyard | Café | Liquor Store | Drugstore | Plaza | Dive Bar | Falafel Restaurant | Food & Drink Shop | Food |
| 56 | Nidderau | Darmstadt | Germany | 1.0 | Hotel | Italian Restaurant | Asian Restaurant | Supermarket | Yoga Studio | Exhibit | Food Court | Food & Drink Shop | Food | Flower Shop |
| 60 | Kronberg im Taunus | Darmstadt | Germany | 1.0 | Hotel | Auto Dealership | Ice Cream Shop | German Restaurant | Lounge | Pizza Place | Restaurant | Castle | Café | Supermarket |
| 66 | Obertshausen | Darmstadt | Germany | 1.0 | Bakery | Hotel | Insurance Office | Supermarket | Fast Food Restaurant | Event Space | Food Court | Food & Drink Shop | Food | Flower Shop |
| 99 | Schmitten | Darmstadt | Germany | 1.0 | Hotel | Pharmacy | Cafeteria | Bar | Yoga Studio | Exhibit | Food & Drink Shop | Food | Flower Shop | Flea Market |
| 103 | Langenselbold | Darmstadt | Germany | 1.0 | Hotel | Bakery | Mexican Restaurant | Supermarket | Park | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service |
| 112 | Büttelborn | Darmstadt | Germany | 1.0 | Hotel | Sporting Goods Shop | Yoga Studio | Event Space | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |


### Cluster 3

In [107]:
merged.loc[merged['Cluster Labels'] == 2, merged.columns[[2] + list(range(5, merged.shape[1]))]]

|   | Neighborhood | City | Country | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 38 | Fischbachtal | Darmstadt | Germany | 2.0 | German Restaurant | Yoga Studio | Football Stadium | Food Court | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 71 | Maintal | Darmstadt | Germany | 2.0 | German Restaurant | Ice Cream Shop | Bakery | Football Stadium | Food Court | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service |
| 88 | Hasselroth | Darmstadt | Germany | 2.0 | German Restaurant | Yoga Studio | Football Stadium | Food Court | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 122 | Idstein | Darmstadt | Germany | 2.0 | German Restaurant | Hotel | Restaurant | Scenic Lookout | Drugstore | Turkish Restaurant | Grocery Store | Dive Bar | Department Store | Flower Shop |
| 131 | Steinbach (Taunus) | Darmstadt | Germany | 2.0 | German Restaurant | Ice Cream Shop | Supermarket | Event Space | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 197 | Fränkisch-Crumbach | Darmstadt | Germany | 2.0 | Construction & Landscaping | German Restaurant | Yoga Studio | Exhibit | Food Court | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service |


### Cluster 4

In [108]:
merged.loc[merged['Cluster Labels'] == 3, merged.columns[[3] + list(range(5, merged.shape[1]))]]

|   | Latitude | City | Country | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 58 | 50.23 | Darmstadt | Germany | 3.0 | Playground | Arcade | Turkish Restaurant | Event Space | Food Court | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service |
| 84 | 49.951 | Darmstadt | Germany | 3.0 | Playground | Yoga Studio | Elementary School | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market |
| 181 | 49.739 | Darmstadt | Germany | 3.0 | Playground | Yoga Studio | Elementary School | Food & Drink Shop | Food | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market |


### Cluster 5

In [109]:
merged.loc[merged['Cluster Labels'] == 4, merged.columns[[1] + list(range(4, merged.shape[1]))]]

|   | Borough | Longitude | City | Country | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Frankfurt am Main | 8.675 | Darmstadt | Germany | 4.0 | Supermarket | Clothing Store | Ice Cream Shop | Gas Station | Italian Restaurant | Café | Bus Stop | Light Rail Station | Convenience Store | Sushi Restaurant |
| 1 | Darmstadt | 8.657 | Darmstadt | Germany | 4.0 | Supermarket | Café | Italian Restaurant | Bakery | Ice Cream Shop | Bus Stop | Tram Station | Gas Station | Sushi Restaurant | Coffee Shop |
| 2 | Darmstadt | 8.645 | Darmstadt | Germany | 4.0 | Supermarket | Café | Italian Restaurant | Bakery | Ice Cream Shop | Bus Stop | Tram Station | Gas Station | Sushi Restaurant | Coffee Shop |
| 3 | Darmstadt | 8.648 | Darmstadt | Germany | 4.0 | Supermarket | Café | Italian Restaurant | Bakery | Ice Cream Shop | Bus Stop | Tram Station | Gas Station | Sushi Restaurant | Coffee Shop |
| 4 | Darmstadt | 8.681 | Darmstadt | Germany | 4.0 | Supermarket | Café | Italian Restaurant | Bakery | Ice Cream Shop | Bus Stop | Tram Station | Gas Station | Sushi Restaurant | Coffee Shop |
| 5 | Darmstadt | 8.645 | Darmstadt | Germany | 4.0 | Supermarket | Café | Italian Restaurant | Bakery | Ice Cream Shop | Bus Stop | Tram Station | Gas Station | Sushi Restaurant | Coffee Shop |
| 6 | Darmstadt | 8.637 | Darmstadt | Germany | 4.0 | Supermarket | Café | Italian Restaurant | Bakery | Ice Cream Shop | Bus Stop | Tram Station | Gas Station | Sushi Restaurant | Coffee Shop |
| 7 | Frankfurt am Main | 8.634 | Darmstadt | Germany | 4.0 | Supermarket | Clothing Store | Ice Cream Shop | Gas Station | Italian Restaurant | Café | Bus Stop | Light Rail Station | Convenience Store | Sushi Restaurant |
| 9 | Griesheim | 8.572 | Darmstadt | Germany | 4.0 | Bakery | Light Rail Station | Café | Falafel Restaurant | Yoga Studio | Food Court | Food & Drink Shop | Food | Flower Shop | Flea Market |
| 10 | Oberursel | 8.577 | Darmstadt | Germany | 4.0 | Metro Station | Italian Restaurant | Supermarket | Gas Station | Gastropub | Thai Restaurant | Yoga Studio | Exhibit | Food | Flower Shop |
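To answer the question from the introduction (which neighborhoods of the destination city resemble those of the city of origin), the cluster labels can be compared across cities. A sketch on hypothetical data, assuming `merged` holds rows for both cities; `similar_neighborhoods` is not part of the notebook. Repeated origin names (such as several Darmstadt postal codes) collapse into one dictionary key, which is harmless here because they share a cluster.

```python
import pandas as pd


def similar_neighborhoods(merged, origin, destination):
    """Map each origin neighborhood to the destination neighborhoods in the same cluster."""
    origin_rows = merged[merged["City"] == origin]
    dest_rows = merged[merged["City"] == destination]
    dest_by_cluster = dest_rows.groupby("Cluster Labels")["Neighborhood"].apply(list)
    return {
        hood: dest_by_cluster.get(label, [])
        for hood, label in zip(origin_rows["Neighborhood"], origin_rows["Cluster Labels"])
    }


# Hypothetical clustered neighborhoods from both cities
merged = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood B", "Barrio C", "Barrio D", "Barrio E"],
    "City": ["Darmstadt", "Darmstadt", "Santiago", "Santiago", "Santiago"],
    "Cluster Labels": [0, 1, 0, 0, 2],
})

print(pd.crosstab(merged["Cluster Labels"], merged["City"]))  # neighborhoods per cluster and city
print(similar_neighborhoods(merged, "Darmstadt", "Santiago"))
# {'Hood A': ['Barrio C', 'Barrio D'], 'Hood B': []}
```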
