# The battle of neighbourhoods - Report

## Introduction: the business problem

The client is a large hotel chain that wants to establish itself in the city of Munich. The current subdivision of Munich into neighbourhoods, while historically grounded, fails to mirror how the city is actually organised. The client wants a picture of the predominant activities in the different sectors of the city, so as to decide where to open which kind of hotel: a family resort would ideally sit in an area with parks rather than in an industrial area, hostel-style accommodation close to cafes and pubs, and so on.

This is a typical clustering problem, where we are interested both in geographical proximity of points - a neighbourhood should be connected - and in some kind of "cultural" proximity, i.e. similar kinds of activities. This report summarizes the full notebook, to which we refer for details.

## Data

We review the situation as-is, i.e. the historical neighbourhood/postal codes subdivision:

In [2]:
pc_mun = pd.read_csv("postal_codes_munich.csv")
pc_mun.head()

Unnamed: 0,zipcode,Neighbourhood,latitude,longitude
0,80331,Altstadt-Lehel,48.1345,11.571
1,80333,Altstadt-Lehel,48.1452,11.5668
2,80333,Maxvorstadt,48.1452,11.5668
3,80335,Altstadt-Lehel,48.1427,11.5552
4,80335,Ludwigsvorstadt-Isarvorstadt,48.1427,11.5552


It contains 25 different neighbourhoods and 74 different postal codes:

In [393]:
print("Unique neighbourhoods: " + str(len(pc_mun["Neighbourhood"].unique())))
print("Unique postal codes: " + str(len(pc_mun["zipcode"].unique())))

Unique neighbourhoods: 25
Unique postal codes: 74


The tragic fact here is that postal codes aren't a refinement of the neighbourhoods, as can already be seen from the first rows of the dataset - 80333 corresponds to both Altstadt-Lehel and Maxvorstadt, covering two areas which we would expect to see in different clusters.
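
The overlap can be checked directly. The following is only a minimal sketch: the toy frame re-types the five rows printed above (it is not the full file), and the names `pc_sample`, `per_zip` and `shared` are illustrative.

```python
import pandas as pd

# Toy copy of the five rows printed above (not the full postal_codes_munich.csv).
pc_sample = pd.DataFrame({
    "zipcode": [80331, 80333, 80333, 80335, 80335],
    "Neighbourhood": [
        "Altstadt-Lehel",
        "Altstadt-Lehel",
        "Maxvorstadt",
        "Altstadt-Lehel",
        "Ludwigsvorstadt-Isarvorstadt",
    ],
})

# Count the distinct neighbourhoods each postal code is assigned to.
per_zip = pc_sample.groupby("zipcode")["Neighbourhood"].nunique()

# Postal codes that straddle more than one neighbourhood.
shared = per_zip[per_zip > 1].index.tolist()
print(shared)
```

Any postal code that shows up in `shared` cannot be treated as a sub-unit of a single neighbourhood.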

So we build a grid over Munich and then cluster the points of the grid:

In [394]:
latitudes = np.linspace(start = 48.09, stop = 48.20, num = 20)
longitudes = np.linspace(start = 11.48, stop = 11.65, num = 20)

grid = pd.DataFrame(index = pd.MultiIndex.from_product([latitudes, longitudes], names = ["Latitude", "Longitude"])).reset_index()
grid["Neighbourhood"] = grid.index
grid.head()

Unnamed: 0,Latitude,Longitude,Neighbourhood
0,48.09,11.48,0
1,48.09,11.488947,1
2,48.09,11.497895,2
3,48.09,11.506842,3
4,48.09,11.515789,4


Neat, right? Now let's do the clustering.

## Methodology

We set up the clustering algorithm and let it run, just like in the previous course assignments. Around every grid point we take a circle whose radius is 75% of the distance between neighbouring points, so that the circles cover the whole map, at the cost of counting some venues twice.
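
The geometry behind the 75% figure can be checked with a short sketch. It assumes the grid bounds from the cell above, a spherical Earth, and that "the distance between the points" means the smaller of the two grid spacings; the helper `haversine_m` and the other names are illustrative, not taken from the notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Grid steps implied by np.linspace(..., num=20) on the bounds used above.
lat_step = (48.20 - 48.09) / 19
lon_step = (11.65 - 11.48) / 19

# Spacing between neighbouring grid points, measured near the middle of the grid.
mid_lat, mid_lon = 48.145, 11.565
dy = haversine_m(mid_lat, mid_lon, mid_lat + lat_step, mid_lon)  # north-south
dx = haversine_m(mid_lat, mid_lon, mid_lat, mid_lon + lon_step)  # east-west

# Assumption: the radius is 75% of the smaller spacing.
radius = 0.75 * min(dx, dy)

# The circles cover the whole grid if the radius reaches the centre of each
# grid cell, i.e. half the cell diagonal.
half_diagonal = math.hypot(dx, dy) / 2
print(round(dx), round(dy), round(radius), radius >= half_diagonal)
```

For a square grid the coverage condition is a radius of at least about 71% of the spacing (half the diagonal), which is why 75% leaves no gaps but makes neighbouring circles overlap.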

We can load the activities for each grid point from Foursquare and obtain a dataframe like this:

In [287]:
munich_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,0,48.09,11.48,Rossmann,48.087505,11.484407,Drugstore
1,0,48.09,11.48,Schweizer Platz,48.088626,11.479916,Plaza
2,0,48.09,11.48,REWE,48.089149,11.480729,Supermarket
3,0,48.09,11.48,Ratschiller's,48.089207,11.480467,Bakery
4,0,48.09,11.48,Wochenmarkt am Schweizer Platz,48.089108,11.480147,Farmers Market


In [288]:
print('There are {} unique categories.'.format(len(munich_venues['Venue Category'].unique())))

There are 320 unique categories.
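
The clustering step works on a table called `munich_grouped`, which this report does not show being built. The following is a minimal sketch of one plausible construction, assuming that venue categories are one-hot encoded and then averaged per grid point. The rows for grid point 0 follow the output above; the rows for grid point 1 are made up for illustration, and the result is named `grouped` to make clear it is not the notebook's own table.

```python
import pandas as pd

# Toy venues: grid point 0 follows the output above,
# grid point 1 is hypothetical.
venues = pd.DataFrame({
    "Neighborhood": [0, 0, 0, 0, 1, 1],
    "Venue Category": ["Drugstore", "Plaza", "Supermarket", "Bakery", "Bakery", "Bar"],
})

# One indicator column per venue category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Average the indicators per grid point: each value is the share of that
# category among the venues found around the point.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row then describes a grid point by its mix of venue categories, which is the kind of feature vector k-means can compare.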


And now for the clustering - we drop the Neighbourhood ID from the dataset, since we don't want the algorithm to use it:

In [366]:
# set number of clusters
kclusters = 28

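# drop the grid point ID (the positional axis argument is not accepted by recent pandas versions)
munich_grouped_clustering = munich_grouped.drop(columns='Neighborhood')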

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(munich_grouped_clustering)

# check the cluster labels of the first 12 grid points
kmeans.labels_[0:12]

array([12, 12, 12, 12, 12, 12, 12, 10, 10,  7,  7,  7], dtype=int32)

Now we have the clusters and we can merge everything back in the original dataframe. We add columns with the most common venues per grid point, which will help us analyse the results later:

In [367]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = munich_grouped['Neighborhood']

for ind in np.arange(munich_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(munich_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

munich_merged = grid

munich_merged = munich_merged.merge(neighborhoods_venues_sorted.set_index('Neighborhood'), left_on='Neighbourhood', right_on = "Neighborhood")

munich_merged.head() # check the last columns!

Unnamed: 0,Latitude,Longitude,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,48.095789,11.488947,21,12,German Restaurant,Playground,Italian Restaurant,Castle,Accessories Store,Outdoor Sculpture,Organic Grocery,Optical Shop,Opera House,Office
1,48.095789,11.497895,22,12,Supermarket,Asian Restaurant,Bowling Alley,Bakery,Spa,Massage Studio,Drugstore,Metro Station,Nightclub,Optical Shop
2,48.095789,11.506842,23,12,Gym / Fitness Center,Hotel,Furniture / Home Store,Pet Store,Modern European Restaurant,Supermarket,Organic Grocery,Drugstore,Asian Restaurant,Electronics Store
3,48.095789,11.515789,24,12,Hotel,Bank,Café,Supermarket,Pet Store,Drugstore,Pizza Place,Construction & Landscaping,Greek Restaurant,Ice Cream Shop
4,48.095789,11.524737,25,12,Bakery,Gym / Fitness Center,Greek Restaurant,Supermarket,Doner Restaurant,Rental Car Location,Café,Noodle House,Organic Grocery,Optical Shop
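
`return_most_common_venues` is defined in the full notebook, not in this report. The following is a minimal sketch of what such a helper might look like, assuming each row of `munich_grouped` holds the grid point ID followed by one frequency per venue category; the toy row is hypothetical.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    """Return the names of the most frequent venue categories in one row."""
    # Skip the first entry, which holds the grid point ID.
    frequencies = row.iloc[1:].astype(float)
    # Sort the categories by frequency, highest first.
    ranked = frequencies.sort_values(ascending=False)
    return ranked.index.values[:num_top_venues]

# Hypothetical row: grid point 0 with frequencies for four categories.
row = pd.Series({"Neighborhood": 0, "Bakery": 0.2, "Bar": 0.5, "Hotel": 0.0, "Plaza": 0.3})
top = return_most_common_venues(row, 3)
print(list(top))
```

Categories with zero frequency are still ranked, which would explain why sparse grid points end with a run of unrelated categories in their last columns.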


## Results

Here's the number of points per cluster (the output below is cut off after the first ten of the 28 clusters):

In [404]:
munich_merged.groupby("Cluster Labels").agg({"Cluster Labels":"count"})

Cluster Labels,Count
0,18
1,12
2,14
3,3
4,12
5,25
6,17
7,22
8,19
9,5
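
The same kind of count can also be obtained with `value_counts`, which returns a plain Series indexed by cluster label. This is a small sketch on made-up labels: the column name matches the report, the values are hypothetical.

```python
import pandas as pd

# Hypothetical cluster labels for seven grid points.
labels = pd.DataFrame({"Cluster Labels": [0, 2, 1, 0, 2, 0, 1]})

# Number of grid points per cluster, ordered by cluster label.
counts = labels["Cluster Labels"].value_counts().sort_index()
print(counts.to_dict())
```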


Of course, it's so much better to visualize this on a map - take a look at the notebook for an interactive map.

## Discussion

The city center is divided into four parts, corresponding to clusters 0, 8, 20 and 6 - let's look at them one by one as an example.

### Cluster 0 - Maxvorstadt and Schwabing West

These are the points in the cluster:

In [397]:
munich_merged.loc[munich_merged['Cluster Labels'] == 0, munich_merged.columns[2:]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
153,190,0,Restaurant,Plaza,Café,German Restaurant,Church,Tram Station,Botanical Garden,Nightclub,Shopping Mall,Middle Eastern Restaurant
154,191,0,Café,Plaza,Boutique,French Restaurant,Cocktail Bar,Clothing Store,Bar,Hotel,Restaurant,Italian Restaurant
170,209,0,Café,Asian Restaurant,Middle Eastern Restaurant,Theater,History Museum,Restaurant,Salad Place,Coffee Shop,Sushi Restaurant,Steakhouse
171,210,0,Café,History Museum,Art Museum,Italian Restaurant,Japanese Restaurant,Plaza,Sushi Restaurant,Peruvian Restaurant,Event Space,Field
172,211,0,Café,Italian Restaurant,Ice Cream Shop,Breakfast Spot,Bar,Cocktail Bar,Bakery,Restaurant,Burger Joint,Plaza
173,212,0,Café,Italian Restaurant,Ice Cream Shop,Surf Spot,River,Eastern European Restaurant,Frozen Yogurt Shop,Snack Place,Nightclub,Beer Garden
188,229,0,Café,Asian Restaurant,Steakhouse,Bar,Bakery,German Restaurant,Falafel Restaurant,Ramen Restaurant,Doner Restaurant,Bookstore
189,230,0,Café,Bar,Bakery,Italian Restaurant,Mediterranean Restaurant,Vietnamese Restaurant,Spanish Restaurant,Gastropub,Cocktail Bar,French Restaurant
190,231,0,Bar,Café,Italian Restaurant,Ice Cream Shop,Asian Restaurant,German Restaurant,Restaurant,Steakhouse,Burger Joint,Breakfast Spot
191,232,0,Ice Cream Shop,Irish Pub,Beer Garden,Café,Restaurant,Optical Shop,Bagel Shop,Steakhouse,Bar,Monument / Landmark


This is where we'd build the youth hostel - look at how many cafes and bars appear in the top three. Maxvorstadt and Schwabing are indeed known to be hip neighbourhoods.
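
The observation about the top three can be made quantitative. The following is a minimal sketch that re-types the first three rows of the table above and keeps only the three top-venue columns; the names `cluster0` and `top3_counts` are illustrative.

```python
import pandas as pd

# First three rows of the Cluster 0 table above, top-three columns only.
cluster0 = pd.DataFrame({
    "1st Most Common Venue": ["Restaurant", "Café", "Café"],
    "2nd Most Common Venue": ["Plaza", "Plaza", "Asian Restaurant"],
    "3rd Most Common Venue": ["Café", "Boutique", "Middle Eastern Restaurant"],
})

# Stack the three columns into one Series and count each category.
top3_counts = cluster0.stack().value_counts()
print(top3_counts.head(2).to_dict())
```

Run on the full cluster, the same count would give a ranking of categories that can be compared across clusters instead of eyeballing the tables.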

### Cluster 8 - Altstadt-Lehel and Au-Haidhausen

These are the points in the cluster:

In [398]:
munich_merged.loc[munich_merged['Cluster Labels'] == 8, munich_merged.columns[2:]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
101,132,8,Plaza,Hotel,Burger Joint,Bakery,Beer Garden,Supermarket,Greek Restaurant,Brewery,Café,Tram Station
102,133,8,Italian Restaurant,Plaza,French Restaurant,Restaurant,Café,Supermarket,Turkish Restaurant,Organic Grocery,German Restaurant,Bus Stop
103,134,8,Hotel,Gym / Fitness Center,Climbing Gym,Pub,Nightclub,Beach Bar,Beer Bar,Gym,Supermarket,Fried Chicken Joint
104,135,8,Bus Stop,Pub,Discount Store,Shipping Store,Beach Bar,Liquor Store,Nightclub,Austrian Restaurant,Turkish Restaurant,German Restaurant
118,151,8,Café,Bavarian Restaurant,Coffee Shop,Cocktail Bar,Pizza Place,Bookstore,German Restaurant,Theater,Tea Room,Bistro
119,152,8,Indian Restaurant,Ice Cream Shop,Pizza Place,Concert Hall,Science Museum,Supermarket,Hotel,Doner Restaurant,Afghan Restaurant,Gourmet Shop
120,153,8,Italian Restaurant,Café,Plaza,Bakery,German Restaurant,Indian Restaurant,Bar,Ice Cream Shop,French Restaurant,Concert Hall
121,154,8,German Restaurant,Hotel,Italian Restaurant,Café,Plaza,Indian Restaurant,Spanish Restaurant,Vegetarian / Vegan Restaurant,Bakery,Donut Shop
122,155,8,Italian Restaurant,Hotel,Portuguese Restaurant,Indian Restaurant,Home Service,Pizza Place,Doner Restaurant,Coffee Shop,Restaurant,Climbing Gym
136,171,8,Café,Bavarian Restaurant,Coffee Shop,Hotel,Plaza,Bookstore,Pizza Place,German Restaurant,Clothing Store,Italian Restaurant


This is a more residential area - we often see drugstores and supermarkets in the first positions. It's still central, but the number of restaurants compared to bars tells us that the target is different. Hotels are also more common.

### Cluster 20 - Ludwigsvorstadt-Isarvorstadt and Sendling

These are the points in the cluster:

In [399]:
munich_merged.loc[munich_merged['Cluster Labels'] == 20, munich_merged.columns[2:]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
43,68,20,Plaza,Turkish Restaurant,Soccer Field,Construction & Landscaping,Hotel,Bakery,Beer Garden,BBQ Joint,Climbing Gym,Greek Restaurant
44,69,20,Park,Beach,Gastropub,Café,Rest Area,Seafood Restaurant,Beer Garden,Accessories Store,Nightclub,Optical Shop
61,88,20,Plaza,Italian Restaurant,German Restaurant,Turkish Restaurant,Gas Station,Bakery,Park,Supermarket,Organic Grocery,Hotel
62,89,20,Athletics & Sports,Restaurant,Park,Trail,Gas Station,Soccer Field,Beach,Rest Area,Newsstand,Optical Shop
63,90,20,Taverna,Plaza,Drugstore,Bus Line,Spa,Bus Stop,Café,German Restaurant,Greek Restaurant,Gym
78,107,20,German Restaurant,Doner Restaurant,Bank,Gastropub,Bus Stop,Vietnamese Restaurant,Café,Italian Restaurant,Drugstore,Spanish Restaurant
79,108,20,Italian Restaurant,Supermarket,Market,Food & Drink Shop,Gastropub,Grocery Store,Falafel Restaurant,Gym Pool,Bar,Bus Stop
80,109,20,Bar,Café,Italian Restaurant,Supermarket,Turkish Restaurant,Park,Bakery,Cocktail Bar,Food Court,Trail
81,110,20,German Restaurant,Drugstore,Soccer Field,Plaza,Cupcake Shop,Gastropub,Bar,Beach,Beer Garden,Taverna
82,111,20,Italian Restaurant,Bar,Plaza,Pizza Place,German Restaurant,Café,Bakery,Drugstore,Brewery,Doner Restaurant


Yet another cafe area - it probably got separated from Cluster 0 because k-means tends to favour compact, roughly circular clusters.

### Cluster 6 - Schwantalerhöhe and Neuhausen

In [409]:
munich_merged.loc[munich_merged['Cluster Labels'] == 18, munich_merged.columns[2:]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
192,233,18,Bus Stop,Monument / Landmark,Hotel Pool,Bavarian Restaurant,Restaurant,Beer Garden,Snack Place,Recreation Center,Athletics & Sports,Hotel
193,234,18,Bank,Bakery,Supermarket,Park,Gourmet Shop,Athletics & Sports,Hostel,German Restaurant,Organic Grocery,Bus Stop
210,253,18,Bar,Tunnel,Dog Run,Comedy Club,Trattoria/Osteria,Boat Rental,German Restaurant,Snack Place,Convenience Store,Beer Garden
211,254,18,Bathing Area,Bavarian Restaurant,Tennis Court,Café,Accessories Store,Noodle House,Organic Grocery,Optical Shop,Opera House,Office
212,255,18,Bathing Area,Trattoria/Osteria,Bus Stop,Accessories Store,Newsstand,Organic Grocery,Optical Shop,Opera House,Office,Noodle House
228,273,18,Hotel,German Restaurant,Bar,Afghan Restaurant,Bus Stop,Trattoria/Osteria,Nightclub,Outlet Store,Outdoor Sculpture,Organic Grocery
229,274,18,Photography Studio,Bathing Area,Bavarian Restaurant,Park,Tennis Court,Volleyball Court,Beer Garden,Newsstand,Opera House,Office
231,276,18,Bus Stop,Asian Restaurant,Gas Station,Bakery,Theater,Bar,Italian Restaurant,Music Store,Opera House,Outlet Store
247,294,18,Stadium,Bed & Breakfast,Trail,Beer Garden,Moving Target,Newsstand,Optical Shop,Mountain,Opera House,Office
248,295,18,Indie Theater,Lake,Park,Bathing Area,Dog Run,Skate Park,Music Venue,Nature Preserve,New American Restaurant,Newsstand


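This is the area around the central station, where hotels are particularly common. Note that the cell above selects label 18 instead of 6, so the rows shown belong to the park cluster mentioned below rather than to this one.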

### Other clusters

The algorithm caught some other interesting features, for example the industrial area in the northwest (Cluster 9), the zoo in the south (Cluster 10) and the park (Cluster 18).

## Conclusion

The algorithm provides a new rationale for the subdivision of the city, grouping and splitting old neighbourhoods into new ones.