# Capstone Project - The Battle of the Neighborhoods: Cologne

### Cologne City New Restaurant Business Recommendation
**Sandeepan Mukherjee**

***March 2021***

### Introduction: Business Problem

Cologne, a city close to where I live, is well known for its cathedral, trade fairs and conferences, shopping boulevards, and lively party scene. Because it draws a large number of visitors from all over Europe and the world, there is substantial business potential for restaurants. There is also a wide variety of cuisines and restaurant types that an entrepreneur can choose from when starting a venture.
The goal of this exercise is to give a simple recommendation to businesses and stakeholders who want to open a new restaurant in Cologne. It addresses questions such as: Which districts of the city have a high concentration of which types of restaurants? Where should one open a Mediterranean, German, or fast food restaurant, and who are the competitors in that area? The target audience is food entrepreneurs and business owners.

### Data


As stated in the project requirements, I will use Foursquare data about restaurants in Cologne. Foursquare is a US tech company from New York that focuses on location data. Its technology and data power apps such as Apple Maps, Uber, Twitter, and many other household names. Here is an example of a restaurant in Cologne on Foursquare: https://de.foursquare.com/v/sattgr%C3%BCn/5c33306cc824ae002c2b414c. I will use Foursquare data such as the restaurant name, ID, location, and food category (vegetarian, Italian, etc.).

Also, I will use the overview of districts/city parts of Cologne from Wikipedia: https://en.wikipedia.org/wiki/Districts_of_Cologne 

Based on this data, we will use Python data science libraries to identify a few promising districts. The advantages of each district can then be spelled out so that stakeholders can choose the best possible location.

### Methodology

In this section we perform exploratory data analysis. We start by scraping the Wikipedia page on the districts of Cologne (https://en.wikipedia.org/wiki/Districts_of_Cologne) into a pandas dataframe, and then clean the dataframe by dropping unneeded columns and rows so that the results are presentable and consistent.

In [1]:
#importing required libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

In [2]:
#reading the url in pandas dataframe
df = pd.read_html('https://en.wikipedia.org/wiki/Districts_of_Cologne#Districts')[1]

#### Drop the columns and rows that are not needed

In [3]:
# Drop the columns that are not required or contain NaN values
df.drop(columns=["Map", "Coat", "Town Hall"], inplace=True)
# Remove the first 11 characters of each district name, leaving names such as 'Köln-Innenstadt'
df["City district"] = df["City district"].str[11:]
# Drop rows 9 and 10 so that only the nine city districts remain
df.drop([9, 10], inplace=True)
df.head(9)

Unnamed: 0,City district,City parts,Area,Population1,Pop. density,District Councils
0,Köln-Innenstadt,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-N...",16.4 km²,127.033,7.746/km²,"Bezirksksamt Innenstadt Brückenstraße 19, D-50..."
1,Köln-Rodenkirchen,"Bayenthal, Godorf, Hahnwald, Immendorf, Marien...",54.6 km²,100.936,1.850/km²,"Bezirksamt Rodenkirchen Hauptstraße 85, D-5099..."
2,Köln-Lindenthal,"Braunsfeld, Junkersdorf, Klettenberg, Lindenth...",41.6 km²,137.552,3.308/km²,"Bezirksamt Lindenthal Aachener Straße 220, 509..."
3,Köln-Ehrenfeld,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, N...",23.8 km²,103.621,4.348/km²,"Bezirksamt Ehrenfeld Venloer Straße 419 – 421,..."
4,Köln-Nippes,"Bilderstöckchen, Longerich, Mauenheim, Niehl, ...",31.8 km²,110.092,3.462/km²,"Bezirksamt NippesNeusser Straße 450,D-50733 Köln"
5,Köln-Chorweiler,"Blumenberg, Chorweiler, Esch/Auweiler, Fühling...",67.2 km²,80.87,1.204/km²,"Bezirksamt Chorweiler Pariser Platz 1, D-50765..."
6,Köln-Porz,"Eil, Elsdorf, Ensen, Finkenberg, Gremberghoven...",78.8 km²,106.52,1.352/km²,"Bezirksamt PorzFriedrich-Ebert-Ufer 64–70, D-5..."
7,Köln-Kalk,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Mer...",38.2 km²,108.33,2.841/km²,"Bezirksamt KalkKalker Hauptstraße 247–273,D-51..."
8,Köln-Mülheim,"Buchforst, Buchheim, Dellbrück, Dünnwald, Flit...",52.2 km²,144.374,2.764/km²,"Bezirksamt Mülheim Wiener Platz 2a,D-51065 Köln"
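The table stores population and density with a dot as the thousands separator, so `127.033` stands for 127,033 inhabitants and `80.87` for 80,870. The following is a minimal sketch of how these columns could be converted to plain numbers before any quantitative comparison. The helper name `clean_numeric_columns` and the new column names are assumptions made for illustration, and the two sample rows are copied from the table above.

```python
import pandas as pd

# Two sample rows copied from the table above
sample = pd.DataFrame({
    "City district": ["Köln-Innenstadt", "Köln-Chorweiler"],
    "Area": ["16.4 km²", "67.2 km²"],
    "Population1": [127.033, 80.87],
    "Pop. density": ["7.746/km²", "1.204/km²"],
})

def clean_numeric_columns(frame):
    """Return a copy with numeric area, population, and density columns (hypothetical helper)."""
    out = frame.copy()
    # "16.4 km²" -> 16.4 (here the dot is a real decimal point)
    out["Area_km2"] = out["Area"].str.extract(r"([\d.]+)")[0].astype(float)
    # 127.033 was parsed as a float, but the dot is a thousands separator
    out["Population"] = (out["Population1"] * 1000).round().astype(int)
    # "7.746/km²" -> 7746 (dot is a thousands separator)
    density = out["Pop. density"].str.extract(r"([\d.]+)")[0]
    out["Density_per_km2"] = density.str.replace(".", "", regex=False).astype(int)
    return out

cleaned = clean_numeric_columns(sample)
print(cleaned[["City district", "Area_km2", "Population", "Density_per_km2"]])
```

As a plausibility check, 127,033 inhabitants on 16.4 km² gives roughly 7,746 inhabitants per km², which matches the density column for Innenstadt.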


#### Geospatial data for the districts
**In this subsection I use geopy's Nominatim geocoder to look up the coordinates of each city district and add them to the dataframe. The latitude and longitude appear in the two rightmost columns of the following table.**

In [4]:
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
geolocator = Nominatim(user_agent="Cologne_food")

df['Major_Dist_Coord']= df['City district'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df[['Latitude', 'Longitude']] = df['Major_Dist_Coord'].apply(pd.Series)

df.drop(['Major_Dist_Coord'], axis=1, inplace=True)
df

Unnamed: 0,City district,City parts,Area,Population1,Pop. density,District Councils,Latitude,Longitude
0,Köln-Innenstadt,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-N...",16.4 km²,127.033,7.746/km²,"Bezirksksamt Innenstadt Brückenstraße 19, D-50...",50.937328,6.959234
1,Köln-Rodenkirchen,"Bayenthal, Godorf, Hahnwald, Immendorf, Marien...",54.6 km²,100.936,1.850/km²,"Bezirksamt Rodenkirchen Hauptstraße 85, D-5099...",50.865622,6.969718
2,Köln-Lindenthal,"Braunsfeld, Junkersdorf, Klettenberg, Lindenth...",41.6 km²,137.552,3.308/km²,"Bezirksamt Lindenthal Aachener Straße 220, 509...",50.935935,6.871246
3,Köln-Ehrenfeld,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, N...",23.8 km²,103.621,4.348/km²,"Bezirksamt Ehrenfeld Venloer Straße 419 – 421,...",50.951502,6.916529
4,Köln-Nippes,"Bilderstöckchen, Longerich, Mauenheim, Niehl, ...",31.8 km²,110.092,3.462/km²,"Bezirksamt NippesNeusser Straße 450,D-50733 Köln",50.958994,6.941777
5,Köln-Chorweiler,"Blumenberg, Chorweiler, Esch/Auweiler, Fühling...",67.2 km²,80.87,1.204/km²,"Bezirksamt Chorweiler Pariser Platz 1, D-50765...",51.021167,6.898034
6,Köln-Porz,"Eil, Elsdorf, Ensen, Finkenberg, Gremberghoven...",78.8 km²,106.52,1.352/km²,"Bezirksamt PorzFriedrich-Ebert-Ufer 64–70, D-5...",50.906705,6.999129
7,Köln-Kalk,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Mer...",38.2 km²,108.33,2.841/km²,"Bezirksamt KalkKalker Hauptstraße 247–273,D-51...",50.931923,7.005806
8,Köln-Mülheim,"Buchforst, Buchheim, Dellbrück, Dünnwald, Flit...",52.2 km²,144.374,2.764/km²,"Bezirksamt Mülheim Wiener Platz 2a,D-51065 Köln",50.958147,7.013526
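geopy's `geocode` returns `None` when a query has no match, in which case the `lambda` in the cell above would fail with an `AttributeError`. The sketch below shows one way to guard against that. The function `geocode_districts`, the `Location` stand-in, and `fake_geocode` are hypothetical and exist only so that the example runs without network access.

```python
from collections import namedtuple

# Stand-in for the object returned by geopy's geocode (assumption for this sketch)
Location = namedtuple("Location", ["latitude", "longitude"])

def geocode_districts(names, geocode):
    """Map each name to (latitude, longitude), or (None, None) when there is no match."""
    coords = {}
    for name in names:
        location = geocode(name)
        if location is None:
            coords[name] = (None, None)
        else:
            coords[name] = (location.latitude, location.longitude)
    return coords

def fake_geocode(query):
    """Offline replacement for geolocator.geocode; knows a single district."""
    known = {"Köln-Innenstadt": Location(50.937328, 6.959234)}
    return known.get(query)

coords = geocode_districts(["Köln-Innenstadt", "Köln-Nirgendwo"], fake_geocode)
print(coords)
```

With geopy, the `geocode` argument could be `geolocator.geocode` itself, optionally wrapped in `geopy.extra.rate_limiter.RateLimiter` to space out the requests.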


#### Generate a Map of Cologne

In [5]:
address = 'Cologne'

geolocator = Nominatim(user_agent="cologne_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Cologne are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Cologne are 50.938361, 6.959974.


#### Importing required libraries for Visualisation and performing Clustering Using Kmeans

In [7]:
#importing required libraries for Visualisation and performing Clustering Using Kmeans
import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas 1.0 and later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [8]:
# create map of Cologne using latitude and longitude 
map_cologne = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to the map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['City district']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cologne) 
    
map_cologne

#### Explore the Neighbourhoods Using FourSquare


In [9]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (never publish the real value)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (never publish the real value)
VERSION = '20210220' # what version of Foursquare you want to use
LIMIT = 20 # max limit is 50 
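Credentials are safer kept outside the notebook. A minimal sketch, assuming the two values are stored in environment variables named `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` (these names and the helper `load_foursquare_credentials` are assumptions, not part of the original notebook):

```python
import os

def load_foursquare_credentials(environ=os.environ):
    """Return (client_id, client_secret) read from a mapping such as os.environ."""
    try:
        return environ["FOURSQUARE_CLIENT_ID"], environ["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of sending an empty credential
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None

# Demonstration with an explicit mapping instead of the real environment
demo_id, demo_secret = load_foursquare_credentials(
    {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
)
print(demo_id)
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the hard-coded strings.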

#### Define a Function that Grabs Nearby Venues

In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City district', 
                  'City district Latitude', 
                  'City district Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
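The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response without groups or a venue without a category would raise an exception. The sketch below separates the parsing step into a hypothetical helper, `extract_venue_rows`, that tolerates both cases. The sample payload is invented and only mirrors the fields that `getNearbyVenues` reads.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one explore-style response dict into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue carries no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Invented payload with one categorised and one uncategorised venue
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 50.93, "lng": 6.95},
               "categories": [{"name": "Beer Bar"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 50.94, "lng": 6.96},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Köln-Innenstadt", 50.937328, 6.959234, sample_payload)
empty_rows = extract_venue_rows("Köln-Innenstadt", 50.937328, 6.959234, {"response": {}})
print(rows)
```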

#### Get the Venues

In [11]:
cologne_venues = getNearbyVenues(names=df['City district'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'])

Köln-Innenstadt
Köln-Rodenkirchen
Köln-Lindenthal
Köln-Ehrenfeld
Köln-Nippes
Köln-Chorweiler
Köln-Porz
Köln-Kalk
Köln-Mülheim


In [12]:
print(cologne_venues.shape)
cologne_venues.head()

(114, 7)


Unnamed: 0,City district,City district Latitude,City district Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Köln-Innenstadt,50.937328,6.959234,Craftbeer Corner,50.937222,6.958928,Beer Bar
1,Köln-Innenstadt,50.937328,6.959234,LEGO Store,50.937091,6.956605,Toy / Game Store
2,Köln-Innenstadt,50.937328,6.959234,Papa Joe's Jazzlokal,50.937882,6.962241,Jazz Club
3,Köln-Innenstadt,50.937328,6.959234,Alter Markt,50.938623,6.96007,Plaza
4,Köln-Innenstadt,50.937328,6.959234,Heumarkt,50.936161,6.960461,Plaza
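As a sanity check on the `radius=500` setting, the distance between a district centre and a returned venue can be computed with the haversine formula. This is a sketch only: the function name `haversine_m` is an assumption, and the API may measure distance differently. The coordinates are those of Köln-Innenstadt and Craftbeer Corner from the table above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# District centre of Köln-Innenstadt and the venue "Craftbeer Corner"
distance = haversine_m(50.937328, 6.959234, 50.937222, 6.958928)
print("Distance in metres: {:.1f}".format(distance))
print("Within 500 m:", distance <= 500)
```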


### Analysis
**In this section we will perform some analysis to see the trends, hotspots and choices of food culture of the population in the city.**

In [13]:
cologne_venues.groupby('City district').count()

City district,City district Latitude,City district Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Köln-Chorweiler,13,13,13,13,13,13
Köln-Ehrenfeld,20,20,20,20,20,20
Köln-Innenstadt,20,20,20,20,20,20
Köln-Kalk,4,4,4,4,4,4
Köln-Lindenthal,18,18,18,18,18,18
Köln-Mülheim,20,20,20,20,20,20
Köln-Nippes,9,9,9,9,9,9
Köln-Porz,5,5,5,5,5,5
Köln-Rodenkirchen,5,5,5,5,5,5


In [14]:
# find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(cologne_venues['Venue Category'].unique())))

There are 73 unique categories.


In [15]:
# Function that returns the most common venue categories for one row of frequencies
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [16]:
# one hot encoding
cologne_onehot = pd.get_dummies(cologne_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
cologne_onehot['City district'] = cologne_venues['City district'] 

# move neighborhood column to the first column
fixed_columns = [cologne_onehot.columns[-1]] + list(cologne_onehot.columns[:-1])
cologne_onehot = cologne_onehot[fixed_columns]

cologne_onehot.shape

(114, 74)

#### Group the rows by district and compute the frequency of occurrence of each category

In [17]:
cologne_grouped = cologne_onehot.groupby('City district').mean().reset_index()
cologne_grouped

Unnamed: 0,City district,Art Gallery,Art Museum,Athletics & Sports,Auto Garage,BBQ Joint,Bakery,Bar,Baseball Stadium,Bavarian Restaurant,...,Sporting Goods Shop,Stadium,Steakhouse,Supermarket,Thai Restaurant,Theater,Toy / Game Store,Tram Station,Trattoria/Osteria,Vegetarian / Vegan Restaurant
0,Köln-Chorweiler,0.0,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.0,...,0.0,0.0,0.0,0.153846,0.076923,0.0,0.0,0.0,0.0,0.0
1,Köln-Ehrenfeld,0.05,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Köln-Innenstadt,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,...,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.05
3,Köln-Kalk,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
4,Köln-Lindenthal,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.055556,0.0,...,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.111111,0.055556,0.0
5,Köln-Mülheim,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,...,0.0,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0
6,Köln-Nippes,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,...,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0
7,Köln-Porz,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Köln-Rodenkirchen,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
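Each value in the grouped table is the share of a district's venues that fall into a category. Köln-Kalk, for example, has 4 venues, so a single Greek restaurant yields 1/4 = 0.25. The toy example below reproduces that arithmetic. The second district and the `Supermarket` and `Bakery` rows are invented for illustration.

```python
import pandas as pd

# Toy venue list: Kalk has 4 venues, one of which is a Greek restaurant
toy = pd.DataFrame({
    "City district": ["Köln-Kalk"] * 4 + ["Toy district"] * 2,
    "Venue Category": ["Greek Restaurant", "Theater", "Supermarket", "Bakery",
                       "Bakery", "Bakery"],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(toy["Venue Category"], dtype=int)
onehot.insert(0, "City district", toy["City district"])

# The mean of a 0/1 column within a district is the category's share of venues
freq = onehot.groupby("City district").mean()
print(freq)
```

Because every venue has exactly one category here, the frequencies in each row sum to 1.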


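**We can use this information to build a dataframe of restaurant categories only, showing how frequent each restaurant type is in each city district. Note that `.head()` keeps only the first five districts, so the steps that follow cover Chorweiler, Ehrenfeld, Innenstadt, Kalk, and Lindenthal.**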

In [18]:
filtered_columns = ['City district'] + [col for col in cologne_grouped.columns if col.endswith('Restaurant')]
dataframe_filtered = cologne_grouped.loc[:, filtered_columns].head()
dataframe_filtered

Unnamed: 0,City district,Bavarian Restaurant,Chinese Restaurant,Doner Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Greek Restaurant,Italian Restaurant,Kebab Restaurant,Lebanese Restaurant,Restaurant,Scandinavian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant
0,Köln-Chorweiler,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0
1,Köln-Ehrenfeld,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.05,0.05,0.0,0.1,0.0,0.0,0.0
2,Köln-Innenstadt,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05
3,Köln-Kalk,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Köln-Lindenthal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0


**Now we retrieve the 10 most common restaurant categories for each city district.**

In [19]:
# return_most_common_venues is reused as defined in cell In [15]

# Retrieve the top 10 restaurant categories per district
num_top_restaurant = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City district']
for ind in np.arange(num_top_restaurant):
    # the first three ranks take 'st', 'nd', 'rd'; all later ranks take 'th'
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind+1, suffix))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['City district'] = dataframe_filtered['City district']

for ind in np.arange(dataframe_filtered.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dataframe_filtered.iloc[ind, :], num_top_restaurant)

neighborhoods_venues_sorted.head()

Unnamed: 0,City district,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Köln-Chorweiler,Thai Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Restaurant,Lebanese Restaurant,Kebab Restaurant,Italian Restaurant,Greek Restaurant,Falafel Restaurant
1,Köln-Ehrenfeld,Restaurant,Kebab Restaurant,Italian Restaurant,Falafel Restaurant,Ethiopian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Lebanese Restaurant,Greek Restaurant
2,Köln-Innenstadt,Vegetarian / Vegan Restaurant,Lebanese Restaurant,Fast Food Restaurant,Bavarian Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Kebab Restaurant,Italian Restaurant,Greek Restaurant
3,Köln-Kalk,Greek Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Lebanese Restaurant,Kebab Restaurant,Italian Restaurant,Fast Food Restaurant,Falafel Restaurant
4,Köln-Lindenthal,Italian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Lebanese Restaurant,Kebab Restaurant,Greek Restaurant,Fast Food Restaurant,Falafel Restaurant
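Many of the ranked entries above have a frequency of 0.0 in `dataframe_filtered`. Köln-Kalk, for instance, has only one restaurant category with a non-zero frequency, so its 2nd to 10th "most common" venues are just the order in which the zeros happen to be sorted. A possible refinement, sketched here with a hypothetical helper `top_nonzero_categories`, keeps only categories that actually occur. The sample row uses four of the columns from the Chorweiler row above.

```python
import pandas as pd

def top_nonzero_categories(row, num_top):
    """Return up to num_top category names whose frequency is greater than zero."""
    freqs = row.drop("City district").astype(float)
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top])

# Four restaurant columns from the Köln-Chorweiler row
chorweiler = pd.Series({
    "City district": "Köln-Chorweiler",
    "Fast Food Restaurant": 0.076923,
    "Greek Restaurant": 0.0,
    "Italian Restaurant": 0.0,
    "Thai Restaurant": 0.076923,
})

top = top_nonzero_categories(chorweiler, 10)
print(top)
```

Only the two categories that occur in Chorweiler remain; their relative order is a tie, since both have the same frequency.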


**The table above lists the city districts and their most common restaurant categories. Next we apply k-means clustering to the restaurant frequencies to group the districts into clusters.**

In [20]:
# set number of clusters
kclusters = 5

cologne_grouped_clustering = dataframe_filtered.drop(columns='City district')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cologne_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 3, 1, 2, 0], dtype=int32)
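The label array above contains five different labels for five rows, meaning every district ends up in its own cluster. The sketch below illustrates why: when the number of clusters equals the number of distinct rows, k-means can place one centre on each row, whereas a smaller `n_clusters` forces districts to be grouped. The matrix uses five of the restaurant columns from the filtered table, and `n_clusters=2` is an arbitrary choice for illustration, not a recommendation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Columns: Fast Food, Greek, Italian, Restaurant, Thai (subset of the filtered table)
X = np.array([
    [0.076923, 0.0, 0.0, 0.0, 0.076923],   # Köln-Chorweiler
    [0.0, 0.0, 0.05, 0.1, 0.0],            # Köln-Ehrenfeld
    [0.05, 0.0, 0.0, 0.0, 0.0],            # Köln-Innenstadt
    [0.0, 0.25, 0.0, 0.0, 0.0],            # Köln-Kalk
    [0.0, 0.0, 0.111111, 0.0, 0.0],        # Köln-Lindenthal
])

# As many clusters as rows: each district becomes its own cluster
full = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Fewer clusters than rows: districts have to share clusters
coarse = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("k=5 labels:", full.labels_, "inertia:", full.inertia_)
print("k=2 labels:", coarse.labels_, "inertia:", coarse.inertia_)
```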

In [21]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

cologne_merged = df

# join the district table with the cluster labels and top restaurant categories; districts that were not clustered get NaN
cologne_merged = cologne_merged.join(neighborhoods_venues_sorted.set_index('City district'), on='City district')

cologne_merged.head()

Unnamed: 0,City district,City parts,Area,Population1,Pop. density,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Köln-Innenstadt,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-N...",16.4 km²,127.033,7.746/km²,"Bezirksksamt Innenstadt Brückenstraße 19, D-50...",50.937328,6.959234,1.0,Vegetarian / Vegan Restaurant,Lebanese Restaurant,Fast Food Restaurant,Bavarian Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Kebab Restaurant,Italian Restaurant,Greek Restaurant
1,Köln-Rodenkirchen,"Bayenthal, Godorf, Hahnwald, Immendorf, Marien...",54.6 km²,100.936,1.850/km²,"Bezirksamt Rodenkirchen Hauptstraße 85, D-5099...",50.865622,6.969718,,,,,,,,,,,
2,Köln-Lindenthal,"Braunsfeld, Junkersdorf, Klettenberg, Lindenth...",41.6 km²,137.552,3.308/km²,"Bezirksamt Lindenthal Aachener Straße 220, 509...",50.935935,6.871246,0.0,Italian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Lebanese Restaurant,Kebab Restaurant,Greek Restaurant,Fast Food Restaurant,Falafel Restaurant
3,Köln-Ehrenfeld,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, N...",23.8 km²,103.621,4.348/km²,"Bezirksamt Ehrenfeld Venloer Straße 419 – 421,...",50.951502,6.916529,3.0,Restaurant,Kebab Restaurant,Italian Restaurant,Falafel Restaurant,Ethiopian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Lebanese Restaurant,Greek Restaurant
4,Köln-Nippes,"Bilderstöckchen, Longerich, Mauenheim, Niehl, ...",31.8 km²,110.092,3.462/km²,"Bezirksamt NippesNeusser Straße 450,D-50733 Köln",50.958994,6.941777,,,,,,,,,,,
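The empty rows above come from the left join: districts that are missing from `neighborhoods_venues_sorted` keep their row but get `NaN` in every joined column, which also turns the integer labels into floats such as `1.0`. A small sketch of how such unmatched rows could be listed with `merge(..., indicator=True)`, using three district names and two labels from the tables above:

```python
import pandas as pd

districts = pd.DataFrame({
    "City district": ["Köln-Innenstadt", "Köln-Rodenkirchen", "Köln-Lindenthal"],
})
labels = pd.DataFrame({
    "City district": ["Köln-Innenstadt", "Köln-Lindenthal"],
    "Cluster Labels": [1, 0],
})

# indicator=True adds a "_merge" column that says where each row was found
merged = districts.merge(labels, on="City district", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "City district"].tolist()

print(merged)
print("Districts without a cluster label:", unmatched)
```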


In [22]:
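# Show only the districts that received a cluster label (the result is not assigned, so cologne_merged keeps all nine rows)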
cologne_merged.dropna()

Unnamed: 0,City district,City parts,Area,Population1,Pop. density,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Köln-Innenstadt,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-N...",16.4 km²,127.033,7.746/km²,"Bezirksksamt Innenstadt Brückenstraße 19, D-50...",50.937328,6.959234,1.0,Vegetarian / Vegan Restaurant,Lebanese Restaurant,Fast Food Restaurant,Bavarian Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Kebab Restaurant,Italian Restaurant,Greek Restaurant
2,Köln-Lindenthal,"Braunsfeld, Junkersdorf, Klettenberg, Lindenth...",41.6 km²,137.552,3.308/km²,"Bezirksamt Lindenthal Aachener Straße 220, 509...",50.935935,6.871246,0.0,Italian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Lebanese Restaurant,Kebab Restaurant,Greek Restaurant,Fast Food Restaurant,Falafel Restaurant
3,Köln-Ehrenfeld,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, N...",23.8 km²,103.621,4.348/km²,"Bezirksamt Ehrenfeld Venloer Straße 419 – 421,...",50.951502,6.916529,3.0,Restaurant,Kebab Restaurant,Italian Restaurant,Falafel Restaurant,Ethiopian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Lebanese Restaurant,Greek Restaurant
5,Köln-Chorweiler,"Blumenberg, Chorweiler, Esch/Auweiler, Fühling...",67.2 km²,80.87,1.204/km²,"Bezirksamt Chorweiler Pariser Platz 1, D-50765...",51.021167,6.898034,4.0,Thai Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Restaurant,Lebanese Restaurant,Kebab Restaurant,Italian Restaurant,Greek Restaurant,Falafel Restaurant
7,Köln-Kalk,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Mer...",38.2 km²,108.33,2.841/km²,"Bezirksamt KalkKalker Hauptstraße 247–273,D-51...",50.931923,7.005806,2.0,Greek Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Lebanese Restaurant,Kebab Restaurant,Italian Restaurant,Fast Food Restaurant,Falafel Restaurant


**The table lists the five clustered city districts and their most common restaurant categories, each with a cluster label ranging from 0 to 4. Because five districts were grouped into five clusters, each district forms its own cluster.**

**We can now use the cluster labels to show the city districts marked with a cluster-specific color on a map using the library folium:**

In [33]:
address = 'Köln Innenstadt'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Cologne are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Cologne are 50.93732845, 6.959234323073302.


In [34]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one hex color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# add markers to the map
for lat, lon, Citydistrict, ClusterLabels in zip(cologne_merged['Latitude'], cologne_merged['Longitude'], cologne_merged['City district'], cologne_merged['Cluster Labels']):
    label = folium.Popup(str(Citydistrict) + ' Cluster ' + str(ClusterLabels), parse_html=True)
    # districts that were not clustered have a NaN label and are drawn in gray
    marker_color = 'gray' if pd.isna(ClusterLabels) else rainbow[int(ClusterLabels)]
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=marker_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

**The map above shows nine markers, one for each city district. Each popup gives the district name and its cluster label; the four districts that were not part of the clustering input show `nan` instead of a label.**

### *Examination of the five clusters and Results*

**Now we examine each cluster and determine the venue categories that distinguish it. This will help our stakeholders understand which areas of the city have more customer footfall and which cuisines are preferred there.**

#### *Cluster 1*
**Italian Cuisine Cluster(Lindenthal)**

In [35]:
cologne_merged.loc[cologne_merged['Cluster Labels'] == 0, cologne_merged.columns[[1] + list(range(5, cologne_merged.shape[1]))]]

Unnamed: 0,City parts,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Braunsfeld, Junkersdorf, Klettenberg, Lindenth...","Bezirksamt Lindenthal Aachener Straße 220, 509...",50.935935,6.871246,0.0,Italian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Lebanese Restaurant,Kebab Restaurant,Greek Restaurant,Fast Food Restaurant,Falafel Restaurant


#### *Cluster 2*
**Vegan Cluster(Innenstadt)**

In [36]:
cologne_merged.loc[cologne_merged['Cluster Labels'] == 1, cologne_merged.columns[[1] + list(range(5, cologne_merged.shape[1]))]]

Unnamed: 0,City parts,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-N...","Bezirksksamt Innenstadt Brückenstraße 19, D-50...",50.937328,6.959234,1.0,Vegetarian / Vegan Restaurant,Lebanese Restaurant,Fast Food Restaurant,Bavarian Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Kebab Restaurant,Italian Restaurant,Greek Restaurant


#### *Cluster 3*
**Greek Cuisine Cluster(Kalk)**

In [38]:
cologne_merged.loc[cologne_merged['Cluster Labels'] == 2, cologne_merged.columns[[1] + list(range(5, cologne_merged.shape[1]))]]

Unnamed: 0,City parts,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Mer...","Bezirksamt KalkKalker Hauptstraße 247–273,D-51...",50.931923,7.005806,2.0,Greek Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Restaurant,Lebanese Restaurant,Kebab Restaurant,Italian Restaurant,Fast Food Restaurant,Falafel Restaurant


#### *Cluster 4*
**Turkish Cuisine Cluster(Ehrenfeld)**

In [31]:
cologne_merged.loc[cologne_merged['Cluster Labels'] == 3, cologne_merged.columns[[1] + list(range(5, cologne_merged.shape[1]))]]

Unnamed: 0,City parts,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, N...","Bezirksamt Ehrenfeld Venloer Straße 419 – 421,...",50.951502,6.916529,3.0,Restaurant,Kebab Restaurant,Italian Restaurant,Falafel Restaurant,Ethiopian Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Scandinavian Restaurant,Lebanese Restaurant,Greek Restaurant


#### *Cluster 5*
**Thai Cuisine Cluster(Chorweiler)**

In [32]:
cologne_merged.loc[cologne_merged['Cluster Labels'] == 4, cologne_merged.columns[[1] + list(range(5, cologne_merged.shape[1]))]]

Unnamed: 0,City parts,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,"Blumenberg, Chorweiler, Esch/Auweiler, Fühling...","Bezirksamt Chorweiler Pariser Platz 1, D-50765...",51.021167,6.898034,4.0,Thai Restaurant,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Scandinavian Restaurant,Restaurant,Lebanese Restaurant,Kebab Restaurant,Italian Restaurant,Greek Restaurant,Falafel Restaurant


### Discussion and Conclusion

Answer to the business question:

**Cluster 1** (Lindenthal) is where customers most prefer Italian cuisine, so an Italian restaurant opened there would face strong competition. Clusters 2, 3 and 5 show less footfall for Italian cuisine, which leaves a large business opportunity for a good Italian restaurant.

**Cluster 2** (Innenstadt) shows that vegetarian/vegan restaurants are most preferred by customers and visitors, so a vegan restaurant opened there would face a lot of competition. Cluster 4 would be the best place for stakeholders and business owners to open a good vegan restaurant.

**Cluster 3** (Kalk) shows that Greek restaurants are most preferred by customers and visitors, so a Greek restaurant opened there would face a lot of competition. Cluster 2 would be the best place for stakeholders and business owners to open a good Greek restaurant, and Clusters 1, 4 and 5 also look like good places.

**Cluster 4** (Ehrenfeld) shows that Turkish restaurants are most preferred by customers and visitors, so a Turkish restaurant opened there would face a lot of competition. Cluster 2 would be the best place for stakeholders and business owners to open a good Turkish restaurant, and Clusters 1, 3 and 5 also look like good places.

**Cluster 5** (Chorweiler) shows that Thai restaurants are most preferred by customers and visitors, so a Thai restaurant opened there would face a lot of competition. Cluster 4 would be the best place for stakeholders and business owners to open a good Thai restaurant.
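The reasoning in this section, namely to look for districts where a cuisine is not yet present, can be expressed as a small lookup on the frequency table. The sketch below uses four of the restaurant columns from the filtered table. The helper `districts_without` is hypothetical, and a frequency of zero only means that no such venue appeared among the venues returned for that district, not that there is no competitor at all.

```python
import pandas as pd

# Frequencies copied from the filtered restaurant table (four columns only)
freq = pd.DataFrame(
    {
        "Greek Restaurant": [0.0, 0.0, 0.0, 0.25, 0.0],
        "Italian Restaurant": [0.0, 0.05, 0.0, 0.0, 0.111111],
        "Thai Restaurant": [0.076923, 0.0, 0.0, 0.0, 0.0],
        "Vegetarian / Vegan Restaurant": [0.0, 0.0, 0.05, 0.0, 0.0],
    },
    index=["Köln-Chorweiler", "Köln-Ehrenfeld", "Köln-Innenstadt",
           "Köln-Kalk", "Köln-Lindenthal"],
)

def districts_without(frequencies, category):
    """List the districts in which the given category has a frequency of zero."""
    return frequencies.index[frequencies[category] == 0].tolist()

italian_gaps = districts_without(freq, "Italian Restaurant")
print("No Italian restaurant in the sample for:", italian_gaps)
```

For Italian restaurants this returns Chorweiler, Innenstadt, and Kalk, which correspond to Clusters 5, 2, and 3 in the discussion above.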