# Capstone Project - The Battle of Neighborhoods

This is my submission for the final peer-reviewed assignment of the Capstone Project in IBM's Applied Data Science Specialization on Coursera.

## Section 1: Introduction

In this section, I define the project idea: using Foursquare's Places API to explore an assumed business opportunity.

### Background

Shopping hours in Germany are very restrictive. Most shops are closed on Sundays, and few remain open after 8:00 PM on weekdays. Finding an open shop at night is difficult, especially in the winter season. This leaves many customers unhappy and creates an opportunity for businesses to open and serve metropolitan cities where people from different cultures live.

This project will map the shops with extended opening hours, analyze how many households they serve, and identify which neighborhoods may need a shop with more flexible shopping hours. The analysis focuses on Frankfurt am Main, which had 763,380 inhabitants as of December 31, 2019, and is one of the global hubs for commerce, culture, education, tourism, and transportation.

The project will help entrepreneurs and shop owners recognize this untapped opportunity. It will also help consumers find neighborhoods with stores that stay open past the typical closing time.

### Data

Given the project's size and simplicity, I will use only a few data sources in a narrow context. The Foursquare API is the primary data source. Foursquare offers various end-points that can be used to carry out the project's idea.

I will use the [Venue Categories](https://developer.foursquare.com/docs/api-reference/venues/categories/) end-point to get the list of available categories and focus on those related to retail shops and supermarkets. Then I will use the [Venue Search](https://developer.foursquare.com/docs/api-reference/venues/search/) end-point to get a list of venues in those categories within a limited radius. After that, I will use the [Venue Hours](https://developer.foursquare.com/docs/api-reference/venues/hours/) premium end-point to get each venue's opening hours.
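
As a rough sketch of how these three end-points are addressed, the snippet below only builds the request URLs and sends nothing. The base URL and the `client_id`, `client_secret`, and `v` parameters follow the calls used later in this notebook; the helper name `build_url`, the placeholder credentials, and the `VENUE_ID` placeholder are assumptions made for illustration.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues'

def build_url(path, client_id, client_secret, version, **params):
    """Return a Foursquare v2 venues URL for `path` (hypothetical helper)."""
    query = {'client_id': client_id, 'client_secret': client_secret, 'v': version}
    query.update(params)
    return '{}/{}?{}'.format(BASE_URL, path, urlencode(query))

# placeholder credentials, not real ones
creds = ('MY_ID', 'MY_SECRET', '20180605')

categories_url = build_url('categories', *creds)
search_url = build_url('search', *creds, ll='50.1106,8.6821', radius=500,
                       categoryId='4d4b7105d754a06378d81259')
hours_url = build_url('VENUE_ID/hours', *creds)  # VENUE_ID is a placeholder

print(categories_url)
print(search_url)
print(hours_url)
```

Building the query with `urlencode` keeps special characters (such as the comma in `ll`) escaped consistently.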

# Section 2: Data Analysis

## Setting Up

In [1]:
pip install geopy

Collecting geopy
  Downloading https://files.pythonhosted.org/packages/0c/67/915668d0e286caa21a1da82a85ffe3d20528ec7212777b43ccd027d94023/geopy-2.1.0-py3-none-any.whl (112kB)
     |████████████████████████████████| 112kB 8.1MB/s eta 0:00:01
Collecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-2.1.0
Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd
import numpy as np
import requests
from geopy.geocoders import Nominatim

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium

print('Libraries imported.')

Libraries imported.


In [3]:
CLIENT_ID = 'JQHTMEISFMJTHG5S3OCP2QKLQB02KEACJ5FOPLS5NMYGIGUU' # your Foursquare ID
CLIENT_SECRET = 'EU50MHAP1OMU0CMXUAGLREGRNTRT1MB3IZWCT0P5YQNJDVI0' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: JQHTMEISFMJTHG5S3OCP2QKLQB02KEACJ5FOPLS5NMYGIGUU
CLIENT_SECRET:EU50MHAP1OMU0CMXUAGLREGRNTRT1MB3IZWCT0P5YQNJDVI0
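
Printing credentials in a shared notebook exposes them to every reader. One possible alternative, sketched below, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are assumptions for illustration, not part of the original notebook.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables (names are assumptions)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask(value, visible=4):
    """Show only the first few characters when printing a secret."""
    return value[:visible] + '*' * (len(value) - visible)

# demo with a fake environment; real code would call load_credentials() with no argument
demo_id, demo_secret = load_credentials({'FOURSQUARE_CLIENT_ID': 'ABCD1234',
                                         'FOURSQUARE_CLIENT_SECRET': 'WXYZ5678'})
print('CLIENT_ID: ' + mask(demo_id))
print('CLIENT_SECRET: ' + mask(demo_secret))
```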


In [4]:
address = 'Frankfurt, Germany'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Frankfurt are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Frankfurt are 50.1106444, 8.6820917.


## Getting Venue Categories

In [5]:
def getVenueCategories():
    
    venues_cats_list=[]

    # create the API request URL
    url = 'https://api.foursquare.com/v2/venues/categories?client_id={}&client_secret={}&v={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION)

    # make the GET request
    results = requests.get(url).json()["response"]['categories']

    # keep only the ID and name of each top-level category
    venues_cats_list = [(v['id'], v['name']) for v in results]

    venues_cats_list = pd.DataFrame(venues_cats_list, columns=['Category ID', 'Category Name'])
    
    return(venues_cats_list)

In [6]:
venues_cats_list = getVenueCategories()
venues_cats_list.head(n=20)

| | Category ID | Category Name |
|---|---|---|
| 0 | 4d4b7104d754a06370d81259 | Arts & Entertainment |
| 1 | 4d4b7105d754a06372d81259 | College & University |
| 2 | 4d4b7105d754a06373d81259 | Event |
| 3 | 4d4b7105d754a06374d81259 | Food |
| 4 | 4d4b7105d754a06376d81259 | Nightlife Spot |
| 5 | 4d4b7105d754a06377d81259 | Outdoors & Recreation |
| 6 | 4d4b7105d754a06375d81259 | Professional & Other Places |
| 7 | 4e67e38e036454776db1fb3a | Residence |
| 8 | 4d4b7105d754a06378d81259 | Shop & Service |
| 9 | 4d4b7105d754a06379d81259 | Travel & Transport |


In [7]:
# Listing shop and food categories
cats_ids = ['4d4b7105d754a06374d81259', '4d4b7105d754a06378d81259']
cats_ids_str = ','.join(cats_ids)
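
The table above lists only top-level categories. To narrow the query to, say, supermarkets, the nested sub-categories would have to be walked as well. The sketch below assumes that each category object carries its children under a `categories` key; the sample data is made up, and `flatten_categories` is a hypothetical helper.

```python
def flatten_categories(categories, parent=None):
    """Yield (id, name, parent_name) for every category in a nested list."""
    for cat in categories:
        yield cat['id'], cat['name'], parent
        # sub-categories are assumed to live under the 'categories' key
        yield from flatten_categories(cat.get('categories', []), cat['name'])

# made-up sample shaped like the assumed response
sample = [
    {'id': 'a1', 'name': 'Shop & Service', 'categories': [
        {'id': 'b1', 'name': 'Food & Drink Shop', 'categories': [
            {'id': 'c1', 'name': 'Supermarket', 'categories': []},
        ]},
    ]},
    {'id': 'a2', 'name': 'Food', 'categories': []},
]

flat = list(flatten_categories(sample))
for row in flat:
    print(row)
```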

## Getting the Venue List

In [8]:
def getNearbyVenues(category, postcodes, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for postcode, lat, lng in zip(postcodes, latitudes, longitudes):            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&categoryId={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            category,
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        for v in results:
            # create the API request URL
            url = 'https://api.foursquare.com/v2/venues/{}/hours?client_id={}&client_secret={}&v={}'.format(
                v['venue']['id'],
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION)

            # make the GET request
            hours_response = requests.get(url).json()["response"]

            regular_open_hour = None
            regular_close_hour = None
            weekend_open_hour = None
            weekend_close_hour = None

            if 'hours' in hours_response:
                hours = hours_response['hours']

                if 'timeframes' in hours:
                    timeframes = hours['timeframes']

                    regular_open_hour = timeframes[0]['open'][0]['start']
                    regular_close_hour = timeframes[0]['open'][0]['end']

                    weekend_open_hour = regular_open_hour
                    weekend_close_hour = regular_close_hour
                    # a second timeframe, when present, is treated as the weekend hours
                    if len(timeframes) > 1:
                        weekend_open_hour = timeframes[1]['open'][0]['start']
                        weekend_close_hour = timeframes[1]['open'][0]['end']

            # return only relevant information for each nearby venue
            venues_list.append([
                postcode,
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name'],
                regular_open_hour,
                regular_close_hour,
                weekend_open_hour,
                weekend_close_hour
            ])

    nearby_venues = pd.DataFrame(data=venues_list)
    nearby_venues.columns = [
        'Neighborhood', 
        'Neighborhood Latitude', 
        'Neighborhood Longitude', 
        'Venue', 
        'Venue Latitude', 
        'Venue Longitude', 
        'Venue Category',
        'Venue Regular Open Hour',
        'Venue Regular Close Hour',
        'Venue Weekend Open Hour',
        'Venue Weekend Close Hour'
    ]
    
    return(nearby_venues)
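
Once opening hours are available, the project's question is whether a venue closes after the typical 8:00 PM. A minimal sketch of that check follows. It assumes the `start` and `end` values are four-digit `'HHMM'` strings and that a leading `'+'` marks a time on the following day; both points should be verified against the actual API responses. The helper names are hypothetical.

```python
def to_minutes(hhmm):
    """Convert a time string such as '0930' to minutes after midnight.

    A leading '+' (assumed to mean 'next day') adds 24 hours.
    """
    next_day = hhmm.startswith('+')
    digits = hhmm.lstrip('+')
    minutes = int(digits[:2]) * 60 + int(digits[2:])
    return minutes + 24 * 60 if next_day else minutes

def closes_after(close_hour, threshold='2000'):
    """Return True if the venue closes later than `threshold`; None if unknown."""
    if close_hour is None:
        return None
    return to_minutes(close_hour) > to_minutes(threshold)

print(closes_after('2200'))   # True
print(closes_after('1830'))   # False
print(closes_after('+0100'))  # True
print(closes_after(None))     # None
```

Returning `None` for unknown hours keeps missing data distinguishable from venues that are known to close early.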

In [9]:
frankfurt_postal_codes = [
    [60306, 50.1159, 8.6702],
    [60308, 50.1125, 8.6529],
    [60310, 50.1107, 8.673],
    [60311, 50.1112, 8.6831],
    [60313, 50.1153, 8.6823],
    [60314, 50.1137, 8.7119],
    [60316, 50.1209, 8.6966],
    [60318, 50.1252, 8.6865],
    [60320, 50.139, 8.6725],
    [60322, 50.125, 8.6762],
    [60323, 50.1219, 8.6655],
    [60325, 50.1155, 8.6596],
    [60326, 50.1025, 8.6299],
    [60327, 50.1038, 8.6522],
    [60329, 50.1074, 8.6663],
    [60385, 50.1253, 8.7108],
    [60386, 50.1268, 8.7554],
    [60388, 50.1506, 8.7537],
    [60389, 50.1383, 8.7116],
    [60431, 50.1457, 8.6549],
    [60433, 50.1605, 8.6684],
    [60435, 50.1544, 8.6912],
    [60437, 50.1924, 8.6753],
    [60438, 50.1787, 8.632],
    [60439, 50.1605, 8.6337],
    [60486, 50.1162, 8.6365],
    [60487, 50.1257, 8.6414],
    [60488, 50.1416, 8.6155],
    [60489, 50.1252, 8.6088],
    [60528, 50.0837, 8.644],
    [60529, 50.0841, 8.5916],
    [60549, 50.0413, 8.5702],
    [60594, 50.1039, 8.6886],
    [60596, 50.0974, 8.6735],
    [60598, 50.09, 8.6816],
    [60599, 50.096, 8.7111]
]

In [10]:
# define the dataframe columns
column_names = ['Postcode', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(data=frankfurt_postal_codes, columns=column_names)
neighborhoods.head()

| | Postcode | Latitude | Longitude |
|---|---|---|---|
| 0 | 60306 | 50.1159 | 8.6702 |
| 1 | 60308 | 50.1125 | 8.6529 |
| 2 | 60310 | 50.1107 | 8.673 |
| 3 | 60311 | 50.1112 | 8.6831 |
| 4 | 60313 | 50.1153 | 8.6823 |
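
Each postcode is represented by a single centroid, and venues are requested within a 500 m radius of it. The sketch below uses the haversine formula to estimate how far a venue lies from its centroid and how far two centroids lie from each other. The function name `haversine_m` is hypothetical, and the result is an approximation on a spherical Earth.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    earth_radius_m = 6371000  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# centroid of postcode 60306 and the first venue returned for it in this notebook
venue_distance = haversine_m(50.1159, 8.6702, 50.114867, 8.669458)

# centroids of postcodes 60311 and 60313
centroid_distance = haversine_m(50.1112, 8.6831, 50.1153, 8.6823)

print(round(venue_distance), round(centroid_distance))
```

When two centroids are closer than twice the search radius, their search circles overlap, so the same venue may be returned for both postcodes.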


In [14]:
veneus = getNearbyVenues(
    cats_ids_str,
    neighborhoods['Postcode'],
    neighborhoods['Latitude'],
    neighborhoods['Longitude']
)
veneus.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | Venue Regular Open Hour | Venue Regular Close Hour | Venue Weekend Open Hour | Venue Weekend Close Hour |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60306 | 50.1159 | 8.6702 | ZENZAKAN - Pan Asian Supperclub | 50.114867 | 8.669458 | Asian Restaurant | | | | |
| 1 | 60306 | 50.1159 | 8.6702 | Moriki | 50.113863 | 8.66953 | Japanese Restaurant | | | | |
| 2 | 60306 | 50.1159 | 8.6702 | The Ivory Club | 50.114309 | 8.669109 | Indian Restaurant | | | | |
| 3 | 60306 | 50.1159 | 8.6702 | Charlot | 50.115405 | 8.671779 | Italian Restaurant | | | | |
| 4 | 60306 | 50.1159 | 8.6702 | Meyer's Restaurant & Bar | 50.114836 | 8.673319 | Steakhouse | | | | |


In [17]:
veneus.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | Venue Regular Open Hour | Venue Regular Close Hour | Venue Weekend Open Hour | Venue Weekend Close Hour |
|---|---|---|---|---|---|---|---|---|---|---|
| 60306 | 67 | 67 | 67 | 67 | 67 | 67 | 0 | 0 | 0 | 0 |
| 60308 | 37 | 37 | 37 | 37 | 37 | 37 | 0 | 0 | 0 | 0 |
| 60310 | 35 | 35 | 35 | 35 | 35 | 35 | 0 | 0 | 0 | 0 |
| 60311 | 74 | 74 | 74 | 74 | 74 | 74 | 0 | 0 | 0 | 0 |
| 60313 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 |
| 60314 | 22 | 22 | 22 | 22 | 22 | 22 | 0 | 0 | 0 | 0 |
| 60316 | 60 | 60 | 60 | 60 | 60 | 60 | 0 | 0 | 0 | 0 |
| 60318 | 47 | 47 | 47 | 47 | 47 | 47 | 0 | 0 | 0 | 0 |
| 60320 | 14 | 14 | 14 | 14 | 14 | 14 | 0 | 0 | 0 | 0 |
| 60322 | 15 | 15 | 15 | 15 | 15 | 15 | 0 | 0 | 0 | 0 |


## Cluster Neighborhoods

In [18]:
# one hot encoding
veneus_onehot = pd.get_dummies(veneus[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
veneus_onehot['Neighborhood'] = veneus['Neighborhood'] 

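# move neighborhood column to the first column
# (note: the result is stored in `venues_onehot`, a different name from `veneus_onehot`,
# so the frame displayed below still has 'Neighborhood' as its last column)
fixed_columns = [veneus_onehot.columns[-1]] + list(veneus_onehot.columns[:-1])
venues_onehot = veneus_onehot[fixed_columns]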

veneus_onehot.head(n=20)

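| | African Restaurant | American Restaurant | Apple Wine Pub | Argentinian Restaurant | Asian Restaurant | Austrian Restaurant | BBQ Joint | Bagel Shop | Bakery | Bistro | ... | Taverna | Thai Restaurant | Theme Restaurant | Tibetan Restaurant | Trattoria/Osteria | Turkish Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wings Joint | Neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60306 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60306 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60306 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60306 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60306 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60306 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60306 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60306 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60306 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60306 |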


In [19]:
veneus_grouped = veneus_onehot.groupby('Neighborhood').mean().reset_index()
veneus_grouped.head()

| | Neighborhood | African Restaurant | American Restaurant | Apple Wine Pub | Argentinian Restaurant | Asian Restaurant | Austrian Restaurant | BBQ Joint | Bagel Shop | Bakery | ... | Tapas Restaurant | Taverna | Thai Restaurant | Theme Restaurant | Tibetan Restaurant | Trattoria/Osteria | Turkish Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wings Joint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60306 | 0.0 | 0.0 | 0.014925 | 0.0 | 0.029851 | 0.0 | 0.0 | 0.0 | 0.014925 | ... | 0.0 | 0.014925 | 0.014925 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 60308 | 0.0 | 0.0 | 0.0 | 0.0 | 0.135135 | 0.0 | 0.0 | 0.0 | 0.027027 | ... | 0.0 | 0.0 | 0.027027 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 60310 | 0.028571 | 0.0 | 0.0 | 0.0 | 0.0 | 0.028571 | 0.0 | 0.0 | 0.028571 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.057143 | 0.0 | 0.028571 | 0.0 |
| 3 | 60311 | 0.0 | 0.013514 | 0.0 | 0.0 | 0.027027 | 0.013514 | 0.0 | 0.0 | 0.040541 | ... | 0.027027 | 0.0 | 0.027027 | 0.0 | 0.0 | 0.0 | 0.027027 | 0.013514 | 0.013514 | 0.0 |
| 4 | 60313 | 0.01 | 0.01 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.03 | ... | 0.01 | 0.0 | 0.05 | 0.0 | 0.0 | 0.01 | 0.02 | 0.0 | 0.01 | 0.01 |
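
Each value in the grouped frame is the share of a postcode's venues that fall into a given category: the mean of a one-hot column equals the category count divided by the number of venues. The sketch below reproduces that calculation by hand with the standard library, using made-up `(postcode, category)` pairs rather than real query results.

```python
from collections import Counter, defaultdict

# toy (postcode, category) pairs, not real query results
venues = [
    (60306, 'Italian Restaurant'),
    (60306, 'Italian Restaurant'),
    (60306, 'Steakhouse'),
    (60306, 'Café'),
    (60308, 'Café'),
    (60308, 'Asian Restaurant'),
]

# count categories per postcode
counts = defaultdict(Counter)
for postcode, category in venues:
    counts[postcode][category] += 1

# share of each category within its postcode
freq = {
    postcode: {cat: n / sum(c.values()) for cat, n in c.items()}
    for postcode, c in counts.items()
}

print(freq[60306]['Italian Restaurant'])
print(counts[60306].most_common(1))
```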


In [20]:
num_top_venues = 5

for hood in veneus_grouped['Neighborhood']:
    print("----"+str(hood)+"----")
    temp = veneus_grouped[veneus_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----60306----
                venue  freq
0  Italian Restaurant  0.16
1   German Restaurant  0.13
2          Steakhouse  0.09
3   French Restaurant  0.06
4          Restaurant  0.06


----60308----
                venue  freq
0                Café  0.14
1    Asian Restaurant  0.14
2  Italian Restaurant  0.11
3    Ramen Restaurant  0.05
4      Sandwich Place  0.05


----60310----
                 venue  freq
0                 Café  0.20
1    Indian Restaurant  0.11
2    German Restaurant  0.09
3  Japanese Restaurant  0.06
4   Turkish Restaurant  0.06


----60311----
                venue  freq
0                Café  0.22
1   German Restaurant  0.09
2          Restaurant  0.08
3        Burger Joint  0.05
4  Italian Restaurant  0.04


----60313----
                venue  freq
0                Café  0.15
1          Restaurant  0.09
2        Burger Joint  0.07
3  Italian Restaurant  0.07
4     Thai Restaurant  0.05


----60314----
                 venue  freq
0               Bakery  0.18
1 

In [30]:
# set number of clusters
kclusters = 7

frankfurt_grouped_clustering = veneus_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(frankfurt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
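
The first ten labels above are all 0, so it may be worth checking how the postcodes are spread across the clusters before interpreting them. A minimal sketch of that check follows. The label list is made up for illustration; in the notebook, `kmeans.labels_` would be passed instead. `cluster_sizes` is a hypothetical helper.

```python
from collections import Counter

def cluster_sizes(labels):
    """Return {cluster label: number of rows}, sorted by label."""
    return dict(sorted(Counter(int(label) for label in labels).items()))

# made-up labels for illustration; in the notebook this would be kmeans.labels_
example_labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 3]
sizes = cluster_sizes(example_labels)
print(sizes)
```

If one cluster holds nearly every row, a different number of clusters or a different feature set might separate the neighborhoods better.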

In [22]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = veneus_grouped['Neighborhood']

for ind in np.arange(veneus_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(veneus_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | 60306 | Italian Restaurant | German Restaurant | Steakhouse | Café | French Restaurant |
| 1 | 60308 | Café | Asian Restaurant | Italian Restaurant | French Restaurant | Ramen Restaurant |
| 2 | 60310 | Café | Indian Restaurant | German Restaurant | Soup Place | Turkish Restaurant |
| 3 | 60311 | Café | German Restaurant | Restaurant | Burger Joint | Italian Restaurant |
| 4 | 60313 | Café | Restaurant | Italian Restaurant | Burger Joint | Thai Restaurant |
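
The column labels above rely on a short suffix list with a fallback to `th`. That works for five columns, but it would label an 11th column correctly and a 21st column as `21th`. As a hedged alternative, a small helper can produce the suffix for any position; `ordinal` is a hypothetical name, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 6)]
print(columns)
```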


In [31]:
#neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop(columns=['Cluster Labels'])

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [32]:
veneus_merged = veneus

# join each neighborhood's cluster label and top venue categories onto its venue rows
veneus_merged = veneus_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

veneus_merged.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | Venue Regular Open Hour | Venue Regular Close Hour | Venue Weekend Open Hour | Venue Weekend Close Hour | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60306 | 50.1159 | 8.6702 | ZENZAKAN - Pan Asian Supperclub | 50.114867 | 8.669458 | Asian Restaurant | | | | | 0 | Italian Restaurant | German Restaurant | Steakhouse | Café | French Restaurant |
| 1 | 60306 | 50.1159 | 8.6702 | Moriki | 50.113863 | 8.66953 | Japanese Restaurant | | | | | 0 | Italian Restaurant | German Restaurant | Steakhouse | Café | French Restaurant |
| 2 | 60306 | 50.1159 | 8.6702 | The Ivory Club | 50.114309 | 8.669109 | Indian Restaurant | | | | | 0 | Italian Restaurant | German Restaurant | Steakhouse | Café | French Restaurant |
| 3 | 60306 | 50.1159 | 8.6702 | Charlot | 50.115405 | 8.671779 | Italian Restaurant | | | | | 0 | Italian Restaurant | German Restaurant | Steakhouse | Café | French Restaurant |
| 4 | 60306 | 50.1159 | 8.6702 | Meyer's Restaurant & Bar | 50.114836 | 8.673319 | Steakhouse | | | | | 0 | Italian Restaurant | German Restaurant | Steakhouse | Café | French Restaurant |


In [33]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(veneus_merged['Venue Latitude'], veneus_merged['Venue Longitude'], veneus_merged['Neighborhood'], veneus_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters