# Searching for the Coffee Shops Location

The Battle of the Neighborhoods - Applied Data Science Capstone Project (Week 2)

## Table of contents
* [Introduction](#introduction)
* [Data Requirement & Acquisition](#data)
* [Analysis](#analysis)

## Introduction <a name="introduction"></a>

This project is purposed to find a good location for a Coffee Shop in **Bandung** Indonesia to meet the need of stakeholders whom have interest. In this project, the term **good location for a Coffee Shop** is interpreted as **areas where people are gathered** but **with no coffee shops around**. The area where people are gathered in this project is represented by the area that have a lot of venues. Some data science techniques and tools will be employed to generate the most suitable neighborhoods that meet the criteria. The result can be used as the consideration for the stakeholder in deciding the final location. 

## Data Requirement & Acquisition <a name="data"></a>

### Data Requirement
Based on the problem definition, factors that will influence our decission are:
* number of existing venue in the neighborhood (it indicates the crowds)
* inexistence of cofee shops in the neighborhood
* similarity of venue in the neighborhood

For these purpose, this project need the data as follows:
* Bandung with district and neighborhood’s data. This data should consist of all district in Bandung city. Each district should consist with all neighborhood with postcode and geocode data. This data is used as a reference data for constructing a map and as the initial location to be explore.
* Bandung venue data.

### Data Acquisition

In [35]:
#importing the required libraries

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


#### Bandung's Neighborhood Data

In [36]:
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

In [37]:
# The code was removed by Watson Studio for sharing.

In [38]:
#getting Bandung's Neighborhood data from .csv file
PostCode = pd.read_csv(body)
pd.options.display.float_format = '{:,.4f}'.format
print(PostCode.shape)
PostCode.head()

(151, 5)


Unnamed: 0,Post_Code,District,Neighborhood,Latitude,Longitude
0,40111,Sumur Bandung,Braga,-6.9176,107.6094
1,40112,Sumur Bandung,Kebon Pisang,-6.9189,107.6171
2,40113,Sumur Bandung,Merdeka,-6.9137,107.6201
3,40114,Bandung Wetan,Cihapit,-6.9083,107.626
4,40115,Bandung Wetan,Citarum,-6.9035,107.6171


In [39]:
address = 'Alun-alun Bandung, Indonesia'

geolocator = Nominatim (user_agent="bandung_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Bandung city center are  {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Bandung city center are  -6.9211979, 107.6074903.


In [40]:
# create map of Bandung using latitude and longitude values
map_bandung = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(PostCode['Latitude'], PostCode['Longitude'], PostCode['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='green',
        fill=False,
        fill_color='blue',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bandung)  
    
map_bandung

#### Bandung's Venues Data
(Working with Foursquare API)

In [41]:
# The code was removed by Watson Studio for sharing.

In [42]:
# get neighborhood coordinates data

neighborhood_latitude = PostCode.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = PostCode.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = PostCode.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Braga are -6.917563900000001, 107.6093662.


In [48]:
radius = 500
LIMIT = 100

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
#url

In [49]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5de973c647b43d4c24cb10f0'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bandung',
  'headerFullLocation': 'Bandung',
  'headerLocationGranularity': 'city',
  'totalResults': 94,
  'suggestedBounds': {'ne': {'lat': -6.913063895499997,
    'lng': 107.613890739005},
   'sw': {'lat': -6.922063904500005, 'lng': 107.60484166099499}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '59369cf109e28312821ff13d',
       'name': 'éL Royale Hotel Bandung',
       'location': {'address': 'Jl. Merdeka No. 2',
        'crossStreet': 'Jl. Lembong',
        'lat': -6.916102092125143,
        'lng': 107.61060033675521,
        'labeledLatLngs': [{'label': 'disp

In [45]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [46]:
# clean json and create dataframe 
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
nearby_venues.head()

94 venues were returned by Foursquare.


Unnamed: 0,name,categories,lat,lng
0,éL Royale Hotel Bandung,Hotel,-6.9161,107.6106
1,Starbucks,Coffee Shop,-6.917,107.6091
2,Braga Permai - Maison Bogerijen,Eastern European Restaurant,-6.9174,107.6094
3,Braga Punya Cerita,Café,-6.9172,107.6093
4,Hangover,Bar,-6.9186,107.6097


In [47]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

bandung_venues = getNearbyVenues(names=PostCode['Neighborhood'],
                                   latitudes=PostCode['Latitude'],
                                   longitudes=PostCode['Longitude']
                                  )

Braga
Kebon Pisang
Merdeka
Cihapit
Citarum
Tamansari
Babakan Ciamis
Cicadas
Sukamaju
Cihaur Geulis
Sukaluyu
Neglasari
Cikutra
Padasuka
Sukapada
Cipaganti
Lebak Gede
Lebak Siliwangi
Sadang Serang
Sekeloa
Dago
Hegarmanah
Ciumbuleuit
Ledeng
Sarijadi
Sukarasa
Geger Kalong
Isola
Pasteur
Cipedes
Sukabungah
Sukagalih
Sukawarna
Pasir Kaliki
Arjuna
Pajajaran
Pamoyanan
Husen Sastranegara
Sukaraja
Kebon Jeruk
Ciroyom
Dungus Cariang
Campaka
Garuda
Maleber (Maleer)
Cigadung
Pasirlayung
Jatihandap
Karang Pamulang
Pasir Impun
Sindang Jaya
Warung Muncang
Caringin
Cibuntu
Cijerah
Cigondewah Kaler
Cigondewah Kidul
Cigondewah Rahayu
Gempolsari
Sukahaji
Babakan
Babakan Ciparay
Margahayu Utara
Margasuka
Cirangrang
Jamika
Suka Asih
Babakan Asih
Babakan Tarogong
Kopo
Situsaeur
Kebon Lega
Cibaduyut
Mekarwangi
Cibaduyut Wetan
Cibaduyut Kidul
Cibadak
Karanganyar
Nyengseret
Panjunan
Karasak
Pelindung Hewan
Balong Gede
Ciateul
Pungkur
Cigereleng
Ancol
Pasirluyu
Ciseureuh
Wates
Cikawao
Paledang
Burangrang
Malabar


KeyError: 'groups'

In [None]:
# create new dataframe of venues in Bandung
print(bandung_venues.shape)
bandung_venues.head()

In [None]:
bdgvenuenum = bandung_venues[['Venue Category', 'Venue']].groupby('Venue Category').count().reset_index()
print(bdgvenuenum.shape)
bdgvenuenum.sort_values ('Venue',ascending=False).head()

In [None]:
# number of venues in each neighborhood
bdgvenuegrouped = bandung_venues[['Neighborhood', 'Venue']].groupby('Neighborhood').count().reset_index()
bdgvenuegrouped.sort_values ('Venue',ascending=False).head()

In [None]:
# number of unique categories
print('There are {} uniques categories.'.format(len(bandung_venues ['Venue Category'].unique())))

In [None]:
bandung_onehot = pd.get_dummies(bandung_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bandung_onehot['Neighborhood'] = bandung_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bandung_onehot.columns[-1]] + list(bandung_onehot.columns[:-1])
bandung_onehot = bandung_onehot[fixed_columns]

print(bandung_onehot.shape)
bandung_onehot.head()

In [None]:
# frequency of occurence of each category in each neighbourhood
bandung_grouped = bandung_onehot.groupby('Neighborhood').mean().reset_index()
print (bandung_grouped.shape)
bandung_grouped.head()

In [None]:
#merge to get number of venue
bandung_grouped_ven = bdgvenuegrouped
bandung_grouped_ven = bandung_grouped_ven.join(bandung_grouped.set_index('Neighborhood'), on='Neighborhood').dropna()

# Bandung's Venue Data
print (bandung_grouped_ven.shape)
bandung_grouped_ven.sort_values('Venue', ascending=False).head()

## Analysis <a name="analysis"></a>

In [None]:
#delete area that have limited number of venues & area with too many cofee shops and cafe
feasible_area = bandung_grouped_ven.loc[(bandung_grouped_ven['Coffee Shop']== 0.00) & (bandung_grouped_ven['Venue']>= 5.00) & (bandung_grouped_ven['Café']== 0.00)]
#del feasible_area['Venue']
print (feasible_area.shape)
feasible_area.head()

In [None]:
feasible_area_novenue = feasible_area.drop('Venue',1)
feasible_area_novenue.head()

In [None]:
num_top_venues = 5

for hood in feasible_area_novenue['Neighborhood']:
    #print("----"+hood+"----")
    temp = feasible_area_novenue[feasible_area_novenue['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[2:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
feasible_area_sorted = pd.DataFrame(columns=columns)
feasible_area_sorted['Neighborhood'] = feasible_area_novenue['Neighborhood']

for ind in np.arange(feasible_area_novenue.shape[0]):
    feasible_area_sorted.iloc[ind, 1:] = return_most_common_venues(feasible_area.iloc[ind, :], num_top_venues)
print(feasible_area_sorted.shape)
feasible_area_sorted.head()

In [None]:
# clustering
kclusters = 4

feasible_area_clustering = feasible_area.drop('Neighborhood',1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(feasible_area_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

In [None]:
# add clustering labels
#del feasible_area_sorted['Cluster Labels']
feasible_area_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
feasible_area_sorted.head()
feasible_area_location = PostCode

# merge to get geocode
feasible_area_location = feasible_area_location.join(feasible_area_sorted.set_index('Neighborhood'), on='Neighborhood').dropna()
feasible_area_location['Cluster Labels'] = feasible_area_location['Cluster Labels'].astype('int32')

feasible_area_location = feasible_area_location.join(bdgvenuegrouped.set_index('Neighborhood'), on='Neighborhood').dropna()

print (feasible_area_location.shape)
feasible_area_location.sort_values('Cluster Labels',ascending=True)

In [50]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(feasible_area_location['Latitude'], feasible_area_location['Longitude'], feasible_area_location['Neighborhood'], feasible_area_location['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=1).add_to(map_clusters)
       
map_clusters

#### End 
created by: Fahmi