Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.


My capstone project will look at the feasibility of opening a sidewalk cafe in Washington D.C.  The audience for this project will be stakeholders in the opening of a new cafe, looking to maximize the probability of success. This project aims to identify the best café location based on the density/clustering of existing sidewalk cafés and analysis of popularity/foot traffic.

Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

I will use two primary data sets in this project:
-	The Sidewalk Café location data set from Open Data DC contains locations and attributes of sidewalk cafés. The location data will be used to find sidewalk café density clusters and their spatial distribution. This data set is somewhat dated, with the last update on 6/25/2019, but I believe it will still work for the purposes of this project.
-	The Foursquare location data will be used to get supplemental geospatial information on the sidewalk café locations. Additionally, the Foursquare user count and tips data will be used to estimate popularity/foot traffic at each location.
For the sake of simplicity, the following methodology will be used:
-	Retrieve the sidewalk café locations in Washington D.C. from Open Data DC.
-	Query Foursquare API for supplemental geospatial information on sidewalk café locations.
-	Find Foursquare user and tip data around each café cluster.
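
The last step, turning user and tip counts into a popularity figure per café cluster, could be sketched roughly as below. This is only an illustration: the records are made up, the field names `users_count` and `tip_count` are hypothetical stand-ins for whatever the Foursquare response actually provides, and the weighting of tips against users is an arbitrary assumption.

```python
from collections import defaultdict

# Made-up records for illustration: one dict per venue, already assigned to a cluster.
# 'users_count' and 'tip_count' are hypothetical field names.
venues = [
    {"cluster": 0, "users_count": 120, "tip_count": 10},
    {"cluster": 0, "users_count": 80, "tip_count": 4},
    {"cluster": 1, "users_count": 300, "tip_count": 20},
    {"cluster": 1, "users_count": 200, "tip_count": 0},
]

def cluster_popularity(records, tip_weight=5.0):
    """Average a simple popularity score per cluster.

    score = users_count + tip_weight * tip_count (illustrative weighting only).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        score = rec["users_count"] + tip_weight * rec["tip_count"]
        totals[rec["cluster"]] += score
        counts[rec["cluster"]] += 1
    return {cluster: totals[cluster] / counts[cluster] for cluster in totals}

popularity = cluster_popularity(venues)
best_cluster = max(popularity, key=popularity.get)
print(popularity, best_cluster)
```

A different weighting, or a median instead of a mean, may suit the real data better once its distribution is known.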


In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json 
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim
import requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes 
import folium
print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1

In [95]:
CLIENT_ID=''#Client ID removed for security
CLIENT_SECRET='' #Client Secret removed for security
VERSION='20180605' 
LIMIT=100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


In [62]:
# Get longitude and latitude for Washington D.C.
address='Washington D.C., Washington D.C'
geolocator = Nominatim(user_agent="foursquare_agent")
location=geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude
print('The geographical coordinates of Washington D.C. are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Washington D.C. are 38.8949855, -77.0365708.


In [70]:
search_query = 'Cafe'
radius = 5000
print(search_query + ' .... OK!')

Cafe .... OK!


In [71]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=38.8949855,-77.0365708&v=20180605&query=Cafe&radius=5000&limit=100'
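
The search URL above is assembled with plain string formatting, which works for a one-word query such as 'Cafe' but would not escape spaces or special characters. A minimal alternative, using only the standard library, is sketched below. The helper name `build_search_url` and the placeholder credentials are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build the venues/search URL with properly escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

# Placeholder credentials; a multi-word query shows the escaping.
example_url = build_search_url(
    "YOUR_ID", "YOUR_SECRET", 38.8949855, -77.0365708,
    "20180605", "Sidewalk Cafe", 5000, 100,
)
print(example_url)
```

Passing the same dictionary to `requests.get(base_url, params=params)` has the same effect and keeps the credentials out of a hand-built string.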

In [72]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eb02847ad1ab4001b80f27a'},
 'response': {'venues': [{'id': '4a9b1b0bf964a5206d3420e3',
    'name': 'Café du Parc',
    'location': {'address': '1401 Pennsylvania Ave NW',
     'crossStreet': 'at 14th St',
     'lat': 38.896497,
     'lng': -77.032618,
     'labeledLatLngs': [{'label': 'display',
       'lat': 38.896497,
       'lng': -77.032618}],
     'distance': 381,
     'postalCode': '20004',
     'cc': 'US',
     'city': 'Washington',
     'state': 'D.C.',
     'country': 'United States',
     'formattedAddress': ['1401 Pennsylvania Ave NW (at 14th St)',
      'Washington, D.C. 20004',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d10c941735',
      'name': 'French Restaurant',
      'pluralName': 'French Restaurants',
      'shortName': 'French',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/french_',
       'suffix': '.png'},
      'primary': True}],
    'venuePage': {'id': '86322170'},
    'referr
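
Before reading `results['response']['venues']`, it may be worth checking the `meta` code, since a failed request (for example with empty credentials) will not contain a venue list. The sketch below assumes the response layout shown above; the helper name `extract_venues` is hypothetical, and the `errorDetail` key is an assumption about what an error response contains.

```python
def extract_venues(payload):
    """Return the venue list from a search response, or raise if the request failed."""
    meta = payload.get("meta", {})
    code = meta.get("code")
    if code != 200:
        # 'errorDetail' is assumed to be present on failures; fall back if it is not.
        detail = meta.get("errorDetail", "no detail")
        raise ValueError("Foursquare request failed with code {}: {}".format(code, detail))
    return payload.get("response", {}).get("venues", [])

# Trimmed-down payloads for illustration (the first mirrors the output above).
ok_payload = {
    "meta": {"code": 200, "requestId": "5eb02847ad1ab4001b80f27a"},
    "response": {"venues": [{"id": "4a9b1b0bf964a5206d3420e3", "name": "Café du Parc"}]},
}
error_payload = {"meta": {"code": 400, "errorDetail": "Missing credentials"}, "response": {}}

print(extract_venues(ok_payload)[0]["name"])
```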

In [73]:
venues=results['response']['venues']
dataframe=json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d10c941735', 'name': 'F...",,,,,,,False,4a9b1b0bf964a5206d3420e3,1401 Pennsylvania Ave NW,US,Washington,United States,at 14th St,381,"[1401 Pennsylvania Ave NW (at 14th St), Washin...","[{'label': 'display', 'lat': 38.896497, 'lng':...",38.896497,-77.032618,,20004.0,D.C.,Café du Parc,v-1588603038,86322170.0
1,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",,,,,,,False,4eeba201d3e3d34eb11432b1,1331 Pennsylvania Ave NW,US,Washington,United States,,450,"[1331 Pennsylvania Ave NW, Washington, D.C. 20...","[{'label': 'display', 'lat': 38.89599871486189...",38.895999,-77.031535,,20004.0,D.C.,Flagship Cafe,v-1588603038,
2,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",,,,,,,False,4cd2efd240d4594138ae9a41,529 14th St NW,US,Washington,United States,,477,"[529 14th St NW, Washington, D.C. 20045, Unite...","[{'label': 'display', 'lat': 38.89693256878712...",38.896933,-77.031654,,20045.0,D.C.,Soho Cafe & Market,v-1588603038,
3,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",,,,,,,False,4fc26ad7e4b02db73e8950cb,100 Raoul Wallenberg Place SW,US,,United States,,905,"[100 Raoul Wallenberg Place SW, Washington, D....","[{'label': 'display', 'lat': 38.88713574627168...",38.887136,-77.033826,Southwest Washington,,"Washington, D.C.",Holocaust Museum Cafe,v-1588603038,
4,"[{'id': '4bf58dd8d48988d179941735', 'name': 'B...",,,,,,,False,4aa92f87f964a520575220e3,2000 K St NW,US,Washington,United States,at 20th St NW,1117,"[2000 K St NW (at 20th St NW), Washington, D.C...","[{'label': 'display', 'lat': 38.90230187352699...",38.902302,-77.045396,,20006.0,D.C.,K Street Cafe & Bagel,v-1588603038,
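
The dotted column names in the table above (`location.lat`, `venuePage.id`, and so on) come from flattening the nested venue dictionaries. A rough standard-library sketch of that idea is shown below; it illustrates the naming scheme and is not the pandas implementation. The function name `flatten` is hypothetical.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with a separator."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists and scalars are kept as-is, as in the 'categories' column above.
            flat[full_key] = value
    return flat

# A trimmed-down venue taken from the response shown earlier.
venue = {
    "id": "4a9b1b0bf964a5206d3420e3",
    "name": "Café du Parc",
    "location": {"address": "1401 Pennsylvania Ave NW", "lat": 38.896497, "lng": -77.032618},
    "venuePage": {"id": "86322170"},
}
row = flatten(venue)
print(sorted(row))
```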


In [74]:
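# Keep the name, category, and location columns plus the venue id
filtered_columns=['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# .copy() avoids a SettingWithCopyWarning when the categories column is reassigned below
dataframe_filtered=dataframe.loc[:, filtered_columns].copy()
def get_category_type(row):
    # Fall back to 'venue.categories' if the row has no 'categories' entry
    try:
        categories_list=row['categories']
    except KeyError:
        categories_list=row['venue.categories']

    if len(categories_list)==0:
        return None
    else:
        return categories_list[0]['name']
# Replace the list of category dicts with the name of the first category
dataframe_filtered['categories']=dataframe_filtered.apply(get_category_type, axis=1)
# Drop the 'location.' prefix from the column names
dataframe_filtered.columns=[column.split('.')[-1] for column in dataframe_filtered.columns]
dataframe_filtered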

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Café du Parc,French Restaurant,1401 Pennsylvania Ave NW,US,Washington,United States,at 14th St,381,"[1401 Pennsylvania Ave NW (at 14th St), Washin...","[{'label': 'display', 'lat': 38.896497, 'lng':...",38.896497,-77.032618,,20004.0,D.C.,4a9b1b0bf964a5206d3420e3
1,Flagship Cafe,Café,1331 Pennsylvania Ave NW,US,Washington,United States,,450,"[1331 Pennsylvania Ave NW, Washington, D.C. 20...","[{'label': 'display', 'lat': 38.89599871486189...",38.895999,-77.031535,,20004.0,D.C.,4eeba201d3e3d34eb11432b1
2,Soho Cafe & Market,Café,529 14th St NW,US,Washington,United States,,477,"[529 14th St NW, Washington, D.C. 20045, Unite...","[{'label': 'display', 'lat': 38.89693256878712...",38.896933,-77.031654,,20045.0,D.C.,4cd2efd240d4594138ae9a41
3,Holocaust Museum Cafe,Café,100 Raoul Wallenberg Place SW,US,,United States,,905,"[100 Raoul Wallenberg Place SW, Washington, D....","[{'label': 'display', 'lat': 38.88713574627168...",38.887136,-77.033826,Southwest Washington,,"Washington, D.C.",4fc26ad7e4b02db73e8950cb
4,K Street Cafe & Bagel,Bagel Shop,2000 K St NW,US,Washington,United States,at 20th St NW,1117,"[2000 K St NW (at 20th St NW), Washington, D.C...","[{'label': 'display', 'lat': 38.90230187352699...",38.902302,-77.045396,,20006.0,D.C.,4aa92f87f964a520575220e3
5,Esprinto Cafe,Café,1331 Pennsylvania Ave NW,US,Washington,United States,,613,"[1331 Pennsylvania Ave NW, Washington, D.C. 20...","[{'label': 'display', 'lat': 38.89734056306548...",38.897341,-77.030168,,20004.0,D.C.,4cdc15364006a1434a72dcb2
6,Juan Valdez Cafe,Coffee Shop,1889 F St NW,US,Washington,United States,at 19th St NW,647,"[1889 F St NW (at 19th St NW), Washington, D.C...","[{'label': 'display', 'lat': 38.89745645130179...",38.897456,-77.043341,,20006.0,D.C.,44d31e22f964a5203d361fe3
7,An Uncommon Cafe,Café,1800 G St NW,US,Washington,United States,btwn 18th & 19th St NW,660,"[1800 G St NW (btwn 18th & 19th St NW), Washin...","[{'label': 'display', 'lat': 38.89841208085561...",38.898412,-77.0428,,20006.0,D.C.,4a9ffb83f964a520d63d20e3
8,Northstar Cafe,American Restaurant,,US,Washington,United States,,543,"[Washington, D.C. 20230, United States]","[{'label': 'display', 'lat': 38.891205, 'lng':...",38.891205,-77.03261,,20230.0,D.C.,57f91c81498e3ba5808c29c1
9,Gallery Cafe,Sandwich Place,1401 H St NW,US,Washington,United States,at 14th St NW,735,"[1401 H St NW (at 14th St NW), Washington, D.C...","[{'label': 'display', 'lat': 38.90063965418262...",38.90064,-77.032182,,20005.0,D.C.,4a7b0ecdf964a520e6e91fe3
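
The `distance` column appears to be the distance in meters from the search center to each venue. One way to sanity-check that reading is a great-circle (haversine) calculation between the geocoded center and the first result. The Earth radius used here is an assumed mean value, so the result should match the reported 381 only approximately.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius in meters

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

center = (38.8949855, -77.0365708)    # geocoded Washington D.C. coordinates from above
cafe_du_parc = (38.896497, -77.032618)  # first row of the table above
distance_m = haversine_m(*center, *cafe_du_parc)
print(round(distance_m))
```

The same function could be reused later to measure how far a candidate location lies from each existing café.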


In [75]:
# Center the map on the geocoded Washington D.C. coordinates
venues_map=folium.Map(location=[latitude, longitude], zoom_start=13)
# Red marker for the search center
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Washington D.C.',
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(venues_map)
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)
venues_map

In [92]:

import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_fd91e6dceec34f3693494d977ba5026c = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='', # API key removed for security
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_fd91e6dceec34f3693494d977ba5026c.get_object(Bucket='datasciencecapstonebattleofthenei-donotdelete-pr-h7lfs82oavtopg',Key='Sidewalk_Cafe.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_1 = pd.read_csv(body)
df_data_1.head()


Unnamed: 0,OBJECTID_1,OBJECTID,SQUARE,SUFFIX,LOT,IDNUM,OWNER,ADDRESS,AREA_,X,Y,ADDRID
0,1,1,4.0,N,2029.0,S693,VITTORIO TESTE,2600 PENNSYLVANIA AVE NW,787.5,395220.84,137348.04,274801
1,2,2,14.0,,28.0,S271,TRIANGLE COMM ASSN,2519 PENNSYLVANIA AVE NW,198.0,395328.86,137364.07,273879
2,3,3,,,,,,2507 PENNSYLVANIA AVE NW,,395343.02,137352.33,293225
3,4,4,,,,,,2513 PENNSYLVANIA Ave NW,,395333.74,137361.44,273878
4,5,5,15.0,,18.0,S269,L STREET,2524 L STREET NW,200.0,395287.92,137300.69,273883
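
The X and Y columns look like projected coordinates in meters rather than latitude/longitude. That is an assumption, since the notebook does not state the coordinate system. If it holds, café density can be estimated directly by counting cafés per grid cell. The sketch below uses the five rows shown above and a 100 m cell size chosen arbitrarily. It is a simpler stand-in for the KMeans clustering imported earlier, not a replacement for it.

```python
from collections import Counter

# X, Y pairs copied from the first five rows of the sidewalk cafe table above.
points = [
    (395220.84, 137348.04),
    (395328.86, 137364.07),
    (395343.02, 137352.33),
    (395333.74, 137361.44),
    (395287.92, 137300.69),
]

def grid_density(points, cell_size=100.0):
    """Count points per square grid cell; cells are indexed by (column, row)."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

density = grid_density(points)
densest_cell, n_cafes = density.most_common(1)[0]
print(densest_cell, n_cafes)
```

To plot these cells on the folium map or join them with the Foursquare results, the X/Y values would first need converting to latitude/longitude.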
