## 1. Introduction/Business Problem


Travelers are constantly looking for destinations that offer high-quality food and drink at a fair price. Businesses, in turn, need to keep track of their neighborhood to understand their competitors and make strategic location decisions.

In this project, I'll check whether it is possible to cluster the neighborhoods in three of the most popular US destinations by the quality and cost of the venues located in each neighborhood. The main objective of this analysis is to **determine the areas in the selected destinations that would be a good option for visitors looking for _cheap, high-quality eating and drinking_.** A secondary objective is to give business owners an overview of the competition in their own neighborhood.


## 2. Data

For this project, I'll use the Foursquare API and a dataset of all US zip codes downloaded from the [simplemaps website](https://simplemaps.com/data/us-zips).

I'll split the selected destinations into neighborhoods using the zip codes in that dataset. From the Foursquare API, I'll obtain the popular venues per neighborhood, the rating of each venue and the "price" category it belongs to.

### 2.1 Foursquare API

The Foursquare API is a good source for information on popular venues per location. 

In [22]:
import pandas as pd
import requests
import json
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated

To get the venues around a specific area, we use the "explore" endpoint and provide the latitude and longitude of the location. For example, to get the venues around a specific zip code (here 02108, in Boston), it works like this:

In [23]:
CLIENT_ID='AEO45ABRSDOJNO0RZOSZHLORAT5U52BBC1FNH0OEVZSX5UXP'
CLIENT_SECRET='VDWRYWJIWT3UH4JOA0SACQXRP322BBQUQWKL5TDZXHSBE5VP'
latitude=42.3577
longitude=-71.0651
VERSION='20180604'
radius=200
LIMIT='50'

In [24]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ce1e0b69fb6b775bb907f7f'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Beacon Hill',
  'headerFullLocation': 'Beacon Hill, Boston',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 9,
  'suggestedBounds': {'ne': {'lat': 42.3595000018, 'lng': -71.06266866662793},
   'sw': {'lat': 42.3558999982, 'lng': -71.06753133337207}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4a549cd9f964a5202ab31fe3',
       'name': 'Frog Pond',
       'location': {'address': '84 Beacon St',
        'crossStreet': 'at Boston Common',
        'lat': 42.35613375967781,
        'lng': -71.06567233800888,
        'labeledLatLngs': [{'label': 'display'
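
The request URL above is assembled with a long `.format` call. As a minimal sketch, the same query string can be built from a dictionary, which keeps each parameter named and handles URL encoding. The helper name `build_explore_url` and the placeholder credentials are assumptions made for illustration, not part of the original notebook.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180604", radius=200, limit=50):
    """Return the explore URL for the venues around (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",  # latitude and longitude as one comma-separated value
        "v": version,
        "radius": radius,      # search radius around the point, in metres
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


# Placeholder credentials (assumption): substitute your own.
url = build_explore_url("MY_ID", "MY_SECRET", 42.3577, -71.0651)
print(url)
```

With `requests`, passing the same dictionary as `requests.get(EXPLORE_URL, params=params)` should produce an equivalent request without building the string by hand.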

In [25]:
#get relevant information from JSON
info=results['response']['groups'][0]['items']
info[0]

{'reasons': {'count': 0,
  'items': [{'summary': 'This spot is popular',
    'type': 'general',
    'reasonName': 'globalInteractionReason'}]},
 'venue': {'id': '4a549cd9f964a5202ab31fe3',
  'name': 'Frog Pond',
  'location': {'address': '84 Beacon St',
   'crossStreet': 'at Boston Common',
   'lat': 42.35613375967781,
   'lng': -71.06567233800888,
   'labeledLatLngs': [{'label': 'display',
     'lat': 42.35613375967781,
     'lng': -71.06567233800888}],
   'distance': 180,
   'postalCode': '02108',
   'cc': 'US',
   'city': 'Boston',
   'state': 'MA',
   'country': 'United States',
   'formattedAddress': ['84 Beacon St (at Boston Common)',
    'Boston, MA 02108',
    'United States']},
  'categories': [{'id': '4bf58dd8d48988d161941735',
    'name': 'Lake',
    'pluralName': 'Lakes',
    'shortName': 'Lake',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/lake_',
     'suffix': '.png'},
    'primary': True}],
  'photos': {'count': 0, 'groups': []}},
 'refe

Turn `info` into a dataframe (using the code from one of the course labs):

In [26]:
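def get_category_type(row):
    """Return the name of the venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # json_normalize prefixes nested keys, so fall back to the dotted name
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']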

In [27]:
df = json_normalize(info) # flatten JSON

# keep the name, the categories, every location field and the id
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in df.columns if col.startswith('venue.location.')] + ['venue.id']
df_filtered = df.loc[:, filtered_columns].copy()  # explicit copy, so the assignments below modify this frame

# replace the list of categories with the name of the first category
df_filtered['venue.categories'] = df_filtered.apply(get_category_type, axis=1)

# clean columns
df_filtered.columns = [col.split('.')[-1] for col in df_filtered.columns]

df_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Frog Pond,Lake,84 Beacon St,US,Boston,United States,at Boston Common,180,"[84 Beacon St (at Boston Common), Boston, MA 0...","[{'label': 'display', 'lat': 42.35613375967781...",42.356134,-71.065672,2108,MA,4a549cd9f964a5202ab31fe3
1,No. 9 Park,French Restaurant,9 Park St,US,Boston,United States,at Beacon St.,157,"[9 Park St (at Beacon St.), Boston, MA 02108, ...","[{'label': 'display', 'lat': 42.35754020221547...",42.35754,-71.063193,2108,MA,3fd66200f964a5203eec1ee3
2,Union Club of Boston,Restaurant,8 Park St,US,Boston,United States,,199,"[8 Park St, Boston, MA 02108, United States]","[{'label': 'display', 'lat': 42.35741149126363...",42.357411,-71.062702,2108,MA,40b28c80f964a52098f71ee3
3,Tadpole Playground,Playground,,US,Boston,United States,,177,"[Boston, MA 02108, United States]","[{'label': 'display', 'lat': 42.35612689028739...",42.356127,-71.064724,2108,MA,4bd0a43acaff9521813ecff0
4,Nichols House Museum,Museum,55 Mount Vernon St,US,Boston,United States,btwn Walnut & Joy,103,"[55 Mount Vernon St (btwn Walnut & Joy), Bosto...","[{'label': 'display', 'lat': 42.35834156862221...",42.358342,-71.066019,2108,MA,4dfa075f63652db0f51908dd
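
The dotted column names that `json_normalize` produces (for example `venue.location.lat`) come from flattening the nested JSON. The following is only a rough standard-library sketch of that idea, not the pandas implementation; the helper name `flatten` is an assumption, and the sample record reuses a few fields from the first venue above.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys; lists are kept as values."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into nested dictionaries, extending the dotted prefix
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


item = {
    "venue": {
        "id": "4a549cd9f964a5202ab31fe3",
        "name": "Frog Pond",
        "location": {"lat": 42.35613375967781, "postalCode": "02108"},
        "categories": [{"name": "Lake"}],
    }
}
flat = flatten(item)
print(sorted(flat))
```

Because lists are left untouched, the `venue.categories` entry is still a list of dictionaries, which is why `get_category_type` is needed to pull out the category name.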


I can get each venue's rating by requesting its details from the API using the venue id.

In [28]:
venues_ids=df_filtered['id']
venues_ids

0    4a549cd9f964a5202ab31fe3
1    3fd66200f964a5203eec1ee3
2    40b28c80f964a52098f71ee3
3    4bd0a43acaff9521813ecff0
4    4dfa075f63652db0f51908dd
5    4babee0ff964a52063d63ae3
6    4d9f3ad18ef3a14380968e10
7    4bdb65f963c5c9b6dfc92768
8    4aa91a1af964a520fe5120e3
Name: id, dtype: object

In [29]:
df_filtered['rating']=0

In [40]:
venue_id='4a549cd9f964a5202ab31fe3'
url='https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
result = requests.get(url).json()
result

{'meta': {'code': 429,
  'errorType': 'quota_exceeded',
  'errorDetail': 'Quota exceeded',
  'requestId': '5ce1e2f94c1f6753b66f05f8'},
 'response': {}}
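
The response above shows that a details request can come back with an empty `response` when the quota is exhausted. As a hedged sketch, a small helper can check `meta.code` before reading the rating, so that an error and an unrated venue both yield `None` instead of raising. The name `extract_rating` and the successful sample response are made up for illustration.

```python
def extract_rating(result):
    """Return the venue rating from a details response, or None if unavailable."""
    meta = result.get("meta", {})
    if meta.get("code") != 200:
        # e.g. 429 / quota_exceeded: the response body is empty
        return None
    venue = result.get("response", {}).get("venue", {})
    return venue.get("rating")  # None when the venue has no rating yet


quota_error = {"meta": {"code": 429, "errorType": "quota_exceeded"}, "response": {}}
# Made-up successful response, for illustration only
ok_response = {"meta": {"code": 200}, "response": {"venue": {"id": "abc", "rating": 8.7}}}

print(extract_rating(quota_error))
print(extract_rating(ok_response))
```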

In [None]:
for i, venue_id in enumerate(venues_ids):
    url='https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    try:
        rating = result['response']['venue']['rating']
    except KeyError:
        # no rating in the response: unrated venue, or an error such as quota_exceeded
        print('This venue has not been rated yet.')
        continue
    # write with a single .loc call to avoid chained assignment
    df_filtered.loc[i, 'rating'] = rating
    print(rating)

In [38]:
df_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id,rating
0,Frog Pond,Lake,84 Beacon St,US,Boston,United States,at Boston Common,180,"[84 Beacon St (at Boston Common), Boston, MA 0...","[{'label': 'display', 'lat': 42.35613375967781...",42.356134,-71.065672,2108,MA,4a549cd9f964a5202ab31fe3,0
1,No. 9 Park,French Restaurant,9 Park St,US,Boston,United States,at Beacon St.,157,"[9 Park St (at Beacon St.), Boston, MA 02108, ...","[{'label': 'display', 'lat': 42.35754020221547...",42.35754,-71.063193,2108,MA,3fd66200f964a5203eec1ee3,1
2,Union Club of Boston,Restaurant,8 Park St,US,Boston,United States,,199,"[8 Park St, Boston, MA 02108, United States]","[{'label': 'display', 'lat': 42.35741149126363...",42.357411,-71.062702,2108,MA,40b28c80f964a52098f71ee3,2
3,Tadpole Playground,Playground,,US,Boston,United States,,177,"[Boston, MA 02108, United States]","[{'label': 'display', 'lat': 42.35612689028739...",42.356127,-71.064724,2108,MA,4bd0a43acaff9521813ecff0,3
4,Nichols House Museum,Museum,55 Mount Vernon St,US,Boston,United States,btwn Walnut & Joy,103,"[55 Mount Vernon St (btwn Walnut & Joy), Bosto...","[{'label': 'display', 'lat': 42.35834156862221...",42.358342,-71.066019,2108,MA,4dfa075f63652db0f51908dd,4
5,Beacon Hill Capital Market,Food & Drink Shop,32 Myrtle St,US,Boston,United States,,186,"[32 Myrtle St, Boston, MA 02114, United States]","[{'label': 'display', 'lat': 42.35932827314974...",42.359328,-71.065654,2114,MA,4babee0ff964a52063d63ae3,5
6,Robert Gould Shaw Memorial,Outdoor Sculpture,Beacon St,US,Boston,United States,at Park St,129,"[Beacon St (at Park St), Boston, MA 02108, Uni...","[{'label': 'display', 'lat': 42.35758917357506...",42.357589,-71.06353,2108,MA,4d9f3ad18ef3a14380968e10,6
7,Somerset Club,Speakeasy,48 Beacon St,US,Boston,United States,,158,"[48 Beacon St, Boston, MA 02108, United States]","[{'label': 'display', 'lat': 42.35681079283094...",42.356811,-71.066608,2108,MA,4bdb65f963c5c9b6dfc92768,7
8,Primo's Restaurant,Pizza Place,28 Myrtle St,US,Boston,United States,,185,"[28 Myrtle St, Boston, MA 02114, United States]","[{'label': 'display', 'lat': 42.35932373996034...",42.359324,-71.065583,2114,MA,4aa91a1af964a520fe5120e3,8


After that, we only need to repeat this for all neighborhoods and apply clustering to check whether it is possible to tell which neighborhoods are a "cheap, high-quality" option for visitors.
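
Before any clustering, the venue-level data has to be summarized per neighborhood. The sketch below shows one possible way to do that with plain Python: it averages the rating and the price category for each zip code. The function name, the record layout and the numeric price tiers are all assumptions, and the sample records are made up.

```python
from collections import defaultdict
from statistics import mean


def neighborhood_features(venues):
    """Average rating and price tier per zip code, skipping missing values."""
    grouped = defaultdict(lambda: {"rating": [], "price": []})
    for venue in venues:
        for field in ("rating", "price"):
            if venue.get(field) is not None:
                grouped[venue["zip"]][field].append(venue[field])
    return {
        zip_code: {f: (mean(vals) if vals else None) for f, vals in fields.items()}
        for zip_code, fields in grouped.items()
    }


# Made-up records, for illustration only
venues = [
    {"zip": "02108", "rating": 9.0, "price": 4},
    {"zip": "02108", "rating": 8.0, "price": 2},
    {"zip": "02114", "rating": 7.5, "price": 1},
    {"zip": "02114", "rating": None, "price": 1},
]
features = neighborhood_features(venues)
print(features)
```

Each neighborhood then becomes one row with two numeric features, which is the shape a clustering algorithm would typically expect.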

### 2.2 US zip codes dataset

Here is what the US zip code dataset looks like.

To use the dataset, I downloaded it as a CSV file to my computer from [this link](https://simplemaps.com/data/us-zips).

In [11]:
path='/Users/nicolecapriles/Desktop/uszips.csv'
zipcodes=pd.read_csv(path) 

The dataset contains the zip codes for all US states, their latitude and longitude, and some other information. The first rows are shown below:

In [12]:
zipcodes.head()

Unnamed: 0,zip,lat,lng,city,state_id,state_name,zcta,parent_zcta,population,density,county_fips,county_name,all_county_weights,imprecise,military,timezone
0,601,18.18,-66.7522,Adjuntas,PR,Puerto Rico,True,,18570,111.4,72001,Adjuntas,"{'72001':99.43,'72141':0.57}",False,False,America/Puerto_Rico
1,602,18.3607,-67.1752,Aguada,PR,Puerto Rico,True,,41520,523.7,72003,Aguada,{'72003':100},False,False,America/Puerto_Rico
2,603,18.4544,-67.122,Aguadilla,PR,Puerto Rico,True,,54689,667.9,72005,Aguadilla,{'72005':100},False,False,America/Puerto_Rico
3,606,18.1672,-66.9383,Maricao,PR,Puerto Rico,True,,6615,60.4,72093,Maricao,"{'72093':94.88,'72121':1.35,'72153':3.78}",False,False,America/Puerto_Rico
4,610,18.2903,-67.1224,Anasco,PR,Puerto Rico,True,,29016,311.9,72011,Añasco,"{'72003':0.55,'72011':99.45}",False,False,America/Puerto_Rico


I'll only need the zip code, latitude, longitude, city and state, so we can drop all other columns from the dataframe.

In [13]:
columns_drop=['zcta','parent_zcta','population','density','county_fips','county_name','all_county_weights','imprecise','military','timezone']
zipcodes.drop(columns_drop,axis=1,inplace=True)

In [14]:
zipcodes.head()

Unnamed: 0,zip,lat,lng,city,state_id,state_name
0,601,18.18,-66.7522,Adjuntas,PR,Puerto Rico
1,602,18.3607,-67.1752,Aguada,PR,Puerto Rico
2,603,18.4544,-67.122,Aguadilla,PR,Puerto Rico
3,606,18.1672,-66.9383,Maricao,PR,Puerto Rico
4,610,18.2903,-67.1224,Anasco,PR,Puerto Rico


Since I'll run the project on three specific destinations (Boston, Washington, DC and New York City), we can split this dataset to obtain the zip codes per city.

In [15]:
Boston_zipcodes=zipcodes[(zipcodes['state_id']=='MA')&(zipcodes['city']=='Boston')]
DC_zipcodes=zipcodes[(zipcodes['state_id']=='DC')&(zipcodes['city']=='Washington')]
NYC_zipcodes=zipcodes[(zipcodes['state_id']=='NY')&(zipcodes['city']=='New York')]

In [16]:
Boston_zipcodes

Unnamed: 0,zip,lat,lng,city,state_id,state_name
462,2108,42.3577,-71.0651,Boston,MA,Massachusetts
463,2109,42.3648,-71.053,Boston,MA,Massachusetts
464,2110,42.3583,-71.0518,Boston,MA,Massachusetts
465,2111,42.3501,-71.0591,Boston,MA,Massachusetts
466,2113,42.3653,-71.0553,Boston,MA,Massachusetts
467,2114,42.3632,-71.0673,Boston,MA,Massachusetts
468,2115,42.341,-71.0946,Boston,MA,Massachusetts
469,2116,42.3505,-71.0756,Boston,MA,Massachusetts
470,2118,42.3382,-71.0708,Boston,MA,Massachusetts
501,2163,42.3663,-71.1209,Boston,MA,Massachusetts
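
Note that the `zip` column was read as integers, so the Boston codes appear as `2108` while the Foursquare API reports the same area as `'02108'`. If the two sources are ever matched on zip code, the leading zeros need to be restored. A minimal sketch, with the helper name `format_zip` being an assumption:

```python
def format_zip(value):
    """Return a zip code as a five-character string with leading zeros."""
    return f"{int(value):05d}"


print(format_zip(2108))  # 02108
print(format_zip(601))   # 00601
```

On the dataframe this could be applied with something like `Boston_zipcodes['zip'].map(format_zip)`. Reading the file with `pd.read_csv(path, dtype={'zip': str})` may also keep the zeros, provided they are present in the CSV itself.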


These three dataframes contain the zip codes I'll use to obtain the venues from the Foursquare API.
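
As a rough sketch of that step, the loop below visits every zip code, fetches the venues around its coordinates and tags each venue with the zip code it came from. The function `collect_venues` and the stand-in `fake_fetch` are hypothetical; in the real notebook the fetch function would wrap the Foursquare request shown in section 2.1.

```python
def collect_venues(zip_rows, fetch_venues):
    """Call fetch_venues(lat, lng) for every zip code and tag each venue with its zip."""
    collected = []
    for zip_code, lat, lng in zip_rows:
        for venue in fetch_venues(lat, lng):
            collected.append({"zip": zip_code, **venue})
    return collected


def fake_fetch(lat, lng):
    """Stand-in for the API call (assumption): returns one made-up venue."""
    return [{"name": f"venue near {lat},{lng}"}]


# Two rows taken from the Boston table above: (zip, lat, lng)
rows = [(2108, 42.3577, -71.0651), (2109, 42.3648, -71.053)]
venues = collect_venues(rows, fake_fetch)
print(venues)
```

The rows could be produced from a dataframe with `Boston_zipcodes[['zip', 'lat', 'lng']].itertuples(index=False)`. Passing the fetch function as an argument keeps the loop testable without spending API quota.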