# Capstone Project - The Battle of Neighborhoods (Week 2)

## Goals:

1. Introduction section, where you discuss the business problem and who would be interested in this project.
2. Data section, where you describe the data that will be used to solve the problem and its source.
3. Methodology section, the main component of the report, where you describe any exploratory data analysis you did, any inferential statistical testing you performed, and which machine learning techniques you used and why.
4. Results section, where you discuss the results.
5. Discussion section, where you discuss any observations you noted and any recommendations you can make based on the results.
6. Conclusion section, where you conclude the report.

---

## Intro

I'd like to open a restaurant in Texas, specifically a Vietnamese one. My candidate cities are Dallas, Austin, and Houston. I'd like to open in an area of town that already has a considerable Asian/Vietnamese restaurant presence. Walkability and the likability of nearby businesses are also important.

First, I'll narrow down which city to build in. Then, I'll select a specific neighborhood within that city. This two-step approach is useful for anyone who has the flexibility to open in any of several nearby cities and may eventually want to franchise.

---

## Data

#### Sources

- Foursquare API, venue search: https://developer.foursquare.com/docs/api-reference/venues/search/
- Foursquare API, venue likes: https://developer.foursquare.com/docs/api-reference/venues/likes/
- Cities database (SimpleMaps US Cities): https://simplemaps.com/data/us-cities

#### Proposed Steps

- Collect coordinates for the selected cities (from Google/Wikipedia or the cities database)
- List candidate neighborhoods for querying (pulled from external data)
- Search for Asian/Vietnamese restaurants in each city using Foursquare's API
- Visualize the results on a map
- Identify candidate neighborhoods/areas where Asian/Vietnamese restaurants are frequent
- Narrow down the restaurants per city and re-visualize them on a map
- Add category data from Foursquare's API
- Add likes data from Foursquare's API
- Group by neighborhood and run a k-means clustering algorithm to score the neighborhoods
- Select where to build the restaurant
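Before real neighborhood boundaries are available, the "identify areas where Asian/Vietnamese restaurants are frequent" step can be approximated with a coarse grid. A minimal sketch, assuming the venues sit in a DataFrame with `Latitude` and `Longitude` columns (as built in the Methodology section): each venue is snapped to a square cell by flooring its coordinates, and venues are counted per cell. The `grid_cell_counts` helper and the 0.05° cell size (about 5.5 km north to south) are illustrative assumptions, not part of the original plan.

```python
import pandas as pd


def grid_cell_counts(venues: pd.DataFrame, cell_deg: float = 0.05) -> pd.DataFrame:
    """Count venues per square grid cell of `cell_deg` degrees (hypothetical helper)."""
    cells = venues.assign(
        # Snap each coordinate to the lower-left corner of its grid cell
        CellLat=(venues['Latitude'] // cell_deg) * cell_deg,
        CellLng=(venues['Longitude'] // cell_deg) * cell_deg,
    )
    return (
        cells.groupby(['CellLat', 'CellLng'])
        .size()
        .reset_index(name='Venues')
        .sort_values('Venues', ascending=False, ignore_index=True)
    )


# Three of the Austin venues returned by Foursquare
sample = pd.DataFrame({
    'Latitude': [30.378247, 30.364037, 30.229627],
    'Longitude': [-97.687672, -97.694928, -97.730117],
})
print(grid_cell_counts(sample))
```

Cells with the most venues are candidate areas; tying those cells back to named neighborhoods would still require an external boundaries dataset.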

---

## Methodology

In [118]:
import pandas as pd
import requests

In [119]:
# Gather coordinates of candidate cities

filename = 'uscities.csv'
csv_df = pd.read_csv(filename)

Cities = ['Dallas', 'Houston', 'Austin']
State = 'TX'

df = csv_df.loc[(csv_df['city'].isin(Cities)) & (csv_df['state_id'] == State)]
cities_df = df[['city','state_id','lat', 'lng']]
cities_df = cities_df.rename(columns={'city':'City','state_id':'State','lat':'Latitude', 'lng':'Longitude'})
cities_df

|    | City    | State | Latitude | Longitude |
|----|---------|-------|----------|-----------|
| 4  | Dallas  | TX    | 32.7936  | -96.7662  |
| 6  | Houston | TX    | 29.7863  | -95.3889  |
| 31 | Austin  | TX    | 30.3004  | -97.7522  |
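The `state_id` filter matters because city names are not unique across the United States: `isin(Cities)` on its own would also match places such as Austin, Minnesota or Dallas, Georgia. A small sketch of the same filter on a toy frame (the rows are illustrative and keep only the two columns the filter uses):

```python
import pandas as pd

# Toy rows: same city names in different states (coordinates omitted)
toy = pd.DataFrame({
    'city': ['Dallas', 'Austin', 'Austin', 'Houston', 'Dallas'],
    'state_id': ['TX', 'TX', 'MN', 'TX', 'GA'],
})

Cities = ['Dallas', 'Houston', 'Austin']
State = 'TX'

# Name match only vs. name match plus state match
name_only = toy.loc[toy['city'].isin(Cities)]
name_and_state = toy.loc[toy['city'].isin(Cities) & (toy['state_id'] == State)]

print(len(name_only), len(name_and_state))
```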


In [120]:
# Map cities

import folium

# Texas coordinates
tx_coord = [31.7532, -96.3832]

# create map of Texas using latitude and longitude values
map_tx = folium.Map(location=tx_coord, zoom_start=7)

# add markers to map
for lat, lng, city in zip(cities_df['Latitude'], cities_df['Longitude'], cities_df['City']):
    label = '{}'.format(city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tx)  
    
map_tx

In [182]:
# Foursquare API

import os

# Read the Foursquare credentials from environment variables instead of
# hard-coding and printing them (the variable names are assumptions)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']  # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']  # your Foursquare Secret
VERSION = '20201112'  # API version date (YYYYMMDD)
LIMIT = 30


In [123]:
# Find coordinates of each city

dfw_coord = cities_df.loc[cities_df['City'] == 'Dallas'][['Latitude', 'Longitude']]
dfw_lat, dfw_lng = dfw_coord.Latitude.values[0], dfw_coord.Longitude.values[0]

htx_coord = cities_df.loc[cities_df['City'] == 'Houston'][['Latitude', 'Longitude']]
htx_lat, htx_lng = htx_coord.Latitude.values[0], htx_coord.Longitude.values[0]

atx_coord = cities_df.loc[cities_df['City'] == 'Austin'][['Latitude', 'Longitude']]
atx_lat, atx_lng = atx_coord.Latitude.values[0], atx_coord.Longitude.values[0]

print(f'Dallas coordinates are: {dfw_lat, dfw_lng}')
print(f'Houston coordinates are: {htx_lat, htx_lng}')
print(f'Austin coordinates are: {atx_lat, atx_lng}')

Dallas coordinates are: (32.7936, -96.7662)
Houston coordinates are: (29.7863, -95.3889)
Austin coordinates are: (30.3004, -97.7522)
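The three near-identical lookups above can be collapsed into one dictionary keyed by city name. A sketch, assuming `cities_df` has the `City`, `Latitude` and `Longitude` columns built earlier (rebuilt here from the printed output so the block runs on its own):

```python
import pandas as pd

# Rebuilt from the cities_df output shown earlier
cities_df = pd.DataFrame({
    'City': ['Dallas', 'Houston', 'Austin'],
    'State': ['TX', 'TX', 'TX'],
    'Latitude': [32.7936, 29.7863, 30.3004],
    'Longitude': [-96.7662, -95.3889, -97.7522],
})

# Map each city name to its (latitude, longitude) pair
city_coords = {
    row.City: (row.Latitude, row.Longitude)
    for row in cities_df.itertuples(index=False)
}

for city, (lat, lng) in city_coords.items():
    print(f'{city} coordinates are: {lat, lng}')
```

Adding a fourth candidate city then only requires a new row in the CSV filter, not another pair of lookup lines.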


In [270]:
# Create Foursquare /search API request

base_url = f'https://api.foursquare.com/v2/venues/search?&client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}'

LIMIT = 100  # note: the v2 /venues/search reference caps `limit` at 50
Cities = ['Dallas', 'Houston', 'Austin']
State = 'TX'
query_term = "Vietnamese"

dfw_url = f'{base_url}&ll={dfw_lat},{dfw_lng}&limit={LIMIT}&near={Cities[0]}, {State}&query={query_term}'
htx_url = f'{base_url}&ll={htx_lat},{htx_lng}&limit={LIMIT}&near={Cities[1]}, {State}&query={query_term}'
atx_url = f'{base_url}&ll={atx_lat},{atx_lng}&limit={LIMIT}&near={Cities[2]}, {State}&query={query_term}'

dfw_url

'https://api.foursquare.com/v2/venues/search?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20201112&ll=32.7936,-96.7662&limit=100&near=Dallas, TX&query=Vietnamese'
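Building the query string by hand leaves characters such as the space in `Dallas, TX` unencoded in the printed URL and sends both `ll` and `near`. The sketch below assembles the same `/venues/search` request from a parameter dictionary with the standard library's `urllib.parse.urlencode` (with `requests`, the same dictionary can be passed as `params=` instead). Per the Foursquare v2 reference, `ll` and `near` are alternative ways to locate a search (one of them is required), so this sketch sends only `ll`. It uses `limit=50`, which that reference lists as the maximum for this endpoint. The `build_search_url` helper and the placeholder credentials are assumptions for illustration.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(lat, lng, query, client_id, client_secret,
                     version='20201112', limit=50):
    """Return a /venues/search URL with every parameter URL-encoded (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',  # search centre; `near` is the text-based alternative
        'query': query,
        'limit': limit,
    }
    return f'{SEARCH_ENDPOINT}?{urlencode(params)}'


url = build_search_url(32.7936, -96.7662, 'Vietnamese',
                       client_id='YOUR_ID', client_secret='YOUR_SECRET')
print(url)
```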

In [271]:
dfw_results = requests.get(dfw_url).json()
htx_results = requests.get(htx_url).json()
atx_results = requests.get(atx_url).json()

atx_results['response']

{'venues': [{'id': '4a9ebac6f964a520f63a20e3',
   'name': '888 Vietnamese Restaurant',
   'location': {'address': '2400 E Oltorf St',
    'lat': 30.22962680466954,
    'lng': -97.73011688665741,
    'labeledLatLngs': [{'label': 'display',
      'lat': 30.22962680466954,
      'lng': -97.73011688665741},
     {'label': 'entrance', 'lat': 30.229431, 'lng': -97.72963}],
    'distance': 8159,
    'postalCode': '78741',
    'cc': 'US',
    'city': 'Austin',
    'state': 'TX',
    'country': 'United States',
    'formattedAddress': ['2400 E Oltorf St',
     'Austin, TX 78741',
     'United States']},
   'categories': [{'id': '4bf58dd8d48988d14a941735',
     'name': 'Vietnamese Restaurant',
     'pluralName': 'Vietnamese Restaurants',
     'shortName': 'Vietnamese',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/vietnamese_',
      'suffix': '.png'},
     'primary': True}],
   'referralId': 'v-1605648394',
   'hasPerk': False},
  {'id': '4e9cdd740aaf015a05c3dc1b',
   'na
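Foursquare v2 responses wrap the payload in a `meta` object whose `code` is 200 on success. When a call fails (bad credentials, exhausted quota), `response` may be empty, and indexing `['response']['venues']` raises a `KeyError` that hides the real cause. A minimal sketch of a guard, assuming that envelope shape; the `extract_venues` helper is hypothetical:

```python
def extract_venues(payload: dict) -> list:
    """Return the venue list from a /venues/search response, or raise with the API's error."""
    meta = payload.get('meta', {})
    if meta.get('code') != 200:
        # v2 error responses describe the problem in meta.errorType / meta.errorDetail
        raise RuntimeError(f"Foursquare error {meta.get('code')}: "
                           f"{meta.get('errorType')} - {meta.get('errorDetail')}")
    return payload.get('response', {}).get('venues', [])


ok = {'meta': {'code': 200},
      'response': {'venues': [{'id': '4a9ebac6f964a520f63a20e3',
                               'name': '888 Vietnamese Restaurant'}]}}
print(len(extract_venues(ok)))
```

In the notebook this would wrap each call, e.g. `extract_venues(requests.get(atx_url).json())`.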

In [272]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # results from other endpoints (e.g. /venues/explore) nest it under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [273]:
# Load JSON response into DF and clean

filtered_cols = ['name', 'location.lat', 'location.lng', 'categories', 'id']

atx_venues_df = pd.json_normalize(atx_results['response']['venues'])  # pandas >= 1.0; flattens nested keys into 'location.lat', etc.
atx_venues_df = atx_venues_df.loc[:, filtered_cols]
atx_venues_df['categories'] = atx_venues_df.apply(get_category_type, axis=1)
atx_venues_df = atx_venues_df.rename(columns={'name':'Name','location.lat':'Latitude','location.lng':'Longitude', 'categories':'Categories'})

atx_venues_df

|   | Name | Latitude | Longitude | Categories | id |
|---|------|----------|-----------|------------|----|
| 0 | 888 Vietnamese Restaurant | 30.229627 | -97.730117 | Vietnamese Restaurant | 4a9ebac6f964a520f63a20e3 |
| 1 | Duy Vietnamese Restaurant | 30.378247 | -97.687672 | Vietnamese Restaurant | 4e9cdd740aaf015a05c3dc1b |
| 2 | T&L Vietnamese Cuisine | 30.253432 | -97.713051 | Vietnamese Restaurant | 56f02681498eab944dae8438 |
| 3 | Pho Dinh Vietnamese Cuisine | 30.364037 | -97.694928 | Vietnamese Restaurant | 5d60379ce4337c00085857ba |
| 4 | PhoNatic Vietnamese Cuisine | 30.355697 | -97.733815 | Vietnamese Restaurant | 4e95f5244901d620b77d4385 |
| 5 | Hai ku Vietnamese Bistro | 30.280947 | -97.807159 | Vietnamese Restaurant | 54f0b178498e6bc708a18654 |
| 6 | Pho Thaison Vietnamese Restaurant | 30.165849 | -97.792386 | Vietnamese Restaurant | 4b521e77f964a5206a6927e3 |
| 7 | Asia Chinese Restaurant | 30.215357 | -97.744754 | Chinese Restaurant | 4acf6ea9f964a5209fd320e3 |
| 8 | Hao Hao Vietnamese & Chinese Cuisine | 30.510722 | -97.694627 | Vietnamese Restaurant | 4b8318f8f964a52081f730e3 |
| 9 | Austin Vietnamese Alliance Church | 30.383682 | -97.700791 | Church | 5676e8a2498e2f30abd07c98 |
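`pd.json_normalize` flattens the nested `location` object into dotted column names such as `location.lat`, which is why `filtered_cols` refers to them that way. `categories` stays a list, which `get_category_type` reduces to its first entry. A self-contained sketch on a one-venue payload shaped like the response above, with values copied from the first Austin result. The `first_category` function is a simplified stand-in for `get_category_type` that handles only the `categories` branch.

```python
import pandas as pd

venues = [{
    'id': '4a9ebac6f964a520f63a20e3',
    'name': '888 Vietnamese Restaurant',
    'location': {'lat': 30.22962680466954, 'lng': -97.73011688665741, 'city': 'Austin'},
    'categories': [{'name': 'Vietnamese Restaurant', 'primary': True}],
}]

# Nested dict keys become 'location.lat', 'location.lng', 'location.city';
# the 'categories' list is left as-is
flat = pd.json_normalize(venues)
print(sorted(flat.columns))


def first_category(categories):
    """Name of the first listed category, or None when the list is empty."""
    return categories[0]['name'] if categories else None


flat['categories'] = flat['categories'].apply(first_category)
filtered_cols = ['name', 'location.lat', 'location.lng', 'categories', 'id']
print(flat[filtered_cols])
```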


In [274]:
print('{} venues were returned by Foursquare for Austin.'.format(atx_venues_df.shape[0]))

16 venues were returned by Foursquare for Austin.


In [275]:
# Add empty column placeholder for 'Likes' count
atx_venues_df['Likes'] = ""
atx_venues_df.head()

|   | Name | Latitude | Longitude | Categories | id | Likes |
|---|------|----------|-----------|------------|----|-------|
| 0 | 888 Vietnamese Restaurant | 30.229627 | -97.730117 | Vietnamese Restaurant | 4a9ebac6f964a520f63a20e3 | |
| 1 | Duy Vietnamese Restaurant | 30.378247 | -97.687672 | Vietnamese Restaurant | 4e9cdd740aaf015a05c3dc1b | |
| 2 | T&L Vietnamese Cuisine | 30.253432 | -97.713051 | Vietnamese Restaurant | 56f02681498eab944dae8438 | |
| 3 | Pho Dinh Vietnamese Cuisine | 30.364037 | -97.694928 | Vietnamese Restaurant | 5d60379ce4337c00085857ba | |
| 4 | PhoNatic Vietnamese Cuisine | 30.355697 | -97.733815 | Vietnamese Restaurant | 4e95f5244901d620b77d4385 | |


In [276]:
# Collect likes for businesses and add back into dataframe as new column

for index, row in atx_venues_df.iterrows():
    venue_id = row['id']  # avoid shadowing the built-in id()
    url = f'https://api.foursquare.com/v2/venues/{venue_id}/likes?&client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}'
    results = requests.get(url).json()
    atx_venues_df.loc[index, 'Likes'] = results['response']['likes']['count']

atx_venues_df

|   | Name | Latitude | Longitude | Categories | id | Likes |
|---|------|----------|-----------|------------|----|-------|
| 0 | 888 Vietnamese Restaurant | 30.229627 | -97.730117 | Vietnamese Restaurant | 4a9ebac6f964a520f63a20e3 | 382 |
| 1 | Duy Vietnamese Restaurant | 30.378247 | -97.687672 | Vietnamese Restaurant | 4e9cdd740aaf015a05c3dc1b | 14 |
| 2 | T&L Vietnamese Cuisine | 30.253432 | -97.713051 | Vietnamese Restaurant | 56f02681498eab944dae8438 | 0 |
| 3 | Pho Dinh Vietnamese Cuisine | 30.364037 | -97.694928 | Vietnamese Restaurant | 5d60379ce4337c00085857ba | 0 |
| 4 | PhoNatic Vietnamese Cuisine | 30.355697 | -97.733815 | Vietnamese Restaurant | 4e95f5244901d620b77d4385 | 64 |
| 5 | Hai ku Vietnamese Bistro | 30.280947 | -97.807159 | Vietnamese Restaurant | 54f0b178498e6bc708a18654 | 0 |
| 6 | Pho Thaison Vietnamese Restaurant | 30.165849 | -97.792386 | Vietnamese Restaurant | 4b521e77f964a5206a6927e3 | 26 |
| 7 | Asia Chinese Restaurant | 30.215357 | -97.744754 | Chinese Restaurant | 4acf6ea9f964a5209fd320e3 | 9 |
| 8 | Hao Hao Vietnamese & Chinese Cuisine | 30.510722 | -97.694627 | Vietnamese Restaurant | 4b8318f8f964a52081f730e3 | 24 |
| 9 | Austin Vietnamese Alliance Church | 30.383682 | -97.700791 | Church | 5676e8a2498e2f30abd07c98 | 0 |
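The likes loop issues one request per venue and assumes each call succeeds. Below is a sketch of a more defensive version, split so that the parsing can be checked without network access. `likes_count` falls back to 0 when a response has no `likes` block (for example, after a quota error), and the column is built from integers so that `Likes` ends up numeric rather than object-typed. The helper names, the injected `fetch` callable, and the zero fallback are assumptions, not part of the original notebook.

```python
import pandas as pd

LIKES_URL = 'https://api.foursquare.com/v2/venues/{venue_id}/likes'


def likes_count(payload: dict) -> int:
    """Extract response.likes.count, defaulting to 0 when it is missing."""
    return int(payload.get('response', {}).get('likes', {}).get('count', 0))


def add_likes(venues: pd.DataFrame, fetch) -> pd.DataFrame:
    """Return a copy of `venues` with an integer 'Likes' column.

    `fetch(url)` must return the decoded JSON for a URL; in the notebook it
    could wrap requests.get(url, params=..., timeout=10).json().
    """
    out = venues.copy()
    out['Likes'] = [
        likes_count(fetch(LIKES_URL.format(venue_id=venue_id)))
        for venue_id in out['id']
    ]
    return out


# Offline usage: a dict lookup stands in for the HTTP call
fake_responses = {
    LIKES_URL.format(venue_id='a'): {'meta': {'code': 200}, 'response': {'likes': {'count': 382}}},
    LIKES_URL.format(venue_id='b'): {'meta': {'code': 429}, 'response': {}},
}
demo = pd.DataFrame({'Name': ['888 Vietnamese Restaurant', 'Duy Vietnamese Restaurant'],
                     'id': ['a', 'b']})
result = add_likes(demo, fake_responses.get)
print(result['Likes'].tolist())
```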


In [277]:
# Prepare the DataFrame for clustering:
# drop the non-numeric columns (alternatively, one-hot encode them)

atx_clustering = atx_venues_df.drop(columns=['Name', 'Categories', 'id'])
atx_clustering

|   | Latitude | Longitude | Likes |
|---|----------|-----------|-------|
| 0 | 30.229627 | -97.730117 | 382 |
| 1 | 30.378247 | -97.687672 | 14 |
| 2 | 30.253432 | -97.713051 | 0 |
| 3 | 30.364037 | -97.694928 | 0 |
| 4 | 30.355697 | -97.733815 | 64 |
| 5 | 30.280947 | -97.807159 | 0 |
| 6 | 30.165849 | -97.792386 | 26 |
| 7 | 30.215357 | -97.744754 | 9 |
| 8 | 30.510722 | -97.694627 | 24 |
| 9 | 30.383682 | -97.700791 | 0 |
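k-means measures Euclidean distance, so each feature is weighted by its raw range. In the table above, `Likes` spans 0 to 382 while latitude and longitude vary by well under one degree, so the clustering is driven almost entirely by likes. This is consistent with 888 Vietnamese Restaurant, the 382-like venue, ending up alone in its own cluster. A sketch of standardizing the columns first with scikit-learn's `StandardScaler`, run on the ten rows shown above. How much location should count relative to popularity is a modelling choice the original does not make, so treat this as one option.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# The ten rows of atx_clustering shown above
atx_clustering = pd.DataFrame({
    'Latitude': [30.229627, 30.378247, 30.253432, 30.364037, 30.355697,
                 30.280947, 30.165849, 30.215357, 30.510722, 30.383682],
    'Longitude': [-97.730117, -97.687672, -97.713051, -97.694928, -97.733815,
                  -97.807159, -97.792386, -97.744754, -97.694627, -97.700791],
    'Likes': [382, 14, 0, 0, 64, 0, 26, 9, 24, 0],
})

# Rescale every column to mean 0 and standard deviation 1
scaled = StandardScaler().fit_transform(atx_clustering)

kmeans_scaled = KMeans(n_clusters=3, n_init=10, random_state=1).fit(scaled)
print(kmeans_scaled.labels_)
```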


In [278]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 3

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(atx_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2], dtype=int32)
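`kclusters = 3` is set by hand. One common way to sanity-check that choice is to fit several values of k and compare two numbers. The first is the inertia (within-cluster sum of squares), which generally falls as k grows. The second is the silhouette score, which lies between -1 and 1, where higher means better-separated clusters. A sketch on standardized features, assuming the same three columns as `atx_clustering` (the ten rows shown earlier are repeated so the block runs on its own):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

features = pd.DataFrame({
    'Latitude': [30.229627, 30.378247, 30.253432, 30.364037, 30.355697,
                 30.280947, 30.165849, 30.215357, 30.510722, 30.383682],
    'Longitude': [-97.730117, -97.687672, -97.713051, -97.694928, -97.733815,
                  -97.807159, -97.792386, -97.744754, -97.694627, -97.700791],
    'Likes': [382, 14, 0, 0, 64, 0, 26, 9, 24, 0],
})
X = StandardScaler().fit_transform(features)

inertias, silhouettes = {}, {}
for k in range(2, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)
    print(f'k={k}: inertia={model.inertia_:.2f}, silhouette={silhouettes[k]:.3f}')
```

A reasonable k is one where inertia stops dropping sharply (the "elbow") and the silhouette score is still relatively high. With only a dozen or so venues per city, small values of k are the only ones that leave more than a couple of venues in each cluster.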

In [279]:
# Add Cluster Label back into DF

atx_venues_df.insert(0, 'Cluster Label', kmeans.labels_)
atx_venues_df.head()

|   | Cluster Label | Name | Latitude | Longitude | Categories | id | Likes |
|---|---------------|------|----------|-----------|------------|----|-------|
| 0 | 1 | 888 Vietnamese Restaurant | 30.229627 | -97.730117 | Vietnamese Restaurant | 4a9ebac6f964a520f63a20e3 | 382 |
| 1 | 2 | Duy Vietnamese Restaurant | 30.378247 | -97.687672 | Vietnamese Restaurant | 4e9cdd740aaf015a05c3dc1b | 14 |
| 2 | 2 | T&L Vietnamese Cuisine | 30.253432 | -97.713051 | Vietnamese Restaurant | 56f02681498eab944dae8438 | 0 |
| 3 | 2 | Pho Dinh Vietnamese Cuisine | 30.364037 | -97.694928 | Vietnamese Restaurant | 5d60379ce4337c00085857ba | 0 |
| 4 | 2 | PhoNatic Vietnamese Cuisine | 30.355697 | -97.733815 | Vietnamese Restaurant | 4e95f5244901d620b77d4385 | 64 |
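To make the clusters comparable with each other (and, later, across cities), a per-cluster summary of venue count and likes is a natural next step. A sketch using `groupby` with named aggregation on the five labelled rows shown above; the summary columns chosen here are illustrative:

```python
import pandas as pd

labelled = pd.DataFrame({
    'Cluster Label': [1, 2, 2, 2, 2],
    'Name': ['888 Vietnamese Restaurant', 'Duy Vietnamese Restaurant',
             'T&L Vietnamese Cuisine', 'Pho Dinh Vietnamese Cuisine',
             'PhoNatic Vietnamese Cuisine'],
    'Likes': [382, 14, 0, 0, 64],
})

# One row per cluster: how many venues it holds and how liked they are
summary = (
    labelled.groupby('Cluster Label')
    .agg(Venues=('Name', 'count'), TotalLikes=('Likes', 'sum'), MeanLikes=('Likes', 'mean'))
    .sort_values('TotalLikes', ascending=False)
)
print(summary)
```

On the notebook's own frame, `Likes` starts out as an empty-string column, so it may need `pd.to_numeric(atx_venues_df['Likes'])` before the mean can be computed.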


In [280]:
# Map Austin businesses

# Matplotlib and associated plotting modules
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

# Austin coordinates
atx_coord = [atx_lat, atx_lng]

# create map of Austin using latitude and longitude values
map_atx = folium.Map(location=atx_coord, zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to map
for lat, lng, name, likes, cluster in zip(atx_venues_df['Latitude'], atx_venues_df['Longitude'], atx_venues_df['Name'], atx_venues_df['Likes'], atx_venues_df['Cluster Label']):
    label = '{}: {}'.format(name, likes)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run 0..kclusters-1, matching the palette indices
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7,
        parse_html=False).add_to(map_atx)  
    
map_atx
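In the color-scheme cell, the `ys` list is only used for its length, which always equals `kclusters`. The palette can therefore be built directly by sampling one color per cluster from the same `rainbow` colormap and converting each to the hex string folium expects. A minimal sketch:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 3

# One evenly spaced RGBA color per cluster, converted to '#rrggbb' strings for folium
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

for label in range(kclusters):
    print(label, rainbow[label])
```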

---


## Results

---

## Discussion

---

## Conclusion