**Introduction/Business Problem**

Determine, in a quantifiable way, the best location to open a brewery in the city of Seattle. As an aspiring brewer, I believe the best place to open a brewery is **A)** a location that is part of a tight cluster of other breweries and **B)** among those clusters, the one with the most food/dining options in close proximity. The goal is to use a data-driven approach to identify the geographic clusters, rank them by the proximity and number of nearby food/dining options, and present the output as an easily consumable map.

**Data**

The data for this exercise is collected with the Foursquare API. Specifically, we use the **search** endpoint to first find all breweries within the Seattle area. We then cluster these breweries by geography. Finally, we use the centerpoints of these clusters as inputs to the **search** endpoint once more, to identify the cluster with the greatest number of food/dining options in close proximity.
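The final ranking step is not yet implemented in this notebook. A minimal sketch of one way to do it, assuming the food/dining venues have already been fetched as (lat, lng) pairs: count the venues within a fixed radius of each cluster centroid using the haversine (great-circle) distance, then sort clusters by that count. The 0.5 km radius, the function names, and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_clusters(centroids, food_venues, radius_km=0.5):
    """Rank clusters by how many food venues lie within radius_km of their centroid.

    centroids:   dict mapping cluster label -> (lat, lng)   (hypothetical input shape)
    food_venues: list of (lat, lng) tuples                 (hypothetical input shape)
    Returns a list of (label, count) pairs, highest count first.
    """
    counts = {}
    for label, (clat, clng) in centroids.items():
        counts[label] = sum(
            1 for flat, flng in food_venues
            if haversine_km(clat, clng, flat, flng) <= radius_km
        )
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example: two cluster centroids and three food venues.
centroids = {0: (47.6490, -122.3445), 2: (47.6652, -122.3739)}
food = [(47.6495, -122.3450), (47.6485, -122.3440), (47.6650, -122.3735)]
ranking = rank_clusters(centroids, food)
print(ranking)  # [(0, 2), (2, 1)]
```

A fixed radius keeps the score easy to explain; weighting each venue by its distance would be one way to fold "proximity" and "magnitude" into a single number.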

**Methodology** 

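This section walks through the analysis in the order it was performed. Breweries are retrieved from the Foursquare **search** endpoint over two bounding boxes that together cover Seattle, combined into a single dataframe, and plotted on a folium map as a first exploratory check. Their coordinates are then clustered with DBSCAN, a density-based algorithm that suits this problem because it does not need the number of clusters in advance and labels isolated breweries as noise (cluster -1), which matches the goal of finding tight clusters of breweries.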

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas.io.json.json_normalize is deprecated)
from sklearn.cluster import DBSCAN
import matplotlib.cm as cm
import matplotlib.colors as colors

In [2]:
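import os

# set parameters for the Foursquare API call
# credentials are read from environment variables (names chosen for this notebook) instead of being hard-coded
client_id = os.environ.get('FOURSQUARE_CLIENT_ID') # your Foursquare ID
client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET') # your Foursquare Secret
version = '20180605' # Foursquare API version

limit = 50 # maximum number of venues returned per search call
intent = 'browse' # search a whole region rather than around a single point
# Seattle is split into a northern and a southern bounding box (ne = north-east corner, sw = south-west corner)
# so that each request stays within the 50-venue limit
ne1 = '47.687772,-122.251590'
sw1 = '47.6,-122.430775'

ne2 = '47.599999,-122.251590'
sw2 = '47.531678,-122.430775'

brewery = '50327c8591d4c4b30a586d5d' # Foursquare category ID for breweries
food = '4d4b7105d754a06374d81259' # Foursquare category ID for food venues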
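The two bounding boxes split the city by hand at latitude 47.6 because a single **search** call returns at most 50 venues. A minimal sketch that generalizes this into any number of stacked latitude bands, assuming equal-height bands are acceptable; `split_bbox` is a hypothetical helper, and its bands will not reproduce the hand-picked 47.6 boundary exactly.

```python
def split_bbox(sw, ne, n_bands):
    """Split a (sw, ne) bounding box into n_bands stacked latitude bands.

    sw and ne are (lat, lng) tuples. Returns a list of (sw, ne) pairs from south
    to north, formatted as the 'lat,lng' strings used in the search URLs.
    """
    (south, west), (north, east) = sw, ne
    step = (north - south) / n_bands
    bands = []
    for i in range(n_bands):
        band_south = south + i * step
        # use the exact outer edge for the last band to avoid rounding drift
        band_north = north if i == n_bands - 1 else south + (i + 1) * step
        bands.append(('%f,%f' % (band_south, west), '%f,%f' % (band_north, east)))
    return bands


# Same overall extent as sw2 (south-west corner) and ne1 (north-east corner).
for sw_str, ne_str in split_bbox((47.531678, -122.430775), (47.687772, -122.251590), 3):
    print('sw=%s  ne=%s' % (sw_str, ne_str))
```

Adjacent bands share an edge, so a venue sitting exactly on a boundary could be returned twice; dropping duplicates after concatenation (for example with `df.drop_duplicates()`) guards against that.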

In [3]:
address = 'Seattle, WA'

geolocator = Nominatim(user_agent="sea_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Seattle are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Seattle are 47.6038321, -122.3300624.


In [4]:
url1 = 'https://api.foursquare.com/v2/venues/search?client_id=%s&client_secret=%s&v=%s&intent=%s&ne=%s&sw=%s&categoryId=%s&limit=%s' % (client_id, client_secret, version, intent, ne1, sw1, brewery, limit)
url2 = 'https://api.foursquare.com/v2/venues/search?client_id=%s&client_secret=%s&v=%s&intent=%s&ne=%s&sw=%s&categoryId=%s&limit=%s' % (client_id, client_secret, version, intent, ne2, sw2, brewery, limit)
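Formatting the query string with `%` works here, but it duplicates the template and leaves the percent-encoding of values such as the comma-separated `ne` and `sw` corners to `requests`. A minimal alternative sketch using the standard library's `urllib.parse.urlencode`, which encodes every value explicitly; `build_search_url` is a hypothetical helper and the credential values are placeholders.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, version, intent, ne, sw, category_id, limit):
    """Return a Foursquare v2 venues/search URL with every parameter percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'intent': intent,
        'ne': ne,
        'sw': sw,
        'categoryId': category_id,
        'limit': limit,
    }
    return SEARCH_URL + '?' + urlencode(params)


# Placeholder credentials; the other values mirror the first bounding box.
url = build_search_url('MY_ID', 'MY_SECRET', '20180605', 'browse',
                       '47.687772,-122.251590', '47.6,-122.430775',
                       '50327c8591d4c4b30a586d5d', 50)
print(url)
```

`requests.get` can also take the same dictionary through its `params=` argument, which avoids building the URL string at all.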

In [5]:
results1 = requests.get(url1).json()
results2 = requests.get(url2).json()
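Both calls assume the request succeeded. Foursquare v2 responses wrap the payload in a `meta` object whose `code` field carries an HTTP-style status, so a failed call (for example, bad credentials) has no `venues` to index. A minimal sketch of checking that envelope first, assuming the `{'meta': ..., 'response': ...}` shape; the error fields are read with `.get` in case they are absent, and `venues_or_raise` and the sample payloads are hypothetical.

```python
def venues_or_raise(payload):
    """Return the venue list from a parsed Foursquare v2 search response.

    Assumes the v2 envelope {'meta': {'code': ...}, 'response': {'venues': [...]}}
    and raises RuntimeError when meta.code is not 200.
    """
    meta = payload.get('meta', {})
    code = meta.get('code')
    if code != 200:
        raise RuntimeError('Foursquare error %s: %s %s' % (
            code, meta.get('errorType', ''), meta.get('errorDetail', '')))
    return payload['response'].get('venues', [])


# Hypothetical payloads illustrating both paths.
ok = {'meta': {'code': 200}, 'response': {'venues': [{'name': 'Fremont Brewing Company'}]}}
bad = {'meta': {'code': 400, 'errorType': 'param_error', 'errorDetail': 'example error detail'}}
print(len(venues_or_raise(ok)))  # 1
try:
    venues_or_raise(bad)
except RuntimeError as err:
    print(err)
```

In the notebook this would replace `results1['response']['venues']` with `venues_or_raise(results1)`; calling `raise_for_status()` on the `requests` response before `.json()` is a complementary check at the HTTP level.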

In [6]:
results1_df = json_normalize(results1['response']['venues'])
results1_df = results1_df[['name', 'location.lat', 'location.lng']]

results2_df = json_normalize(results2['response']['venues'])
results2_df = results2_df[['name', 'location.lat', 'location.lng']]
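`json_normalize` flattens nested keys into dotted column names, which is where `location.lat` and `location.lng` come from. A standard-library sketch of the same extraction, assuming each venue carries a nested `location` object; `flatten_venues` and the sample records are hypothetical, and unlike the dataframe version it skips venues that lack coordinates instead of producing NaN.

```python
def flatten_venues(venues):
    """Extract {'name', 'location.lat', 'location.lng'} rows from Foursquare venue dicts."""
    rows = []
    for venue in venues:
        location = venue.get('location', {})
        # keep only venues that actually have coordinates
        if 'lat' in location and 'lng' in location:
            rows.append({
                'name': venue.get('name'),
                'location.lat': location['lat'],
                'location.lng': location['lng'],
            })
    return rows


# Hypothetical venue records shaped like the search response.
sample = [
    {'name': 'Stoup Brewing', 'location': {'lat': 47.666551, 'lng': -122.371277, 'city': 'Seattle'}},
    {'name': 'No coordinates', 'location': {'city': 'Seattle'}},
]
print(flatten_venues(sample))
```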

In [7]:
df = pd.concat([results1_df, results2_df])
df=df.reset_index(drop=True)
df

|   | name | location.lat | location.lng |
|---|------|--------------|--------------|
| 0 | Fremont Brewing Company | 47.648974 | -122.344491 |
| 1 | Old Stove Brewing Co - Marketfront | 47.609591 | -122.343041 |
| 2 | Optimism Brewing Company | 47.612816 | -122.320571 |
| 3 | Redhook Brewlab | 47.614222 | -122.322761 |
| 4 | Pike Brewing Company | 47.608161 | -122.339923 |
| 5 | Stoup Brewing | 47.666551 | -122.371277 |
| 6 | Reuben's Brews | 47.665398 | -122.373270 |
| 7 | Cloudburst Brewing | 47.611565 | -122.345212 |
| 8 | Peddler Brewing Company | 47.663757 | -122.377057 |
| 9 | Holy Mountain Brewing Company | 47.630795 | -122.374597 |


In [8]:
# create map of Seattle using latitude and longitude values
map_seattle = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, name in zip(df['location.lat'], df['location.lng'], df['name']):
    # popup showing the brewery name
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_seattle)
    
map_seattle

In [9]:
X = df[['location.lat', 'location.lng']]

In [10]:
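epsilon = 0.01 # neighbourhood radius, in degrees of latitude/longitude (X is not projected)
minimumSamples = 4 # minimum number of breweries, including the point itself, for a core point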
db = DBSCAN(eps=epsilon, min_samples=minimumSamples).fit(X)
labels = db.labels_
labels

array([ 0, -1,  1,  1, -1,  2,  2, -1,  2, -1,  2,  2,  1, -1,  2, -1,  2,
        2,  2,  2, -1, -1,  2,  2, -1, -1,  2,  2,  2,  2, -1,  0, -1,  1,
        0,  2,  0,  2,  2,  3,  3,  3, -1, -1, -1,  5, -1, -1,  4,  4,  4,
        5, -1,  4, -1,  5, -1,  5, -1,  4,  4,  4, -1,  5,  3])
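Because `X` holds raw latitude/longitude degrees, `eps=0.01` is an angle, and DBSCAN's default Euclidean metric treats a degree of longitude like a degree of latitude. A short worked conversion at Seattle's latitude (from the geocoding step), assuming a spherical Earth with mean radius 6371 km, shows the neighbourhood reaches about 1.1 km north-south but only about 0.75 km east-west. The 1 km target radius at the end is a hypothetical value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth approximation
SEATTLE_LAT = 47.6038321  # from the Nominatim geocode
EPS_DEG = 0.01            # the DBSCAN eps used above

km_per_deg_lat = 2 * math.pi * EARTH_RADIUS_KM / 360
km_per_deg_lng = km_per_deg_lat * math.cos(math.radians(SEATTLE_LAT))

eps_ns_km = EPS_DEG * km_per_deg_lat  # north-south reach of eps
eps_ew_km = EPS_DEG * km_per_deg_lng  # east-west reach of eps
print('eps reaches %.3f km N-S and %.3f km E-W' % (eps_ns_km, eps_ew_km))

# For an isotropic radius, scikit-learn's DBSCAN accepts metric='haversine'
# (with algorithm='ball_tree'), which expects [lat, lng] in radians and an eps
# in radians, i.e. a distance in km divided by the Earth's radius.
target_km = 1.0  # hypothetical cluster radius
eps_rad = target_km / EARTH_RADIUS_KM
print('eps for %.1f km with the haversine metric: %.6f rad' % (target_km, eps_rad))
```

With that metric the fit would take `np.radians(X)` rather than `X`, so the cluster labels could differ from the ones printed above.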

In [11]:
df.insert(0, 'Cluster Labels', labels)

In [12]:
df

|   | Cluster Labels | name | location.lat | location.lng |
|---|----------------|------|--------------|--------------|
| 0 | 0 | Fremont Brewing Company | 47.648974 | -122.344491 |
| 1 | -1 | Old Stove Brewing Co - Marketfront | 47.609591 | -122.343041 |
| 2 | 1 | Optimism Brewing Company | 47.612816 | -122.320571 |
| 3 | 1 | Redhook Brewlab | 47.614222 | -122.322761 |
| 4 | -1 | Pike Brewing Company | 47.608161 | -122.339923 |
| 5 | 2 | Stoup Brewing | 47.666551 | -122.371277 |
| 6 | 2 | Reuben's Brews | 47.665398 | -122.373270 |
| 7 | -1 | Cloudburst Brewing | 47.611565 | -122.345212 |
| 8 | 2 | Peddler Brewing Company | 47.663757 | -122.377057 |
| 9 | -1 | Holy Mountain Brewing Company | 47.630795 | -122.374597 |

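The Data section calls for the centerpoint of each cluster. A minimal sketch, assuming the arithmetic mean of member coordinates is an acceptable centerpoint at city scale and that noise points (label -1) are excluded; `cluster_centroids` is a hypothetical helper fed here with only the ten rows listed above, so its centroids are partial. With pandas, the same means should come from `df[df['Cluster Labels'] != -1].groupby('Cluster Labels')[['location.lat', 'location.lng']].mean()`.

```python
from collections import defaultdict


def cluster_centroids(rows):
    """Mean (lat, lng) and size per cluster label, ignoring DBSCAN noise (label -1).

    rows: iterable of (label, lat, lng) tuples.
    Returns {label: (mean_lat, mean_lng, size)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running [sum_lat, sum_lng, count]
    for label, lat, lng in rows:
        if label == -1:
            continue
        acc = sums[label]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {label: (s_lat / n, s_lng / n, n) for label, (s_lat, s_lng, n) in sums.items()}


# Rows taken from the first ten breweries listed above.
rows = [
    (0, 47.648974, -122.344491), (-1, 47.609591, -122.343041),
    (1, 47.612816, -122.320571), (1, 47.614222, -122.322761),
    (-1, 47.608161, -122.339923), (2, 47.666551, -122.371277),
    (2, 47.665398, -122.373270), (-1, 47.611565, -122.345212),
    (2, 47.663757, -122.377057), (-1, 47.630795, -122.374597),
]
for label, (lat, lng, size) in sorted(cluster_centroids(rows).items()):
    print(label, round(lat, 6), round(lng, 6), size)
```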

In [13]:
clusters = df['Cluster Labels'].unique()
clusters

array([ 0, -1,  1,  2,  3,  5,  4])
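The label -1 in `clusters` is DBSCAN's noise label: a point that is neither a core point (at least `min_samples` points, itself included, within `eps`) nor within `eps` of a core point. A compact pure-Python sketch of the algorithm, for illustration only: it runs in quadratic time with Euclidean distance on whatever coordinates it is given, and `dbscan` here is a hypothetical stand-in for the scikit-learn implementation the notebook actually uses.

```python
import math


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # indices within eps of point i, including i itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels


points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5),  # tight group
          (10.0, 10.0), (10.0, 10.4), (10.4, 10.0),         # second group
          (5.0, 5.0)]                                       # isolated point
print(dbscan(points, eps=0.8, min_samples=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The isolated point stays at -1, just as the isolated breweries do in the table above.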

In [14]:
# create map of Seattle using latitude and longitude values
cluster_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label,
# with DBSCAN noise points (label -1) drawn in gray
cluster_ids = sorted(c for c in clusters if c != -1)
colors_array = cm.rainbow(np.linspace(0, 1, len(cluster_ids)))
rainbow = {c: colors.rgb2hex(rgba) for c, rgba in zip(cluster_ids, colors_array)}
rainbow[-1] = '#808080'

# add markers to map, colored by cluster
for lat, lng, name, cluster in zip(df['location.lat'], df['location.lng'], df['name'], df['Cluster Labels']):
    label = folium.Popup(str(name) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(cluster_map)

cluster_map

**Results**

Section where you discuss the results.

**Discussion**

Section where you discuss any observations you noted and any recommendations you can make based on the results.

**Conclusion**

Section where you conclude the report.