**Introduction/Business Problem**

Determine, in a quantifiable manner, the best location to open a brewery in the city of Seattle. As an aspiring brewer, I believe the best place to open a brewery is **A)** a location that is part of a tight cluster of other breweries, and **B)** among those clusters, the one with the most food/dining options in close proximity. The goal is to use a data-driven approach to identify the relevant geographic clusters, rank them by the proximity and number of nearby food/dining options, and present the output in an easily consumable map.

**Data**

The data for this exercise is collected through the Foursquare API. Specifically, we use the **search** endpoint to first find all breweries within the Seattle area. We then cluster these breweries by geography. Finally, we use the center points of these clusters to query the **search** endpoint again and identify the cluster with the greatest number of food/dining options in close proximity.
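Both searches described above go to the same endpoint and differ only in how the area is specified: a bounding box (`ne`/`sw` corners) for the brewery search, or a center point plus radius (`ll`/`radius`) for the food search. The sketch below assembles such a URL with the standard library. `build_search_url` is a hypothetical helper; the parameter names are the ones the notebook's own URLs use.

```python
from urllib.parse import urlencode

# Foursquare v2 venue search endpoint, as used in the notebook
SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, category_id, ne=None, sw=None,
                     ll=None, radius=None, limit=50, version='20180605'):
    """Hypothetical helper: build a browse-intent search URL for either a
    bounding box (ne/sw as (lat, lng)) or a circle (ll as (lat, lng), radius in meters)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'intent': 'browse',
        'categoryId': category_id,
        'limit': limit,
    }
    if ne is not None and sw is not None:
        params['ne'] = '{},{}'.format(*ne)
        params['sw'] = '{},{}'.format(*sw)
    elif ll is not None and radius is not None:
        params['ll'] = '{},{}'.format(*ll)
        params['radius'] = radius
    else:
        raise ValueError('pass either ne and sw, or ll and radius')
    # urlencode percent-encodes the values (e.g. ',' becomes %2C)
    return SEARCH_URL + '?' + urlencode(params)


# Brewery search over the northern box, with placeholder credentials
print(build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '50327c8591d4c4b30a586d5d',
                       ne=(47.687772, -122.251590), sw=(47.6, -122.430775)))
```

Passing a `params` dictionary to `requests.get` produces the same kind of encoded query string without building it by hand.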

**Methodology** 

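The analysis runs in four steps. First, brewery venues (Foursquare category `50327c8591d4c4b30a586d5d`) are retrieved with two **search** calls, one over a bounding box covering the northern part of the city and one over the southern part. Second, the brewery coordinates are clustered with DBSCAN (`eps=0.01` degrees, `min_samples=4`); DBSCAN does not need the number of clusters in advance and labels isolated breweries as noise (`-1`). Third, each cluster's central point is computed as the mean latitude and longitude of its members. Finally, the **search** endpoint is queried around each central point for food venues (category `4d4b7105d754a06374d81259`) within a 100 meter radius, and the clusters are ranked by the number of venues returned. No inferential statistical testing was performed.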

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)
from sklearn.cluster import DBSCAN
import matplotlib.cm as cm
import matplotlib.colors as colors

In [2]:
# set parameters for the API call; the credentials are placeholders, substitute your own
client_id = 'YOUR_CLIENT_ID' # your Foursquare ID
client_secret = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
version = '20180605' # Foursquare API version

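limit = '50' # maximum number of venues returned per search call
intent = 'browse' # search the whole area instead of only venues closest to a point

# the city is split into a northern and a southern box at latitude 47.6;
# each call returns at most `limit` venues
ne1 = '47.687772,-122.251590' # north-east corner, northern box
sw1 = '47.6,-122.430775' # south-west corner, northern box

ne2 = '47.599999,-122.251590' # north-east corner, southern box
sw2 = '47.531678,-122.430775' # south-west corner, southern box

brewery = '50327c8591d4c4b30a586d5d' # Foursquare category ID: Brewery
food = '4d4b7105d754a06374d81259' # Foursquare category ID: Food (top-level)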
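The cell above covers the city with two boxes split at latitude 47.6. Because a single search call returns at most `limit` venues, a box that hits that cap would silently miss breweries, and one remedy is to cut the area into more latitude bands. `split_bbox` below is a hypothetical helper sketching that idea; it only computes corner coordinates and makes no API calls.

```python
def split_bbox(ne, sw, n_bands):
    """Hypothetical helper: split the box with north-east corner `ne` and
    south-west corner `sw` (each a (lat, lng) tuple) into `n_bands`
    horizontal latitude bands, returned north to south as (ne, sw) pairs."""
    north, east = ne
    south, west = sw
    step = (north - south) / n_bands
    bands = []
    for i in range(n_bands):
        band_north = north - i * step
        band_south = north - (i + 1) * step  # equals the next band's northern edge
        bands.append(((band_north, east), (band_south, west)))
    return bands


# Whole-city box: northern edge of ne1, southern edge of sw2
bands = split_bbox((47.687772, -122.251590), (47.531678, -122.430775), 4)
for band_ne, band_sw in bands:
    print('ne={:.6f},{:.6f}  sw={:.6f},{:.6f}'.format(*band_ne, *band_sw))
```

Each `(ne, sw)` pair can be formatted as `'lat,lng'` strings and searched separately, then combined as in the `pd.concat` cell. Adjacent bands share an edge, so a venue exactly on a boundary could be returned twice; dropping duplicate rows after concatenating handles that.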

In [3]:
address = 'Seattle, WA'

geolocator = Nominatim(user_agent="sea_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Seattle are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Seattle are 47.6038321, -122.3300624.


In [4]:
# Foursquare v2 venue search endpoint; requests URL-encodes the query parameters
search_url = 'https://api.foursquare.com/v2/venues/search'
base_params = {'client_id': client_id, 'client_secret': client_secret, 'v': version,
               'intent': intent, 'categoryId': brewery, 'limit': limit}

In [5]:
# one brewery search per bounding box
results1 = requests.get(search_url, params={**base_params, 'ne': ne1, 'sw': sw1}).json()
results2 = requests.get(search_url, params={**base_params, 'ne': ne2, 'sw': sw2}).json()

In [6]:
results1_df = json_normalize(results1['response']['venues'])
results1_df = results1_df[['name', 'location.lat', 'location.lng']]

results2_df = json_normalize(results2['response']['venues'])
results2_df = results2_df[['name', 'location.lat', 'location.lng']]

In [7]:
df = pd.concat([results1_df, results2_df])
df=df.reset_index(drop=True)
df

|   | name | location.lat | location.lng |
|---|---|---|---|
| 0 | Old Stove Brewing Co - Marketfront | 47.609591 | -122.343041 |
| 1 | Fremont Brewing Company | 47.648974 | -122.344491 |
| 2 | Optimism Brewing Company | 47.612816 | -122.320571 |
| 3 | Redhook Brewlab | 47.614222 | -122.322761 |
| 4 | Pike Brewing Company | 47.608161 | -122.339923 |
| 5 | Stoup Brewing | 47.666551 | -122.371277 |
| 6 | Reuben's Brews | 47.665398 | -122.373270 |
| 7 | Cloudburst Brewing | 47.611565 | -122.345212 |
| 8 | Peddler Brewing Company | 47.663757 | -122.377057 |
| 9 | Holy Mountain Brewing Company | 47.630795 | -122.374597 |

*(output truncated to the first 10 rows)*


In [8]:
# create map of Seattle using latitude and longitude values
map_seattle = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a marker for each brewery, with its name as the popup
for lat, lng, name in zip(df['location.lat'], df['location.lng'], df['name']):
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_seattle)

map_seattle

In [9]:
X = df[['location.lat', 'location.lng']]

In [10]:
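# eps is in degrees because X holds raw latitude/longitude values
epsilon = 0.01 # roughly 1.1 km north-south and 0.75 km east-west at Seattle's latitude
minimumSamples = 4 # breweries (including the point itself) needed within eps for a core point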
db = DBSCAN(eps=epsilon, min_samples=minimumSamples).fit(X)
labels = db.labels_
labels

array([-1,  0,  1,  1, -1,  2,  2, -1,  2, -1,  2,  2,  1, -1,  2, -1,  2,
        2,  2,  2, -1, -1,  2,  2, -1, -1,  2,  2,  2,  2,  0, -1, -1,  2,
        0,  1,  0,  2,  2,  3,  3,  3, -1, -1, -1, -1,  4, -1,  5,  4,  4,
        5, -1,  4, -1, -1,  5,  5,  4, -1,  4,  4, -1,  5,  3])
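Because `X` holds raw latitude and longitude values, `eps=0.01` is measured in degrees, and away from the equator a degree of longitude is shorter than a degree of latitude. The sketch below converts the setting to meters on a spherical Earth (radius 6,371 km, an assumption of the sketch), giving roughly 1.1 km north-south but only about 0.75 km east-west at Seattle's latitude. An alternative, not what the notebook ran, is scikit-learn's DBSCAN with `metric='haversine'` on coordinates converted to radians, where `eps` is a distance divided by the Earth's radius.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (spherical approximation)


def degrees_to_meters(eps_deg, latitude_deg):
    """Approximate north-south and east-west length, in meters, of an
    offset of eps_deg degrees at the given latitude."""
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180  # about 111.2 km
    north_south = eps_deg * meters_per_degree
    # meridians converge toward the poles, so a degree of longitude shrinks by cos(latitude)
    east_west = eps_deg * meters_per_degree * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = degrees_to_meters(0.01, 47.6)  # the notebook's eps at roughly Seattle's latitude
print('eps = 0.01 degrees is about {:.0f} m north-south and {:.0f} m east-west'.format(ns, ew))
```

The neighborhoods DBSCAN sees are therefore ellipses stretched north-south rather than circles, which can matter for clusters laid out along an east-west street.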

In [11]:
df.insert(0, 'Cluster Labels', labels)

In [12]:
df

|   | Cluster Labels | name | location.lat | location.lng |
|---|---|---|---|---|
| 0 | -1 | Old Stove Brewing Co - Marketfront | 47.609591 | -122.343041 |
| 1 | 0 | Fremont Brewing Company | 47.648974 | -122.344491 |
| 2 | 1 | Optimism Brewing Company | 47.612816 | -122.320571 |
| 3 | 1 | Redhook Brewlab | 47.614222 | -122.322761 |
| 4 | -1 | Pike Brewing Company | 47.608161 | -122.339923 |
| 5 | 2 | Stoup Brewing | 47.666551 | -122.371277 |
| 6 | 2 | Reuben's Brews | 47.665398 | -122.373270 |
| 7 | -1 | Cloudburst Brewing | 47.611565 | -122.345212 |
| 8 | 2 | Peddler Brewing Company | 47.663757 | -122.377057 |
| 9 | -1 | Holy Mountain Brewing Company | 47.630795 | -122.374597 |

*(output truncated to the first 10 rows)*


In [13]:
clusters = df['Cluster Labels'].unique()
clusters

array([-1,  0,  1,  2,  3,  4,  5])
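The unique labels show six clusters plus noise (`-1`), but not how many breweries each cluster holds, which is what decides the most pronounced cluster. A small sketch with `collections.Counter`; `cluster_sizes` is a hypothetical helper and the label list is illustrative, not the notebook's output. In the notebook, `labels` or `df['Cluster Labels']` would be passed instead.

```python
from collections import Counter


def cluster_sizes(labels):
    """Count the members of each DBSCAN cluster, ignoring noise (-1),
    and return (label, size) pairs from largest to smallest."""
    counts = Counter(int(label) for label in labels if label != -1)
    return counts.most_common()


# Illustrative labels only
example_labels = [-1, 0, 1, 1, -1, 2, 2, -1, 2, 0]
print(cluster_sizes(example_labels))
```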

In [14]:
# create map of Seattle using latitude and longitude values
cluster_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# set the marker color for each cluster; -1 is DBSCAN noise
cluster_colors = {-1: 'grey', 0: 'yellow', 1: 'green', 2: 'blue', 3: 'red', 4: 'pink', 5: 'orange'}

def label_color(label):
    # return a plain color string (not a tuple); unexpected labels are drawn in black
    return cluster_colors.get(label, 'black')

# add markers to map
for lat, lng, name, cluster in zip(df['location.lat'], df['location.lng'], df['name'], df['Cluster Labels']):
    label = folium.Popup(str(name) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=label_color(cluster),
        fill=True,
        fill_color=label_color(cluster),
        fill_opacity=0.7).add_to(cluster_map)

cluster_map

In [15]:
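# average only the coordinate columns; the text column 'name' cannot be averaged
cluster_df = df.groupby('Cluster Labels')[['location.lat', 'location.lng']].mean()
cluster_df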

| Cluster Labels | location.lat | location.lng |
|---|---|---|
| -1 | 47.608288 | -122.338148 |
| 0 | 47.65001 | -122.344238 |
| 1 | 47.614793 | -122.318963 |
| 2 | 47.662776 | -122.374754 |
| 3 | 47.592219 | -122.332647 |
| 4 | 47.550587 | -122.321091 |
| 5 | 47.568315 | -122.336133 |
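The `groupby` also produces a row for label `-1`, but that row is the average of all noise points scattered across the city, not a cluster center, which is why the food search loop only visits clusters 0 to 5. The plain-Python sketch below computes the same per-cluster means while skipping noise. `cluster_centers` is a hypothetical helper, and the sample points are Optimism and Redhook (cluster 1) and Old Stove (noise) from the labeled table. Averaging latitude and longitude directly is a reasonable approximation at city scale.

```python
from collections import defaultdict


def cluster_centers(points, labels):
    """Mean (lat, lng) of each cluster, skipping DBSCAN noise (-1).
    `points` is a list of (lat, lng) tuples aligned with `labels`."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [lat sum, lng sum, count]
    for (lat, lng), label in zip(points, labels):
        if label == -1:
            continue  # noise points belong to no cluster
        acc = sums[label]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {label: (lat_sum / n, lng_sum / n) for label, (lat_sum, lng_sum, n) in sums.items()}


example_points = [(47.612816, -122.320571), (47.614222, -122.322761), (47.609591, -122.343041)]
example_labels = [1, 1, -1]
print(cluster_centers(example_points, example_labels))
```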


In [16]:
radius = '100' # search radius in meters around each cluster's central point

In [17]:
# count food venues within `radius` meters of each cluster's central point
# (clusters 0-5 only; the -1 row is DBSCAN noise, not a real cluster)
for x in range(0, 6):
    record = cluster_df.loc[x]
    lat = record['location.lat']
    lng = record['location.lng']
    url3 = 'https://api.foursquare.com/v2/venues/search?client_id=%s&client_secret=%s&v=%s&intent=%s&ll=%s,%s&radius=%s&categoryId=%s&limit=%s' % (client_id, client_secret, version, intent, lat, lng, radius, food, limit)
    results3 = requests.get(url3).json()
    print(x, len(results3['response']['venues']))

0 0
1 29
2 5
3 24
4 0
5 2


**Results**

Based on the analysis, there are only a handful of well-defined brewery clusters within the city of Seattle. They can be labeled as:
* Cluster 0: Fremont
* Cluster 1: Capitol Hill
* Cluster 2: Ballard
* Cluster 3: Downtown
* Cluster 4: SoDo
* Cluster 5: centered at (47.568315, -122.336133), between clusters 3 and 4

Ballard is clearly the most pronounced cluster in terms of the number of breweries. However, the stated criterion was to identify clusters and then choose the best one based on the proximity of food options. Searching the Foursquare API for food venues within 100 meters of each cluster's central point, Capitol Hill (cluster 1) is the clear winner with 29 venues, followed by Downtown (cluster 3) with 24.
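The ranking step reduces to sorting the clusters by the counts printed by the food search loop. The sketch below hard-codes those counts and the neighborhood names from the list above; cluster 5 has no neighborhood name there, so it is printed as unnamed.

```python
# Food venues within 100 m of each cluster center, copied from the loop output above
food_counts = {0: 0, 1: 29, 2: 5, 3: 24, 4: 0, 5: 2}
# Neighborhood names from the Results list; cluster 5 has no name there
cluster_names = {0: 'Fremont', 1: 'Capitol Hill', 2: 'Ballard', 3: 'Downtown', 4: 'SoDo'}

# Rank clusters from most to fewest nearby food venues
ranking = sorted(food_counts.items(), key=lambda item: item[1], reverse=True)
for cluster, count in ranking:
    print(cluster, cluster_names.get(cluster, 'unnamed'), count)

best_cluster = ranking[0][0]
print('Recommended cluster:', cluster_names[best_cluster])
```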

Thus I would recommend a location at or near the central point of the Capitol Hill cluster (47.614793, -122.318963) for my new brewery.

**Discussion**

While this finding is interesting, there is clearly room for improvement. Two specific items stand out.

DBSCAN was not fully optimized. Ideally, additional analysis would examine how the model parameters (`eps` and `min_samples`) affect the resulting clusters and identify the most suitable values.
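One way to start that tuning is a parameter sweep: run DBSCAN over a range of `eps` values and watch how the number of clusters and noise points changes. To keep the sketch runnable without scikit-learn, it uses a plain-Python DBSCAN (a stand-in for `sklearn.cluster.DBSCAN`, not a copy of it) with great-circle distance, so `eps` is in meters rather than degrees. `haversine_m`, `dbscan` and `sweep_eps` are hypothetical names, and the run below uses only the ten breweries shown in the table, with `min_samples=2` because that subset is small.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_samples):
    """Plain DBSCAN over (lat, lng) points with eps in meters.
    Returns one label per point: -1 for noise, 0, 1, ... for clusters.
    As in scikit-learn, a point counts itself toward min_samples."""
    n = len(points)
    neighbors = [[j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
                 for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster  # i is a core point that starts a new cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is also a core point: keep expanding
        cluster += 1
    return labels


def sweep_eps(points, eps_values, min_samples):
    """Return {eps: (number of clusters, number of noise points)}."""
    summary = {}
    for eps in eps_values:
        labels = dbscan(points, eps, min_samples)
        summary[eps] = (len(set(labels) - {-1}), labels.count(-1))
    return summary


# The ten breweries listed in the table (a subset of the full data set)
breweries = [
    (47.609591, -122.343041), (47.648974, -122.344491), (47.612816, -122.320571),
    (47.614222, -122.322761), (47.608161, -122.339923), (47.666551, -122.371277),
    (47.665398, -122.373270), (47.611565, -122.345212), (47.663757, -122.377057),
    (47.630795, -122.374597),
]
for eps, (n_clusters, n_noise) in sweep_eps(breweries, [250, 500, 1000, 2000], 2).items():
    print('eps={} m: {} clusters, {} noise points'.format(eps, n_clusters, n_noise))
```

The neighbor lists are computed by brute force in O(n²), which is fine for a few dozen venues. A stable range of `eps` over which the cluster count barely changes is one reasonable signal for picking a value.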

The proximity to food venues was not fully vetted either. The 100 meter radius produced the ranking above, but larger or smaller radii can change the results.
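Radius sensitivity can be checked by rerunning the food search loop with several `radius` values. The offline sketch below illustrates the effect on hypothetical venue coordinates around the Capitol Hill center: the count grows as the radius widens, so clusters whose food venues sit at different distances can trade places. `counts_by_radius` is a hypothetical helper; Foursquare applies the radius server-side, and each call also returns at most `limit` (50) venues, so very large radii would hit that cap.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def counts_by_radius(center, venues, radii):
    """For each radius in meters, count the venues within that distance of center."""
    distances = [haversine_m(center, venue) for venue in venues]
    return {r: sum(d <= r for d in distances) for r in radii}


# Capitol Hill cluster center from the Results; the venue coordinates are made up
center = (47.614793, -122.318963)
venues = [(47.6150, -122.3190), (47.6155, -122.3185), (47.6170, -122.3210), (47.6200, -122.3100)]
print(counts_by_radius(center, venues, [100, 250, 500, 1000]))
```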

**Conclusion**

By using DBSCAN to identify clusters of venues in a given category, and then assessing each cluster's attractiveness by how close its central point is to complementary venues (in this case, food), we can create an objective means of assessing potential business locations.

This basic framework could be further hardened and parameterized into a more user-friendly tool that searches for clusters of one venue category and assesses their favorability against another, e.g. "find clusters of bowling alleys near parks." Such a tool could become an interesting way to assess business location opportunities.