# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera 

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

Four years ago I opened an Instagram business account to use as a personal blog. Since then I have been documenting my visits to local breweries around Mexico (Chihuahua, Baja California, CDMX, Guadalajara) and the US (Texas, Washington State, Oregon).

I started the project in Mexico. Since moving to Washington State three years ago, I have continued it around the US, with Washington State as the headquarters of my blog activity.

I want to continue the blog in order to try different kinds of beer (every brewery has its own unique and seasonal beers), tell people about local breweries and craft beer around the world, turn the blog into a profitable business in the future, and travel around the world.


## Data <a name="data"></a>

Currently, Washington State has 430 breweries across its large territory, most of them in the Seattle area, according to the Washington State craft beer website (https://washingtonbeer.com/breweries/). The website also distributes brochures throughout Washington State so that tourists and craft beer enthusiasts can visit the breweries located in the state.

So far so good, but some of the information in the brochures is outdated. If you don’t know the breweries in Washington State, or that the website exists, you can waste a lot of time looking for breweries only to find that some have closed or moved.

To fix this problem, I’m going to use the Foursquare API to get up-to-date information about the breweries located in Washington State, and later in other locations, so that the information stays current and is easier to use for other enthusiasts. I will also need information about the cities in Washington State.

Let's start by importing the libraries needed for the project.

In [1]:
import pandas as pd #library for data analysis
import requests #library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas 1.0+ import path)
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library

The necessary data was obtained from the Washington Geospatial Open Data Portal (https://geo.wa.gov/) using the requests Python library.


In [2]:
url = 'https://services2.arcgis.com/J4VMdGWiZXReffvo/arcgis/rest/services/CityLimits/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json'
results = requests.get(url).json()
#results
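
Before indexing into the response, it can help to confirm that the request actually returned city records. The sketch below is a hypothetical helper (`extract_features` is not part of the original notebook). It assumes that the service reports failures as an `error` object in the JSON body and that a missing or empty `features` list means something went wrong.

```python
def extract_features(payload):
    """Return the list of feature records from a decoded JSON response.

    Assumption: failures are reported as an 'error' object in the body,
    and a usable response carries a non-empty 'features' list.
    """
    if 'error' in payload:
        raise ValueError('Service returned an error: {}'.format(payload['error']))
    features = payload.get('features', [])
    if not features:
        raise ValueError('Response contains no features')
    return features


# Illustrative payload shaped like the response used in this notebook
sample = {'features': [{'attributes': {'CITY_NM': 'Everett', 'COUNTY_NM': 'Snohomish'}}]}
sample_features = extract_features(sample)
print(len(sample_features))
```

In the notebook this could be called as `city_data = extract_features(results)`.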

All the relevant data is in the features key, which is basically a list of the Washington State cities. So, let's define the city_data variable that holds this list and transform this data of nested Python dictionaries into a pandas dataframe, starting with an empty dataframe.

In [3]:
city_data = results['features']
column_names = ['City Name','County', 'Latitude', 'Longitude']
cities = pd.DataFrame(columns = column_names)

In [4]:
cities

| | City Name | County | Latitude | Longitude |
|---|---|---|---|---|


Then let's loop through the data, collect one row per city, and build the dataframe from those rows.

In [5]:
rows = []
for data in city_data:
    attrs = data['attributes']
    rows.append({'City Name': attrs['CITY_NM'],
                 'County': attrs['COUNTY_NM'],
                 'Latitude': attrs['StatePla_1'],
                 'Longitude': attrs['StatePlane']})

# DataFrame.append was removed in pandas 2.0, so build the dataframe from the list of rows
cities = pd.DataFrame(rows, columns=column_names)

The result is 288 rows that represent Washington State cities. Washington State has just 281 cities, but some of them (Auburn, Bothell, Enumclaw, Milton, Pacific, Woodland, Coulee Dam) span multiple counties and therefore appear more than once.
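
The gap between the row count and the number of unique city names can be checked by collecting the counties seen for each city. Below is a minimal sketch in plain Python on a small illustrative sample (the rows are assumptions for demonstration, not the full dataset). With the real data, the 7 extra rows would be consistent with the seven cities listed above each appearing under two counties.

```python
from collections import defaultdict

# Illustrative (city, county) rows; the real dataframe has 288 of them
sample_rows = [
    ('Auburn', 'King'),
    ('Auburn', 'Pierce'),
    ('Bothell', 'King'),
    ('Bothell', 'Snohomish'),
    ('Everett', 'Snohomish'),
]

counties_by_city = defaultdict(set)
for city, county in sample_rows:
    counties_by_city[city].add(county)

# Cities that appear under more than one county
multi_county = {city: sorted(c) for city, c in counties_by_city.items() if len(c) > 1}

# Each additional county contributes one row beyond the unique city count
extra_rows = sum(len(c) - 1 for c in multi_county.values())
print(multi_county)
print(len(sample_rows), len(counties_by_city), extra_rows)
```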

In [6]:
cities = cities.sort_values(by='City Name').reset_index(drop=True) # keep the sorted order and the fresh index
cities

| | City Name | County | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Aberdeen | Grays Harbor | 46.976117 | -123.809870 |
| 1 | Airway Heights | Spokane | 47.645849 | -117.579228 |
| 2 | Albion | Whitman | 46.791653 | -117.251139 |
| 3 | Algona | King | 47.281997 | -122.250430 |
| 4 | Almira | Lincoln | 47.710544 | -118.937096 |
| ... | ... | ... | ... | ... |
| 283 | Yacolt | Clark | 45.865278 | -122.406809 |
| 284 | Yakima | Yakima | 46.592506 | -120.549388 |
| 285 | Yarrow Point | King | 47.646051 | -122.218170 |
| 286 | Yelm | Thurston | 46.939771 | -122.626363 |


We use the geopy library to get the latitude and longitude of Washington State. To create an instance of the geocoder, we need to define a user_agent. We will name our agent wa_explorer.

In [7]:
address = 'Washington State'

geolocator = Nominatim(user_agent="wa_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinate of Washington State are {}, {}.'.format(latitude, longitude))

The geographical coordinate of Washington State are 47.2868352, -120.2126139.
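
`geolocator.geocode` returns `None` when it finds no match, in which case `location.latitude` raises an `AttributeError`. A hedged sketch of a guard follows. `geocode_or_fail` is a hypothetical helper, and the stand-in geocoder exists only so the example runs without network access.

```python
from collections import namedtuple

# Stand-in for the object returned by a geocoder; only the two fields used here
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def geocode_or_fail(geocode, address):
    """Call a geocode function and return (latitude, longitude), or raise if no match."""
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


def fake_geocode(address):
    # Hypothetical lookup table that mimics a geocoder; unknown addresses give None
    known = {'Washington State': FakeLocation(47.2868352, -120.2126139)}
    return known.get(address)


latitude, longitude = geocode_or_fail(fake_geocode, 'Washington State')
print(latitude, longitude)
```

With geopy, the call would be `geocode_or_fail(geolocator.geocode, address)`.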


Filtering by unique values gives the correct number of Washington State cities.

In [8]:
print('The dataframe has {} counties and {} cities.'.format(
        len(cities['County'].unique()),
        len(cities['City Name'].unique())
    )
)

The dataframe has 39 counties and 281 cities.


Once we have the geolocation information, let's create a map of Washington State with the names of the cities superimposed on top.

In [9]:
# create map of Washington State using latitude and longitude values
map_waState = folium.Map(location=[latitude, longitude], zoom_start=7)

# add markers to map
for lat, lng, ciname, cou in zip(cities['Latitude'], cities['Longitude'], cities['City Name'], cities['County']):
    label = '{}'.format(ciname)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_waState)  
    
map_waState

## Methodology <a name="methodology"></a>

For this project I’m going to request data from the Foursquare API and process it to find the breweries located in Washington State, using the geolocation coordinates and filtering by venue category to keep the categories related to breweries.

The results will be divided by county and, in the future, filtered and clustered to separate them by city. Once that is done, a brewery tour can be planned for anyone who wants to try different breweries, or the data can serve as a starting point for a beer tour business.

The filter for this project was set to Snohomish County (the county where I live). Snohomish County has 20 cities and 36 breweries, with Everett and Snohomish having the most (6 each). Some breweries are not listed in the results; this may be because the brewery is new, small, or relocated, or because it has no reviews yet. Also, some breweries were placed in the wrong city; this may be because the cities are close together and the geolocation service assigns the brewery to a neighboring city.

Let's filter by Snohomish County to get its cities.

In [10]:
snohomish_data = cities[cities['County'] == 'Snohomish'].reset_index(drop=True)
snohomish_data.head(20)

| | City Name | County | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Arlington | Snohomish | 48.168741 | -122.145098 |
| 1 | Bothell | Snohomish | 47.792296 | -122.211375 |
| 2 | Brier | Snohomish | 47.792469 | -122.273569 |
| 3 | Darrington | Snohomish | 48.25464 | -121.603046 |
| 4 | Edmonds | Snohomish | 47.811287 | -122.353208 |
| 5 | Everett | Snohomish | 47.950463 | -122.220855 |
| 6 | Gold Bar | Snohomish | 47.856446 | -121.692088 |
| 7 | Granite Falls | Snohomish | 48.087409 | -121.970464 |
| 8 | Index | Snohomish | 47.821483 | -121.555939 |
| 9 | Lake Stevens | Snohomish | 48.003001 | -122.096194 |


Next, get the latitude and longitude of Snohomish County to locate the cities on the map.

In [11]:
address = 'Snohomish county, WA'

geolocator = Nominatim(user_agent="wa_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Snohomish are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Snohomish are 48.0074736, -121.7304882.


In [12]:
map_snohomish = folium.Map(location=[latitude, longitude], zoom_start=8)

# add markers to map
for lat, lng, ciname, cou in zip(snohomish_data['Latitude'], snohomish_data['Longitude'], snohomish_data['City Name'], snohomish_data['County']):
    label = '{}'.format(ciname)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_snohomish)  
    
map_snohomish

Let's continue with Foursquare by entering the credentials and API version.

In [13]:
import os

# Read the credentials from environment variables instead of hardcoding and printing them
# (the variable names below are an assumption; set them to your own values before running)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20200708' # Foursquare API version

print('Foursquare credentials loaded.')

Foursquare credentials loaded.


Now that we have filtered by county, let's get the latitude and longitude of the City of Snohomish to use it as our reference city.

In [14]:
snohomish_data.loc[16, 'City Name']

'Snohomish'

In [15]:
city_lat = snohomish_data.loc[16, 'Latitude'] # city latitude value
city_lon = snohomish_data.loc[16, 'Longitude'] # city longitude value

city_name = snohomish_data.loc[16, 'City Name'] # city name

print('Latitude and longitude values of {} are {}, {}.'.format(city_name, 
                                                               city_lat, 
                                                               city_lon))

Latitude and longitude values of Snohomish are 47.9278377091, -122.097608349.


In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=10000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,
            radius)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City Name', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
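
The request URL in `getNearbyVenues` is assembled with string formatting. As an alternative sketch, the query string can be built from a dictionary with the standard library, which also takes care of URL encoding. The `limit` parameter and the helper name `build_explore_url` are assumptions added for illustration and are not in the original function. The credentials shown are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=10000, limit=50):
    """Build the explore request URL for one pair of coordinates."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,  # assumption: caps the number of venues returned per request
    }
    return '{}?{}'.format(EXPLORE_ENDPOINT, urlencode(params))


# Placeholder credentials; coordinates of the City of Snohomish as printed earlier
url = build_explore_url('MY_ID', 'MY_SECRET', '20200708', 47.9278377091, -122.097608349)
query = parse_qs(urlparse(url).query)
print(query['ll'], query['radius'])
```

Passing the same dictionary to `requests.get(EXPLORE_ENDPOINT, params=params)` would be another way to get the encoding handled automatically.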

In [17]:
snohomish_venues = getNearbyVenues(names=snohomish_data['City Name'],
                                   latitudes=snohomish_data['Latitude'],
                                   longitudes=snohomish_data['Longitude']
                                  )


Arlington
Bothell
Brier
Darrington
Edmonds
Everett
Gold Bar
Granite Falls
Index
Lake Stevens
Lynnwood
Marysville
Mill Creek
Monroe
Mountlake Terrace
Mukilteo
Snohomish
Stanwood
Sultan
Woodway


In [18]:
snb = snohomish_venues[snohomish_venues['Venue Category'] == 'Brewery']
print(snb.shape)
snb.head(100)

(21, 7)


| | City Name | City Latitude | City Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 8 | Arlington | 48.168741 | -122.145098 | Whitewall Brewing Company | 48.128195 | -122.184305 | Brewery |
| 25 | Arlington | 48.168741 | -122.145098 | Skookum Brewery | 48.158907 | -122.150649 | Brewery |
| 53 | Bothell | 47.792296 | -122.211375 | 9 Yards Brewery | 47.756638 | -122.242402 | Brewery |
| 56 | Bothell | 47.792296 | -122.211375 | Cairn Brewing | 47.757219 | -122.243891 | Brewery |
| 72 | Brier | 47.792469 | -122.273569 | Hemlock State Brewing Company | 47.784585 | -122.308162 | Brewery |
| 74 | Brier | 47.792469 | -122.273569 | 9 Yards Brewery | 47.756638 | -122.242402 | Brewery |
| 76 | Brier | 47.792469 | -122.273569 | Diamond Knot Brewpub | 47.787882 | -122.30985 | Brewery |
| 81 | Brier | 47.792469 | -122.273569 | Cairn Brewing | 47.757219 | -122.243891 | Brewery |
| 103 | Edmonds | 47.811287 | -122.353208 | Salish Sea Brewing Co. | 47.809612 | -122.376873 | Brewery |
| 186 | Gold Bar | 47.856446 | -121.692088 | Timber Monster Brewing Company | 47.86208 | -121.81556 | Brewery |
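
The filter in cell 18 keeps only venues whose category is exactly `Brewery`. If related categories should count too, one option is to match against a set of names. The sketch below uses plain Python on a few illustrative records. The set `BEER_CATEGORIES` and the two made-up venues are assumptions, and the category names should be checked against what the API actually returns.

```python
# Assumed set of beer-related category names; adjust to the categories in your results
BEER_CATEGORIES = {'brewery', 'beer bar', 'beer garden'}


def is_beer_venue(category):
    """Case-insensitive match of a venue category against the assumed set."""
    return category.strip().lower() in BEER_CATEGORIES


# (venue, category) pairs; the last two are made-up examples
venues = [
    ('Skookum Brewery', 'Brewery'),
    ('Salish Sea Brewing Co.', 'Brewery'),
    ('Example Beer Garden', 'Beer Garden'),
    ('Example Coffee', 'Coffee Shop'),
]

beer_venues = [name for name, category in venues if is_beer_venue(category)]
print(beer_venues)
```

On the dataframe, the same test could be applied with `snohomish_venues['Venue Category'].apply(is_beer_venue)`.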


## Results and Discussion <a name="results"></a>

Our analysis shows that there is a low number of breweries in Snohomish County and all its cities within a radius of 10 km. This can be a good starting point for somebody who wants to open a new brewery.

Also, in the Snohomish breweries dataframe we can see that some breweries are listed under several cities. This happens because of their geographical location: breweries near the boundary of two cities fall within the search radius of both.
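
One way to reduce these duplicates, sketched here as an assumption rather than the method used in this notebook, is to keep each brewery only under the city whose reference point is closest, using the haversine great-circle distance. The rows are copied from the brewery table in the Methodology section. `haversine_km` and `assign_nearest_city` are hypothetical helper names.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# (city, city_lat, city_lon, venue, venue_lat, venue_lon)
rows = [
    ('Bothell', 47.792296, -122.211375, 'Cairn Brewing', 47.757219, -122.243891),
    ('Brier', 47.792469, -122.273569, 'Cairn Brewing', 47.757219, -122.243891),
    ('Brier', 47.792469, -122.273569, 'Diamond Knot Brewpub', 47.787882, -122.30985),
]


def assign_nearest_city(rows):
    """Return {venue: (city, distance_km)} keeping the closest city for each venue."""
    best = {}
    for city, clat, clon, venue, vlat, vlon in rows:
        dist = haversine_km(clat, clon, vlat, vlon)
        if venue not in best or dist < best[venue][1]:
            best[venue] = (city, dist)
    return best


nearest = assign_nearest_city(rows)
for venue, (city, dist) in nearest.items():
    print('{}: {} ({:.1f} km)'.format(venue, city, dist))
```

The closest reference point is not necessarily the city a brewery legally belongs to, so this is only a rough way to end up with one row per venue.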

This analysis was done as a hobby, but if you are looking for a business in craft beer it can be a good starting point. For Washington State there is a lot of information about craft beer, but there are cities in the U.S. with little information about it.

Finally, by modifying some of the data and parameters, we can obtain breweries or other kinds of venues for different cities, or even other countries, and continue with different analyses.



## Conclusion <a name="conclusion"></a>

The purpose of this project was to locate the breweries around Washington State, particularly the ones in Snohomish County (this filter can be modified by changing the latitude and longitude to those of the desired county). Of course, as with every big city, a great number of breweries are located in the Seattle area.

The results will be divided by county and, in the future, filtered and clustered to separate them by city. Once that is done, a brewery tour can be planned for anyone who wants to try different breweries, or the data can serve as a starting point for a beer tour business. That is why it is a good idea to work with counties or cities that have a low number of breweries.

The filter for this project was set to Snohomish County (the county where I live). Snohomish County has 20 cities and 36 breweries, with Everett and Snohomish having the most (6 each). Some breweries are not listed in the results; this may be because the brewery is new, small, or relocated, or because it has no reviews yet. Also, some breweries were placed in the wrong city.