# Coursera Capstone
## The Battle of the Neighborhoods
This document will be used for the Coursera Capstone project, part of the IBM Data Science Professional Certificate.

In [1]:
import pandas as pd
import numpy as np

import json # library to handle JSON files

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# libraries for displaying images
from IPython.display import Image, HTML

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

import requests # library to handle requests

# pd.json_normalize (used below) transforms JSON into a pandas dataframe; the old pandas.io.json import path is deprecated

### Create a dataframe of neighborhoods in Minneapolis
As far as I could tell, the Minneapolis neighborhood data does not include a latitude and longitude value for each neighborhood. The following section creates a dataframe of neighborhood names with a latitude and longitude attached to each.

In [2]:
# Data retrieved from
# https://opendata.minneapolismn.gov/datasets/minneapolis-neighborhoods
# Data downloaded and placed in local directory

with open('Minneapolis_Neighborhoods.geojson.json') as jsondata:
    mpls_data = json.load(jsondata)

Define `mpls_nbhds` to be the features of the geojson file imported above.

In [3]:
mpls_nbhds = mpls_data['features']

In [None]:
# Look at the first entry. Note the neighborhood name has key 'BDNAME'.
mpls_nbhds[0]
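Each GeoJSON feature also carries a `geometry` entry, so a rough center could in principle be derived from the boundary itself instead of being geocoded. The sketch below is only an illustration of that idea, not what this notebook does: it assumes each feature is a `Polygon` or `MultiPolygon`, it averages the boundary vertices (which is not a true area centroid), and `rough_center` and `toy_feature` are names and data made up for the example.

```python
def rough_center(feature):
    """Return (latitude, longitude) as the mean of the outer-ring vertices."""
    geometry = feature['geometry']
    if geometry['type'] == 'Polygon':
        ring = geometry['coordinates'][0]
    elif geometry['type'] == 'MultiPolygon':
        # Use the outer ring of the first polygon only
        ring = geometry['coordinates'][0][0]
    else:
        raise ValueError('Unsupported geometry type: ' + geometry['type'])

    # GeoJSON rings repeat the first vertex at the end; drop the repeat
    points = ring[:-1] if ring[0] == ring[-1] else ring

    # GeoJSON positions are ordered [longitude, latitude]
    lon = sum(p[0] for p in points) / len(points)
    lat = sum(p[1] for p in points) / len(points)
    return lat, lon


# Made-up square feature, shaped like the entries in mpls_nbhds
toy_feature = {
    'type': 'Feature',
    'properties': {'BDNAME': 'Toy Square'},
    'geometry': {
        'type': 'Polygon',
        'coordinates': [[
            [-93.30, 44.90], [-93.20, 44.90],
            [-93.20, 45.00], [-93.30, 45.00],
            [-93.30, 44.90],
        ]],
    },
}

toy_lat, toy_lon = rough_center(toy_feature)
print(toy_feature['properties']['BDNAME'], toy_lat, toy_lon)
```

A vertex mean can land outside an irregularly shaped neighborhood, so this would be a fallback at best.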

In [6]:
# Initialize the dataframe
column_names = ['Neighborhood','Latitude', 'Longitude']
nbhds = pd.DataFrame(columns=column_names)
nbhds

,Neighborhood,Latitude,Longitude


In [7]:
# Fill in the neighborhood names. The latitude and longitude are filled in the next step.
# DataFrame.append was removed in pandas 2.0, so build the name column in one step instead.
nbhd_names = [data['properties']['BDNAME'] for data in mpls_nbhds]
nbhds = pd.DataFrame({'Neighborhood': nbhd_names}).reindex(columns=column_names)

In [8]:
nbhds.head()

,Neighborhood,Latitude,Longitude
0,Phillips West,,
1,Downtown West,,
2,Downtown East,,
3,Ventura Village,,
4,Sumner - Glenwood,,


#### Populate the latitude and longitude using Geopy

In [9]:
# Create the geocoder once instead of once per neighborhood
geolocator = Nominatim(user_agent="mpls_explorer")
missing_addresses = []
for idx, nbhd in nbhds['Neighborhood'].items():
    address = nbhd + ', Minneapolis, MN'
    location = geolocator.geocode(address)
    if location is None:
        print(address+' is not found on geopy.')
        missing_addresses.append(nbhd)
    else:
        # .loc avoids chained assignment, which can silently write to a copy
        nbhds.loc[idx, 'Latitude'] = location.latitude
        nbhds.loc[idx, 'Longitude'] = location.longitude
print(missing_addresses)
nbhds.head()

Downtown West, Minneapolis, MN is not found on geopy.
Ventura Village, Minneapolis, MN is not found on geopy.
Humboldt Industrial Area, Minneapolis, MN is not found on geopy.
South Uptown, Minneapolis, MN is not found on geopy.
Mid - City Industrial, Minneapolis, MN is not found on geopy.
Nicollet Island - East Bank, Minneapolis, MN is not found on geopy.
['Downtown West', 'Ventura Village', 'Humboldt Industrial Area', 'South Uptown', 'Mid - City Industrial', 'Nicollet Island - East Bank']


,Neighborhood,Latitude,Longitude
0,Phillips West,44.9539,-93.2663
1,Downtown West,,
2,Downtown East,44.975,-93.2599
3,Ventura Village,,
4,Sumner - Glenwood,44.9837,-93.2914


There are 6 neighborhoods that Geopy could not identify. We can fill these in manually with coordinates taken from a Google search.

In [10]:
Downtown_West =[44.9742, -93.2733]
Ventura_Village = [44.9618, -93.2582]
Humboldt_Industrial_Area = [45.0421, -93.3077]
South_Uptown = [44.9411, -93.2911]
Mid_City_Industrial = [44.9989, -93.2178]
Nicollet_Island = [44.9879, -93.2629]

missing_latlon = [Downtown_West, Ventura_Village, Humboldt_Industrial_Area, South_Uptown, Mid_City_Industrial, Nicollet_Island]

In [11]:
# Pair each missing neighborhood name with its manually entered [lat, long]
missing_dict = dict(zip(missing_addresses, missing_latlon))
missing_dict

{'Downtown West': [44.9742, -93.2733],
 'Ventura Village': [44.9618, -93.2582],
 'Humboldt Industrial Area': [45.0421, -93.3077],
 'South Uptown': [44.9411, -93.2911],
 'Mid - City Industrial': [44.9989, -93.2178],
 'Nicollet Island - East Bank': [44.9879, -93.2629]}

In [12]:
# Fill in missing lat/long values into our dataframe
for name in missing_addresses:
    row = nbhds.index[nbhds['Neighborhood']==name][0]
    nbhds.loc[row, 'Latitude'] = missing_dict[name][0]
    nbhds.loc[row, 'Longitude'] = missing_dict[name][1]

nbhds.head()

,Neighborhood,Latitude,Longitude
0,Phillips West,44.9539,-93.2663
1,Downtown West,44.9742,-93.2733
2,Downtown East,44.975,-93.2599
3,Ventura Village,44.9618,-93.2582
4,Sumner - Glenwood,44.9837,-93.2914


In [13]:
nbhds.shape

(87, 3)

According to Wikipedia, Minneapolis has 81 official neighborhoods, but this dataset tallies 87. Some cross-referencing is probably in order.
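One way the cross-referencing could go is to compare the dataset's names against a reference list and report duplicates and mismatches. This is a minimal sketch under assumptions: `compare_names` is a hypothetical helper, the normalization rule (ignore case and spaces around hyphens) is a guess at how the two sources might differ, and `toy_reference` is a made-up placeholder, not the actual Wikipedia list.

```python
from collections import Counter


def compare_names(dataset_names, reference_names):
    """Return (duplicates, extra, absent) after light name normalization."""
    def normalize(name):
        # Treat 'Sumner - Glenwood' and 'Sumner-Glenwood' as the same name
        return '-'.join(part.strip() for part in name.split('-')).casefold()

    dataset = Counter(normalize(n) for n in dataset_names)
    reference = {normalize(n) for n in reference_names}

    duplicates = sorted(n for n, count in dataset.items() if count > 1)
    extra = sorted(set(dataset) - reference)    # in the dataset only
    absent = sorted(reference - set(dataset))   # in the reference only
    return duplicates, extra, absent


# Toy inputs; in the notebook the first list would be nbhds['Neighborhood']
toy_dataset = ['Phillips West', 'Sumner - Glenwood',
               'Humboldt Industrial Area', 'Phillips West']
toy_reference = ['Phillips West', 'Sumner-Glenwood', 'Downtown East']

duplicates, extra, absent = compare_names(toy_dataset, toy_reference)
print('Duplicated:', duplicates)
print('Only in dataset:', extra)
print('Only in reference:', absent)
```

The `extra` list is where entries such as industrial areas would show up if the reference list does not count them as neighborhoods.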

### Use Foursquare API to gather data about breweries in Minneapolis

In [14]:
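# Define Foursquare credentials. Read them from the environment rather than hard-coding secrets.
# The environment variable names below are placeholders; use whatever names you have set.
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20200101' # Foursquare API version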

# Find Minneapolis latitude and longitude
address = 'Minneapolis, MN'
geolocator = Nominatim(user_agent="mpls_explorer")
location = geolocator.geocode(address)
mpls_latitude = location.latitude # city latitude value
mpls_longitude = location.longitude # city longitude value

category = '50327c8591d4c4b30a586d5d' # Foursquare category for brewery

radius = 25000 # in meters
LIMIT = 200

url = 'https://api.foursquare.com/v2/venues/explore'
params = {
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
    'v': VERSION,
    'll': '{},{}'.format(mpls_latitude, mpls_longitude),
    'categoryId': category,
    'radius': radius,
    'limit': LIMIT,
}

# Let requests build and escape the query string
results = requests.get(url, params=params).json()

# The [0] below threw me off: 'groups' is a list, and its first element holds the 'items'. This article helped.
# https://medium.com/@aboutiana/a-brief-guide-to-using-foursquare-api-with-a-hands-on-example-on-python-6fc4d5451203
venues = results['response']['groups'][0]['items']
# transform venues into a dataframe
#venues
df_brew = pd.json_normalize(venues)
df_brew.head()

,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.crossStreet,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,...,venue.photos.count,venue.photos.groups,venue.location.neighborhood,venue.venuePage.id,venue.delivery.id,venue.delivery.url,venue.delivery.provider.name,venue.delivery.provider.icon.prefix,venue.delivery.provider.icon.sizes,venue.delivery.provider.icon.name
0,e-0-4c8d128bc37a6dcb86d0fc7a-0,0,"[{'summary': 'This spot is popular', 'type': '...",4c8d128bc37a6dcb86d0fc7a,Fulton Brewing Company,414 6th Ave N,at 5th St. N,44.984862,-93.278828,"[{'label': 'display', 'lat': 44.98486196804921...",...,0,[],,,,,,,,
1,e-0-5047b636e4b04db60102f96d-1,0,"[{'summary': 'This spot is popular', 'type': '...",5047b636e4b04db60102f96d,Dangerous Man Brewing Co,1300 2nd St NE,,45.001049,-93.266337,"[{'label': 'display', 'lat': 45.00104863763394...",...,0,[],,,,,,,,
2,e-0-52190bad11d28f3e1ce73946-2,0,"[{'summary': 'This spot is popular', 'type': '...",52190bad11d28f3e1ce73946,Surly Brewing Company,520 Malcolm Ave SE,SE 5th St,44.973226,-93.210072,"[{'label': 'display', 'lat': 44.97322598772595...",...,0,[],"Prospect Park, Minneapolis, MN",,,,,,,
3,e-0-56242508498e6aeb80142c2c-3,0,"[{'summary': 'This spot is popular', 'type': '...",56242508498e6aeb80142c2c,Lakes & Legends Brewing Company,1368 Lasalle Ave,,44.968908,-93.279479,"[{'label': 'display', 'lat': 44.96890779142156...",...,0,[],,465641085.0,,,,,,
4,e-0-5aa9baebff03062a4b1dccba-4,0,"[{'summary': 'This spot is popular', 'type': '...",5aa9baebff03062a4b1dccba,Finnegans House,817 5th Ave S,btwn S 8th & 9th St,44.972301,-93.26641,"[{'label': 'display', 'lat': 44.97230088563604...",...,0,[],,,,,,,,
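To make the indexing and the dotted column names less mysterious, here is a small sketch of the shape being navigated. The `toy_results` dictionary is a made-up, heavily trimmed stand-in for the real response (only the keys used above are kept), and `flatten` is a hypothetical helper that mimics, in a simplified way, how nested keys end up as names like `venue.location.lat`. It is not a replacement for `pd.json_normalize`.

```python
# Trimmed, made-up stand-in for the API response used above
toy_results = {
    'response': {
        'groups': [                      # a list, hence the [0]
            {
                'items': [
                    {'venue': {'id': 'a1', 'name': 'Toy Brewery',
                               'location': {'lat': 44.98, 'lng': -93.27}}},
                    {'venue': {'id': 'b2', 'name': 'Another Toy Brewery',
                               'location': {'lat': 45.00, 'lng': -93.25}}},
                ],
            },
        ],
    },
}


def flatten(record, prefix=''):
    """Flatten nested dicts into one dict with dot-separated keys."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + '.'))
        else:
            flat[name] = value
    return flat


toy_venues = toy_results['response']['groups'][0]['items']
rows = [flatten(item) for item in toy_venues]
print(sorted(rows[0]))
```

Each flattened row corresponds to one row of `df_brew`, with one column per dotted key.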


In [15]:
df_brew.shape

(100, 28)

In [16]:
df_brew.columns

Index(['referralId', 'reasons.count', 'reasons.items', 'venue.id',
       'venue.name', 'venue.location.address', 'venue.location.crossStreet',
       'venue.location.lat', 'venue.location.lng',
       'venue.location.labeledLatLngs', 'venue.location.distance',
       'venue.location.postalCode', 'venue.location.cc', 'venue.location.city',
       'venue.location.state', 'venue.location.country',
       'venue.location.formattedAddress', 'venue.categories',
       'venue.photos.count', 'venue.photos.groups',
       'venue.location.neighborhood', 'venue.venuePage.id',
       'venue.delivery.id', 'venue.delivery.url',
       'venue.delivery.provider.name', 'venue.delivery.provider.icon.prefix',
       'venue.delivery.provider.icon.sizes',
       'venue.delivery.provider.icon.name'],
      dtype='object')

In [17]:
drop_columns = ['referralId', 'reasons.count', 'reasons.items', 'venue.location.labeledLatLngs',
                'venue.photos.count', 'venue.photos.groups', 'venue.location.postalCode',
                'venue.location.cc', 'venue.location.city', 'venue.location.state', 'venue.location.country',
                'venue.location.formattedAddress', 'venue.venuePage.id', 'venue.delivery.id', 'venue.delivery.url',
                'venue.delivery.provider.name', 'venue.delivery.provider.icon.prefix', 'venue.delivery.provider.icon.sizes',
                'venue.delivery.provider.icon.name', 'venue.location.neighborhood', 'venue.categories',
                'venue.location.distance', 'venue.location.crossStreet']

df_brew.drop(columns = drop_columns, inplace=True)
df_brew.head(10)

,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng
0,4c8d128bc37a6dcb86d0fc7a,Fulton Brewing Company,414 6th Ave N,44.984862,-93.278828
1,5047b636e4b04db60102f96d,Dangerous Man Brewing Co,1300 2nd St NE,45.001049,-93.266337
2,52190bad11d28f3e1ce73946,Surly Brewing Company,520 Malcolm Ave SE,44.973226,-93.210072
3,56242508498e6aeb80142c2c,Lakes & Legends Brewing Company,1368 Lasalle Ave,44.968908,-93.279479
4,5aa9baebff03062a4b1dccba,Finnegans House,817 5th Ave S,44.972301,-93.26641
5,570d7c00cd102c0fabd7d768,Inbound BrewCo,701 North 5th Street,44.98564,-93.281496
6,52d4225b498e07070c180a62,Sisyphus Brewing,712 Ontario Ave W,44.973214,-93.28904
7,4f3dcd09e4b0e4bab927d3d6,Indeed Brewing Company,711 15th Ave NE Ste 102,45.003368,-93.251563
8,41326e00f964a52060171fe3,Minneapolis Town Hall Brewery,1430 Washington Ave S,44.97332,-93.247628
9,51f2d31c498eed6962fdc888,Day Block Brewing Company,1105 Washington Ave S,44.97519,-93.253207


In [28]:
df_brew.tail()

,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng
95,4dd9aa848877f115102094a5,The Beamer Bar,2008 2nd Ave S,44.962178,-93.274213
96,4283ee00f964a520b1221fe3,Chatterbox Pub,2229 E 35th St,44.939535,-93.239138
97,4de30667ae60e7f3abfeadd7,Black & Tan Brewery,2227 Monroe St NE,45.01018,-93.252054
98,5d71a01fb3950a00079d87f0,Falling Knife Brewing,783 Harding St NE,44.998394,-93.22109
99,5bf5a33dc03635002cc313a8,Minneapolis Cider Company,701 SE 9th Street,44.98945,-93.241404


### Clean and format brewery data

Looking at the columns and the first rows, we see some candidates for removal. For example, the second listing is Twin Cities Brewery Tours, which is not actually a brewery. Additionally, its `venue.location.address` is NaN, so this row is a candidate for removal. Let's see if we can whittle this dataset down further.
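A possible shape for that cleanup is sketched below on plain records rather than on `df_brew` itself. The two rules (drop rows with a missing address, drop rows whose name is on a hand-made exclusion list) are assumptions about what the cleaning might involve; `clean_venues`, `is_missing`, the exclusion list, and the 'Hypothetical Pub' row are all invented for illustration.

```python
import math


def is_missing(value):
    """True for None or NaN, the two ways a missing address may appear."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_venues(records, excluded_names=()):
    """Keep records that have an address and are not explicitly excluded."""
    excluded = {name.casefold() for name in excluded_names}
    kept = []
    for record in records:
        if is_missing(record.get('venue.location.address')):
            continue  # no street address, likely not a physical brewery
        if record['venue.name'].casefold() in excluded:
            continue  # manually flagged as not a brewery
        kept.append(record)
    return kept


toy_records = [
    {'venue.name': 'Fulton Brewing Company',
     'venue.location.address': '414 6th Ave N'},
    {'venue.name': 'Twin Cities Brewery Tours',
     'venue.location.address': float('nan')},
    {'venue.name': 'Hypothetical Pub',          # made-up row
     'venue.location.address': '1 Example St'},
]

cleaned = clean_venues(toy_records, excluded_names=['Hypothetical Pub'])
print([record['venue.name'] for record in cleaned])
```

On the dataframe, the missing-address rule would likely correspond to `df_brew.dropna(subset=['venue.location.address'])`. The exclusion list would still need to be built by hand after inspecting the names.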

### Generating a Choropleth Map

In [45]:
mpls_map = folium.Map(location=[mpls_latitude, mpls_longitude], zoom_start=13)
mpls_map

In [None]:
# Overlay the neighborhood boundaries on the map created above
folium.GeoJson(mpls_data).add_to(mpls_map)
mpls_map

In [46]:
# Create a feature group to hold the brewery markers
breweries = folium.FeatureGroup(name='Breweries')

for lat, lon in zip(df_brew['venue.location.lat'], df_brew['venue.location.lng']):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        color='blue',
        fill=True,
        #popup=label,
        fill_color='magenta',
        fill_opacity=0.6
    ).add_to(breweries)

# add breweries to map
breweries.add_to(mpls_map)
mpls_map

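A choropleth needs one number per neighborhood, and nothing above computes that yet. One option is to count the breweries that fall inside each neighborhood polygon. The sketch below uses a plain ray-casting point-in-polygon test and is only an assumption about how the counts might be produced: it handles `Polygon` geometries only, ignores holes, and `point_in_ring`, `count_by_neighborhood`, and the toy features are invented for the example.

```python
from collections import Counter


def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of [lon, lat] vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_by_neighborhood(points, features):
    """Count (lat, lon) points per feature, keyed by the 'BDNAME' property."""
    counts = Counter()
    for lat, lon in points:
        for feature in features:
            geometry = feature['geometry']
            if geometry['type'] != 'Polygon':
                continue  # MultiPolygon is not handled in this sketch
            if point_in_ring(lon, lat, geometry['coordinates'][0]):
                counts[feature['properties']['BDNAME']] += 1
                break
    return counts


def square(name, west, east, south, north):
    """Build a made-up rectangular feature for the example."""
    ring = [[west, south], [east, south], [east, north],
            [west, north], [west, south]]
    return {'type': 'Feature', 'properties': {'BDNAME': name},
            'geometry': {'type': 'Polygon', 'coordinates': [ring]}}


toy_features = [square('Toy West', -93.30, -93.25, 44.90, 45.00),
                square('Toy East', -93.25, -93.20, 44.90, 45.00)]
toy_points = [(44.95, -93.28), (44.96, -93.27),
              (44.95, -93.22), (45.50, -93.22)]

brewery_counts = count_by_neighborhood(toy_points, toy_features)
print(dict(brewery_counts))
```

In the notebook, the points would come from the `venue.location.lat` and `venue.location.lng` columns of `df_brew` and the features from `mpls_nbhds`. The resulting counts could then be joined to the neighborhood names and passed to a choropleth layer.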