# Capstone Project - The Battle of Neighborhoods
---

## Table of Contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)

# Introduction: Business Problem <a name="introduction"></a>
---

In this project we will try to find an optimal location for a restaurant in Seattle, Washington. Specifically, this report is targeted at stakeholders interested in opening an Italian restaurant.

We will try to detect locations that are not already crowded with restaurants. We are also particularly interested in areas with no Italian restaurants in the vicinity.

We will use our data science powers to generate a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.

# Data <a name="data"></a>
---

We are going to use a data set of Seattle neighborhoods by ZIP code from this page: 'http://www.agingkingcounty.org/wp-content/uploads/sites/185/2016/09/SubRegZipCityNeighborhood.pdf'. It is a PDF file containing the Excel spreadsheet "Sub-Regional, City and Neighborhood Designations by Zip Code".

I used Adobe Acrobat to extract the table from the PDF into an Excel file. We are going to use only the data for the City of Seattle and its neighborhoods. We need only the data from the first page, in the section sorted by Seattle Neighborhood, which lists all neighborhoods in the City of Seattle together with their corresponding ZIP codes.
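Because the spreadsheet covers more than the City of Seattle, the Seattle rows have to be selected. Below is a minimal pandas sketch of that step. It assumes the extracted sheet has the `ZIP`, `City Name`, `Sub Region` and `Seattle Neighborhood` columns used in the next cell. The helper `seattle_only` is hypothetical, and the third row is a made-up placeholder for a non-Seattle city, not data from the PDF.

```python
import pandas as pd

# Illustrative rows only: the real data come from the Excel file extracted from the PDF.
# The 'Other City' row is a hypothetical placeholder for a non-Seattle entry.
raw = pd.DataFrame({
    'ZIP': [98102, 98101, 99999],
    'City Name': ['Seattle', 'Seattle', 'Other City'],
    'Sub Region': ['Seattle', 'Seattle', 'Other Region'],
    'Seattle Neighborhood': ['Capitol Hill', 'Downtown', None],
})

def seattle_only(frame):
    """Keep City of Seattle rows, rename the neighborhood column and sort by ZIP."""
    seattle = frame[frame['City Name'] == 'Seattle']
    seattle = seattle.rename(columns={'Seattle Neighborhood': 'Neighborhood'})
    return seattle.sort_values(by='ZIP').reset_index(drop=True)

seattle_df = seattle_only(raw)
print(seattle_df[['ZIP', 'Neighborhood']])
```

The rename, sort and index reset mirror what the next cell does after reading the Excel file. The only addition is the filter on `City Name`.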

Later, we will also use the Foursquare API to extract data on venues in the corresponding ZIP codes.


### Loading and Extracting Data

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

from bs4 import BeautifulSoup as bs # HTML parsing
import html5lib # HTML parser backend
print('Libraries imported.')

Libraries imported.


In [2]:
df = pd.read_excel('SubRegZipCityNeighborhood.xlsx')
df = df.rename(columns={'Seattle Neighborhood': 'Neighborhood'})
df = df.sort_values(by=['ZIP'])
df = df.reset_index(drop=True)
df

|   | ZIP   | City Name | Sub Region | Neighborhood        |
|---|-------|-----------|------------|---------------------|
| 0 | 98101 | Seattle   | Seattle    | Downtown            |
| 1 | 98102 | Seattle   | Seattle    | Capitol Hill        |
| 2 | 98103 | Seattle   | Seattle    | Lake Union          |
| 3 | 98104 | Seattle   | Seattle    | Downtown            |
| 4 | 98105 | Seattle   | Seattle    | Northeast           |
| 5 | 98106 | Seattle   | Seattle    | Delridge            |
| 6 | 98107 | Seattle   | Seattle    | Ballard             |
| 7 | 98108 | Seattle   | Seattle    | Duwamish            |
| 8 | 98109 | Seattle   | Seattle    | Queen Anne/Magnolia |
| 9 | 98111 | Seattle   | Seattle    | Downtown            |


Now we are going to use zip-codes.com and its Free Level API to obtain the latitude and longitude of each ZIP code in the dataframe.

In [3]:
ZIP_API_KEY = 'AE7LR79I8JC8CNQRLGZF' # zip-codes.com API key (sent without angle brackets)
ZIP_API_URL = 'https://api.zip-codes.com/ZipCodesAPI.svc/1.0/QuickGetZipCodeDetails/{}'

latitudes = []
longitudes = []
for code in df['ZIP']:
    # one request per ZIP code; requests builds the ?key=... query string
    info = requests.get(ZIP_API_URL.format(code), params={'key': ZIP_API_KEY}).json()
    latitudes.append(info['Latitude'])
    longitudes.append(info['Longitude'])
df['Latitude'] = latitudes
df['Longitude'] = longitudes
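The loop above mixes the HTTP call with the parsing of the response. Below is a sketch that separates the two, so the parsing can be checked offline and each distinct ZIP code is requested only once. It assumes, as the loop does, that the API returns a JSON object with `Latitude` and `Longitude` keys. `add_coordinates` and `fake_lookup` are hypothetical names, and the stand-in responses reuse the coordinates shown in the `df.head()` output.

```python
import pandas as pd

def add_coordinates(frame, lookup):
    """Return a copy of `frame` with Latitude and Longitude columns added.

    `lookup` is any callable that maps a ZIP code to a dict with 'Latitude'
    and 'Longitude' keys, e.g. a thin wrapper around the zip-codes.com request.
    Each distinct ZIP code is looked up only once.
    """
    cache = {code: lookup(code) for code in frame['ZIP'].unique()}
    out = frame.copy()
    # float() accepts numbers or numeric strings, whichever the API returns
    out['Latitude'] = out['ZIP'].map(lambda code: float(cache[code]['Latitude']))
    out['Longitude'] = out['ZIP'].map(lambda code: float(cache[code]['Longitude']))
    return out

# Stand-in for the HTTP call, using coordinates from the df.head() output.
fake_responses = {
    98101: {'Latitude': 47.611012, 'Longitude': -122.333523},
    98102: {'Latitude': 47.635749, 'Longitude': -122.324362},
}
calls = []

def fake_lookup(code):
    calls.append(code)
    return fake_responses[code]

sample = pd.DataFrame({'ZIP': [98101, 98102, 98101]})
result = add_coordinates(sample, fake_lookup)
print(result)
```

In the notebook, `lookup` would wrap the `requests.get(...).json()` call from the cell above.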

In [4]:
df.head()

|   | ZIP   | City Name | Sub Region | Neighborhood | Latitude  | Longitude   |
|---|-------|-----------|------------|--------------|-----------|-------------|
| 0 | 98101 | Seattle   | Seattle    | Downtown     | 47.611012 | -122.333523 |
| 1 | 98102 | Seattle   | Seattle    | Capitol Hill | 47.635749 | -122.324362 |
| 2 | 98103 | Seattle   | Seattle    | Lake Union   | 47.670294 | -122.348306 |
| 3 | 98104 | Seattle   | Seattle    | Downtown     | 47.602134 | -122.328431 |
| 4 | 98105 | Seattle   | Seattle    | Northeast    | 47.6604   | -122.28053  |


### Visualizing
We will plot the neighborhoods on a map using the acquired geodata.

In [7]:
import folium

address = 'Seattle, Washington'

geolocator = Nominatim(user_agent="seattle_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map = folium.Map(location=[latitude, longitude], zoom_start=10)
neighborhoods = df

# add one circle marker per ZIP code, labeled "Neighborhood, Sub Region"
for lat, lng, sub_region, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Sub Region'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, sub_region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map)
    
map
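The map above is centered on a Nominatim geocode of 'Seattle, Washington', which requires a network call. One offline alternative, not what this notebook does, is to center the map on the mean of the neighborhood coordinates already stored in `df`. Over an area the size of one city, a plain average of latitude and longitude is a reasonable approximation. A minimal sketch, where `map_center` is a hypothetical helper and the coordinates are the first three rows of `df.head()`:

```python
import pandas as pd

def map_center(frame):
    """Return [lat, lon] at the arithmetic mean of the Latitude/Longitude columns."""
    return [frame['Latitude'].mean(), frame['Longitude'].mean()]

coords = pd.DataFrame({
    'Latitude': [47.611012, 47.635749, 47.670294],
    'Longitude': [-122.333523, -122.324362, -122.348306],
})
center = map_center(coords)
print(center)
# The result could be passed to folium.Map(location=center, zoom_start=10).
```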

### Connect to the Foursquare API and extract data for nearby venues in each neighborhood
---

In [8]:
CLIENT_ID = '3OWWV1IKR1G4UXICT2E5V4QS224B4HWYE5XJ0QBSESX0SP14' # your Foursquare ID
CLIENT_SECRET = 'WGZ0KKG1IVLJL424MEC3Z3I0AN3X5WY4TDAV5TI53SKT1ORD' # your Foursquare Secret
VERSION = '20180323' # Foursquare API version

In [9]:
# function that extracts the category of the venue
def get_category_type(row):
    # the column is 'categories' or 'venue.categories', depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
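`get_category_type` is meant to be applied row by row to a flattened venue table, where `json_normalize` has turned the nested venue data into dotted column names such as `venue.categories`. Below is a small offline sketch. The nested dicts only mimic the shape of the `items` list that `getNearbyVenues` indexes into. They are hypothetical, not real API output.

```python
import pandas as pd

def get_category_type(row):
    # Same logic as the cell above: accept either flattened column name.
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Hypothetical items shaped like response['groups'][0]['items'].
items = [
    {'venue': {'name': 'Venue A', 'categories': [{'name': 'Italian Restaurant'}]}},
    {'venue': {'name': 'Venue B', 'categories': []}},
]

venues = pd.json_normalize(items)  # columns: 'venue.name', 'venue.categories'
venues['category'] = venues.apply(get_category_type, axis=1)
print(venues[['venue.name', 'category']])
```

The second venue has an empty category list, so it gets a missing value instead of raising an error.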

In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
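        # keep only the relevant information for each nearby venue;
        # venues with an empty category list get None instead of raising IndexError
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])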

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
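The request URL in `getNearbyVenues` is assembled with `str.format`. An alternative, sketched here rather than taken from the notebook, is to let the standard library's `urllib.parse.urlencode` build the query string. It escapes values and keeps separators consistent. `build_explore_url` is a hypothetical helper, and the credentials are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the venues/explore URL with the same parameters getNearbyVenues sends."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes the comma in 'll' as %2C
    return EXPLORE_URL + '?' + urlencode(params)

url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180323', 47.611012, -122.333523)
print(url)
```

With `requests`, you could instead pass the same dictionary as `requests.get(EXPLORE_URL, params=params)` and let the library do the encoding.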

In [12]:
seattle_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Downtown
Capitol Hill
Lake Union
Downtown
Northeast
Delridge
Ballard
Duwamish
Queen Anne/Magnolia
Downtown
Capitol Hill
Downtown
Northeast
Southwest
Ballard
Southeast
Queen Anne/Magnolia
Downtown
Central
Duwamish
North
Delridge
Downtown
Northwest
Duwamish
Southwest
Southeast
Northeast
Southwest
Downtown
Downtown
Downtown
Downtown
Northwest
Downtown
Northeast
Downtown
Northeast
Queen Anne/Magnolia
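The printed list shows that several ZIP codes share one neighborhood name. Downtown, for example, appears many times, so venues from different ZIP codes end up under the same `Neighborhood` label. If the ZIP codes should stay distinguishable, one option is to pass a combined label such as "Downtown (98101)" as `names`. That choice is an assumption, not something this notebook does. `value_counts` shows how many ZIP codes each name covers. The rows below are taken from the `df` output above.

```python
import pandas as pd

zips = pd.DataFrame({
    'ZIP': [98101, 98102, 98104, 98111],
    'Neighborhood': ['Downtown', 'Capitol Hill', 'Downtown', 'Downtown'],
})

# How many ZIP codes map to each neighborhood name.
counts = zips['Neighborhood'].value_counts()
print(counts)

# Per-row labels that keep the ZIP code visible.
labels = zips['Neighborhood'] + ' (' + zips['ZIP'].astype(str) + ')'
print(list(labels))
```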


In [13]:
seattle_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Downtown | 47.611012 | -122.333523 | Timbuk2 | 47.612561 | -122.334223 | Accessories Store |
| 1 | Downtown | 47.611012 | -122.333523 | ACT Theatre | 47.610763 | -122.332905 | Theater |
| 2 | Downtown | 47.611012 | -122.333523 | Monorail Espresso | 47.610828 | -122.335048 | Coffee Shop |
| 3 | Downtown | 47.611012 | -122.333523 | The 5th Avenue Theatre | 47.608996 | -122.334162 | Theater |
| 4 | Downtown | 47.611012 | -122.333523 | Din Tai Fung Dumpling House | 47.612671 | -122.335073 | Dumpling Restaurant |
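The business problem asks for neighborhoods with few restaurants overall and no Italian restaurants nearby. One way to turn `seattle_venues` into that ranking is to count both per neighborhood and sort, least crowded first. This sketch rests on two assumptions. First, any `Venue Category` containing "Restaurant" counts as a restaurant, which misses categories such as "Pizza Place". Second, Italian venues carry the category name "Italian Restaurant". `restaurant_summary` is a hypothetical helper. The first five rows mirror `seattle_venues.head()`, and the "Example Venue" row is a made-up placeholder.

```python
import pandas as pd

# The first five rows mirror seattle_venues.head(); the last row is a placeholder.
venues = pd.DataFrame({
    'Neighborhood': ['Downtown'] * 5 + ['Delridge'],
    'Venue': ['Timbuk2', 'ACT Theatre', 'Monorail Espresso',
              'The 5th Avenue Theatre', 'Din Tai Fung Dumpling House', 'Example Venue'],
    'Venue Category': ['Accessories Store', 'Theater', 'Coffee Shop',
                       'Theater', 'Dumpling Restaurant', 'Italian Restaurant'],
})

def restaurant_summary(frame):
    """Count restaurants and Italian restaurants per neighborhood, least crowded first."""
    category = frame['Venue Category']
    flags = pd.DataFrame({
        'Neighborhood': frame['Neighborhood'],
        'Restaurants': category.str.contains('Restaurant', regex=False),
        'Italian Restaurants': category == 'Italian Restaurant',
    })
    # summing boolean flags counts the True values in each group
    summary = flags.groupby('Neighborhood').sum()
    return summary.sort_values(['Italian Restaurants', 'Restaurants'])

summary = restaurant_summary(venues)
print(summary)
```

Venues are collected within 500 m of each ZIP code's coordinates. A neighborhood name that spans several ZIP codes will therefore accumulate more venues, so you may want to divide the counts by the number of ZIP codes per neighborhood before comparing areas.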
