# Capstone Project - The Battle of Neighborhoods
---

## Table of Contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# Introduction: Business Problem <a name="introduction"></a>
---

In this project we will try to find an optimal location for a restaurant in Seattle, Washington. Specifically, it will be targeted on those interested in opening an Italian restaurant.

We will try to detect locations that are not already crowded with restaurants. We are also particularly interested in areas with no Italian restaurants in vicinity. We may take in consideration average incomes or total populations living in the areas to draw a final decision.

We will use our data science powers to generate a few most promising neighborhoods based on this criteria. Advantages of each area will then be clearly expressed so that best possible final location can be chosen by stakeholders.

# Data <a name="data"></a>
---

We are going to be using this data set with Seattle Neighborhoods by Zip Codes from this page 'http://www.agingkingcounty.org/wp-content/uploads/sites/185/2016/09/SubRegZipCityNeighborhood.pdf'. It is a pdf file containing excel spreadsheet "Sub-Regional, City and Neighborhood Designations by Zip Code". 

I used Adobe Acrobat to extract it into an excel file from the pdf. We are going to use only the data for City of Seattle and its Neighborhoods. We will need only the data from the first page in the section sorted by Seattle Neighborhood which contains all neighborhood in the City of Seattle and its adjacent suburbs and corresponding zip codes.

We will use Foursquare API to later extract data for venues in the corresponding zip codes. Also, we are going to retrieve different data by zip code from zip-codes.com database using their API.

### Loading and Extracting Data

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from bs4 import BeautifulSoup as bs

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from geopy.distance import distance

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
from matplotlib import pyplot as plt

from sklearn.cluster import KMeans 
from sklearn.neighbors import NearestNeighbors
import requests
from bs4 import BeautifulSoup as bs
import html5lib
print('Libraries imported.')

Libraries imported.


In [3]:
df = pd.read_excel('SubRegZipCityNeighborhood.xlsx')
df = df.rename(columns={'Seattle Neighborhood': 'Neighborhood'})
df = df.sort_values(by=['ZIP'])
df = df.reset_index(drop=True)
df

Unnamed: 0,ZIP,City Name,Sub Region,Neighborhood
0,98004,Bellevue,East Urban,Bellevue
1,98005,Bellevue,East Urban,Bellevue
2,98006,Bellevue,East Urban,Bellevue
3,98007,Bellevue,East Urban,Bellevue
4,98008,Bellevue,East Urban,Bellevue
5,98009,Bellevue,East Urban,Bellevue
6,98011,Bothell,North,Bothell
7,98015,Bellevue,East Urban,Bellevue
8,98028,Kenmore,North,Kenmore
9,98033,Kirkland,East Urban,Kirkland


Now we are going to use zip-codes.com and their API to obtain geospatial data for our zip codes from the dataframe

In [4]:
latitudes = []
longitudes = []
for code in df['ZIP']:
    info = requests.get('https://api.zip-codes.com/ZipCodesAPI.svc/1.0/QuickGetZipCodeDetails/{}?key=<AE7LR79I8JC8CNQRLGZF>'.format(code)).json()
    lati = info['Latitude']
    latitudes.append(lati)
    long = info['Longitude']
    longitudes.append(long)
df['Latitude'] = latitudes
df['Longitude'] = longitudes

In [5]:
df.head()

Unnamed: 0,ZIP,City Name,Sub Region,Neighborhood,Latitude,Longitude
0,98004,Bellevue,East Urban,Bellevue,47.617746,-122.210797
1,98005,Bellevue,East Urban,Bellevue,47.620068,-122.173086
2,98006,Bellevue,East Urban,Bellevue,47.552758,-122.150589
3,98007,Bellevue,East Urban,Bellevue,47.619741,-122.142986
4,98008,Bellevue,East Urban,Bellevue,47.60563,-122.108288


### Visualizing
We will map the neighborhoods on the map using the acquired geodata

In [6]:
import folium

address = 'Seattle, Washington'

geolocator = Nominatim(user_agent="seattle_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map = folium.Map(location=[latitude, longitude], zoom_start=10)
neighborhoods = df

for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['ZIP'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)  
    
map

### Connect to Foursquare API and extract the data for nearby venues in each neighborhood

In [7]:
CLIENT_ID = '3OWWV1IKR1G4UXICT2E5V4QS224B4HWYE5XJ0QBSESX0SP14' # your Foursquare ID
CLIENT_SECRET = 'WGZ0KKG1IVLJL424MEC3Z3I0AN3X5WY4TDAV5TI53SKT1ORD' # your Foursquare Secret
VERSION = '20180323' # Foursquare API version

In [8]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [10]:
seattle_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Bellevue
Bellevue
Bellevue
Bellevue
Bellevue
Bellevue
Bothell
Bellevue
Kenmore
Kirkland
Kirkland
Medina
Mercer Island
Bothell
Redmond
Redmond
Renton
Renton
Renton
Renton
Redmond
Bothell
Kirkland
Downtown
Capitol Hill
Lake Union
Downtown
Northeast
Delridge
Ballard
Duwamish
Queen Anne/Magnolia
Downtown
Capitol Hill
Downtown
Northeast
Southwest
Ballard
Southeast
Queen Anne/Magnolia
Downtown
Central
Duwamish
North
Delridge
Downtown
Northwest
Duwamish
Southwest
Tukwila
Southeast
Northeast
Southwest
SeaTac
Downtown
Shoreline
SeaTac
Shoreline
Downtown
Downtown
Seattle
Downtown
Northwest
Downtown
SeaTac
Downtown
Northeast
Queen Anne/Magnolia


In [11]:
seattle_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bellevue,47.617746,-122.210797,Oil & Vinegar,47.61607,-122.20484,Gourmet Shop
1,Bellevue,47.617746,-122.210797,QFC,47.618611,-122.205037,Supermarket
2,Bellevue,47.617746,-122.210797,Happy Lemon,47.61607,-122.204912,Bubble Tea Shop
3,Bellevue,47.617746,-122.210797,InSpa,47.616589,-122.204904,Spa
4,Bellevue,47.617746,-122.210797,Marketplace Cafe,47.617075,-122.204513,Café


Now let's just limit ourselves to the Venue category 'Italian Restaurant'

In [12]:
ital = seattle_venues[seattle_venues['Venue Category'] == 'Italian Restaurant']
ital = ital.reset_index(drop=True)
ital.drop_duplicates(inplace=True)
ital

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bellevue,47.6106,-122.1997,Pogacha Restaurant & Café,47.61066,-122.199308,Italian Restaurant
1,Bellevue,47.6106,-122.1997,Carmines Bellevue,47.611205,-122.203936,Italian Restaurant
2,Bellevue,47.6106,-122.1997,Cantinetta,47.610175,-122.204917,Italian Restaurant
6,Bothell,47.7602,-122.2044,Amaro,47.761626,-122.208009,Italian Restaurant
7,Redmond,47.680496,-122.120938,Tropea Ristorante Italiano,47.680132,-122.12355,Italian Restaurant
8,Redmond,47.680496,-122.120938,Blu Sardinia,47.681041,-122.125342,Italian Restaurant
9,Downtown,47.611012,-122.333523,Barolo Ristorante,47.614298,-122.337838,Italian Restaurant
10,Capitol Hill,47.635749,-122.324362,Serafina,47.63811,-122.325994,Italian Restaurant
11,Capitol Hill,47.635749,-122.324362,Cicchetti,47.638095,-122.326392,Italian Restaurant
12,Downtown,47.602134,-122.328431,Salumi,47.599052,-122.332822,Italian Restaurant


## Analysis <a name="analysis"></a>
---

At first, let's have a separate dataframe with neighborhoods that don't have an Italian restaurant

In [13]:
other = df
other = other[~other['Neighborhood'].isin(ital['Neighborhood'].values)]
other.head()

Unnamed: 0,ZIP,City Name,Sub Region,Neighborhood,Latitude,Longitude
8,98028,Kenmore,North,Kenmore,47.754876,-122.247104
9,98033,Kirkland,East Urban,Kirkland,47.673156,-122.197628
10,98034,Kirkland,East Urban,Kirkland,47.715193,-122.210637
11,98039,Medina,East Urban,Medina,47.627636,-122.24317
12,98040,Mercer Island,East Urban,Mercer Island,47.565229,-122.233149


Now we will map both the Italian Restaurants we have found in blue and the neighborhoods without any in red

In [14]:
address = 'Seattle, Washington'

geolocator = Nominatim(user_agent="seattle_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map = folium.Map(location=[latitude, longitude], zoom_start=10)
neighborhoods = ital

for lat, lng, venue, neighborhood in zip(neighborhoods['Venue Latitude'], neighborhoods['Venue Longitude'], neighborhoods['Venue'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(venue, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)  
    
for lat, lng, borough, neighborhood in zip(other['Latitude'], other['Longitude'], other['ZIP'], other['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#FF0000',
        fill_opacity=0.7,
        parse_html=False).add_to(map) 
    
map

Let's remove those which have an Italian Restaurant in a 4 mile radius too

In [15]:
import math

def haversine(coord1, coord2):
    R = 6372800  # Earth radius in meters
    lat1, lon1 = coord1
    lat2, lon2 = coord2
    
    phi1, phi2 = math.radians(lat1), math.radians(lat2) 
    dphi       = math.radians(lat2 - lat1)
    dlambda    = math.radians(lon2 - lon1)
    
    a = math.sin(dphi/2)**2 + \
        math.cos(phi1)*math.cos(phi2)*math.sin(dlambda/2)**2
    
    return 2*R*math.atan2(math.sqrt(a), math.sqrt(1 - a))

In [16]:
for code in other['ZIP']:
    x = (other.loc[other['ZIP'] == code]['Latitude'].values[0], other.loc[other['ZIP'] == code]['Longitude'].values[0])
    for venue in ital['Venue']:
        y = (ital.loc[ital['Venue'] == venue]['Venue Latitude'].values[0], ital.loc[ital['Venue'] == venue]['Venue Longitude'].values[0])
        d = haversine(x, y)
        if d < 6400: #4 mile radius
            other = other.loc[other['ZIP'] != code]

Which leaves us with:

In [30]:
other = other.reset_index(drop=True)
other

Unnamed: 0,Cluster Labels,ZIP,City Name,Sub Region,Neighborhood,Latitude,Longitude
0,2,98055,Renton,South Urban,Renton,47.45108,-122.196316
1,2,98056,Renton,South Urban,Renton,47.508872,-122.19496
2,2,98058,Renton,South Urban,Renton,47.435088,-122.116522
3,2,98059,Renton,East Urban,Renton,47.504128,-122.109663
4,0,98108,Seattle,Seattle,Duwamish,47.541083,-122.313312
5,0,98118,Seattle,Seattle,Southeast,47.541963,-122.267649
6,1,98125,Seattle,Seattle,North,47.715789,-122.293458
7,1,98133,Shoreline & Seattle,North & Seattle,Northwest,47.739569,-122.344948
8,0,98136,Seattle,Seattle,Southwest,47.53603,-122.393154
9,0,98146,Seattle,South & Seattle,Southwest,47.500346,-122.363335


### Clustering

We are going to use K-Means to cluster our zip code areas without an Italian Restaurant

In [18]:
seattle_clustering = other[['Latitude', 'Longitude']]
kclusters = 3
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(seattle_clustering)

In [19]:
other.insert(0, 'Cluster Labels', kmeans.labels_)
other

Unnamed: 0,Cluster Labels,ZIP,City Name,Sub Region,Neighborhood,Latitude,Longitude
16,2,98055,Renton,South Urban,Renton,47.45108,-122.196316
17,2,98056,Renton,South Urban,Renton,47.508872,-122.19496
18,2,98058,Renton,South Urban,Renton,47.435088,-122.116522
19,2,98059,Renton,East Urban,Renton,47.504128,-122.109663
30,0,98108,Seattle,Seattle,Duwamish,47.541083,-122.313312
38,0,98118,Seattle,Seattle,Southeast,47.541963,-122.267649
43,1,98125,Seattle,Seattle,North,47.715789,-122.293458
46,1,98133,Shoreline & Seattle,North & Seattle,Northwest,47.739569,-122.344948
48,0,98136,Seattle,Seattle,Southwest,47.53603,-122.393154
52,0,98146,Seattle,South & Seattle,Southwest,47.500346,-122.363335


In [20]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(other['Latitude'], other['Longitude'], other['Neighborhood'], other['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
for k in range(kclusters):
    label = folium.Popup('Cluster Centroid ' + str(k), parse_html=True)
    folium.CircleMarker(
        [kmeans.cluster_centers_[k, 0], kmeans.cluster_centers_[k, 1]],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='000000',
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Finding the Average Household Income and Total Population by Cluster

So we have the clusters and their respective centroids as reference locations for potential restaurants. To decide which prospective location to pick we are going to look at the average household income and population among clusters.

In [21]:
group0 = other[other['Cluster Labels'] == 0]
group1 = other[other['Cluster Labels'] == 1]
group2 = other[other['Cluster Labels'] == 2]

A function to retrieve household incomes and populations from zip codes in a dataframe. If there is one equal to zero it will be dropped from the dataframe and excluded from later calculations.

In [22]:
def getIncomePop(df):
    incomes = []
    population = []
    for code in df['ZIP']:
        info = requests.get('https://api.zip-codes.com/ZipCodesAPI.svc/1.0/GetZipCodeDetails/{}?key=<AE7LR79I8JC8CNQRLGZF>'.format(code)).json()
        income = info['item']['IncomePerHousehold']
        pop = info['item']['ZipCodePopulation']
        incomes.append(income)
        population.append(pop)
    df['Income'] = incomes
    df['Income'] = df['Income'].astype(float)
    df['Population'] = population
    df['Population'] = df['Population'].astype(int)
    df = df[df['Income'] != 0]
    df = df[df['Population'] != 0]
    return df

In [23]:
group0 = getIncomePop(group0)
group1 = getIncomePop(group1)
group2 = getIncomePop(group2)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # Remove the CWD from sys.path while we load stuff.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # This is added back by InteractiveShellApp.init_path()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if sys.path[0] == '':
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .

In [24]:
group0

Unnamed: 0,Cluster Labels,ZIP,City Name,Sub Region,Neighborhood,Latitude,Longitude,Income,Population
30,0,98108,Seattle,Seattle,Duwamish,47.541083,-122.313312,58526.0,22374
38,0,98118,Seattle,Seattle,Southeast,47.541963,-122.267649,72545.0,42731
48,0,98136,Seattle,Seattle,Southwest,47.53603,-122.393154,106240.0,14770
52,0,98146,Seattle,South & Seattle,Southwest,47.500346,-122.363335,67556.0,25922
53,0,98148,SeaTac,South Urban,SeaTac,47.446545,-122.321828,56044.0,10010
64,0,98188,SeaTac,South Urban,SeaTac,47.44821,-122.277851,57191.0,23111


In [25]:
group1

Unnamed: 0,Cluster Labels,ZIP,City Name,Sub Region,Neighborhood,Latitude,Longitude,Income,Population
43,1,98125,Seattle,Seattle,North,47.715789,-122.293458,64429.0,37081
46,1,98133,Shoreline & Seattle,North & Seattle,Northwest,47.739569,-122.344948,64001.0,44555
55,1,98155,"Lake Forest Park, Shoreline",North,Shoreline,47.755304,-122.295911,86525.0,32778
62,1,98177,Shoreline & Seattle,North & Seattle,Northwest,47.739168,-122.375316,109571.0,19030


In [26]:
group2

Unnamed: 0,Cluster Labels,ZIP,City Name,Sub Region,Neighborhood,Latitude,Longitude,Income,Population
16,2,98055,Renton,South Urban,Renton,47.45108,-122.196316,70647.0,21904
17,2,98056,Renton,South Urban,Renton,47.508872,-122.19496,85178.0,32489
18,2,98058,Renton,South Urban,Renton,47.435088,-122.116522,89608.0,41938
19,2,98059,Renton,East Urban,Renton,47.504128,-122.109663,102775.0,34463


Let's create a resulting dataframe with cluster centroids locations and respective average incomes and total populations for each cluster

In [27]:
avg_income = [group0['Income'].mean(), group1['Income'].mean(), group2['Income'].mean()]
total_pop = [group0['Population'].sum(), group1['Population'].sum(), group2['Population'].sum()]
data = {'Cluster': [0, 1, 2], 'Latitude': kmeans.cluster_centers_[:, 0], 
        'Longitude': kmeans.cluster_centers_[:, 1], 'Average Income': avg_income, 'Total Population': total_pop}
final = pd.DataFrame(data, columns=['Cluster', 'Latitude', 'Longitude', 'Average Income', 'Total Population'])
final

Unnamed: 0,Cluster,Latitude,Longitude,Average Income,Total Population
0,0,47.494956,-122.32045,69683.666667,138918
1,1,47.743946,-122.339707,81131.5,133444
2,2,47.474792,-122.154365,87052.0,130794


Here we can see that the cluster in Renton area has the highest average income, while the cluster north of Seattle is right behind. Also, the total populations in the area do not differ much.

In [37]:
from folium.features import DivIcon
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(other['Latitude'], other['Longitude'], other['Neighborhood'], other['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

for k in range(kclusters):
    label = folium.Popup('Cluster ' + str(k), parse_html=True)
    folium.CircleMarker(
        [kmeans.cluster_centers_[k, 0], kmeans.cluster_centers_[k, 1]],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='000000',
        fill_opacity=0.7).add_to(map_clusters)
    folium.Marker(location=[kmeans.cluster_centers_[k, 0], kmeans.cluster_centers_[k, 1]], 
                  icon=DivIcon(icon_size=(150,36), icon_anchor=(0,0),
        html='<div style="font-size: 16pt; color : {}">{}</div>'.format('black', 
                                                                        str(k)))).add_to(map_clusters)

       
map_clusters

## Results and Discussion <a name="results"></a>
---

Our analysis shows that although there are a lot of restaurants in Seattle Area there are not that many Italian Restaurants specifically. Moreover, a lot of them are highly concentrated in several neighborhoods like Seattle Downtown, Bellevue Downtown, or even South Lake Union and Ballard. This leaves out a big area in the south free and a smaller one in the north.

We narrowed down our attention on the areas which do not have any Italian Restaurants in a ~4 mile vicinity radius. Here it would be fair to point out that since we are dealing with non-uniform terrain - mainly with Lake Washington laying in the middle of the area we are looking at, then the method we used to compute distances may not be the best one in comparison with real drive route length. 

Basically, this left us with neighborhoods where we would not have any competition whatsoever. We used K-Means clustering algorithm with 3 clusters, which was deemed reasonable in this case. In general, areas nearby the centers of the clusters may be used as optimal locations for a restaurant.

In order to get some insight into which one of them could be the better one we decided to take a little look at the socioeconomic demographic available in the areas. Looking at the latter one might see Renton or Shoreline areas as more preferable, but not necessarily. 

## Conclusion <a name="conclusion"></a>
---

Purpose of this project was to identify Seattle areas with low number of restaurants (particularly Italian restaurants) in order to aid stakeholders in narrowing down the search for optimal location for a new Italian restaurant. By calculating restaurant density distribution from Foursquare data we have first identified general boroughs that justify further analysis, and then generated extensive collection of locations which satisfy some basic requirements regarding existing nearby restaurants. Clustering of those locations was then performed in order to create major zones of interest (containing greatest number of potential locations) and addresses of those zone centers were created to be used as starting points for final exploration by stakeholders.

Final decission on optimal restaurant location will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors like attractiveness of each location (proximity to park or water), levels of noise / proximity to major roads, real estate availability, prices, social and economic dynamics of every neighborhood etc.