# Capstone Project - The Battle of Neighborhoods 

### Introduction
Anyone who wants to open a new business needs to do some research first. Without proper information about the market, the venture is likely to run into trouble.  
Our client wants to invest in a new venture: a sushi bar. It will be located in Santiago de Chile, so we need to investigate which location is best.  

#### Business Problem
To decide where to open the sushi bar, we use Foursquare data on the communes (localities) of Santiago, Chile. The Santiago Metropolitan Region has 52 communes, and our challenge is to find the best one.  
For this we define our target audience:  

1. High schools
2. Universities
3. Offices

Counting these venues, together with existing sushi restaurants, helps ensure that we have enough potential customers and that we are not too close to other sushi places.

### Data
To find the best location for our sushi place, we will use the following sources of information:  

From Wikipedia (tables, extracted with BeautifulSoup):  
Locations: https://es.wikipedia.org/wiki/Anexo:Comunas_de_Chile_por_poblaci%C3%B3n  
Postal codes: https://es.wikipedia.org/wiki/Anexo:C%C3%B3digos_postales_de_Chile  

From a CSV file, the geolocation (latitude and longitude) of each locality: https://raw.githubusercontent.com/ssikam/My-Capstone-Project/master/chile%20geo%20public.csv  

From Foursquare, the venue categories: https://developer.foursquare.com/docs/resources/categories  

Sushi - 4bf58dd8d48988d1d2941735  
Highschool - 4bf58dd8d48988d13d941735  
University - 4bf58dd8d48988d1ae941735  
Office - 4d4b7105d754a06375d81259  
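
The four category IDs above can be kept in one place so that later API calls refer to them by name. A minimal sketch; the names `CATEGORY_IDS` and `category_id` are illustrative and do not appear in the notebook:

```python
# Foursquare category IDs listed above, keyed by a short label.
CATEGORY_IDS = {
    'sushi': '4bf58dd8d48988d1d2941735',
    'high_school': '4bf58dd8d48988d13d941735',
    'university': '4bf58dd8d48988d1ae941735',
    'office': '4d4b7105d754a06375d81259',
}


def category_id(label):
    """Return the Foursquare category ID for a label, or raise a clear error."""
    try:
        return CATEGORY_IDS[label]
    except KeyError:
        raise ValueError('unknown category label: {!r}'.format(label))
```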

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform a JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# for webscraping import Beautiful Soup 
from bs4 import BeautifulSoup

import xml

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')

In [48]:
# Extract the list of communes by population from Wikipedia using BeautifulSoup
wurl = requests.get("https://es.wikipedia.org/wiki/Anexo:Comunas_de_Chile_por_poblaci%C3%B3n").text
soup = BeautifulSoup(wurl,'lxml')

In [49]:
# Convert the extracted data into a readable format, i.e. a DataFrame
table_post = soup.find('table')
fields = table_post.find_all('td')

comuna = []
region = []
provincia = []
pob2017 = []
pos_nac = []
pos_reg = []
pos_pro = []
pob2002 = []
pob1992 = []



for i in range(0, len(fields), 9):
    comuna.append(fields[i].text.strip())
    region.append(fields[i+1].text.strip())
    provincia.append(fields[i+2].text.strip())
    pob2017.append(fields[i+3].text.strip())
    pos_nac.append(fields[i+4].text.strip())
    pos_reg.append(fields[i+5].text.strip())
    pos_pro.append(fields[i+6].text.strip())
    pob2002.append(fields[i+7].text.strip())
    pob1992.append(fields[i+8].text.strip())
   

df = pd.DataFrame(data=[region,comuna]).transpose()
df.columns = ['Borough','Neighborhood']
df.head()

Unnamed: 0,Borough,Neighborhood
0,Metropolitana de Santiago,Puente Alto
1,Metropolitana de Santiago,Maipú
2,Metropolitana de Santiago,Santiago
3,Metropolitana de Santiago,La Florida
4,Antofagasta,Antofagasta
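
The loop above assumes that every table row has exactly nine `<td>` cells. A row with merged or missing cells would shift later values into the wrong column or end in an `IndexError`. A hedged sketch of a stricter grouping step that catches only the simplest mismatch (a cell count that is not a multiple of the column count); `rows_from_cells` is a hypothetical helper, shown here on a two-column example with values taken from the table above:

```python
def rows_from_cells(cells, n_cols):
    """Group a flat list of cell texts into rows of n_cols, failing loudly on a mismatch."""
    if n_cols <= 0:
        raise ValueError('n_cols must be positive')
    if len(cells) % n_cols != 0:
        raise ValueError(
            '{} cells cannot be split into rows of {}'.format(len(cells), n_cols))
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]


# Flat list of cell texts: commune, region, commune, region, ...
cells = ['Puente Alto', 'Metropolitana de Santiago',
         'Maipú', 'Metropolitana de Santiago']
rows = rows_from_cells(cells, 2)
```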


In [37]:
df.shape

(346, 2)

In [50]:
df1 = df[(df.Borough == 'Metropolitana de Santiago')]
df1.shape

(52, 2)

In [51]:
# Extract the postal codes from Wikipedia using BeautifulSoup
url = requests.get("https://es.wikipedia.org/wiki/Anexo:C%C3%B3digos_postales_de_Chile").text 
soup2 = BeautifulSoup(url,'lxml')

In [52]:
# Convert the extracted data into a readable format, i.e. a DataFrame
table = soup2.find('table')
ff = table.find_all('td')

comuna = []
codigo = []
                   
for i in range(0, len(ff), 2):
    comuna.append(ff[i].text.strip())
    codigo.append(ff[i+1].text.strip())
    
df2 = pd.DataFrame(data=[comuna,codigo]).transpose()
df2.columns = ['Neighborhood','PostalCode']
df2.head()

Unnamed: 0,Neighborhood,PostalCode
0,Algarrobo,2710000
1,Alhué,9650000
2,Alto Biobío,4590000
3,Alto del Carmen,1650000
4,Alto Hospicio,1130000


In [53]:
# Read the geolocation CSV (latitude and longitude per locality) with pandas
url="https://raw.githubusercontent.com/ssikam/My-Capstone-Project/master/chile%20geo%20public.csv"
df3 = pd.read_csv(url, encoding="ISO-8859-1", sep=";", names=["Pais", "Region", "Ciudad", "Neighborhood", "Lat", "Lon"])
df3.head()

Unnamed: 0,Pais,Region,Ciudad,Neighborhood,Lat,Lon
0,Chile,Antofagasta,Antofagasta,Antofagasta,-23.651,-70.395
1,Chile,Antofagasta,Antofagasta,Mejillones,-23.11,-70.456
2,Chile,Antofagasta,Antofagasta,Sierra Gorda,-22.898,-69.323
3,Chile,Antofagasta,Antofagasta,Taltal,-25.41,-70.489
4,Chile,Antofagasta,El Loa,Calama,-22.474,-68.924


In [54]:
# Getting GeoLocation and merging it with our dataframe
dff = pd.merge(df1, df3[['Neighborhood','Lat', 'Lon']],
                       how='left', on=['Neighborhood'])
dff[39:52]

Unnamed: 0,Borough,Neighborhood,Lat,Lon
39,Metropolitana de Santiago,Talagante,-33.667,-70.931
40,Metropolitana de Santiago,Paine,-33.812,-70.723
41,Metropolitana de Santiago,Padre Hurtado,-33.576,-70.8
42,Metropolitana de Santiago,Isla de Maipo,-33.754,-70.886
43,Metropolitana de Santiago,El Monte,-33.684,-71.017
44,Metropolitana de Santiago,Curacaví,-33.399,-71.137
45,Metropolitana de Santiago,Pirque,-33.65,-70.564
46,Metropolitana de Santiago,Calera de Tango,-33.628,-70.785
47,Metropolitana de Santiago,Tiltil,-33.085,-70.925
48,Metropolitana de Santiago,San José de Maipo,-33.644,-70.353


In [55]:
dff.shape

(52, 4)
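
The left merge above matches communes by exact name, so a difference in accents, case, or spacing between the Wikipedia table and the CSV would leave `Lat` and `Lon` empty for that row. Whether this happens for this data is not shown in the notebook. As a precaution, the join keys could be normalized first. A minimal sketch using only the standard library; `normalize_name` is a hypothetical helper:

```python
import unicodedata


def normalize_name(name):
    """Lower-case a place name, strip accents, and collapse whitespace."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', name)
    without_accents = ''.join(
        ch for ch in decomposed if not unicodedata.combining(ch))
    return ' '.join(without_accents.lower().split())
```

Applied to both name columns before merging, this would make `Maipú` and `Maipu` compare equal. It also maps `ñ` to `n`, which is acceptable for matching but not for display.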

In [56]:
dff1 = dff.copy()
dff1.head(10)

Unnamed: 0,Borough,Neighborhood,Lat,Lon
0,Metropolitana de Santiago,Puente Alto,-33.616,-70.57
1,Metropolitana de Santiago,Maipú,-33.49,-70.788
2,Metropolitana de Santiago,Santiago,-33.425,-70.566
3,Metropolitana de Santiago,La Florida,-33.525,-70.538
4,Metropolitana de Santiago,San Bernardo,-33.582,-70.687
5,Metropolitana de Santiago,Las Condes,-33.4,-70.503
6,Metropolitana de Santiago,Peñalolén,-33.482,-70.538
7,Metropolitana de Santiago,Pudahuel,-33.411,-70.836
8,Metropolitana de Santiago,Quilicura,-33.361,-70.729
9,Metropolitana de Santiago,Ñuñoa,-33.454,-70.604


In [58]:
# Getting decimal coordinates of Santiago
address = 'Santiago, Chile'

geolocator = Nominatim(user_agent="capstoneProject")
location = geolocator.geocode(address, timeout=60, exactly_one=True)
latitude = location.latitude
longitude = location.longitude
print('The decimal coordinates of Santiago are {}, {}.'.format(latitude, longitude))

The decimal coordinates of Santiago are -33.4377968, -70.6504451.


In [59]:
# Plotting  Map of Santiago, Chile
map_stgo = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, local in zip(dff1['Lat'], dff1['Lon'], dff1['Neighborhood']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_stgo)  
    
map_stgo

In [60]:

def getNearbyVenues(names, latitudes, longitudes, radius=5000, categoryIds=''):
    """Query Foursquare for venues around each locality and return them as a DataFrame."""
    columns = ['Localidad',
               'Localidad Latitude',
               'Localidad Longitude',
               'Venue',
               'Venue Latitude',
               'Venue Longitude',
               'Venue Category']
    venues_list = []

    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # optionally restrict the search to the given category ID(s)
        if categoryIds != '':
            url = url + '&categoryId={}'.format(categoryIds)

        # make the GET request
        response = requests.get(url).json()
        try:
            results = response['response']['venues']
        except KeyError:
            # print the failing request for debugging and skip this locality
            print(url)
            print(response)
            continue

        # keep only venues that have at least one category
        for v in results:
            if not v.get('categories'):
                continue
            venues_list.append((
                name,
                lat,
                lng,
                v['name'],
                v['location']['lat'],
                v['location']['lng'],
                v['categories'][0]['name']
            ))

    return pd.DataFrame(venues_list, columns=columns)
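
The request URL above is assembled with string formatting. A sketch of the same query built from a parameter dictionary with the standard library's `urlencode`, which also escapes the values; `build_search_url` and the placeholder credentials `'ID'` and `'SECRET'` are illustrative and not part of the notebook:

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, version, lat, lng,
                     radius, limit, category_id=''):
    """Build the venue search URL; category_id is added only when given."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    if category_id:
        params['categoryId'] = category_id
    return BASE_URL + '?' + urlencode(params)


# Example: sushi restaurants within 1000 m of the Puente Alto coordinates.
url = build_search_url('ID', 'SECRET', '20181020', -33.616, -70.57, 1000, 50,
                       category_id='4bf58dd8d48988d1d2941735')
query = parse_qs(urlparse(url).query)
```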

In [61]:
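LIMIT = 500  # requested maximum; the Offices counts further down top out at 50, which suggests the API caps results per call
radius = 5000  # not used by the calls below, which pass radius=1000 explicitly
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # credential redacted; supply your own
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # credential redacted; supply your own
VERSION = '20181020'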

In [63]:
# Making Foursquare call for sushi Restaurant
#https://developer.foursquare.com/docs/resources/categories
#Sushi = 4bf58dd8d48988d1d2941735
stgo_venues_sushi = getNearbyVenues(names=dff1['Neighborhood'], latitudes=dff1['Lat'], longitudes=dff1['Lon'], radius=1000, categoryIds='4bf58dd8d48988d1d2941735')
stgo_venues_sushi.head()

Unnamed: 0,Localidad,Localidad Latitude,Localidad Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Puente Alto,-33.616,-70.57,Sushi Han’ El Delivery,-33.610953,-70.57273,Sushi Restaurant
1,Puente Alto,-33.616,-70.57,Fi Sushi,-33.609349,-70.570438,Sushi Restaurant
2,Puente Alto,-33.616,-70.57,Sushi Illadi,-33.612373,-70.574906,Sushi Restaurant
3,Puente Alto,-33.616,-70.57,Mazushi,-33.612084,-70.576012,Sushi Restaurant
4,Puente Alto,-33.616,-70.57,Sushi Bar Otai,-33.609399,-70.575004,Sushi Restaurant


In [64]:
stgo_venues_sushi.shape

(186, 7)

In [65]:
def addToMap(df, color, existingMap):
    for lat, lng, local, venue, venueCat in zip(df['Venue Latitude'], df['Venue Longitude'], df['Localidad'], df['Venue'], df['Venue Category']):
        label = '{} ({}) - {}'.format(venue, venueCat, local)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(existingMap)

In [66]:
# Plotting Sushi Restaurant
map_stgo_sushi = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(stgo_venues_sushi, 'red', map_stgo_sushi)
map_stgo_sushi

In [71]:
#Making Foursquare call for Highschools
#Highschools = 4bf58dd8d48988d13d941735
stgo_venues_highschools = getNearbyVenues(names=dff1['Neighborhood'], latitudes=dff1['Lat'], longitudes=dff1['Lon'], radius=1000, categoryIds='4bf58dd8d48988d13d941735')
stgo_venues_highschools.head()

Unnamed: 0,Localidad,Localidad Latitude,Localidad Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Puente Alto,-33.616,-70.57,Complejo Eduacional Consolidada,-33.611524,-70.571431,High School
1,Puente Alto,-33.616,-70.57,Colegio Polivalente Domingo Matte Mesias,-33.608298,-70.57975,High School
2,Puente Alto,-33.616,-70.57,Colegio Nidal,-33.611392,-70.573604,High School
3,Puente Alto,-33.616,-70.57,Colegio Tacora,-33.609515,-70.570331,High School
4,Maipú,-33.49,-70.788,Colegio Alicante El Rosal,-33.490642,-70.779636,High School


In [68]:
stgo_venues_highschools.shape

(137, 7)

In [72]:
#Plotting HighSchools
map_stgo_highschools = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(stgo_venues_highschools, 'green', map_stgo_highschools)
map_stgo_highschools

In [77]:
# Making Foursquare call for Colleges/University
stgo_venues_uni = getNearbyVenues(names=dff1['Neighborhood'], latitudes=dff1['Lat'], longitudes=dff1['Lon'], radius=1000, categoryIds='4bf58dd8d48988d1ae941735')
stgo_venues_uni.head()

Unnamed: 0,Localidad,Localidad Latitude,Localidad Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Santiago,-33.425,-70.566,Sala de Lenguaje,-33.425443,-70.569583,College Auditorium
1,Santiago,-33.425,-70.566,Sala de sonido-Incacea,-33.420655,-70.557696,University
2,Santiago,-33.425,-70.566,Casa Estudios,-33.418552,-70.566596,University
3,Santiago,-33.425,-70.566,Aula 3- Incacea,-33.417995,-70.556946,University
4,Las Condes,-33.4,-70.503,Escuela de Comunicaciones DuocUC,-33.400398,-70.505778,University


In [74]:
stgo_venues_uni.shape

(137, 7)

In [75]:
map_stgo_universities = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(stgo_venues_uni, 'gold', map_stgo_universities)
map_stgo_universities

In [76]:
# Making Foursquare call for Office Venues
stgo_venues_office = getNearbyVenues(names=dff1['Neighborhood'], latitudes=dff1['Lat'], longitudes=dff1['Lon'], radius=1000, categoryIds='4d4b7105d754a06375d81259')
stgo_venues_office.head()

Unnamed: 0,Localidad,Localidad Latitude,Localidad Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Puente Alto,-33.616,-70.57,Dimao Municipalidad Puente Alto,-33.610656,-70.57634,Distribution Center
1,Puente Alto,-33.616,-70.57,Volcán Planta Concha y Toro,-33.620133,-70.573563,Office
2,Puente Alto,-33.616,-70.57,Molino Puente Alto,-33.612333,-70.573923,Factory
3,Puente Alto,-33.616,-70.57,CMPC Tissue S.A.,-33.611117,-70.565389,Office
4,Puente Alto,-33.616,-70.57,Easy Dent,-33.608245,-70.574998,Dentist's Office


In [78]:
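def addColumn(startDf, columnTitle, dataDf):
    """Add a column to startDf with the number of venues per locality found in dataDf."""
    grouped = dataDf.groupby('Localidad').count()

    for n in startDf['Localidad']:
        try:
            startDf.loc[startDf['Localidad'] == n, columnTitle] = grouped.loc[n, 'Venue']
        except KeyError:
            # no venues of this type were returned for the locality
            startDf.loc[startDf['Localidad'] == n, columnTitle] = 0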
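
`addColumn` counts, for each locality, how many venues of one type were returned, and uses 0 when none were found. The same idea in plain Python, as an illustration only; `count_per_locality` is a hypothetical name, and the rows are a small subset of the high school results shown earlier:

```python
from collections import Counter


def count_per_locality(localities, venue_rows):
    """Count venues per locality; localities without venues get 0."""
    counts = Counter(locality for locality, _venue in venue_rows)
    # A Counter returns 0 for keys it has never seen.
    return {locality: counts[locality] for locality in localities}


venue_rows = [
    ('Puente Alto', 'Colegio Nidal'),
    ('Puente Alto', 'Colegio Tacora'),
    ('Maipú', 'Colegio Alicante El Rosal'),
]
result = count_per_locality(['Puente Alto', 'Maipú', 'La Florida'], venue_rows)
```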

In [79]:
df_data = dff1.copy()
df_data.rename(columns={'Neighborhood':'Localidad'}, inplace=True)
addColumn(df_data, 'Sushi', stgo_venues_sushi)
addColumn(df_data, 'High Schools', stgo_venues_highschools)
addColumn(df_data, 'Universities', stgo_venues_uni)
addColumn(df_data, 'Offices', stgo_venues_office)
df_data.head()

Unnamed: 0,Borough,Localidad,Lat,Lon,Sushi,High Schools,Universities,Offices
0,Metropolitana de Santiago,Puente Alto,-33.616,-70.57,6.0,4.0,0.0,50.0
1,Metropolitana de Santiago,Maipú,-33.49,-70.788,2.0,1.0,0.0,16.0
2,Metropolitana de Santiago,Santiago,-33.425,-70.566,6.0,9.0,4.0,50.0
3,Metropolitana de Santiago,La Florida,-33.525,-70.538,0.0,0.0,0.0,19.0
4,Metropolitana de Santiago,San Bernardo,-33.582,-70.687,3.0,6.0,0.0,48.0


In [80]:
# negative weight: existing sushi places are competition
weight_sushi = -1

# positive weights, in ascending order of importance
weight_schools = 1
weight_uni = 2
weight_offices = 3

In [81]:
df_weighted = df_data[['Localidad']].copy()

In [82]:
# weighted sum of the venue counts per commune
df_weighted['Score'] = (df_data['Sushi'] * weight_sushi
                        + df_data['High Schools'] * weight_schools
                        + df_data['Universities'] * weight_uni
                        + df_data['Offices'] * weight_offices)
df_weighted = df_weighted.sort_values(by=['Score'], ascending=False)
# rows 39 onward of the ranking, i.e. the lowest-scoring communes
df_weighted[39:52]

Unnamed: 0,Localidad,Score
41,Padre Hurtado,27.0
46,Calera de Tango,24.0
40,Paine,21.0
7,Pudahuel,12.0
50,San Pedro,6.0
45,Pirque,3.0
20,La Granja,3.0
38,Cerrillos,0.0
37,San Ramón,0.0
34,Peñaflor,0.0
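
The score combines the four counts with the weights defined earlier: sushi places count against a commune (weight -1), while high schools (1), universities (2), and offices (3) count in its favour. A small plain-Python sketch of that formula, applied to the first five rows of `df_data` shown above; `WEIGHTS` and `score` are illustrative names:

```python
# Weights as defined in the notebook.
WEIGHTS = {'Sushi': -1, 'High Schools': 1, 'Universities': 2, 'Offices': 3}


def score(counts):
    """Weighted sum of the venue counts for one commune."""
    return sum(WEIGHTS[key] * counts[key] for key in WEIGHTS)


# Venue counts copied from the first five rows of df_data.
rows = {
    'Puente Alto': {'Sushi': 6, 'High Schools': 4, 'Universities': 0, 'Offices': 50},
    'Maipú': {'Sushi': 2, 'High Schools': 1, 'Universities': 0, 'Offices': 16},
    'Santiago': {'Sushi': 6, 'High Schools': 9, 'Universities': 4, 'Offices': 50},
    'La Florida': {'Sushi': 0, 'High Schools': 0, 'Universities': 0, 'Offices': 19},
    'San Bernardo': {'Sushi': 3, 'High Schools': 6, 'Universities': 0, 'Offices': 48},
}

# Communes ordered from highest to lowest score.
ranking = sorted(rows, key=lambda name: score(rows[name]), reverse=True)
```

For Puente Alto this gives -6 + 4 + 0 + 150 = 148. Because offices carry the largest weight and their counts reach 50, they dominate the score in this sample.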


In [83]:
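# Map the selected commune with its surrounding venues
map_stgo_result = folium.Map(location=[latitude, longitude], zoom_start=15)

# 'Providencia' is hard-coded here rather than read from df_weighted
stgo_win = dff1[dff1['Neighborhood'] == 'Providencia']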

for lat, lng, local in zip(stgo_win['Lat'], stgo_win['Lon'], stgo_win['Neighborhood']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_stgo_result) 

addToMap(stgo_venues_sushi[stgo_venues_sushi['Localidad'] == 'Providencia'], 'red', map_stgo_result)
addToMap(stgo_venues_highschools[stgo_venues_highschools['Localidad'] == 'Providencia'], 'green', map_stgo_result)
addToMap(stgo_venues_uni[stgo_venues_uni['Localidad'] == 'Providencia'], 'gold', map_stgo_result)
addToMap(stgo_venues_office[stgo_venues_office['Localidad'] == 'Providencia'], 'fuchsia', map_stgo_result)

map_stgo_result