# Capstone Project - The Battle of the Neighborhoods

This notebook is for the <a href="https://www.coursera.org/professional-certificates/ibm-data-science">IBM Data Science Professional Certificate</a>.

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a hair salon. Specifically, this report will be targeted to stakeholders interested in opening a **Hair Salon** in **Boise, Idaho**.

Since Boise already has many hair salons, we will try to detect **locations that are not already crowded with Hair Salons**. We are also particularly interested in **areas with no Hair Salons in the vicinity**. We would also prefer locations **as close to the city center as possible**, assuming that the first two conditions are met.

We will use data science techniques to identify the most promising neighborhoods based on the above criteria. The advantages of each area will then be clearly stated so that stakeholders can choose the best possible final location.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing hair salons in the neighborhood
* distance to the nearest hair salons in the neighborhood, if any
* distance of the neighborhood from city center

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.
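A regularly spaced grid like this could be generated along the following lines. This is only a minimal sketch: the function name `make_grid`, the 1 km spacing, and the 5 x 5 extent are illustrative assumptions, and the conversion from meters to degrees uses a simple flat-earth approximation that is reasonable only over city-sized areas.

```python
import math


def make_grid(center_lat, center_lon, spacing_m=1000.0, steps=2):
    """Return (lat, lon) points on a square grid centered on the given point.

    Assumes a local flat-earth approximation: one degree of latitude is
    taken as roughly 111,320 m, and a degree of longitude shrinks by
    cos(latitude).
    """
    meters_per_deg_lat = 111_320.0
    meters_per_deg_lon = meters_per_deg_lat * math.cos(math.radians(center_lat))

    points = []
    for i in range(-steps, steps + 1):        # rows, south to north
        for j in range(-steps, steps + 1):    # columns, west to east
            lat = center_lat + (i * spacing_m) / meters_per_deg_lat
            lon = center_lon + (j * spacing_m) / meters_per_deg_lon
            points.append((lat, lon))
    return points


# Boise center as returned by the geocoding cell in this notebook
grid = make_grid(43.6166163, -116.200886, spacing_m=1000.0, steps=2)
print(len(grid))  # 25 points: a 5 x 5 grid
```

The middle element of the list is the city center itself, and every other point is offset from it by a whole number of grid steps.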

The following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **Google Maps API reverse geocoding**
* the number of hair salons and their locations in every neighborhood will be obtained using the **Foursquare API**
* the coordinates of the Boise city center will be obtained by geocoding the address "Boise, Idaho" with geopy's **Nominatim** geocoder
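The "distance from city center" factor needs a way to turn two latitude/longitude pairs into a distance. One common choice is the haversine formula, sketched below. The helper name `haversine_km`, the mean Earth radius of 6371 km, and the candidate point are assumptions made for illustration and are not taken from the notebook.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius (assumed constant)
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Boise center as returned by the geocoding cell in this notebook
center = (43.6166163, -116.200886)
# Hypothetical candidate location, for illustration only
candidate = (43.60, -116.25)

distance = haversine_km(center[0], center[1], candidate[0], candidate[1])
print("Distance to center: {:.2f} km".format(distance))
```

Applying the same function to each candidate location gives a distance value that can be used to rank locations by closeness to the center.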

### Make the necessary imports

In [1]:
import sys

# Install geopy into the notebook's environment before importing it
!{sys.executable} -m pip install geopy

import re

import requests
import pandas as pd
import numpy as np
import folium
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim



### Get the zipcodes from www.zip-codes.com

In [3]:
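link = "https://www.zip-codes.com/city/id-boise.asp"
r = requests.get(link)
r.raise_for_status()  # stop early if the page could not be fetched

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(r.content, "html.parser")
print("Zip codes loaded in.")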

Zip codes loaded in.


In [4]:
# Link texts look like "ZIP Code 83702"; capture the five digits
p = re.compile(r"ZIP Code (\d{5})")

table_data = soup.find('table', attrs={'class': 'statTable'})
content = table_data.find_all('a')

zip_code = []

for link_tag in content:
    match = p.match(link_tag.text)
    if match is not None:
        zip_code.append(int(match.group(1)))

# One-column DataFrame of integer zip codes
zips = pd.DataFrame({"Zip": zip_code})
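To see what the pattern keeps and what it skips without fetching the page, the extraction step can be tried on a few strings. The link texts below are hypothetical stand-ins for the anchors scraped from the table, not actual page content.

```python
import re

# Same pattern as in the cell above, written as a raw string
zip_pattern = re.compile(r"ZIP Code (\d{5})")

# Hypothetical link texts, for illustration only
sample_links = ["ZIP Code 83702", "Boise, ID", "ZIP Code 83706", "ZIP Code map"]

zip_codes = []
for text in sample_links:
    match = zip_pattern.match(text)
    if match is not None:
        # group(1) is the five-digit string captured by (\d{5})
        zip_codes.append(int(match.group(1)))

print(zip_codes)  # [83702, 83706]
```

Only texts that start with "ZIP Code " followed by five digits are kept; other links in the table are ignored.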

### Import the lat lon coordinates 

In [5]:
link_lat = (
    "https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/download/" 
    "?format=csv&timezone=America/Mexico_City&lang=en&use_labels_for_header=true&csv_separator=%3B"
    )

df_lat_lon = pd.read_csv(link_lat, sep=";")
print("Latitude and Longitude data loaded in.")

Latitude and Longitude data loaded in.


In [6]:
# Keep only the rows for Boise zip codes, then drop the columns we do not need
df_merged = pd.merge(zips, df_lat_lon, on="Zip", how="inner")
df_merged.drop(columns=["City", "State", "Timezone", "Daylight savings time flag", "geopoint"], inplace=True)

In [7]:
address = 'Boise, Idaho'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Boise, Idaho are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Boise, Idaho are 43.6166163, -116.200886.


In [12]:
# create map of Boise using latitude and longitude values
map_boise = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
# zip_value avoids shadowing the zip_code list built earlier
for lat, lng, zip_value in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Zip']):
    label = '{}'.format(zip_value)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_boise)
    
map_boise