# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report is aimed at stakeholders interested in opening a **vegan restaurant** in **Goiânia**, Brazil.

Since there aren't many vegan restaurants in Goiânia, we will try to detect **locations that are most likely to welcome a new vegan restaurant** by looking at places that already have this kind of restaurant and identifying similar neighborhoods. We are also particularly interested in **areas as close to the city center as possible**.

We will use data science techniques to produce a short list of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly stated so that stakeholders can choose the best possible final location.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:

* number of existing restaurants in the neighborhood (any type of restaurant)
* number of vegan restaurants in the neighborhood, if any
* similarity between neighborhoods
* distance of the neighborhood from the city center
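
The last factor, distance from the city center, can be computed directly from latitude/longitude pairs. The block below is a minimal sketch using the haversine (great-circle) formula; the function name `haversine_km` and the mean Earth radius constant are illustrative choices, not something taken from this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Applied to each neighborhood's coordinates and the coordinates of the city center, this yields one distance value per neighborhood that can be used for ranking.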
    
The following data sources will be needed to extract or generate the required information:

* centers of candidate areas will be generated algorithmically, and approximate addresses of the centers of those areas will be obtained using **Google Maps API reverse geocoding**
* the number of restaurants and their type and location in every neighborhood will be obtained using the Foursquare API
* the coordinates of the Goiânia city center will be obtained using Google Maps API geocoding
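
One way to generate candidate area centers algorithmically is to lay a regular grid over the city and keep only the points within a given radius of the center. The following is a rough sketch under that assumption; the function name `candidate_centers`, the square grid layout, and the radius and step parameters are all hypothetical and are not prescribed by this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def candidate_centers(center_lat, center_lon, radius_km, step_km):
    """Return (lat, lon) points on a square grid within radius_km of the center."""
    n = int(radius_km // step_km)  # number of grid steps on each side of the center
    points = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dy, dx = i * step_km, j * step_km  # north and east offsets in km
            if math.hypot(dx, dy) > radius_km:
                continue  # skip grid corners that fall outside the circle
            # Convert the km offsets to degrees (small-distance approximation)
            lat = center_lat + math.degrees(dy / EARTH_RADIUS_KM)
            lon = center_lon + math.degrees(
                dx / (EARTH_RADIUS_KM * math.cos(math.radians(center_lat)))
            )
            points.append((lat, lon))
    return points
```

Each returned point could then be passed to a reverse geocoder to obtain an approximate address for that candidate area.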

### Neighborhood Candidates

1. Scraping the page [https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=334547&class=boundary] to import Goiânia borough data
2. Pre-processing borough data
3. Getting borough data coordinates
4. Pre-processing coordinate data

#### 1. Scraping page to import Goiânia borough data

Importing the necessary libraries

In [1]:
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.firefox.options import Options

import pandas as pd
import numpy as np

Scraping and cleaning the page

In [2]:
# URL of the OpenStreetMap Nominatim details page for Goiânia
url = r"https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=334547&class=boundary"

# Start Chrome, passing the path to chromedriver
driver = webdriver.Chrome(r"C:\Users\ErikaS\Documents\Projetos Érika\Coursera__Capstone/chromedriver")

# Open the page
driver.get(url)

# Select the correct div and extract the table in HTML format
driver.find_element_by_xpath("/html/body/div[2]/div[3]/div/table").click()
element = driver.find_element_by_xpath("/html/body/div[2]/div[3]/div/table")
html_content = element.get_attribute('outerHTML')

# Close the browser
driver.quit()

# Parse the HTML and isolate the table
soup = BeautifulSoup(html_content, 'html.parser')
table = soup.find(name='table')

# Use pandas to load the table into a DataFrame
df_full = pd.read_html(str(table))[0]


In [3]:
df_full.head()

| | Local name | Type | OSM | Address rank | Admin level | Distance | Unnamed: 6 |
|---|---|---|---|---|---|---|---|
| 0 | Goiânia | boundary:administrative | relation 334547 | 16.0 | 8.0 | 0.0 | details > |
| 1 | Microrregião de Goiânia | boundary:administrative | relation 4857379 | 14.0 | 7.0 | 0.1012 | details > |
| 2 | Região Geográfica Intermediária de Goiânia | boundary:administrative | relation 4873222 | 10.0 | 5.0 | 0.8761 | details > |
| 3 | Goiás | boundary:administrative(state) | relation 334443 | 8.0 | 4.0 | 0.7385 | details > |
| 4 | Região Centro-Oeste | boundary:administrative | relation 3359944 | 6.0 | 3.0 | 5.2385 | details > |


#### 2. Pre-processing borough data

In [4]:
# Keep only the boroughs (address rank above 17)
borough_gyn = df_full[df_full['Address rank'] > 17.0].sort_values("Local name")

# Keep only the useful columns
borough_gyn = borough_gyn.loc[:, ['Local name', 'Distance']]

# Clean the Distance column: strip "~", " km" and " m", then convert to float
# (any value given in metres keeps its numeric value and is not rescaled to kilometres)
borough_gyn['Distance'] = (
    borough_gyn['Distance']
    .str.replace(" km", "")
    .str.replace("~", "")
    .str.replace(" m", "")
    .astype(float)
)

# Collapse duplicate names, keeping the largest distance for each
borough_gyn = borough_gyn.groupby('Local name').max()
borough_gyn.reset_index(level=0, inplace=True)

borough_gyn.head()

| | Local name | Distance |
|---|---|---|
| 0 | Aldeia do Vale | 8.7 |
| 1 | Alphaville Flamboyant Residencial Araguaia | 3.8 |
| 2 | Bairro Boa Vista | 13.6 |
| 3 | Bairro Capuava | 6.6 |
| 4 | Bairro Feliz | 3.0 |
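
The distance-cleaning step strips the `~`, ` km` and ` m` markers, so a value expressed in metres would end up on the same scale as one expressed in kilometres. If the scraped column does mix both units, a unit-aware parser is one possible alternative. The sketch below assumes strings shaped like `"~8.7 km"` or `"~300 m"` (inferred from the markers removed during cleaning); the helper name `parse_distance_km` is hypothetical.

```python
import re

# Optional "~", a number, then a unit of "km" or "m" (assumed string format)
_DISTANCE_RE = re.compile(r"^~?\s*([0-9]*\.?[0-9]+)\s*(km|m)$")


def parse_distance_km(text):
    """Convert a distance string such as '~8.7 km' or '~300 m' to kilometres."""
    match = _DISTANCE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognised distance string: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    # Rescale metres to kilometres; leave kilometre values untouched
    return value if unit == "km" else value / 1000.0
```

With pandas, such a helper could be applied to the raw column through `Series.map` in place of the chained `str.replace` calls.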


In [5]:
borough_gyn.shape

(333, 2)

Saving the data to use later

In [6]:
borough_gyn.to_csv("bairros_gyn.csv", index=False)

#### Importing Borough data

In [None]:
borough_gyn = pd.read_csv("bairros_gyn.csv")
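# Build full address strings such as "Jardim Califórnia, Goiânia, GO"
# (bairros_gyn.csv was saved with the columns 'Local name' and 'Distance')
Endere = borough_gyn['Local name'] + ", Goiânia, GO"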
Endere.head()

#### 3. Getting borough data coordinates

Importing libraries to get geocoordinates

In [None]:
import geopandas as gpd
import pandas as pd

Using Nominatim API to get the coordinates

In [None]:
# Testing
gpd.tools.geocode("Jardim Califórnia, Goiânia, GO", provider = 'nominatim', user_agent="imp geocode")
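
To go from this single test lookup to the full list of addresses, the requests need to be spaced out, since the public Nominatim service asks for at most one request per second. The following is a minimal sketch of such a loop; `geocode_all`, its parameters, and the choice to store `None` for failed lookups are assumptions made for illustration, and `geocode_fn` stands for any callable that takes an address string and returns a result.

```python
import time


def geocode_all(addresses, geocode_fn, delay_s=1.0, sleep=time.sleep):
    """Geocode each distinct address once, pausing between requests.

    geocode_fn is a caller-supplied callable (hypothetical here); addresses
    whose lookup raises an exception are recorded as None.
    """
    results = {}
    for address in addresses:
        if address in results:
            continue  # do not repeat a lookup for a duplicate address
        try:
            results[address] = geocode_fn(address)
        except Exception:
            results[address] = None  # keep going if one lookup fails
        sleep(delay_s)  # throttle to respect the service's rate limit
    return results
```

In this notebook, `geocode_fn` could wrap the same `gpd.tools.geocode(..., provider='nominatim', user_agent=...)` call used in the test cell.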

In [None]:
coor

### Foursquare

## Methodology <a name="methodology"></a>

## Analysis <a name="analysis"></a>

## Results and Discussion <a name="results"></a>