# Capstone Project - The Battle of the Neighborhoods (Week 2) #
## Applied Data Science Capstone by IBM/Coursera ##
### Fernando Tauscheck ###

## Table of contents
1. [Introduction: Business Problem](#introduction)<br>
2. [Data](#data)<br>
 2.1 [Geographic Data:](#data_geolocated)<br>

## 1. Introduction: Business Problem <a name="introduction"></a>

What defines the success of a commercial business? Is location an important factor?

In this project we will try to understand how location affects a business. Although the analysis can, in theory, be replicated for any type of business, we will focus on stakeholders interested in opening a **bakery** in Curitiba, Brazil.

Since there are many bakeries in Curitiba, we will try to detect **the geographic and social characteristics of the existing bakeries and use them to classify the best unexplored locations that share these characteristics.**

As a result, we will try to provide our stakeholders with a list of the best possible locations to open a bakery.

## 2. Data: <a name="data"></a>


Several factors will influence our decision:
* the number of existing bakeries in the neighborhood (or of related stores)
* socioeconomic data of Curitiba's neighborhoods (per capita income, population density, ...)
* zones from the City Master Plan
* proximity to high-traffic streets and avenues
* if possible, the rating (stars) of each bakery, to understand whether location and rating are correlated
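Several of these factors (nearby bakeries, proximity to major avenues) reduce to counting points within some distance of a candidate location. A minimal sketch, assuming venues are available as `(lat, lon)` pairs in WGS84; the function names, the 500 m radius and the coordinates below are illustrative, not taken from the project:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_nearby(candidate, venues, radius_m=500):
    """Number of venues (lat, lon) lying within radius_m of the candidate (lat, lon)."""
    lat, lon = candidate
    return sum(1 for v_lat, v_lon in venues if haversine_m(lat, lon, v_lat, v_lon) <= radius_m)


# Hypothetical bakery coordinates: one about 100 m away, one about 1.8 km away
candidate = (-25.4041, -49.2643)
bakeries = [(-25.4050, -49.2643), (-25.4200, -49.2643)]
print(count_nearby(candidate, bakeries, radius_m=500))  # 1
```

The haversine formula is accurate enough at neighborhood scale; for many candidates and venues, the same count could instead be pushed down to MySQL's spatial functions.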

### 2.1 Geographic Data: <a name="data_geolocated"></a> ###

We will obtain geographic information about Curitiba from the website of the **Instituto de Pesquisa e Planejamento Urbano de Curitiba** (Institute of Urban Planning and Research of Curitiba, also known as IPPUC). The Institute provides maps of the City Master Plan zones and of the main streets. These maps are distributed in ESRI Shapefile (SHP) format and are converted to GeoJSON using the WGS84 coordinate reference system.
These GeoJSON files will be loaded into an RDBMS (MySQL 8.0), where we will use its spatial analysis functions. All support scripts and the table structures are available in the project's GitHub repository (https://github.com/ftauscheck/The-Battle-of-the-Neighborhoods/tree/main/support).
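As an illustration of the loading step, the sketch below turns each GeoJSON feature into a parameter tuple for a parameterized `INSERT` that calls MySQL's `ST_GeomFromGeoJSON()`, which by default tags the geometry with SRID 4326 (WGS84). The table `project.zones` and its columns are hypothetical; the real schema is in the repository's support folder.

```python
import json

# Hypothetical target table: project.zones (sg_zone VARCHAR, geom GEOMETRY SRID 4326)
INSERT_SQL = ("INSERT INTO project.zones (sg_zone, geom) "
              "VALUES (%s, ST_GeomFromGeoJSON(%s))")


def zone_rows(geojson):
    """Yield one (zone code, geometry as GeoJSON text) tuple per feature."""
    for feature in geojson["features"]:
        yield (feature["properties"]["SG_ZONA"], json.dumps(feature["geometry"]))


# Tiny hand-made feature collection standing in for support/zones.geojson
sample = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"SG_ZONA": "ZONE-A"},  # hypothetical zone code
        "geometry": {"type": "Polygon",
                     "coordinates": [[[-49.30, -25.50], [-49.20, -25.50],
                                      [-49.20, -25.30], [-49.30, -25.50]]]},
    }],
}
rows = list(zone_rows(sample))
print(rows[0][0])  # ZONE-A
```

With an open MySQLdb connection, the rows could then be written with `cur.executemany(INSERT_SQL, rows)` followed by `db.commit()`.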

```python
import configparser
import json

import MySQLdb
import pandas as pd
import plotly.express as px

# Retrieve the database configuration from config.ini
config = configparser.ConfigParser()
config.read('config.ini')
database = config['database']

# Connect to the MySQL database
db = MySQLdb.connect(host=database['host'],
                     port=int(database['port']),
                     user=database['user'],
                     passwd=database['passwd'])
cur = db.cursor()

# Load the City Master Plan zones (GeoJSON, WGS84)
with open('support/zones.geojson', 'r') as myfile:
    gj = json.load(myfile)

# Collect the distinct zone codes present in the GeoJSON
unique_names = {feature['properties']['SG_ZONA'] for feature in gj['features']}

# Look up the short (grouped) label of each zone in the second column of
# zones_adjust; a parameterized query avoids building SQL by concatenation
temp_content = []
for zona in unique_names:
    cur.execute("SELECT * FROM project.zones_adjust WHERE sg_zone = %s;", (zona,))
    records = cur.fetchall()
    if not records:
        continue  # zone missing from zones_adjust: leave it uncolored
    temp_content.append({'sg_zone': zona, 'sg_short': records[0][1]})

cur.close()
db.close()

df = pd.DataFrame(temp_content)
print(df.head())

# Create the choropleth map with Plotly
fig = px.choropleth_mapbox(df, geojson=gj, color="sg_short",
                           locations="sg_zone",
                           featureidkey="properties.SG_ZONA",
                           hover_data=['sg_short'],
                           center={"lat": -25.40409592, "lon": -49.26429676},
                           mapbox_style="carto-positron",
                           zoom=5.6,
                           opacity=0.7)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
```

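Note: displaying the map inline requires `nbformat>=4.2.0`. In the environment used here, `fig.show()` failed with `ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed`; installing or upgrading `nbformat` resolves it.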
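`px.choropleth_mapbox` matches each value in the `locations` column against the feature property named by `featureidkey` (here `properties.SG_ZONA`); as far as we know, rows without a matching feature are simply not drawn, with no error. A small sanity check before plotting, sketched under that assumption (the helper name is hypothetical):

```python
def unmatched_locations(locations, geojson, key="SG_ZONA"):
    """Return the location codes that match no feature's properties[key]."""
    feature_ids = {feature["properties"][key] for feature in geojson["features"]}
    return sorted(set(locations) - feature_ids)


# Hand-made example: two features, plus one location code without a feature
sample = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"SG_ZONA": "ZONE-A"}, "geometry": None},
    {"type": "Feature", "properties": {"SG_ZONA": "ZONE-B"}, "geometry": None},
]}
print(unmatched_locations(["ZONE-A", "ZONE-C"], sample))  # ['ZONE-C']
```

In the notebook, `unmatched_locations(df["sg_zone"], gj)` should return an empty list before the map is drawn.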

From Wikipedia (https://pt.wikipedia.org/wiki/Lista_de_bairros_de_Curitiba) we scrape the socioeconomic data of Curitiba's neighborhoods. These data will be inserted into the MySQL database. The scraping script is also available on GitHub (https://github.com/ftauscheck/The-Battle-of-the-Neighborhoods/blob/main/support/neighbour_wiki.py).
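One practical detail when scraping the Portuguese Wikipedia is that numbers use Brazilian formatting: `.` as the thousands separator and `,` as the decimal separator, sometimes followed by footnote markers. A minimal cleaning sketch, assuming cells arrive as strings; the helper names and the example values are hypothetical, and the actual columns of the Wikipedia table may differ:

```python
import re


def parse_ptbr_number(text):
    """Convert a pt-BR formatted number such as '1.234,56' to a float (1234.56)."""
    cleaned = re.sub(r"\[\d+\]", "", text)                   # drop footnote markers like '[2]'
    cleaned = cleaned.replace("\xa0", "").replace(" ", "")  # drop (non-breaking) spaces
    cleaned = cleaned.replace(".", "").replace(",", ".")    # thousands '.', decimal ','
    return float(cleaned)


def population_density(population, area_km2):
    """Inhabitants per square kilometre."""
    return population / area_km2


print(parse_ptbr_number("1.234,56"))  # 1234.56
print(population_density(parse_ptbr_number("12.345"), parse_ptbr_number("2,5")))  # 4938.0
```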


Another source of information will be the Foursquare API, where we ...

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.

The following data sources will be needed to extract or generate the required information:

* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using Google Maps API reverse geocoding
* the number of bakeries (and related stores), their type and their location in every neighborhood will be obtained using the Foursquare API
* the coordinates of the Curitiba city center will be obtained using Google Maps API geocoding
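The candidate grid can be generated without any external service. A minimal sketch, assuming a square grid (not necessarily the layout used in the project) built in local metre offsets and converted to latitude/longitude with an equirectangular approximation, which is adequate over a few kilometres. The spacing and radius are illustrative; the center is the one used for the choropleth map:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def grid_around(center_lat, center_lon, spacing_m=500, radius_m=3000):
    """Square grid of (lat, lon) points spaced spacing_m apart within radius_m of the center."""
    points = []
    steps = int(radius_m // spacing_m)
    cos_lat = math.cos(math.radians(center_lat))
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            dx, dy = i * spacing_m, j * spacing_m  # east and north offsets in metres
            if math.hypot(dx, dy) > radius_m:
                continue  # keep only points inside the circle
            lat = center_lat + math.degrees(dy / EARTH_RADIUS_M)
            lon = center_lon + math.degrees(dx / (EARTH_RADIUS_M * cos_lat))
            points.append((lat, lon))
    return points


candidates = grid_around(-25.40409592, -49.26429676, spacing_m=500, radius_m=3000)
print(len(candidates))
```

A hexagonal grid would give more uniform neighbor distances; the square grid is used here only because it is the simplest to read.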

For our analysis we will use data from the **Instituto de Pesquisa e Planejamento Urbano de Curitiba** (Institute of Urban Planning and Research of Curitiba, also known as IPPUC) on the Curitiba City Master Plan and the city's main roads, together with Foursquare data (obtained through API requests).





Curitiba's City Master Plan divides the municipality into zones, with clear rules on land use, building height and whether commerce or industry is permitted at each location. The image below (Google Earth) shows the Master Plan applied to the city. High-capacity avenues are shown in red; in orange is a road dedicated to public transport and services. Along this road taller buildings and services are permitted, whereas two blocks to the side tall buildings are no longer allowed.

![City Master Plan zones in Google Earth](img/GE_zones.png)
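Because commerce is only allowed in some zones, candidate points can be filtered by the zone that contains them. The project performs this kind of query with MySQL's spatial functions; the sketch below swaps in a pure-Python ray-casting point-in-polygon test for illustration only. Note that GeoJSON stores coordinates as `[lon, lat]`. The function names, the zone code and the set of zones that allow commerce are hypothetical:

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the closed ring [[lon, lat], ...]?"""
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k]
        x2, y2 = ring[(k + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def in_polygon(lon, lat, polygon):
    """GeoJSON Polygon: inside the exterior ring and outside every hole."""
    exterior, *holes = polygon
    return point_in_ring(lon, lat, exterior) and not any(point_in_ring(lon, lat, h) for h in holes)


def zone_of(lat, lon, geojson, key="SG_ZONA"):
    """Zone code of the first feature containing the point, or None."""
    for feature in geojson["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            polygons = [geom["coordinates"]]
        elif geom["type"] == "MultiPolygon":
            polygons = geom["coordinates"]
        else:
            continue
        if any(in_polygon(lon, lat, p) for p in polygons):
            return feature["properties"][key]
    return None


def filter_candidates(candidates, geojson, allowed_zones):
    """Keep the (lat, lon) candidates that fall inside an allowed zone."""
    return [(lat, lon) for lat, lon in candidates if zone_of(lat, lon, geojson) in allowed_zones]


zones = {"type": "FeatureCollection", "features": [{
    "type": "Feature", "properties": {"SG_ZONA": "ZONE-A"},
    "geometry": {"type": "Polygon", "coordinates": [[[-49.3, -25.5], [-49.2, -25.5],
                                                     [-49.2, -25.3], [-49.3, -25.3],
                                                     [-49.3, -25.5]]]}}]}
print(filter_candidates([(-25.4, -49.25), (-25.0, -49.25)], zones, {"ZONE-A"}))
```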

We will use our data science powers to identify a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly presented so that stakeholders can choose the best possible final location.
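One simple way to turn several criteria into a short list is a weighted sum of normalized features. The sketch below is an assumption about how such a ranking might look, not the project's method: each feature is min-max scaled to [0, 1], and a negative weight penalizes it (for example, competing bakeries nearby). Feature names, weights and values are hypothetical:

```python
def min_max(values):
    """Scale values linearly to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_candidates(candidates, weights, top_k=3):
    """Weighted sum of min-max normalized features, highest score first.

    candidates: list of dicts with a 'name' and one numeric value per feature.
    weights: feature name -> weight; a negative weight penalizes the feature.
    """
    normalized = {f: min_max([c[f] for c in candidates]) for f in weights}
    scored = [
        (sum(w * normalized[f][i] for f, w in weights.items()), c["name"])
        for i, c in enumerate(candidates)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


candidates = [
    {"name": "A", "bakeries_nearby": 0, "income": 3000},
    {"name": "B", "bakeries_nearby": 5, "income": 3000},
    {"name": "C", "bakeries_nearby": 2, "income": 1000},
]
weights = {"bakeries_nearby": -1.0, "income": 1.0}
print(rank_candidates(candidates, weights, top_k=2))
```

Min-max scaling keeps features with different units comparable; the weights themselves would still need to be agreed with the stakeholders.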

```python
import configparser

# Read config.ini and list its sections and the keys of the [database] section
config = configparser.ConfigParser()
config.read('config.ini')
print(config.sections())

for key in config['database']:
    print(key)
```

```
['database', 'foursquare_api', 'google_api']
host
port
user
passwd
```
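For reference, a `config.ini` with the layout shown by that output might look like the hypothetical example below, parsed here from a string with `configparser.read_string`. Only the `[database]` keys are confirmed by the output above; the values are placeholders (3306 is MySQL's default port), and the keys of the two API sections are not shown in the source, so those sections are left empty:

```python
import configparser

# Hypothetical config.ini contents; keep the real file out of version control
EXAMPLE_CONFIG = """
[database]
host = localhost
port = 3306
user = capstone
passwd = change-me

[foursquare_api]

[google_api]
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)
print(config.sections())              # ['database', 'foursquare_api', 'google_api']
print(int(config["database"]["port"]))  # 3306
```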
