<a href="https://colab.research.google.com/github/jtkomati/Portfolio/blob/master/Pre%C3%A7o_de_Im%C3%B3veis_em_S%C3%A3o_Paulo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Real Estate Prices in São Paulo

In this project we will train a model to predict the sale price of apartments in the city of São Paulo and then use that model to power a web application through a *deploy*.

Since the goal is to focus on building the *webapp* and on how to deploy an application, the exploratory analysis step is omitted.

The unnecessary and redundant columns have already been identified. We will show how to export and import the model with the `joblib` library.

## Real Estate Data

The data used here was obtained from [this link](https://www.kaggle.com/argonalyst/sao-paulo-real-estate-sale-rent-april-2019) and was made publicly available by the startup OpenImob.

To make the project easier, professor Carlos Melo made the `csv` file available at [this link](https://www.dropbox.com/s/h8blgaphkfpqsn5/sao-paulo-properties-april-2019.csv?dl=1).

## Data Analysis and Preparation

The original data contains 13,640 entries and 16 columns, with the `Price` column as our target variable.

In [None]:
# import the required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# load the dataset into a dataframe
url_dataset = "https://www.dropbox.com/s/h8blgaphkfpqsn5/sao-paulo-properties-april-2019.csv?dl=1"
df = pd.read_csv(url_dataset)

# show the first 5 entries
display(df.head())

Unnamed: 0,Price,Condo,Size,Rooms,Toilets,Suites,Parking,Elevator,Furnished,Swimming Pool,New,District,Negotiation Type,Property Type,Latitude,Longitude
0,930,220,47,2,2,1,1,0,0,0,0,Artur Alvim/São Paulo,rent,apartment,-23.543138,-46.479486
1,1000,148,45,2,2,1,1,0,0,0,0,Artur Alvim/São Paulo,rent,apartment,-23.550239,-46.480718
2,1000,100,48,2,2,1,1,0,0,0,0,Artur Alvim/São Paulo,rent,apartment,-23.542818,-46.485665
3,1000,200,48,2,2,1,1,0,0,0,0,Artur Alvim/São Paulo,rent,apartment,-23.547171,-46.483014
4,1300,410,55,2,2,1,1,1,0,0,0,Artur Alvim/São Paulo,rent,apartment,-23.525025,-46.482436


The district names carried information that is unnecessary for this specific *dataset*: the *string* `"/São Paulo"` appended to the end of each name. It was removed with `df_clean['District'].apply(lambda x: x.split('/')[0])` to leave the column cleaner.

Note that this *dataset* covers two situations: rent or sale.
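
Since the stated goal is to predict sale prices, one option is to keep only the sale listings before training. The snippet below is a minimal sketch on a small made-up DataFrame (the three rows are illustrative, not taken from the dataset). It assumes the `Negotiation Type` column holds the values `rent` and `sale`. The notebook itself trains on both types and lets the dummy columns tell them apart.

```python
import pandas as pd

# Toy data for illustration only; values are made up, not from the dataset
toy = pd.DataFrame({
    "Price": [930, 1000, 450000],
    "Size": [47, 45, 60],
    "Negotiation Type": ["rent", "rent", "sale"],
})

# Keep only the listings that are for sale
sales = toy[toy["Negotiation Type"] == "sale"].copy()

# The column is now constant, so it can be dropped
sales = sales.drop(columns=["Negotiation Type"])

print(sales)
```

Filtering this way keeps rent and sale prices, which live on very different scales, from being mixed in the same target variable.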

In [None]:
df_clean = df.copy()

# clean the district names
df_clean['District'] = df_clean['District'].apply(lambda x: x.split('/')[0])

# show the first 5 entries
df_clean.head()

Unnamed: 0,Price,Condo,Size,Rooms,Toilets,Suites,Parking,Elevator,Furnished,Swimming Pool,New,District,Negotiation Type,Property Type,Latitude,Longitude
0,930,220,47,2,2,1,1,0,0,0,0,Artur Alvim,rent,apartment,-23.543138,-46.479486
1,1000,148,45,2,2,1,1,0,0,0,0,Artur Alvim,rent,apartment,-23.550239,-46.480718
2,1000,100,48,2,2,1,1,0,0,0,0,Artur Alvim,rent,apartment,-23.542818,-46.485665
3,1000,200,48,2,2,1,1,0,0,0,0,Artur Alvim,rent,apartment,-23.547171,-46.483014
4,1300,410,55,2,2,1,1,1,0,0,0,Artur Alvim,rent,apartment,-23.525025,-46.482436


## Machine Learning Model

The Random Forest model was chosen arbitrarily, and three main evaluation metrics were observed.

In [None]:
# create dummy variables for the categorical columns
df_clean = pd.get_dummies(df_clean)

# split into X and y variables
X_simp = df_clean.drop('Price', axis=1)
y_simp = df_clean['Price']

# split into training and test datasets
X_train_simp, X_test_simp, y_train_simp, y_test_simp = train_test_split(X_simp, y_simp, test_size=0.33)

# instantiate and train the model
model = RandomForestRegressor(random_state=42)
model.fit(X_train_simp, y_train_simp)

# make predictions on the test dataset
y_pred_simp = model.predict(X_test_simp)

# evaluation metrics
print("r2: \t{:.4f}".format(r2_score(y_test_simp, y_pred_simp)))
print("MAE: \t{:.4f}".format(mean_absolute_error(y_test_simp, y_pred_simp)))
print("MSE: \t{:.4f}".format(mean_squared_error(y_test_simp, y_pred_simp)))

r2: 	0.9362
MAE: 	48312.9533
MSE: 	22234419247.8523
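
To make the three metrics concrete, the sketch below computes them with NumPy on a small made-up example (the numbers are illustrative only). It also takes the square root of the MSE (the RMSE), which is in the same unit as the price and is therefore easier to compare with the MAE. For the run above, the square root of the MSE is roughly 149,000.

```python
import numpy as np

# Made-up true and predicted prices, for illustration only
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 320.0, 380.0])

errors = y_true - y_pred

# Mean absolute error: average size of the error
mae = np.mean(np.abs(errors))

# Mean squared error: average squared error, penalizes large misses
mse = np.mean(errors ** 2)

# Root mean squared error: back in the unit of the target
rmse = np.sqrt(mse)

# R^2: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE: {mae:.2f}  MSE: {mse:.2f}  RMSE: {rmse:.2f}  r2: {r2:.4f}")
```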


#### Saving the model

The model is now trained and able to make predictions. However, it is "stuck" in the *kernel* running inside Google Colab.

Imagine having to run every cell again each time a prediction is needed. That would be impractical!

To export the *machine learning* model we will use the `joblib` library.

In [None]:
# save the model in joblib format
from joblib import dump, load

dump(model, 'model.joblib') 

['model.joblib']

Once you export the model, it is extremely important to also save the names of the *features* the model expects to receive, in the exact order in which it was trained.

Just as we did with the model, the feature names were saved to `features.names`.

In [None]:
# save the feature names used by the model
features = X_train_simp.columns.values

dump(features, 'features.names') 

['features.names']
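
The order matters because the model receives a plain array and matches values to features by position, not by name. The sketch below illustrates this with a hypothetical three-feature list and a hypothetical helper, `to_model_input`. It arranges an incoming dictionary in the saved order and rejects missing or unexpected keys.

```python
import numpy as np

# Hypothetical feature list standing in for the array saved in 'features.names'
feature_names = np.array(["Condo", "Size", "Rooms"])

def to_model_input(payload, feature_names):
    """Arrange the values of `payload` in the order given by `feature_names`.

    Returns an array of shape (1, n_features), the shape that
    scikit-learn's `predict` expects for a single sample.
    """
    expected = [str(name) for name in feature_names]
    missing = [name for name in expected if name not in payload]
    extra = [name for name in payload if name not in expected]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    return np.array([[payload[name] for name in expected]], dtype=float)

# Same information, but the keys arrive in a different order
payload = {"Rooms": 2, "Condo": 220, "Size": 47}

print(to_model_input(payload, feature_names))
```

Looking values up by name means a client that sends the keys in a different order still gets a correct prediction.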

#### Loading the model

Once the model has been saved to a file, we can load it again using `load()` from `joblib`.

In [None]:
# load the model and the feature names
new_model = load('model.joblib') 
features = load('features.names') 

In [None]:
# check the type of the new variable
type(new_model)

sklearn.ensemble._forest.RandomForestRegressor
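
A quick way to check that nothing was lost in the round trip is to compare predictions from the original and the reloaded model. The sketch below does this on a tiny synthetic regression problem, writing to a temporary directory. The data, the `n_estimators=10` setting and the file location are assumptions made for the example, not the notebook's model.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = X @ np.array([1.0, 2.0, 3.0])

original = RandomForestRegressor(n_estimators=10, random_state=42)
original.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    dump(original, path)      # serialize the fitted model to disk
    restored = load(path)     # read it back into a new object

# The reloaded model should give exactly the same predictions
same = np.array_equal(original.predict(X), restored.predict(X))
print("Identical predictions:", same)
```

Because joblib files are pickle-based, it is safest to load them with the same scikit-learn version that was used for training.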

In [None]:
import sklearn
sklearn.__version__

'0.22.2.post1'

In [None]:
X_simp

Unnamed: 0,Condo,Size,Rooms,Toilets,Suites,Parking,Elevator,Furnished,Swimming Pool,New,Latitude,Longitude,District_Alto de Pinheiros,District_Anhanguera,District_Aricanduva,District_Artur Alvim,District_Barra Funda,District_Bela Vista,District_Belém,District_Bom Retiro,District_Brasilândia,District_Brooklin,District_Brás,District_Butantã,District_Cachoeirinha,District_Cambuci,District_Campo Belo,District_Campo Grande,District_Campo Limpo,District_Cangaíba,District_Capão Redondo,District_Carrão,District_Casa Verde,District_Cidade Ademar,District_Cidade Dutra,District_Cidade Líder,District_Cidade Tiradentes,District_Consolação,District_Cursino,District_Ermelino Matarazzo,...,District_Perus,District_Pinheiros,District_Pirituba,District_Ponte Rasa,District_Raposo Tavares,District_República,District_Rio Pequeno,District_Sacomã,District_Santa Cecília,District_Santana,District_Santo Amaro,District_Sapopemba,District_Saúde,District_Socorro,District_São Domingos,District_São Lucas,District_São Mateus,District_São Miguel,District_São Rafael,District_Sé,District_Tatuapé,District_Tremembé,District_Tucuruvi,District_Vila Andrade,District_Vila Curuçá,District_Vila Formosa,District_Vila Guilherme,District_Vila Jacuí,District_Vila Leopoldina,District_Vila Madalena,District_Vila Maria,District_Vila Mariana,District_Vila Matilde,District_Vila Olimpia,District_Vila Prudente,District_Vila Sônia,District_Água Rasa,Negotiation Type_rent,Negotiation Type_sale,Property Type_apartment
0,220,47,2,2,1,1,0,0,0,0,-23.543138,-46.479486,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1
1,148,45,2,2,1,1,0,0,0,0,-23.550239,-46.480718,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1
2,100,48,2,2,1,1,0,0,0,0,-23.542818,-46.485665,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1
3,200,48,2,2,1,1,0,0,0,0,-23.547171,-46.483014,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1
4,410,55,2,2,1,1,1,0,0,0,-23.525025,-46.482436,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13635,420,51,2,1,0,1,0,0,0,0,-23.653004,-46.635463,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1
13636,630,74,3,2,1,2,0,0,1,0,-23.648930,-46.641982,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1
13637,1100,114,3,3,1,1,0,0,1,0,-23.649693,-46.649783,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1
13638,48,39,1,2,1,1,0,1,1,0,-23.652060,-46.637046,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1


Next, we use the code below to turn the feature names into a JSON template that can be sent from Insomnia.

In [None]:
import numpy as np
import json

dict(zip(X_simp.columns.values, np.zeros(X_simp.shape[1]).astype(int)))

{'Condo': 0,
 'District_Alto de Pinheiros': 0,
 'District_Anhanguera': 0,
 'District_Aricanduva': 0,
 'District_Artur Alvim': 0,
 'District_Barra Funda': 0,
 'District_Bela Vista': 0,
 'District_Belém': 0,
 'District_Bom Retiro': 0,
 'District_Brasilândia': 0,
 'District_Brooklin': 0,
 'District_Brás': 0,
 'District_Butantã': 0,
 'District_Cachoeirinha': 0,
 'District_Cambuci': 0,
 'District_Campo Belo': 0,
 'District_Campo Grande': 0,
 'District_Campo Limpo': 0,
 'District_Cangaíba': 0,
 'District_Capão Redondo': 0,
 'District_Carrão': 0,
 'District_Casa Verde': 0,
 'District_Cidade Ademar': 0,
 'District_Cidade Dutra': 0,
 'District_Cidade Líder': 0,
 'District_Cidade Tiradentes': 0,
 'District_Consolação': 0,
 'District_Cursino': 0,
 'District_Ermelino Matarazzo': 0,
 'District_Freguesia do Ó': 0,
 'District_Grajaú': 0,
 'District_Guaianazes': 0,
 'District_Iguatemi': 0,
 'District_Ipiranga': 0,
 'District_Itaim Bibi': 0,
 'District_Itaim Paulista': 0,
 'District_Itaquera': 0,
 'Dis

In [None]:
# Serializing json    
json_object = json.dumps(dict(zip(X_simp.columns.values, np.zeros(X_simp.shape[1]).astype(int).tolist())), indent = 4)
print(json_object)  

{
    "Condo": 0,
    "Size": 0,
    "Rooms": 0,
    "Toilets": 0,
    "Suites": 0,
    "Parking": 0,
    "Elevator": 0,
    "Furnished": 0,
    "Swimming Pool": 0,
    "New": 0,
    "Latitude": 0,
    "Longitude": 0,
    "District_Alto de Pinheiros": 0,
    "District_Anhanguera": 0,
    "District_Aricanduva": 0,
    "District_Artur Alvim": 0,
    "District_Barra Funda": 0,
    "District_Bela Vista": 0,
    "District_Bel\u00e9m": 0,
    "District_Bom Retiro": 0,
    "District_Brasil\u00e2ndia": 0,
    "District_Brooklin": 0,
    "District_Br\u00e1s": 0,
    "District_Butant\u00e3": 0,
    "District_Cachoeirinha": 0,
    "District_Cambuci": 0,
    "District_Campo Belo": 0,
    "District_Campo Grande": 0,
    "District_Campo Limpo": 0,
    "District_Canga\u00edba": 0,
    "District_Cap\u00e3o Redondo": 0,
    "District_Carr\u00e3o": 0,
    "District_Casa Verde": 0,
    "District_Cidade Ademar": 0,
    "District_Cidade Dutra": 0,
    "District_Cidade L\u00edder": 0,
    "District_Cidade T

To deploy, we used PyCharm and saved the files `features.names` and `model.joblib` inside the `model` folder.

We created an `app.py` file with the code below.
```
# import the required packages

import numpy as np
from flask import Flask, jsonify, request
from flask_restful import Resource, Api
from joblib import load

# instantiate the Flask object

app = Flask(__name__)

api = Api(app)

# load the model

model = load('model/model.joblib')

class PrecoImoveis(Resource):
    def get(self):
        return {'Nome': 'Jeferson Komati'}
    def post(self):
        args = request.get_json(force=True)

        input_values = np.asarray(list(args.values())).reshape(1, -1)
        predict = model.predict(input_values)[0]

        return jsonify({'previsao': float(predict)})

api.add_resource(PrecoImoveis, '/')

if __name__ == '__main__':
    app.run()
```
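
Before deploying, the endpoint can be exercised locally with Flask's built-in test client, without starting a server. The sketch below is a simplified stand-in for `app.py`, not the file itself: it uses a plain Flask route instead of `flask_restful`, and a hypothetical `StubModel` that just sums its inputs in place of the trained Random Forest, so that it runs without the model file.

```python
import numpy as np
from flask import Flask, jsonify, request

class StubModel:
    """Hypothetical stand-in for the trained model: predicts the row sum."""
    def predict(self, X):
        return np.asarray(X, dtype=float).sum(axis=1)

app = Flask(__name__)
model = StubModel()

@app.route("/", methods=["POST"])
def predict_price():
    # Parse the JSON body even if the Content-Type header is missing
    args = request.get_json(force=True)
    # One row, as many columns as there are values in the payload
    input_values = np.asarray(list(args.values())).reshape(1, -1)
    prediction = model.predict(input_values)[0]
    return jsonify({"previsao": float(prediction)})

# Send a POST request through the test client; no server is started
client = app.test_client()
response = client.post("/", json={"Condo": 220, "Size": 47, "Rooms": 2})

print(response.status_code, response.get_json())
```

Note that, as in `app.py`, the values are read in the order the keys appear in the request body, so the client is responsible for sending them in the training order.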

We use the Flask framework to deploy an API aimed at Machine Learning applications.

In PyCharm we installed the packages below:
```
pip install gunicorn
pip install scikit-learn
pip install numpy
pip install flask
pip install flask-restful
```
**Deploy on Heroku**

We created a "Procfile" as shown below:
```
web: gunicorn app:app
```
We used the command below to create a file with the package versions in use.
```
pip3 freeze > requirements.txt
```

We initialize git and log in to Heroku.
```
git init
heroku login
```

With the `create` command, Heroku generates the name of your app's address.
```
heroku create
```
In git, we add and commit the files.
```
git add .
git commit -m "Enviar para o API"
```

We specify the app name below.
```
heroku git:remote -a nomedasuaapp
```

Finally, we use the command below to publish the API on the web.
```
git push heroku master
```


Using Insomnia with a POST request, we enter the address created by Heroku. In my case it was https://desolate-journey-54529.herokuapp.com/
Next, we copy the parameters below and edit the ones we want to use, for example changing a value from "0" to "1". The panel on the right in Insomnia shows the result: the predicted price of the property.
```
{
    "Condo": 0,
    "Size": 0,
    "Rooms": 0,
    "Toilets": 0,
    "Suites": 0,
    "Parking": 0,
    "Elevator": 0,
    "Furnished": 0,
    "Swimming Pool": 0,
    "New": 0,
    "Latitude": 0,
    "Longitude": 0,
    "District_Alto de Pinheiros": 0,
    "District_Anhanguera": 0,
    "District_Aricanduva": 0,
    "District_Artur Alvim": 0,
    "District_Barra Funda": 0,
    "District_Bela Vista": 0,
    "District_Bel\u00e9m": 0,
    "District_Bom Retiro": 0,
    "District_Brasil\u00e2ndia": 0,
    "District_Brooklin": 0,
    "District_Br\u00e1s": 0,
    "District_Butant\u00e3": 0,
    "District_Cachoeirinha": 0,
    "District_Cambuci": 0,
    "District_Campo Belo": 0,
    "District_Campo Grande": 0,
    "District_Campo Limpo": 0,
    "District_Canga\u00edba": 0,
    "District_Cap\u00e3o Redondo": 0,
    "District_Carr\u00e3o": 0,
    "District_Casa Verde": 0,
    "District_Cidade Ademar": 0,
    "District_Cidade Dutra": 0,
    "District_Cidade L\u00edder": 0,
    "District_Cidade Tiradentes": 0,
    "District_Consola\u00e7\u00e3o": 0,
    "District_Cursino": 0,
    "District_Ermelino Matarazzo": 0,
    "District_Freguesia do \u00d3": 0,
    "District_Graja\u00fa": 0,
    "District_Guaianazes": 0,
    "District_Iguatemi": 0,
    "District_Ipiranga": 0,
    "District_Itaim Bibi": 0,
    "District_Itaim Paulista": 0,
    "District_Itaquera": 0,
    "District_Jabaquara": 0,
    "District_Jaguar\u00e9": 0,
    "District_Jaragu\u00e1": 0,
    "District_Jardim Helena": 0,
    "District_Jardim Paulista": 0,
    "District_Jardim S\u00e3o Luis": 0,
    "District_Jardim \u00c2ngela": 0,
    "District_Ja\u00e7an\u00e3": 0,
    "District_Jos\u00e9 Bonif\u00e1cio": 0,
    "District_Lajeado": 0,
    "District_Lapa": 0,
    "District_Liberdade": 0,
    "District_Lim\u00e3o": 0,
    "District_Mandaqui": 0,
    "District_Medeiros": 0,
    "District_Moema": 0,
    "District_Mooca": 0,
    "District_Morumbi": 0,
    "District_Pari": 0,
    "District_Parque do Carmo": 0,
    "District_Pedreira": 0,
    "District_Penha": 0,
    "District_Perdizes": 0,
    "District_Perus": 0,
    "District_Pinheiros": 0,
    "District_Pirituba": 0,
    "District_Ponte Rasa": 0,
    "District_Raposo Tavares": 0,
    "District_Rep\u00fablica": 0,
    "District_Rio Pequeno": 0,
    "District_Sacom\u00e3": 0,
    "District_Santa Cec\u00edlia": 0,
    "District_Santana": 0,
    "District_Santo Amaro": 0,
    "District_Sapopemba": 0,
    "District_Sa\u00fade": 0,
    "District_Socorro": 0,
    "District_S\u00e3o Domingos": 0,
    "District_S\u00e3o Lucas": 0,
    "District_S\u00e3o Mateus": 0,
    "District_S\u00e3o Miguel": 0,
    "District_S\u00e3o Rafael": 0,
    "District_S\u00e9": 0,
    "District_Tatuap\u00e9": 0,
    "District_Trememb\u00e9": 0,
    "District_Tucuruvi": 0,
    "District_Vila Andrade": 0,
    "District_Vila Curu\u00e7\u00e1": 0,
    "District_Vila Formosa": 0,
    "District_Vila Guilherme": 0,
    "District_Vila Jacu\u00ed": 0,
    "District_Vila Leopoldina": 0,
    "District_Vila Madalena": 0,
    "District_Vila Maria": 0,
    "District_Vila Mariana": 0,
    "District_Vila Matilde": 0,
    "District_Vila Olimpia": 0,
    "District_Vila Prudente": 0,
    "District_Vila S\u00f4nia": 0,
    "District_\u00c1gua Rasa": 0,
    "Negotiation Type_rent": 0,
    "Negotiation Type_sale": 0,
    "Property Type_apartment": 0
}
```
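
Rather than editing the template by hand, the request body can also be generated in Python. The sketch below assumes a short, hypothetical subset of the feature names and made-up apartment values. With the real list loaded from `features.names`, the same approach would produce the full payload.

```python
import json

# Hypothetical subset of the feature names, in training order
feature_names = [
    "Condo", "Size", "Rooms",
    "District_Artur Alvim", "District_Moema",
    "Negotiation Type_rent", "Negotiation Type_sale",
]

# Start from an all-zero template, one entry per feature
payload = dict.fromkeys(feature_names, 0)

# Fill in the numeric fields (made-up values) and switch on the dummy columns
payload.update({
    "Condo": 500,
    "Size": 60,
    "Rooms": 2,
    "District_Moema": 1,
    "Negotiation Type_sale": 1,
})

# ensure_ascii=False keeps accented names readable instead of \uXXXX escapes
body = json.dumps(payload, indent=4, ensure_ascii=False)
print(body)
```

Updating existing keys does not change their position in the dictionary, so the payload keeps the training order.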

The screen below shows the POST request to the API in Insomnia.
https://imgur.com/F2EFk2p

<center><img alt="Insomnia POST request screen" width="100%" src="https://i.imgur.com/F2EFk2p.jpg"></center>

***Conclusion***

With the help of professor Carlos Melo, I completed my first API deploy, which calculates the price of a property in São Paulo. It was the most time-consuming project because it involved PyCharm, Heroku and Insomnia. Fortunately, I managed to finish it, and I continue on my journey to become a Data Scientist.

Professor Carlos Melo's website is https://sigmoidal.ai/ and it has helped me a lot in learning Data Science in practice.