## Data Science Lopes
The test below aims to understand your problem-solving process and your understanding of the data provided, giving you room to demonstrate your knowledge of Data Science and Machine Learning.

The data provided describes Lopes properties and their respective prices; the features represent the characteristics of each property.

### Objective
The candidate should **carry out a brief analysis of the data**, in whatever way they see fit, and **develop a pricing (regression) model** using the `sale` column as the target/dependent variable. The suggested cost function to minimize is **RMSE**; if you prefer a different function, **justify** your choice.
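
For reference, RMSE is the square root of the mean squared difference between the true and predicted values, so it is expressed in the same unit as `sale`. A minimal sketch of the metric, assuming NumPy is available; the helper name `rmse` is illustrative and not part of the test:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Two predictions, each off by 10,000, give an RMSE of 10,000.
example_rmse = rmse([750000, 230000], [760000, 220000])
print(example_rmse)
```

Because the errors are squared before averaging, a few very large misses weigh more heavily than many small ones, which is worth keeping in mind when the target has a long right tail.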

We suggest that the code include all the comments needed for us to follow the process, and that it follow good programming and software development practices.

### Recommendations
Take advantage of Jupyter Notebook's support for markdown cells and comments to document your line of reasoning.

### Final Files
The final files to be delivered are the notebook `train.ipynb` and the predictions of the developed model, `output.csv`.

After training and tuning the model, run inference on the `test.csv` file, which contains all the columns of `train.csv` except the **target**. The final file must contain only 2 columns, `sku` and `predict`, as in the example below, and must be saved in `.csv` format under the name `output.csv`.

| sku | predict |
|---|---|
| 51173fe76f683f3ccd556ed32ce2f2ca | 750000 |
| b62f5788ed7c0f37d522085fe401dab7 | 230000 |
| cb40092e1db4824b5d352e2b46f4a478 | 856700 |
| 8253796889c2733a3a10fd4d1e9af1c5 | 1000000 |
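
One possible way to assemble a file in this layout is sketched below. It assumes the predictions are in the same row order as the test frame; the helper `build_output` is a hypothetical name, and the two rows reuse values from the example table. The sketch writes to an in-memory buffer so it runs without touching disk.

```python
import io

import pandas as pd


def build_output(test_df, predictions):
    """Pair each sku with its prediction, keeping only the two required columns."""
    if len(test_df) != len(predictions):
        raise ValueError("predictions must have one value per test row")
    return pd.DataFrame({"sku": test_df["sku"].to_numpy(), "predict": predictions})


# Toy stand-in for test.csv, using two sku values from the example table.
toy_test = pd.DataFrame(
    {"sku": ["51173fe76f683f3ccd556ed32ce2f2ca", "b62f5788ed7c0f37d522085fe401dab7"]}
)
output = build_output(toy_test, [750000, 230000])

# In the notebook this would be output.to_csv("output.csv", index=False).
buffer = io.StringIO()
output.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Passing `index=False` matters here: without it pandas writes the row index as an extra unnamed column, and the file would no longer have exactly two columns.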

### Imports

In [15]:
# list all required imports below
import pandas as pd

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option('display.float_format', lambda x: '%.4f' % x)

### Reading the Data

In [16]:
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

In [17]:
print(f"train shape: {train.shape}\ntest shape: {test.shape}")

train shape: (39411, 40)
test shape: (4378, 39)


In [18]:
train.head()

Unnamed: 0,sku,condominium_fee,iptu_fee,product_floor,postal_code,lon,lat,status,category,bedroom_qty,living_area,suite_qty,bathroom_qty,parking,min_park_distance,min_train_distance,min_subway_distance,pet_care_c,piscina_coberta_c,piscina_descoberta_c,piscina_i,academia_de_ginastica_c,ar_condicionado_i,spa_c,bicicletario_c,deck_c,lavabo_i,salao_de_jogos_c,sauna_c,cozinha_americana_i,armario_embutido_i,cozinha_mobiliada_i,lavanderia_c,varanda_gourmet_i,churrasqueira_i,brinquedoteca_c,espaco_gourmet_c,playground_c,city,sale
0,51173fe76f683f3ccd556ed32ce2f2ca,1908.0,552.84,5.0,1421000,-46.6578,-23.5653,active,apartamento,3.0,166.0,1.0,3.0,2,0.3307,4.009,0.4869,0,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0,0,0,0,0,0,sao paulo,2018636.0409
1,b62f5788ed7c0f37d522085fe401dab7,1746.0,289.0,11.0,1421000,-46.6578,-23.5653,active,apartamento,3.0,123.84,1.0,3.0,1,0.3307,4.009,0.4869,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,sao paulo,1464473.2891
2,cb40092e1db4824b5d352e2b46f4a478,1460.0,169.97,0.0,1421000,-46.6554,-23.5672,active,apartamento,2.0,104.0,0.0,0.0,2,0.5705,4.1132,0.5146,0,1,1,0,1,0,1,0,1,0,1,1,0,0,0,0,0,0,0,1,1,sao paulo,1940580.7971
3,8573b93d0835a1d9273e529a0def882f,2259.0,950.0,,1421000,-46.6578,-23.5653,active,apartamento,3.0,166.0,1.0,3.0,2,0.3307,4.009,0.4869,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,sao paulo,1943485.0161
4,8253796889c2733a3a10fd4d1e9af1c5,2000.0,610.0,8.0,1421000,-46.6578,-23.5653,active,apartamento,3.0,177.0,1.0,3.0,1,0.3307,4.009,0.4869,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,sao paulo,1831031.4568


### EDA
In the section below, develop your exploratory data analysis; it can focus on a specific variable or be more general.

The idea is to understand the behavior and profile of the data, find patterns/anomalies, etc.

In [7]:
train.describe()

Unnamed: 0,condominium_fee,iptu_fee,product_floor,postal_code,lon,lat,bedroom_qty,living_area,suite_qty,bathroom_qty,...,cozinha_americana_i,armario_embutido_i,cozinha_mobiliada_i,lavanderia_c,varanda_gourmet_i,churrasqueira_i,brinquedoteca_c,espaco_gourmet_c,playground_c,sale
count,29441.0,32084.0,37921.0,39411.0,39411.0,39411.0,39386.0,39411.0,39245.0,39371.0,...,39411.0,39411.0,39411.0,39411.0,39411.0,39411.0,39411.0,39411.0,39411.0,39411.0
mean,6456.125,687.8721,4.467498,3568772.0,-46.607286,-23.533523,2.687859,158.650261,1.081437,2.669452,...,0.066809,0.378295,0.295983,0.027911,0.021136,0.133795,0.112481,0.076197,0.411814,1064902.0
std,814097.8,13249.45,9.073833,1447513.0,1.450307,0.740989,1.116276,1752.755281,1.153999,1.658778,...,0.249694,0.484968,0.456489,0.16472,0.14384,0.340436,0.315962,0.265316,0.492168,784233.4
min,0.01,0.01,-2.0,1005020.0,-50.56597,-24.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,105559.6
25%,450.0,110.0,0.0,2341000.0,-46.687143,-23.606216,2.0,65.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,468921.6
50%,800.0,290.0,2.0,3646000.0,-46.654301,-23.558141,3.0,110.0,1.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,807542.3
75%,1480.0,646.32,7.0,4705080.0,-46.619225,-23.502515,3.0,190.0,1.0,4.0,...,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1431622.0
max,139600400.0,1800000.0,890.0,8490740.0,1.0,1.0,62.0,325540.0,30.0,51.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4019495.0
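
The summary above already points at two things worth checking. First, `condominium_fee` has 29,441 non-null values out of 39,411 rows and `iptu_fee` has 32,084, so roughly a quarter and a fifth of those columns are missing. Second, several maxima are far from the 75th percentile (`condominium_fee` at 139,600,400 against 1,480, `living_area` at 325,540 against 190), and `lon`/`lat` reach 1.0 while their medians sit near -46.65 and -23.56. A minimal sketch of how these could be inspected, assuming pandas; the helpers `missing_share` and `clip_to_quantiles` are illustrative names, and quantile clipping is only one of several reasonable treatments. The toy frame reuses three columns from the five rows printed by `head()`.

```python
import numpy as np
import pandas as pd


def missing_share(df):
    """Fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def clip_to_quantiles(series, lower=0.01, upper=0.99):
    """Clip a numeric series to its [lower, upper] quantile range."""
    low = series.quantile(lower)
    high = series.quantile(upper)
    return series.clip(lower=low, upper=high)


# Values copied from the first five training rows shown earlier.
sample = pd.DataFrame(
    {
        "condominium_fee": [1908.0, 1746.0, 1460.0, 2259.0, 2000.0],
        "product_floor": [5.0, 11.0, 0.0, np.nan, 8.0],
        "living_area": [166.0, 123.84, 104.0, 166.0, 177.0],
    }
)

shares = missing_share(sample)
clipped_fee = clip_to_quantiles(sample["condominium_fee"])
print(shares)
print(clipped_fee)
```

On the full data the same calls would be `missing_share(train)` and `clip_to_quantiles(train["condominium_fee"])`. Any clipping bounds should be computed on the training set and then reused on `test.csv`, so both files go through the same transformation.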


### Train Model
In the section below, develop the training of the pricing model using the target `sale`.

### Predicts

In [12]:
#predicts.to_csv("output.csv", index=False)