# 1.0 Introduction

## 1.1 Business Problem

[House Rocket][HRlink] is a platform for buying and selling real estate.

***Business model:*** Buy houses at a low price and resell them at a higher price.

***Challenge:*** Find good deals within the available portfolio, that is, find houses that are priced low, are well located, and have good potential to be resold at a higher price.

[HRlink]: https://sejaumdatascientist.com/os-5-projetos-de-data-science-que-fara-o-recrutador-olhar-para-voce/

## 1.2 Business Questions

1. Which properties should House Rocket buy, and at what price?
2. Once a property has been bought, when is the best time to sell it, and at what price?

## 1.3 Contents

This notebook is organized as follows:
- Data: loading the data.
- Transformation: processing the data.
- Visualization: exploratory data analysis.

# 2.0 Data

In [3]:
import pandas as pd
pd.set_option("display.max_columns", None)

In [8]:
df_house = pd.read_csv('../data/kc_house_data.csv')
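As an aside, `pd.read_csv` can convert the `date` column while loading through its `parse_dates` parameter, which would make a separate conversion step optional. The sketch below is only an illustration: it reads two rows copied from the sample output (reduced to four columns) from an in-memory string instead of the real file, and the names `csv_text` and `df_sample` are hypothetical.

```python
import io

import pandas as pd

# Two rows copied from the sample output in this notebook, reduced to
# four columns so the sketch runs without the CSV file on disk.
csv_text = """id,date,price,zipcode
7303200450,2015-04-08,242000.0,98003
1565950670,2015-02-25,380500.0,98055
"""

# parse_dates converts the listed columns to datetime while reading.
# With the real file, io.StringIO(csv_text) would be replaced by the
# path '../data/kc_house_data.csv'.
df_sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(df_sample.dtypes)
```

Whether this is preferable to converting after loading is a matter of taste; an explicit conversion keeps the loading cell minimal.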

In [17]:
df_house.sample(10)

Unnamed: 0,id,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
17714,7303200450,2015-04-08,242000.0,3,1.75,1500,7560,1.0,0,0,3,7,1500,0,1979,0,98003,47.3467,-122.296,1500,7560
7190,1565950670,2015-02-25,380500.0,3,2.5,1900,7361,2.0,0,0,3,8,1900,0,1994,0,98055,47.4324,-122.191,2100,7361
14517,1217000481,2015-02-11,345000.0,3,1.75,1930,9000,1.0,0,1,4,7,1150,780,1951,0,98166,47.4539,-122.348,1590,9000
2717,9808590310,2015-04-08,1000750.0,3,2.75,3070,10739,2.0,0,0,3,10,2440,630,1987,0,98004,47.6444,-122.191,3490,11913
7489,3275870080,2014-12-12,765000.0,4,2.5,2910,15016,2.0,0,0,3,10,2910,0,1990,0,98052,47.69,-122.097,2870,13992
1038,5104511040,2015-02-20,380000.0,4,2.5,2000,6921,2.0,0,0,3,8,2000,0,2003,0,98038,47.3559,-122.014,2430,6339
16318,8001450170,2014-08-04,274950.0,3,1.75,1840,16679,1.0,0,0,3,8,1840,0,1989,0,98001,47.3207,-122.275,1910,15571
3781,98020300,2015-02-03,759000.0,5,2.75,3490,8230,2.0,0,0,3,10,3490,0,2005,0,98075,47.5825,-121.97,3480,7331
10879,7787100390,2015-04-20,440000.0,3,2.5,2040,7605,2.0,0,0,3,8,2040,0,1996,0,98045,47.4876,-121.779,2150,7545
18013,6149700315,2015-04-10,352000.0,3,0.75,1240,7200,1.0,0,0,3,7,1240,0,1947,0,98133,47.7298,-122.342,1210,7200


In [11]:
df_house.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   id             21613 non-null  int64         
 1   date           21613 non-null  datetime64[ns]
 2   price          21613 non-null  float64       
 3   bedrooms       21613 non-null  int64         
 4   bathrooms      21613 non-null  float64       
 5   sqft_living    21613 non-null  int64         
 6   sqft_lot       21613 non-null  int64         
 7   floors         21613 non-null  float64       
 8   waterfront     21613 non-null  int64         
 9   view           21613 non-null  int64         
 10  condition      21613 non-null  int64         
 11  grade          21613 non-null  int64         
 12  sqft_above     21613 non-null  int64         
 13  sqft_basement  21613 non-null  int64         
 14  yr_built       21613 non-null  int64         
 15  yr_renovated   2161

# 3.0 Transformation

In [19]:
df_house['date'] = pd.to_datetime(df_house['date'])

In [34]:
# Map each sale month to a season (Northern Hemisphere, meteorological seasons).
df_house['seasonality'] = df_house['date'].dt.month.apply(
    lambda x: 'Winter' if x in (12, 1, 2)
    else ('Spring' if 3 <= x <= 5
    else ('Summer' if 6 <= x <= 8
    else 'Fall')))
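The same month-to-season rule can also be written as a dictionary lookup with `Series.map`, which avoids the nested conditionals. This is a minimal sketch, not the notebook's own method: `MONTH_TO_SEASON` and `df_demo` are hypothetical names, and the four dates are taken from the output shown in this section.

```python
import pandas as pd

# Hypothetical lookup table: month number -> season, mirroring the
# conditions used in the lambda above.
MONTH_TO_SEASON = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Small stand-in frame so the sketch runs without the full dataset.
df_demo = pd.DataFrame(
    {"date": pd.to_datetime(["2014-10-13", "2014-12-09", "2014-05-21", "2014-06-23"])}
)

# Series.map looks up each month in the dictionary.
df_demo["seasonality"] = df_demo["date"].dt.month.map(MONTH_TO_SEASON)

print(df_demo)
```

A month missing from the dictionary would map to `NaN` rather than fall through to `'Fall'`, so the table has to list all twelve months.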

In [23]:
df_house[['date', 'seasonality']]

Unnamed: 0,date,seasonality
0,2014-10-13,Fall
1,2014-12-09,Winter
2,2015-02-25,Winter
3,2014-12-09,Winter
4,2015-02-18,Winter
...,...,...
21608,2014-05-21,Spring
21609,2015-02-23,Winter
21610,2014-06-23,Summer
21611,2015-01-16,Winter
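One possible way to use the `seasonality` column for the second business question is to compare a typical sale price per season. The sketch below assumes the median as the summary statistic, which is a choice made for this illustration and not something the notebook prescribes. It runs on five rows copied from the sample output in section 2.0, far too few to say anything about the market; `df_demo`, `month_to_season`, and `season_median` are hypothetical names.

```python
import pandas as pd

# Five (date, price) pairs copied from the sample rows in section 2.0.
df_demo = pd.DataFrame({
    "date": pd.to_datetime(
        ["2015-04-08", "2015-02-25", "2015-02-11", "2014-12-12", "2014-08-04"]
    ),
    "price": [242000.0, 380500.0, 345000.0, 765000.0, 274950.0],
})


def month_to_season(month: int) -> str:
    """Same month-to-season rule as the transformation step."""
    if month in (12, 1, 2):
        return "Winter"
    if 3 <= month <= 5:
        return "Spring"
    if 6 <= month <= 8:
        return "Summer"
    return "Fall"


df_demo["seasonality"] = df_demo["date"].dt.month.apply(month_to_season)

# Median sale price per season, highest first.
season_median = (
    df_demo.groupby("seasonality")["price"]
    .median()
    .sort_values(ascending=False)
)

print(season_median)
```

On the full dataset the same `groupby` could be extended with `zipcode` to compare seasons within a region, since prices differ strongly by location.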
