# 1.0 Introdução 

## 1.1 Problema de Negocio

[House Rocket][HRlink] plataforma de compras e vendas de imoveis.

***Modelo de Negocio:*** Comprar casas com preco baico e revender com o preco mais alto. 

***Desafio***: Encontrar bons negocios dentro do portifolio disponivel, ou seja, encotrar casas com preco baixo, em boa localizacao e que tenham um bom potencial de venda por um preco mais alta.

[HRlink]: https://sejaumdatascientist.com/os-5-projetos-de-data-science-que-fara-o-recrutador-olhar-para-voce/

## 1.2 Perguntas de Negocio. 

1. Quais são os imóveis que a House Rocket deveria comprar e por qual preço?
2. Uma vez o imóvel comprado, qual o melhor momento para vende-lo e por qual preço?

## 1.3 Conteudo 

Este Notebook está organu=izado como segue: 
- Dados: Carregamento dos dados
- Transformação: Processamento dos dados. 
- Visualização: Analise exploratoria dos dados. 

# 2.0 Dados

In [4]:
import pandas as pd
pd.set_option("display.max_columns", None)

In [5]:
df_house = pd.read_csv('../data/kc_house_data.csv')

In [6]:
df_house.sample(10)

Unnamed: 0,id,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
5636,4027700321,20141028T000000,420000.0,3,1.75,2390,11242,1.0,0,0,3,7,1290,1100,1959,0,98155,47.7759,-122.272,2270,9650
20086,2768100512,20150422T000000,659000.0,2,2.5,1450,1213,2.0,0,0,3,9,1110,340,2015,0,98107,47.6692,-122.372,1620,1456
13940,1923000150,20150424T000000,754000.0,5,3.5,3020,15305,2.0,0,0,3,10,2230,790,1978,0,98040,47.5627,-122.216,3680,14486
5716,9201300020,20140811T000000,1517000.0,3,2.25,2610,9409,1.0,1,4,4,8,2610,0,1963,0,98075,47.5789,-122.076,2970,9156
1256,993001629,20141117T000000,265000.0,3,2.75,1120,881,3.0,0,0,3,8,1120,0,1999,0,98103,47.6914,-122.343,1120,1087
11475,1972202505,20140729T000000,543000.0,3,2.5,1540,1256,3.0,0,0,3,8,1540,0,2004,0,98103,47.6498,-122.346,1500,1350
12633,945000410,20150313T000000,265000.0,2,1.0,910,4600,1.0,0,0,3,5,910,0,1917,0,98117,47.6916,-122.362,1020,4600
4987,644000185,20140707T000000,875000.0,3,1.5,1820,12686,1.0,0,0,4,7,1820,0,1952,0,98004,47.5886,-122.195,3020,11550
12449,3905100310,20140625T000000,544000.0,4,2.5,2030,3974,2.0,0,0,3,8,2030,0,1994,0,98029,47.5692,-122.006,1780,3953
10779,2600110710,20140819T000000,602000.0,3,2.25,1580,11580,1.0,0,0,4,8,1580,0,1979,0,98006,47.5503,-122.155,2630,10009


In [7]:
df_house.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             21613 non-null  int64  
 1   date           21613 non-null  object 
 2   price          21613 non-null  float64
 3   bedrooms       21613 non-null  int64  
 4   bathrooms      21613 non-null  float64
 5   sqft_living    21613 non-null  int64  
 6   sqft_lot       21613 non-null  int64  
 7   floors         21613 non-null  float64
 8   waterfront     21613 non-null  int64  
 9   view           21613 non-null  int64  
 10  condition      21613 non-null  int64  
 11  grade          21613 non-null  int64  
 12  sqft_above     21613 non-null  int64  
 13  sqft_basement  21613 non-null  int64  
 14  yr_built       21613 non-null  int64  
 15  yr_renovated   21613 non-null  int64  
 16  zipcode        21613 non-null  int64  
 17  lat            21613 non-null  float64
 18  long  

In [8]:
df_house.describe()

Unnamed: 0,id,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
count,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0
mean,4580302000.0,540088.1,3.370842,2.114757,2079.899736,15106.97,1.494309,0.007542,0.234303,3.40943,7.656873,1788.390691,291.509045,1971.005136,84.402258,98077.939805,47.560053,-122.213896,1986.552492,12768.455652
std,2876566000.0,367127.2,0.930062,0.770163,918.440897,41420.51,0.539989,0.086517,0.766318,0.650743,1.175459,828.090978,442.575043,29.373411,401.67924,53.505026,0.138564,0.140828,685.391304,27304.179631
min,1000102.0,75000.0,0.0,0.0,290.0,520.0,1.0,0.0,0.0,1.0,1.0,290.0,0.0,1900.0,0.0,98001.0,47.1559,-122.519,399.0,651.0
25%,2123049000.0,321950.0,3.0,1.75,1427.0,5040.0,1.0,0.0,0.0,3.0,7.0,1190.0,0.0,1951.0,0.0,98033.0,47.471,-122.328,1490.0,5100.0
50%,3904930000.0,450000.0,3.0,2.25,1910.0,7618.0,1.5,0.0,0.0,3.0,7.0,1560.0,0.0,1975.0,0.0,98065.0,47.5718,-122.23,1840.0,7620.0
75%,7308900000.0,645000.0,4.0,2.5,2550.0,10688.0,2.0,0.0,0.0,4.0,8.0,2210.0,560.0,1997.0,0.0,98118.0,47.678,-122.125,2360.0,10083.0
max,9900000000.0,7700000.0,33.0,8.0,13540.0,1651359.0,3.5,1.0,4.0,5.0,13.0,9410.0,4820.0,2015.0,2015.0,98199.0,47.7776,-121.315,6210.0,871200.0


# 3.0 Transformação

In [9]:
df_house['date'] = pd.to_datetime(df_house['date'])

In [10]:
df_house['seasonality'] = df_house['date'].dt.month.apply(
    lambda x: 'Winter' if (x == 1) | (x == 2) | (x == 12) 
    else ('Spring' if (x >= 3) & (x <= 5) 
    else ('Summer' if (x >= 6) & (x <= 8) 
    else 'Fall')))

In [11]:
df_house[['date', 'seasonality']]

Unnamed: 0,date,seasonality
0,2014-10-13,Fall
1,2014-12-09,Winter
2,2015-02-25,Winter
3,2014-12-09,Winter
4,2015-02-18,Winter
...,...,...
21608,2014-05-21,Spring
21609,2015-02-23,Winter
21610,2014-06-23,Summer
21611,2015-01-16,Winter


In [12]:
def gen_sale_table(df_house):
    df_sale = df_house.copy()
    df_sale['seasonality_median_price'] = df_sale.groupby(['zipcode', 'seasonality'])['price'].transform('median')

    for index, row in df_sale.iterrows():
        if row['price'] > row['seasonality_median_price']:
            df_sale.loc[index, 'sale_price'] = row['price'] + calculate_percentage(row['price'], 10)

        else:
            df_sale.loc[index, 'sale_price'] = row['price'] + calculate_percentage(row['price'], 30)

       # df_sale.loc[index, 'profit'] = row['sale_price'] - row['price']

    return df_sale

In [14]:
def calculate_percentage(value, percentage):
    return value * (percentage/100)

In [15]:
gen_sale_table(df_house)

KeyError: 'sale_price'

In [3]:
df_house.groupby(['zipcode', 'seasonality'])['price'].transform('median')

NameError: name 'df_house' is not defined