# 1 - Introduction

This notebook describes my project to fit a linear regression on Brazilian rental data in order to predict future prices and so support more profitable business decisions. The dataset is available on [Kaggle](https://www.kaggle.com/datasets/rubenssjr/brasilian-houses-to-rent/code). Let's explore!

## 1.1 - First impressions

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [5]:
df = pd.read_csv('/home/jdspy/Documentos/data/houses_to_rent.csv').iloc[:, 1:] # .iloc[:, 1:] drops the first column of the CSV

In [6]:
df

Unnamed: 0,city,area,rooms,bathroom,parking spaces,floor,animal,furniture,hoa,rent amount,property tax,fire insurance,total
0,1,240,3,3,4,-,acept,furnished,R$0,"R$8,000","R$1,000",R$121,"R$9,121"
1,0,64,2,1,1,10,acept,not furnished,R$540,R$820,R$122,R$11,"R$1,493"
2,1,443,5,5,4,3,acept,furnished,"R$4,172","R$7,000","R$1,417",R$89,"R$12,680"
3,1,73,2,2,1,12,acept,not furnished,R$700,"R$1,250",R$150,R$16,"R$2,116"
4,1,19,1,1,0,-,not acept,not furnished,R$0,"R$1,200",R$41,R$16,"R$1,257"
...,...,...,...,...,...,...,...,...,...,...,...,...,...
6075,1,50,2,1,1,2,acept,not furnished,R$420,"R$1,150",R$0,R$15,"R$1,585"
6076,1,84,2,2,1,16,not acept,furnished,R$768,"R$2,900",R$63,R$37,"R$3,768"
6077,0,48,1,1,0,13,acept,not furnished,R$250,R$950,R$42,R$13,"R$1,255"
6078,1,160,3,2,2,-,not acept,not furnished,R$0,"R$3,500",R$250,R$53,"R$3,803"


**Our variables are defined as follows:**


- city: The city where the property is located (stored as an integer code, 0 or 1, in this file).
- area: The area of the property.
- rooms: The number of bedrooms.
- bathroom: The number of bathrooms.
- parking spaces: The number of parking spaces.
- floor: The floor on which the property is located.
- animal: Indicates whether pets are accepted.
- furniture: Indicates whether the property is furnished.
- hoa: The condominium (homeowners association) fee.
- rent amount: The rent.
- property tax: The property tax (IPTU).
- fire insurance: The fire insurance premium.
- total: The total monthly expenses for the property.
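
The definition of `total` suggests it is the sum of the four cost columns. As a rough check, the sketch below rebuilds three rows from the preview above and compares the reported total with that sum. The currency strings were converted to integers by hand, and the names `sample`, `cost_columns` and `difference` are assumptions made only for this example.

```python
import pandas as pd

# Three rows copied from the preview above, with the currency strings already
# converted to integers by hand (an assumption made only for this sketch).
sample = pd.DataFrame(
    {
        "hoa": [0, 540, 4172],
        "rent amount": [8000, 820, 7000],
        "property tax": [1000, 122, 1417],
        "fire insurance": [121, 11, 89],
        "total": [9121, 1493, 12680],
    }
)

cost_columns = ["hoa", "rent amount", "property tax", "fire insurance"]

# Difference between the reported total and the sum of its assumed components.
sample["difference"] = sample["total"] - sample[cost_columns].sum(axis=1)

print(sample[["total", "difference"]])
```

In this small sample the third row differs by R$2, so the relationship may hold only approximately. The same check would need to be repeated on the full dataset once the currency columns are parsed.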


## 1.2 - Moving on to data cleaning

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6080 entries, 0 to 6079
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   city            6080 non-null   int64 
 1   area            6080 non-null   int64 
 2   rooms           6080 non-null   int64 
 3   bathroom        6080 non-null   int64 
 4   parking spaces  6080 non-null   int64 
 5   floor           6080 non-null   object
 6   animal          6080 non-null   object
 7   furniture       6080 non-null   object
 8   hoa             6080 non-null   object
 9   rent amount     6080 non-null   object
 10  property tax    6080 non-null   object
 11  fire insurance  6080 non-null   object
 12  total           6080 non-null   object
dtypes: int64(5), object(8)
memory usage: 617.6+ KB


In [8]:
df.describe()

Unnamed: 0,city,area,rooms,bathroom,parking spaces
count,6080.0,6080.0,6080.0,6080.0,6080.0
mean,0.863322,151.143914,2.492599,2.341612,1.75625
std,0.343535,375.559485,1.129665,1.43886,1.611909
min,0.0,10.0,1.0,1.0,0.0
25%,1.0,58.0,2.0,1.0,1.0
50%,1.0,100.0,3.0,2.0,1.0
75%,1.0,200.0,3.0,3.0,2.0
max,1.0,24606.0,10.0,10.0,12.0


In [15]:
tmp = {
    'city': 'cidade',
    'area': 'area',
    'rooms': 'quartos',
    'bathroom': 'banheiro',
    'parking spaces': 'vagas_de_estacionamento',
    'floor': 'andar',
    'animal': 'animal',
    'furniture': 'moveis',
    'hoa': 'condominio',
    'rent amount': 'valor_do_aluguel',
    'property tax': 'imposto_sobre_a_propriedade',
    'fire insurance': 'seguro_contra_incendios',
    'total': 'total'
}

df.rename(columns=tmp, inplace=True)

In [20]:
df.replace({'acept' : True, 'not acept': False}, inplace=True) # Handling the "animal" variable

In [24]:
df.replace({'furnished' : True, 'not furnished': False}, inplace=True) # Handling the "moveis" (furniture) variable
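
The two `replace` calls above scan every column of the DataFrame. A possible alternative, shown here only as a sketch on a hand-built `sample` frame (an assumption, not the notebook's data), is to map each column with its own dictionary so that a string such as `'furnished'` can never be rewritten in an unrelated column.

```python
import pandas as pd

# Stand-in for the renamed DataFrame, with values copied from the preview above.
sample = pd.DataFrame(
    {
        "animal": ["acept", "acept", "not acept"],
        "moveis": ["furnished", "not furnished", "not furnished"],
    }
)

# Map each column with its own dictionary; values missing from the dictionary
# become NaN, which makes unexpected categories easy to spot.
sample["animal"] = sample["animal"].map({"acept": True, "not acept": False})
sample["moveis"] = sample["moveis"].map({"furnished": True, "not furnished": False})

print(sample)
```

The trade-off is one line per column instead of a single call, in exchange for a more explicit transformation.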

In [27]:
df.replace({'-' : np.nan}, inplace=True) # Handling the "andar" (floor) variable
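
After the `'-'` placeholder becomes `NaN`, the floor values are still stored as strings. A minimal sketch of one way to finish the conversion, assuming `pd.to_numeric` is an acceptable choice here (the notebook does not show this step) and using a hand-built `andar` Series:

```python
import numpy as np
import pandas as pd

# Stand-in for the "andar" column, with values copied from the preview above.
andar = pd.Series(["-", "10", "3", "12", "-"], name="andar")

# Replace the "-" placeholder with NaN, then parse the remaining strings as numbers.
andar = pd.to_numeric(andar.replace("-", np.nan))

print(andar)
```

Because the column contains `NaN`, the result is a float column rather than an integer one.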

In [50]:
df.replace(',', '', regex=True, inplace=True) # Removing the thousands separators (commas) from string values

In [57]:
df.replace({'Sem info' : np.nan, 'Incluso': 'R$0'}, inplace=True) # "Sem info" (no information) becomes NaN; "Incluso" (included) becomes R$0

In [75]:
def handle_currency_format(var):
    """Return the text after the "$" sign for strings; return other values unchanged."""
    if isinstance(var, str):
        return var.split('$')[1]
    return var

0          0.0
1        540.0
2       4172.0
3        700.0
4          0.0
         ...  
6075     420.0
6076     768.0
6077     250.0
6078       0.0
6079     489.0
Name: condominio, Length: 6080, dtype: float64
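
The call that produced the output above is not shown in the cell. A minimal sketch of how `handle_currency_format` could be applied to obtain a float column, assuming `apply` followed by `astype(float)` and using a hand-built `condominio` Series with the first values from the preview:

```python
import pandas as pd


def handle_currency_format(var):
    """Return the text after the "$" sign for strings; return other values unchanged."""
    if isinstance(var, str):
        return var.split("$")[1]
    return var


# Stand-in for the "condominio" column after the commas were removed
# (values copied from the first rows of the preview above).
condominio = pd.Series(["R$0", "R$540", "R$4172", "R$700", "R$0"], name="condominio")

# Strip the currency prefix from each value, then convert the column to float.
condominio = condominio.apply(handle_currency_format).astype(float)

print(condominio)
```

Non-string values such as `NaN` pass through the function untouched, so missing entries survive the conversion.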