Step-by-step summary of this Colab:

1. Loading the data: The São Paulo housing data was loaded from a CSV file into a pandas DataFrame.
2. Initial data exploration: The first rows of the DataFrame were inspected to understand the structure of the data and the available columns.
3. Analysis of the 'city' column: The unique values in the 'city' column and their counts were checked. Since every record was for "São Paulo", the column was removed because it adds no information to an analysis restricted to homes in São Paulo.
4. Handling missing values in the 'rooms' column: The 'rooms' column was found to contain missing values. The column median was computed and used to fill them.
5. Handling values in the 'floor' column: The 'floor' column was found to use '-' for unspecified floors (such as single-story houses). These values were replaced with 0 to allow future numerical analysis of the column.
6. Verifying the changes: After each treatment step, the first rows of the DataFrame and its general information were checked to confirm that the changes had been applied correctly.


So far, the focus has been on the initial cleaning and organization of the data to prepare it for future analyses.

Link to the CSV used: [houses_sp.csv](https://drive.google.com/file/d/1CTX0coNrhd7VyiojUh764AfA7crdZHbr/view?usp=sharing)
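
The cleaning steps listed above could also be collected into a single reusable function. The following is only a minimal sketch under stated assumptions: the function name `clean_houses` and the small `raw` DataFrame are hypothetical and stand in for the real CSV, and the notebook itself applies the steps one cell at a time.

```python
import pandas as pd


def clean_houses(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that applies the cleaning steps to a copy of df."""
    df = df.copy()

    # Drop 'city' only if it exists and holds a single distinct value.
    if "city" in df.columns and df["city"].nunique(dropna=False) <= 1:
        df = df.drop(columns=["city"])

    # Fill missing 'rooms' values with the column median.
    df["rooms"] = df["rooms"].fillna(df["rooms"].median())

    # Replace the '-' placeholder in 'floor' with 0. The column is cast to
    # object first so that strings and the integer 0 can coexist.
    floor = df["floor"].astype(object)
    floor.loc[floor == "-"] = 0
    df["floor"] = floor

    return df


# Toy data (an assumption for illustration, not rows from houses_sp.csv).
raw = pd.DataFrame(
    {
        "city": ["São Paulo"] * 4,
        "rooms": [2.0, None, 4.0, 3.0],
        "floor": ["7", "-", "20", "1"],
    }
)

cleaned = clean_houses(raw)
print(cleaned)
```

Working on a copy leaves the input DataFrame untouched, so the function can be called repeatedly without the re-run problems that in-place cell edits can cause.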

In [None]:
import numpy as np
import pandas as pd

In [None]:
houses_sp = pd.read_csv('houses_sp.csv')

In [None]:
houses_sp.head(10)

Unnamed: 0,city,area,rooms,bathroom,parking spaces,floor,hoa,rent,tax,insurance,total,page hits,days available,interactions,weeks available,type
0,São Paulo,70.0,2.0,1,1.0,7,2065,3300,211,42,5618.0,324,23,108,3,flat
1,São Paulo,320.0,4.0,4,2.0,20,1200,4960,1750,63,7973.0,720,78,240,11,flat
2,São Paulo,25.0,1.0,1,,1,0,800,25,11,836.0,1548,78,516,11,flat
3,São Paulo,650.0,3.0,3,7.0,-,0,8000,834,121,8955.0,396,66,132,9,house
4,São Paulo,213.0,,4,4.0,4,2254,3223,1735,41,7253.0,756,99,252,14,flat
5,São Paulo,152.0,2.0,2,1.0,3,1000,15000,250,191,16441.0,2412,142,804,20,flat
6,São Paulo,26.0,1.0,1,,2,470,2100,150,27,2747.0,828,85,276,12,flat
7,São Paulo,36.0,1.0,1,,11,359,2100,70,27,2556.0,360,22,120,3,flat
8,São Paulo,55.0,1.0,1,1.0,2,790,4200,224,54,5268.0,540,120,180,17,flat
9,São Paulo,100.0,2.0,2,2.0,24,900,4370,17,56,5343.0,1044,65,348,9,flat


In [None]:
houses_sp['city'].value_counts()

city,count
São Paulo,7143


In [None]:
# errors='ignore' keeps this cell safe to re-run after 'city' has already been dropped
houses_sp = houses_sp.drop(columns=['city'], errors='ignore')

In [None]:
houses_sp.head()

Unnamed: 0,area,rooms,bathroom,parking spaces,floor,hoa,rent,tax,insurance,total,page hits,days available,interactions,weeks available,type
0,70.0,2.0,1,1.0,7,2065,3300,211,42,5618.0,324,23,108,3,flat
1,320.0,4.0,4,2.0,20,1200,4960,1750,63,7973.0,720,78,240,11,flat
2,25.0,1.0,1,,1,0,800,25,11,836.0,1548,78,516,11,flat
3,650.0,3.0,3,7.0,-,0,8000,834,121,8955.0,396,66,132,9,house
4,213.0,,4,4.0,4,2254,3223,1735,41,7253.0,756,99,252,14,flat


In [None]:
houses_sp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7143 entries, 0 to 7142
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   area             7108 non-null   float64
 1   rooms            7092 non-null   float64
 2   bathroom         7143 non-null   int64  
 3   parking spaces   5586 non-null   float64
 4   floor            7143 non-null   object 
 5   hoa              7143 non-null   int64  
 6   rent             7143 non-null   int64  
 7   tax              7143 non-null   object 
 8   insurance        7143 non-null   int64  
 9   total            7143 non-null   float64
 10  page hits        7143 non-null   int64  
 11  days available   7143 non-null   int64  
 12  interactions     7143 non-null   int64  
 13  weeks available  7143 non-null   int64  
 14  type             7143 non-null   object 
dtypes: float64(4), int64(8), object(3)
memory usage: 837.2+ KB
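
The `info()` output above reports non-null counts, so the number of missing values has to be derived by subtracting from the 7143 entries (35 for `area`, 51 for `rooms`, 1557 for `parking spaces`). As a possible shortcut, `isna().sum()` gives the missing counts directly. The sketch below uses a small made-up DataFrame named `sample`, not the real data.

```python
import numpy as np
import pandas as pd

# Toy data (an assumption for illustration only).
sample = pd.DataFrame(
    {
        "area": [70.0, 320.0, np.nan],
        "rooms": [2.0, np.nan, np.nan],
        "bathroom": [1, 4, 1],
    }
)

# Count missing values per column, keep only columns that have any,
# and list the most affected columns first.
missing = sample.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```

On the real DataFrame the same two lines would be applied to `houses_sp` instead of `sample`.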


In [None]:
houses_sp['rooms'].median()

3.0
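
One reason the median is a common choice for filling gaps like this is that it is less sensitive to extreme values than the mean. The sketch below illustrates that on a made-up `rooms` series containing one implausible value; the numbers are assumptions chosen for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy series with one outlier (40) and one missing value.
rooms = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 40.0, None])

# Both statistics skip NaN by default; the outlier pulls the mean upward
# while the median stays at the middle observation.
mean_rooms = rooms.mean()
median_rooms = rooms.median()
print(mean_rooms, median_rooms)

# Fill the gap with the median, as the notebook does for houses_sp['rooms'].
filled = rooms.fillna(median_rooms)
print(filled.isna().sum())
```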

In [None]:
houses_sp['rooms'] = houses_sp['rooms'].fillna(houses_sp['rooms'].median())

In [None]:
houses_sp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7143 entries, 0 to 7142
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   area             7108 non-null   float64
 1   rooms            7143 non-null   float64
 2   bathroom         7143 non-null   int64  
 3   parking spaces   5586 non-null   float64
 4   floor            7143 non-null   object 
 5   hoa              7143 non-null   int64  
 6   rent             7143 non-null   int64  
 7   tax              7143 non-null   object 
 8   insurance        7143 non-null   int64  
 9   total            7143 non-null   float64
 10  page hits        7143 non-null   int64  
 11  days available   7143 non-null   int64  
 12  interactions     7143 non-null   int64  
 13  weeks available  7143 non-null   int64  
 14  type             7143 non-null   object 
dtypes: float64(4), int64(8), object(3)
memory usage: 837.2+ KB


In [None]:
houses_sp.head(10)

Unnamed: 0,area,rooms,bathroom,parking spaces,floor,hoa,rent,tax,insurance,total,page hits,days available,interactions,weeks available,type
0,70.0,2.0,1,1.0,7,2065,3300,211,42,5618.0,324,23,108,3,flat
1,320.0,4.0,4,2.0,20,1200,4960,1750,63,7973.0,720,78,240,11,flat
2,25.0,1.0,1,,1,0,800,25,11,836.0,1548,78,516,11,flat
3,650.0,3.0,3,7.0,-,0,8000,834,121,8955.0,396,66,132,9,house
4,213.0,3.0,4,4.0,4,2254,3223,1735,41,7253.0,756,99,252,14,flat
5,152.0,2.0,2,1.0,3,1000,15000,250,191,16441.0,2412,142,804,20,flat
6,26.0,1.0,1,,2,470,2100,150,27,2747.0,828,85,276,12,flat
7,36.0,1.0,1,,11,359,2100,70,27,2556.0,360,22,120,3,flat
8,55.0,1.0,1,1.0,2,790,4200,224,54,5268.0,540,120,180,17,flat
9,100.0,2.0,2,2.0,24,900,4370,17,56,5343.0,1044,65,348,9,flat


In [None]:
houses_sp['floor'].value_counts()

floor,count
-,1847
1,727
2,506
4,402
3,399
5,370
7,312
6,297
8,291
10,256


In [None]:
houses_sp.loc[houses_sp['floor'] == '-', 'floor'] = 0
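
After the assignment above, the column still has the `object` dtype, because the remaining entries are strings read from the CSV. For numerical analysis a conversion step is probably needed as well. The following is a minimal sketch on toy data, assuming every remaining value is a digit string (which the `value_counts()` output suggests but does not prove); the `df` name is hypothetical.

```python
import pandas as pd

# Toy column mimicking 'floor': digit strings plus the '-' placeholder.
df = pd.DataFrame({"floor": pd.Series(["7", "20", "-", "1"], dtype=object)})

# Same replacement as in the notebook: '-' becomes the integer 0.
df.loc[df["floor"] == "-", "floor"] = 0
print(df["floor"].dtype)  # object

# Convert the mixed strings/integers into a numeric column.
# pd.to_numeric raises if a value cannot be parsed, which surfaces
# unexpected entries instead of silently hiding them.
df["floor"] = pd.to_numeric(df["floor"])
print(df["floor"].dtype)
print(df["floor"].tolist())
```

If the real column turns out to contain other non-numeric entries, `pd.to_numeric(..., errors="coerce")` would turn them into NaN instead of raising, at the cost of a float column.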

In [None]:
houses_sp

Unnamed: 0,area,rooms,bathroom,parking spaces,floor,hoa,rent,tax,insurance,total,page hits,days available,interactions,weeks available,type
0,70.0,2.0,1,1.0,7,2065,3300,211,42,5618.0,324,23,108,3,flat
1,320.0,4.0,4,2.0,20,1200,4960,1750,63,7973.0,720,78,240,11,flat
2,25.0,1.0,1,,1,0,800,25,11,836.0,1548,78,516,11,flat
3,650.0,3.0,3,7.0,0,0,8000,834,121,8955.0,396,66,132,9,house
4,213.0,3.0,4,4.0,4,2254,3223,1735,41,7253.0,756,99,252,14,flat
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7138,24.0,2.0,2,1.0,13,993,5500,141,70,6704.0,1800,93,600,13,flat
7139,280.0,4.0,4,2.0,5,4200,4000,1042,51,9293.0,1116,56,372,8,flat
7140,83.0,3.0,2,2.0,11,888,7521,221,96,8726.0,1116,84,372,12,flat
7141,150.0,3.0,3,2.0,8,0,13500,0,172,13672.0,2124,114,708,16,flat
