# Case: Predicting property prices in Rio de Janeiro with regression

## Context
The real estate market is an important sector of the Brazilian economy, and predicting property prices matters to buyers and sellers alike. The goal of this case is to build a regression model that predicts property prices in the city of Rio de Janeiro from publicly available data.

## Dataset
Inside Airbnb (http://insideairbnb.com/get-the-data.html) publishes data on Airbnb accommodation listings, including location, property type, number of bedrooms, and listed price. These data can be used to model property prices in Rio de Janeiro by keeping only the information relevant to the local context.

## Applying Data Science

### 1. Data collection and cleaning
Download the "listings.csv" file for the city of Rio de Janeiro from Inside Airbnb. Then clean the data: remove duplicate entries, handle missing values, and convert categorical variables to numeric ones. The data is stored in three layers:
 - bronze: collected data in a readable format (.parquet)
 - silver: filtered or type-converted data (e.g. string -> date)
 - gold: property-level data (id, price, number of bedrooms)
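
The cleaning steps above can be sketched on a toy DataFrame. This is a minimal illustration, not the notebook's actual pipeline: the helper name `basic_clean`, the sample rows, and the choice of median imputation are assumptions made for the example.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate by id, fill missing numeric values, one-hot encode room_type."""
    out = df.drop_duplicates(subset="id").copy()
    # Median imputation is one simple strategy among many (assumption).
    for col in ["bedrooms", "beds"]:
        out[col] = out[col].fillna(out[col].median())
    # dtype=int yields 0/1 columns instead of booleans.
    return pd.get_dummies(out, columns=["room_type"], dtype=int)


# Made-up rows that mimic a few columns of listings.csv.
raw = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "bedrooms": [1.0, 1.0, None, 3.0],
    "beds": [1.0, 1.0, 2.0, None],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Entire home/apt"],
})
clean = basic_clean(raw)
print(clean)
```

Which imputation to use depends on the column; dropping rows or adding a missing-value indicator are equally valid options.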

### 2. Exploratory data analysis (EDA)
EDA is used to identify patterns, trends, and correlations among the variables. For example: inspect the distribution of property prices, find which neighbourhoods have the most expensive properties, and analyse the relationship between property size and price.
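
A minimal sketch of those three checks with pandas. The rows below are made up for illustration, and `accommodates` is used as a rough proxy for size, which is an assumption (the listings data has no floor-area column).

```python
import pandas as pd

# Made-up sample using column names from listings.csv.
listings_sample = pd.DataFrame({
    "neighbourhood_cleansed": ["Leme", "Botafogo", "Leme", "Botafogo", "Laranjeiras"],
    "price": [720.0, 599.0, 480.0, 301.0, 240.0],
    "accommodates": [2, 2, 4, 2, 4],
})

# 1. Distribution of prices (count, mean, quartiles, ...).
price_summary = listings_sample["price"].describe()

# 2. Median price per neighbourhood, most expensive first.
median_by_hood = (
    listings_sample.groupby("neighbourhood_cleansed")["price"]
    .median()
    .sort_values(ascending=False)
)

# 3. Pearson correlation between capacity and price.
corr = listings_sample["accommodates"].corr(listings_sample["price"])

print(price_summary)
print(median_by_hood)
print(f"correlation: {corr:.3f}")
```

The median is used instead of the mean because listing prices tend to have a long right tail.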

### 3. Feature engineering
Create new, relevant variables, such as the distance to the city center or the presence of shops and services nearby. These variables can help improve the performance of the regression model.
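
As one possible way to encode "shops and services nearby", the sketch below counts points of interest within a radius of each listing. It uses the equirectangular approximation (adequate for distances of a few kilometers), not an exact geodesic. The function name `count_within_radius` and the point-of-interest coordinates are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def count_within_radius(lat, lon, poi_lat, poi_lon, radius_km=1.0):
    """Count points of interest within radius_km of each listing.

    Distances use the equirectangular approximation, computed for every
    listing/POI pair via broadcasting.
    """
    lat = np.radians(np.asarray(lat, dtype=float))[:, None]
    lon = np.radians(np.asarray(lon, dtype=float))[:, None]
    poi_lat = np.radians(np.asarray(poi_lat, dtype=float))[None, :]
    poi_lon = np.radians(np.asarray(poi_lon, dtype=float))[None, :]
    x = (poi_lon - lon) * np.cos((lat + poi_lat) / 2)
    y = poi_lat - lat
    dist_km = EARTH_RADIUS_KM * np.sqrt(x**2 + y**2)
    return (dist_km <= radius_km).sum(axis=1)


# Two listing coordinates and three hypothetical points of interest.
listing_lat = [-22.96421, -23.01104]
listing_lon = [-43.1716, -43.32034]
poi_lat = [-22.9650, -22.9739, -23.0105]
poi_lon = [-43.1720, -43.1853, -43.3200]

counts = count_within_radius(listing_lat, listing_lon, poi_lat, poi_lon, radius_km=1.0)
print(counts)
```

The pairwise matrix grows with listings times points of interest, so a large dataset would call for chunking or a spatial index.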

### 4. Modeling and algorithms
Split the dataset into training and test sets. Train several regression models, such as linear regression, decision trees, and random forest regression. Use cross-validation on the training set to assess each model and select the one with the best performance.
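
The workflow above could look like the following sketch with scikit-learn. The data is synthetic (a stand-in for the engineered features), and the hyperparameters are arbitrary choices for illustration, not tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic features and target; real code would use the gold-layer table.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 50 + 30 * X[:, 0] - 5 * X[:, 1] + rng.normal(0, 5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# 5-fold cross-validated RMSE on the training set only.
cv_rmse = {}
for name, model in models.items():
    scores = cross_val_score(
        model, X_train, y_train, cv=5, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = -scores.mean()

# Pick the model with the lowest CV error, refit it, and score it once on the test set.
best_name = min(cv_rmse, key=cv_rmse.get)
best_model = models[best_name].fit(X_train, y_train)
test_r2 = best_model.score(X_test, y_test)
print(cv_rmse, best_name, round(test_r2, 3))
```

Keeping the test set out of model selection means the final score is an honest estimate of performance on unseen listings.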

### 5. Evaluation and interpretation
Evaluate the selected model with performance metrics such as the root mean squared error (RMSE) and the coefficient of determination (R²). Interpret the results, identifying the factors that most influence property prices and providing insights for those interested in the real estate market.
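
For reference, both metrics can be computed directly from their definitions. The sketch below uses NumPy and made-up prices; the helper names `rmse` and `r2` are illustrative.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


# Made-up prices: every prediction is off by exactly 50.
y_true = [200.0, 400.0, 600.0, 800.0]
y_pred = [250.0, 350.0, 650.0, 750.0]

print(rmse(y_true, y_pred))  # 50.0
print(r2(y_true, y_pred))    # 0.95
```

RMSE is expressed in the same unit as the price, which makes it easy to communicate; R² is unitless and describes the share of variance explained.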

In this way, a regression model for predicting property prices in Rio de Janeiro can be built from available data and data science techniques.


#### Loading the Data

In [1]:
!pip install geopy

Collecting geopy
  Downloading geopy-2.3.0-py3-none-any.whl (119 kB)
Collecting geographiclib<3,>=1.52 (from geopy)
  Downloading geographiclib-2.0-py3-none-any.whl (40 kB)
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-2.0 geopy-2.3.0


In [2]:
import ast
import math
import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pyarrow.parquet as pq
from geopy.distance import distance

In [3]:
pd.set_option('display.max_columns', None)

In [4]:
mount_path = "/workspaces/prediction_house_price_from_airbnb/Data"

In [17]:
# Read the bronze-layer Parquet file into a pandas DataFrame

listings = pd.read_parquet(mount_path + "/bronze/listings.parquet")
#neighbourhoods = pd.read_parquet(mount_path + "/bronze/neighbourhoods.parquet")
#reviews = pd.read_parquet(mount_path + "/bronze/reviews.parquet")
#calendar = pd.read_parquet(mount_path + "/bronze/calendar.parquet")



In [18]:
listings.head(2)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,783493769216852616,https://www.airbnb.com/rooms/783493769216852616,20221229002515,2022-12-29,city scrape,"Leme, Brasil",Apartamento aconchegante próximo à praia do Le...,,https://a0.muscache.com/pictures/fe1f4b1b-a300...,491704706,https://www.airbnb.com/users/show/491704706,Felipe,2022-12-16,,,within a few hours,100%,67%,f,https://a0.muscache.com/defaults/user_pic-50x5...,https://a0.muscache.com/defaults/user_pic-225x...,Leme,2.0,2.0,"['email', 'phone']",f,f,,Leme,,-22.96421,-43.1716,Entire rental unit,Entire home/apt,2,,1 bath,1.0,1.0,"[""Wifi"", ""Dedicated workspace"", ""Kitchen"", ""Fi...",$720.00,1,365,1,1,365,365,1.0,365.0,,t,27,57,87,362,2022-12-29,0,0,0,,,,,,,,,,,f,2,2,0,0,
1,703973293620197060,https://www.airbnb.com/rooms/703973293620197060,20221229002515,2022-12-29,city scrape,Suíte com entrada independente em casarão 1,"Casa ampla, em excelente localização, situada ...",,https://a0.muscache.com/pictures/miso/Hosting-...,20362236,https://www.airbnb.com/users/show/20362236,Júlio Cesar,2014-08-21,"Rio de Janeiro, Brazil",,within an hour,70%,100%,f,https://a0.muscache.com/im/users/20362236/prof...,https://a0.muscache.com/im/users/20362236/prof...,Botafogo,4.0,5.0,"['email', 'phone']",t,t,,Botafogo,,-22.95792,-43.182226,Private room in bed and breakfast,Private room,2,,1 private bath,1.0,1.0,"[""Wifi"", ""Lock on bedroom door"", ""TV"", ""Coffee...",$599.00,4,365,4,4,365,365,4.0,365.0,,t,28,58,88,363,2022-12-29,1,1,0,2022-09-06,2022-09-06,5.0,5.0,5.0,5.0,5.0,4.0,5.0,,t,4,0,4,0,0.26


In [19]:
def clean_price_column(df):
    """
    Cleans the 'price' column of the DataFrame 'df' by removing the dollar sign ($) and the comma (,)
    and converting the result to float. The cleaned values replace the 'price' column in the returned copy.

    Args:
        df (pandas.DataFrame): The DataFrame to clean.

    Returns:
        pandas.DataFrame: A copy of the original DataFrame whose 'price' column
        holds the cleaned prices as floats.
    """
    listings = df.copy()  # work on a copy of the original DataFrame
    listings['price'] = listings['price'].str.replace('[$,]', '', regex=True).astype(float)  # strip '$' and ',' then convert to float
    return listings

def extract_bathrooms(df):
    """
    Extracts the number of bathrooms from the 'bathrooms_text' column of 'df' and converts it to float,
    storing the result in the 'bathrooms' column.

    Args:
        df (pandas.DataFrame): The DataFrame to process.

    Returns:
        pandas.DataFrame: A copy of the original DataFrame whose 'bathrooms' column
        holds the extracted bathroom counts as floats.
    """
    listings = df.copy()  # work on a copy of the original DataFrame
    # Take the first number in the text; rows without a number become NaN
    listings['bathrooms'] = listings['bathrooms_text'].str.extract(r'(\d+(?:\.\d+)?)', expand=False).astype(float)
    return listings




EARTH_RADIUS_KM = 6371  # mean Earth radius in kilometers


def haversine_km(row, ref_lat, ref_lon):
    """
    Computes the great-circle (haversine) distance between the listing in 'row' and a reference point.

    Args:
        row (pandas.Series): A row of the 'listings' DataFrame with 'latitude' and 'longitude' in degrees.
        ref_lat (float): Latitude of the reference point, in degrees.
        ref_lon (float): Longitude of the reference point, in degrees.

    Returns:
        float: The distance in kilometers between the listing and the reference point.
    """
    # Convert degrees to radians
    lat_rad = math.radians(row['latitude'])
    lon_rad = math.radians(row['longitude'])
    ref_lat_rad = math.radians(ref_lat)
    ref_lon_rad = math.radians(ref_lon)

    # Haversine formula
    delta_lat = ref_lat_rad - lat_rad
    delta_lon = ref_lon_rad - lon_rad
    a = math.sin(delta_lat/2)**2 + math.cos(lat_rad) * math.cos(ref_lat_rad) * math.sin(delta_lon/2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    return EARTH_RADIUS_KM * c


def calc_distance_cen(row):
    """Distance in km from the listing to the center of Rio de Janeiro."""
    return haversine_km(row, -22.908333, -43.196388)


def calc_distance_cor(row):
    """Distance in km from the listing to the Corcovado reference point."""
    return haversine_km(row, -22.9524, -43.2114)


def calc_distance_gal(row):
    """Distance in km from the listing to the Galeão reference point."""
    return haversine_km(row, -22.8053, -43.2566)


def calc_distance_oli(row):
    """Distance in km from the listing to the Olympic reference point ('distance_olipico' column)."""
    return haversine_km(row, -22.9774, -43.3940)


def calc_distance_cop(row):
    """Distance in km from the listing to the Copacabana reference point."""
    return haversine_km(row, -22.9739, -43.1853)

In [20]:
listings=clean_price_column(listings)
listings=extract_bathrooms(listings)
listings['distance_centro'] = listings.apply(calc_distance_cen, axis=1)
listings['distance_corcovado'] = listings.apply(calc_distance_cor, axis=1)
listings['distance_galeao'] = listings.apply(calc_distance_gal, axis=1)
listings['distance_olipico'] = listings.apply(calc_distance_oli, axis=1)
listings['distance_copacabana'] = listings.apply(calc_distance_cop, axis=1)
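
The row-wise `apply` calls above invoke a Python function once per listing. As a possible alternative, the same haversine formula can be evaluated on whole columns with NumPy. This is a sketch; the function name `haversine_km_vec` is illustrative, and the two sample coordinates are taken from the rows shown earlier.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def haversine_km_vec(lat, lon, ref_lat, ref_lon):
    """Haversine distance in km from each (lat, lon) pair to one reference point."""
    lat, lon = np.radians(lat), np.radians(lon)
    ref_lat, ref_lon = np.radians(ref_lat), np.radians(ref_lon)
    a = (
        np.sin((ref_lat - lat) / 2) ** 2
        + np.cos(lat) * np.cos(ref_lat) * np.sin((ref_lon - lon) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


sample = pd.DataFrame({
    "latitude": [-22.96421, -22.95792],
    "longitude": [-43.1716, -43.182226],
})
# Same city-center reference point as calc_distance_cen.
sample["distance_centro"] = haversine_km_vec(
    sample["latitude"], sample["longitude"], -22.908333, -43.196388
)
print(sample)
```

On these two rows the result should agree with the `distance_centro` values produced by the row-wise version (about 6.71 km and 5.70 km).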

In [22]:
# # Strip the quotes from the amenities strings
# listings['amenities'] = listings['amenities'].str.replace('"', '')
# # Collect all unique amenities into a list
# unique_amenities = list(set([amenity for amenities in listings['amenities'] for amenity in amenities]))
# # Create a separate column for each amenity
# for amenity in unique_amenities:
#     listings[amenity] = listings['amenities'].apply(lambda x: 1 if amenity in x else 0)

# Convert the date columns from strings to datetime
date_cols = ['last_scraped', 'host_since', 'calendar_last_scraped', 'first_review', 'last_review']
listings[date_cols] = listings[date_cols].apply(pd.to_datetime)

# Set a fixed reference date
data_fixa = pd.to_datetime('2023-05-01')

# Years between each host's 'host_since' date and the fixed reference date
listings['years'] = (data_fixa - listings['host_since']).dt.days / 365.25
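
The fixed reference date is later than the scrape date shown in `last_scraped`. One possible variant, sketched below under that observation, measures host tenure relative to each row's own scrape date. The column name `host_years_at_scrape` is hypothetical and the third row is made up to show how a missing date propagates.

```python
import pandas as pd

hosts = pd.DataFrame({
    "host_since": pd.to_datetime(["2022-12-16", "2014-08-21", None]),
    "last_scraped": pd.to_datetime(["2022-12-29"] * 3),
})

# Tenure in years at scrape time; a missing host_since yields NaN.
hosts["host_years_at_scrape"] = (
    (hosts["last_scraped"] - hosts["host_since"]).dt.days / 365.25
)
print(hosts)
```

Whether a fixed date or the scrape date is preferable depends on how the feature will be used at prediction time.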

In [None]:
listings.to_parquet(mount_path + '/silver/listings_v01.parquet')

In [23]:
listings.head(2)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month,distance_centro,distance_corcovado,distance_galeao,distance_olipico,distance_copacabana,years
0,783493769216852616,https://www.airbnb.com/rooms/783493769216852616,20221229002515,2022-12-29,city scrape,"Leme, Brasil",Apartamento aconchegante próximo à praia do Le...,,https://a0.muscache.com/pictures/fe1f4b1b-a300...,491704706,https://www.airbnb.com/users/show/491704706,Felipe,2022-12-16,,,within a few hours,100%,67%,f,https://a0.muscache.com/defaults/user_pic-50x5...,https://a0.muscache.com/defaults/user_pic-225x...,Leme,2.0,2.0,"['email', 'phone']",f,f,,Leme,,-22.96421,-43.1716,Entire rental unit,Entire home/apt,2,1.0,1 bath,1.0,1.0,"[""Wifi"", ""Dedicated workspace"", ""Kitchen"", ""Fi...",720.0,1,365,1,1,365,365,1.0,365.0,,t,27,57,87,362,2022-12-29,0,0,0,NaT,NaT,,,,,,,,,f,2,2,0,0,,6.71176,4.281377,19.699012,22.815964,1.768678,0.372348
1,703973293620197060,https://www.airbnb.com/rooms/703973293620197060,20221229002515,2022-12-29,city scrape,Suíte com entrada independente em casarão 1,"Casa ampla, em excelente localização, situada ...",,https://a0.muscache.com/pictures/miso/Hosting-...,20362236,https://www.airbnb.com/users/show/20362236,Júlio Cesar,2014-08-21,"Rio de Janeiro, Brazil",,within an hour,70%,100%,f,https://a0.muscache.com/im/users/20362236/prof...,https://a0.muscache.com/im/users/20362236/prof...,Botafogo,4.0,5.0,"['email', 'phone']",t,t,,Botafogo,,-22.95792,-43.182226,Private room in bed and breakfast,Private room,2,1.0,1 private bath,1.0,1.0,"[""Wifi"", ""Lock on bedroom door"", ""TV"", ""Coffee...",599.0,4,365,4,4,365,365,4.0,365.0,,t,28,58,88,363,2022-12-29,1,1,0,2022-09-06,2022-09-06,5.0,5.0,5.0,5.0,5.0,4.0,5.0,,t,4,0,4,0,0.26,5.701366,3.049529,18.602507,21.789357,1.804552,8.692676


In [36]:
# Work on a copy so that later edits to df do not modify listings
df = listings.copy()

In [34]:

# Convert the amenities strings into real Python lists
df['amenities'] = df['amenities'].apply(ast.literal_eval)

# Collect the set of all unique amenities
all_amenities = {a for sublist in df['amenities'] for a in sublist}

# Create one 0/1 column per amenity
for amenity in all_amenities:
    df[amenity] = np.where(df['amenities'].apply(lambda x: amenity in x), 1, 0)

# Drop the original 'amenities' column
df = df.drop('amenities', axis=1)
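
Adding one column per amenity in a loop can be slow on wide data, and raw amenity names may collide with existing column names. A possible alternative, sketched here on made-up rows, builds all indicator columns at once with `explode` and `pd.get_dummies`, using an `amenity_` prefix (an assumption made for the example).

```python
import ast

import pandas as pd

# Made-up rows; amenities are stored as JSON-like strings, as in listings.csv.
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "amenities": ['["Wifi", "Kitchen"]', '["Wifi", "TV"]', "[]"],
})

# Parse each string into a list, then put one amenity per row.
parsed = sample["amenities"].apply(ast.literal_eval)
exploded = parsed.explode()

# One 0/1 column per amenity, collapsed back to one row per listing.
dummies = (
    pd.get_dummies(exploded, prefix="amenity", dtype=int)
    .groupby(level=0)
    .max()
)

encoded = pd.concat([sample.drop(columns="amenities"), dummies], axis=1)
print(encoded)
```

A listing with an empty amenity list ends up with zeros in every indicator column, because `explode` turns the empty list into a missing value that `get_dummies` ignores.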



In [35]:
df.head()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month,distance_centro,distance_corcovado,distance_galeao,distance_olipico,distance_copacabana,years,amenity_0,amenity_1,amenity_2,amenity_3,amenity_4,amenity_5,amenity_6,amenity_7,amenity_8,amenity_9,amenity_10,amenity_11,amenity_12,amenity_13,amenity_14,amenity_15,amenity_16,amenity_17,amenity_18,amenity_19,amenity_20,amenity_21,amenity_22,amenity_23,amenity_24,amenity_25,amenity_26,amenity_27,amenity_28,amenity_29,amenity_30,amenity_31,amenity_32,amenity_33,amenity_34,amenity_35,amenity_36,amenity_37,amenity_38,amenity_39,amenity_40,amenity_41,amenity_42,amenity_43,amenity_44,amenity_45,amenity_46,amenity_47,amenity_48,amenity_49,amenity_50,amenity_51,amenity_52,amenity_53,amenity_54,amenity_55,amenity_56,amenity_57,amenity_58,amenity_59,amenity_60,amenity_61,amenity_62,amenity_63,amenity_64,amenity_65,amenity_66,amenity_67,amenity_68,amenity_69,amenity_70,amenity_71,amenity_72,amenity_73,amenity_74,amenity_75,amenity_76,amenity_77,amenity_78,amenity_79,amenity_80,amenity_81,amenity_82,amenity_83,amenity_84,amenity_85,amenity_86,amenity_87,amenity_88,amenity_89,amenity_90,amenity_91,amenity_92,amenity_93,amenity_94,amenity_95,amenity_96,amenity_97,amenity_98,amenity_99
0,783493769216852616,https://www.airbnb.com/rooms/783493769216852616,20221229002515,2022-12-29,city scrape,"Leme, Brasil",Apartamento aconchegante próximo à praia do Le...,,https://a0.muscache.com/pictures/fe1f4b1b-a300...,491704706,https://www.airbnb.com/users/show/491704706,Felipe,2022-12-16,,,within a few hours,100%,67%,f,https://a0.muscache.com/defaults/user_pic-50x5...,https://a0.muscache.com/defaults/user_pic-225x...,Leme,2.0,2.0,"['email', 'phone']",f,f,,Leme,,-22.96421,-43.1716,Entire rental unit,Entire home/apt,2,1.0,1 bath,1.0,1.0,"Wifi, Dedicated workspace, Kitchen, Fire extin...",720.0,1,365,1,1,365,365,1.0,365.0,,t,27,57,87,362,2022-12-29,0,0,0,NaT,NaT,,,,,,,,,f,2,2,0,0,,6.71176,4.281377,19.699012,22.815964,1.768678,0.372348,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,703973293620197060,https://www.airbnb.com/rooms/703973293620197060,20221229002515,2022-12-29,city scrape,Suíte com entrada independente em casarão 1,"Casa ampla, em excelente localização, situada ...",,https://a0.muscache.com/pictures/miso/Hosting-...,20362236,https://www.airbnb.com/users/show/20362236,Júlio Cesar,2014-08-21,"Rio de Janeiro, Brazil",,within an hour,70%,100%,f,https://a0.muscache.com/im/users/20362236/prof...,https://a0.muscache.com/im/users/20362236/prof...,Botafogo,4.0,5.0,"['email', 'phone']",t,t,,Botafogo,,-22.95792,-43.182226,Private room in bed and breakfast,Private room,2,1.0,1 private bath,1.0,1.0,"Wifi, Lock on bedroom door, TV, Coffee maker, ...",599.0,4,365,4,4,365,365,4.0,365.0,,t,28,58,88,363,2022-12-29,1,1,0,2022-09-06,2022-09-06,5.0,5.0,5.0,5.0,5.0,4.0,5.0,,t,4,0,4,0,0.26,5.701366,3.049529,18.602507,21.789357,1.804552,8.692676,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,782895997622988215,https://www.airbnb.com/rooms/782895997622988215,20221229002515,2022-12-29,city scrape,Apartamento próximo ao metrô,Sua família vai estar perto de tudo ao ficar n...,,https://a0.muscache.com/pictures/d48a6798-d1c8...,302417043,https://www.airbnb.com/users/show/302417043,Laís,2019-10-14,"Rio de Janeiro, Brazil",,within a few hours,100%,50%,f,https://a0.muscache.com/im/pictures/user/fadcf...,https://a0.muscache.com/im/pictures/user/fadcf...,Laranjeiras,1.0,1.0,"['email', 'phone']",t,t,,Laranjeiras,,-22.93196,-43.18018,Entire rental unit,Entire home/apt,4,1.0,1 bath,2.0,1.0,"Cooking basics, Clothing storage, Washer, Chan...",240.0,3,365,3,3,365,365,3.0,365.0,,t,19,40,70,160,2022-12-29,0,0,0,NaT,NaT,,,,,,,,,f,1,1,0,0,,3.107676,3.92249,16.113975,22.468479,4.692889,3.545517,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,23768085,https://www.airbnb.com/rooms/23768085,20221229002515,2022-12-29,city scrape,Vamos a praia,"Ótimo apartamento para duas pessoas, de frente...","Localização: praia da Barra da Tijuca, Posto 2...",https://a0.muscache.com/pictures/e43b686c-c4b4...,86611015,https://www.airbnb.com/users/show/86611015,Mauro,2016-07-28,"Rio de Janeiro, Brazil",,within an hour,100%,100%,f,https://a0.muscache.com/im/pictures/user/4fcbc...,https://a0.muscache.com/im/pictures/user/4fcbc...,Barra da Tijuca,1.0,2.0,"['email', 'phone']",t,t,"Barra da Tijuca, Rio de Janeiro, Brazil",Barra da Tijuca,,-23.01104,-43.32034,Entire rental unit,Entire home/apt,2,1.0,1 bath,1.0,6.0,"Ping pong table, Private patio or balcony, Sel...",494.0,3,60,3,4,1125,1125,3.3,1125.0,,t,5,18,48,228,2022-12-29,92,24,0,2018-04-04,2022-11-18,4.79,4.88,4.79,4.88,4.88,4.98,4.78,,t,1,1,0,0,1.59,17.073017,12.918443,23.790553,8.416713,14.426584,6.757016,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,10463735,https://www.airbnb.com/rooms/10463735,20221229002515,2022-12-29,city scrape,Sobrado aconchegante e espaçoso,"Sobrado amplo, arejado, com arquitetura inusit...",Laranjeiras é um bairro histórico do Rio de Ja...,https://a0.muscache.com/pictures/52ffaa78-2e94...,53918534,https://www.airbnb.com/users/show/53918534,Quiá,2016-01-09,"Rio de Janeiro, Brazil",Quiá Rodrigues é cineasta de animação e direto...,within a day,78%,100%,f,https://a0.muscache.com/im/pictures/user/34757...,https://a0.muscache.com/im/pictures/user/34757...,Laranjeiras,6.0,6.0,"['email', 'phone']",t,f,"Rio de Janeiro, Brazil",Laranjeiras,,-22.93555,-43.19107,Entire home,Entire home/apt,6,2.0,2 baths,2.0,3.0,"Iron, TV with standard cable, Wifi, Kitchen, A...",581.0,1,120,1,1,120,120,1.0,120.0,,t,0,0,0,181,2022-12-29,2,0,0,2016-06-28,2018-02-14,3.0,1.0,2.0,5.0,1.0,2.0,1.0,,t,6,2,4,0,0.03,3.07501,2.800748,15.963586,21.292424,4.305056,7.307324,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [29]:
df01=pd.read_parquet(path=mount_path+'/bronze/pontos_interesses_complete.parquet')

In [None]:
df=df.merge(df01, on='id')
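
`merge` defaults to an inner join, so listings without a match in the points-of-interest table are dropped silently. The sketch below shows one way to make that visible with the `how`, `validate`, and `indicator` parameters. The frames and the `n_restaurants` column are made up; the real columns of `pontos_interesses_complete.parquet` are not shown in this notebook.

```python
import pandas as pd

# Made-up stand-ins for df (listings) and df01 (points of interest).
left = pd.DataFrame({"id": [1, 2, 3], "price": [720.0, 599.0, 240.0]})
right = pd.DataFrame({"id": [1, 2], "n_restaurants": [12, 7]})

merged = left.merge(
    right,
    on="id",
    how="left",             # keep every listing, matched or not
    validate="one_to_one",  # raise if 'id' is duplicated on either side
    indicator=True,         # adds a '_merge' column describing each row's origin
)

# Listings that found no match in the points-of-interest table.
unmatched = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
print(merged)
print("unmatched ids:", unmatched)
```

Inspecting the unmatched ids before modeling helps decide whether to drop those rows or impute the missing features.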