# Accommodation Price Prediction 🤑

We take one more step in our data science adventure, moving from data detectives to prediction architects. We step into the realm of machine learning and put our skills to work building a model that predicts the prices of accommodation listings.

Along the way, we will use the findings from our Exploratory Data Analysis (EDA) to feed the model and tune it. Get ready to experiment with algorithms, tune hyperparameters, and validate models.

Let's kick off this exciting model-building process!


# 1. Preparing the data 🧹

Our dataset has 106 columns, but not all of them are useful for estimating the price of an Airbnb listing. Which ones would you pick?

In [66]:
import pandas as pd

# Load the CSV file into a pandas DataFrame, mapping 't'/'f' to booleans and 'none' to null
listings_df = pd.read_csv(
    '/workspaces/keepler_technical_assessment_gorka_bengochea/data/listings_detailed.csv',
    true_values=['t'], false_values=['f'], na_values=[None, 'none'], low_memory=False)

pd.set_option('display.max_columns', None)

listings_df.head(3)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,access,interaction,house_rules,thumbnail_url,medium_url,picture_url,xl_picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,6369,https://www.airbnb.com/rooms/6369,20200110222856,2020-01-11,"Rooftop terrace room , ensuite bathroom",Excellent connection with the AIRPORT and EXHI...,BETTER THAN A HOTEL.Upscale neighboorhood (Met...,Excellent connection with the AIRPORT and EXHI...,,Nice and definitely non touristic neighborhoo...,If you are a group/family 3 or 4 people we off...,Excelent public transport and easy Access to m...,"Full use of living room, kitchen (with dishwas...","English, spanish, german, russian, some french...",,,,https://a0.muscache.com/im/pictures/683224/4cc...,,13660,https://www.airbnb.com/users/show/13660,Simon,2009-04-16,"Madrid, Madrid, Spain","Gay couple, heterofriendly, enjoy having guest...",within an hour,100%,,True,https://a0.muscache.com/im/pictures/user/1c793...,https://a0.muscache.com/im/pictures/user/1c793...,Hispanoamérica,1.0,1.0,"['email', 'phone', 'reviews', 'jumio', 'offlin...",True,False,"Madrid, Comunidad de Madrid, Spain",Chamartín,Hispanoamérica,Chamartín,Madrid,Comunidad de Madrid,28016,Madrid,"Madrid, Spain",ES,Spain,40.45628,-3.67763,True,Apartment,Private room,2,1.0,1.0,0.0,Real Bed,"{Wifi,""Air conditioning"",Kitchen,Elevator,Heat...",172.0,$70.00,$350.00,$950.00,$0.00,$5.00,2,$15.00,1,365,1,1,365,365,1.0,365.0,5 days ago,True,22,52,82,82,2020-01-11,73,14,2010-03-14,2019-12-13,98.0,10.0,10.0,10.0,10.0,9.0,10.0,True,,,False,False,flexible,False,False,1,0,1,0,0.61
1,21853,https://www.airbnb.com/rooms/21853,20200110222856,2020-01-11,Bright and airy room,We have a quiet and sunny room with a good vie...,I am living in a nice flat near the centre of ...,We have a quiet and sunny room with a good vie...,,We live in a leafy neighbourhood with plenty o...,We are a 15 min bus ride away from the Casa de...,The flat is near the centre of Madrid (15 minu...,There is fibre optic internet connection for y...,If I am at home and see each other around here...,Gracias por no fumar en la casa. Es muy import...,,,https://a0.muscache.com/im/pictures/68483181/8...,,83531,https://www.airbnb.com/users/show/83531,Abdel,2010-02-21,"Madrid, Madrid, Spain",EN-ES-FR\r\nEN\r\nHi everybody: I'm Abdel. I'm...,,,,False,https://a0.muscache.com/im/users/83531/profile...,https://a0.muscache.com/im/users/83531/profile...,Aluche,2.0,2.0,"['email', 'phone', 'reviews', 'manual_offline'...",True,True,"Madrid, Madrid, Spain",Aluche,Cármenes,Latina,Madrid,Madrid,28047,Madrid,"Madrid, Spain",ES,Spain,40.40341,-3.74084,False,Apartment,Private room,1,1.0,1.0,1.0,Real Bed,"{TV,Internet,Wifi,""Air conditioning"",Kitchen,""...",97.0,$17.00,$98.00,$370.00,,,1,$8.00,4,40,4,4,40,40,4.0,40.0,12 months ago,True,0,0,0,162,2020-01-11,33,0,2014-10-10,2018-07-15,92.0,9.0,9.0,10.0,10.0,8.0,9.0,True,,,False,False,strict_14_with_grace_period,False,False,2,0,2,0,0.52
2,23001,https://www.airbnb.com/rooms/23001,20200110222856,2020-01-11,Apartmento Arganzuela- Madrid Rio,"Apartamento de tres dormitorios dobles, gran s...","Apartamento de lujo, tres dormitorios dobles i...","Apartamento de tres dormitorios dobles, gran s...",,"Barrio Arganzuela, junto a Madrid Rio, zonas c...",,,"Piscina de verano, zonas comunes en el interio...",,"Preparacion apartamento , entrega llaves 20 € ...",,,https://a0.muscache.com/im/pictures/58e6a770-5...,,82175,https://www.airbnb.com/users/show/82175,Jesus,2010-02-17,"Madrid, Community of Madrid, Spain","Hi,\r\n\r\nWelcome to my apartments in the dow...",within an hour,100%,,False,https://a0.muscache.com/im/pictures/user/52bdf...,https://a0.muscache.com/im/pictures/user/52bdf...,Erzsébetváros - District VII.,10.0,10.0,"['email', 'phone', 'reviews', 'jumio', 'offlin...",True,False,"Madrid, Comunidad de Madrid, Spain",Legazpi,Legazpi,Arganzuela,Madrid,Comunidad de Madrid,28045,Madrid,"Madrid, Spain",ES,Spain,40.38695,-3.69304,True,Apartment,Entire home/apt,6,2.0,3.0,5.0,Real Bed,"{TV,Internet,Wifi,""Air conditioning"",""Wheelcha...",1184.0,$50.00,$556.00,"$1,500.00",$300.00,$30.00,1,$10.00,15,730,7,15,730,730,14.5,730.0,2 weeks ago,True,2,2,2,213,2020-01-11,0,0,,,,,,,,,,True,,,False,False,moderate,False,False,6,6,0,0,


I kept these, which seem useful for guessing the price:

In [67]:
listings_df = listings_df[["id","neighbourhood","property_type","room_type","accommodates","bathrooms","bedrooms","beds","bed_type","amenities","square_feet", "price"]]
listings_df.head(3)

Unnamed: 0,id,neighbourhood,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price
0,6369,Chamartín,Apartment,Private room,2,1.0,1.0,0.0,Real Bed,"{Wifi,""Air conditioning"",Kitchen,Elevator,Heat...",172.0,$70.00
1,21853,Aluche,Apartment,Private room,1,1.0,1.0,1.0,Real Bed,"{TV,Internet,Wifi,""Air conditioning"",Kitchen,""...",97.0,$17.00
2,23001,Legazpi,Apartment,Entire home/apt,6,2.0,3.0,5.0,Real Bed,"{TV,Internet,Wifi,""Air conditioning"",""Wheelcha...",1184.0,$50.00


Why?

1. `"id"` 🆔: Not useful for predicting the price as such, but worth keeping so that each listing can be identified uniquely.
2. `"neighbourhood"` 🏘️: The neighbourhood can have a large impact on the price. Some neighbourhoods are in higher demand because they are close to places of interest, well served by public transport, safer, and so on.
3. `"property_type"` 🏠: The type of property (house, apartment, etc.) can affect the price significantly.
4. `"room_type"` 🚪: Much like the property type, the room type (entire home, private room, shared room) can influence the price. An entire home usually costs more than a private or shared room.
5. `"accommodates"` 👥: The more people a property can host, the more expensive it is likely to be, since more resources are consumed (water, electricity, wear and tear).
6. `"bathrooms"` 🛁: The number of bathrooms can be an important price factor, especially for large groups that need more facilities.
7. `"bedrooms"` 🛏️: As with bathrooms, the number of bedrooms can also influence the price.
8. `"beds"` 🛌: The number of beds can influence the price of a listing. A property with more beds can host more people, so it may charge more.
9. `"bed_type"` 🛏️: The type of bed (real bed, sofa bed, futon, etc.) can affect the price. Real beds tend to be more comfortable and may therefore support a higher price.
10. `"amenities"` 🛋️: Amenities (such as wifi, a kitchen, or a pool) can raise the value of a property. Listings with more amenities tend to be more attractive to guests and can charge more.
11. `"square_feet"` 📏: The size of the property in square feet can be an indicator of price. Larger properties usually cost more.
12. `"price"` 💶: This is the target variable, the value we want to predict. The price of a listing results from all the other characteristics and factors, and it is the amount guests have to pay to stay at the property.
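Intuitions like these can be sanity-checked by comparing the median price per category. A minimal sketch, assuming a toy frame built from four of the sample rows shown in this notebook, with prices already converted to numbers (`toy_df` and `median_price` are illustrative names):

```python
import pandas as pd

# Four sample listings (room type and price) taken from the rows printed in this notebook.
toy_df = pd.DataFrame({
    "room_type": ["Private room", "Private room", "Entire home/apt", "Entire home/apt"],
    "price": [70.0, 17.0, 50.0, 115.0],
})

# Median price per room type; the median is less sensitive to extreme prices than the mean.
median_price = toy_df.groupby("room_type")["price"].median()
print(median_price)
```

Four rows prove nothing, of course; the same `groupby` on the full dataset is what would make the comparison meaningful.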


Of course, not everyone fills in all of this in their Airbnb listing... Let's see how many gaps our dataset has, in case it turns out to be as full of holes as a Gruyère cheese!

In [68]:
listings_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21495 entries, 0 to 21494
Data columns (total 12 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             21495 non-null  int64  
 1   neighbourhood  21361 non-null  object 
 2   property_type  21495 non-null  object 
 3   room_type      21495 non-null  object 
 4   accommodates   21495 non-null  int64  
 5   bathrooms      21480 non-null  float64
 6   bedrooms       21479 non-null  float64
 7   beds           21356 non-null  float64
 8   bed_type       21495 non-null  object 
 9   amenities      21495 non-null  object 
 10  square_feet    301 non-null    float64
 11  price          21495 non-null  object 
dtypes: float64(4), int64(2), object(6)
memory usage: 2.0+ MB


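Not bad at all! That said, we can pretty much forget about the square footage: only 301 of the 21,495 listings report it... 🥲 A pity, because that feature looked very tempting! But no matter, that's what machine learning is for, right? We estimate with what we have! So we drop `square_feet` and then discard the rows that still have missing values, which leaves 21,217 listings.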
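Instead of reading the gaps off `info()`, the share of missing values per column can be computed directly. A minimal sketch on a toy frame whose values are invented for illustration; on the real data the same expression would be applied to `listings_df`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for listings_df; the values are invented for illustration.
toy_df = pd.DataFrame({
    "bedrooms": [1.0, 2.0, np.nan, 1.0],
    "square_feet": [np.nan, np.nan, np.nan, 172.0],
    "price": ["$70.00", "$17.00", "$50.00", "$80.00"],
})

# isna() marks missing cells; the column-wise mean is the fraction of missing values.
missing_share = toy_df.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Columns near the top of this ranking are candidates for dropping, while columns with only a handful of gaps can be handled by dropping or imputing rows.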

In [72]:
pd.set_option('display.max_colwidth', None)

In [73]:
listings_df = listings_df[["id","neighbourhood","property_type","room_type","accommodates","bathrooms","bedrooms","beds","bed_type","amenities","price"]]
listings_df = listings_df.dropna()
listings_df.info()
listings_df.head()

<class 'pandas.core.frame.DataFrame'>
Index: 21217 entries, 0 to 21365
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             21217 non-null  int64  
 1   neighbourhood  21217 non-null  object 
 2   property_type  21217 non-null  object 
 3   room_type      21217 non-null  object 
 4   accommodates   21217 non-null  int64  
 5   bathrooms      21217 non-null  float64
 6   bedrooms       21217 non-null  float64
 7   beds           21217 non-null  float64
 8   bed_type       21217 non-null  object 
 9   amenities      21217 non-null  object 
 10  price          21217 non-null  object 
dtypes: float64(3), int64(2), object(6)
memory usage: 1.9+ MB


Unnamed: 0,id,neighbourhood,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,price
0,6369,Chamartín,Apartment,Private room,2,1.0,1.0,0.0,Real Bed,"{Wifi,""Air conditioning"",Kitchen,Elevator,Heating,""Family/kid friendly"",Washer,Essentials,Shampoo,Hangers,""Hair dryer"",Iron,""Hot water"",""Bed linens"",""Extra pillows and blankets"",""Pocket wifi""}",$70.00
1,21853,Aluche,Apartment,Private room,1,1.0,1.0,1.0,Real Bed,"{TV,Internet,Wifi,""Air conditioning"",Kitchen,""Free parking on premises"",Doorman,Elevator,Heating,Washer,""First aid kit"",""Fire extinguisher"",Essentials,Shampoo,""Lock on bedroom door"",Hangers,""Hair dryer"",Iron,""Laptop friendly workspace"",""translation missing: en.hosting_amenity_49"",""translation missing: en.hosting_amenity_50"",""Hot water"",""Bed linens"",""Extra pillows and blankets"",""Pocket wifi"",Microwave,""Coffee maker"",Refrigerator,""Dishes and silverware"",""Cooking basics"",Oven}",$17.00
2,23001,Legazpi,Apartment,Entire home/apt,6,2.0,3.0,5.0,Real Bed,"{TV,Internet,Wifi,""Air conditioning"",""Wheelchair accessible"",Pool,Kitchen,""Paid parking off premises"",""Smoking allowed"",""Washer / Dryer"",Doorman,Elevator,Heating,Washer,Dryer,Essentials,Shampoo,""Lock on bedroom door"",Hangers,""Hair dryer"",Iron,""Laptop friendly workspace"",""Private living room"",Bathtub,""Children’s books and toys"",Crib,""Room-darkening shades"",""Children’s dinnerware"",""Game console"",""Hot water"",""Bed linens"",""Extra pillows and blankets"",""Ethernet connection"",Microwave,""Coffee maker"",Refrigerator,Dishwasher,""Dishes and silverware"",""Cooking basics"",Oven,Stove,""Single level home"",""Luggage dropoff allowed"",""Long term stays allowed"",""Paid parking on premises""}",$50.00
3,24805,Malasaña,Apartment,Entire home/apt,3,1.0,0.0,1.0,Real Bed,"{TV,Internet,Wifi,""Air conditioning"",Kitchen,Elevator,""Buzzer/wireless intercom"",Heating,""Family/kid friendly"",Washer,""Fire extinguisher"",Essentials,Shampoo,""24-hour check-in"",Hangers,""Hair dryer"",Iron,""Laptop friendly workspace"",""Pack ’n Play/travel crib"",""Hot water"",Microwave,""Coffee maker"",Refrigerator,""Dishes and silverware"",""Cooking basics"",Oven,Stove,""Luggage dropoff allowed"",""Long term stays allowed"",""Cleaning before checkout"",""Host greets you"",""Paid parking on premises""}",$80.00
4,24836,Justicia,Apartment,Entire home/apt,4,1.5,2.0,3.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,""Air conditioning"",Kitchen,Elevator,Heating,""Family/kid friendly"",Washer,""Fire extinguisher"",Essentials,Shampoo,Hangers,""Hair dryer"",Iron,""Hot water"",""Bed linens"",Microwave,""Coffee maker"",Refrigerator,""Dishes and silverware"",""Cooking basics"",Stove,""Patio or balcony"",""Host greets you"",""Paid parking on premises""}",$115.00


Now for a bit of cleaning: we convert `price` to a number, encode the categorical variables as integers with `LabelEncoder`, and turn `amenities` into one binary column per amenity:

In [74]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

# 1. Convert 'price' to a numerical variable
# Raw string so that '\$' is not treated as an invalid escape sequence by Python
listings_df['price'] = listings_df['price'].replace(r'[\$,]', '', regex=True).astype(float)

# 2. Convert categorical variables to numerical variables
categorical_features = ['neighbourhood', 'property_type', 'room_type', 'bed_type']
le = LabelEncoder()
for feature in categorical_features:
    listings_df[feature] = le.fit_transform(listings_df[feature])

# 3. Transform 'amenities' into multiple binary columns
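# NOTE: since pandas 2.0, str.replace defaults to regex=False, so '[{}]' is matched
# literally and the braces survive; splitting on ',' also cuts amenities whose quoted
# name contains a comma. Both effects are visible in the column names printed below.
listings_df['amenities'] = listings_df['amenities'].str.replace('[{}]', '').str.split(',')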
mlb = MultiLabelBinarizer(sparse_output=True)
amenities_df = pd.DataFrame.sparse.from_spmatrix(
                mlb.fit_transform(listings_df.pop('amenities')),
                index=listings_df.index,
                columns=mlb.classes_)

# Merge the amenities dataframe back into the original dataframe
listings_df = listings_df.join(amenities_df)

# Split the dataset into a training set and a test set (80/20).
# Note: only 'price' is dropped, so 'id' is still among the features passed to the model.
X_train, X_test, y_train, y_test = train_test_split(listings_df.drop(columns='price'), listings_df['price'], test_size=0.2, random_state=42)

In [76]:
listings_df.head()

Unnamed: 0,id,neighbourhood,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,price,"toilet""","""24-hour check-in""","""24-hour check-in""}","""Accessible-height bed""","""Accessible-height bed""}","""Accessible-height toilet""","""Air conditioning""","""Amazon Echo""","""BBQ grill""","""BBQ grill""}","""Baby bath""","""Baby monitor""","""Babysitter recommendations""","""Babysitter recommendations""}","""Baking sheet""","""Baking sheet""}","""Barbecue utensils""","""Barbecue utensils""}","""Bath towel""","""Bathroom essentials""","""Bathroom essentials""}","""Bathtub with bath chair""","""Beach essentials""","""Beach essentials""}","""Beach view""","""Bed linens""","""Bed linens""}","""Bedroom comforts""","""Body soap""","""Bread maker""}","""Breakfast table""","""Building staff""","""Building staff""}","""Buzzer/wireless intercom""","""Buzzer/wireless intercom""}","""Cable TV""","""Carbon monoxide detector""","""Ceiling fan""","""Central air conditioning""","""Changing table""","""Children’s books and toys""","""Children’s dinnerware""","""Children’s dinnerware""}","""Cleaning before checkout""","""Cleaning before checkout""}","""Coffee maker""","""Coffee maker""}","""Convection oven""","""Cooking basics""","""Cooking basics""}","""DVD player""","""Day bed""","""Disabled parking spot""","""Disabled parking spot""}","""Dishes and silverware""","""Dishes and silverware""}","""Double oven""","""EV charger""","""En suite bathroom""","""Espresso machine""","""Ethernet connection""","""Ethernet connection""}","""Exercise equipment""","""Extra pillows and blankets""","""Extra pillows and blankets""}","""Extra space around bed""","""Family/kid friendly""","""Family/kid friendly""}","""Fire extinguisher""","""Fire extinguisher""}","""Fireplace guards""","""Firm mattress""","""Firm mattress""}","""First aid kit""","""First aid kit""}","""Fixed grab bars for shower""","""Fixed grab bars for shower""}","""Fixed grab bars for toilet""","""Flat path to 
guest entrance""","""Flat path to guest entrance""}","""Formal dining area""","""Free parking on premises""","""Free street parking""","""Full kitchen""","""Full kitchen""}","""Game console""","""Game console""}","""Garden or backyard""","""Garden or backyard""}","""Gas oven""","""Ground floor access""}","""HBO GO""","""Hair dryer""","""Hair dryer""}","""Handheld shower head""","""Handheld shower head""}","""Heated floors""","""Heated towel rack""","""High chair""","""High-resolution computer monitor""","""Host greets you""","""Host greets you""}","""Hot tub""","""Hot water kettle""","""Hot water kettle""}","""Hot water""","""Hot water""}","""Indoor fireplace""","""Jetted tub""","""Lake access""","""Lake access""}","""Laptop friendly workspace""","""Laptop friendly workspace""}","""Lock on bedroom door""","""Lock on bedroom door""}","""Long term stays allowed""","""Long term stays allowed""}","""Luggage dropoff allowed""","""Luggage dropoff allowed""}","""Memory foam mattress""","""Mini fridge""","""Mobile hoist""}","""No stairs or steps to enter""","""No stairs or steps to enter""}","""Other pet(s)""","""Outdoor seating""","""Outlet covers""","""Outlet covers""}","""Pack ’n Play/travel crib""","""Pack ’n Play/travel crib""}","""Paid parking off premises""","""Paid parking on premises""","""Paid parking on premises""}","""Patio or balcony""","""Patio or balcony""}","""Pets allowed""","""Pets live on this property""","""Pillow-top mattress""","""Pocket wifi""","""Pocket wifi""}","""Private bathroom""","""Private entrance""","""Private entrance""}","""Private living room""","""Private living room""}","""Projector and screen""","""Rain shower""","""Room-darkening shades""","""Room-darkening shades""}","""Safety card""","""Safety card""}","""Self check-in""","""Shared pool""","""Shower chair""}","""Shower gel""","""Shower gel""}","""Single level home""","""Single level home""}","""Smart TV""","""Smart lock""","""Smart lock""}","""Smoke detector""","""Smoking 
allowed""","""Smoking allowed""}","""Soaking tub""","""Sound system""","""Stair gates""","""Standing valet""","""Steam oven""","""Step-free shower""","""Step-free shower""}","""Suitable for events""","""Suitable for events""}","""Sun loungers""","""Table corner guards""","""Toilet paper""","""Touchless faucets""","""Trash can""","""Trash can""}","""Walk-in shower""","""Warming drawer""","""Washer / Dryer""","""Well-lit path to entrance""","""Well-lit path to entrance""}","""Wheelchair accessible""","""Wide clearance to shower","""Wide doorway to guest bathroom""","""Wide entrance for guests""","""Wide entrance for guests""}","""Wide entrance""","""Wide entryway""","""Wide entryway""}","""Wide hallways""","""Wide hallways""}","""Window guards""","""Window guards""}","""Wine cooler""","""translation missing: en.hosting_amenity_49""","""translation missing: en.hosting_amenity_49""}","""translation missing: en.hosting_amenity_50""","""translation missing: en.hosting_amenity_50""}",Balcony,Bathtub,Bathtub},Beachfront,Beachfront},Bidet,Breakfast,Cat(s),Crib,Crib},Dishwasher,Dishwasher},Dog(s),Doorman,Dryer,Dryer},Elevator,Elevator},Essentials,Essentials},Gym,Hammock,Hangers,Hangers},Heating,Heating},Internet,Iron,Iron},Keypad,Keypad},Kitchen,Kitchenette,Kitchen},Lockbox,Lockbox},Microwave,Microwave},Mudroom,Netflix,Other,Other},Oven,Oven},Pool,Printer,Refrigerator,Refrigerator},Shampoo,Shampoo},Ski-in/Ski-out,Ski-in/Ski-out},Stove,Stove},TV,TV},Terrace,Washer,Washer},Waterfront,Wifi,Wifi},"{""Air conditioning""","{""Cable TV""","{""Dishes and silverware""","{""Family/kid friendly""","{""Family/kid friendly""}","{""First aid kit""","{""Free parking on premises""","{""Free street parking""","{""Paid parking off premises""","{""Pets allowed""","{""Pets live on this property""","{""Pocket wifi""}","{""Private living room""","{""Smoking allowed""","{""Smoking allowed""}","{""Wheelchair accessible""","{""translation missing: en.hosting_amenity_49""","{""translation missing: 
en.hosting_amenity_49""}",{Breakfast,{Elevator,{Elevator},{Essentials,{Essentials},{Heating,{Internet,{Kitchen,{Kitchen},{Pool,{TV,{Washer,{Wifi,{Wifi},{}
0,6369,17,1,2,2,1.0,1.0,0.0,4,70.0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,21853,4,1,2,1,1.0,1.0,1.0,4,17.0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
2,23001,41,1,0,6,2.0,3.0,5.0,4,50.0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,1,1,0,1,0,1,0,0,0,1,0,1,0,1,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,1,0,1,0,1,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
3,24805,43,1,0,3,1.0,0.0,1.0,4,80.0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,1,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,24836,38,1,0,4,1.5,2.0,3.0,4,115.0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,1,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
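The column names above still carry braces and quotes (`{Wifi`, `Wifi}`, `"Hot water"}`), so the same amenity ends up spread over several columns, and an amenity whose name contains a comma (apparently `Wide clearance to shower, toilet`) is cut in two. One way to parse the field more robustly is to strip the outer braces and let the standard `csv` module deal with the quoting. A sketch, where `parse_amenities` is a hypothetical helper name and not part of the original notebook:

```python
import csv

def parse_amenities(raw: str) -> list[str]:
    """Turn '{TV,"Air conditioning",Wifi}' into ['TV', 'Air conditioning', 'Wifi']."""
    inner = raw.strip().strip("{}")
    if not inner:
        return []
    # csv.reader honours double quotes, so a comma inside a quoted name is preserved.
    return next(csv.reader([inner]))

print(parse_amenities('{TV,"Air conditioning","Wide clearance to shower, toilet"}'))
```

On the DataFrame this could be applied with `listings_df['amenities'].apply(parse_amenities)` before handing the column to `MultiLabelBinarizer`, which should yield one clean column per amenity.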


And with this dataset, let's go with a classic Random Forest (after having tried several models):

In [81]:
# Train a RandomForestRegressor
regr = RandomForestRegressor(n_estimators=100, random_state=42)
regr.fit(X_train, y_train)

# Use the model to make predictions on the test set
y_pred = regr.predict(X_test)

# Compute the root mean squared error of our predictions
print('Root Mean Squared Error:', np.sqrt(mean_squared_error(y_test, y_pred)))
print('R2 Score:', r2_score(y_test, y_pred))




Root Mean Squared Error: 281.80152013145306
R2 Score: 0.3735589545502276
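These figures come from a single 80/20 split, so they are one fairly noisy estimate. K-fold cross-validation averages the error over several splits and shows how much it varies between them. A minimal sketch on synthetic data (the features and target below are randomly generated for illustration and have nothing to do with the Airbnb dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression problem: three numeric features and a noisy linear target.
rng = np.random.default_rng(42)
X = rng.uniform(1, 8, size=(300, 3))
y = 20 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 5, size=300)

model = RandomForestRegressor(n_estimators=50, random_state=42)

# scikit-learn maximises scores, so RMSE is exposed as a negative value; flip the sign.
rmse_per_fold = -cross_val_score(
    model, X, y, cv=5, scoring="neg_root_mean_squared_error"
)
print("RMSE per fold:", rmse_per_fold.round(2))
print("Mean RMSE:", rmse_per_fold.mean().round(2))
```

With the real data, `X` and `y` would be the feature matrix and the `price` column; a large spread between folds would suggest that a few very expensive listings dominate the error.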


