# Airbnb Data Analysis: 
## Making your house the best at the platform!
---
This notebook is part of the Udacity's Data Scientist Nanodegree program

In this project, I will be investigating Airbnb data and answering relevant questions using the **CRISP-DM** process:

1. Business Understanding
2. Data Understanding
3. Data Preparation 
4. Data Modeling
5. Results & Evaluation

# 1. Business Understanding

**The 3 questions I'm looking to answer are:**

1. Does more expensive houses have better reviews?
2. What are the main features that influences the review rates? What about the prices?
3. Which city has the best listings? Which one has more expensive ones? Is there a connection in that?

# 2. Data Understanding

### Importing Necessary Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import string
from sklearn.linear_model import LinearRegression, LassoCV, RidgeCV, ElasticNetCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.preprocessing import StandardScaler, OneHotEncoder, normalize
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.impute import SimpleImputer

pd.set_option('display.max_columns', 500)
%matplotlib inline

### Reading Data
**Boston Airbnb Data**

In [2]:
df_bos_lis = pd.read_csv('BostonData/listings.csv')
df_bos_rev = pd.read_csv('BostonData/reviews.csv')

**Seattle Aribnb Data**

In [3]:
df_sea_lis = pd.read_csv('SeattleData/listings.csv')
df_sea_rev = pd.read_csv('SeattleData/reviews.csv')

In [4]:
display(df_bos_lis.head(1), df_sea_lis.head(1))

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,access,interaction,house_rules,thumbnail_url,medium_url,picture_url,xl_picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
0,12147973,https://www.airbnb.com/rooms/12147973,20160906204935,2016-09-07,Sunny Bungalow in the City,"Cozy, sunny, family home. Master bedroom high...",The house has an open and cozy feel at the sam...,"Cozy, sunny, family home. Master bedroom high...",none,"Roslindale is quiet, convenient and friendly. ...",,"The bus stop is 2 blocks away, and frequent. B...","You will have access to 2 bedrooms, a living r...",,Clean up and treat the home the way you'd like...,https://a2.muscache.com/im/pictures/c0842db1-e...,https://a2.muscache.com/im/pictures/c0842db1-e...,https://a2.muscache.com/im/pictures/c0842db1-e...,https://a2.muscache.com/im/pictures/c0842db1-e...,31303940,https://www.airbnb.com/users/show/31303940,Virginia,2015-04-15,"Boston, Massachusetts, United States",We are country and city connecting in our deck...,,,,f,https://a2.muscache.com/im/pictures/5936fef0-b...,https://a2.muscache.com/im/pictures/5936fef0-b...,Roslindale,1,1,"['email', 'phone', 'facebook', 'reviews']",t,f,"Birch Street, Boston, MA 02131, United States",Roslindale,Roslindale,,Boston,MA,2131,Boston,"Boston, MA",US,United States,42.282619,-71.133068,t,House,Entire home/apt,4,1.5,2.0,3.0,Real Bed,"{TV,""Wireless Internet"",Kitchen,""Free Parking ...",,$250.00,,,,$35.00,1,$0.00,2,1125,2 weeks ago,,0,0,0,0,2016-09-06,0,,,,,,,,,,f,,,f,moderate,f,f,1,


Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,thumbnail_url,medium_url,picture_url,xl_picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
0,241032,https://www.airbnb.com/rooms/241032,20160104002432,2016-01-04,Stylish Queen Anne Apartment,,Make your self at home in this charming one-be...,Make your self at home in this charming one-be...,none,,,,,,https://a1.muscache.com/ac/pictures/67560560/c...,,956883,https://www.airbnb.com/users/show/956883,Maija,2011-08-11,"Seattle, Washington, United States","I am an artist, interior designer, and run a s...",within a few hours,96%,100%,f,https://a0.muscache.com/ac/users/956883/profil...,https://a0.muscache.com/ac/users/956883/profil...,Queen Anne,3.0,3.0,"['email', 'phone', 'reviews', 'kba']",t,t,"Gilman Dr W, Seattle, WA 98119, United States",Queen Anne,West Queen Anne,Queen Anne,Seattle,WA,98119,Seattle,"Seattle, WA",US,United States,47.636289,-122.371025,t,Apartment,Entire home/apt,4,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...",,$85.00,,,,,2,$5.00,1,365,4 weeks ago,t,14,41,71,346,2016-01-04,207,2011-11-01,2016-01-02,95.0,10.0,10.0,10.0,10.0,9.0,10.0,f,,WASHINGTON,f,moderate,f,f,2,4.07


In [5]:
display(df_bos_rev.head(), df_sea_rev.head())

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,1178162,4724140,2013-05-21,4298113,Olivier,My stay at islam's place was really cool! Good...
1,1178162,4869189,2013-05-29,6452964,Charlotte,Great location for both airport and city - gre...
2,1178162,5003196,2013-06-06,6449554,Sebastian,We really enjoyed our stay at Islams house. Fr...
3,1178162,5150351,2013-06-15,2215611,Marine,The room was nice and clean and so were the co...
4,1178162,5171140,2013-06-16,6848427,Andrew,Great location. Just 5 mins walk from the Airp...


Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,7202016,38917982,2015-07-19,28943674,Bianca,Cute and cozy place. Perfect location to every...
1,7202016,39087409,2015-07-20,32440555,Frank,Kelly has a great room in a very central locat...
2,7202016,39820030,2015-07-26,37722850,Ian,"Very spacious apartment, and in a great neighb..."
3,7202016,40813543,2015-08-02,33671805,George,Close to Seattle Center and all it has to offe...
4,7202016,41986501,2015-08-10,34959538,Ming,Kelly was a great host and very accommodating ...


# 3. Data Preparation

- Since both cities have similar datasets, I'll create a pipeline to clean both of them the same way

### Reviews Dataset
- My objective with this dataset is to classify each review as good or bad. Then, I'll count how many good and bad reviews each listing has. 
    - I believe this will be a good parameter for predicting both price and review rating 

In [6]:
# checking first rows of the dataset
df_bos_rev.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,1178162,4724140,2013-05-21,4298113,Olivier,My stay at islam's place was really cool! Good...
1,1178162,4869189,2013-05-29,6452964,Charlotte,Great location for both airport and city - gre...
2,1178162,5003196,2013-06-06,6449554,Sebastian,We really enjoyed our stay at Islams house. Fr...
3,1178162,5150351,2013-06-15,2215611,Marine,The room was nice and clean and so were the co...
4,1178162,5171140,2013-06-16,6848427,Andrew,Great location. Just 5 mins walk from the Airp...


In [7]:
# making a copy of the original dataset
df_bos_rev_c = df_bos_rev.copy()

# checking the dataset info
df_bos_rev_c.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 68275 entries, 0 to 68274
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   listing_id     68275 non-null  int64 
 1   id             68275 non-null  int64 
 2   date           68275 non-null  object
 3   reviewer_id    68275 non-null  int64 
 4   reviewer_name  68275 non-null  object
 5   comments       68222 non-null  object
dtypes: int64(3), object(3)
memory usage: 3.1+ MB


**Reviews classification process**
1. I'll will create two lists, one with positive words and other with negative words. 
2. I'll count how many words of each list are present in each comment
3. The category wich has more words, will be the category of the review

In [8]:
# good words list
positive_words = ['good', 'great', 'amazing', 'perfect', 'nice', 'cool', 'cozy', 'amazing', 'comfortable', 'loved',
                  'enjoyed', 'lovely', 'wonderful', 'fantastic', 'pleasure', 'brilliant', 'pleasant', 'superb',
                  'charming', 'awesome', 'beautiful', 'fun', 'excellent']

# bad words list
negative_words = ['bad', 'terrible', 'horrible', 'uncomfortable', 'dirty', 'cancelled', 'unconvenient', 'never',
                  'hated', 'disliked', 'ugly', 'boring', 'refund', 'exhausted', 'tired','garbage','not acceptable',
                  'refund', 'awful', 'damages', 'never', 'frustrated', 'canceled', 'cancel', 'panic', 'horror',
                  'worst']

# classification function
def classify_good_or_bad(comment):
    # making sure comment is string
    comment = str(comment)
    
    # removing all sorts of punctuation
    for char in string.punctuation:
        comment.replace(char, '')
    
    # make it all lowercase
    comment = comment.lower()
    
    # get all words
    comment = comment.split()
    
    # good words count
    positive_count = 0
    for word in comment:
        if word in positive_words:
            positive_count += 1
            
    # bad words count
    negative_count = 0
    for word in comment:
        if word in negative_words:
            negative_count += 1
            
    # classifying 
    if positive_count > negative_count:
        return 'positive'
    elif positive_count < negative_count:
        return 'negative'
    else:
        return 'unknwon'

**Creating new good or bad review column**

In [9]:
df_bos_rev_c['review_cat'] = df_bos_rev_c.comments.apply(classify_good_or_bad)
df_bos_rev_c['review_cat'].value_counts()

positive    53084
unknwon     13467
negative     1724
Name: review_cat, dtype: int64

**Dropping unnecessary columns and creating dummy variables**

In [10]:
df_bos_rev_c = df_bos_rev_c[['listing_id', 'review_cat']]
df_bos_rev_c = pd.concat([df_bos_rev_c['listing_id'], pd.get_dummies(df_bos_rev_c['review_cat'], prefix='review')], axis=1)
df_bos_rev_c.head()

Unnamed: 0,listing_id,review_negative,review_positive,review_unknwon
0,1178162,0,1,0
1,1178162,0,1,0
2,1178162,0,1,0
3,1178162,0,1,0
4,1178162,0,1,0


In [11]:
boston_reviews = df_bos_rev_c.groupby(['listing_id']).sum().reset_index()
boston_reviews.head()

Unnamed: 0,listing_id,review_negative,review_positive,review_unknwon
0,3353,0.0,27.0,7.0
1,5506,0.0,30.0,6.0
2,6695,0.0,40.0,7.0
3,6976,1.0,36.0,4.0
4,8792,0.0,17.0,1.0


This looks great! 
Later I'll combine this as features in the final dataframe!

### Listings Dataset
- My objective with this dataset is to get the main features of each listing and it's review status. That way, I'll be able to run a model to predict the listing price or review rate.

In [12]:
# creating copy of the dataset
df_bos_lis_c = df_bos_lis.copy()
df_bos_lis_c.head(1)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,access,interaction,house_rules,thumbnail_url,medium_url,picture_url,xl_picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
0,12147973,https://www.airbnb.com/rooms/12147973,20160906204935,2016-09-07,Sunny Bungalow in the City,"Cozy, sunny, family home. Master bedroom high...",The house has an open and cozy feel at the sam...,"Cozy, sunny, family home. Master bedroom high...",none,"Roslindale is quiet, convenient and friendly. ...",,"The bus stop is 2 blocks away, and frequent. B...","You will have access to 2 bedrooms, a living r...",,Clean up and treat the home the way you'd like...,https://a2.muscache.com/im/pictures/c0842db1-e...,https://a2.muscache.com/im/pictures/c0842db1-e...,https://a2.muscache.com/im/pictures/c0842db1-e...,https://a2.muscache.com/im/pictures/c0842db1-e...,31303940,https://www.airbnb.com/users/show/31303940,Virginia,2015-04-15,"Boston, Massachusetts, United States",We are country and city connecting in our deck...,,,,f,https://a2.muscache.com/im/pictures/5936fef0-b...,https://a2.muscache.com/im/pictures/5936fef0-b...,Roslindale,1,1,"['email', 'phone', 'facebook', 'reviews']",t,f,"Birch Street, Boston, MA 02131, United States",Roslindale,Roslindale,,Boston,MA,2131,Boston,"Boston, MA",US,United States,42.282619,-71.133068,t,House,Entire home/apt,4,1.5,2.0,3.0,Real Bed,"{TV,""Wireless Internet"",Kitchen,""Free Parking ...",,$250.00,,,,$35.00,1,$0.00,2,1125,2 weeks ago,,0,0,0,0,2016-09-06,0,,,,,,,,,,f,,,f,moderate,f,f,1,


**Getting only necessary columns**

In [13]:
# selecting only necessary columns
df_bos_lis_c = df_bos_lis_c[['id', 'market', 'host_id', 'host_is_superhost', 'neighbourhood_cleansed', 
                             'property_type', 'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 
                             'bed_type', 'amenities', 'price', 'cleaning_fee', 'number_of_reviews', 
                             'review_scores_rating', 'review_scores_accuracy', 'review_scores_cleanliness', 
                             'review_scores_checkin', 'review_scores_communication', 'review_scores_location', 
                             'review_scores_value', 'cancellation_policy']]

df_bos_lis_c.head(3)

Unnamed: 0,id,market,host_id,host_is_superhost,neighbourhood_cleansed,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,price,cleaning_fee,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,cancellation_policy
0,12147973,Boston,31303940,f,Roslindale,House,Entire home/apt,4,1.5,2.0,3.0,Real Bed,"{TV,""Wireless Internet"",Kitchen,""Free Parking ...",$250.00,$35.00,0,,,,,,,,moderate
1,3075044,Boston,2572247,f,Roslindale,Apartment,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,Internet,""Wireless Internet"",""Air Conditio...",$65.00,$10.00,36,94.0,10.0,9.0,10.0,10.0,9.0,9.0,moderate
2,6976,Boston,16701,t,Roslindale,Apartment,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",""Wireless Internet"",""Air Condit...",$65.00,,41,98.0,10.0,9.0,10.0,10.0,9.0,10.0,moderate


In [14]:
df_bos_lis_c.describe()

Unnamed: 0,id,host_id,accommodates,bathrooms,bedrooms,beds,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value
count,3585.0,3585.0,3585.0,3571.0,3575.0,3576.0,3585.0,2772.0,2762.0,2767.0,2765.0,2767.0,2763.0,2764.0
mean,8440875.0,24923110.0,3.041283,1.221647,1.255944,1.60906,19.04463,91.916667,9.431571,9.258041,9.646293,9.646549,9.414043,9.168234
std,4500787.0,22927810.0,1.778929,0.501487,0.75306,1.011745,35.571658,9.531686,0.931863,1.168977,0.762753,0.735507,0.903436,1.011116
min,3353.0,4240.0,1.0,0.0,0.0,0.0,0.0,20.0,2.0,2.0,2.0,4.0,2.0,2.0
25%,4679319.0,6103425.0,2.0,1.0,1.0,1.0,1.0,89.0,9.0,9.0,9.0,9.0,9.0,9.0
50%,8577620.0,19281000.0,2.0,1.0,1.0,1.0,5.0,94.0,10.0,10.0,10.0,10.0,10.0,9.0
75%,12789530.0,36221470.0,4.0,1.0,2.0,2.0,21.0,98.25,10.0,10.0,10.0,10.0,10.0,10.0
max,14933460.0,93854110.0,16.0,6.0,5.0,16.0,404.0,100.0,10.0,10.0,10.0,10.0,10.0,10.0


**Checking data types**

In [15]:
df_bos_lis_c.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3585 entries, 0 to 3584
Data columns (total 24 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           3585 non-null   int64  
 1   market                       3571 non-null   object 
 2   host_id                      3585 non-null   int64  
 3   host_is_superhost            3585 non-null   object 
 4   neighbourhood_cleansed       3585 non-null   object 
 5   property_type                3582 non-null   object 
 6   room_type                    3585 non-null   object 
 7   accommodates                 3585 non-null   int64  
 8   bathrooms                    3571 non-null   float64
 9   bedrooms                     3575 non-null   float64
 10  beds                         3576 non-null   float64
 11  bed_type                     3585 non-null   object 
 12  amenities                    3585 non-null   object 
 13  price             

Let's check the 'object' type columns to see if any of them should be a numerical one

In [16]:
df_bos_lis_c.select_dtypes(include=['object']).head(1)

Unnamed: 0,market,host_is_superhost,neighbourhood_cleansed,property_type,room_type,bed_type,amenities,price,cleaning_fee,cancellation_policy
0,Boston,f,Roslindale,House,Entire home/apt,Real Bed,"{TV,""Wireless Internet"",Kitchen,""Free Parking ...",$250.00,$35.00,moderate


Ok, so there are 2 columns that should be numerical: 'price' and 'cleaning_fee'. **Let's fix them:**

In [17]:
# applying the functions
df_bos_lis_c.price = df_bos_lis_c.price.apply(lambda x: x.replace('$', ''))
df_bos_lis_c.price = df_bos_lis_c.price.apply(lambda x: x.replace(',', '.'))
df_bos_lis_c.price = df_bos_lis_c.price.apply(lambda x: x[:-3])

df_bos_lis_c.cleaning_fee = df_bos_lis_c.cleaning_fee.apply(lambda x: str(x) if x else None)
df_bos_lis_c.cleaning_fee = df_bos_lis_c.cleaning_fee.apply(lambda x: x.replace('$', '') if x else None)
df_bos_lis_c.cleaning_fee = df_bos_lis_c.cleaning_fee.apply(lambda x: x.replace(',', '.') if x else None)
df_bos_lis_c.cleaning_fee = df_bos_lis_c.cleaning_fee.apply(lambda x: x[:-3] if x else None)
df_bos_lis_c.cleaning_fee = df_bos_lis_c.cleaning_fee.apply(lambda x: np.nan if x=='' else x)

# checking columns
df_bos_lis_c[['price', 'cleaning_fee']].head()

Unnamed: 0,price,cleaning_fee
0,250,35.0
1,65,10.0
2,65,
3,75,50.0
4,79,15.0


**OK**, they look alright. Now I can fix the datatypes!

In [18]:
df_bos_lis_c[['price', 'cleaning_fee']] = df_bos_lis_c[['price', 'cleaning_fee']].astype('float64')
df_bos_lis_c[['price', 'cleaning_fee']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3585 entries, 0 to 3584
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price         3585 non-null   float64
 1   cleaning_fee  2478 non-null   float64
dtypes: float64(2)
memory usage: 56.1 KB


**Dealing with null values**

In [19]:
df_bos_lis_c.isnull().sum()

id                                0
market                           14
host_id                           0
host_is_superhost                 0
neighbourhood_cleansed            0
property_type                     3
room_type                         0
accommodates                      0
bathrooms                        14
bedrooms                         10
beds                              9
bed_type                          0
amenities                         0
price                             0
cleaning_fee                   1107
number_of_reviews                 0
review_scores_rating            813
review_scores_accuracy          823
review_scores_cleanliness       818
review_scores_checkin           820
review_scores_communication     818
review_scores_location          822
review_scores_value             821
cancellation_policy               0
dtype: int64

**Dealing with null values**

Handling nulls in categorical columns:

In [20]:
df_bos_lis_c.select_dtypes(include=['object']).isnull().sum()

market                    14
host_is_superhost          0
neighbourhood_cleansed     0
property_type              3
room_type                  0
bed_type                   0
amenities                  0
cancellation_policy        0
dtype: int64

* Market: all of them should be 'Boston' so I'll just put that 
* Property_type: Since it's only 3, I'll fill them with the mode

In [21]:
# SimpleImputer instance
imp_mode = SimpleImputer(missing_values=np.nan, strategy='most_frequent')

# fitting into the dataset
df_bos_lis_c[['market', 'property_type']] = imp_mode.fit(df_bos_lis_c[['market', 'property_type']])
df_bos_lis_c.select_dtypes(include=['object']).isnull().sum()

market                    0
host_is_superhost         0
neighbourhood_cleansed    0
property_type             0
room_type                 0
bed_type                  0
amenities                 0
cancellation_policy       0
dtype: int64

#### Handling the null values in numerical columns

In [22]:
df_bos_lis_c.select_dtypes(include=['int64', 'float64']).isnull().sum()

id                                0
host_id                           0
accommodates                      0
bathrooms                        14
bedrooms                         10
beds                              9
price                             0
cleaning_fee                   1107
number_of_reviews                 0
review_scores_rating            813
review_scores_accuracy          823
review_scores_cleanliness       818
review_scores_checkin           820
review_scores_communication     818
review_scores_location          822
review_scores_value             821
dtype: int64

**Bathrooms and betrooms**

In [23]:
display(df_bos_lis_c.bathrooms.value_counts())
display(df_bos_lis_c.bedrooms.value_counts())

1.0    2751
2.0     478
1.5     208
2.5      68
3.0      21
0.0      13
3.5      13
0.5       7
5.0       5
6.0       4
4.0       2
4.5       1
Name: bathrooms, dtype: int64

1.0    2379
2.0     693
0.0     287
3.0     155
4.0      45
5.0      16
Name: bedrooms, dtype: int64

In this case, it makes more sense to fill them with their mode

In [24]:
# simple imputer instance
imp_mode_2 = SimpleImputer(missing_values=np.nan, strategy='most_frequent')

# fitting it in my dataframe
df_bos_lis_c[['bathrooms', 'bedrooms']] = imp_mode_2.fit(df_bos_lis_c[['bathrooms', 'bedrooms']])

df_bos_lis_c[['bathrooms', 'bedrooms']].isnull().sum()

bathrooms    0
bedrooms     0
dtype: int64