# Data Cleaning

## What is it about?
This notebook covers the preliminary exploration and cleaning of raw data. Real-world data is rarely clean: it typically contains missing values, duplicates, outliers, and inconsistent formats. The first task is therefore to make the dataset suitable for analysis.

## Why is it important?
Poorly cleaned data leads to misleading conclusions and unreliable models.
Cleaning often takes up a large share of a data scientist's working time; commonly cited estimates put it at around 60–70%.

## What should it contain?

- Loading data 
- Basic information (.shape, .info(), .describe())
- Handling missing values (e.g., filling in, deleting)
- Searching for and removing duplicates
- Checking types (e.g., dates → datetime, prices → float)
- Filtering out outliers (e.g., $10,000/night on Airbnb is unrealistic)
- Standardization (e.g., currencies, formats)
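
The duplicate and outlier steps in this checklist can be sketched on a small, hypothetical DataFrame (the `listings` values below are invented, not taken from the Amsterdam file). The outlier rule used here, keeping prices within 1.5 × IQR of the quartiles, is one common convention swapped in for illustration; a fixed domain cap such as the $10,000/night example is an equally valid choice.

```python
import pandas as pd

# Hypothetical toy data: listing 2 appears twice, and one price is unrealistic
listings = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 5, 6],
    "price": [120.0, 80.0, 80.0, 95.0, 150.0, 110.0, 10000.0],
})

# 1) Duplicates: keep the first row for each listing id
deduped = listings.drop_duplicates(subset="id", keep="first")

# 2) Outliers: keep prices inside the 1.5 * IQR fences around the quartiles
q1 = deduped["price"].quantile(0.25)
q3 = deduped["price"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = deduped[deduped["price"].between(lower, upper)]

print(cleaned)
```

On the real data, `subset="id"` assumes that `id` uniquely identifies a listing; running `df.duplicated().sum()` first shows whether fully identical rows exist at all.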

### Imports

```python
import pandas as pd
import numpy as np

pd.set_option("display.max_columns", None)  # show all columns
```


### 1.0 Loading data

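```python
amsterdams_airbnbs_raw_data = pd.read_csv("../data/raw/amsterdam_airbnbs_data.csv")
# Work on a copy so the in-place cleaning steps below leave the raw DataFrame untouched
df = amsterdams_airbnbs_raw_data.copy()
df.head()
```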

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,availability_eoy,number_of_reviews_ly,estimated_occupancy_l365d,estimated_revenue_l365d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,27886,https://www.airbnb.com/rooms/27886,20250609011745,2025-06-17,city scrape,"Romantic, stylish B&B houseboat in canal district",Stylish and romantic houseboat on fantastic hi...,"Central, quiet, safe, clean and beautiful.",https://a0.muscache.com/pictures/02c2da9d-660e...,97647,https://www.airbnb.com/users/show/97647,Flip,2010-03-23,"Amsterdam, Netherlands","Marjan works in ""eye"" the dutch filmmuseum, an...",within an hour,100%,98%,t,https://a0.muscache.com/im/users/97647/profile...,https://a0.muscache.com/im/users/97647/profile...,Westelijke Eilanden,1.0,1.0,"['email', 'phone']",t,t,"Amsterdam, North Holland, Netherlands",Centrum-West,,52.38761,4.89188,Private room in houseboat,Private room,2,1.5,1.5 baths,1.0,1.0,"[""Coffee maker: Nespresso"", ""Shampoo"", ""Paid s...",$132.00,3,356,3,3,30,30,3.0,30.0,,t,0,0,0,53,2025-06-17,302,28,1,53,26,218,28776.0,2012-01-09,2025-06-11,4.92,4.9,4.94,4.95,4.92,4.9,4.78,0363 974D 4986 7411 88D8,f,1,0,1,0,1.85
1,28871,https://www.airbnb.com/rooms/28871,20250609011745,2025-06-17,city scrape,Comfortable double room,Basic bedroom in the center of Amsterdam.,"Flower market , Leidseplein , Rembrantsplein",https://a0.muscache.com/pictures/160889/362340...,124245,https://www.airbnb.com/users/show/124245,Edwin,2010-05-13,"Amsterdam, Netherlands",Hi,within an hour,100%,99%,t,https://a0.muscache.com/im/pictures/user/9986b...,https://a0.muscache.com/im/pictures/user/9986b...,Amsterdam Centrum,2.0,2.0,"['email', 'phone']",t,t,"Amsterdam, North Holland, Netherlands",Centrum-West,,52.36775,4.89092,Private room in rental unit,Private room,2,1.0,1 shared bath,1.0,1.0,"[""Carbon monoxide alarm"", ""Wifi"", ""Heating"", ""...",$78.00,2,730,1,2,730,730,2.0,730.0,,t,1,2,4,130,2025-06-17,710,93,9,31,96,255,19890.0,2010-08-22,2025-06-16,4.88,4.9,4.87,4.94,4.94,4.94,4.84,0363 607B EA74 0BD8 2F6F,f,2,0,2,0,3.93
2,29051,https://www.airbnb.com/rooms/29051,20250609011745,2025-06-17,city scrape,Comfortable single / double room,This room can also be rented as a single or a ...,the street is quite lively especially on weeke...,https://a0.muscache.com/pictures/162009/bd6be2...,124245,https://www.airbnb.com/users/show/124245,Edwin,2010-05-13,"Amsterdam, Netherlands",Hi,within an hour,100%,99%,t,https://a0.muscache.com/im/pictures/user/9986b...,https://a0.muscache.com/im/pictures/user/9986b...,Amsterdam Centrum,2.0,2.0,"['email', 'phone']",t,t,"Amsterdam, North Holland, Netherlands",Centrum-Oost,,52.36584,4.89111,Private room in condo,Private room,2,1.0,1 shared bath,1.0,1.0,"[""Carbon monoxide alarm"", ""Wifi"", ""Heating"", ""...",$70.00,2,730,1,2,730,730,2.0,730.0,,t,0,1,3,121,2025-06-17,822,86,7,20,88,255,17850.0,2011-03-16,2025-06-14,4.81,4.88,4.83,4.93,4.92,4.87,4.79,0363 607B EA74 0BD8 2F6F,f,2,0,2,0,4.74
3,44391,https://www.airbnb.com/rooms/44391,20250609011745,2025-06-17,previous scrape,Quiet 2-bedroom Amsterdam city centre apartment,Guests greatly appreciate the unique location ...,The appartment is located in the city centre. ...,https://a0.muscache.com/pictures/97741545/3900...,194779,https://www.airbnb.com/users/show/194779,Jan,2010-08-08,"Amsterdam, Netherlands",Love to travel while hosting and to host while...,,,,f,https://a0.muscache.com/im/users/194779/profil...,https://a0.muscache.com/im/users/194779/profil...,Oostelijke Eilanden en Kadijken,1.0,1.0,"['email', 'phone']",t,t,"Amsterdam, Noord-Holland, Netherlands",Centrum-Oost,,52.37168,4.91471,Entire rental unit,Entire home/apt,4,,1.5 baths,2.0,,"[""Shampoo"", ""Essentials"", ""Dishwasher"", ""Paid ...",,3,730,3,3,730,730,3.0,730.0,,t,0,0,0,0,2025-06-17,42,0,0,0,0,0,,2010-09-16,2022-08-20,4.71,4.68,4.49,4.95,4.9,4.68,4.5,0363 E76E F06A C1DD 172C,f,1,1,0,0,0.23
4,47061,https://www.airbnb.com/rooms/47061,20250609011745,2025-06-17,city scrape,Charming apartment in old centre,"A beautiful, quiet apartment in the center of ...",,https://a0.muscache.com/pictures/268343/a08ce2...,211696,https://www.airbnb.com/users/show/211696,Ivar,2010-08-24,Netherlands,"Hi, I am a freelance theatre director and comp...",within a few hours,100%,50%,f,https://a0.muscache.com/im/pictures/user/0a2a3...,https://a0.muscache.com/im/pictures/user/0a2a3...,,1.0,2.0,"['email', 'phone']",t,t,,De Baarsjes - Oud-West,,52.36786,4.87458,Entire rental unit,Entire home/apt,3,1.5,1.5 baths,2.0,2.0,"[""Shampoo"", ""Paid street parking off premises""...",$120.00,2,20,2,2,20,20,2.0,20.0,,t,1,1,1,66,2025-06-17,203,5,1,1,6,39,4680.0,2010-09-13,2025-05-29,4.77,4.78,4.61,4.76,4.9,4.85,4.63,0363 1266 8C04 4133 E6AC,f,1,1,0,0,1.13
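
Part of the placeholder handling done in section 3.1 can also happen at load time. The sketch below is a minimal example, assuming a tiny inline CSV in place of the real file: `na_values` adds extra strings to treat as missing (empty fields are already read as NaN by default), and `parse_dates` turns a date column into `datetime64`. The column subset and rows are illustrative only.

```python
import io

import pandas as pd

# A tiny inline CSV standing in for the real file (hypothetical rows)
raw_csv = io.StringIO(
    "id,last_scraped,host_response_rate,price\n"
    "27886,2025-06-17,100%,$132.00\n"
    "44391,2025-06-17,?,\n"
)

df_sketch = pd.read_csv(
    raw_csv,
    na_values=["?"],               # extra placeholder strings to read as NaN
    parse_dates=["last_scraped"],  # parse this column as datetime64
)

print(df_sketch.dtypes)
```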


### 2.0 Basic information

#### 2.1 Shape

```python
num_rows, num_columns = df.shape
print(f"Rows: {num_rows}, Columns: {num_columns}")
```

Rows: 10168, Columns: 79


#### 2.2 Info: data types of the different columns

```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10168 entries, 0 to 10167
Data columns (total 79 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            10168 non-null  int64  
 1   listing_url                                   10168 non-null  object 
 2   scrape_id                                     10168 non-null  int64  
 3   last_scraped                                  10168 non-null  object 
 4   source                                        10168 non-null  object 
 5   name                                          10168 non-null  object 
 6   description                                   9859 non-null   object 
 7   neighborhood_overview                         5258 non-null   object 
 8   picture_url                                   10168 non-null  object 
 9   host_id                                       10168 non-null 
```
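
The `info()` output reports many columns as `object`, and the `head()` rows above show why: `price` is stored as text such as `$132.00` and `host_response_rate` as `100%`. A minimal conversion sketch, assuming those two formats (the sample values are hypothetical), strips the symbols and calls `pd.to_numeric`. With `errors="coerce"`, anything unparsable becomes NaN, so it is worth counting NaNs before and after. Rescaling percentages to a 0 to 1 fraction is a choice, not a requirement.

```python
import pandas as pd

# Hypothetical sample values in the same text formats as the head() output
sample = pd.DataFrame({
    "price": ["$132.00", "$1,250.00", None],
    "host_response_rate": ["100%", "90%", None],
})

# Strip the currency symbol and thousands separator, then convert to float
sample["price"] = pd.to_numeric(
    sample["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)

# Strip the percent sign and rescale to a 0-1 fraction
sample["host_response_rate"] = (
    pd.to_numeric(sample["host_response_rate"].str.rstrip("%"), errors="coerce") / 100
)

print(sample.dtypes)
```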

#### 2.3 Descriptive statistics

```python
df.describe()
```

Unnamed: 0,id,scrape_id,host_id,host_listings_count,host_total_listings_count,neighbourhood_group_cleansed,latitude,longitude,accommodates,bathrooms,bedrooms,beds,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,availability_30,availability_60,availability_90,availability_365,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,availability_eoy,number_of_reviews_ly,estimated_occupancy_l365d,estimated_revenue_l365d,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
count,10168.0,10168.0,10168.0,10164.0,10164.0,0.0,10168.0,10168.0,10168.0,6377.0,9874.0,6341.0,10168.0,10168.0,10168.0,10168.0,10168.0,10168.0,10168.0,10168.0,0.0,10168.0,10168.0,10168.0,10168.0,10168.0,10168.0,10168.0,10168.0,10168.0,10168.0,6321.0,9198.0,9198.0,9197.0,9198.0,9198.0,9198.0,9198.0,10168.0,10168.0,10168.0,10168.0,9198.0
mean,5.495778e+17,20250610000000.0,130296100.0,3.524105,5.40368,,52.366657,4.889542,2.926928,1.250039,1.557322,1.841981,4.129622,289.781963,3.710366,4.758163,422770.8,422793.6,4.151465,422784.2,,5.806058,14.156471,22.189418,91.035012,47.568843,8.96784,0.759638,49.949548,8.634048,51.165519,16289.15,4.844651,4.855278,4.779614,4.893529,4.907673,4.813358,4.655034,1.852085,1.208694,0.575138,0.033733,0.982216
std,5.381949e+17,0.0,173732500.0,30.578664,55.067867,,0.017125,0.035242,1.292811,0.537645,0.893615,1.625684,17.267562,393.391805,16.161718,18.869662,30116580.0,30116580.0,16.403349,30116580.0,,8.761619,18.618628,28.685512,116.066009,128.11376,25.076334,2.101968,64.496009,24.464155,77.676604,49153.8,0.250759,0.242731,0.311448,0.213664,0.213848,0.237086,0.31657,3.343481,2.51286,1.770011,0.473912,2.112976
min,27886.0,20250610000000.0,1662.0,1.0,1.0,,52.290276,4.75587,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,0.0,0.0,0.01
25%,24759680.0,20250610000000.0,12694670.0,1.0,1.0,,52.355668,4.86455,2.0,1.0,1.0,1.0,2.0,20.0,2.0,2.0,21.0,21.0,2.0,21.0,,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,2088.0,4.79,4.8,4.69,4.86,4.9,4.72,4.52,1.0,1.0,0.0,0.0,0.2
50%,6.545968e+17,20250610000000.0,44423660.0,1.0,1.0,,52.365525,4.887365,2.0,1.0,1.0,1.0,3.0,30.0,2.0,3.0,35.0,72.5,3.0,60.0,,0.0,4.0,8.0,26.0,10.0,2.0,0.0,14.0,2.0,16.0,7560.0,4.92,4.92,4.87,4.97,5.0,4.88,4.71,1.0,1.0,0.0,0.0,0.41
75%,1.05309e+18,20250610000000.0,182933600.0,1.0,2.0,,52.376452,4.908866,4.0,1.5,2.0,2.0,4.0,365.0,3.0,4.0,730.0,999.0,4.0,845.725,,9.0,23.0,38.0,167.0,31.0,6.0,1.0,90.0,6.0,55.0,20212.0,5.0,5.0,5.0,5.0,5.0,5.0,4.85,1.0,1.0,0.0,0.0,0.92
max,1.438602e+18,20250610000000.0,699642400.0,911.0,1621.0,,52.42512,5.026669,16.0,17.0,17.0,33.0,1001.0,1125.0,1001.0,1001.0,2147484000.0,2147484000.0,1001.0,2147484000.0,,30.0,60.0,90.0,365.0,4792.0,848.0,85.0,198.0,836.0,255.0,2350000.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,39.0,34.0,17.0,9.0,99.01
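
In the `describe()` output, `neighbourhood_group_cleansed` and `calendar_updated` have a count of 0, meaning they contain no values at all. Such columns can be found and dropped programmatically; the sketch below uses a hypothetical three-column frame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one column that is entirely NaN
frame = pd.DataFrame({
    "accommodates": [2, 4, 3],
    "calendar_updated": [np.nan, np.nan, np.nan],
    "bedrooms": [1.0, np.nan, 2.0],
})

# Columns with no non-null values at all
empty_cols = frame.columns[frame.isna().all()].tolist()
print(empty_cols)  # ['calendar_updated']

# Drop only fully empty columns; partially missing ones (bedrooms) are kept
frame = frame.dropna(axis=1, how="all")
```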


### 3.0 Identify and handle missing values

#### 3.1 Identify missing values

```python
# Treat common placeholder strings ("?" and empty strings) as missing values
df.replace(["?", ""], np.nan, inplace=True)
df.head(5)
```

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,availability_eoy,number_of_reviews_ly,estimated_occupancy_l365d,estimated_revenue_l365d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,27886,https://www.airbnb.com/rooms/27886,20250609011745,2025-06-17,city scrape,"Romantic, stylish B&B houseboat in canal district",Stylish and romantic houseboat on fantastic hi...,"Central, quiet, safe, clean and beautiful.",https://a0.muscache.com/pictures/02c2da9d-660e...,97647,https://www.airbnb.com/users/show/97647,Flip,2010-03-23,"Amsterdam, Netherlands","Marjan works in ""eye"" the dutch filmmuseum, an...",within an hour,100%,98%,t,https://a0.muscache.com/im/users/97647/profile...,https://a0.muscache.com/im/users/97647/profile...,Westelijke Eilanden,1.0,1.0,"['email', 'phone']",t,t,"Amsterdam, North Holland, Netherlands",Centrum-West,,52.38761,4.89188,Private room in houseboat,Private room,2,1.5,1.5 baths,1.0,1.0,"[""Coffee maker: Nespresso"", ""Shampoo"", ""Paid s...",$132.00,3,356,3,3,30,30,3.0,30.0,,t,0,0,0,53,2025-06-17,302,28,1,53,26,218,28776.0,2012-01-09,2025-06-11,4.92,4.9,4.94,4.95,4.92,4.9,4.78,0363 974D 4986 7411 88D8,f,1,0,1,0,1.85
1,28871,https://www.airbnb.com/rooms/28871,20250609011745,2025-06-17,city scrape,Comfortable double room,Basic bedroom in the center of Amsterdam.,"Flower market , Leidseplein , Rembrantsplein",https://a0.muscache.com/pictures/160889/362340...,124245,https://www.airbnb.com/users/show/124245,Edwin,2010-05-13,"Amsterdam, Netherlands",Hi,within an hour,100%,99%,t,https://a0.muscache.com/im/pictures/user/9986b...,https://a0.muscache.com/im/pictures/user/9986b...,Amsterdam Centrum,2.0,2.0,"['email', 'phone']",t,t,"Amsterdam, North Holland, Netherlands",Centrum-West,,52.36775,4.89092,Private room in rental unit,Private room,2,1.0,1 shared bath,1.0,1.0,"[""Carbon monoxide alarm"", ""Wifi"", ""Heating"", ""...",$78.00,2,730,1,2,730,730,2.0,730.0,,t,1,2,4,130,2025-06-17,710,93,9,31,96,255,19890.0,2010-08-22,2025-06-16,4.88,4.9,4.87,4.94,4.94,4.94,4.84,0363 607B EA74 0BD8 2F6F,f,2,0,2,0,3.93
2,29051,https://www.airbnb.com/rooms/29051,20250609011745,2025-06-17,city scrape,Comfortable single / double room,This room can also be rented as a single or a ...,the street is quite lively especially on weeke...,https://a0.muscache.com/pictures/162009/bd6be2...,124245,https://www.airbnb.com/users/show/124245,Edwin,2010-05-13,"Amsterdam, Netherlands",Hi,within an hour,100%,99%,t,https://a0.muscache.com/im/pictures/user/9986b...,https://a0.muscache.com/im/pictures/user/9986b...,Amsterdam Centrum,2.0,2.0,"['email', 'phone']",t,t,"Amsterdam, North Holland, Netherlands",Centrum-Oost,,52.36584,4.89111,Private room in condo,Private room,2,1.0,1 shared bath,1.0,1.0,"[""Carbon monoxide alarm"", ""Wifi"", ""Heating"", ""...",$70.00,2,730,1,2,730,730,2.0,730.0,,t,0,1,3,121,2025-06-17,822,86,7,20,88,255,17850.0,2011-03-16,2025-06-14,4.81,4.88,4.83,4.93,4.92,4.87,4.79,0363 607B EA74 0BD8 2F6F,f,2,0,2,0,4.74
3,44391,https://www.airbnb.com/rooms/44391,20250609011745,2025-06-17,previous scrape,Quiet 2-bedroom Amsterdam city centre apartment,Guests greatly appreciate the unique location ...,The appartment is located in the city centre. ...,https://a0.muscache.com/pictures/97741545/3900...,194779,https://www.airbnb.com/users/show/194779,Jan,2010-08-08,"Amsterdam, Netherlands",Love to travel while hosting and to host while...,,,,f,https://a0.muscache.com/im/users/194779/profil...,https://a0.muscache.com/im/users/194779/profil...,Oostelijke Eilanden en Kadijken,1.0,1.0,"['email', 'phone']",t,t,"Amsterdam, Noord-Holland, Netherlands",Centrum-Oost,,52.37168,4.91471,Entire rental unit,Entire home/apt,4,,1.5 baths,2.0,,"[""Shampoo"", ""Essentials"", ""Dishwasher"", ""Paid ...",,3,730,3,3,730,730,3.0,730.0,,t,0,0,0,0,2025-06-17,42,0,0,0,0,0,,2010-09-16,2022-08-20,4.71,4.68,4.49,4.95,4.9,4.68,4.5,0363 E76E F06A C1DD 172C,f,1,1,0,0,0.23
4,47061,https://www.airbnb.com/rooms/47061,20250609011745,2025-06-17,city scrape,Charming apartment in old centre,"A beautiful, quiet apartment in the center of ...",,https://a0.muscache.com/pictures/268343/a08ce2...,211696,https://www.airbnb.com/users/show/211696,Ivar,2010-08-24,Netherlands,"Hi, I am a freelance theatre director and comp...",within a few hours,100%,50%,f,https://a0.muscache.com/im/pictures/user/0a2a3...,https://a0.muscache.com/im/pictures/user/0a2a3...,,1.0,2.0,"['email', 'phone']",t,t,,De Baarsjes - Oud-West,,52.36786,4.87458,Entire rental unit,Entire home/apt,3,1.5,1.5 baths,2.0,2.0,"[""Shampoo"", ""Paid street parking off premises""...",$120.00,2,20,2,2,20,20,2.0,20.0,,t,1,1,1,66,2025-06-17,203,5,1,1,6,39,4680.0,2010-09-13,2025-05-29,4.77,4.78,4.61,4.76,4.9,4.85,4.63,0363 1266 8C04 4133 E6AC,f,1,1,0,0,1.13
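
Because `pd.read_csv` already reads empty fields as NaN, replacing `""` usually changes nothing, which is consistent with the `head()` output above being identical to the one in section 1.0. Whitespace-only strings, however, are not caught by an exact match. A minimal sketch, assuming a hypothetical `host_about`-style column, compares exact replacement with a regex that also treats blank strings as missing:

```python
import numpy as np
import pandas as pd

# Hypothetical text column with placeholder and blank-looking values
notes = pd.DataFrame({"host_about": ["Hi", "?", "   ", "", None]})

# Exact matches only: "?" and "" become NaN, but "   " survives
exact = notes.replace(["?", ""], np.nan)

# Regex: an optional "?" surrounded by optional whitespace counts as missing
robust = notes.replace(r"^\s*\??\s*$", np.nan, regex=True)

print(exact["host_about"].isna().sum(), robust["host_about"].isna().sum())
```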


#### 3.2 Evaluating for Missing Data

```python
missing_data = df.isnull()
missing_data
```

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,availability_eoy,number_of_reviews_ly,estimated_occupancy_l365d,estimated_revenue_l365d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,True,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10163,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,True,True,False,False,False,True,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,True,True,True,True,True,True,True,True,True,False,False,False,False,False,False,True
10164,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,True,True,False,False,False,True,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,True,True,True,True,True,True,True,False,False,False,False,False,False,True
10165,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,True,True,False,False,False,False,True,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,True,True,True,True,True,True,True,False,False,False,False,False,False,True
10166,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,True,True,True,True,True,True,True,False,False,False,False,False,False,True
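
The boolean mask is hard to scan across 79 columns. Summing it along rows shows how incomplete each listing is, which helps decide whether deleting rows is reasonable. A sketch on a hypothetical frame; the `thresh=2` cut-off is an arbitrary example, not a value taken from this notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical rows with different amounts of missing data
rows = pd.DataFrame({
    "price": [132.0, np.nan, 70.0, np.nan],
    "bedrooms": [1.0, 2.0, np.nan, np.nan],
    "review_scores_rating": [4.92, 4.71, np.nan, np.nan],
})

# Number of missing cells per row, from the same isnull() mask idea
missing_per_row = rows.isnull().sum(axis=1)
print(missing_per_row.tolist())  # [0, 1, 2, 3]

# Keep rows that have at least 2 non-missing values
kept = rows.dropna(thresh=2)
```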


#### 3.3 Count missing values in each column

```python
for column in missing_data.columns.values.tolist():
    print(missing_data[column].value_counts())
    print("")
```

```text
id
False    10168
Name: count, dtype: int64

listing_url
False    10168
Name: count, dtype: int64

scrape_id
False    10168
Name: count, dtype: int64

last_scraped
False    10168
Name: count, dtype: int64

source
False    10168
Name: count, dtype: int64

name
False    10168
Name: count, dtype: int64

description
False    9859
True      309
Name: count, dtype: int64

neighborhood_overview
False    5258
True     4910
Name: count, dtype: int64

picture_url
False    10168
Name: count, dtype: int64

host_id
False    10168
Name: count, dtype: int64

host_url
False    10168
Name: count, dtype: int64

host_name
False    10164
True         4
Name: count, dtype: int64

host_since
False    10164
True         4
Name: count, dtype: int64

host_location
False    9024
True     1144
Name: count, dtype: int64

host_about
False    5372
True     4796
Name: count, dtype: int64

host_response_time
False    6640
True     3528
Name: count, dtype: int64

host_response_rate
False    6640
True     3528
Name: co
```
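
Printing one `value_counts()` per column produces a long listing. A single summary table with counts and percentages, sorted and limited to columns that actually have gaps, is easier to read; for example, `neighborhood_overview` above is missing in 4910 of 10168 rows, about 48.3%. A sketch, with a hypothetical frame standing in for `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df
frame = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "description": ["a", np.nan, "c", "d"],
    "host_about": [np.nan, np.nan, "x", np.nan],
})

# Missing count and percentage per column
missing_summary = pd.DataFrame({
    "missing": frame.isna().sum(),
    "percent": frame.isna().mean().mul(100).round(1),
})

# Keep only columns with gaps, most incomplete first
missing_summary = missing_summary[missing_summary["missing"] > 0].sort_values(
    "missing", ascending=False
)
print(missing_summary)
```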

#### 3.4 Remove unnecessary columns

```python
# Drop columns not needed for the analysis: URLs, scrape metadata, host profile
# fields, availability and date fields, coordinates and derived host counts.
# errors="ignore" silently skips any name that is not present in df.
df.drop(columns=[
    "listing_url", "scrape_id", "last_scraped", "source",
    "neighborhood_overview", "picture_url", "host_id", "host_url",
    "host_name", "host_since", "host_location", "host_about", "host_response_time",
    "host_thumbnail_url", "host_picture_url", "host_neighbourhood",
    "host_identity_verified", "neighbourhood_group_cleansed",
    "calendar_updated", "has_availability", "availability_30", "availability_60",
    "availability_90", "availability_365", "calendar_last_scraped",
    "availability_eoy", "estimated_occupancy_l365d", "first_review", "last_review",
    "license", "calculated_host_listings_count",
    "calculated_host_listings_count_entire_homes",
    "calculated_host_listings_count_private_rooms",
    "calculated_host_listings_count_shared_rooms",
    "estimated_revenue_l365d", "reviews_per_month",
    "host_verifications", "latitude", "longitude",
], errors="ignore", inplace=True)

df.head()
```

Unnamed: 0,id,name,description,host_response_rate,host_acceptance_rate,host_is_superhost,host_listings_count,host_total_listings_count,host_has_profile_pic,neighbourhood,neighbourhood_cleansed,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,number_of_reviews_ly,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,instant_bookable
0,27886,"Romantic, stylish B&B houseboat in canal district",Stylish and romantic houseboat on fantastic hi...,100%,98%,t,1.0,1.0,t,"Amsterdam, North Holland, Netherlands",Centrum-West,Private room in houseboat,Private room,2,1.5,1.5 baths,1.0,1.0,"[""Coffee maker: Nespresso"", ""Shampoo"", ""Paid s...",$132.00,3,356,3,3,30,30,3.0,30.0,302,28,1,26,4.92,4.9,4.94,4.95,4.92,4.9,4.78,f
1,28871,Comfortable double room,Basic bedroom in the center of Amsterdam.,100%,99%,t,2.0,2.0,t,"Amsterdam, North Holland, Netherlands",Centrum-West,Private room in rental unit,Private room,2,1.0,1 shared bath,1.0,1.0,"[""Carbon monoxide alarm"", ""Wifi"", ""Heating"", ""...",$78.00,2,730,1,2,730,730,2.0,730.0,710,93,9,96,4.88,4.9,4.87,4.94,4.94,4.94,4.84,f
2,29051,Comfortable single / double room,This room can also be rented as a single or a ...,100%,99%,t,2.0,2.0,t,"Amsterdam, North Holland, Netherlands",Centrum-Oost,Private room in condo,Private room,2,1.0,1 shared bath,1.0,1.0,"[""Carbon monoxide alarm"", ""Wifi"", ""Heating"", ""...",$70.00,2,730,1,2,730,730,2.0,730.0,822,86,7,88,4.81,4.88,4.83,4.93,4.92,4.87,4.79,f
3,44391,Quiet 2-bedroom Amsterdam city centre apartment,Guests greatly appreciate the unique location ...,,,f,1.0,1.0,t,"Amsterdam, Noord-Holland, Netherlands",Centrum-Oost,Entire rental unit,Entire home/apt,4,,1.5 baths,2.0,,"[""Shampoo"", ""Essentials"", ""Dishwasher"", ""Paid ...",,3,730,3,3,730,730,3.0,730.0,42,0,0,0,4.71,4.68,4.49,4.95,4.9,4.68,4.5,f
4,47061,Charming apartment in old centre,"A beautiful, quiet apartment in the center of ...",100%,50%,f,1.0,2.0,t,,De Baarsjes - Oud-West,Entire rental unit,Entire home/apt,3,1.5,1.5 baths,2.0,2.0,"[""Shampoo"", ""Paid street parking off premises""...",$120.00,2,20,2,2,20,20,2.0,20.0,203,5,1,6,4.77,4.78,4.61,4.76,4.9,4.85,4.63,f
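
The columns above are chosen by hand for relevance. A complementary, rule-based option drops any column whose share of missing values exceeds a cut-off. A sketch, assuming a hypothetical frame and an arbitrary 50% threshold:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "notes" is mostly empty
frame = pd.DataFrame({
    "price": [132.0, 78.0, np.nan, 120.0],
    "notes": [np.nan, np.nan, np.nan, "quiet"],
    "bedrooms": [1.0, 1.0, 2.0, np.nan],
})

max_missing_share = 0.5  # hypothetical cut-off; tune per project
keep_mask = frame.isna().mean() <= max_missing_share
reduced = frame.loc[:, keep_mask]
print(reduced.columns.tolist())  # ['price', 'bedrooms']
```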


#### 3.6 Replace missing values with the most frequent value (mode)

```python
# Fill missing categorical values with each column's most frequent value (mode)
mode_fill_columns = [
    "host_response_rate", "host_acceptance_rate", "host_is_superhost",
    "neighbourhood", "neighbourhood_cleansed", "property_type", "room_type",
]
for column in mode_fill_columns:
    df.loc[:, column] = df[column].fillna(df[column].mode()[0])

df.head(10)
```

Unnamed: 0,id,name,description,host_response_rate,host_acceptance_rate,host_is_superhost,host_listings_count,host_total_listings_count,host_has_profile_pic,neighbourhood,neighbourhood_cleansed,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,number_of_reviews_ly,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,instant_bookable
0,27886,"Romantic, stylish B&B houseboat in canal district",Stylish and romantic houseboat on fantastic hi...,100%,98%,t,1.0,1.0,t,"Amsterdam, North Holland, Netherlands",Centrum-West,Private room in houseboat,Private room,2,1.5,1.5 baths,1.0,1.0,"[""Coffee maker: Nespresso"", ""Shampoo"", ""Paid s...",$132.00,3,356,3,3,30,30,3.0,30.0,302,28,1,26,4.92,4.9,4.94,4.95,4.92,4.9,4.78,f
1,28871,Comfortable double room,Basic bedroom in the center of Amsterdam.,100%,99%,t,2.0,2.0,t,"Amsterdam, North Holland, Netherlands",Centrum-West,Private room in rental unit,Private room,2,1.0,1 shared bath,1.0,1.0,"[""Carbon monoxide alarm"", ""Wifi"", ""Heating"", ""...",$78.00,2,730,1,2,730,730,2.0,730.0,710,93,9,96,4.88,4.9,4.87,4.94,4.94,4.94,4.84,f
2,29051,Comfortable single / double room,This room can also be rented as a single or a ...,100%,99%,t,2.0,2.0,t,"Amsterdam, North Holland, Netherlands",Centrum-Oost,Private room in condo,Private room,2,1.0,1 shared bath,1.0,1.0,"[""Carbon monoxide alarm"", ""Wifi"", ""Heating"", ""...",$70.00,2,730,1,2,730,730,2.0,730.0,822,86,7,88,4.81,4.88,4.83,4.93,4.92,4.87,4.79,f
4,47061,Charming apartment in old centre,"A beautiful, quiet apartment in the center of ...",100%,50%,f,1.0,2.0,t,"Amsterdam, Noord-Holland, Netherlands",De Baarsjes - Oud-West,Entire rental unit,Entire home/apt,3,1.5,1.5 baths,2.0,2.0,"[""Shampoo"", ""Paid street parking off premises""...",$120.00,2,20,2,2,20,20,2.0,20.0,203,5,1,6,4.77,4.78,4.61,4.76,4.9,4.85,4.63,f
6,49552,Multatuli Luxury Guest Suite in top location,Stylish & spacious 60m2 guest suite in Amsterd...,100%,92%,t,1.0,2.0,t,"Amsterdam, North Holland, Netherlands",Centrum-West,Entire guest suite,Entire home/apt,3,1.0,1 bath,2.0,2.0,"[""Marie Stella Maris body soap"", ""Coffee maker...",$284.00,3,1125,1,4,1125,1125,3.0,1125.0,599,56,8,58,4.93,4.93,4.93,4.96,4.97,4.98,4.78,f
7,50263,Central de Lux 2 bedrooms (4p) apt 125 sqm,A beautiful 'De Lux' 125 sqm apartment for 4 a...,100%,91%,f,1.0,1.0,t,"Amsterdam, North Holland, Netherlands",Centrum-Oost,Entire condo,Entire home/apt,4,1.5,1.5 baths,2.0,3.0,"[""Paid street parking off premises"", ""Single l...",$457.00,3,14,3,3,1125,1125,3.0,1125.0,173,10,2,7,4.85,4.91,4.81,4.82,4.76,4.65,4.74,f
8,50515,"Family Home (No drugs, smoking or parties)",This is a beautiful family home in a lovely pa...,100%,21%,f,1.0,1.0,t,"Amsterdam, North Holland, Netherlands",Bos en Lommer,Entire townhouse,Entire home/apt,5,1.5,1.5 baths,3.0,3.0,"[""Wifi"", ""Washer"", ""Dryer"", ""Iron"", ""Shampoo"",...",$198.00,7,18,1,7,18,18,7.0,18.0,18,4,0,5,4.78,4.83,4.83,4.83,4.89,4.56,4.78,f
9,50523,B & B de 9 Straatjes (city center),B&B “De 9 Straatjes” – Your home in the heart ...,90%,99%,t,1.0,1.0,t,"Amsterdam, Noord-Holland, Netherlands",Centrum-West,Entire condo,Entire home/apt,2,1.0,1 bath,1.0,1.0,"[""Essentials"", ""Mini fridge"", ""Private entranc...",$139.00,2,365,1,2,365,365,2.0,365.0,543,75,5,68,4.88,4.9,4.83,4.86,4.83,4.94,4.83,f
13,62015,"Charming, beautifully & sunny place",This beautiful apartment in one of the most li...,100%,0%,f,1.0,1.0,t,"Amsterdam, North Holland, Netherlands",Oud-Oost,Entire rental unit,Entire home/apt,2,1.5,1.5 baths,1.0,1.0,"[""Shampoo"", ""Essentials"", ""Dishwasher"", ""Paid ...",$200.00,30,30,30,30,30,30,30.0,30.0,38,0,0,3,4.89,4.92,4.89,5.0,4.89,4.65,4.57,f
16,97221,Beautiful and spacious room,"Private room offered in elegant furnished, cle...",100%,98%,f,2.0,2.0,t,"Amsterdam, Noord-Holland, Netherlands",Slotervaart,Private room in bed and breakfast,Private room,2,1.0,1 shared bath,1.0,2.0,"[""Carbon monoxide alarm"", ""Wifi"", ""Hot water k...",$59.00,2,8,2,2,8,8,2.0,8.0,405,53,3,66,4.68,4.74,4.89,4.72,4.78,4.46,4.56,f


#### 3.7 Replace missing values with the mean

In [166]:
# Review scores are numeric, so fill each column's gaps with that column's own mean
review_cols = [
    'review_scores_value',
    'review_scores_location',
    'review_scores_rating',
    'review_scores_accuracy',
    'review_scores_cleanliness',
    'review_scores_checkin',
    'review_scores_communication',
]
df[review_cols] = df[review_cols].fillna(df[review_cols].mean())

df.head(10)

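Mean imputation keeps each column's mean unchanged but shrinks its spread, and it hides the fact that a value was missing. In listing data like this, review scores are usually empty because a listing has no reviews yet (an assumption worth checking against `number_of_reviews`), so a flag column can preserve that signal. A minimal sketch on hypothetical toy values; the `review_scores_missing` column name is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy values standing in for the review score columns
toy = pd.DataFrame({
    "review_scores_rating": [4.9, np.nan, 4.7, 4.5],
    "review_scores_value": [4.8, np.nan, np.nan, 4.6],
})
review_cols = ["review_scores_rating", "review_scores_value"]

# Record which rows had any review score missing *before* imputing
toy["review_scores_missing"] = toy[review_cols].isna().any(axis=1)

# fillna() with a Series of means aligns on column names, so each column gets its own mean
toy[review_cols] = toy[review_cols].fillna(toy[review_cols].mean())
print(toy)
```

Both toy columns happen to average 4.7, so every gap becomes 4.7 while the flag still shows which rows were imputed.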


#### 3.8 Replace missing values with the most frequent value (mode)

In [167]:
df.loc[:, 'host_response_rate'] = df['host_response_rate'].fillna(df['host_response_rate'].mode()[0])

df.head(10)

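`host_response_rate` is still stored as text such as `"100%"` at this point, and `mode()` works on text as well as numbers. It returns a Series rather than a single value because several values can tie; in that case the result is sorted and `[0]` quietly picks the first one. A minimal sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical host_response_rate values, kept as strings like in the raw column
rates = pd.Series(["100%", np.nan, "90%", "100%", np.nan, "50%"])

# mode() ignores NaN and returns a Series, because several values can tie for most frequent
modes = rates.mode()
most_frequent = modes[0]          # "100%" occurs twice, every other value once
filled = rates.fillna(most_frequent)
print(filled.tolist())
```

On a skewed column like this one, mode imputation pushes even more rows onto the dominant value, which is worth keeping in mind before modelling on it.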


#### 3.9 Drop rows with missing values

In [168]:
# Drop every row that is missing any of these essential fields
df = df.dropna(subset=['price', 'neighbourhood', 'neighbourhood_cleansed',
                       'property_type', 'bathrooms_text', 'host_has_profile_pic'])

df.head(10)

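A single `dropna(subset=[...])` removes a row if any listed column is missing, which gives the same result as chaining one call per column. Logging how many rows disappear makes the cost of this step visible. A minimal sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only 'price' and 'property_type' are treated as required here
toy = pd.DataFrame({
    "price": ["$132.00", np.nan, "$70.00", "$120.00"],
    "property_type": ["Entire condo", "Entire condo", None, "Entire rental unit"],
    "description": [np.nan, "Nice flat", "Room", np.nan],
})
required = ["price", "property_type"]

before = len(toy)
cleaned = toy.dropna(subset=required)  # a missing 'description' does not drop a row
print(f"dropped {before - len(cleaned)} of {before} rows")

# Chaining single-column calls keeps exactly the same rows
chained = toy.dropna(subset=["price"]).dropna(subset=["property_type"])
print(cleaned.equals(chained))
```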


#### Check which columns still have missing values

In [169]:
df_missing = df[df.columns[df.isna().any()]]
df_missing.head(10)

Unnamed: 0,description,bedrooms,beds
0,Stylish and romantic houseboat on fantastic hi...,1.0,1.0
1,Basic bedroom in the center of Amsterdam.,1.0,1.0
2,This room can also be rented as a single or a ...,1.0,1.0
4,"A beautiful, quiet apartment in the center of ...",2.0,2.0
6,Stylish & spacious 60m2 guest suite in Amsterd...,2.0,2.0
7,A beautiful 'De Lux' 125 sqm apartment for 4 a...,2.0,3.0
8,This is a beautiful family home in a lovely pa...,3.0,3.0
9,B&B “De 9 Straatjes” – Your home in the heart ...,1.0,1.0
13,This beautiful apartment in one of the most li...,1.0,1.0
16,"Private room offered in elegant furnished, cle...",1.0,2.0
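The check above lists the columns that still contain gaps (`description`, `bedrooms`, `beds`) but only shows the first rows. Counting the missing values per column shows how large each gap is. A minimal sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same three columns
toy = pd.DataFrame({
    "description": ["Houseboat", np.nan, "Room", np.nan],
    "bedrooms": [1.0, np.nan, 2.0, 1.0],
    "beds": [1.0, 1.0, 2.0, 3.0],
})

# Count and share of missing values per column, keeping only columns that still have gaps
missing = toy.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
report = pd.DataFrame({
    "n_missing": missing,
    "pct_missing": (missing / len(toy) * 100).round(1),
})
print(report)
```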


### 4.0 Searching for and removing duplicates

In [159]:
df[df.duplicated(keep=False)]

Unnamed: 0,id,name,description,host_response_rate,host_acceptance_rate,host_is_superhost,host_listings_count,host_total_listings_count,host_has_profile_pic,neighbourhood,neighbourhood_cleansed,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,number_of_reviews_ly,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,instant_bookable
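The empty result above means no two rows are identical across every column. A listing that appears twice with one field changed would still slip through, so checking the key column as well is a cheap complement. A minimal sketch on hypothetical rows, assuming `id` identifies a listing:

```python
import pandas as pd

# Hypothetical rows: listing 28871 appears twice with different prices
toy = pd.DataFrame({
    "id": [27886, 28871, 28871, 29051],
    "price": ["$132.00", "$78.00", "$80.00", "$70.00"],
})

# Full-row duplicates: none, because the two 28871 rows differ in price
full_dupes = toy.duplicated(keep=False).sum()
# Duplicates on the key column only: both 28871 rows are flagged
id_dupes = toy.duplicated(subset=["id"], keep=False).sum()
print(full_dupes, id_dupes)

# Keep the first occurrence of each id
deduped = toy.drop_duplicates(subset=["id"], keep="first")
print(deduped)
```

Which copy to keep (`keep="first"` here) is a judgment call; with real data, a newer scrape date would be a better tie-breaker if one is available.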


### 5.0 Checking types
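The outputs above show that `price` is still text such as `$132.00`, `host_response_rate` is text such as `100%`, and flags like `host_is_superhost` use `t`/`f`. A minimal sketch of converting them, on hypothetical values (the `$1,250.00` entry is invented to show the thousands separator):

```python
import pandas as pd

# Hypothetical values in the same formats as the raw columns
toy = pd.DataFrame({
    "price": ["$132.00", "$1,250.00", None],
    "host_response_rate": ["100%", "90%", None],
    "host_is_superhost": ["t", "f", "t"],
})

# "$1,250.00" -> 1250.0: strip the currency sign and thousands separator, then parse
toy["price"] = pd.to_numeric(toy["price"].str.replace(r"[$,]", "", regex=True))
# "90%" -> 0.9
toy["host_response_rate"] = pd.to_numeric(toy["host_response_rate"].str.rstrip("%")) / 100
# "t"/"f" -> True/False
toy["host_is_superhost"] = toy["host_is_superhost"].map({"t": True, "f": False})
print(toy.dtypes)
```

Missing entries stay missing (NaN) after `pd.to_numeric`, so this step does not hide gaps.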

### 6.0 Filtering out outliers
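One common rule, shown here as an option rather than the notebook's chosen method, is Tukey's fences: drop values more than 1.5 × IQR beyond the quartiles. It needs `price` as a number first. A minimal sketch on hypothetical nightly prices, including an unrealistic $10,000 night:

```python
import pandas as pd

# Hypothetical nightly prices in dollars, already converted to float
prices = pd.Series([59.0, 70.0, 78.0, 120.0, 132.0, 139.0, 198.0, 200.0, 284.0, 10000.0])

# Tukey's fences: flag values more than 1.5 * IQR beyond the quartiles
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inliers = prices[prices.between(lower, upper)]
print(f"kept {len(inliers)} of {len(prices)}; upper fence = {upper:.2f}")
```

Prices are usually right-skewed, so on the full dataset this rule may also flag genuine luxury listings; a fixed, domain-based cap or applying the rule to log prices are alternatives worth comparing.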

### 7.0 Standardization
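The outline at the top lists standardizing formats for this step. The `neighbourhood` column already shows an example: the same place is written both as `Amsterdam, North Holland, Netherlands` and as `Amsterdam, Noord-Holland, Netherlands`. A minimal sketch of unifying them (the trailing space in the third value is a hypothetical extra variant):

```python
import pandas as pd

# Two spellings seen in the data, plus a hypothetical copy with a stray trailing space
toy = pd.Series([
    "Amsterdam, North Holland, Netherlands",
    "Amsterdam, Noord-Holland, Netherlands",
    "Amsterdam, Noord-Holland, Netherlands ",
])

# Trim whitespace and map the Dutch province name to its English form
standardized = toy.str.strip().str.replace("Noord-Holland", "North Holland", regex=False)
print(standardized.unique())
```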