Airbnb Data Analysis (Bangkok)

##### Loading Dataset

In [None]:
# Importing all the necessary libraries 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Importing the dataset from csv file
df = pd.read_csv('Airbnb Listings Bangkok.csv', index_col=0)
df

Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
0,27934,Nice room with superb city view,120437,Nuttee,Ratchathewi,13.759830,100.541340,Entire home/apt,1905,3,65,2020-01-06,0.50,2,353,0
1,27979,"Easy going landlord,easy place",120541,Emy,Bang Na,13.668180,100.616740,Private room,1316,1,0,,,2,358,0
2,28745,modern-style apartment in Bangkok,123784,Familyroom,Bang Kapi,13.752320,100.624020,Private room,800,60,0,,,1,365,0
3,35780,Spacious one bedroom at The Kris Condo Bldg. 3,153730,Sirilak,Din Daeng,13.788230,100.572560,Private room,1286,7,2,2022-04-01,0.03,1,323,1
4,941865,Suite Room 3 at MetroPoint,610315,Kasem,Bang Kapi,13.768720,100.633380,Private room,1905,1,0,,,3,365,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15849,790465040741092826,素坤逸核心两房公寓42楼，靠近BTSon nut/无边天际泳池观赏曼谷夜景/出门当地美食街,94899359,Renee,Pra Wet,13.715132,100.653458,Private room,2298,28,0,,,1,362,0
15850,790474503157243541,Euro LuxuryHotel PratunamMKt TripleBdNrShoping...,491526222,Phakhamon,Ratchathewi,13.753052,100.538738,Private room,1429,1,0,,,14,365,0
15851,790475335086864240,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,491526222,Phakhamon,Ratchathewi,13.753169,100.538700,Private room,1214,1,0,,,14,365,0
15852,790475546213717328,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,491526222,Phakhamon,Ratchathewi,13.754789,100.538757,Private room,1214,1,0,,,14,365,0


Dataset Overview:

This dataset contains Airbnb listings in Bangkok, including pricing, room types, availability, reviews, and host information.
- id - Airbnb's unique identifier for the listing.
- name - Name of the listing.
- host_id - Airbnb's unique identifier for the host/user.
- host_name - Name of the host. Usually, just the first name(s).
- neighbourhood - The neighbourhood is geocoded using the latitude and longitude against neighbourhoods as defined by open or public digital shapefiles.
- latitude - Uses the World Geodetic System (WGS84) projection for latitude and longitude.
- longitude - Uses the World Geodetic System (WGS84) projection for latitude and longitude.
- room_type - [Entire home/apt | Private room | Shared room | Hotel]
- price - Daily price in local currency. Note, the $ sign may be used despite the locale.
- minimum_nights - The minimum number of night stays for the listing (calendar rules may differ).
- number_of_reviews - The number of reviews the listing has.
- last_review - The date of the last/newest review.
- calculated_host_listings_count - The number of listings the host has in the current scrape in the city/region geography.
- availability_365 - The availability of the listing over the next 365 days, as determined by the calendar. Note that a listing may be unavailable because it has been booked by a guest or blocked by the host.
- number_of_reviews_ltm - The number of reviews the listing has (in the last 12 months).
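
The column list above can double as a quick sanity check after loading. The following is a minimal sketch, not part of the original notebook: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced for illustration, and the tiny `sample` frame stands in for the real `df`.

```python
import pandas as pd

# Column names as they appear in the CSV header (note the spelling
# "neighbourhood"). EXPECTED_COLUMNS is a hypothetical helper name.
EXPECTED_COLUMNS = [
    "id", "name", "host_id", "host_name", "neighbourhood",
    "latitude", "longitude", "room_type", "price", "minimum_nights",
    "number_of_reviews", "last_review", "reviews_per_month",
    "calculated_host_listings_count", "availability_365",
    "number_of_reviews_ltm",
]


def missing_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from `frame`."""
    return [col for col in expected if col not in frame.columns]


# Tiny stand-in frame; with the real data you would pass `df` instead.
sample = pd.DataFrame({"id": [27934], "name": ["Nice room with superb city view"]})
absent = missing_columns(sample)
print(absent[:3])
```

Run against the full dataset, an empty list would suggest the file matches the description above.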



In [None]:
# Checking the first 5 rows
df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
0,27934,Nice room with superb city view,120437,Nuttee,Ratchathewi,13.75983,100.54134,Entire home/apt,1905,3,65,2020-01-06,0.5,2,353,0
1,27979,"Easy going landlord,easy place",120541,Emy,Bang Na,13.66818,100.61674,Private room,1316,1,0,,,2,358,0
2,28745,modern-style apartment in Bangkok,123784,Familyroom,Bang Kapi,13.75232,100.62402,Private room,800,60,0,,,1,365,0
3,35780,Spacious one bedroom at The Kris Condo Bldg. 3,153730,Sirilak,Din Daeng,13.78823,100.57256,Private room,1286,7,2,2022-04-01,0.03,1,323,1
4,941865,Suite Room 3 at MetroPoint,610315,Kasem,Bang Kapi,13.76872,100.63338,Private room,1905,1,0,,,3,365,0


In [None]:
# Checking the last 5 rows
df.tail()

Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
15849,790465040741092826,素坤逸核心两房公寓42楼，靠近BTSon nut/无边天际泳池观赏曼谷夜景/出门当地美食街,94899359,Renee,Pra Wet,13.715132,100.653458,Private room,2298,28,0,,,1,362,0
15850,790474503157243541,Euro LuxuryHotel PratunamMKt TripleBdNrShoping...,491526222,Phakhamon,Ratchathewi,13.753052,100.538738,Private room,1429,1,0,,,14,365,0
15851,790475335086864240,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,491526222,Phakhamon,Ratchathewi,13.753169,100.5387,Private room,1214,1,0,,,14,365,0
15852,790475546213717328,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,491526222,Phakhamon,Ratchathewi,13.754789,100.538757,Private room,1214,1,0,,,14,365,0
15853,790476492384199044,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,491526222,Phakhamon,Ratchathewi,13.75296,100.54082,Private room,1214,1,0,,,14,365,0


##### Data Structure

In [30]:
df.shape

(15854, 16)

In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 15854 entries, 0 to 15853
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              15854 non-null  int64  
 1   name                            15846 non-null  object 
 2   host_id                         15854 non-null  int64  
 3   host_name                       15853 non-null  object 
 4   neighbourhood                   15854 non-null  object 
 5   latitude                        15854 non-null  float64
 6   longitude                       15854 non-null  float64
 7   room_type                       15854 non-null  object 
 8   price                           15854 non-null  int64  
 9   minimum_nights                  15854 non-null  int64  
 10  number_of_reviews               15854 non-null  int64  
 11  last_review                     10064 non-null  object 
 12  reviews_per_month               10064

The dataset contains 15,854 Airbnb listings described by 16 columns.
It has a mix of numerical and categorical columns.
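
One way to make the numerical/categorical split explicit is `select_dtypes`. This is a small sketch on a stand-in frame built from the first two rows shown earlier; the variable names `sample`, `numeric_cols`, and `categorical_cols` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Stand-in frame with the same kinds of columns as the real data.
sample = pd.DataFrame({
    "id": [27934, 27979],
    "room_type": ["Entire home/apt", "Private room"],
    "latitude": [13.75983, 13.66818],
    "price": [1905, 1316],
})

# Numeric columns (int64, float64) versus everything else (text here).
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

Note that `id` and `host_id` are stored as integers but act as identifiers, so summary statistics for them carry little meaning.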

##### Statistical Summary

In [None]:
df.describe().T  # .T transposes the summary so each column becomes a row

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
id,15854.0,1.579397e+17,2.946015e+17,27934.0,21045090.0,35037340.0,52561540.0,7.908162e+17
host_id,15854.0,154105800.0,131872600.0,58920.0,39744310.0,122455600.0,239054700.0,492665900.0
latitude,15854.0,13.74514,0.04303957,13.5273,13.72009,13.73849,13.7595,13.95354
longitude,15854.0,100.5599,0.05091058,100.32955,100.5297,100.5614,100.5851,100.9234
price,15854.0,3217.704,24972.12,0.0,900.0,1429.0,2429.0,1100000.0
minimum_nights,15854.0,15.29236,50.81502,1.0,1.0,1.0,7.0,1125.0
number_of_reviews,15854.0,16.65416,40.61333,0.0,0.0,2.0,13.0,1224.0
reviews_per_month,10064.0,0.8131449,1.090196,0.01,0.12,0.435,1.06,19.13
calculated_host_listings_count,15854.0,13.88962,30.26985,1.0,1.0,4.0,13.0,228.0
availability_365,15854.0,244.3786,125.8432,0.0,138.0,309.0,360.0,365.0


In [33]:
df.describe(include='object')

Unnamed: 0,name,host_name,neighbourhood,room_type,last_review
count,15846,15853,15854,15854,10064
unique,14794,5312,50,4,1669
top,New! La Chada Night Market studio 2PPL near MRT,Curry,Vadhana,Entire home/apt,2022-12-11
freq,45,228,2153,8912,189


##### Handling Missing Values

In [37]:
df.isnull().sum()

id                                   0
name                                 8
host_id                              0
host_name                            1
neighbourhood                        0
latitude                             0
longitude                            0
room_type                            0
price                                0
minimum_nights                       0
number_of_reviews                    0
last_review                       5790
reviews_per_month                 5790
calculated_host_listings_count       0
availability_365                     0
number_of_reviews_ltm                0
dtype: int64

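The last_review and reviews_per_month columns each have 5,790 missing values, roughly 37% of the rows. The name and host_name columns have only 8 and 1 missing values respectively.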
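
The share of missing values, and whether the gaps coincide with listings that have no reviews, can be checked directly. The sketch below uses a four-row stand-in taken from the first rows of the dataset; `sample`, `missing_pct`, and `gaps_match` are illustrative names, and the idea that the gaps correspond to unreviewed listings is an assumption to verify on the full data, not a finding of the original notebook.

```python
import numpy as np
import pandas as pd

# Stand-in built from the first four rows of the dataset.
sample = pd.DataFrame({
    "number_of_reviews": [65, 0, 0, 2],
    "last_review": ["2020-01-06", None, None, "2022-04-01"],
    "reviews_per_month": [0.50, np.nan, np.nan, 0.03],
})

# Share of missing values per column, as a percentage.
missing_pct = sample.isnull().mean().mul(100).round(1)

# Do the gaps line up exactly with listings that have zero reviews?
no_reviews = sample["number_of_reviews"].eq(0)
gaps_match = bool((sample["last_review"].isnull() == no_reviews).all())

print(missing_pct)
print(gaps_match)
```

If the same check holds on the full `df`, the missing values are structural (there is simply no review to date) rather than randomly lost, which is worth keeping in mind when choosing a fill strategy.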

In [None]:
# Fill missing values in the numerical column with the median
df['reviews_per_month'] = df['reviews_per_month'].fillna(df['reviews_per_month'].median())

# Fill missing values in the categorical column with the mode
cat_mode = df['last_review'].mode()[0]  # most frequent value
df['last_review'] = df['last_review'].fillna(cat_mode)
cat_mode  # display the value used for filling

2022-12-11


In [51]:
# Checking whether any missing values remain
df.isnull().sum()

id                                0
name                              8
host_id                           0
host_name                         1
neighbourhood                     0
latitude                          0
longitude                         0
room_type                         0
price                             0
minimum_nights                    0
number_of_reviews                 0
last_review                       0
reviews_per_month                 0
calculated_host_listings_count    0
availability_365                  0
number_of_reviews_ltm             0
dtype: int64
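
The output above still shows 8 missing values in name and 1 in host_name. One possible way to handle them is to fill in a placeholder string, sketched below on a two-row stand-in. The placeholder "Unknown" and the names `sample` and `remaining` are assumptions made for illustration; dropping the affected rows would be an equally reasonable choice.

```python
import pandas as pd

# Stand-in frame with one complete row and one row missing both text fields.
sample = pd.DataFrame({
    "name": ["Suite Room 3 at MetroPoint", None],
    "host_name": ["Kasem", None],
})

# "Unknown" is an assumed placeholder, not a value from the dataset.
sample = sample.fillna({"name": "Unknown", "host_name": "Unknown"})

# Total number of missing cells left after filling.
remaining = int(sample.isnull().sum().sum())
print(remaining)
```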