**Capstone Project Module #2 - Sendhi Anshari Rasyid**

# **Background**

# **Problem Statement**

1. Which factors influence how many guests a listing attracts? | *price, location, minimum nights, room type*
1. When are listings busiest with guests?
1. Are listings evenly distributed across Bangkok?

# **Objectives**

1. Identify the factor that most influences visits to a listing in Bangkok
1. Identify the month in which visits to Bangkok listings peak
1. Identify the districts where visitors staying in Bangkok concentrate

## **Data**
The dataset used in this analysis is a collection of Airbnb listings and their *hosts* located in and around Bangkok, Thailand. It includes location details, room availability, room type, and more. The dataset can be accessed through [this link.](https://drive.google.com/file/d/1Kagt-IMGruvyBV3tH6HYa721JK-TN-56/view?usp=drive_link)

### Dataset Description

The columns in the dataset are described below:

-  `Unnamed: 0`: Index
-  `id`: Unique Airbnb ID of each listing
-  `name`: Name of the listing
-  `host_id`: Unique Airbnb ID of each host
-  `host_name`: Name of the host who owns the listing
-  `neighbourhood`: District where the listing is located
-  `latitude`: Latitude coordinate of the listing
-  `longitude`: Longitude coordinate of the listing
-  `room_type`: Type of room offered. The data contains 4 types: *Entire home/apt, Private room, Shared room, Hotel room*
-  `price`: Rental price per day
-  `minimum_nights`: Minimum number of nights required to book the room
-  `number_of_reviews`: Total number of reviews the listing has received
-  `last_review`: Date of the most recent guest review
-  `reviews_per_month`: Number of reviews the listing receives per month
-  `calculated_host_listings_count`: Number of listings owned by the host
-  `availability_365`: Number of days the room is available to book within a 365-day period
-  `number_of_reviews_ltm`: Number of reviews the listing received in the last 12 months
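
A documented column list is easy to get out of sync with the actual file, for instance through a spelling variant such as `neighborhood` versus `neighbourhood`. The sketch below shows one way to surface such mismatches. The names `documented` and `file_columns` are hypothetical, the lists are shortened for illustration, and the misspelling in `documented` is deliberate.

```python
# Hypothetical, shortened column list as it might appear in written documentation.
# "neighborhood" is deliberately spelled differently from the file.
documented = ["id", "name", "host_id", "host_name", "neighborhood", "price"]

# Column names as they appear in the file header (shortened for illustration).
file_columns = ["id", "name", "host_id", "host_name", "neighbourhood", "price"]

# Names that are documented but absent from the file, and the reverse.
missing_in_file = sorted(set(documented) - set(file_columns))
undocumented = sorted(set(file_columns) - set(documented))

print("Documented but not in file:", missing_in_file)
print("In file but not documented:", undocumented)
```

With a real DataFrame, `file_columns` would simply be `list(df.columns)`.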

### Data Preparation
The preparation stage starts by *importing* the *libraries* that will be used to manage and analyze the data.

In [124]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from scipy.stats import normaltest, chi2_contingency
import warnings
warnings.filterwarnings('ignore')

In [125]:
# Load the dataset
df = pd.read_csv('Airbnb Listings Bangkok.csv')

# Show 5 random sample rows from the dataset
display(df.sample(5))

Unnamed: 0.1,Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
14060,14060,715565949199588020,Lovely Studio room with pool - great location,434168399,Krittika,Khlong Toei,13.72693,100.56259,Entire home/apt,1950,1,0,,,44,359,0
13793,13793,689719738192889434,Modern Cozy fully furnished 1 BR | Center of Bkk,474141399,Beverly,Khlong Toei,13.70901,100.58296,Entire home/apt,797,14,5,2022-12-13,1.14,3,331,5
14624,14624,743694980810006625,素坤逸高级公寓近BTS on nut，无边天际泳池赏曼谷夜景,484194134,行走在曼谷,Khlong Toei,13.729,100.56854,Private room,2500,28,0,,,1,362,0
7779,7779,34289227,Luxury condo@Superb skyline view,199624792,Ho,Vadhana,13.74019,100.58869,Entire home/apt,1450,21,2,2022-04-01,0.21,23,95,2
4470,4470,22720798,1Br Silom-Sathorn Bkk@BTS Chong Nonsi by Triple B,164560603,Triple B,Bang Rak,13.7236,100.52676,Entire home/apt,2429,14,36,2022-12-22,0.61,3,320,5


> The `Unnamed: 0` column acts as the index of each row. To avoid confusion, it is renamed to `index`.

In [126]:
# Rename the column
df=df.rename(columns={'Unnamed: 0':'index'})
df.head(5)

Unnamed: 0,index,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
0,0,27934,Nice room with superb city view,120437,Nuttee,Ratchathewi,13.75983,100.54134,Entire home/apt,1905,3,65,2020-01-06,0.5,2,353,0
1,1,27979,"Easy going landlord,easy place",120541,Emy,Bang Na,13.66818,100.61674,Private room,1316,1,0,,,2,358,0
2,2,28745,modern-style apartment in Bangkok,123784,Familyroom,Bang Kapi,13.75232,100.62402,Private room,800,60,0,,,1,365,0
3,3,35780,Spacious one bedroom at The Kris Condo Bldg. 3,153730,Sirilak,Din Daeng,13.78823,100.57256,Private room,1286,7,2,2022-04-01,0.03,1,323,1
4,4,941865,Suite Room 3 at MetroPoint,610315,Kasem,Bang Kapi,13.76872,100.63338,Private room,1905,1,0,,,3,365,0


### Data Understanding and Data Cleaning
Before the analysis, *data understanding* is carried out to get familiar with the contents of the dataset, followed by *data cleaning* to catch any *errors* or *missing values* in it.

Both steps make the later *data analysis* more efficient and its results more reliable.

The stages I will go through are:

1. Data Identification
1. Identifying the Columns to Analyze
1. Identifying Duplicates in the Primary Key
1. Identifying and Handling *Missing Values*
1. Identifying *Unique Data*

##### 1. Data Identification

In [127]:
# Count the rows and columns in the dataset
a, b = df.shape
print(f'The dataset contains {a} rows and {b} columns.\n')

# Show the dataset details
df.info()

The dataset contains 15854 rows and 17 columns.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15854 entries, 0 to 15853
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   index                           15854 non-null  int64  
 1   id                              15854 non-null  int64  
 2   name                            15846 non-null  object 
 3   host_id                         15854 non-null  int64  
 4   host_name                       15853 non-null  object 
 5   neighbourhood                   15854 non-null  object 
 6   latitude                        15854 non-null  float64
 7   longitude                       15854 non-null  float64
 8   room_type                       15854 non-null  object 
 9   price                           15854 non-null  int64  
 10  minimum_nights                  15854 non-null  int64  
 11  number_of_reviews  

> The output above tells us the following:
> - The dataset contains 15854 rows across 17 columns.
> - The `last_review` and `reviews_per_month` columns have only 10064 non-null values, far fewer than the 15854 rows in total. For now, we can conclude that both columns contain many *null* values.
> - The dataset has 12 numeric columns and 5 non-numeric (object) columns.
> - The `last_review` column is stored as object, whereas it should be a *date*.

In [128]:
# Convert 'last_review' to datetime
df['last_review'] = pd.to_datetime(df['last_review'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15854 entries, 0 to 15853
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   index                           15854 non-null  int64         
 1   id                              15854 non-null  int64         
 2   name                            15846 non-null  object        
 3   host_id                         15854 non-null  int64         
 4   host_name                       15853 non-null  object        
 5   neighbourhood                   15854 non-null  object        
 6   latitude                        15854 non-null  float64       
 7   longitude                       15854 non-null  float64       
 8   room_type                       15854 non-null  object        
 9   price                           15854 non-null  int64         
 10  minimum_nights                  15854 non-null  int64         
 11  nu

> The `last_review` column has been successfully converted to datetime.
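
Once a date column is stored as datetime, the conversion can be verified programmatically and the month can be extracted, which matters for the question about peak months. The sketch below uses a small hypothetical `sample` frame rather than the real dataset. It only illustrates the mechanics.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical sample with one missing review date.
sample = pd.DataFrame(
    {"last_review": ["2022-12-13", None, "2022-04-01", "2022-12-22"]}
)
sample["last_review"] = pd.to_datetime(sample["last_review"])

# Confirm that the column really holds datetime values.
is_datetime = is_datetime64_any_dtype(sample["last_review"])

# Count reviews per calendar month, leaving out rows without a review date.
reviews_by_month = (
    sample["last_review"].dropna().dt.month.value_counts().sort_index()
)

print(is_datetime)
print(reviews_by_month)
```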

In [129]:
# Descriptive statistics for each column (numeric and non-numeric)
display(df.describe(), df.describe(include='object'))

Unnamed: 0,index,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
count,15854.0,15854.0,15854.0,15854.0,15854.0,15854.0,15854.0,15854.0,10064,10064.0,15854.0,15854.0,15854.0
mean,7926.5,1.579397e+17,154105800.0,13.745144,100.559903,3217.704,15.292355,16.654157,2021-08-30 08:37:49.316375296,0.813145,13.889618,244.378643,3.481519
min,0.0,27934.0,58920.0,13.5273,100.32955,0.0,1.0,0.0,2012-12-15 00:00:00,0.01,1.0,0.0,0.0
25%,3963.25,21045090.0,39744310.0,13.72009,100.52969,900.0,1.0,0.0,2020-02-20 00:00:00,0.12,1.0,138.0,0.0
50%,7926.5,35037340.0,122455600.0,13.73849,100.561415,1429.0,1.0,2.0,2022-10-24 00:00:00,0.435,4.0,309.0,0.0
75%,11889.75,52561540.0,239054700.0,13.759497,100.58515,2429.0,7.0,13.0,2022-12-08 00:00:00,1.06,13.0,360.0,3.0
max,15853.0,7.908162e+17,492665900.0,13.95354,100.92344,1100000.0,1125.0,1224.0,2022-12-28 00:00:00,19.13,228.0,365.0,325.0
std,4576.799919,2.946015e+17,131872600.0,0.04304,0.050911,24972.12,50.81502,40.613331,,1.090196,30.269848,125.843224,8.916937


Unnamed: 0,name,host_name,neighbourhood,room_type
count,15846,15853,15854,15854
unique,14794,5312,50,4
top,New! La Chada Night Market studio 2PPL near MRT,Curry,Vadhana,Entire home/apt
freq,45,228,2153,8912


> From the output above, we can note the following:
> - The `price` column has a minimum value of 0, which is an anomaly: it implies that a *listing* can be rented for free, whereas a *listing* should have a price greater than 0 to be rentable.
> - The number of *unique* values in `name`, `host_name`, `neighbourhood`, and `room_type` is lower than the total *count*, so these columns contain repeated values. This is not an anomaly, since values in these columns are expected to repeat.

In [130]:
# Show rows where 'price' = 0
df.loc[df['price'] == 0]

Unnamed: 0,index,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
11103,11103,44563108,Somerset Maison Asoke Bangkok,360620448,Somerset Maison Asoke,Vadhana,13.73815,100.5642,Hotel room,0,1,0,NaT,,1,0,0


> There is 1 row with `price` = 0. Because it is a data anomaly, the row is removed.

In [131]:
# Remove the row where 'price' = 0
df.drop(11103, inplace=True)

In [132]:
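# Reset the row index after dropping the row.
# Round-tripping through set_index/reset_index rebuilds a 0..n-1 index and moves `name` to the first column.
df.set_index('name', inplace=True)
df.reset_index(inplace=True)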
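
Dropping by a hard-coded row label works for a single known row, but a boolean mask avoids depending on the label, and `reset_index(drop=True)` renumbers the rows without reordering columns. The sketch below contrasts both approaches on a hypothetical three-row `listings` frame. It is an illustration, not a replacement for the cells above.

```python
import pandas as pd

# Hypothetical mini dataset with one zero-price row.
listings = pd.DataFrame({
    "name": ["A", "B", "C"],
    "id": [101, 102, 103],
    "price": [1905, 0, 800],
})

# Keep only rows with a positive price, then rebuild a clean 0..n-1 index.
cleaned = listings[listings["price"] > 0].reset_index(drop=True)

# The set_index/reset_index round trip also renumbers the rows,
# but it moves the chosen column ("id" here) to the front.
round_trip = listings[listings["price"] > 0].set_index("id").reset_index()

print(cleaned.columns.tolist())     # column order unchanged
print(round_trip.columns.tolist())  # "id" is now first
```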

In [133]:
# Re-check the descriptive statistics for each column
display(df.describe(), df.describe(include='object'))

Unnamed: 0,index,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
count,15853.0,15853.0,15853.0,15853.0,15853.0,15853.0,15853.0,15853.0,10064,10064.0,15853.0,15853.0,15853.0
mean,7926.299628,1.579496e+17,154092800.0,13.745144,100.559903,3217.907,15.293257,16.655207,2021-08-30 08:37:49.316375296,0.813145,13.890431,244.394058,3.481738
min,0.0,27934.0,58920.0,13.5273,100.32955,278.0,1.0,0.0,2012-12-15 00:00:00,0.01,1.0,0.0,0.0
25%,3963.0,21045090.0,39744310.0,13.72009,100.52969,900.0,1.0,0.0,2020-02-20 00:00:00,0.12,1.0,138.0,0.0
50%,7926.0,35032240.0,122455600.0,13.73849,100.56141,1429.0,1.0,2.0,2022-10-24 00:00:00,0.435,4.0,309.0,0.0
75%,11890.0,52562840.0,239027400.0,13.7595,100.58515,2429.0,7.0,13.0,2022-12-08 00:00:00,1.06,13.0,360.0,3.0
max,15853.0,7.908162e+17,492665900.0,13.95354,100.92344,1100000.0,1125.0,1224.0,2022-12-28 00:00:00,19.13,228.0,365.0,325.0
std,4576.874737,2.946082e+17,131866600.0,0.043041,0.050912,24972.9,50.816496,40.614397,,1.090196,30.27063,125.832224,8.917176


Unnamed: 0,name,host_name,neighbourhood,room_type
count,15845,15852,15853,15853
unique,14793,5311,50,4
top,New! La Chada Night Market studio 2PPL near MRT,Curry,Vadhana,Entire home/apt
freq,45,228,2152,8912


> The minimum of `price` is no longer 0, which means row '11103' with a value of 0 has been removed.

##### 2. Identifying the Columns to Analyze

Before moving on to the analysis, we first identify the columns that will not be used because they are not relevant to the problem statement. The columns I decided to remove permanently are:
- `index`, because it merely duplicates the row index
- `calculated_host_listings_count`, because it only holds the number of listings owned by a host
- `number_of_reviews_ltm`, because it only summarizes the reviews received in the last 12 months
- `longitude` and `latitude`, because they only hold the detailed coordinates of each listing
- `reviews_per_month`, because it only holds the number of reviews a listing receives each month
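
Column removal can also be done without mutating the original frame, which makes it easy to revisit the decision later. The sketch below is a minimal example on a hypothetical `listings` frame. The label `not_a_column` is included only to show that `errors="ignore"` skips labels that do not exist.

```python
import pandas as pd

# Hypothetical mini dataset with a few of the columns discussed above.
listings = pd.DataFrame({
    "index": [0, 1],
    "id": [27934, 27979],
    "price": [1905, 1316],
    "latitude": [13.75983, 13.66818],
    "longitude": [100.54134, 100.61674],
})

columns_to_drop = ["index", "latitude", "longitude", "not_a_column"]

# drop() returns a new frame; errors="ignore" skips labels that are absent.
analysis_df = listings.drop(columns=columns_to_drop, errors="ignore")

print(analysis_df.columns.tolist())
print(listings.columns.tolist())  # the original frame is untouched
```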

In [134]:
# The drop is left commented out in this run, so df.info() below still lists all 17 columns.
# df.drop(['index', 'longitude', 'latitude', 'reviews_per_month', 'calculated_host_listings_count', 'number_of_reviews_ltm'], axis=1, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15853 entries, 0 to 15852
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   name                            15845 non-null  object        
 1   index                           15853 non-null  int64         
 2   id                              15853 non-null  int64         
 3   host_id                         15853 non-null  int64         
 4   host_name                       15852 non-null  object        
 5   neighbourhood                   15853 non-null  object        
 6   latitude                        15853 non-null  float64       
 7   longitude                       15853 non-null  float64       
 8   room_type                       15853 non-null  object        
 9   price                           15853 non-null  int64         
 10  minimum_nights                  15853 non-null  int64         
 11  nu

##### 3. Identifying Duplicates in the Primary Key

Because the `id` column acts as the *primary key* of this dataset, the first priority is to check it for duplicates so that the analysis stays accurate.

In [135]:
# Check for duplicates in the id column
print("Number of duplicated values in the 'id' column:",
      df['id'].duplicated().sum())

Number of duplicated values in the 'id' column: 0


> The check shows that the `id` column has no duplicates.

The other columns are not checked for duplicates, because the `id` check already covers them. Repeated values in `host_name`, `host_id`, `neighbourhood`, and other columns do not affect the analysis that follows.
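
For reference, the sketch below shows how duplicates would surface if a key were repeated, using a hypothetical `listings` frame in which `id` 3 appears twice. It separates key duplicates from fully identical rows and shows one possible way to deduplicate on the key.

```python
import pandas as pd

# Hypothetical mini dataset in which id 3 appears twice.
listings = pd.DataFrame({
    "id": [1, 2, 3, 3],
    "host_id": [10, 10, 20, 20],
    "neighbourhood": ["Vadhana", "Vadhana", "Bang Na", "Bang Na"],
})

# Repeats of the key only, and repeats of entire rows.
dup_ids = int(listings["id"].duplicated().sum())
dup_rows = int(listings.duplicated().sum())

# keep=False marks every member of a duplicated group, which helps inspection.
dup_detail = listings[listings["id"].duplicated(keep=False)]

# One option for handling them: keep the first occurrence of each id.
deduped = listings.drop_duplicates(subset="id", keep="first")

print(dup_ids, dup_rows)
print(dup_detail)
```

Repeated `host_id` or `neighbourhood` values are not counted here, since only `id` is treated as the key.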

##### 4. Identifying and Handling Missing Values

In [136]:
# Helper functions for reporting missing values
def persentase_null():
    # Percentage of missing values per column, rounded to 2 decimals
    return round(df.isna().sum() * 100 / len(df), 2)

def jumlah_null():
    # Number of missing values per column
    return df.isna().sum()

# DataFrame summarizing the missing values
df_null = pd.DataFrame({
    'Null Values': jumlah_null(),
    'Percentage': persentase_null()
})

# Check for missing values in each column
print('Missing values per column:\n')
print(df_null)
print('\nTotal missing values across all columns:', jumlah_null().sum())

Missing values per column:

                                Null Values  Percentage
name                            8            0.05      
index                           0            0.00      
id                              0            0.00      
host_id                         0            0.00      
host_name                       1            0.01      
neighbourhood                   0            0.00      
latitude                        0            0.00      
longitude                       0            0.00      
room_type                       0            0.00      
price                           0            0.00      
minimum_nights                  0            0.00      
number_of_reviews               0            0.00      
last_review                     5789         36.52     
reviews_per_month               5789         36.52     
calculated_host_listings_count  0            0.00      
availability_365                0         

> 1. Several columns still contain *missing values*:
>    - `name`: 8 rows (0.05%)
>    - `host_name`: 1 row (0.01%)
>    - `last_review`: 5789 rows (36.52%)
>    - `reviews_per_month`: 5789 rows (36.52%)
> 2. The total number of *missing values* in the dataset is 11587.
> 3. Although `last_review` has a large percentage of *missing values*, the column is kept because it will serve as one of the references in the analysis.
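
The missing-value summary can also be written as a single function that takes the frame as an argument instead of reading a global variable, which makes it reusable after each cleaning step. The sketch below is one possible shape. The function name `missing_summary` and the `sample` frame are hypothetical.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = frame.isna().sum()
    percent = (counts * 100 / len(frame)).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only columns that actually have missing values, largest first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Hypothetical four-row sample.
sample = pd.DataFrame({
    "name": ["A", None, "C", "D"],
    "price": [100, 200, 300, 400],
    "last_review": [None, None, "2022-12-13", None],
})

summary = missing_summary(sample)
print(summary)
print("Total missing values:", int(summary["missing"].sum()))
```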

The columns that still contain *missing values* are checked one by one before deciding how each should be handled.

In [137]:
# Check missing values in the name column
df[df['name'].isna()]

Unnamed: 0,name,index,id,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
439,,439,4549768,18852579,Titawan,Phra Khanong,13.69406,100.59619,Private room,1080,5,0,NaT,,1,365,0
544,,544,4720818,24386225,Cherry,Din Daeng,13.77562,100.57346,Private room,1200,1,0,NaT,,1,365,0
572,,572,4245018,22030043,Parichart,Bang Phlat,13.78376,100.49821,Private room,1200,1,0,NaT,,1,365,0
669,,669,6148415,31895202,Chira,Bang Na,13.68276,100.60894,Entire home/apt,2424,2,0,NaT,,1,365,0
1030,,1030,8055144,42521288,Nantida,Vadhana,13.74126,100.55761,Private room,5000,3,0,NaT,,1,365,0
1282,,1282,10000742,51374914,Diamond Bangkok,Ratchathewi,13.75328,100.52928,Private room,930,1,6,2017-05-13,0.07,1,365,0
1594,,1594,10710165,55347997,Khaneungnit,Vadhana,13.71757,100.60464,Private room,1000,1,0,NaT,,1,365,0
2075,,2075,13142743,73275200,Pakaphol,Khlong Toei,13.72566,100.56416,Private room,850,1,2,2017-12-11,0.03,3,220,0


In [138]:
# Replace missing values in the `name` column with "No Name"
df['name'] = df['name'].fillna('No Name')
df[df['name'] == 'No Name']

Unnamed: 0,name,index,id,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
439,No Name,439,4549768,18852579,Titawan,Phra Khanong,13.69406,100.59619,Private room,1080,5,0,NaT,,1,365,0
544,No Name,544,4720818,24386225,Cherry,Din Daeng,13.77562,100.57346,Private room,1200,1,0,NaT,,1,365,0
572,No Name,572,4245018,22030043,Parichart,Bang Phlat,13.78376,100.49821,Private room,1200,1,0,NaT,,1,365,0
669,No Name,669,6148415,31895202,Chira,Bang Na,13.68276,100.60894,Entire home/apt,2424,2,0,NaT,,1,365,0
1030,No Name,1030,8055144,42521288,Nantida,Vadhana,13.74126,100.55761,Private room,5000,3,0,NaT,,1,365,0
1282,No Name,1282,10000742,51374914,Diamond Bangkok,Ratchathewi,13.75328,100.52928,Private room,930,1,6,2017-05-13,0.07,1,365,0
1594,No Name,1594,10710165,55347997,Khaneungnit,Vadhana,13.71757,100.60464,Private room,1000,1,0,NaT,,1,365,0
2075,No Name,2075,13142743,73275200,Pakaphol,Khlong Toei,13.72566,100.56416,Private room,850,1,2,2017-12-11,0.03,3,220,0


> All *missing values* in the `name` column have now been filled with the *value* "No Name".

In [139]:
# Check missing values in the host_name column
df[df['host_name'].isna()]

Unnamed: 0,name,index,id,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
3571,Cozy Hideaway,3571,19682464,137488762,,Bang Kapi,13.76999,100.63769,Private room,1399,3,1,2017-07-29,0.02,1,365,0


In [140]:
# Replace missing values in the `host_name` column with "No Name"
df['host_name'] = df['host_name'].fillna('No Name')
df[df['host_name'] == 'No Name']

Unnamed: 0,name,index,id,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
3571,Cozy Hideaway,3571,19682464,137488762,No Name,Bang Kapi,13.76999,100.63769,Private room,1399,3,1,2017-07-29,0.02,1,365,0


> The *missing value* in the `host_name` column has now been filled with the *value* "No Name".

In [141]:
# Check missing values in the last_review column
df[df['last_review'].isna()]

Unnamed: 0,name,index,id,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
1,"Easy going landlord,easy place",1,27979,120541,Emy,Bang Na,13.668180,100.616740,Private room,1316,1,0,NaT,,2,358,0
2,modern-style apartment in Bangkok,2,28745,123784,Familyroom,Bang Kapi,13.752320,100.624020,Private room,800,60,0,NaT,,1,365,0
4,Suite Room 3 at MetroPoint,4,941865,610315,Kasem,Bang Kapi,13.768720,100.633380,Private room,1905,1,0,NaT,,3,365,0
7,1 chic bedroom apartment in BKK,7,1738669,7045870,Jiraporn,Chatu Chak,13.829250,100.567370,Entire home/apt,1461,1,0,NaT,,1,365,0
14,"Deluxe Condo, Nana, Pool/GYM/Sauna",14,959254,5153476,Natcha,Khlong Toei,13.715160,100.568060,Entire home/apt,1400,30,0,NaT,,1,365,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15848,素坤逸核心两房公寓42楼，靠近BTSon nut/无边天际泳池观赏曼谷夜景/出门当地美食街,15849,790465040741092826,94899359,Renee,Pra Wet,13.715132,100.653458,Private room,2298,28,0,NaT,,1,362,0
15849,Euro LuxuryHotel PratunamMKt TripleBdNrShopingArea,15850,790474503157243541,491526222,Phakhamon,Ratchathewi,13.753052,100.538738,Private room,1429,1,0,NaT,,14,365,0
15850,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,15851,790475335086864240,491526222,Phakhamon,Ratchathewi,13.753169,100.538700,Private room,1214,1,0,NaT,,14,365,0
15851,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,15852,790475546213717328,491526222,Phakhamon,Ratchathewi,13.754789,100.538757,Private room,1214,1,0,NaT,,14,365,0


In [142]:
# Replace missing values in the `last_review` column with "-"
# Note: mixing the string "-" with dates means the column no longer holds pure datetime values
df['last_review'] = df['last_review'].fillna('-')
df[df['last_review'] == '-']

Unnamed: 0,name,index,id,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
1,"Easy going landlord,easy place",1,27979,120541,Emy,Bang Na,13.668180,100.616740,Private room,1316,1,0,-,,2,358,0
2,modern-style apartment in Bangkok,2,28745,123784,Familyroom,Bang Kapi,13.752320,100.624020,Private room,800,60,0,-,,1,365,0
4,Suite Room 3 at MetroPoint,4,941865,610315,Kasem,Bang Kapi,13.768720,100.633380,Private room,1905,1,0,-,,3,365,0
7,1 chic bedroom apartment in BKK,7,1738669,7045870,Jiraporn,Chatu Chak,13.829250,100.567370,Entire home/apt,1461,1,0,-,,1,365,0
14,"Deluxe Condo, Nana, Pool/GYM/Sauna",14,959254,5153476,Natcha,Khlong Toei,13.715160,100.568060,Entire home/apt,1400,30,0,-,,1,365,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15848,素坤逸核心两房公寓42楼，靠近BTSon nut/无边天际泳池观赏曼谷夜景/出门当地美食街,15849,790465040741092826,94899359,Renee,Pra Wet,13.715132,100.653458,Private room,2298,28,0,-,,1,362,0
15849,Euro LuxuryHotel PratunamMKt TripleBdNrShopingArea,15850,790474503157243541,491526222,Phakhamon,Ratchathewi,13.753052,100.538738,Private room,1429,1,0,-,,14,365,0
15850,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,15851,790475335086864240,491526222,Phakhamon,Ratchathewi,13.753169,100.538700,Private room,1214,1,0,-,,14,365,0
15851,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,15852,790475546213717328,491526222,Phakhamon,Ratchathewi,13.754789,100.538757,Private room,1214,1,0,-,,14,365,0


> *Missing values* in the `last_review` column have now been filled with the *value* "-".
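
A string placeholder in a date column generally means the column can no longer be used with datetime accessors such as `.dt.month`. An alternative, sketched below on a hypothetical `sample` frame, is to leave the missing dates as `NaT` and record missingness in a separate flag column. The column name `has_review` is an assumption introduced for this example.

```python
import pandas as pd

# Hypothetical sample: the second listing has never been reviewed.
sample = pd.DataFrame({
    "number_of_reviews": [5, 0, 2],
    "last_review": pd.to_datetime(["2022-12-13", None, "2022-04-01"]),
})

# A boolean flag records missingness without touching the dates themselves.
sample["has_review"] = sample["last_review"].notna()

# The column keeps its datetime dtype, so date accessors still work.
dtype_kept = str(sample["last_review"].dtype)
review_months = sample.loc[sample["has_review"], "last_review"].dt.month.tolist()

print(dtype_kept)
print(review_months)
```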

In [143]:
# Check missing values in the reviews_per_month column
df[df['reviews_per_month'].isna()]

Unnamed: 0,name,index,id,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
1,"Easy going landlord,easy place",1,27979,120541,Emy,Bang Na,13.668180,100.616740,Private room,1316,1,0,-,,2,358,0
2,modern-style apartment in Bangkok,2,28745,123784,Familyroom,Bang Kapi,13.752320,100.624020,Private room,800,60,0,-,,1,365,0
4,Suite Room 3 at MetroPoint,4,941865,610315,Kasem,Bang Kapi,13.768720,100.633380,Private room,1905,1,0,-,,3,365,0
7,1 chic bedroom apartment in BKK,7,1738669,7045870,Jiraporn,Chatu Chak,13.829250,100.567370,Entire home/apt,1461,1,0,-,,1,365,0
14,"Deluxe Condo, Nana, Pool/GYM/Sauna",14,959254,5153476,Natcha,Khlong Toei,13.715160,100.568060,Entire home/apt,1400,30,0,-,,1,365,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15848,素坤逸核心两房公寓42楼，靠近BTSon nut/无边天际泳池观赏曼谷夜景/出门当地美食街,15849,790465040741092826,94899359,Renee,Pra Wet,13.715132,100.653458,Private room,2298,28,0,-,,1,362,0
15849,Euro LuxuryHotel PratunamMKt TripleBdNrShopingArea,15850,790474503157243541,491526222,Phakhamon,Ratchathewi,13.753052,100.538738,Private room,1429,1,0,-,,14,365,0
15850,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,15851,790475335086864240,491526222,Phakhamon,Ratchathewi,13.753169,100.538700,Private room,1214,1,0,-,,14,365,0
15851,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,15852,790475546213717328,491526222,Phakhamon,Ratchathewi,13.754789,100.538757,Private room,1214,1,0,-,,14,365,0


In [144]:
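# Replace missing values in the `reviews_per_month` column with "0"
# Note: '0' is a string, so the column ends up mixing floats and text; fillna(0) would keep it numeric
df['reviews_per_month'] = df['reviews_per_month'].fillna('0')
df[df['reviews_per_month'] == '0']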

Unnamed: 0,name,index,id,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
1,"Easy going landlord,easy place",1,27979,120541,Emy,Bang Na,13.668180,100.616740,Private room,1316,1,0,-,0,2,358,0
2,modern-style apartment in Bangkok,2,28745,123784,Familyroom,Bang Kapi,13.752320,100.624020,Private room,800,60,0,-,0,1,365,0
4,Suite Room 3 at MetroPoint,4,941865,610315,Kasem,Bang Kapi,13.768720,100.633380,Private room,1905,1,0,-,0,3,365,0
7,1 chic bedroom apartment in BKK,7,1738669,7045870,Jiraporn,Chatu Chak,13.829250,100.567370,Entire home/apt,1461,1,0,-,0,1,365,0
14,"Deluxe Condo, Nana, Pool/GYM/Sauna",14,959254,5153476,Natcha,Khlong Toei,13.715160,100.568060,Entire home/apt,1400,30,0,-,0,1,365,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15848,素坤逸核心两房公寓42楼，靠近BTSon nut/无边天际泳池观赏曼谷夜景/出门当地美食街,15849,790465040741092826,94899359,Renee,Pra Wet,13.715132,100.653458,Private room,2298,28,0,-,0,1,362,0
15849,Euro LuxuryHotel PratunamMKt TripleBdNrShopingArea,15850,790474503157243541,491526222,Phakhamon,Ratchathewi,13.753052,100.538738,Private room,1429,1,0,-,0,14,365,0
15850,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,15851,790475335086864240,491526222,Phakhamon,Ratchathewi,13.753169,100.538700,Private room,1214,1,0,-,0,14,365,0
15851,Euro LuxuryHotel PratunamMKt TwinBedNrShopingArea,15852,790475546213717328,491526222,Phakhamon,Ratchathewi,13.754789,100.538757,Private room,1214,1,0,-,0,14,365,0


> *Missing values* in the `reviews_per_month` column have now been filled with the *value* "0".
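
If the column is needed for numeric work later, filling with the number `0` rather than the string `"0"` keeps it usable in calculations. The sketch below shows this on a hypothetical three-row `sample`. It is an alternative under that assumption, not what the cell above does.

```python
import pandas as pd

# Hypothetical sample with one listing that has no reviews.
sample = pd.DataFrame({"reviews_per_month": [1.14, None, 0.21]})

# Filling with the number 0 keeps the column as float64.
sample["reviews_per_month"] = sample["reviews_per_month"].fillna(0)

dtype_after = str(sample["reviews_per_month"].dtype)
mean_after = sample["reviews_per_month"].mean()

print(dtype_after)
print(mean_after)
```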

##### 5. Identifying Unique Data

In [145]:
# Show full cell contents instead of truncating them (None means no width limit)
pd.set_option('display.max_colwidth', None)

# Collect the unique values of each column
listItem = []
for col in df.columns:
    listItem.append([col, df[col].nunique(), df[col].unique()])

UniqueTable = pd.DataFrame(columns=['Column Name', 'Unique Count', 'Unique Sample'],
                           data=listItem)
UniqueTable

Unnamed: 0,Column Name,Unique Count,Unique Sample
0,name,14794,"[Nice room with superb city view, Easy going landlord,easy place, modern-style apartment in Bangkok, Spacious one bedroom at The Kris Condo Bldg. 3, Suite Room 3 at MetroPoint, NEw Pro!! Bungalow Bkk Centre, Condo with Chaopraya River View, 1 chic bedroom apartment in BKK, Batcave, Pool view, near Chatuchak, Standard Room Decor do Hostel, Sathorn Terrace Apartment(61), 2BR apt in a cozy neighborhood, Comfy bedroom near River pier & BTS Taksin., budget hotel bangkok near subway, Deluxe Condo, Nana, Pool/GYM/Sauna, Luxury@swimpool/FreeWiFi/nearJJMkt, Nice and Quiet condo near BTS Onnut, 24Flr- 1br Apt near JJ, MRT, BTS, Central Bangkok 3 Bedroom Apartment, The Duplex - Asoke- Luxury 92sqm, New, Stylish & Luxury Studio Condo, River View - Ivy Condo (1 Bedroom), Siamese Gioia on Sukhumvit 31, Contemporary Modern Duplex-Thong Lo, Pan Dao Condo 5 min from BTS On Nut, 1 BR condominium center BKK +NETFLIX+55SQM, 1 penthouse in central Bangkok, MetroPoint Suite Room, Near Airport, Boutique Rooms Near Bangkok Airport, BangLuang House1 @ Bangkok Thailand, Studio near Chula University/Silom walk to MRT/BTS, กรองทองแมนชั่น (ลาดพร้าว 81), Deluxe one Bedroom Condo w.Pool-GYM & Sauna 8-7, Beautiful 1 BR apartment @BTS Ari, Urban Oasis in the heart of Bangkok, 1Bed apt. 
near Chula University/Silom, Stay at the ROARING RATCHADA!, 60 m2 apartment in Thong Lor, Bangkok, ICONSIAM River view on 49th floor, 2br apt in Sukhumvit Asoke near BTS, Self catering cozy1-bed near BTS, ❂☀☀☀Perfect Escape☀☀☀Sunny Roof EnSuite☀☀☀☀, Room with city view of BKK, BangLuang House 2@ Bangkok Thailand, Tranquility found in busy Bangkok near new skytran, Private room in Bangkok, ☞✪✪✪✪Roomy Studio 4 Family r friends✪No Stairs✪✪✪✪, ☞Downtown Central Studio-Bangkok MRT, Beautiful Wood Bangkok Resort House, ""Serviced 2 Bed Scenic SkyVillas"", Cozy 1BR rooftop (BTS Ploenchit) heart of bangkok, Chic two bedroom for Monthly rental, Sukhumvit52 near SkyTrain to BkkCBD, ♡Chic Studio, Easy Walk to Pier & BTS Taksin♡, One Bedroom Suite- WIFI- SATHORN, STUDIO RM2 - WIFI- SATHORN, Quiet Double Bed Apartment, Quiet Double Bed Apartment, Suvarnabhumi free transfer, Luxury&Comfy wthWifi walk-distance to Subwy-Malls, Apr. for rent full fur 1 bedroom, monthly, Long-stay special rate spacious entire floor Siam, One Bed Room at Sukumvit 50 Bangkok, City View, relaxed theme & delicious food around, Ideo Blucove Sukhumvit Bangkok, 2-BR condo near BTS on Sukhumvit Rd, NewlyRenovated! 
3Br,SingleHouse, Park/BTS/Airport., IdeoMix, Sukhumvit RD, close to BTS, Mix Dorm Decor do Hostel, Oasis in the heart of Bangkok, 5 mins by car from Chong Nonsi BTS Station, Inn Saladaeng - Superior hotel room, Best nr Chatujak, MRT, BTS free wifi&fNetflix, ❂Citycenter✔Subway station✔Private Bathroom4Aircon, Nice River View Condominium 30 sq.m, Monthly rent 2Beds/2Baths quiet APT at BTS, Sukhumvit apartment near Nana BTS, A room w/ the view :-) in the city, Spacious 1Bed apartment, Near Bangkok more space than urban!, ✺✺99 feet in the sky✺✺, Cozy Studio Apt near Skytrain.(72/74), Asoke: tasteful, modern 1BR condo, 2 bed 2 bath, BTS, Supermarkets, Monthly, Private, relaxed with amenities, S1 hostel (Dorm) Sathorn Bangkok, 3 minutes walk to Phrom Phong BTS, 1 BDM CONDO SAPHAN KWAI/ARI walk to JJ/BTS/MRT, เฮ้าส์โหมด House Mode, ❂100% Private&Central Light EnSuite, Spacious Studio kitchen/wifi, 2. Bangkok bright Apartment 201, 1.Bangkok great value Studio WIFI, BKK City Fab Luxx Studio free wifi @1194, 5. Bangkok Bright Apartment -WIFI, 6. Bangkok nice, cosy Apartment 201, 7. Bangkok big bright Apartment 402, STUDIO-WIFI-RAIN SHOWER-SATHORN, Luxury Riverview Teakwood Apartment-Great Views :), 1 Bed Pool Access Onnut BTS, ...]"
1,index,15853,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]"
2,id,15853,"[27934, 27979, 28745, 35780, 941865, 1704776, 48736, 1738669, 1744248, 952677, 55681, 1765918, 55686, 59221, 959254, 62217, 1791481, 66046, 105042, 1793000, 960858, 113744, 965722, 1808600, 118118, 1816517, 969792, 121410, 145343, 973830, 156583, 1823321, 159854, 976690, 978531, 166267, 169285, 978969, 1842066, 169514, 1849029, 1862089, 985743, 988373, 172332, 1016487, 1862331, 1862377, 185364, 1887544, 1888303, 1019241, 241416, 1026451, 1028469, 1028486, 1035589, 1035640, 1897982, 296960, 1898332, 1041976, 313459, 1052180, 1926489, 320014, 1933894, 1057173, 1060320, 384924, 1067748, 1077493, 1943048, 385130, 385278, 385979, 390611, 1947314, 1079039, 1086843, 393066, 397449, 405662, 1088343, 1094136, 1961981, 407381, 1975849, 1133843, 413824, 428360, 428421, 428907, 428950, 430691, 430703, 430706, 432004, 439051, 1138679, ...]"
3,host_id,6658,"[120437, 120541, 123784, 153730, 610315, 2129668, 222005, 7045870, 9181769, 5171292, 263049, 9279712, 284095, 5153476, 302658, 9399478, 323158, 545890, 9407280, 3769704, 578110, 5265861, 9478184, 596463, 8492603, 5297436, 703944, 5325919, 58920, 9545111, 766443, 5344120, 5309669, 806600, 5358785, 9626074, 729617, 9652914, 1927968, 822284, 5594281, 8214044, 889670, 6132593, 9821998, 3323622, 960892, 3346331, 2148220, 4115838, 8362130, 175729, 4837310, 5735895, 1611711, 5793490, 9434109, 843854, 9509219, 5822937, 1667508, 4154759, 5929404, 9906827, 1928343, 1681901, 807406, 10070953, 5935474, 4937984, 1425515, 5981006, 10138065, 1463818, 8480912, 6220137, 2122791, 4877320, 10204823, 2592798, 10222460, 10246374, 7454779, 8664261, 6262801, 6313832, 1513875, 5402740, 2625384, 6586465, 9390848, 2864425, 10581237, 1780407, 6647138, 6648722, 2940438, 3533863, 3687435, 5469059, ...]"
4,host_name,5312,"[Nuttee, Emy, Familyroom, Sirilak, Kasem, Wimonpak, Athitaya, Jiraporn, Nol, Somsak, Tor, Jing, Mimi, Natcha, Srisuk, Piyakorn, Sue, Henry, Timo, Pat, Muay, Chuchart, Shine, Dustin, Sudhichai, Anya, Parinya, วสวัตติ์, Gael, Penjit, Gerd, Nattavut, Apiradee, Frances, Danny, Weera, Kanchuya, Jirasak, Evan, Rae And Charlie, Yodying, Evan From Sanctuary House, Narumon, Salvatore, Pichanee, Phoebe, Vajirune, Bee, Marvin, Primrose, Luckana, Mitch & Mam, Veesa, Pariya, Nichapat, Nicky, Sander, Anshera, Piya, Siriwipa, Inn Saladaeng & The Sathon Vimanda, Nokiko, Chanvit, Pornpan, Hollis, Vichit, Tisa, Sugarcane, Peter, Sibyl, S1, Amporn, Chris And Lek, Prapussorn, Maam & Hermann, Nisa, Jahidul, Nokina, Preeda, Arika, Lily Duangdao, Kriengkrai, Andrea, Psirivedin, Suchada, Nattha, Mike, Tayawat, VeeZa, Urcha, Anchana, Feb, NiNew, Taweewat (Ken), Kinifrog, Sarasinee, Avinash, Andrew, Tam, Egidio, ...]"
5,neighbourhood,50,"[Ratchathewi, Bang Na, Bang Kapi, Din Daeng, Bang Kho laen, Rat Burana, Chatu Chak, Khlong San, Bang Rak, Phaya Thai, Sathon, Khlong Toei, Vadhana, Sai Mai, Lat Krabang, Bangkok Yai, Wang Thong Lang, Huai Khwang, Phasi Charoen, Bang Sue, Nong Chok, Phra Khanong, Thawi Watthana, Parthum Wan, Pra Wet, Phra Nakhon, Thon buri, Yan na wa, Suanluang, Don Mueang, Dusit, Lak Si, Samphanthawong, Bueng Kum, Bang Phlat, Saphan Sung, Min Buri, Khan Na Yao, Khlong Sam Wa, Bang Khen, Lat Phrao, Chom Thong, Bangkok Noi, Pom Prap Sattru Phai, Nong Khaem, Thung khru, Bang Khae, Bang Khun thain, Taling Chan, Bang Bon]"
6,latitude,9606,"[13.75983, 13.66818, 13.75232, 13.78823, 13.76872, 13.69757, 13.68556, 13.82925, 13.81693, 13.7204, 13.71934, 13.77486, 13.71802, 13.77941, 13.71516, 13.79152, 13.70719, 13.82298, 13.73378, 13.74668, 13.9077, 13.68568, 13.74444, 13.72097, 13.70441, 13.75351, 13.7547, 13.76747, 13.721868, 13.73292, 13.7285, 13.78938, 13.74293, 13.77931, 13.72291, 13.72733, 13.78118, 13.73224, 13.72287, 13.74464, 13.7137, 13.72062, 13.71803, 13.73122, 13.83148, 13.82148, 13.72073, 13.72063, 13.779, 13.72096, 13.73782, 13.72687, 13.70169, 13.71192, 13.71602, 13.71798, 13.79274, 13.79315, 13.72141, 13.80926, 13.67805, 13.74814, 13.71513, 13.69947, 13.67998, 13.7281, 13.70004, 13.67991, 13.72214, 13.74902, 13.71498, 13.72825, 13.81694, 13.68426, 13.71905, 13.74052, 13.71012, 13.75224, 13.75135, 13.71782, 13.73816, 13.7104, 13.69949, 13.72157, 13.73675, 13.79128, 13.72646255738608, 13.69673, 13.69935, 13.69977, 13.698, 13.70068, 13.69925, 13.70086, 13.71922, 13.67317, 13.71119, 13.70497, 13.69832, 13.70218, ...]"
7,longitude,10224,"[100.54134, 100.61674, 100.62402, 100.57256, 100.63338, 100.5288, 100.49535, 100.56737, 100.56433, 100.50757, 100.5176, 100.54272, 100.51539, 100.57383, 100.56806, 100.53982, 100.59936, 100.56484, 100.56303, 100.56137, 100.64473, 100.49231, 100.57003, 100.57823, 100.59968, 100.53308, 100.53268, 100.63287, 100.771713, 100.46413, 100.52313, 100.6134, 100.55603, 100.54262, 100.53759, 100.52555, 100.58349, 100.57803, 100.51678, 100.55784, 100.59637, 100.54707, 100.54654, 100.46228, 100.52102, 100.58326, 100.5469, 100.54694, 100.83671, 100.52911, 100.55179, 100.52725, 100.5977, 100.51535, 100.52663, 100.52841, 100.33377, 100.33356, 100.72946, 100.56892, 100.62451, 100.52016, 100.5683, 100.52726, 100.61074, 100.57318, 100.6787, 100.61055, 100.50752, 100.55652, 100.53312, 100.53774, 100.56451, 100.49841, 100.50414, 100.55237, 100.60281, 100.58979, 100.49447, 100.51622, 100.56495, 100.60171, 100.54509, 100.57191, 100.54935, 100.49129336748284, 100.53202, 100.52723, 100.52652, 100.52886, 100.5228, 100.52624, 100.52276, 100.5255, 100.54406, 100.63334, 100.55134, 100.52434, 100.52254, 100.52806, ...]"
8,room_type,4,"[Entire home/apt, Private room, Hotel room, Shared room]"
9,price,3039,"[1905, 1316, 800, 1286, 1000, 1558, 1461, 700, 1150, 1893, 1862, 910, 1400, 4156, 1577, 122594, 5680, 5034, 1500, 1385, 3775, 2078, 1732, 2000, 3082, 1190, 1329, 1176, 600, 1659, 5429, 1843, 1870, 2500, 1300, 1200, 3500, 1795, 350, 1450, 8658, 3757, 1490, 2701, 2251, 866, 2696, 1736, 1800, 900, 400, 2900, 2226, 890, 3394, 922, 1543, 1589, 1271, 1747, 797, 6926, 1169, 5195, 829, 950, 2355, 980, 330, 3740, 1143, 831, 790, 720, 1211, 970, 929, 670, 1004, 811, 1629, 835, 926, 650, 5887, 1250, 2571, 3847, 1485, 2814, 707, 2061, 750, 693, 1088, 808, 500, 3097, 850, 1212, ...]"


> - After testing and re-checking, the `id` column, which acts as the *primary key*, is confirmed to contain no duplicate values.
> - The `room_type` column has 4 possible *values*: Entire home/apt, Private room, Hotel room, and Shared room.
> - No *missing values* remain.
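
The three checks noted above can be sketched with pandas as follows. This is a minimal illustration only: the `sample` frame is a hypothetical four-row stand-in for the full dataset, and `check_quality` is a helper name introduced here, not a function from the original notebook.

```python
import pandas as pd

# Hypothetical miniature sample; the real analysis uses the full Airbnb Bangkok dataset.
sample = pd.DataFrame({
    "id": [27934, 27979, 28745, 35780],
    "room_type": ["Entire home/apt", "Private room", "Hotel room", "Shared room"],
    "price": [1905, 1316, 800, 1286],
})


def check_quality(df, key="id"):
    """Summarize duplicate keys, room_type categories, and missing values."""
    return {
        # Rows whose key value already appeared earlier in the frame
        "duplicate_keys": int(df[key].duplicated().sum()),
        # Distinct categories found in room_type
        "room_types": sorted(df["room_type"].unique().tolist()),
        # Total number of missing cells across all columns
        "missing_values": int(df.isna().sum().sum()),
    }


report = check_quality(sample)
print(report)
```

On the real data, the same helper would be called on the cleaned DataFrame; a non-zero `duplicate_keys` or `missing_values` would indicate that further cleaning is needed.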

Overall, the *data understanding* stage can be summarized in the following points:
1. The dataset contains 15,854 rows across 17 columns.
1. The `last_review` column still has the wrong data type; it should be stored as a *date*.
1. Anomalous values in the `price` column have been handled by *dropping the affected rows*.
1. The `id` column, which serves as the *primary key*, contains no duplicate values.
1. *Missing values* in the `name`, `host_name`, `last_review`, and `reviews_per_month` columns have been handled.
1. The `room_type` column has 4 possible *values*: Entire home/apt, Private room, Hotel room, and Shared room.
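
The remaining format issue in `last_review` could be addressed with `pd.to_datetime`. The sketch below assumes the dates are stored as `YYYY-MM-DD` strings, which is not confirmed by the dataset description; the `sample` frame and its values are hypothetical, and the invalid entry is included only to show how `errors="coerce"` behaves.

```python
import pandas as pd

# Hypothetical values; the YYYY-MM-DD format is an assumption about the raw column.
sample = pd.DataFrame({"last_review": ["2020-01-06", "2022-04-01", "not a date"]})

# Parse strings into datetime64; entries that do not match the format become NaT.
sample["last_review"] = pd.to_datetime(
    sample["last_review"], format="%Y-%m-%d", errors="coerce"
)

# Once the column is datetime, month-level analysis becomes straightforward.
sample["review_month"] = sample["last_review"].dt.month

print(sample.dtypes)
print(sample)
```

Using `errors="coerce"` keeps the conversion from failing on a malformed entry, at the cost of introducing `NaT` values that then need the same treatment as other missing data.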