# Hawaii Airbnb Data Analysis

Mission statement 

If someone wanted to run an Airbnb in Hawaii, here are some things they might want to consider.

Reason for topic of choice

The dataset is large, which lets us apply many of the tools we have learned along the way.

Description of source of Data

Inside Airbnb: Get the Data http://insideairbnb.com/get-the-data
    
Data Storage

SQL
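
The notebook does not say which SQL engine is used, so the following is only a minimal sketch of the idea. It assumes SQLite (through Python's built-in `sqlite3` module) as a stand-in, and the table name `listings` and the three sample rows are made up for illustration.

```python
import sqlite3
import pandas as pd

# Illustrative rows only; the real data comes from listings.csv.
sample_df = pd.DataFrame({
    "id": [1, 2, 3],
    "neighbourhood_cleansed": ["Koolauloa", "Lanai", "Koolauloa"],
    "price": [130.0, 999.0, 261.0],
})

# Assumption: an in-memory SQLite database stands in for the project's SQL store.
conn = sqlite3.connect(":memory:")
sample_df.to_sql("listings", conn, index=False, if_exists="replace")

# Query the stored table back into a DataFrame.
avg_price = pd.read_sql_query(
    "SELECT neighbourhood_cleansed, AVG(price) AS avg_price "
    "FROM listings GROUP BY neighbourhood_cleansed "
    "ORDER BY neighbourhood_cleansed",
    conn,
)
conn.close()
print(avg_price)
```

Storing the cleaned table once and querying it with SQL keeps the aggregation logic out of the notebook cells, at the cost of an extra load step.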

Questions

* Which neighborhoods are the most popular, the most profitable, and have the most renters?
* Which room type is the most popular?
* Which characteristics of a listing matter most for attracting customers and influencing the price?
* What time during a year is the busiest?
* Is there a "peak" season?
* What time should property maintenance be done?
* Does weather play a factor in desirability?

In [99]:
# Import our dependencies
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

In [100]:
# Columns that we need 
columns = ["id", "neighbourhood_cleansed", "latitude", "longitude", "room_type", "price",
           "review_scores_location","amenities", "number_of_reviews"] 
# Read the CSV file
hawaii_df = pd.read_csv('listings.csv', usecols=columns)
hawaii_df.head(5)

Unnamed: 0,id,neighbourhood_cleansed,latitude,longitude,room_type,amenities,price,number_of_reviews,review_scores_location
0,5269.0,South Kohala,20.0274,-155.702,Entire home/apt,"[""Dedicated workspace"", ""Hot water"", ""Fire ext...",$140.00,19,5.0
1,157141.0,Koolauloa,21.58989,-157.89154,Entire home/apt,"[""Hot water"", ""Fire extinguisher"", ""Lockbox"", ...",$130.00,335,4.66
2,162600.0,Kapaa-Wailua,22.06174,-159.32052,Entire home/apt,"[""Microwave"", ""Dryer"", ""Free parking on premis...",$557.00,19,5.0
3,6.20423e+17,Koolauloa,21.604002,-157.895791,Entire home/apt,"[""TV"", ""Air conditioning"", ""Fire extinguisher""...",$355.00,0,
4,342351.0,South Kona,19.45073,-155.87281,Entire home/apt,"[""Building staff"", ""Stove"", ""Hot water"", ""Refr...",$109.00,98,4.94


In [101]:
hawaii_df.shape

(26345, 9)

In [102]:
hawaii_df.dtypes

id                        float64
neighbourhood_cleansed     object
latitude                  float64
longitude                 float64
room_type                  object
amenities                  object
price                      object
number_of_reviews           int64
review_scores_location    float64
dtype: object

In [103]:
hawaii_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26345 entries, 0 to 26344
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   id                      26345 non-null  float64
 1   neighbourhood_cleansed  26345 non-null  object 
 2   latitude                26345 non-null  float64
 3   longitude               26345 non-null  float64
 4   room_type               26345 non-null  object 
 5   amenities               26345 non-null  object 
 6   price                   26345 non-null  object 
 7   number_of_reviews       26345 non-null  int64  
 8   review_scores_location  21607 non-null  float64
dtypes: float64(4), int64(1), object(4)
memory usage: 1.8+ MB


In [104]:
hawaii_df.describe()

Unnamed: 0,id,latitude,longitude,number_of_reviews,review_scores_location
count,26345.0,26345.0,26345.0,26345.0,21607.0
mean,6.983362e+16,20.90155,-157.172162,32.18402,4.868784
std,1.91134e+17,0.792978,1.253721,58.22463,0.252785
min,5269.0,18.92025,-159.71462,0.0,1.0
25%,22552110.0,20.698353,-157.83822,1.0,4.83
50%,41292430.0,20.95908,-156.69012,9.0,4.94
75%,51140580.0,21.29123,-156.43727,36.0,5.0
max,6.44241e+17,22.22938,-154.82293,971.0,5.0


In [105]:
hawaii_df.drop_duplicates(inplace=True)

## Clean the data

Handle missing data

In [106]:
# Determine if there are any missing values in the data.
hawaii_df.isnull().sum()

id                           0
neighbourhood_cleansed       0
latitude                     0
longitude                    0
room_type                    0
amenities                    0
price                        0
number_of_reviews            0
review_scores_location    4736
dtype: int64

In [107]:
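# Preview the data with missing values filled with 0
# (the result is not assigned, so hawaii_df itself keeps its NaN values)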
hawaii_df.fillna(0)

Unnamed: 0,id,neighbourhood_cleansed,latitude,longitude,room_type,amenities,price,number_of_reviews,review_scores_location
0,5.269000e+03,South Kohala,20.027400,-155.702000,Entire home/apt,"[""Dedicated workspace"", ""Hot water"", ""Fire ext...",$140.00,19,5.00
1,1.571410e+05,Koolauloa,21.589890,-157.891540,Entire home/apt,"[""Hot water"", ""Fire extinguisher"", ""Lockbox"", ...",$130.00,335,4.66
2,1.626000e+05,Kapaa-Wailua,22.061740,-159.320520,Entire home/apt,"[""Microwave"", ""Dryer"", ""Free parking on premis...",$557.00,19,5.00
3,6.204230e+17,Koolauloa,21.604002,-157.895791,Entire home/apt,"[""TV"", ""Air conditioning"", ""Fire extinguisher""...",$355.00,0,0.00
4,3.423510e+05,South Kona,19.450730,-155.872810,Entire home/apt,"[""Building staff"", ""Stove"", ""Hot water"", ""Refr...",$109.00,98,4.94
...,...,...,...,...,...,...,...,...,...
26340,4.859407e+07,North Shore Kauai,22.226800,-159.471720,Private room,"[""TV"", ""Hair dryer"", ""Essentials"", ""Shared poo...",$337.00,0,0.00
26341,5.295722e+07,North Shore Oahu,21.590430,-158.111510,Entire home/apt,"[""Waterfront"", ""Hot water"", ""Free parking on p...",$199.00,2,5.00
26342,6.250197e+06,Koolauloa,21.650280,-157.913180,Entire home/apt,"[""Hot water"", ""Fire extinguisher"", ""Microwave""...",$261.00,100,4.94
26343,6.129680e+17,Lanai,20.939219,-156.939224,Entire home/apt,"[""Dedicated workspace"", ""Building staff"", ""Wat...",$999.00,0,0.00


In [108]:
[[column,hawaii_df[column].isnull().sum()] for column in hawaii_df.columns]

[['id', 0],
 ['neighbourhood_cleansed', 0],
 ['latitude', 0],
 ['longitude', 0],
 ['room_type', 0],
 ['amenities', 0],
 ['price', 0],
 ['number_of_reviews', 0],
 ['review_scores_location', 4736]]
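
The only column with missing values is `review_scores_location`, and its observed range is 1.0 to 5.0, so filling it with 0 would treat unreviewed listings as if they had a rating below the scale. The sketch below uses made-up scores (not values from the dataset) to show how much that choice can shift an average.

```python
import numpy as np
import pandas as pd

# Made-up scores; NaN marks a listing with no reviews yet.
scores = pd.Series([5.0, 4.66, np.nan, 4.94], name="review_scores_location")

# Series.mean() skips NaN values by default.
mean_skipping_nan = scores.mean()

# After fillna(0), the zero is counted as if it were a real rating.
mean_with_zero_fill = scores.fillna(0).mean()

print(round(mean_skipping_nan, 2), round(mean_with_zero_fill, 2))
```

Leaving the values as NaN, or dropping those rows only for score-based questions, avoids this distortion.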

In [109]:
# Keep only the columns where fewer than 90% of the values are missing
hawaii_columns_to_keep = [column for column in hawaii_df.columns if hawaii_df[column].isnull().sum() < len(hawaii_df) * 0.9]
hawaii_df = hawaii_df[hawaii_columns_to_keep]
hawaii_df

Unnamed: 0,id,neighbourhood_cleansed,latitude,longitude,room_type,amenities,price,number_of_reviews,review_scores_location
0,5.269000e+03,South Kohala,20.027400,-155.702000,Entire home/apt,"[""Dedicated workspace"", ""Hot water"", ""Fire ext...",$140.00,19,5.00
1,1.571410e+05,Koolauloa,21.589890,-157.891540,Entire home/apt,"[""Hot water"", ""Fire extinguisher"", ""Lockbox"", ...",$130.00,335,4.66
2,1.626000e+05,Kapaa-Wailua,22.061740,-159.320520,Entire home/apt,"[""Microwave"", ""Dryer"", ""Free parking on premis...",$557.00,19,5.00
3,6.204230e+17,Koolauloa,21.604002,-157.895791,Entire home/apt,"[""TV"", ""Air conditioning"", ""Fire extinguisher""...",$355.00,0,
4,3.423510e+05,South Kona,19.450730,-155.872810,Entire home/apt,"[""Building staff"", ""Stove"", ""Hot water"", ""Refr...",$109.00,98,4.94
...,...,...,...,...,...,...,...,...,...
26340,4.859407e+07,North Shore Kauai,22.226800,-159.471720,Private room,"[""TV"", ""Hair dryer"", ""Essentials"", ""Shared poo...",$337.00,0,
26341,5.295722e+07,North Shore Oahu,21.590430,-158.111510,Entire home/apt,"[""Waterfront"", ""Hot water"", ""Free parking on p...",$199.00,2,5.00
26342,6.250197e+06,Koolauloa,21.650280,-157.913180,Entire home/apt,"[""Hot water"", ""Fire extinguisher"", ""Microwave""...",$261.00,100,4.94
26343,6.129680e+17,Lanai,20.939219,-156.939224,Entire home/apt,"[""Dedicated workspace"", ""Building staff"", ""Wat...",$999.00,0,


## Prepare the data 

Convert the dollar amounts to floating-point numbers.

In [93]:
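price = ['price']
for p in price:
    # Strip "$" and "," before casting; regex=True is needed in pandas 2.0+,
    # where str.replace treats the pattern as a literal string by default
    hawaii_df[p] = hawaii_df[p].str.replace("[$,]", "", regex=True).astype("float")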

hawaii_df[price]

Unnamed: 0,price
0,140.0
1,130.0
2,557.0
3,355.0
4,109.0
...,...
26340,337.0
26341,199.0
26342,261.0
26343,999.0


Which neighborhoods are the most popular, the most profitable, and have the most renters?

In [None]:
# Simple linear regression model

In [None]:
# GeoJSON heatmap
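
Before building a model or a map, a per-neighborhood summary can give a first answer. This is a sketch on made-up rows that use the notebook's column names. It assumes that `number_of_reviews` is an acceptable proxy for popularity and that median price is a rough proxy for profitability; neither assumption is confirmed by the data source.

```python
import pandas as pd

# Illustrative rows only; column names match the listings DataFrame.
sample_df = pd.DataFrame({
    "neighbourhood_cleansed": ["Koolauloa", "Koolauloa", "Lanai", "South Kona"],
    "price": [130.0, 261.0, 999.0, 109.0],
    "number_of_reviews": [335, 100, 0, 98],
})

# One row per neighborhood: listing count, median price, total reviews.
neighbourhood_summary = (
    sample_df.groupby("neighbourhood_cleansed")
    .agg(
        listings=("price", "size"),
        median_price=("price", "median"),
        total_reviews=("number_of_reviews", "sum"),
    )
    .sort_values("total_reviews", ascending=False)
)
print(neighbourhood_summary)
```

The median is used instead of the mean so that a few very expensive listings do not dominate a neighborhood's price figure.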

Which room type is the most popular?
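
One simple reading of "popular" is the share of listings per room type. The sketch below uses made-up values for `room_type`; counting listings measures supply rather than bookings, which is an assumption of this approach.

```python
import pandas as pd

# Made-up room types in the same form as the room_type column.
room_types = pd.Series(
    ["Entire home/apt", "Entire home/apt", "Private room", "Entire home/apt"],
    name="room_type",
)

# normalize=True returns each type's share of all listings, most common first.
room_type_share = room_types.value_counts(normalize=True)
print(room_type_share)
```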

Which characteristics of a listing matter most for attracting customers and influencing the price?

In [None]:
##
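
The `amenities` column holds each listing's amenities as a JSON-style list inside a string, so it has to be parsed before it can be analyzed. The following is a rough sketch with made-up strings in that form. It assumes every value is valid JSON, which should be checked against the real file.

```python
import json
from collections import Counter

# Made-up amenities strings in the same JSON-list form as the listings column.
amenities_column = [
    '["Hot water", "Fire extinguisher", "Lockbox"]',
    '["TV", "Air conditioning", "Fire extinguisher"]',
    '["Hot water", "TV"]',
]

# Parse each string into a Python list and tally how often each amenity appears.
amenity_counts = Counter()
for raw in amenities_column:
    amenity_counts.update(json.loads(raw))

print(amenity_counts.most_common(3))
```

The resulting counts could then be turned into indicator columns, one per common amenity, and compared against price.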

In [94]:
reviews_df = pd.read_csv('reviews_1.csv')
reviews_df

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,23726706,246770863,2018-03-26,158582130,Camilla,"Nice place, nice connection to amsterdam, we e..."
1,23726706,248011183,2018-03-30,11198871,Peter,Sehr Empfehlenswert! Alles sauber und ordentli...
2,23726706,248833758,2018-04-01,155953524,Ayme,Una Excelente estancia en casa de Patricia y s...
3,23726706,251489252,2018-04-08,175511774,Hannah,Patricia's room was exactly like the pictures ...
4,23726706,253710522,2018-04-15,178502934,Elen,Made our visit 300% more pleasant and easier.....
...,...,...,...,...,...,...
313339,40575103,609897529448514885,2022-04-21,349584702,Alexandra,Accueil chaleureux de la part de Quirien <br/>...
313340,40575103,618591810255532746,2022-05-03,55531279,Faisal,.
313341,40575103,622265156972840889,2022-05-08,340488171,Bilge Eda,The host is very welcoming. She was able to he...
313342,40575103,627278934112950018,2022-05-15,192927056,Aaron,"Great, clean apartment with ample space. The h..."


In [95]:
reviews_df.drop(columns = ["listing_id", "reviewer_id", "reviewer_name", "comments"], inplace = True)

In [96]:
reviews_df.drop_duplicates(inplace=True)

In [97]:
reviews_df.head(5)

Unnamed: 0,id,date
0,246770863,2018-03-26
1,248011183,2018-03-30
2,248833758,2018-04-01
3,251489252,2018-04-08
4,253710522,2018-04-15


In [98]:
reviews_df['date'] = pd.to_datetime(reviews_df['date'])
reviews_df.dtypes

id               int64
date    datetime64[ns]
dtype: object

What time during a year is the busiest?

What time should property maintenance be done?

Is there a "peak" season?

In [None]:
# groupby
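
A possible shape for the groupby step, sketched on made-up review dates. It assumes that the number of reviews per calendar month is a usable proxy for how busy that month is. Reviews are written after a stay, so the true peak may sit slightly earlier than the counts suggest.

```python
import pandas as pd

# Made-up review rows with the same two columns as reviews_df.
reviews = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date": pd.to_datetime(
        ["2018-03-26", "2018-03-30", "2018-04-01", "2019-03-15", "2022-05-15"]
    ),
})

# Count reviews per calendar month (1 = January), pooling all years together.
reviews_per_month = reviews.groupby(reviews["date"].dt.month)["id"].count()

# The month with the most reviews is the candidate peak month.
busiest_month = reviews_per_month.idxmax()
print(reviews_per_month)
```

Months with the fewest reviews are then natural candidates for scheduling property maintenance.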

Does weather play a factor in desirability?

In [None]:
# API