# Predicting the Best Airbnb Listing Price

_Author: Evonne Tham_

---


![](https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iKAhd1KFQDfw/v0/-1x-1.jpg)


### Introduction

Airbnb is an online marketplace that arranges lodging, homestays, and tourism experiences. It has become one of the world's largest marketplaces, with travelers from all around the world using it for unique and authentic experiences. As of 2020, Airbnb has over 7 million listings across the world, all powered by local hosts.

Sharing homes with travelers gives homeowners an opportunity to earn more income from their properties. However, with the constant increase in listings, hosts and property managers now face far more competitors. Hence, getting the listing price right is crucial. If the listing price is set too low, you miss out on a lot of potential income; if it is set a tad too high, you might lose a booking to competitors.

<br>
<br>

![title](./image/airbnb_site.png)

_Image Source: Airbnb Site_

<br>
<br>


Although third-party pricing tools are available to help price these properties, they can get quite pricey, especially for people who are just starting out.

To solve this problem, I will look into data from Tokyo, one of the most popular cities for Airbnb bookings. I will build a regression model to predict the best listing price in Tokyo with respect to features such as property or room type, location, etc. Model performance will be measured by R-squared and RMSE, and the model should improve upon the baseline by at least 10%. The baseline is defined as the mean of the listing price.
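
The baseline comparison described above can be sketched as follows. This is a minimal illustration, not the project's actual evaluation code: the helper names (`rmse`, `r_squared`, `beats_baseline`) and the sample prices are hypothetical, and it assumes "improve by 10%" means the model's RMSE is at least 10% lower than the RMSE of always predicting the mean training price.

```python
import math
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def beats_baseline(y_train, y_test, y_pred, min_improvement=0.10):
    """Check whether the model RMSE is at least `min_improvement` lower
    than the RMSE of a baseline that always predicts the mean training price.
    (Assumed interpretation of the 10% target.)"""
    baseline_pred = [mean(y_train)] * len(y_test)
    baseline_rmse = rmse(y_test, baseline_pred)
    model_rmse = rmse(y_test, y_pred)
    return model_rmse <= (1 - min_improvement) * baseline_rmse


# Hypothetical nightly prices, for illustration only
y_train = [8000, 12000, 10000, 15000]
y_test = [9000, 11000, 14000]
y_pred = [9500, 10500, 13000]  # predictions from some fitted model

print("Baseline RMSE:", rmse(y_test, [mean(y_train)] * len(y_test)))
print("Model RMSE:   ", rmse(y_test, y_pred))
print("Model R^2:    ", r_squared(y_test, y_pred))
print("Beats baseline by 10%:", beats_baseline(y_train, y_test, y_pred))
```

Using the training mean (rather than the test mean) for the baseline keeps the comparison honest, since the baseline then sees no more information than the model does.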


___Citation:___
- https://news.airbnb.com/about-us/
- https://news.airbnb.com/fast-facts/
- https://ipropertymanagement.com/research/airbnb-statistics

---
## Executive Summary

#### - The opener
The goal is to accurately predict Airbnb listing prices in Tokyo.

Since we know the price for each row, this can be classified as a supervised learning problem.


#### - The need
To generate a good production model that can accurately predict Airbnb listing prices in Tokyo on unseen data, I first conducted data cleaning and EDA on the train dataset, where I was able to gauge how each feature affected price. Following that, I conducted feature engineering by adding, dropping and tweaking features, as well as using one-hot encoding to convert the remaining categorical columns to numeric, so that modelling could take place.
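
As an illustration of the one-hot encoding step, here is a minimal sketch using `pandas.get_dummies`. The sample rows are hypothetical, the `t`/`f` encoding of `host_is_superhost` is an assumption about the raw data, and dropping the first dummy level is one possible design choice rather than necessarily what the notebooks do.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the listing attributes in this README
listings = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt"],
    "host_is_superhost": ["t", "f", "t"],  # assumed t/f string flags
    "accommodates": [4, 2, 6],
    "price": [12000.0, 5000.0, 18000.0],
})

# Convert the t/f flag into 1/0 so it is numeric
listings["host_is_superhost"] = (listings["host_is_superhost"] == "t").astype(int)

# One-hot encode the remaining categorical column.
# drop_first=True drops one level per feature to avoid perfectly collinear dummies.
features = pd.get_dummies(
    listings.drop(columns="price"),
    columns=["room_type"],
    drop_first=True,
    dtype=int,
)
target = listings["price"]

print(features)
```

After this step every column in `features` is numeric, so it can be passed to a regression model directly.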

#### - Findings

#### - The Evidence 


#### - Call to Action

<div class="alert alert-block alert-info">

- An executive summary:
- What is your goal?
- Where did you get your data?
- What are your metrics?
- What were your findings?
- What risks/limitations/assumptions affect these findings?

</div>

---
## Content
- [Initial Exploratory Data Analysis and Cleaning](./codes/01_Data_Cleaning_and_EDA.ipynb)
- [Full Exploratory Data Analysis](./codes/02_Full_EDA.ipynb)
- [Feature Engineering and Model Benchmarks](./codes/03_Feature_Engineering_and_Model_Benchmarks.ipynb)
- [Model Tuning](./codes/04_Model_Tuning.ipynb)
- [Production Model](./codes/05_Production_Model.ipynb)

---
### Risks & Assumptions

- The data has not been tampered with and is an accurate, non-commercial derivation that allows public analysis, discussion and community benefit.
- Some reviews may be "spam" allowed by Airbnb. Analysis suggests that the number of spam reviews is small and does not affect the statistics.
- The Airbnb calendar for a listing does not differentiate between a booked night and an unavailable night, therefore these bookings have been counted as "unavailable". This serves to understate the Availability metric because popular listings will be "booked" rather than being "blacked out" by a host.
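
To make the availability caveat concrete, the sketch below computes a per-listing availability rate from calendar rows. The sample data is hypothetical, and it assumes the `available` column holds `t`/`f` strings.

```python
import pandas as pd

# Hypothetical calendar rows for two listings over four nights
calendar = pd.DataFrame({
    "listing_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "date": pd.to_datetime(
        ["2020-05-01", "2020-05-02", "2020-05-03", "2020-05-04"] * 2
    ),
    "available": ["t", "f", "f", "t", "f", "f", "f", "f"],  # assumed t/f flags
})

# Share of nights marked available, per listing
is_available = calendar["available"] == "t"
availability = is_available.groupby(calendar["listing_id"]).mean()

print(availability)
```

Listing 2 has an availability rate of zero here, but the calendar alone cannot say whether it is fully booked (very popular) or blocked out by the host, which is exactly the limitation noted above.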


---
### Data Source

Data for Airbnb listings in Tokyo, Kantō, Japan, can be found on the Inside Airbnb site, [here](http://insideairbnb.com/get-the-data.html). It is sourced from publicly available information on the Airbnb site.

Within the country dataset 

**Target variable:** Price

**And some of the features are:** 

- `calendar`: Provides booking details for the next year. It contains 7 attributes, including:
    - `date` (datetime)
    - `available` (categorical)
    - `price` (numeric)
    - `adjusted_price` (numeric)
    - `minimum_nights` (numeric)
    - `maximum_nights` (numeric)
    
    
- `listing`: Detailed Tokyo listings data containing 106 attributes for each listing. Some of those used in the analysis are:
    - `price` (numeric)
    - `property_type` (categorical)
    - `room_type` (categorical)
    - `neighbourhood` (categorical)
    - `amenities` (text)
    - `review_scores` (categorical)
    - `accommodates` (categorical)
    - `host_is_superhost` (boolean)



- `reviews`: Detailed reviews given by guests, containing 6 attributes, including:
    - `listing_id` (numeric)
    - `date` (datetime)
    - `comment` (text)


- `neighbourhood`: Neighbourhood list for geo filter. Sourced from city or open source GIS files.
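
Although `price` is treated as numeric above, raw exports may store it as text. The helper below is a hedged sketch that assumes prices look like `$12,345.00` (currency symbol plus thousands separators); `parse_price` is a hypothetical name, and the format should be checked against the actual CSV before use.

```python
import re


def parse_price(raw):
    """Convert a price string such as "$12,345.00" to a float.

    Returns None for missing or unparseable values.
    Assumes prices are non-negative and use "." as the decimal separator.
    """
    if raw is None:
        return None
    # Keep only digits and the decimal point
    cleaned = re.sub(r"[^\d.]", "", str(raw))
    try:
        return float(cleaned)
    except ValueError:
        return None


# Hypothetical raw values, for illustration only
for raw in ["$12,345.00", "8000", "", None]:
    print(repr(raw), "->", parse_price(raw))
```

Returning `None` instead of raising keeps malformed rows visible, so they can be counted and handled explicitly during data cleaning.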


---
## Conclusion and Recommendation 

<div class="alert alert-block alert-info">

Summarize your statistical analysis, including:
- implementation
- evaluation
- inference

</div>