<h1 align="center"> What Makes an Airbnb Superhost? </h1>
<h2 align="center"> An Exploratory Analysis of Airbnb Reviews</h2>
<h3 align="center"> By: John Easter, Josh Elam, Weitao Fu, Emily Moreland, Ryan Wainz </h3>
<h3 align="center"> Last Update: 04/10/2020 </h3>

# 1. Introduction

Airbnb, Inc. is an online marketplace for arranging or offering lodging, primarily homestays, or tourism experiences. The company does not own any of the real estate listings, nor does it host events; it acts as a broker, receiving commissions from each booking. The company is based in San Francisco, California, United States. 

Since its founding in 2008, Airbnb has become one of the top travel sites for people who want to immerse themselves in their destination city.

On the Airbnb platform, users can choose their lodging based on price, room type, number of bedrooms, and, most importantly, reviews left by past guests. Hosts can also earn the “superhost” distinction, which signals that they provide consistently good hospitality and go above and beyond for the guests staying in their home.

With this project, we aim to help customers identify which property best fits their specific needs so they can have the best experience possible.

Through this project, we want to answer the following questions:
* What makes someone a good host and earns them the “superhost” title?
    * The superhost title tells potential renters that they are renting from an exceptional and experienced host.

* Which attributes lead to good or bad reviews for a listing?
    * Candidates include cleanliness, location relative to popular tourist sites, amenities (Netflix, Hulu, cable), price, house rules, and host interactions with guests.

* How can we use a predictive model to estimate the quality of a listing?
    * By building a model tailored to a traveler's specific needs for their trip.

* Which listings should we recommend to customers with specific needs related to religion, pets, parking, disabilities, or children?
    * While pet-friendliness and similar features have become mainstream amenities on Airbnb, many hosts still do not account for religious beliefs or disabilities in their rental properties.

# 2. Data

We found our data set at the following link: https://www.kaggle.com/airbnb/seattle#listings.csv

The fields most relevant to what makes a "superhost" are:
* Host response rate
* Host response time
* Experiences offered
* Amenities
* Price
* Number of reviews
* Review scores rating
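
As a rough sketch of how these fields might be compared between superhosts and other hosts, the snippet below averages a few numeric fields by superhost status. The column names (`host_is_superhost`, `host_response_rate`, `number_of_reviews`, `review_scores_rating`) are assumed to match `listings.csv`, and the four rows are made up purely for illustration.

```python
import pandas as pd

# Toy rows standing in for listings.csv; column names are assumed and values are made up.
toy_listings = pd.DataFrame({
    "host_is_superhost": ["t", "f", "t", "f"],
    "host_response_rate": ["100%", "80%", "90%", "60%"],
    "number_of_reviews": [120, 10, 80, 30],
    "review_scores_rating": [98.0, 90.0, 96.0, 88.0],
})

# Convert percentage strings such as "100%" to plain numbers.
toy_listings["host_response_rate"] = (
    toy_listings["host_response_rate"].str.rstrip("%").astype(float)
)

# Average each numeric field separately for superhosts ("t") and other hosts ("f").
superhost_summary = toy_listings.groupby("host_is_superhost").mean()
print(superhost_summary)
```

A summary like this only describes differences between the two groups; it does not by itself show which fields determine superhost status.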

This data was directly scraped from the Airbnb website here: https://www.airbnb.com/s/Seattle--WA--United-States/all 

# 3. Data Manipulation

In [85]:
import pandas as pd
import numpy as np
cal = pd.read_csv("calendar.csv")
cal.head()

,listing_id,date,available,price
0,241032,2016-01-04,t,$85.00
1,241032,2016-01-05,t,$85.00
2,241032,2016-01-06,f,
3,241032,2016-01-07,f,
4,241032,2016-01-08,f,


In [86]:
cal.info  # missing parentheses: this displays the bound method instead of calling cal.info()

<bound method DataFrame.info of          listing_id        date available   price
0            241032  2016-01-04         t  $85.00
1            241032  2016-01-05         t  $85.00
2            241032  2016-01-06         f     NaN
3            241032  2016-01-07         f     NaN
4            241032  2016-01-08         f     NaN
...             ...         ...       ...     ...
1393565    10208623  2016-12-29         f     NaN
1393566    10208623  2016-12-30         f     NaN
1393567    10208623  2016-12-31         f     NaN
1393568    10208623  2017-01-01         f     NaN
1393569    10208623  2017-01-02         f     NaN

[1393570 rows x 4 columns]>

listing_id: ID of each listing.<br>
date: A date from 2016-01-04 to 2017-01-02.<br>
available: Whether the listing is available on that date (t or f).<br>
price: Rental price of the listing on that date; blank in the previewed rows where available is f.<br>
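
The `date` and `available` columns are read in as plain strings. A minimal sketch of converting them to more convenient types follows, using three rows copied from the preview above. The frame name `sample` is hypothetical, and the notebook itself does not perform this conversion.

```python
import pandas as pd

# A few rows in the same shape as calendar.csv (values taken from the preview above).
sample = pd.DataFrame({
    "listing_id": [241032, 241032, 241032],
    "date": ["2016-01-04", "2016-01-05", "2016-01-06"],
    "available": ["t", "t", "f"],
    "price": ["$85.00", "$85.00", None],
})

# Parse the ISO date strings into datetime64 values.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

# Map the 't'/'f' flags to booleans.
sample["available"] = sample["available"].map({"t": True, "f": False})

print(sample.dtypes)
```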

In [87]:
pd.crosstab(index=cal['listing_id'],columns="count")

listing_id,count
3335,365
4291,365
5682,365
6606,365
7369,365
...,...
10332096,365
10334184,365
10339144,365
10339145,365


The calendar dataset stores the daily rental price of 3,818 listings over one year, with 365 rows per listing.
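
The listing count is consistent with the row count reported earlier: 1,393,570 rows / 365 days = 3,818 listings. The same check could be made directly with `nunique()` and `value_counts()`, sketched here on a toy frame (`toy_cal` is a hypothetical stand-in for `cal`).

```python
import pandas as pd

# Toy calendar: listing 1 has three daily rows, listing 2 has two.
toy_cal = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 2],
    "date": ["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-04", "2016-01-05"],
})

# Number of distinct listings.
n_listings = toy_cal["listing_id"].nunique()

# Number of rows per listing, ordered by listing id (the same counts a crosstab would give).
rows_per_listing = toy_cal["listing_id"].value_counts().sort_index()

print(n_listings)
print(rows_per_listing)
```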

In [88]:
pd.crosstab(cal['listing_id'], cal['available'], margins=False)

listing_id,available=f,available=t
3335,56,309
4291,0,365
5682,56,309
6606,0,365
7369,312,53
...,...,...
10332096,0,365
10334184,4,361
10339144,365,0
10339145,0,365
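
The raw counts above can also be expressed as an availability rate per listing. The sketch below assumes the same `t`/`f` encoding and uses a made-up toy frame rather than the real calendar.

```python
import pandas as pd

# Toy calendar with four dates per listing (values are illustrative only).
toy_cal = pd.DataFrame({
    "listing_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "available": ["t", "t", "t", "f", "f", "f", "t", "f"],
})

# True where the listing was available on that date.
is_available = toy_cal["available"].eq("t")

# The mean of a boolean Series is the share of True values,
# so this gives the fraction of dates each listing was available.
availability_rate = is_available.groupby(toy_cal["listing_id"]).mean()
print(availability_rate)
```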


In [89]:
# Drop rows with missing values (here, dates with no price because the listing is unavailable)
cal = cal.dropna()

In [90]:
pd.crosstab(index=cal['listing_id'],columns="count")

listing_id,count
3335,309
4291,365
5682,309
6606,365
7369,53
...,...
10331249,354
10332096,365
10334184,361
10339145,365


In [91]:
# Enable inline plotting in notebook
%matplotlib inline

In [92]:
import matplotlib.pyplot as plt

In [93]:
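# Preview the prices without the dollar sign (the result is not assigned back);
# regex=False treats '$' as a literal character rather than a regex anchor
cal['price'].str.replace('$', '', regex=False)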

0          85.00
1          85.00
9          85.00
10         85.00
14         85.00
           ...  
1393207    87.00
1393208    87.00
1393211    87.00
1393212    87.00
1393213    87.00
Name: price, Length: 934542, dtype: object

In [94]:
cal

,listing_id,date,available,price
0,241032,2016-01-04,t,$85.00
1,241032,2016-01-05,t,$85.00
9,241032,2016-01-13,t,$85.00
10,241032,2016-01-14,t,$85.00
14,241032,2016-01-18,t,$85.00
...,...,...,...,...
1393207,10208623,2016-01-06,t,$87.00
1393208,10208623,2016-01-07,t,$87.00
1393211,10208623,2016-01-10,t,$87.00
1393212,10208623,2016-01-11,t,$87.00


In [95]:
cal['price'] = cal['price'].astype(str) 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [96]:
cal

,listing_id,date,available,price
0,241032,2016-01-04,t,$85.00
1,241032,2016-01-05,t,$85.00
9,241032,2016-01-13,t,$85.00
10,241032,2016-01-14,t,$85.00
14,241032,2016-01-18,t,$85.00
...,...,...,...,...
1393207,10208623,2016-01-06,t,$87.00
1393208,10208623,2016-01-07,t,$87.00
1393211,10208623,2016-01-10,t,$87.00
1393212,10208623,2016-01-11,t,$87.00
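
The `SettingWithCopyWarning` text shown above appears when pandas cannot tell whether an assignment targets an independent frame or a slice of another one. One common way to avoid it, assuming the warning here comes from assigning into the frame returned by `cal.dropna()`, is to take an explicit copy before modifying columns. The names `raw` and `clean` are hypothetical.

```python
import pandas as pd

# Small stand-in for the calendar data, with one missing price.
raw = pd.DataFrame({
    "listing_id": [241032, 241032, 241032],
    "price": ["$85.00", None, "$85.00"],
})

# .copy() makes the filtered frame an independent object,
# so later column assignments are unambiguous.
clean = raw.dropna().copy()
clean["price"] = clean["price"].astype(str)

print(clean)
```

The original frame `raw` is left untouched, including its missing value.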


In [97]:
# Strip the dollar sign; regex=False treats '$' as a literal character
cal['price'] = cal['price'].str.replace('$', '', regex=False)
cal


,listing_id,date,available,price
0,241032,2016-01-04,t,85.00
1,241032,2016-01-05,t,85.00
9,241032,2016-01-13,t,85.00
10,241032,2016-01-14,t,85.00
14,241032,2016-01-18,t,85.00
...,...,...,...,...
1393207,10208623,2016-01-06,t,87.00
1393208,10208623,2016-01-07,t,87.00
1393211,10208623,2016-01-10,t,87.00
1393212,10208623,2016-01-11,t,87.00


In [98]:
# Remove any thousands separators (commas)
cal['price'] = cal['price'].str.replace(',', '', regex=False)


In [99]:
cal

,listing_id,date,available,price
0,241032,2016-01-04,t,85.00
1,241032,2016-01-05,t,85.00
9,241032,2016-01-13,t,85.00
10,241032,2016-01-14,t,85.00
14,241032,2016-01-18,t,85.00
...,...,...,...,...
1393207,10208623,2016-01-06,t,87.00
1393208,10208623,2016-01-07,t,87.00
1393211,10208623,2016-01-10,t,87.00
1393212,10208623,2016-01-11,t,87.00


In [100]:
cal['price'] = cal['price'].astype(float) 


In [101]:
cal

,listing_id,date,available,price
0,241032,2016-01-04,t,85.0
1,241032,2016-01-05,t,85.0
9,241032,2016-01-13,t,85.0
10,241032,2016-01-14,t,85.0
14,241032,2016-01-18,t,85.0
...,...,...,...,...
1393207,10208623,2016-01-06,t,87.0
1393208,10208623,2016-01-07,t,87.0
1393211,10208623,2016-01-10,t,87.0
1393212,10208623,2016-01-11,t,87.0
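
The price cleanup above takes several cells. As a possible alternative, the sketch below does the same conversion in one pass on a small made-up Series: a single regular expression strips both `$` and `,`, and `pd.to_numeric` converts the result. The names `prices` and `clean_prices` are hypothetical.

```python
import pandas as pd

# Example price strings in the same format as the calendar data, including a missing value.
prices = pd.Series(["$85.00", "$1,250.00", None, "$87.00"])

# Inside a character class, '$' is a literal, so [$,] matches either symbol.
# Missing values pass through and stay NaN after conversion.
clean_prices = pd.to_numeric(prices.str.replace(r"[$,]", "", regex=True))
print(clean_prices)
```

Because missing values survive the conversion as NaN, this version would not require dropping unavailable dates first.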


In [103]:
cal.groupby('listing_id')['price'].mean()

listing_id
3335        120.000000
4291         82.000000
5682         53.944984
6606         92.849315
7369         85.000000
               ...    
10331249     45.000000
10332096     40.000000
10334184    120.000000
10339145    237.904110
10340165     43.000000
Name: price, Length: 3723, dtype: float64

The cell above computes each listing's average daily rental price over the dates on which it was available.
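
The result above has 3,723 entries, fewer than the 3,818 listings in the raw calendar. This is presumably because listings with no priced dates lose all of their rows in `dropna()`. The sketch below reproduces the effect on toy data and shows one way to keep such listings in the result with a NaN average. All names and values are illustrative.

```python
import pandas as pd

# Toy calendar: listing 3 has no priced dates at all.
toy_cal = pd.DataFrame({
    "listing_id": [1, 1, 2, 2, 3, 3],
    "price": [100.0, 120.0, 50.0, None, None, None],
})

all_ids = toy_cal["listing_id"].unique()

# Dropping missing prices first removes listing 3 from the result entirely.
mean_after_dropna = toy_cal.dropna().groupby("listing_id")["price"].mean()

# Reindexing on the full id list restores listing 3 with a NaN average.
mean_all_listings = mean_after_dropna.reindex(all_ids)
print(mean_all_listings)
```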

In [105]:
listings = pd.read_csv("listings.csv")
listings.head()

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 18-19: invalid continuation byte
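
The error above means the file contains bytes that are not valid UTF-8. One possible workaround is to retry with a different encoding, sketched below. The helper `read_csv_with_fallback` is hypothetical, and the actual encoding of `listings.csv` is not known from this notebook. Latin-1 is used as the fallback only because it can decode any byte sequence, which also means it may silently produce wrong characters.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Try each encoding in turn and return the first DataFrame that decodes."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as err:
            last_error = err
    raise last_error


# Demo on a small temporary file containing a non-UTF-8 byte (0xE9 is 'é' in Latin-1).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    with open(demo_path, "wb") as handle:
        handle.write(b"id,name\n1,Caf\xe9 loft\n")
    demo = read_csv_with_fallback(demo_path)

print(demo)
```

With the real file, the call would look like `listings = read_csv_with_fallback("listings.csv")`. Text columns should be spot-checked afterwards for garbled characters.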