# Analysis of the Airbnb Market in Toronto
Ryan Blackadar, Juno College of Technology
November, 2021

Data Source: http://insideairbnb.com/get-the-data.html

## Data Question: What does a prospective host need to know? What indicators should a host look for before listing their unit?

In [1]:
#Import the CSV using Pandas library with encoding specified for special characters used in the raw data

import pandas as pd
import numpy as np
import statsmodels.api as sm

listings = pd.read_csv("listings_cleaned.csv", encoding='utf-8')

## Exploratory Analysis

### What does a given row in this dataset look like?

In [2]:
listings.head(1)

| | id | listing_url | name | description | neighborhood_overview | host_id | host_url | host_name | host_since | host_location | ... | availability_365 | number_of_reviews | review_scores_rating | license | instant_bookable | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1419 | https://www.airbnb.com/rooms/1419 | Beautiful home in amazing area! | This large, family home is located in one of T... | The apartment is located in the Ossington stri... | 1565 | https://www.airbnb.com/users/show/1565 | Alexandra | 2008-08-08 | Vancouver, British Columbia, Canada | ... | 0 | 7 | 5.0 | | f | 1 | 1 | 0 | 0 | 0.09 |


### Below is a list of the columns in the data set with the value from the first row on the right:

In [3]:
listings.iloc[0,:]

id                                                                                           1419
listing_url                                                     https://www.airbnb.com/rooms/1419
name                                                              Beautiful home in amazing area!
description                                     This large, family home is located in one of T...
neighborhood_overview                           The apartment is located in the Ossington stri...
host_id                                                                                      1565
host_url                                                   https://www.airbnb.com/users/show/1565
host_name                                                                               Alexandra
host_since                                                                             2008-08-08
host_location                                                 Vancouver, British Columbia, Canada
host_about          

In [4]:
print("In total, there are " + str(len(listings['host_id'])) + " listings and " + str(len(listings['host_id'].unique())) + " unique host IDs")

In total, there are 15155 listings and 9858 unique host IDs


### Data Cleaning Processes

In [5]:
#map the t or f values to a zero or one and replace the columns with the new values

listings['host_is_superhost'] = listings['host_is_superhost'].map({'t':1,'f':0})
listings['host_identity_verified'] = listings['host_identity_verified'].map({'t':1,'f':0})
listings['instant_bookable'] = listings['instant_bookable'].map({'t':1,'f':0})
listings['has_availability'] = listings['has_availability'].map({'t':1,'f':0})
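
The four mappings above follow one pattern, so they could also be written as a loop. The sketch below uses a small made-up frame (the `sample` data and the `flag_columns` list are hypothetical, not taken from the dataset). It also shows that any value other than 't' or 'f', including a missing one, becomes NaN under `.map`, which would explain why a mapped column can display as 1.0/0.0 rather than 1/0.

```python
import pandas as pd

# Hypothetical sample standing in for the real listings data
sample = pd.DataFrame({
    "host_is_superhost": ["t", "f", None],
    "instant_bookable": ["f", "t", "t"],
})

flag_columns = ["host_is_superhost", "instant_bookable"]
for col in flag_columns:
    # Values outside the mapping (including missing ones) become NaN
    sample[col] = sample[col].map({"t": 1, "f": 0})

print(sample)
```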

In [6]:
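#One-hot encoding to convert the 4 room types into separate 0/1 indicator columns
#dtype=int keeps the indicators numeric (newer pandas versions return booleans by default)

roomtype_encoded = pd.get_dummies(listings['room_type'], dtype=int)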
listings = pd.concat([listings, roomtype_encoded], axis=1)
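
As a small illustration of what one-hot encoding produces, the sketch below encodes a made-up `room_type` column (the `sample` frame is hypothetical and contains only two of the four room types). Each row gets exactly one indicator set to 1.

```python
import pandas as pd

# Hypothetical sample; the real column has 4 room types
sample = pd.DataFrame(
    {"room_type": ["Entire home/apt", "Private room", "Entire home/apt"]}
)

# dtype=int requests 0/1 integers rather than booleans
encoded = pd.get_dummies(sample["room_type"], dtype=int)
sample = pd.concat([sample, encoded], axis=1)

print(sample)
```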

### Calculated Column to Indicate Price per Person

In [7]:
#Price alone is not entirely indicative - many listings have several beds to accommodate families or groups.
#Note: the accommodates column factors in the size of beds, as well as couches, sofa-beds and cots
#By determining a price per person, we factor in overall accommodation:

listings['price_per_person'] = listings.price / listings.accommodates
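
One thing worth checking with a ratio like this is division by zero. Whether the real `accommodates` column contains zeros is not confirmed here, so the following is only a defensive sketch on made-up data: zeros are treated as missing so the result is NaN rather than infinity.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the last row has accommodates = 0
sample = pd.DataFrame({"price": [100.0, 240.0, 90.0], "accommodates": [2, 4, 0]})

# Treat 0 guests as missing so the division yields NaN instead of inf
guests = sample["accommodates"].replace(0, np.nan)
sample["price_per_person"] = sample["price"] / guests

print(sample)
```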

In [8]:
#What are the units with the highest price per person?

highest_ppp = listings[['accommodates', 'price', 'price_per_person', 'name', 'listing_url']]
highest_ppp_sorted = highest_ppp.sort_values(by=['price_per_person'], ascending=False)
highest_ppp_sorted.head(30)

| | accommodates | price | price_per_person | name | listing_url |
|---|---|---|---|---|---|
| 8241 | 1 | 10000 | 10000.0 | DuPont Toronto ”.. | https://www.airbnb.com/rooms/34280869 |
| 7667 | 1 | 8614 | 8614.0 | Toronto Downtown Spadina Master bed room''... | https://www.airbnb.com/rooms/32162226 |
| 4565 | 2 | 13000 | 6500.0 | Modern Upscale Condo in downtown Toronto | https://www.airbnb.com/rooms/20653172 |
| 8389 | 2 | 12400 | 6200.0 | Lively Little Italy, Main Floor! | https://www.airbnb.com/rooms/34839883 |
| 1904 | 2 | 11800 | 5900.0 | Sunny Central House with Style in Toronto DT | https://www.airbnb.com/rooms/9570770 |
| 6713 | 2 | 8617 | 4308.5 | Central Luxury Large bedroom with large terrace | https://www.airbnb.com/rooms/28443226 |
| 7625 | 1 | 4234 | 4234.0 | Toronto Downtown Spadina Subway single bed roo... | https://www.airbnb.com/rooms/31983642 |
| 7672 | 2 | 7500 | 3750.0 | Cosy Room Toronto Downtown spadina Subway "'.. | https://www.airbnb.com/rooms/32170366 |
| 6733 | 2 | 7105 | 3552.5 | Downtown Rosedale Subway cozy private room | https://www.airbnb.com/rooms/28540945 |
| 8958 | 2 | 7103 | 3551.5 | Nolinsiki Annex Cozy room ‘’.. | https://www.airbnb.com/rooms/36689222 |


The results above point to an issue with the dataset: some hosts price their units far beyond market rates (the highest being 10,000 per night for 1 person). The rationale is unclear, but there is reason to believe that some Airbnb prices are not accurate (for reference, 10,000 per night is comparable to the price of the Presidential Suite at the Four Seasons).

In [9]:
#By keeping only listings priced under $1200 per person per night, we remove some outlier prices for this analysis.
#26 listings are removed from the dataset in this process.

listings_filtered = listings[listings.price_per_person < 1200]
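
To verify how many rows a filter like this drops, the row counts can be compared before and after. The sketch below uses made-up values (the `sample` frame is hypothetical). Note that a `<` comparison also drops rows where the price per person is missing, so those are counted separately.

```python
import pandas as pd

# Hypothetical sample including an exact-threshold value and a missing value
sample = pd.DataFrame({"price_per_person": [50.0, 1200.0, 10000.0, None, 75.0]})

threshold = 1200  # same cutoff as the cell above
kept = sample[sample["price_per_person"] < threshold]

removed = len(sample) - len(kept)
missing = int(sample["price_per_person"].isna().sum())
print(f"Removed {removed} rows, of which {missing} had a missing price per person")
```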

### Who are the Top 20 hosts on Airbnb by the number of unit listings?

In [10]:
host_unitcounts = listings_filtered.groupby(['host_name', 'host_id']).size().sort_values(ascending=False).to_frame()
host_unitcounts = host_unitcounts.reset_index(drop=False)

In [11]:
host_unitcounts.columns = ['host_name', 'host_id', 'unit_count']
host_unitcounts.head(20)

| | host_name | host_id | unit_count |
|---|---|---|---|
| 0 | Sky View | 785826 | 116 |
| 1 | Simply Comfort | 10202618 | 67 |
| 2 | Ayk | 135718015 | 49 |
| 3 | Julie | 846505 | 40 |
| 4 | Gevorg | 327456656 | 36 |
| 5 | Alec And Lily | 269243315 | 32 |
| 6 | Sarah | 54422135 | 32 |
| 7 | Toronto Heritage Residences | 26743967 | 30 |
| 8 | Davar | 342316738 | 29 |
| 9 | Sonder (Toronto) | 301014754 | 28 |


The host with the highest count of listings is 'Sky View', with a staggering 116 units listed on Airbnb in Toronto. This host is explored further below:

## Cont'd Analysis: Who is 'Sky View'?

In [12]:
#New data frame filtered for Sky View's host ID:

host785826 = listings_filtered[listings_filtered['host_id'] == 785826][['host_id', 'host_url', 'host_name', 'host_since','host_about','host_response_rate', 'host_acceptance_rate', 'host_is_superhost', 'host_neighbourhood', 'host_identity_verified', 'neighbourhood_cleansed', 'price', 'number_of_reviews', 'review_scores_rating', 'license', 'instant_bookable', 'calculated_host_listings_count_entire_homes']]

Description pulled from URL:
"Sky View Suites is proud to call Toronto our home. We invite you to come and experience all the city has to offer while enjoying our elegant rental properties. Based in Toronto."

In [13]:
#Sky View's unit counts, grouped by neighbourhood

host785826.groupby('neighbourhood_cleansed').size().sort_values(ascending=False)

neighbourhood_cleansed
Waterfront Communities-The Island    87
Bay Street Corridor                  24
Church-Yonge Corridor                 3
Annex                                 1
Niagara                               1
dtype: int64

It turns out 75% of this user's listings are concentrated in the popular "Waterfront Communities-The Island" neighbourhood.
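
The 75% figure follows from the counts above (87 of 116 listings). As a quick check, the shares for every neighbourhood can be derived from the same numbers. The series below is typed in by hand from the output rather than recomputed from the dataset.

```python
import pandas as pd

# Counts copied from the output above
counts = pd.Series({
    "Waterfront Communities-The Island": 87,
    "Bay Street Corridor": 24,
    "Church-Yonge Corridor": 3,
    "Annex": 1,
    "Niagara": 1,
})

# Percentage of Sky View's listings in each neighbourhood
shares = (counts / counts.sum() * 100).round(1)
print(shares)
```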

In [14]:
#Additional Stats:

host785826.iloc[0,:]

host_id                                                                                   785826
host_url                                                https://www.airbnb.com/users/show/785826
host_name                                                                               Sky View
host_since                                                                            2011-07-06
host_about                                     Sky View Suites is proud to call Toronto our h...
host_response_rate                                                                           92%
host_acceptance_rate                                                                         86%
host_is_superhost                                                                            0.0
host_neighbourhood                                                               Garden District
host_identity_verified                                                                       1.0
neighbourhood_cleansed        

- Account created in 2011
- Host Response Rate 92% (how likely they are to respond to messages)
- Host Acceptance Rate 86% (how likely they are to approve a booking request)
- Superhost? No
- Identity Verified? Yes
- License? No
- 100% of their listings are for "entire home / apt" (116/116)

In [15]:
print("Sky View has " + str(host785826['number_of_reviews'].sum()) + " reviews on their units at an average rating of " + str(round(host785826['review_scores_rating'].mean(), 2)) + " and an average rental price of " + str(round(host785826['price'].mean())) + " per night.")

Sky View has 317 reviews on their units at an average rating of 4.79 and an average rental price of 116 per night.


### How many hosts have only 1 unit listed? How many users have multiple units?
Of the 9858 unique hosts, 7850 (about 80%) have exactly 1 unit listed on Airbnb and 8903 (about 90%) have 1 or 2, with the remaining 10% having 3 or more.

In [16]:
host_unitcounts.groupby(['unit_count']).size().sort_values(ascending=False).head(10)

unit_count
1     7850
2     1053
3      419
4      203
5       89
6       45
7       43
10      25
8       19
11      19
dtype: int64
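
Percentages like these can be read off a cumulative distribution of units per host. The sketch below shows the idea on made-up counts (the `unit_count` values are hypothetical, not the real host data).

```python
import pandas as pd

# Hypothetical number of units for each of 10 hosts
unit_count = pd.Series([1, 1, 1, 2, 1, 3, 1, 2, 1, 5])

# Number of hosts at each unit count, ordered from 1 unit upward
distribution = unit_count.value_counts().sort_index()

# Share of hosts with at most that many units, in percent
cumulative_share = (distribution.cumsum() / distribution.sum() * 100).round(1)
print(cumulative_share)
```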

### What does it take to become a Superhost?

*See Appendix for sources

Superhost status is evaluated by Airbnb every quarter and qualifying hosts earn an automatic badge applied to their listing(s). 
- Criteria:
  - 10+ trips, or 3+ reservations that total at least 100 nights
  - 90% response rate or higher
  - 1% (or lower) cancellation rate
  - Minimum 4.8 overall rating
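
The criteria above can be expressed as a simple check. This is only a sketch of the listed thresholds, not Airbnb's actual evaluation logic. The function name and parameters are hypothetical, rates are assumed to be fractions (0.92 for 92%), and the first criterion is read as "10 or more trips, or at least 3 reservations totalling 100 or more nights".

```python
def meets_superhost_criteria(trips, reservations, nights,
                             response_rate, cancellation_rate, rating):
    """Return True if a host meets all four thresholds listed above."""
    # Activity: 10+ trips, or 3+ reservations totalling at least 100 nights
    enough_activity = trips >= 10 or (reservations >= 3 and nights >= 100)
    return (
        enough_activity
        and response_rate >= 0.90
        and cancellation_rate <= 0.01
        and rating >= 4.8
    )


# Hypothetical hosts
print(meets_superhost_criteria(12, 12, 40, 0.95, 0.00, 4.9))   # all thresholds met
print(meets_superhost_criteria(12, 12, 40, 0.95, 0.00, 4.79))  # rating just short
```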

### What are the top 10 neighbourhoods by count of listings?

In [18]:
listings_filtered.groupby('neighbourhood_cleansed').size().sort_values(ascending=False).head(10)

neighbourhood_cleansed
Waterfront Communities-The Island      2668
Niagara                                 608
Church-Yonge Corridor                   486
Annex                                   478
Bay Street Corridor                     453
Trinity-Bellwoods                       408
Dovercourt-Wallace Emerson-Junction     378
Kensington-Chinatown                    371
Moss Park                               351
Willowdale East                         341
dtype: int64

'Waterfront Communities-The Island' is by far the most saturated neighbourhood in terms of the number of units. It has more than 4x as many units as the second-ranked neighbourhood, Niagara (2668 vs. 608).
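
The "more than 4x" comparison comes from dividing the top count by the runner-up. A minimal check, with the three largest counts typed in from the output above:

```python
import pandas as pd

# Top three counts copied from the output above
counts = pd.Series({
    "Waterfront Communities-The Island": 2668,
    "Niagara": 608,
    "Church-Yonge Corridor": 486,
})

ranked = counts.sort_values(ascending=False)
ratio = ranked.iloc[0] / ranked.iloc[1]
print(round(ratio, 2))  # 4.39
```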

### How does the average nightly price-per-person vary by neighbourhood?

In [19]:
neighbourhood_pricing = listings_filtered.groupby('neighbourhood_cleansed')['price_per_person'].mean().round().sort_values(ascending=False).to_frame()
neighbourhood_pricing = neighbourhood_pricing.reset_index(drop=False)
neighbourhood_pricing.head(10)

| | neighbourhood_cleansed | price_per_person |
|---|---|---|
| 0 | Brookhaven-Amesbury | 79.0 |
| 1 | Etobicoke West Mall | 75.0 |
| 2 | Casa Loma | 71.0 |
| 3 | Forest Hill South | 69.0 |
| 4 | Palmerston-Little Italy | 67.0 |
| 5 | Rustic | 66.0 |
| 6 | Henry Farm | 65.0 |
| 7 | Rosedale-Moore Park | 65.0 |
| 8 | Waterfront Communities-The Island | 64.0 |
| 9 | Church-Yonge Corridor | 63.0 |
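
A neighbourhood mean can be driven by a handful of listings, so it may help to report the listing count and median alongside it. How many listings each neighbourhood above actually has is not shown in this notebook, so the sketch below uses made-up data (the `sample` frame and the output column names are hypothetical).

```python
import pandas as pd

# Hypothetical sample: neighbourhood B has a single expensive listing
sample = pd.DataFrame({
    "neighbourhood_cleansed": ["A", "A", "A", "B"],
    "price_per_person": [60.0, 70.0, 80.0, 150.0],
})

# Named aggregation: mean, median and number of listings per neighbourhood
summary = (
    sample.groupby("neighbourhood_cleansed")["price_per_person"]
    .agg(mean_ppp="mean", median_ppp="median", listings="count")
    .sort_values("mean_ppp", ascending=False)
)
print(summary)
```

In this toy example B ranks first on mean price, but the count column makes it clear that the ranking rests on one listing.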


### Are there indicators that have a strong relationship with the guest ratings?

In [20]:
candidate_columns = ['review_scores_rating', 'host_is_superhost', 'host_identity_verified', 'instant_bookable', 'has_availability', 'price_per_person', 'accommodates', 'Entire home/apt', 'Hotel room', 'Private room', 'Shared room']

In [21]:
analysis = listings_filtered[candidate_columns].dropna()

In [22]:
#Creating variables (independent/dependent)
dependent_vars = analysis[candidate_columns[0]]
independent_vars = analysis[candidate_columns[1:]]

lin_reg = sm.OLS(dependent_vars, independent_vars)
reg_results = lin_reg.fit()
print(reg_results.summary())

                             OLS Regression Results                             
Dep. Variable:     review_scores_rating   R-squared:                       0.069
Model:                              OLS   Adj. R-squared:                  0.068
Method:                   Least Squares   F-statistic:                     95.43
Date:                  Tue, 30 Nov 2021   Prob (F-statistic):          1.24e-172
Time:                          11:22:21   Log-Likelihood:                -12308.
No. Observations:                 11660   AIC:                         2.464e+04
Df Residuals:                     11650   BIC:                         2.471e+04
Df Model:                             9                                         
Covariance Type:              nonrobust                                         
                             coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------
host_is_

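The regression does not signal a strong linear relationship with guests' ratings. The F-statistic (95.43, p ≈ 1.24e-172) indicates that the predictors are jointly significant, which is expected with 11,660 observations, but the R-squared of 0.069 means the model explains only about 7% of the variance in ratings. Our takeaway is that ratings are complex, subjective and hard to predict from these listing attributes alone.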
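
To make the R-squared figure more concrete, the sketch below fits ordinary least squares with NumPy on synthetic data (not the Airbnb data) and computes R-squared by hand as the share of variance explained. The slope of 0.25 and the noise level are arbitrary assumptions, chosen so that a real but weak relationship yields a low R-squared.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: a weak linear signal buried in noise
x = rng.normal(size=n)
y = 0.25 * x + rng.normal(size=n)

# Design matrix with an intercept column and one predictor
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit

# R-squared = 1 - (residual variance / total variance)
residuals = y - X @ beta
r_squared = 1 - residuals.var() / y.var()

print("slope:", round(beta[1], 3), "R-squared:", round(r_squared, 3))
```

The fitted slope lands close to the true value even though most of the variation in `y` remains unexplained, which is the same pattern as a significant F-statistic paired with a low R-squared.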

### What are the units with the highest rating and the most reviews? What observations can be made?

In [23]:
top_listings = listings_filtered[['review_scores_rating', 'number_of_reviews', 'id', 'host_id', 'host_is_superhost', 'host_identity_verified', 'instant_bookable', 'has_availability', 'calculated_host_listings_count', 'neighbourhood_cleansed', 'room_type', 'bathrooms_text', 'accommodates', 'beds', 'price_per_person']]

top_listings_sorted = top_listings.sort_values(['review_scores_rating', 'number_of_reviews'], ascending=False)
top_listings_sorted.head(15)

| | review_scores_rating | number_of_reviews | id | host_id | host_is_superhost | host_identity_verified | instant_bookable | has_availability | calculated_host_listings_count | neighbourhood_cleansed | room_type | bathrooms_text | accommodates | beds | price_per_person |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3519 | 5.0 | 165 | 16901265 | 108760072 | 1.0 | 1.0 | 1 | 1 | 2 | Willowridge-Martingrove-Richview | Private room | 1 private bath | 2 | 1.0 | 85.0 |
| 3064 | 5.0 | 147 | 14935968 | 93948352 | 0.0 | 1.0 | 1 | 1 | 1 | Trinity-Bellwoods | Entire home/apt | 1 bath | 4 | 1.0 | 75.25 |
| 3564 | 5.0 | 142 | 17093701 | 70945480 | 1.0 | 1.0 | 0 | 1 | 1 | Annex | Entire home/apt | 1 bath | 4 | 3.0 | 63.0 |
| 3297 | 5.0 | 118 | 15917764 | 103274818 | 1.0 | 1.0 | 0 | 1 | 1 | Palmerston-Little Italy | Entire home/apt | 1 bath | 3 | 1.0 | 59.0 |
| 3200 | 5.0 | 99 | 15465463 | 44191437 | 0.0 | 1.0 | 0 | 1 | 1 | Waterfront Communities-The Island | Entire home/apt | 1 bath | 2 | 2.0 | 83.0 |
| 5916 | 5.0 | 97 | 25192161 | 150609715 | 1.0 | 1.0 | 0 | 1 | 1 | Islington-City Centre West | Entire home/apt | 1 bath | 2 | 1.0 | 69.5 |
| 4183 | 5.0 | 95 | 19555561 | 137403170 | 1.0 | 1.0 | 0 | 1 | 1 | Dovercourt-Wallace Emerson-Junction | Entire home/apt | 1 bath | 2 | 2.0 | 52.5 |
| 6229 | 5.0 | 95 | 26746164 | 201103629 | 1.0 | 1.0 | 0 | 1 | 1 | Annex | Entire home/apt | 1 bath | 2 | 1.0 | 94.5 |
| 8091 | 5.0 | 95 | 33751207 | 125625405 | 1.0 | 1.0 | 1 | 1 | 1 | Pleasant View | Entire home/apt | 3.5 baths | 8 | 3.0 | 24.625 |
| 3552 | 5.0 | 92 | 17013748 | 8702985 | 1.0 | 1.0 | 0 | 1 | 10 | Niagara | Entire home/apt | 2.5 baths | 8 | 5.0 | 62.25 |


Among the 5-star rated listings with the most reviews, here are some takeaways:

- All are marked as having ongoing availability
- Almost all are 'Entire home/apt'
- Price per person tends to range between 60 and 80 and does not exceed 100
- All hosts are Identity Verified
- Almost all are Superhosts
- The majority of the hosts only have 1 property
- Being an Instantly Bookable property does not seem to impact the rating
- The neighbourhoods are a diverse range

## Conclusion: Recommendations For the Prospective Host

- Deliver quality and earn top ratings by focusing on 1 property
- Take into consideration the saturation of competition in Toronto's different neighbourhoods while factoring in the rates clients are willing to pay in those areas
- Ensure you are Identity Verified
- Strive toward Superhost status by becoming familiar with the criteria
  - Minimize cancellations
  - Prioritize responses
- Aim for continuity and consistency with availability

## Appendix (Additional Sources)

- https://www.airbnb.ca/help/article/828/what-is-a-superhost
- https://www.airbnb.ca/help/article/829/how-to-become-a-superhost
- https://www.airbnb.ca/resources/hosting-homes/a/secrets-from-a-seasoned-superhost-51