GitHub fails to render some of the graphs we created, so we recommend viewing our notebook with nbviewer.
As a team, we were tasked with finding a dataset of our choosing, cleaning it, performing exploratory data analysis, and finally building machine learning models. We chose an Airbnb dataset because of its varied and interesting features, its sizable number of observations, and the relevance of its subject matter. Each row of the "Listings" dataframe represents one Airbnb listing and contains information about both the property and its host.
Our primary objectives are:
- Analyzing neighborhood popularity for superhosts and non-superhosts based on traffic, room type (private room, shared room, or entire apartment), and price
- Comparing pre- and post-COVID listing prices
- Predicting host type (superhost vs. non-superhost) and identifying which attributes contribute to a host being classified as a superhost
- Predicting the price of Airbnb listings in New York City and identifying which features contribute to profitable business opportunities for Airbnb
Our datasets were taken from InsideAirbnb and were last updated in October 2020.
By Yulong Gong, Peter Mankiewich, Ruchika Venkateswaran, Phoenix Wang, Yangyang Zhou