Using the Inside Airbnb listings data for San Francisco, I was able to understand popular trends and predict SF listing prices given certain characteristics.
Python Packages used in the project:
- NumPy
- Pandas
- Scikit-learn
- mpl_toolkits
- matplotlib
This project is meant to give a deeper understanding of the Airbnb listing data and to show how powerful and convenient the scikit-learn package's algorithms are.
- Most of the listings are clustered near the BART stations and the center of the city.
- Mission (773), Western Addition (557) and South of Market (440) are the top 3 neighborhoods with the most listings.
- Average price of all SF listings is $203.64.
- Prices vary widely based on property and room types.
- Golden Gate Park ($308), Marina ($290) and Pacific Heights ($287) are the most expensive neighborhoods.
- The majority of listings are rented in their entirety, although private rooms are a close second. Room type is the most important factor when people choose where to stay.
- Accommodates (the number of guests a listing can host) is the second most important factor, suggesting that most people who use Airbnb in SF travel in groups.
- Almost all listings are apartments or houses, with a few interesting ones like castles or caves mixed in.
- Most frequent words in summaries show that more hosts talk about the surrounding area rather than the listing itself.
- Listings priced around $200 to $300 get the most reviews, which suggests that they are booked most often.
- Cancellation policies are fairly spread out, but it doesn’t make a big difference for most people.
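
The findings above come from two kinds of steps: grouping listings to get counts and average prices, and fitting a model to rank which characteristics matter for price. The following is a minimal sketch of both, not the project's actual code. It runs on a small synthetic table with made-up values, the column names (`neighbourhood`, `room_type`, `accommodates`, `price`) are assumptions about the listings file, and the choice of `RandomForestRegressor` is also an assumption, since any scikit-learn estimator that exposes feature importances could be swapped in.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the listings table; all values are made up for illustration.
rng = np.random.default_rng(0)
n = 300
listings = pd.DataFrame({
    "neighbourhood": rng.choice(["Mission", "Marina", "South of Market"], size=n),
    "room_type": rng.choice(["Entire home/apt", "Private room"], size=n),
    "accommodates": rng.integers(1, 7, size=n),
})
# Synthetic price: driven mostly by room type, then by guest capacity, plus noise.
listings["price"] = (
    80
    + 120 * (listings["room_type"] == "Entire home/apt")
    + 15 * listings["accommodates"]
    + rng.normal(0, 10, size=n)
)

# Listing count and mean price per neighbourhood, most listings first.
summary = (
    listings.groupby("neighbourhood")["price"]
    .agg(listing_count="count", mean_price="mean")
    .sort_values("listing_count", ascending=False)
)

# Encode room type as a 0/1 flag, then fit a random forest to rank the features.
X = listings[["accommodates"]].assign(
    entire_home=(listings["room_type"] == "Entire home/apt").astype(int)
)
y = listings["price"]
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(
    ascending=False
)

print(summary)
print(importances)
```

With the real listings file, the synthetic `listings` frame would be replaced by the loaded data, and the importance ranking would reflect the actual listings rather than the relationship built into this toy example.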