# Adam's Udacity Data Science Blog Post Project

This notebook explores Seattle AirBnB data retrieved from Kaggle on July 18, 2023, as part of the Blog Post Project for the Udacity Data Scientist Nanodegree.

For the README, data sources, and other information related to this project, please see [the corresponding GitHub repo](https://github.com/epistemetrica/udacity-blog-post-project).

In [1]:
#import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plot #pyplot is the submodule that provides the plotting functions

## Inspecting the data

Let's take a look at the data: load each CSV file into a pandas DataFrame, inspect the first few rows, and check the info for each DataFrame.

In [3]:
#load csv files into dfs
calendar_df = pd.read_csv('airbnb_data_seattle/calendar.csv')
listings_df = pd.read_csv('airbnb_data_seattle/listings.csv')
reviews_df = pd.read_csv('airbnb_data_seattle/reviews.csv')

In [11]:
#inspect calendar data
display(calendar_df.head())
calendar_df.info()

| | listing_id | date | available | price | adjusted_price | minimum_nights | maximum_nights |
|---|---|---|---|---|---|---|---|
| 0 | 6606 | 2023-06-24 | f | \$99.00 | \$99.00 | 30.0 | 1125.0 |
| 1 | 6606 | 2023-06-25 | f | \$99.00 | \$99.00 | 30.0 | 1125.0 |
| 2 | 6606 | 2023-06-26 | f | \$99.00 | \$99.00 | 30.0 | 1125.0 |
| 3 | 6606 | 2023-06-27 | f | \$99.00 | \$99.00 | 30.0 | 1125.0 |
| 4 | 6606 | 2023-06-28 | f | \$99.00 | \$99.00 | 30.0 | 1125.0 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2421778 entries, 0 to 2421777
Data columns (total 7 columns):
 #   Column          Dtype  
---  ------          -----  
 0   listing_id      int64  
 1   date            object 
 2   available       object 
 3   price           object 
 4   adjusted_price  object 
 5   minimum_nights  float64
 6   maximum_nights  float64
dtypes: float64(2), int64(1), object(4)
memory usage: 129.3+ MB
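
The `info()` output shows that `date`, `available`, `price`, and `adjusted_price` are all stored as `object` (string) columns. A minimal sketch of how they might be converted to more useful types is below. `clean_calendar` is a hypothetical helper, not part of the original notebook, and it assumes that larger prices use a thousands separator (e.g. `$1,250.00`), which the five rows above do not confirm.

```python
import pandas as pd

def clean_calendar(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the calendar data with typed columns (hypothetical helper)."""
    out = df.copy()
    # parse ISO-formatted date strings into datetimes
    out["date"] = pd.to_datetime(out["date"])
    # the file encodes booleans as 't' / 'f'
    out["available"] = out["available"].map({"t": True, "f": False})
    # strip the dollar sign and (assumed) thousands separators, then cast to float
    for col in ("price", "adjusted_price"):
        out[col] = out[col].str.replace(r"[$,]", "", regex=True).astype(float)
    return out

# toy rows standing in for calendar_df; the second row is made up for illustration
sample = pd.DataFrame({
    "listing_id": [6606, 6606],
    "date": ["2023-06-24", "2023-06-25"],
    "available": ["f", "t"],
    "price": ["$99.00", "$1,250.00"],
    "adjusted_price": ["$99.00", "$1,250.00"],
})
cleaned = clean_calendar(sample)
print(cleaned.dtypes)
```

On the real data this would be called as `clean_calendar(calendar_df)`. Returning a copy leaves the raw DataFrame untouched, which keeps the inspection cells reproducible.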


Wait. A 30-night minimum is odd for AirBnB, right? Let's take a deeper look at that column.  

In [19]:
display(calendar_df.minimum_nights.describe())
mode_nights = calendar_df.minimum_nights.mode().max() #if several values tie for the mode, report the largest
median_nights = calendar_df.minimum_nights.median()
print(f'The most common minimum is {mode_nights} nights, and the median minimum is {median_nights} nights.')

count    2.421777e+06
mean     3.160028e+01
std      8.415235e+01
min      1.000000e+00
25%      2.000000e+00
50%      3.000000e+00
75%      3.000000e+01
max      3.660000e+02
Name: minimum_nights, dtype: float64

The most common minimum is 2.0 nights, and the median minimum is 3.0 nights.


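OK, a median of 3 nights and a mode of 2 nights are about what we'd expect for AirBnB rentals. The 75th percentile is 30 nights, though, so that first listing in the calendar data is less odd than it looked: at least a quarter of the calendar entries carry a minimum stay of 30 nights or more. Either way, nothing seems to be structurally wrong with our data. Moving along...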
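
To put a number on how common long minimum stays are, the share of entries at or above a threshold can be computed directly. This is a minimal sketch: `long_stay_share` is a hypothetical helper, not part of the original notebook, and the toy series stands in for `calendar_df.minimum_nights`.

```python
import pandas as pd

def long_stay_share(minimum_nights: pd.Series, threshold: int = 30) -> float:
    """Fraction of non-null entries whose minimum stay is at least `threshold` nights."""
    valid = minimum_nights.dropna()
    # the mean of a boolean mask is the share of True values
    return float((valid >= threshold).mean())

# toy values for illustration only; one missing entry mirrors the single null in the real column
toy = pd.Series([2, 3, 30, 30, None, 1, 2, 365])
share = long_stay_share(toy)
print(f"{share:.1%} of entries require at least 30 nights")
```

On the real data this would be `long_stay_share(calendar_df.minimum_nights)`. Dropping nulls first keeps missing values from being counted as short stays.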

In [5]:
#inspect listings data
pd.set_option('display.max_columns', None) #removes limit on number of displayed columns
display(listings_df.head())
listings_df.info()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,13226114,https://www.airbnb.com/rooms/13226114,20230624173239,2023-06-25,city scrape,Home in Seattle · ★4.79 · 2 bedrooms · 4 beds ...,"Explore Columbia City's lively culinary, bar ...",Columbia City's diverse restaurants and locall...,https://a0.muscache.com/pictures/miso/Hosting-...,1884549,https://www.airbnb.com/users/show/1884549,Denise & Sean,2012-03-08,"Seattle, WA","We enjoy hiking, volleyball, culinary adventur...",,,83%,t,https://a0.muscache.com/im/pictures/user/89cbb...,https://a0.muscache.com/im/pictures/user/89cbb...,Columbia City,1,2,"['email', 'phone', 'work_email']",t,t,"Seattle, Washington, United States",Columbia City,Rainier Valley,47.56555,-122.29385,Entire home,Entire home/apt,5,,2 baths,2.0,4.0,"[""Private patio or balcony"", ""Ceiling fan"", ""B...",$240.00,4,200,4.0,7.0,22.0,200.0,4.1,189.5,,t,9,9,9,9,2023-06-25,24,5,0,2016-06-19,2022-12-01,4.79,4.92,4.83,5.0,5.0,4.61,4.65,STR-OPLI-19-000171,f,1,1,0,0,0.28
1,12518952,https://www.airbnb.com/rooms/12518952,20230624173239,2023-06-25,city scrape,Guest suite in Seattle · ★5.0 · 2 bedrooms · 6...,"Newly painted, beautiful, bright and wonderfu...","Many good restaurants and cafés, bathhouse the...",https://a0.muscache.com/pictures/9ff506f6-8927...,12677600,https://www.airbnb.com/users/show/12677600,Joe,2014-02-28,"Seattle, WA","I enjoy travel, languages and am curious of cu...",within an hour,100%,100%,t,https://a0.muscache.com/im/pictures/user/69ec4...,https://a0.muscache.com/im/pictures/user/69ec4...,,1,1,"['email', 'phone']",t,t,"Seattle, Washington, United States",Green Lake,Other neighborhoods,47.68243,-122.33086,Entire guest suite,Entire home/apt,4,,1 bath,2.0,6.0,"[""45\"" HDTV with Netflix, HBO Max, Hulu, Amazo...",$200.00,3,1125,3.0,7.0,1125.0,1125.0,3.8,1125.0,,t,1,3,24,113,2023-06-25,60,9,1,2016-07-02,2023-06-13,5.0,4.98,4.98,4.98,5.0,5.0,4.83,STR-OPLI-19-002061,f,1,1,0,0,0.71
2,521597880867717063,https://www.airbnb.com/rooms/521597880867717063,20230624173239,2023-06-24,city scrape,Serviced apartment in Seattle · Studio · 1 bath,Centrally located in the Adams neighborhood of...,,https://a0.muscache.com/pictures/prohost-api/H...,48005494,https://www.airbnb.com/users/show/48005494,Zeus,2015-11-02,"San Francisco, CA",We built Zeus Living so you can feel at home w...,within an hour,99%,97%,f,https://a0.muscache.com/im/pictures/user/c4bea...,https://a0.muscache.com/im/pictures/user/c4bea...,Redwood City,750,1988,"['email', 'phone', 'work_email']",t,t,,Adams,Ballard,47.66646,-122.3765,Entire serviced apartment,Entire home/apt,2,,1 bath,,,"[""Dryer \u2013\u00a0In unit"", ""Carbon monoxide...",$81.00,30,731,30.0,30.0,731.0,731.0,30.0,731.0,,t,30,60,90,191,2023-06-24,2,2,0,2023-02-09,2023-04-10,5.0,5.0,5.0,5.0,5.0,3.5,5.0,,f,36,36,0,0,0.44
3,17889172,https://www.airbnb.com/rooms/17889172,20230624173239,2023-06-24,city scrape,Rental unit in Seattle · ★5.0 · 1 bedroom · 1 ...,Mid-century apartment well preserved with rece...,Central Wallingford location. Close to Stone W...,https://a0.muscache.com/pictures/706ab66c-9cda...,66909032,https://www.airbnb.com/users/show/66909032,Randy,2016-04-12,"Seattle, WA",,within an hour,100%,100%,t,https://a0.muscache.com/im/pictures/user/13104...,https://a0.muscache.com/im/pictures/user/13104...,Wallingford,4,4,"['email', 'phone']",t,t,"Seattle, Washington, United States",Wallingford,Other neighborhoods,47.6548,-122.34042,Entire rental unit,Entire home/apt,2,,1 bath,1.0,1.0,"[""Baking sheet"", ""Carbon monoxide alarm"", ""Dis...",$125.00,30,150,30.0,30.0,150.0,150.0,30.0,150.0,,t,0,2,32,307,2023-06-24,28,4,0,2017-05-14,2023-05-06,5.0,4.96,4.96,5.0,5.0,4.96,4.89,STR-OPLI-19-000865,f,4,4,0,0,0.38
4,15917796,https://www.airbnb.com/rooms/15917796,20230624173239,2023-06-24,city scrape,Home in Seattle · ★4.60 · Studio · 3 beds · 1 ...,"The Carriage House is close to Eastlake Union,...",The neighborhood contains a mixture of residen...,https://a0.muscache.com/pictures/d56dc8ce-e594...,38021932,https://www.airbnb.com/users/show/38021932,Rocky,2015-07-09,"Seattle, WA",Award winning interior designer base in the PN...,within an hour,100%,100%,f,https://a0.muscache.com/im/users/38021932/prof...,https://a0.muscache.com/im/users/38021932/prof...,Eastlake,2,5,"['email', 'phone']",t,f,"Seattle, Washington, United States",Montlake,Capitol Hill,47.64017,-122.32271,Entire home,Entire home/apt,4,,1 bath,,3.0,"[""Free driveway parking on premises \u2013 1 s...",$128.00,30,1125,30.0,30.0,1125.0,1125.0,30.0,1125.0,,t,0,11,41,316,2023-06-24,31,2,0,2016-11-11,2022-07-31,4.6,4.77,4.53,4.87,4.77,4.93,4.33,STR-OPLI-19-003016,f,1,1,0,0,0.38


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6636 entries, 0 to 6635
Data columns (total 75 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            6636 non-null   int64  
 1   listing_url                                   6636 non-null   object 
 2   scrape_id                                     6636 non-null   int64  
 3   last_scraped                                  6636 non-null   object 
 4   source                                        6636 non-null   object 
 5   name                                          6636 non-null   object 
 6   description                                   6630 non-null   object 
 7   neighborhood_overview                         4859 non-null   object 
 8   picture_url                                   6636 non-null   object 
 9   host_id                                       6636 non-null   i

In [6]:
#inspect reviews data
display(reviews_df.head())
reviews_df.info()

| | listing_id | id | date | reviewer_id | reviewer_name | comments |
|---|---|---|---|---|---|---|
| 0 | 6606 | 5664 | 2009-07-17 | 18085 | Vivian | The Urban Cottage is comfortable, beautiful, f... |
| 1 | 6606 | 338761 | 2011-06-27 | 434031 | Elliott | Joyce was a wonderful host and the urban cotta... |
| 2 | 6606 | 467904 | 2011-08-22 | 976182 | Allegra | Beautiful cottage and warm hospitality from Jo... |
| 3 | 6606 | 480017 | 2011-08-27 | 997921 | Brittney | Joyce is a wonderful host! She is warm, helpfu... |
| 4 | 6606 | 487278 | 2011-08-30 | 206901 | Pascal | Joyce's cottage is the perfect Seattle locatio... |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 425070 entries, 0 to 425069
Data columns (total 6 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   listing_id     425070 non-null  int64 
 1   id             425070 non-null  int64 
 2   date           425070 non-null  object
 3   reviewer_id    425070 non-null  int64 
 4   reviewer_name  425070 non-null  object
 5   comments       425005 non-null  object
dtypes: int64(3), object(3)
memory usage: 19.5+ MB
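
The `info()` output shows 425005 non-null `comments` out of 425070 rows, so 65 reviews have no text. A minimal sketch for listing missing values per column is below. `missing_summary` is a hypothetical helper, not part of the original notebook, and the toy frame stands in for `reviews_df`.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and share of missing values for every column that has at least one."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"missing": counts, "share": counts / len(df)})
    # keep only the columns with gaps, largest gap first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# toy rows for illustration only
toy_reviews = pd.DataFrame({
    "listing_id": [1, 2, 3, 4],
    "reviewer_name": ["a", "b", "c", "d"],
    "comments": ["great", None, "ok", None],
})
summary = missing_summary(toy_reviews)
print(summary)
```

The same helper could be applied to `listings_df`, where columns such as `neighborhood_overview` are only partly filled.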


These CSV files were downloaded from Inside AirBnB, which scraped these public data from AirBnB's website. Their [data assumptions](http://insideairbnb.com/data-assumptions/) and [data dictionary](https://docs.google.com/spreadsheets/d/1iWCNJcSutYqpULSQHlNyGInUvHg2BoUGoNRIGa6Szc4/edit?usp=sharing) pages help clarify some things. Still, much about the nature of these data is left to the analyst's imagination. Based on the dataset itself and Inside AirBnB's data assumptions page, for this project I assume the following:
- The calendar data refer to the _planned_ availability of each listing as of some date prior to the entry.
    - I assume that these plans were carried out, and no changes were made to the **availability** or **price** of the listing throughout the time period represented in the data set. 
