# Introduction

The **[Inside Airbnb](http://insideairbnb.com/about.html)** project provides data on Airbnb listings by location since 2015. The data is updated nearly every month, and previous datasets are archived but remain publicly available. For a given location and date, three datasets are provided. The *listings* dataset contains detailed information on registered listings, such as location (latitude and longitude), host information, price, availability over the next 30 to 365 days, and review ratings. The *reviews* dataset provides the date, the reviewer's identification number, and the review text for every review of a listing. Finally, the *calendar* dataset provides the daily availability and price of each listing, along with the listing identification number.

For our project, we are interested in compiling information from all datasets for the San Francisco Bay Area, which includes the datasets for Oakland, San Francisco, San Mateo County, and Santa Clara County. For every available month, the datasets for each of these locations were downloaded and a final dataset for data exploration was created. This notebook walks through the creation of that dataset step by step.


# Airbnb *Calendar* Dataset

### Importing the data

First, we need to import all files downloaded from Inside Airbnb into the notebook. The files were organized by location and date, so the import follows in the same order. 

In [1]:
import pandas as pd
import glob
import os
import geocoder

In [2]:
# Creating a list of files' names

Bay_Area_calendar = sorted(glob.glob("**/*calendar.csv"), key=os.path.getmtime)

In [3]:
# Reading the csv files

# Each file is read once; the availability list starts as an independent copy
Bay_Area_calendar_price = [pd.read_csv(file) for file in Bay_Area_calendar]
Bay_Area_calendar_available = [df.copy() for df in Bay_Area_calendar_price]
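
The file list above is ordered by modification time, which depends on when each file was downloaded or copied. As a hedged alternative, the sketch below parses the location and snapshot date out of file names shaped like `Oakland_2020_Oct25_calendar.csv` and sorts on those instead. The helper `parse_calendar_name` and the name pattern are assumptions inferred from the file names used in this notebook, and the month abbreviations are assumed to be English. Note that changing the order would also change the list positions used in later cells.

```python
import re
from datetime import date, datetime

# Pattern inferred from names such as 'Oakland_2020_Oct25_calendar.csv' (assumption)
NAME_PATTERN = re.compile(
    r"^(?P<location>.+)_(?P<year>\d{4})_(?P<month>[A-Za-z]{3})(?P<day>\d{2})_calendar\.csv$"
)

def parse_calendar_name(path):
    """Return (location, snapshot_date) parsed from a calendar file path."""
    filename = re.split(r"[\\/]", path)[-1]  # accept both path separators
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {path}")
    snapshot = datetime.strptime(
        f"{match['year']} {match['month']} {match['day']}", "%Y %b %d"
    ).date()
    return match["location"], snapshot

paths = [
    "Oakland\\Oakland_2020_Oct25_calendar.csv",
    "San_Francisco\\San_Francisco_2015_May04_calendar.csv",
    "Oakland\\Oakland_2015_Jun22_calendar.csv",
]
# Sort by snapshot date first, then by location
chronological = sorted(paths, key=lambda p: parse_calendar_name(p)[::-1])
```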

------------

### *Calendar* Datasets

The *calendar* datasets contain the following columns:

1. listing_id: The unique identifier of the Airbnb listing.
2. date: The calendar date the row refers to.
3. available: Availability of the listing on that date, in which *t* = available and *f* = not available.
4. price: Price of the listing on that date.


Our goal with this dataset is to investigate the number of bookings in the Bay Area over time, as well as changes in listing prices.

First, it is important to note that the dates listed in this dataset refer to upcoming bookings and not past bookings. For example, the dataset that corresponds to the file 'Oakland_2020_Oct25_calendar.csv' contains the availability of listings from October 26th, 2020 onward. Because the files overlap in the dates they cover, we will compile the availability of listings for each date from the most recent dataset that covers it.

For example, using the dataset Oakland_2020_May18_calendar.csv, we will extract the availability of listings in Oakland from May 19th, 2020 until the starting date of the next dataset (i.e. Oakland_2020_Oct25_calendar.csv), which is October 25th, 2020.
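
The rule described above can be sketched on a single long-format calendar table: keep only the rows dated before the first date of the next dataset. This is a minimal illustration on made-up rows, not the code used in this notebook; the helper name `trim_to_next_snapshot` is hypothetical.

```python
import pandas as pd

def trim_to_next_snapshot(calendar, next_start):
    """Keep only the rows dated before the next snapshot's first date."""
    dates = pd.to_datetime(calendar["date"])
    return calendar[dates < pd.Timestamp(next_start)]

# Toy snapshot: one listing, seven consecutive days (values are made up)
snapshot = pd.DataFrame({
    "listing_id": [3083] * 7,
    "date": pd.date_range("2020-10-20", periods=7).strftime("%Y-%m-%d"),
    "available": ["t", "t", "f", "f", "t", "t", "t"],
})

# The next snapshot is assumed to start on 2020-10-25
trimmed = trim_to_next_snapshot(snapshot, "2020-10-25")
```

Rows from the cutoff date onward are left to the next dataset, so each date ends up being described by exactly one file.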

In [4]:
# Adding a 'Date' column with datetime dtype, parsed from the 'date' strings
for i in range(len(Bay_Area_calendar_price)):
    Bay_Area_calendar_price[i]['Date'] = pd.to_datetime(Bay_Area_calendar_price[i]['date'])
    
for i in range(len(Bay_Area_calendar_available)):
    Bay_Area_calendar_available[i]['Date'] = pd.to_datetime(Bay_Area_calendar_available[i]['date'])

In [5]:
# Removing unused columns
for i in range(len(Bay_Area_calendar_price)):
    Bay_Area_calendar_price[i] = Bay_Area_calendar_price[i].drop(columns = ['available', 'date'])
    
for i in range(len(Bay_Area_calendar_available)):
    Bay_Area_calendar_available[i] = Bay_Area_calendar_available[i].drop(columns = ['price', 'date'])

In [6]:
Bay_Area_calendar_price[0].head()

Unnamed: 0,listing_id,price,Date
0,524299,$250.00,2015-05-04
1,524299,$250.00,2015-05-05
2,524299,$250.00,2015-05-06
3,524299,$250.00,2015-05-07
4,524299,$250.00,2015-05-08
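
The prices shown above are stored as text such as `$250.00`, so they need to be converted before any numeric comparison. The following is a minimal sketch of one way to do that, assuming the column holds strings with a dollar sign and, possibly, thousands separators; the helper name `price_to_float` is hypothetical.

```python
import pandas as pd

def price_to_float(prices):
    """Convert strings such as '$1,250.00' to floats; unparseable values become NaN."""
    cleaned = prices.str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")

# Toy values: two price strings and one missing entry
prices = pd.Series(["$250.00", "$1,250.00", None])
numeric = price_to_float(prices)
```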


In [7]:
Bay_Area_calendar_available[0].head()

Unnamed: 0,listing_id,available,Date
0,524299,t,2015-05-04
1,524299,t,2015-05-05
2,524299,t,2015-05-06
3,524299,t,2015-05-07
4,524299,t,2015-05-08
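
The `available` column uses the flags *t* and *f*. A short sketch of how they could be mapped to booleans is given below, on made-up values; anything other than the two flags is treated as missing.

```python
import pandas as pd

# Toy flags, including one missing value
available = pd.Series(["t", "f", "f", None])

# Map the flags to booleans; values outside 't'/'f' become NaN
is_available = available.map({"t": True, "f": False})

n_available = int(is_available.eq(True).sum())
n_unavailable = int(is_available.eq(False).sum())
n_missing = int(is_available.isna().sum())
```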


In [8]:
# Saving files
#for i in range(len(Bay_Area_calendar_price)):
#    Bay_Area_calendar_price[i].to_csv('Bay_Area_calendar_price' + str(i) + '.csv', index=False)
    
#for i in range(len(Bay_Area_calendar_available)):
#    Bay_Area_calendar_available[i].to_csv('Bay_Area_calendar_available' + str(i) + '.csv', index=False)

# Not run

### Price Data by Listings Across Time

In [9]:
# Pivoting: one row per listing, one column per date
for i in range(len(Bay_Area_calendar_price)):
    Bay_Area_calendar_price[i] = (
        Bay_Area_calendar_price[i]
        .groupby(['listing_id', 'Date'])
        .price
        .first()
        .unstack()
    )
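
To make the pivot step concrete, here is the same `groupby` / `first` / `unstack` chain applied to a toy table with made-up rows. It shows two properties worth keeping in mind: when a listing has more than one row for a date, the first one is kept, and dates with no row for a listing become `NaN`.

```python
import pandas as pd

long = pd.DataFrame({
    "listing_id": [958, 958, 3850, 3850, 3850],
    "Date": pd.to_datetime(
        ["2015-05-04", "2015-05-05", "2015-05-04", "2015-05-04", "2015-05-06"]
    ),
    # The two rows for listing 3850 on 2015-05-04 are a deliberate duplicate
    "price": ["$170.00", "$170.00", "$275.00", "$280.00", "$69.00"],
})

# One row per listing, one column per date
wide = long.groupby(["listing_id", "Date"]).price.first().unstack()
```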

In [10]:
Bay_Area_calendar_price[0].head()

listing_id,2015-05-04,2015-05-05,2015-05-06,2015-05-07,2015-05-08,2015-05-09,2015-05-10,2015-05-11,2015-05-12,2015-05-13,...,2016-04-24,2016-04-25,2016-04-26,2016-04-27,2016-04-28,2016-04-29,2016-04-30,2016-05-01,2016-05-02,2016-05-03
958,,,,,,,,,,,...,$170.00,$170.00,$170.00,$170.00,$170.00,$170.00,$170.00,$170.00,$170.00,
3850,$275.00,$275.00,$275.00,$275.00,$275.00,$275.00,$275.00,$275.00,$275.00,$275.00,...,$69.00,$69.00,$69.00,$69.00,$69.00,$69.00,$69.00,$69.00,$69.00,
5193,,,,,,,,,$105.00,,...,$165.00,$165.00,$165.00,$165.00,$165.00,$165.00,$165.00,$165.00,$165.00,
5841,,,,,,,,,,,...,$180.00,$180.00,$180.00,$180.00,$180.00,$165.00,$165.00,$180.00,$180.00,
5858,$210.00,$210.00,,,,,$210.00,$210.00,$210.00,$210.00,...,$210.00,$210.00,$210.00,$210.00,$210.00,$210.00,$210.00,$210.00,$210.00,


In [11]:
# Printing the first date of each dataset to decide where the DataFrames should be sliced

for i in range(len(Bay_Area_calendar_price)):
    print([i], Bay_Area_calendar[i], 'Start:', Bay_Area_calendar_price[i].columns[0])

[0] San_Francisco\San_Francisco_2015_May04_calendar.csv Start: 2015-05-04 00:00:00
[1] Oakland\Oakland_2015_Jun22_calendar.csv Start: 2015-06-29 00:00:00
[2] San_Francisco\San_Francisco_2015_Sep02_calendar.csv Start: 2015-09-01 00:00:00
[3] San_Francisco\San_Francisco_2015_Nov01_calendar.csv Start: 2015-11-01 00:00:00
[4] San_Francisco\San_Francisco_2015_Dec02_calendar.csv Start: 2015-12-01 00:00:00
[5] San_Francisco\San_Francisco_2016_Feb02_calendar.csv Start: 2016-02-02 00:00:00
[6] San_Francisco\San_Francisco_2016_Apr03_calendar.csv Start: 2016-04-02 00:00:00
[7] San_Francisco\San_Francisco_2016_May02_calendar.csv Start: 2016-05-01 00:00:00
[8] Oakland\Oakland_2016_May04_calendar.csv Start: 2016-05-04 00:00:00
[9] San_Francisco\San_Francisco_2016_Jun02_calendar.csv Start: 2016-06-01 00:00:00
[10] San_Francisco\San_Francisco_2016_Jul02_calendar.csv Start: 2016-07-02 00:00:00
[11] San_Francisco\San_Francisco_2016_Aug02_calendar.csv Start: 2016-08-02 00:00:00
[12] San_Francisco\San_Fra

In [12]:
for i in range(len(Bay_Area_calendar_price)):
    print([i], Bay_Area_calendar[i], 'End:', Bay_Area_calendar_price[i].columns[-1])

[0] San_Francisco\San_Francisco_2015_May04_calendar.csv End: 2016-05-03 00:00:00
[1] Oakland\Oakland_2015_Jun22_calendar.csv End: 2016-07-09 00:00:00
[2] San_Francisco\San_Francisco_2015_Sep02_calendar.csv End: 2016-08-31 00:00:00
[3] San_Francisco\San_Francisco_2015_Nov01_calendar.csv End: 2016-10-30 00:00:00
[4] San_Francisco\San_Francisco_2015_Dec02_calendar.csv End: 2016-11-30 00:00:00
[5] San_Francisco\San_Francisco_2016_Feb02_calendar.csv End: 2017-01-31 00:00:00
[6] San_Francisco\San_Francisco_2016_Apr03_calendar.csv End: 2017-04-02 00:00:00
[7] San_Francisco\San_Francisco_2016_May02_calendar.csv End: 2017-05-01 00:00:00
[8] Oakland\Oakland_2016_May04_calendar.csv End: 2017-05-03 00:00:00
[9] San_Francisco\San_Francisco_2016_Jun02_calendar.csv End: 2017-06-01 00:00:00
[10] San_Francisco\San_Francisco_2016_Jul02_calendar.csv End: 2017-07-01 00:00:00
[11] San_Francisco\San_Francisco_2016_Aug02_calendar.csv End: 2017-08-01 00:00:00
[12] San_Francisco\San_Francisco_2016_Sep02_calend

In [13]:
# The downloaded datasets vary by location and date, so we first concatenate them by location
# and then create the final dataset for each city
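
The concatenations in the following cells list every slice by hand. As a hedged sketch, the cutoffs could instead be derived from the data: each wide dataset is kept up to the day before the next one starts, and the last one is kept whole. The helper `stitch_snapshots` is hypothetical and assumes the list holds one location's pivoted datasets in chronological order, each with sorted date columns.

```python
import pandas as pd

def stitch_snapshots(snapshots):
    """Keep each wide snapshot only up to the day before the next one starts."""
    pieces = []
    for current, following in zip(snapshots, snapshots[1:]):
        cutoff = following.columns[0] - pd.Timedelta(days=1)
        pieces.append(current.loc[:, :cutoff])  # label slicing includes the cutoff
    pieces.append(snapshots[-1])  # the most recent snapshot is kept whole
    return pd.concat(pieces, axis=1, sort=False)

# Two toy snapshots with overlapping dates and partly different listings
first = pd.DataFrame(
    "t",
    index=pd.Index([1, 2], name="listing_id"),
    columns=pd.date_range("2020-05-19", periods=5, name="Date"),
)
second = pd.DataFrame(
    "f",
    index=pd.Index([2, 3], name="listing_id"),
    columns=pd.date_range("2020-05-22", periods=4, name="Date"),
)
stitched = stitch_snapshots([first, second])
```

Listings that appear in only one snapshot get `NaN` for the dates covered by the other, which matches what an outer `pd.concat` along the columns produces.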

In [14]:
# Oakland final dataset
Bay_Area_bookings_Oakland = pd.concat([Bay_Area_calendar_price[1].loc[:, :'2016-05-03 00:00:00'],
                                       Bay_Area_calendar_price[8].loc[:, :'2017-05-03 00:00:00'],
                                       Bay_Area_calendar_price[35].loc[:, :'2018-05-16 00:00:00'],
                                       Bay_Area_calendar_price[37].loc[:, :'2018-07-15 00:00:00'],
                                       Bay_Area_calendar_price[40].loc[:, :'2018-08-15 00:00:00'],
                                       Bay_Area_calendar_price[42].loc[:, :'2018-09-12 00:00:00'],
                                       Bay_Area_calendar_price[46].loc[:, :'2018-10-10 00:00:00'],
                                       Bay_Area_calendar_price[48].loc[:, :'2018-11-14 00:00:00'],
                                       Bay_Area_calendar_price[51].loc[:, :'2018-12-11 00:00:00'],
                                       Bay_Area_calendar_price[55].loc[:, :'2019-01-16 00:00:00'],
                                       Bay_Area_calendar_price[58].loc[:, :'2019-02-08 00:00:00'],
                                       Bay_Area_calendar_price[61].loc[:, :'2019-03-10 00:00:00'],
                                       Bay_Area_calendar_price[64].loc[:, :'2019-04-13 00:00:00'],
                                       Bay_Area_calendar_price[68].loc[:, :'2019-05-17 00:00:00'],
                                       Bay_Area_calendar_price[70].loc[:, :'2019-06-12 00:00:00'],
                                       Bay_Area_calendar_price[74].loc[:, :'2019-07-12 00:00:00'],
                                       Bay_Area_calendar_price[77].loc[:, :'2019-08-13 00:00:00'],
                                       Bay_Area_calendar_price[81].loc[:, :'2019-09-19 00:00:00'],
                                       Bay_Area_calendar_price[83].loc[:, :'2019-10-17 00:00:00'],
                                       Bay_Area_calendar_price[86].loc[:, :'2019-11-19 00:00:00'],
                                       Bay_Area_calendar_price[88].loc[:, :'2019-12-14 00:00:00'],
                                       Bay_Area_calendar_price[92].loc[:, :'2020-01-13 00:00:00'],
                                       Bay_Area_calendar_price[95].loc[:, :'2020-02-21 00:00:00'],
                                       Bay_Area_calendar_price[97].loc[:, :'2020-03-16 00:00:00'],
                                       Bay_Area_calendar_price[100].loc[:, :'2020-04-20 00:00:00'],
                                       Bay_Area_calendar_price[103].loc[:, :'2020-05-17 00:00:00'],
                                       Bay_Area_calendar_price[106].loc[:, :'2020-06-16 00:00:00'],
                                       Bay_Area_calendar_price[110].loc[:, :'2020-10-24 00:00:00'],
                                       Bay_Area_calendar_price[116].loc[:, :'2020-12-31 00:00:00']], axis=1, sort=False)

In [15]:
Bay_Area_bookings_Oakland.shape

(8003, 1668)
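
The shape only reports how many date columns the concatenated dataset has. A possible sanity check, sketched below on a toy frame, is to look for dates that appear twice or that are missing between the first and last column. The helper name `check_date_columns` is hypothetical.

```python
import pandas as pd

def check_date_columns(wide):
    """Report duplicated and missing calendar dates among a wide frame's columns."""
    dates = pd.DatetimeIndex(wide.columns)
    duplicated = dates[dates.duplicated()].unique()
    expected = pd.date_range(dates.min(), dates.max(), freq="D")
    missing = expected.difference(dates)
    return duplicated, missing

# Toy frame: 2020-01-02 appears twice and 2020-01-03 is absent
columns = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-04"])
toy = pd.DataFrame([[1, 2, 3, 4]], columns=columns)
duplicated, missing = check_date_columns(toy)
```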

In [16]:
# San_Francisco final dataset
Bay_Area_bookings_San_Francisco = pd.concat([Bay_Area_calendar_price[0].loc[:, :'2015-08-31 00:00:00'],
                                       Bay_Area_calendar_price[2].loc[:, :'2015-10-31 00:00:00'],
                                       Bay_Area_calendar_price[3].loc[:, :'2015-11-30 00:00:00'],
                                       Bay_Area_calendar_price[4].loc[:, :'2016-02-01 00:00:00'],
                                       Bay_Area_calendar_price[5].loc[:, :'2016-04-01 00:00:00'],
                                       Bay_Area_calendar_price[6].loc[:, :'2016-04-30 00:00:00'],
                                       Bay_Area_calendar_price[7].loc[:, :'2016-05-31 00:00:00'],
                                       Bay_Area_calendar_price[9].loc[:, :'2016-07-01 00:00:00'],
                                       Bay_Area_calendar_price[10].loc[:, :'2016-08-01 00:00:00'],
                                       Bay_Area_calendar_price[11].loc[:, :'2016-09-01 00:00:00'],
                                       Bay_Area_calendar_price[12].loc[:, :'2016-09-30 00:00:00'],
                                       Bay_Area_calendar_price[13].loc[:, :'2016-10-31 00:00:00'],
                                       Bay_Area_calendar_price[14].loc[:, :'2016-12-02 00:00:00'],
                                       Bay_Area_calendar_price[15].loc[:, :'2016-12-31 00:00:00'],
                                       Bay_Area_calendar_price[16].loc[:, :'2017-02-01 00:00:00'],
                                       Bay_Area_calendar_price[17].loc[:, :'2017-02-28 00:00:00'],
                                       Bay_Area_calendar_price[18].loc[:, :'2017-03-31 00:00:00'],
                                       Bay_Area_calendar_price[19].loc[:, :'2017-05-01 00:00:00'],
                                       Bay_Area_calendar_price[20].loc[:, :'2017-05-30 00:00:00'],
                                       Bay_Area_calendar_price[21].loc[:, :'2017-07-01 00:00:00'],
                                       Bay_Area_calendar_price[22].loc[:, :'2017-07-31 00:00:00'],
                                       Bay_Area_calendar_price[23].loc[:, :'2017-09-01 00:00:00'],
                                       Bay_Area_calendar_price[24].loc[:, :'2017-10-01 00:00:00'],
                                       Bay_Area_calendar_price[25].loc[:, :'2017-10-31 00:00:00'],
                                       Bay_Area_calendar_price[26].loc[:, :'2017-11-07 00:00:00'],
                                       Bay_Area_calendar_price[27].loc[:, :'2017-11-30 00:00:00'],
                                       Bay_Area_calendar_price[28].loc[:, :'2017-12-05 00:00:00'],
                                       Bay_Area_calendar_price[29].loc[:, :'2018-01-09 00:00:00'],
                                       Bay_Area_calendar_price[30].loc[:, :'2018-01-16 00:00:00'],
                                       Bay_Area_calendar_price[31].loc[:, :'2018-02-01 00:00:00'],
                                       Bay_Area_calendar_price[32].loc[:, :'2018-03-02 00:00:00'],
                                       Bay_Area_calendar_price[33].loc[:, :'2018-04-05 00:00:00'],
                                       Bay_Area_calendar_price[34].loc[:, :'2018-05-08 00:00:00'],
                                       Bay_Area_calendar_price[36].loc[:, :'2018-07-04 00:00:00'],
                                       Bay_Area_calendar_price[38].loc[:, :'2018-08-05 00:00:00'],
                                       Bay_Area_calendar_price[41].loc[:, :'2018-09-07 00:00:00'],
                                       Bay_Area_calendar_price[44].loc[:, :'2018-10-02 00:00:00'],
                                       Bay_Area_calendar_price[47].loc[:, :'2018-11-02 00:00:00'],
                                       Bay_Area_calendar_price[50].loc[:, :'2018-12-05 00:00:00'],
                                       Bay_Area_calendar_price[53].loc[:, :'2019-01-08 00:00:00'],
                                       Bay_Area_calendar_price[56].loc[:, :'2019-01-31 00:00:00'],
                                       Bay_Area_calendar_price[59].loc[:, :'2019-03-05 00:00:00'],
                                       Bay_Area_calendar_price[62].loc[:, :'2019-04-02 00:00:00'],
                                       Bay_Area_calendar_price[65].loc[:, :'2019-05-02 00:00:00'],
                                       Bay_Area_calendar_price[66].loc[:, :'2019-06-01 00:00:00'],
                                       Bay_Area_calendar_price[71].loc[:, :'2019-07-07 00:00:00'],
                                       Bay_Area_calendar_price[75].loc[:, :'2019-08-05 00:00:00'],
                                       Bay_Area_calendar_price[78].loc[:, :'2019-09-11 00:00:00'],
                                       Bay_Area_calendar_price[80].loc[:, :'2019-10-13 00:00:00'],
                                       Bay_Area_calendar_price[84].loc[:, :'2019-10-31 00:00:00'],
                                       Bay_Area_calendar_price[89].loc[:, :'2019-12-03 00:00:00'],
                                       Bay_Area_calendar_price[90].loc[:, :'2020-01-03 00:00:00'],
                                       Bay_Area_calendar_price[93].loc[:, :'2020-02-11 00:00:00'],
                                       Bay_Area_calendar_price[96].loc[:, :'2020-03-12 00:00:00'],
                                       Bay_Area_calendar_price[99].loc[:, :'2020-04-06 00:00:00'],
                                       Bay_Area_calendar_price[102].loc[:, :'2020-05-05 00:00:00'],
                                       Bay_Area_calendar_price[105].loc[:, :'2020-06-07 00:00:00'],
                                       Bay_Area_calendar_price[108].loc[:, :'2020-07-06 00:00:00'],
                                       Bay_Area_calendar_price[112].loc[:, :'2020-08-14 00:00:00'],
                                       Bay_Area_calendar_price[113].loc[:, :'2020-09-06 00:00:00'],
                                       Bay_Area_calendar_price[115].loc[:, :'2020-10-04 00:00:00'],
                                       Bay_Area_calendar_price[118].loc[:, :'2020-12-31 00:00:00']], axis=1, sort=False)

In [17]:
Bay_Area_bookings_San_Francisco.shape

(34203, 2068)

In [18]:
# San_Mateo final dataset
Bay_Area_bookings_San_Mateo = pd.concat([Bay_Area_calendar_price[73].loc[:, :'2020-06-14 00:00:00'],
                                       Bay_Area_calendar_price[111].loc[:, :'2020-07-14 00:00:00'],
                                       Bay_Area_calendar_price[114].loc[:, :'2020-10-24 00:00:00'],
                                       Bay_Area_calendar_price[117].loc[:, :'2020-12-31 00:00:00']], axis=1, sort=False)

In [19]:
Bay_Area_bookings_San_Mateo.shape

(3472, 565)

In [20]:
# Santa_Clara final dataset
Bay_Area_bookings_Santa_Clara = pd.concat([Bay_Area_calendar_price[39].loc[:, :'2018-08-13 00:00:00'],
                                       Bay_Area_calendar_price[43].loc[:, :'2018-09-09 00:00:00'],
                                       Bay_Area_calendar_price[45].loc[:, :'2018-10-17 00:00:00'],
                                       Bay_Area_calendar_price[49].loc[:, :'2018-11-16 00:00:00'],
                                       Bay_Area_calendar_price[52].loc[:, :'2018-12-07 00:00:00'],
                                       Bay_Area_calendar_price[54].loc[:, :'2019-01-13 00:00:00'],
                                       Bay_Area_calendar_price[57].loc[:, :'2019-02-04 00:00:00'],
                                       Bay_Area_calendar_price[60].loc[:, :'2019-03-06 00:00:00'],
                                       Bay_Area_calendar_price[63].loc[:, :'2019-04-08 00:00:00'],
                                       Bay_Area_calendar_price[67].loc[:, :'2019-05-12 00:00:00'],
                                       Bay_Area_calendar_price[69].loc[:, :'2019-06-05 00:00:00'],
                                       Bay_Area_calendar_price[72].loc[:, :'2019-07-08 00:00:00'],
                                       Bay_Area_calendar_price[76].loc[:, :'2019-08-16 00:00:00'],
                                       Bay_Area_calendar_price[79].loc[:, :'2019-09-15 00:00:00'],
                                       Bay_Area_calendar_price[82].loc[:, :'2019-10-14 00:00:00'],
                                       Bay_Area_calendar_price[85].loc[:, :'2019-11-06 00:00:00'],
                                       Bay_Area_calendar_price[87].loc[:, :'2019-12-08 00:00:00'],
                                       Bay_Area_calendar_price[91].loc[:, :'2020-01-08 00:00:00'],
                                       Bay_Area_calendar_price[94].loc[:, :'2020-02-15 00:00:00'],
                                       Bay_Area_calendar_price[98].loc[:, :'2020-03-16 00:00:00'],
                                       Bay_Area_calendar_price[101].loc[:, :'2020-04-21 00:00:00'],
                                       Bay_Area_calendar_price[104].loc[:, :'2020-05-29 00:00:00'],
                                       Bay_Area_calendar_price[107].loc[:, :'2020-06-11 00:00:00'],
                                       Bay_Area_calendar_price[109].loc[:, :'2020-10-24 00:00:00'],
                                       Bay_Area_calendar_price[119].loc[:, :'2020-12-31 00:00:00']], axis=1, sort=False)

In [21]:
Bay_Area_bookings_Santa_Clara.shape

(15623, 909)

In [22]:
# Melting each wide dataset back to long format (one row per listing and date)

Bay_Area_bookings_Oakland_final = Bay_Area_bookings_Oakland.reset_index().melt(id_vars=['listing_id'], 
        var_name="Date", 
        value_name="Value")

Bay_Area_bookings_San_Francisco_final = Bay_Area_bookings_San_Francisco.reset_index().melt(id_vars=['listing_id'], 
        var_name="Date", 
        value_name="Value")

Bay_Area_bookings_San_Mateo_final = Bay_Area_bookings_San_Mateo.reset_index().melt(id_vars=['listing_id'], 
        var_name="Date", 
        value_name="Value")

Bay_Area_bookings_Santa_Clara_final = Bay_Area_bookings_Santa_Clara.reset_index().melt(id_vars=['listing_id'], 
        var_name="Date", 
        value_name="Value")
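
For reference, the same `reset_index().melt(...)` call is shown below on a toy wide table with made-up values. Every listing and date combination becomes a row, including the ones with no price; whether to drop those rows afterwards is a choice left open here.

```python
import pandas as pd

wide = pd.DataFrame(
    {
        pd.Timestamp("2018-07-07"): [None, "$75.00"],
        pd.Timestamp("2018-07-08"): ["$80.00", "$75.00"],
    },
    index=pd.Index([4952, 11464], name="listing_id"),
)
wide.columns.name = "Date"

# Wide to long: one row per (listing_id, Date)
long = wide.reset_index().melt(
    id_vars=["listing_id"], var_name="Date", value_name="Value"
)

# Optional: keep only the rows that actually carry a value
long_observed = long.dropna(subset=["Value"])
```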

In [23]:
Bay_Area_bookings_Santa_Clara_final.head()

Unnamed: 0,listing_id,Date,Value
0,4952,2018-07-07,
1,11464,2018-07-07,$75.00
2,11466,2018-07-07,$128.00
3,17884,2018-07-07,
4,19181,2018-07-07,


To complete the dataset, we can add the zipcodes from the *Final Listings* dataset.
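
A minimal sketch of that step is a left merge on the listing identifier. The listings table below is a toy stand-in: the column names `id` and `zipcode`, as well as all values, are assumptions made for illustration and may not match the *Final Listings* dataset.

```python
import pandas as pd

# Toy long-format prices, shaped like the melted frames above
prices = pd.DataFrame({
    "listing_id": [11464, 11466, 99999],
    "Date": pd.to_datetime(["2018-07-07"] * 3),
    "Value": ["$75.00", "$128.00", "$60.00"],
})

# Hypothetical listings table; 'id' and 'zipcode' are assumed column names
listings = pd.DataFrame({"id": [11464, 11466], "zipcode": ["95050", "95051"]})

with_zip = prices.merge(
    listings.rename(columns={"id": "listing_id"}),
    on="listing_id",
    how="left",              # keep price rows even without a zipcode match
    validate="many_to_one",  # fail loudly if a listing has several zipcodes
)
```

Using `how="left"` keeps every price row, so listings without a match end up with a missing zipcode instead of disappearing.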

### Availability Data by Listings Across Time

In [24]:
Bay_Area_calendar_available[0].columns

Index(['listing_id', 'available', 'Date'], dtype='object')

In [25]:
# Pivoting: one row per listing, one column per date
for i in range(len(Bay_Area_calendar_available)):
    Bay_Area_calendar_available[i] = (
        Bay_Area_calendar_available[i]
        .groupby(['listing_id', 'Date'])
        .available
        .first()
        .unstack()
    )

In [45]:
Bay_Area_calendar_available[1].head()

listing_id,2015-06-29,2015-06-30,2015-07-01,2015-07-02,2015-07-03,2015-07-04,2015-07-05,2015-07-06,2015-07-07,2015-07-08,...,2016-06-30,2016-07-01,2016-07-02,2016-07-03,2016-07-04,2016-07-05,2016-07-06,2016-07-07,2016-07-08,2016-07-09
3083,f,f,f,f,f,f,f,f,f,f,...,,,,,,,,,,
5739,f,f,f,t,t,t,f,f,f,f,...,,,,,,,,,,
11022,f,f,f,f,f,f,f,f,f,f,...,,,,,,,,,,
13291,,f,f,f,f,f,f,f,f,f,...,,,,,,,,,,
24387,,,f,f,f,f,f,f,f,f,...,,,,,,,,,,


In [27]:
# The downloaded datasets vary by location and date, so we first concatenate them by location
# and then create the final dataset for each city

In [28]:
# Oakland final dataset
Bay_Area_available_Oakland = pd.concat([Bay_Area_calendar_available[1].loc[:, :'2016-05-03 00:00:00'],
                                       Bay_Area_calendar_available[8].loc[:, :'2017-05-03 00:00:00'],
                                       Bay_Area_calendar_available[35].loc[:, :'2018-05-16 00:00:00'],
                                       Bay_Area_calendar_available[37].loc[:, :'2018-07-15 00:00:00'],
                                       Bay_Area_calendar_available[40].loc[:, :'2018-08-15 00:00:00'],
                                       Bay_Area_calendar_available[42].loc[:, :'2018-09-12 00:00:00'],
                                       Bay_Area_calendar_available[46].loc[:, :'2018-10-10 00:00:00'],
                                       Bay_Area_calendar_available[48].loc[:, :'2018-11-14 00:00:00'],
                                       Bay_Area_calendar_available[51].loc[:, :'2018-12-11 00:00:00'],
                                       Bay_Area_calendar_available[55].loc[:, :'2019-01-16 00:00:00'],
                                       Bay_Area_calendar_available[58].loc[:, :'2019-02-08 00:00:00'],
                                       Bay_Area_calendar_available[61].loc[:, :'2019-03-10 00:00:00'],
                                       Bay_Area_calendar_available[64].loc[:, :'2019-04-13 00:00:00'],
                                       Bay_Area_calendar_available[68].loc[:, :'2019-05-17 00:00:00'],
                                       Bay_Area_calendar_available[70].loc[:, :'2019-06-12 00:00:00'],
                                       Bay_Area_calendar_available[74].loc[:, :'2019-07-12 00:00:00'],
                                       Bay_Area_calendar_available[77].loc[:, :'2019-08-13 00:00:00'],
                                       Bay_Area_calendar_available[81].loc[:, :'2019-09-19 00:00:00'],
                                       Bay_Area_calendar_available[83].loc[:, :'2019-10-17 00:00:00'],
                                       Bay_Area_calendar_available[86].loc[:, :'2019-11-19 00:00:00'],
                                       Bay_Area_calendar_available[88].loc[:, :'2019-12-14 00:00:00'],
                                       Bay_Area_calendar_available[92].loc[:, :'2020-01-13 00:00:00'],
                                       Bay_Area_calendar_available[95].loc[:, :'2020-02-21 00:00:00'],
                                       Bay_Area_calendar_available[97].loc[:, :'2020-03-16 00:00:00'],
                                       Bay_Area_calendar_available[100].loc[:, :'2020-04-20 00:00:00'],
                                       Bay_Area_calendar_available[103].loc[:, :'2020-05-17 00:00:00'],
                                       Bay_Area_calendar_available[106].loc[:, :'2020-06-16 00:00:00'],
                                       Bay_Area_calendar_available[110].loc[:, :'2020-10-24 00:00:00'],
                                       Bay_Area_calendar_available[116].loc[:, :'2020-12-31 00:00:00']], axis=1, sort=False)

In [47]:
Bay_Area_available_Oakland.head()

listing_id,2015-06-29,2015-06-30,2015-07-01,2015-07-02,2015-07-03,2015-07-04,2015-07-05,2015-07-06,2015-07-07,2015-07-08,...,2020-12-22,2020-12-23,2020-12-24,2020-12-25,2020-12-26,2020-12-27,2020-12-28,2020-12-29,2020-12-30,2020-12-31
3083,f,f,f,f,f,f,f,f,f,f,...,t,t,t,t,t,t,t,t,t,t
3264,,,,,,,,,,,...,,,,,,,,,,
5739,f,f,f,t,t,t,f,f,f,f,...,t,t,t,t,t,t,t,t,t,t
6201,,,,,,,,,,,...,,,,,,,,,,
8478,,,,,,,,,,,...,,,,,,,,,,
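
With the availability flags in this wide layout, a per-date summary is a short computation. The sketch below, on a toy frame with made-up flags, computes the share of listings flagged *f* among the listings that have any flag on that date. Note that *f* only means "not available", which may reflect a booking or a date blocked by the host, so this is at best a proxy for bookings.

```python
import pandas as pd

dates = pd.date_range("2015-06-29", periods=3, name="Date")
wide = pd.DataFrame(
    [["f", "f", "t"], ["f", "t", "t"], [None, "f", "f"]],
    index=pd.Index([3083, 5739, 13291], name="listing_id"),
    columns=dates,
)

# Per date: listings flagged 'f' divided by listings with any flag that day
unavailable_share = wide.eq("f").sum() / wide.notna().sum()
```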


In [30]:
# San_Francisco final dataset
Bay_Area_available_San_Francisco = pd.concat([Bay_Area_calendar_available[0].loc[:, :'2015-08-31 00:00:00'],
                                       Bay_Area_calendar_available[2].loc[:, :'2015-10-31 00:00:00'],
                                       Bay_Area_calendar_available[3].loc[:, :'2015-11-30 00:00:00'],
                                       Bay_Area_calendar_available[4].loc[:, :'2016-02-01 00:00:00'],
                                       Bay_Area_calendar_available[5].loc[:, :'2016-04-01 00:00:00'],
                                       Bay_Area_calendar_available[6].loc[:, :'2016-04-30 00:00:00'],
                                       Bay_Area_calendar_available[7].loc[:, :'2016-05-31 00:00:00'],
                                       Bay_Area_calendar_available[9].loc[:, :'2016-07-01 00:00:00'],
                                       Bay_Area_calendar_available[10].loc[:, :'2016-08-01 00:00:00'],
                                       Bay_Area_calendar_available[11].loc[:, :'2016-09-01 00:00:00'],
                                       Bay_Area_calendar_available[12].loc[:, :'2016-09-30 00:00:00'],
                                       Bay_Area_calendar_available[13].loc[:, :'2016-10-31 00:00:00'],
                                       Bay_Area_calendar_available[14].loc[:, :'2016-12-02 00:00:00'],
                                       Bay_Area_calendar_available[15].loc[:, :'2016-12-31 00:00:00'],
                                       Bay_Area_calendar_available[16].loc[:, :'2017-02-01 00:00:00'],
                                       Bay_Area_calendar_available[17].loc[:, :'2017-02-28 00:00:00'],
                                       Bay_Area_calendar_available[18].loc[:, :'2017-03-31 00:00:00'],
                                       Bay_Area_calendar_available[19].loc[:, :'2017-05-01 00:00:00'],
                                       Bay_Area_calendar_available[20].loc[:, :'2017-05-30 00:00:00'],
                                       Bay_Area_calendar_available[21].loc[:, :'2017-07-01 00:00:00'],
                                       Bay_Area_calendar_available[22].loc[:, :'2017-07-31 00:00:00'],
                                       Bay_Area_calendar_available[23].loc[:, :'2017-09-01 00:00:00'],
                                       Bay_Area_calendar_available[24].loc[:, :'2017-10-01 00:00:00'],
                                       Bay_Area_calendar_available[25].loc[:, :'2017-10-31 00:00:00'],
                                       Bay_Area_calendar_available[26].loc[:, :'2017-11-07 00:00:00'],
                                       Bay_Area_calendar_available[27].loc[:, :'2017-11-30 00:00:00'],
                                       Bay_Area_calendar_available[28].loc[:, :'2017-12-05 00:00:00'],
                                       Bay_Area_calendar_available[29].loc[:, :'2018-01-09 00:00:00'],
                                       Bay_Area_calendar_available[30].loc[:, :'2018-01-16 00:00:00'],
                                       Bay_Area_calendar_available[31].loc[:, :'2018-02-01 00:00:00'],
                                       Bay_Area_calendar_available[32].loc[:, :'2018-03-02 00:00:00'],
                                       Bay_Area_calendar_available[33].loc[:, :'2018-04-05 00:00:00'],
                                       Bay_Area_calendar_available[34].loc[:, :'2018-05-08 00:00:00'],
                                       Bay_Area_calendar_available[36].loc[:, :'2018-07-04 00:00:00'],
                                       Bay_Area_calendar_available[38].loc[:, :'2018-08-05 00:00:00'],
                                       Bay_Area_calendar_available[41].loc[:, :'2018-09-07 00:00:00'],
                                       Bay_Area_calendar_available[44].loc[:, :'2018-10-02 00:00:00'],
                                       Bay_Area_calendar_available[47].loc[:, :'2018-11-02 00:00:00'],
                                       Bay_Area_calendar_available[50].loc[:, :'2018-12-05 00:00:00'],
                                       Bay_Area_calendar_available[53].loc[:, :'2019-01-08 00:00:00'],
                                       Bay_Area_calendar_available[56].loc[:, :'2019-01-31 00:00:00'],
                                       Bay_Area_calendar_available[59].loc[:, :'2019-03-05 00:00:00'],
                                       Bay_Area_calendar_available[62].loc[:, :'2019-04-02 00:00:00'],
                                       Bay_Area_calendar_available[65].loc[:, :'2019-05-02 00:00:00'],
                                       Bay_Area_calendar_available[66].loc[:, :'2019-06-01 00:00:00'],
                                       Bay_Area_calendar_available[71].loc[:, :'2019-07-07 00:00:00'],
                                       Bay_Area_calendar_available[75].loc[:, :'2019-08-05 00:00:00'],
                                       Bay_Area_calendar_available[78].loc[:, :'2019-09-11 00:00:00'],
                                       Bay_Area_calendar_available[80].loc[:, :'2019-10-13 00:00:00'],
                                       Bay_Area_calendar_available[84].loc[:, :'2019-10-31 00:00:00'],
                                       Bay_Area_calendar_available[89].loc[:, :'2019-12-03 00:00:00'],
                                       Bay_Area_calendar_available[90].loc[:, :'2020-01-03 00:00:00'],
                                       Bay_Area_calendar_available[93].loc[:, :'2020-02-11 00:00:00'],
                                       Bay_Area_calendar_available[96].loc[:, :'2020-03-12 00:00:00'],
                                       Bay_Area_calendar_available[99].loc[:, :'2020-04-06 00:00:00'],
                                       Bay_Area_calendar_available[102].loc[:, :'2020-05-05 00:00:00'],
                                       Bay_Area_calendar_available[105].loc[:, :'2020-06-07 00:00:00'],
                                       Bay_Area_calendar_available[108].loc[:, :'2020-07-06 00:00:00'],
                                       Bay_Area_calendar_available[112].loc[:, :'2020-08-14 00:00:00'],
                                       Bay_Area_calendar_available[113].loc[:, :'2020-09-06 00:00:00'],
                                       Bay_Area_calendar_available[115].loc[:, :'2020-10-04 00:00:00'],
                                       Bay_Area_calendar_available[118].loc[:, :'2020-12-31 00:00:00']], axis=1, sort=False)

In [31]:
Bay_Area_available_San_Francisco.shape

(34203, 2068)

In [32]:
# San_Mateo final dataset
Bay_Area_available_San_Mateo = pd.concat([Bay_Area_calendar_available[73].loc[:, :'2020-06-14 00:00:00'],
                                       Bay_Area_calendar_available[111].loc[:, :'2020-07-14 00:00:00'],
                                       Bay_Area_calendar_available[114].loc[:, :'2020-10-24 00:00:00'],
                                       Bay_Area_calendar_available[117].loc[:, :'2020-12-31 00:00:00']], axis=1, sort=False)

In [33]:
Bay_Area_available_San_Mateo.shape

(3472, 565)

In [34]:
# Santa_Clara final dataset
Bay_Area_available_Santa_Clara = pd.concat([Bay_Area_calendar_available[39].loc[:, :'2018-08-13 00:00:00'],
                                       Bay_Area_calendar_available[43].loc[:, :'2018-09-09 00:00:00'],
                                       Bay_Area_calendar_available[45].loc[:, :'2018-10-17 00:00:00'],
                                       Bay_Area_calendar_available[49].loc[:, :'2018-11-16 00:00:00'],
                                       Bay_Area_calendar_available[52].loc[:, :'2018-12-07 00:00:00'],
                                       Bay_Area_calendar_available[54].loc[:, :'2019-01-13 00:00:00'],
                                       Bay_Area_calendar_available[57].loc[:, :'2019-02-04 00:00:00'],
                                       Bay_Area_calendar_available[60].loc[:, :'2019-03-06 00:00:00'],
                                       Bay_Area_calendar_available[63].loc[:, :'2019-04-08 00:00:00'],
                                       Bay_Area_calendar_available[67].loc[:, :'2019-05-12 00:00:00'],
                                       Bay_Area_calendar_available[69].loc[:, :'2019-06-05 00:00:00'],
                                       Bay_Area_calendar_available[72].loc[:, :'2019-07-08 00:00:00'],
                                       Bay_Area_calendar_available[76].loc[:, :'2019-08-16 00:00:00'],
                                       Bay_Area_calendar_available[79].loc[:, :'2019-09-15 00:00:00'],
                                       Bay_Area_calendar_available[82].loc[:, :'2019-10-14 00:00:00'],
                                       Bay_Area_calendar_available[85].loc[:, :'2019-11-06 00:00:00'],
                                       Bay_Area_calendar_available[87].loc[:, :'2019-12-08 00:00:00'],
                                       Bay_Area_calendar_available[91].loc[:, :'2020-01-08 00:00:00'],
                                       Bay_Area_calendar_available[94].loc[:, :'2020-02-15 00:00:00'],
                                       Bay_Area_calendar_available[98].loc[:, :'2020-03-16 00:00:00'],
                                       Bay_Area_calendar_available[101].loc[:, :'2020-04-21 00:00:00'],
                                       Bay_Area_calendar_available[104].loc[:, :'2020-05-29 00:00:00'],
                                       Bay_Area_calendar_available[107].loc[:, :'2020-06-11 00:00:00'],
                                       Bay_Area_calendar_available[109].loc[:, :'2020-10-24 00:00:00'],
                                       Bay_Area_calendar_available[119].loc[:, :'2020-12-31 00:00:00']], axis=1, sort=False)

In [35]:
Bay_Area_available_Santa_Clara.shape

(15623, 909)
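The column-wise concatenation used for each location can be hard to picture at this scale, so here is a minimal sketch on toy data. It assumes the wide datasets have `listing_id` as the index and dates as column labels, as the slices above suggest. The frames `jan` and `feb` and their values are invented for illustration.

```python
import pandas as pd

# Two toy scrapes in wide form: rows are listings, columns are calendar dates.
jan = pd.DataFrame(
    [["t", "f", "t"], ["f", "f", "t"]],
    index=pd.Index([101, 102], name="listing_id"),
    columns=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
feb = pd.DataFrame(
    [["t", "t"], ["f", "t"]],
    index=pd.Index([101, 103], name="listing_id"),
    columns=pd.to_datetime(["2020-01-03", "2020-01-04"]),
)

# Keep the first scrape only up to the day before the next scrape starts,
# so that no date appears twice, then align the pieces on listing_id.
combined = pd.concat(
    [jan.loc[:, :"2020-01-02"], feb.loc[:, :"2020-01-04"]],
    axis=1,
    sort=False,
)

# Sanity checks worth running on the real datasets as well.
print(combined.shape)              # (listings, dates)
print(combined.columns.is_unique)  # True when the date slices do not overlap
```

Because `pd.concat` performs an outer join on the index by default, a listing that is missing from one scrape gets `NaN` for that scrape's dates. This is the likely source of the empty values that appear once the datasets are melted.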

In [49]:
# Melting each wide dataset into long format: one row per listing_id and Date

Bay_Area_available_Oakland_final = Bay_Area_available_Oakland.reset_index().melt(id_vars=['listing_id'], 
        var_name="Date", 
        value_name="Value")

Bay_Area_available_San_Francisco_final = Bay_Area_available_San_Francisco.reset_index().melt(id_vars=['listing_id'], 
        var_name="Date", 
        value_name="Value")

Bay_Area_available_San_Mateo_final = Bay_Area_available_San_Mateo.reset_index().melt(id_vars=['listing_id'], 
        var_name="Date", 
        value_name="Value")

Bay_Area_available_Santa_Clara_final = Bay_Area_available_Santa_Clara.reset_index().melt(id_vars=['listing_id'], 
        var_name="Date", 
        value_name="Value")
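To make the reshaping step concrete, the sketch below applies the same `reset_index().melt(...)` call to a two-listing toy frame. The frame `wide` and its values are made up; only the call pattern mirrors the cell above.

```python
import pandas as pd

# Toy wide frame: listing_id as the index, one column per date.
wide = pd.DataFrame(
    {"2015-06-29": ["f", None], "2015-06-30": ["t", "t"]},
    index=pd.Index([3083, 3264], name="listing_id"),
)

# reset_index() turns listing_id into a regular column so that melt can
# use it as the identifier; every date column becomes a (Date, Value) row.
long = wide.reset_index().melt(
    id_vars=["listing_id"],
    var_name="Date",
    value_name="Value",
)

# Optional: drop the rows where a listing had no record for that date.
long_observed = long.dropna(subset=["Value"])

print(long)
print(len(long_observed))
```

The long frame has one row per listing and date, so its length is the number of listings times the number of date columns. Whether to drop the missing rows depends on whether the absence of a listing on a given date is itself of interest.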

In [50]:
Bay_Area_available_Oakland_final.head()

Unnamed: 0,listing_id,Date,Value
0,3083,2015-06-29,f
1,3264,2015-06-29,
2,5739,2015-06-29,f
3,6201,2015-06-29,
4,8478,2015-06-29,


To complete the dataset, we can add each listing's ZIP code from the *Final Listings* dataset.
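One way to attach ZIP codes is a left merge on the listing identifier. The sketch below is only an illustration under stated assumptions: the frame `listings_toy`, its column names `listing_id` and `zipcode`, and all values are hypothetical, and the real *Final Listings* dataset may name these columns differently.

```python
import pandas as pd

# Toy long-format calendar data (values invented for illustration).
calendar_long = pd.DataFrame(
    {
        "listing_id": [3083, 3264, 3083],
        "Date": ["2015-06-29", "2015-06-29", "2015-06-30"],
        "Value": ["f", None, "t"],
    }
)

# Hypothetical listings lookup: one row per listing with its ZIP code.
# ZIP codes are kept as strings so that leading zeros are never lost.
listings_toy = pd.DataFrame(
    {"listing_id": [3083, 3264], "zipcode": ["94601", "94609"]}
)

# A left merge keeps every calendar row; validate raises an error if the
# lookup unexpectedly contains more than one row for a listing.
calendar_with_zip = calendar_long.merge(
    listings_toy[["listing_id", "zipcode"]],
    on="listing_id",
    how="left",
    validate="many_to_one",
)

print(calendar_with_zip)
```

With `how="left"`, listings that are absent from the lookup keep their calendar rows and receive a missing `zipcode`, which makes unmatched listings easy to count afterwards.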