<h2> Airbnb California versus Airbnb Texas</h2>

Airbnb, Inc. is an online marketplace that connects people who want to rent out their homes with people looking for accommodation in that locale. The company does not own the properties; it acts as a broker and receives a commission on each booking. It is headquartered in San Francisco, United States, with offices in 30 other locations.


The company was founded by Joe Gebbia, Brian Chesky, and Nathan Blecharczyk in 2008. The idea took shape when Gebbia and Chesky used their own place as a bed and breakfast to make some extra money to pay rent. They saw a potential market for the idea, with big design conferences coming to the San Francisco area and the city's hotels sold out at the time. They developed a website called airbedandbreakfast.com, which is known today as Airbnb.


 <p>Using the datasets describing the calendar, listing and review activities of Airbnb homes in Austin and 
    San Francisco, we will analyse the following questions in two phases:</p>

<h3>Phase one</h3>
 <ul>
    <li>Which city has more expensive homestays across different times of the year (Austin or San Francisco)?</li>
    <li>What are the factors that contribute to the price of a homestay in Austin and San Francisco?</li>  
    <li>What property type is the most popular in Austin and San Francisco?</li>
    <li>What are the factors responsible for good ratings among customers in Austin and San Francisco?</li>
 </ul>
 
<h3>Phase two</h3>
  <ul>
     <li>Which reviews from property renters can be taken as positive or negative?</li>
  </ul>  
 

# Prepare Data

In [1]:
# import necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

In [2]:
# import data
sanfrancisco_calendar = pd.read_csv('./data/sanfrancisco_calendar.csv')
sanfrancisco_listings = pd.read_csv('./data/sanfrancisco_listings.csv', low_memory=False)
austin_calendar = pd.read_csv('./data/austin_calendar.csv')
austin_listings = pd.read_csv('./data/austin_listings.csv', low_memory=False)

# Understand Data

In [3]:
# check the number of rows, columns and features of the San Francisco calendar dataset
sanfrancisco_calendar.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65534 entries, 0 to 65533
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   listing_id      65534 non-null  int64 
 1   date            65534 non-null  object
 2   available       65534 non-null  object
 3   price           65534 non-null  object
 4   adjusted_price  65534 non-null  object
 5   minimum_nights  65534 non-null  int64 
 6   maximum_nights  65534 non-null  int64 
dtypes: int64(3), object(4)
memory usage: 3.5+ MB


In [4]:
#List of sanfrancisco_calendar column headers
list(sanfrancisco_calendar.columns.values)

['listing_id',
 'date',
 'available',
 'price',
 'adjusted_price',
 'minimum_nights',
 'maximum_nights']

In [5]:
# check the number of rows, columns and features of the Austin calendar dataset
austin_calendar.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65534 entries, 0 to 65533
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   listing_id      65534 non-null  int64 
 1   date            65534 non-null  object
 2   available       65534 non-null  object
 3   price           65534 non-null  object
 4   adjusted_price  65534 non-null  object
 5   minimum_nights  65534 non-null  int64 
 6   maximum_nights  65534 non-null  int64 
dtypes: int64(3), object(4)
memory usage: 3.5+ MB


In [6]:
#List of austin_calendar column headers
list(austin_calendar.columns.values)

['listing_id',
 'date',
 'available',
 'price',
 'adjusted_price',
 'minimum_nights',
 'maximum_nights']

In [7]:
# check the number of rows, columns and features of the San Francisco listings dataset
sanfrancisco_listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8138 entries, 0 to 8137
Columns: 106 entries, id to reviews_per_month
dtypes: float64(20), int64(23), object(63)
memory usage: 6.6+ MB


In [8]:
#List of sanfrancisco_listings column headers
list(sanfrancisco_listings.columns.values)

['id',
 'listing_url',
 'scrape_id',
 'last_scraped',
 'name',
 'summary',
 'space',
 'description',
 'experiences_offered',
 'neighborhood_overview',
 'notes',
 'transit',
 'access',
 'interaction',
 'house_rules',
 'thumbnail_url',
 'medium_url',
 'picture_url',
 'xl_picture_url',
 'host_id',
 'host_url',
 'host_name',
 'host_since',
 'host_location',
 'host_about',
 'host_response_time',
 'host_response_rate',
 'host_acceptance_rate',
 'host_is_superhost',
 'host_thumbnail_url',
 'host_picture_url',
 'host_neighbourhood',
 'host_listings_count',
 'host_total_listings_count',
 'host_verifications',
 'host_has_profile_pic',
 'host_identity_verified',
 'street',
 'neighbourhood',
 'neighbourhood_cleansed',
 'neighbourhood_group_cleansed',
 'city',
 'state',
 'zipcode',
 'market',
 'smart_location',
 'country_code',
 'country',
 'latitude',
 'longitude',
 'is_location_exact',
 'property_type',
 'room_type',
 'accommodates',
 'bathrooms',
 'bedrooms',
 'beds',
 'bed_type',
 'amenities',


In [9]:
# check the number of rows, columns and features of the Austin listings dataset
austin_listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11668 entries, 0 to 11667
Columns: 106 entries, id to reviews_per_month
dtypes: float64(23), int64(22), object(61)
memory usage: 9.4+ MB


In [10]:
#List of austin_listings column headers
list(austin_listings.columns.values)

['id',
 'listing_url',
 'scrape_id',
 'last_scraped',
 'name',
 'summary',
 'space',
 'description',
 'experiences_offered',
 'neighborhood_overview',
 'notes',
 'transit',
 'access',
 'interaction',
 'house_rules',
 'thumbnail_url',
 'medium_url',
 'picture_url',
 'xl_picture_url',
 'host_id',
 'host_url',
 'host_name',
 'host_since',
 'host_location',
 'host_about',
 'host_response_time',
 'host_response_rate',
 'host_acceptance_rate',
 'host_is_superhost',
 'host_thumbnail_url',
 'host_picture_url',
 'host_neighbourhood',
 'host_listings_count',
 'host_total_listings_count',
 'host_verifications',
 'host_has_profile_pic',
 'host_identity_verified',
 'street',
 'neighbourhood',
 'neighbourhood_cleansed',
 'neighbourhood_group_cleansed',
 'city',
 'state',
 'zipcode',
 'market',
 'smart_location',
 'country_code',
 'country',
 'latitude',
 'longitude',
 'is_location_exact',
 'property_type',
 'room_type',
 'accommodates',
 'bathrooms',
 'bedrooms',
 'beds',
 'bed_type',
 'amenities',


In [11]:
# Check the first five rows of the San Francisco calendar data
sanfrancisco_calendar.head()

Unnamed: 0,listing_id,date,available,price,adjusted_price,minimum_nights,maximum_nights
0,958,2020-04-08,f,$115.00,$115.00,1,1125
1,958,2020-04-09,f,$115.00,$115.00,1,1125
2,958,2020-04-10,f,$115.00,$115.00,1,1125
3,958,2020-04-11,f,$115.00,$115.00,1,1125
4,958,2020-04-12,f,$115.00,$115.00,1,1125


In [12]:
# Check the first five rows of the Austin calendar data
austin_calendar.head()

Unnamed: 0,listing_id,date,available,price,adjusted_price,minimum_nights,maximum_nights
0,155359,2020-03-17,f,$90.00,$90.00,90,830
1,154103,2020-03-17,f,$475.00,$475.00,3,60
2,154103,2020-03-18,f,$475.00,$475.00,3,60
3,154103,2020-03-19,t,$475.00,$475.00,3,60
4,154103,2020-03-20,t,$475.00,$475.00,3,60
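
The calendar previews above show `price` stored as text such as `$475.00` and `available` as `t`/`f`, so these columns need converting before prices can be compared over time. The following is a minimal sketch of one way to do that; the helper name `parse_calendar` and the toy rows are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd


def parse_calendar(df):
    """Return a copy with typed date, available and price columns (hypothetical helper)."""
    out = df.copy()
    # Dates are stored as ISO strings such as '2020-03-17'
    out['date'] = pd.to_datetime(out['date'])
    # 't' / 'f' flags become booleans
    out['available'] = out['available'].map({'t': True, 'f': False})
    # Strip the currency symbol and thousands separator, then cast to float
    for col in ['price', 'adjusted_price']:
        out[col] = out[col].str.replace(r'[$,]', '', regex=True).astype(float)
    return out


# Toy rows in the same layout as the calendar files (values are made up)
sample = pd.DataFrame({
    'listing_id': [154103, 154103],
    'date': ['2020-03-18', '2020-03-19'],
    'available': ['f', 't'],
    'price': ['$475.00', '$1,200.00'],
    'adjusted_price': ['$475.00', '$1,200.00'],
})
parsed = parse_calendar(sample)
print(parsed.dtypes)
```

With typed columns, a monthly average price could then be computed with something like `parsed.groupby(parsed['date'].dt.month)['price'].mean()`.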


In [13]:
# Check the first five rows of the San Francisco listings data
sanfrancisco_listings.head()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,958,https://www.airbnb.com/rooms/958,20200407152614,2020-04-08,"Bright, Modern Garden Unit - 1BR/1B",New update: the house next door is under const...,"Newly remodeled, modern, and bright garden uni...",New update: the house next door is under const...,none,*Quiet cul de sac in friendly neighborhood *St...,...,t,f,moderate,f,f,1,1,0,0,1.84
1,5858,https://www.airbnb.com/rooms/5858,20200407152614,2020-04-09,Creative Sanctuary,,We live in a large Victorian house on a quiet ...,We live in a large Victorian house on a quiet ...,none,I love how our neighborhood feels quiet but is...,...,f,f,strict_14_with_grace_period,f,f,1,1,0,0,0.83
2,7918,https://www.airbnb.com/rooms/7918,20200407152614,2020-04-08,A Friendly Room - UCSF/USF - San Francisco,Nice and good public transportation. 7 minute...,"Settle down, S.F. resident, student, hospital,...",Nice and good public transportation. 7 minute...,none,"Shopping old town, restaurants, McDonald, Whol...",...,f,f,strict_14_with_grace_period,f,f,9,0,9,0,0.15
3,8142,https://www.airbnb.com/rooms/8142,20200407152614,2020-04-08,Friendly Room Apt. Style -UCSF/USF - San Franc...,Nice and good public transportation. 7 minute...,"Settle down, S.F. resident, student, hospital,...",Nice and good public transportation. 7 minute...,none,,...,f,f,strict_14_with_grace_period,f,f,9,0,9,0,0.12
4,8339,https://www.airbnb.com/rooms/8339,20200407152614,2020-04-07,Historic Alamo Square Victorian,Pls email before booking. Interior featured i...,Please send us a quick message before booking ...,Pls email before booking. Interior featured i...,none,,...,f,f,moderate,t,t,2,2,0,0,0.22


In [14]:
# Check the first five rows of the Austin listings data
austin_listings.head()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,2265,https://www.airbnb.com/rooms/2265,20200317143754,2020-03-17,Zen-East in the Heart of Austin (monthly rental),Zen East is situated in a vibrant & diverse mu...,This colorful and clean 1923 house was complet...,Zen East is situated in a vibrant & diverse mu...,none,,...,f,f,strict_14_with_grace_period,f,f,3,2,1,0,0.18
1,5245,https://www.airbnb.com/rooms/5245,20200317143754,2020-03-17,"Eco friendly, Colorful, Clean, Cozy monthly share",Situated in a vibrant & diverse multicultural ...,"This green, colorful, clean and cozy house was...",Situated in a vibrant & diverse multicultural ...,none,,...,f,f,strict_14_with_grace_period,f,f,3,2,1,0,0.07
2,5456,https://www.airbnb.com/rooms/5456,20200317143754,2020-03-17,"Walk to 6th, Rainey St and Convention Ctr",Great central location for walking to Convent...,Cute Private Studio apartment located in Willo...,Great central location for walking to Convent...,none,My neighborhood is ideally located if you want...,...,f,f,strict_14_with_grace_period,f,t,1,1,0,0,3.94
3,5769,https://www.airbnb.com/rooms/5769,20200317143754,2020-03-17,NW Austin Room,,Looking for a comfortable inexpensive room to ...,Looking for a comfortable inexpensive room to ...,none,Quiet neighborhood with lots of trees and good...,...,f,f,moderate,t,t,1,0,1,0,2.12
4,6413,https://www.airbnb.com/rooms/6413,20200317143754,2020-03-17,Gem of a Studio near Downtown,"Great studio apartment, perfect for couples or...",!!!!! SXSW info !!!!! Presently open (again!) ...,"Great studio apartment, perfect for couples or...",none,Travis Heights is one of the oldest neighborho...,...,t,f,strict_14_with_grace_period,f,f,1,1,0,0,0.9


# Clean Data

<p>Data cleaning is a vital process in data science. It is the process of preparing data for analysis by removing or modifying data that is incorrect, incomplete, irrelevant, duplicated, or improperly formatted. In most cases, the eliminated data does not contribute meaningfully to the analysis because it is not a useful indicator and tends to produce inaccurate results.</p>

<p>The calendar datasets have no missing data, but for the listings datasets we will remove missing values (NaN) where necessary. There is no generic standard for handling missing data, especially when it comes to dropping values; the right choice depends on the data types and on the insight you intend to draw from the data. We will also perform some data wrangling in a way that befits our proposed analysis.</p>
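
Because there is no universal rule for dropping missing data, one common option is to drop only the columns whose share of missing values exceeds a chosen threshold. The sketch below assumes a hypothetical helper `drop_sparse_columns` and an arbitrary cut-off of 0.9; neither comes from the original analysis, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_null_share=0.9):
    """Return a copy without columns whose share of nulls exceeds max_null_share."""
    null_share = df.isnull().mean()
    keep = null_share[null_share <= max_null_share].index
    return df[keep].copy()


toy = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'square_feet': [np.nan, np.nan, np.nan, np.nan],
    'cleaning_fee': ['$100.00', np.nan, '$50.00', '$75.00'],
})
trimmed = drop_sparse_columns(toy)
print(list(trimmed.columns))
```

The threshold is a judgment call: a lower value removes more columns, while a higher one keeps sparse columns that may still need imputing or row-level filtering later.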

In [15]:
def check_null_columns(df):
    '''Summarise missing values for every column, sorted by number of nulls
    
    Input:
        df: Dataframe

    Returns:
        Dataframe: a dataframe with column names, number of missing values, and percentage of missing values
    '''
    # Count nulls per column; columns without nulls are kept and show 0
    null_counts = df.isnull().sum().sort_values()
    df_null = pd.DataFrame({'Number of Nulls': null_counts})
    # Assignment aligns on the column-name index, so the sorted order is preserved
    df_null['% of Nulls'] = df.isnull().mean() * 100
    
    return df_null


In [16]:
# check all columns with missing values in sanfrancisco_listings
check_null_columns(sanfrancisco_listings)

Unnamed: 0,Number of Nulls,% of Nulls
id,0,0.000000
minimum_maximum_nights,0,0.000000
maximum_minimum_nights,0,0.000000
minimum_minimum_nights,0,0.000000
maximum_nights,0,0.000000
...,...,...
square_feet,8023,98.586876
thumbnail_url,8138,100.000000
medium_url,8138,100.000000
xl_picture_url,8138,100.000000


In [17]:
# check all columns with missing values in austin_listings
check_null_columns(austin_listings)

Unnamed: 0,Number of Nulls,% of Nulls
id,0,0.000000
minimum_maximum_nights,0,0.000000
maximum_minimum_nights,0,0.000000
minimum_minimum_nights,0,0.000000
maximum_nights,0,0.000000
...,...,...
license,11621,99.597189
xl_picture_url,11668,100.000000
medium_url,11668,100.000000
thumbnail_url,11668,100.000000
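
Both summaries list every column, including those with no missing values. To focus on the columns that actually contain nulls, the counts could be filtered first. A minimal sketch on toy data (the frame and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

toy_nulls = pd.DataFrame({
    'id': [2265, 5245, 5456],
    'license': [np.nan, np.nan, 'example'],
    'summary': ['Zen East', np.nan, 'Great location'],
})

null_counts = toy_nulls.isnull().sum()
# Keep only columns with at least one null, most affected first
with_nulls = null_counts[null_counts > 0].sort_values(ascending=False)
print(with_nulls)
```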


In [18]:
# Let's check the datatype of the dataframe
def check_type(df):
    '''Function to check the datatype of the dataframe
    
    Input:
        df: Dataframe

    Returns:
        Series: Data type of each column
    '''
    df_col_types = df.dtypes

    return df_col_types

In [19]:
# check data types of each column belonging to sanfrancisco_listings dataframe
dict(check_type(sanfrancisco_listings))

{'id': dtype('int64'),
 'listing_url': dtype('O'),
 'scrape_id': dtype('int64'),
 'last_scraped': dtype('O'),
 'name': dtype('O'),
 'summary': dtype('O'),
 'space': dtype('O'),
 'description': dtype('O'),
 'experiences_offered': dtype('O'),
 'neighborhood_overview': dtype('O'),
 'notes': dtype('O'),
 'transit': dtype('O'),
 'access': dtype('O'),
 'interaction': dtype('O'),
 'house_rules': dtype('O'),
 'thumbnail_url': dtype('float64'),
 'medium_url': dtype('float64'),
 'picture_url': dtype('O'),
 'xl_picture_url': dtype('float64'),
 'host_id': dtype('int64'),
 'host_url': dtype('O'),
 'host_name': dtype('O'),
 'host_since': dtype('O'),
 'host_location': dtype('O'),
 'host_about': dtype('O'),
 'host_response_time': dtype('O'),
 'host_response_rate': dtype('O'),
 'host_acceptance_rate': dtype('O'),
 'host_is_superhost': dtype('O'),
 'host_thumbnail_url': dtype('O'),
 'host_picture_url': dtype('O'),
 'host_neighbourhood': dtype('O'),
 'host_listings_count': dtype('int64'),
 'host_total_li

In [20]:
# check data types of each column belonging to austin_listings dataframe
dict(check_type(austin_listings))

{'id': dtype('int64'),
 'listing_url': dtype('O'),
 'scrape_id': dtype('int64'),
 'last_scraped': dtype('O'),
 'name': dtype('O'),
 'summary': dtype('O'),
 'space': dtype('O'),
 'description': dtype('O'),
 'experiences_offered': dtype('O'),
 'neighborhood_overview': dtype('O'),
 'notes': dtype('O'),
 'transit': dtype('O'),
 'access': dtype('O'),
 'interaction': dtype('O'),
 'house_rules': dtype('O'),
 'thumbnail_url': dtype('float64'),
 'medium_url': dtype('float64'),
 'picture_url': dtype('O'),
 'xl_picture_url': dtype('float64'),
 'host_id': dtype('int64'),
 'host_url': dtype('O'),
 'host_name': dtype('O'),
 'host_since': dtype('O'),
 'host_location': dtype('O'),
 'host_about': dtype('O'),
 'host_response_time': dtype('O'),
 'host_response_rate': dtype('O'),
 'host_acceptance_rate': dtype('O'),
 'host_is_superhost': dtype('O'),
 'host_thumbnail_url': dtype('O'),
 'host_picture_url': dtype('O'),
 'host_neighbourhood': dtype('O'),
 'host_listings_count': dtype('float64'),
 'host_total_
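
The two outputs differ for at least one column: `host_listings_count` is `int64` for San Francisco and `float64` for Austin. A float dtype on a count column often signals missing values, since NaN cannot be stored in a NumPy integer column. A small sketch for listing such mismatches, assuming a hypothetical helper `compare_dtypes` and toy frames:

```python
import pandas as pd


def compare_dtypes(df_a, df_b, names=('a', 'b')):
    """List shared columns whose dtypes differ between two dataframes."""
    shared = df_a.columns.intersection(df_b.columns)
    types = pd.DataFrame({
        names[0]: df_a[shared].dtypes.astype(str),
        names[1]: df_b[shared].dtypes.astype(str),
    })
    return types[types[names[0]] != types[names[1]]]


# Toy frames that mimic the mismatch seen above (values are made up)
sf_toy = pd.DataFrame({'id': [1, 2], 'host_listings_count': [1, 2]})
austin_toy = pd.DataFrame({'id': [3, 4], 'host_listings_count': [3.0, None]})
mismatches = compare_dtypes(sf_toy, austin_toy, names=('sanfrancisco', 'austin'))
print(mismatches)
```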

In [21]:
# Let's check the shape of the dataframe
def check_shape(df):
    '''check shape of dataframe
    
    Input:
        df: Dataframe

    Returns:
        Tuple: Dimensionality of dataframe
    '''
    df_shape = df.shape
    return df_shape
    

In [22]:
# check shape of the sanfrancisco_listings dataframe before cleaning the data
check_shape(sanfrancisco_listings)

(8138, 106)

In [23]:
# check shape of the austin_listings dataframe before cleaning the data
check_shape(austin_listings)

(11668, 106)

In [24]:
def drop_columns_with_all_values_missing(df):
    '''Function to drop all columns with 100% missing values
    
    Input:
        df: Dataframe

    Returns:
        Dataframe: a dataframe without columns that have 100% missing values
    '''

    # Four columns in the dataframe have 100% missing values. They carry no
    # information, so we can safely drop them.
    df.drop(['neighbourhood_group_cleansed', 'thumbnail_url', 'medium_url','xl_picture_url'], axis=1, inplace=True)
    return df
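
The function above hard-codes the column names and modifies the dataframe in place, so calling it a second time on the same dataframe raises a `KeyError`. A possible alternative, sketched here under the hypothetical name `drop_all_null_columns`, lets pandas find the fully empty columns with `DataFrame.dropna(axis=1, how='all')` and returns a new dataframe instead.

```python
import numpy as np
import pandas as pd


def drop_all_null_columns(df):
    """Return a new dataframe without columns that are entirely null."""
    return df.dropna(axis=1, how='all')


# Toy frame: one fully empty column, one partly empty column (values are made up)
toy_listings = pd.DataFrame({
    'id': [958, 5858],
    'thumbnail_url': [np.nan, np.nan],
    'square_feet': [np.nan, 450.0],
})
cleaned = drop_all_null_columns(toy_listings)
print(list(cleaned.columns))
```

Note that this variant removes any column that happens to be fully empty, which may not be the same set of columns in each city's dataset.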

In [25]:
# clean sanfrancisco_listings dataset
pd.set_option('display.max_columns', 120)
drop_columns_with_all_values_missing(sanfrancisco_listings)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,access,interaction,house_rules,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,958,https://www.airbnb.com/rooms/958,20200407152614,2020-04-08,"Bright, Modern Garden Unit - 1BR/1B",New update: the house next door is under const...,"Newly remodeled, modern, and bright garden uni...",New update: the house next door is under const...,none,*Quiet cul de sac in friendly neighborhood *St...,Due to the fact that we have children and a do...,*Public Transportation is 1/2 block away. *Ce...,*Full access to patio and backyard (shared wit...,A family of 4 lives upstairs with their dog. N...,* No Pets - even visiting guests for a short t...,https://a0.muscache.com/im/pictures/b7c2a199-4...,1169,https://www.airbnb.com/users/show/1169,Holly,2008-07-31,"San Francisco, California, United States",We are a family with 2 boys born in 2009 and 2...,within an hour,90%,98%,t,https://a0.muscache.com/im/pictures/user/efdad...,https://a0.muscache.com/im/pictures/user/efdad...,Duboce Triangle,1,1,"['email', 'phone', 'facebook', 'reviews', 'kba']",t,t,"San Francisco, CA, United States",Lower Haight,Western Addition,San Francisco,CA,94117,San Francisco,"San Francisco, CA",US,United States,37.769310,-122.433860,t,Apartment,Entire home/apt,3,1.0,1.0,2.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,Kitchen,""Pets liv...",,$170.00,"$1,120.00","$4,200.00",$100.00,$100.00,2,$25.00,1,1125,1,1,1125,1125,1.0,1125.0,3 weeks ago,t,25,43,58,106,2020-04-08,240,56,2009-07-23,2020-03-13,97.0,10.0,10.0,10.0,10.0,10.0,9.0,t,STR-0001256,"{""SAN FRANCISCO""}",t,f,moderate,f,f,1,1,0,0,1.84
1,5858,https://www.airbnb.com/rooms/5858,20200407152614,2020-04-09,Creative Sanctuary,,We live in a large Victorian house on a quiet ...,We live in a large Victorian house on a quiet ...,none,I love how our neighborhood feels quiet but is...,All the furniture in the house was handmade so...,The train is two blocks away and you can stop ...,"Our deck, garden, gourmet kitchen and extensiv...",,"Please respect the house, the art work, the fu...",https://a0.muscache.com/im/pictures/17714/3a7a...,8904,https://www.airbnb.com/users/show/8904,Philip And Tania,2009-03-02,"San Francisco, California, United States",Philip: English transplant to the Bay Area and...,within a day,100%,81%,f,https://a0.muscache.com/im/users/8904/profile_...,https://a0.muscache.com/im/users/8904/profile_...,Bernal Heights,2,2,"['email', 'phone', 'reviews', 'kba', 'work_ema...",t,t,"San Francisco, CA, United States",Bernal Heights,Bernal Heights,San Francisco,CA,94110,San Francisco,"San Francisco, CA",US,United States,37.745110,-122.421020,t,Apartment,Entire home/apt,5,1.0,2.0,3.0,Real Bed,"{Internet,Wifi,Kitchen,Heating,""Family/kid fri...",,$235.00,"$1,600.00","$5,500.00",,$100.00,2,$0.00,30,60,30,30,60,60,30.0,60.0,2 weeks ago,t,0,0,0,0,2020-04-09,111,0,2009-05-03,2017-08-06,98.0,10.0,10.0,10.0,10.0,10.0,9.0,t,,"{""SAN FRANCISCO""}",f,f,strict_14_with_grace_period,f,f,1,1,0,0,0.83
2,7918,https://www.airbnb.com/rooms/7918,20200407152614,2020-04-08,A Friendly Room - UCSF/USF - San Francisco,Nice and good public transportation. 7 minute...,"Settle down, S.F. resident, student, hospital,...",Nice and good public transportation. 7 minute...,none,"Shopping old town, restaurants, McDonald, Whol...",Wi-Fi signal in common areas. Large eat in k...,N Juda Muni and bus stop. Street parking.,,,"No party, No smoking, not for any kinds of smo...",https://a0.muscache.com/im/pictures/26356/8030...,21994,https://www.airbnb.com/users/show/21994,Aaron,2009-06-17,"San Francisco, California, United States",7 minutes walk to UCSF hospital & school campu...,within an hour,100%,86%,f,https://a0.muscache.com/im/users/21994/profile...,https://a0.muscache.com/im/users/21994/profile...,Cole Valley,10,10,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"San Francisco, CA, United States",Cole Valley,Haight Ashbury,San Francisco,CA,94117,San Francisco,"San Francisco, CA",US,United States,37.765550,-122.452130,t,Apartment,Private room,2,4.0,1.0,1.0,Real Bed,"{TV,Internet,Wifi,Kitchen,""Free street parking...",,$65.00,$485.00,"$1,685.00",$200.00,$50.00,1,$12.00,32,60,32,32,60,60,32.0,60.0,5 months ago,t,30,60,90,365,2020-04-08,19,2,2009-08-31,2020-03-06,84.0,7.0,8.0,9.0,9.0,9.0,8.0,t,,"{""SAN FRANCISCO""}",f,f,strict_14_with_grace_period,f,f,9,0,9,0,0.15
3,8142,https://www.airbnb.com/rooms/8142,20200407152614,2020-04-08,Friendly Room Apt. Style -UCSF/USF - San Franc...,Nice and good public transportation. 7 minute...,"Settle down, S.F. resident, student, hospital,...",Nice and good public transportation. 7 minute...,none,,Wi-Fi signal in common areas. Large eat in k...,"N Juda Muni, Bus and UCSF Shuttle. small shopp...",,,no pet no smoke no party inside the building,https://a0.muscache.com/im/pictures/27832/3b1f...,21994,https://www.airbnb.com/users/show/21994,Aaron,2009-06-17,"San Francisco, California, United States",7 minutes walk to UCSF hospital & school campu...,within an hour,100%,86%,f,https://a0.muscache.com/im/users/21994/profile...,https://a0.muscache.com/im/users/21994/profile...,Cole Valley,10,10,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"San Francisco, CA, United States",Cole Valley,Haight Ashbury,San Francisco,CA,94117,San Francisco,"San Francisco, CA",US,United States,37.765550,-122.452130,t,Apartment,Private room,2,4.0,1.0,1.0,Real Bed,"{TV,Internet,Wifi,Kitchen,""Free street parking...",,$65.00,$490.00,"$1,685.00",$200.00,$50.00,1,$12.00,32,90,32,32,90,90,32.0,90.0,9 months ago,t,30,60,90,365,2020-04-08,8,0,2014-09-08,2018-09-12,93.0,9.0,9.0,10.0,10.0,9.0,9.0,t,,"{""SAN FRANCISCO""}",f,f,strict_14_with_grace_period,f,f,9,0,9,0,0.12
4,8339,https://www.airbnb.com/rooms/8339,20200407152614,2020-04-07,Historic Alamo Square Victorian,Pls email before booking. Interior featured i...,Please send us a quick message before booking ...,Pls email before booking. Interior featured i...,none,,tax ID on file tax ID on file,,Guests have access to everything listed and sh...,,House Manual and House Rules will be provided ...,https://a0.muscache.com/im/pictures/213fbf05-3...,24215,https://www.airbnb.com/users/show/24215,Rosy,2009-07-02,"San Francisco, California, United States",I'm an Interior Stylist living in SF. \r\n\r\n...,within a few hours,100%,43%,f,https://a0.muscache.com/im/pictures/user/6e05b...,https://a0.muscache.com/im/pictures/user/6e05b...,Alamo Square,2,2,"['email', 'phone', 'reviews', 'kba']",t,t,"San Francisco, CA, United States",Western Addition/NOPA,Western Addition,San Francisco,CA,94117,San Francisco,"San Francisco, CA",US,United States,37.775250,-122.436370,t,Condominium,Entire home/apt,4,1.5,2.0,2.0,Real Bed,"{TV,Internet,Wifi,Kitchen,""Free street parking...",,$703.00,,,$0.00,$166.00,2,$189.00,5,111,5,5,111,111,5.0,111.0,4 months ago,t,30,60,90,365,2020-04-07,28,1,2009-09-25,2019-06-28,97.0,10.0,10.0,10.0,10.0,10.0,10.0,t,STR-0000264,"{""SAN FRANCISCO""}",f,f,moderate,t,t,2,2,0,0,0.22
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8133,43103703,https://www.airbnb.com/rooms/43103703,20200407152614,2020-04-08,San Francisco Home in the Heart of Lower Haight,Lovingly furnished and private studio with que...,,Lovingly furnished and private studio with que...,none,,,,,,,https://a0.muscache.com/im/pictures/73c74945-7...,21078778,https://www.airbnb.com/users/show/21078778,Paige,2014-09-08,"Los Angeles, California, United States",,,,,f,https://a0.muscache.com/im/users/21078778/prof...,https://a0.muscache.com/im/users/21078778/prof...,Lower Haight,1,1,"['email', 'phone', 'reviews', 'kba', 'work_ema...",t,t,"San Francisco, CA, United States",Hayes Valley,Western Addition,San Francisco,CA,94117,San Francisco,"San Francisco, CA",US,United States,37.771960,-122.431190,t,Apartment,Entire home/apt,3,1.0,0.0,2.0,Real Bed,"{Wifi,Kitchen,Washer,Dryer,""Smoke detector"",""C...",,$112.00,,,$500.00,$55.00,1,$0.00,30,1125,30,30,1125,1125,30.0,1125.0,yesterday,t,24,54,84,173,2020-04-08,0,0,,,,,,,,,,t,,"{""SAN FRANCISCO""}",t,f,flexible,f,f,1,1,0,0,
8134,43105611,https://www.airbnb.com/rooms/43105611,20200407152614,2020-04-07,UP to 10 First Responders WANTED for 3 LEVELS ...,IDEAL HOME FOR PEOPLE ON THE FRONTLINES OF THE...,This remodeled Edwardian House in the prime lo...,IDEAL HOME FOR PEOPLE ON THE FRONTLINES OF THE...,none,Distance to SF attractions: West Portal- 5 min...,UPSTAIRS: 3 Bedrooms & 2.5 bathrooms incl. 1 m...,L-Taraval Muni Metro Station 1 block from our ...,Entire house including Garage and Backyard. Pr...,WE DO NOT LIVE AT THE PROPERTY.,No Shoes in the house. No Smoking. No Drinking.,https://a0.muscache.com/im/pictures/2d796f60-3...,7479539,https://www.airbnb.com/users/show/7479539,Igor,2013-07-14,"San Francisco, California, United States",I love meeting & hosting people from around th...,within a few hours,100%,100%,f,https://a0.muscache.com/im/pictures/user/991af...,https://a0.muscache.com/im/pictures/user/991af...,Richmond District,7,7,"['email', 'phone', 'reviews', 'manual_offline'...",t,f,"San Francisco, CA, United States",Parkside,Parkside,San Francisco,CA,94116,San Francisco,"San Francisco, CA",US,United States,37.741630,-122.480370,t,House,Entire home/apt,10,3.5,4.0,10.0,Real Bed,"{TV,Wifi,Pool,Kitchen,""Free parking on premise...",,$269.00,,,"$5,000.00",$150.00,1,$0.00,30,90,30,30,1125,1125,30.0,1125.0,4 days ago,t,30,60,90,90,2020-04-07,0,0,,,,,,,,,,t,,"{""SAN FRANCISCO""}",t,f,flexible,f,f,4,1,1,2,
8135,43116773,https://www.airbnb.com/rooms/43116773,20200407152614,2020-04-07,Furnished 3 bedroom house,This newly renovated and modern house is perfe...,"This Designer house is located in the ""safer"" ...",This newly renovated and modern house is perfe...,none,Smack in the middle of the vibrant Valencia co...,,,,,,https://a0.muscache.com/im/pictures/9ed7be69-a...,23319517,https://www.airbnb.com/users/show/23319517,Chris,2014-11-04,"San Francisco, California, United States",Airbnb superhost who was previousy in the hosp...,within a few hours,100%,0%,f,https://a0.muscache.com/im/pictures/user/0f02b...,https://a0.muscache.com/im/pictures/user/0f02b...,Mission District,1,1,"['email', 'phone', 'google', 'reviews', 'jumio...",t,t,"San Francisco, CA, United States",Mission District,Mission,San Francisco,CA,94103,San Francisco,"San Francisco, CA",US,United States,37.767120,-122.423700,t,House,Entire home/apt,6,2.5,3.0,4.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,Kitchen,""Pets all...",,$375.00,,,$500.00,$650.00,1,$0.00,30,1125,30,30,1125,1125,30.0,1125.0,2 days ago,t,0,5,35,310,2020-04-07,0,0,,,,,,,,,,t,,"{""SAN FRANCISCO""}",f,f,flexible,f,f,2,2,0,0,
8136,43125044,https://www.airbnb.com/rooms/43125044,20200407152614,2020-04-08,Beautiful Room Near Ghirardelli Square for Sublet,Beautifully furnished private room in a two be...,,Beautifully furnished private room in a two be...,none,,,,,,,https://a0.muscache.com/im/pictures/8bbf4484-2...,22616037,https://www.airbnb.com/users/show/22616037,Morgan,2014-10-16,"Washington, District of Columbia, United States",,,,,f,https://a0.muscache.com/im/users/22616037/prof...,https://a0.muscache.com/im/users/22616037/prof...,Fisherman's Wharf,0,0,"['email', 'phone', 'offline_government_id', 'g...",t,f,"San Francisco, CA, United States",SoMa,Russian Hill,San Francisco,CA,94109,San Francisco,"San Francisco, CA",US,United States,37.805720,-122.422270,t,Apartment,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,Wifi,Kitchen,Heating,Washer,Dryer,""Smoke d...",,$86.00,,,$200.00,$100.00,1,$0.00,30,120,30,30,1125,1125,30.0,1125.0,yesterday,t,24,54,84,84,2020-04-08,0,0,,,,,,,,,,t,,"{""SAN FRANCISCO""}",t,f,flexible,f,f,1,0,1,0,


In [26]:
# clean austin_listings dataset
pd.set_option('display.max_columns', 120)
drop_columns_with_all_values_missing(austin_listings)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,access,interaction,house_rules,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,2265,https://www.airbnb.com/rooms/2265,20200317143754,2020-03-17,Zen-East in the Heart of Austin (monthly rental),Zen East is situated in a vibrant & diverse mu...,This colorful and clean 1923 house was complet...,Zen East is situated in a vibrant & diverse mu...,none,,A 2013 Genuine Buddy Scooter 125 may be availa...,5 min walk to Capitol Metro Rail (train that t...,"Several local restaurants, small clubs, music ...","Depending on your dates and arrival time, I am...",• Check-in time is 4 pm. Check out is 11 am. I...,https://a0.muscache.com/im/pictures/4740524/63...,2466,https://www.airbnb.com/users/show/2466,Paddy,2008-08-23,"Austin, Texas, United States",I am a long time resident of Austin. I earned ...,,,100%,t,https://a0.muscache.com/im/users/2466/profile_...,https://a0.muscache.com/im/users/2466/profile_...,East Downtown,3.0,3.0,"['email', 'phone', 'facebook', 'reviews', 'kba']",t,t,"Austin, TX, United States",East Downtown,78702,Austin,TX,78702.0,Austin,"Austin, TX",US,United States,30.277500,-97.713980,f,House,Entire home/apt,4,2.0,2.0,2.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,""Air conditioning...",,$225.00,,,$500.00,$100.00,4,$30.00,30,90,30,30,90,90,30.0,90.0,7 months ago,t,0,0,0,0,2020-03-17,24,0,2009-03-17,2019-03-16,93.0,9.0,10.0,10.0,10.0,8.0,9.0,f,,"{""Texas State""}",f,f,strict_14_with_grace_period,f,f,3,2,1,0,0.18
1,5245,https://www.airbnb.com/rooms/5245,20200317143754,2020-03-17,"Eco friendly, Colorful, Clean, Cozy monthly share",Situated in a vibrant & diverse multicultural ...,"This green, colorful, clean and cozy house was...",Situated in a vibrant & diverse multicultural ...,none,,Please note: A two story studio was built in t...,,,"I should be available, upon your arrival, to a...",A brief profile for all guests along with phot...,https://a0.muscache.com/im/pictures/5167505/b3...,2466,https://www.airbnb.com/users/show/2466,Paddy,2008-08-23,"Austin, Texas, United States",I am a long time resident of Austin. I earned ...,,,100%,t,https://a0.muscache.com/im/users/2466/profile_...,https://a0.muscache.com/im/users/2466/profile_...,East Downtown,3.0,3.0,"['email', 'phone', 'facebook', 'reviews', 'kba']",t,t,"Austin, TX, United States",East Downtown,78702,Austin,TX,78702.0,Austin,"Austin, TX",US,United States,30.275770,-97.713790,t,House,Private room,2,1.0,1.0,2.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,""Air conditioning...",,$100.00,,,$500.00,$75.00,2,$35.00,30,60,30,30,60,60,30.0,60.0,9 months ago,t,0,0,0,0,2020-03-17,9,0,2009-03-19,2018-03-14,91.0,10.0,8.0,10.0,9.0,10.0,9.0,f,,"{""Texas State""}",f,f,strict_14_with_grace_period,f,f,3,2,1,0,0.07
2,5456,https://www.airbnb.com/rooms/5456,20200317143754,2020-03-17,"Walk to 6th, Rainey St and Convention Ctr",Great central location for walking to Convent...,Cute Private Studio apartment located in Willo...,Great central location for walking to Convent...,none,My neighborhood is ideally located if you want...,Parking on street requires a permit. Permits ...,"Bus stop around the block. Uber, Lyft, Ride, ...",Guests have access to yard and patio.,I am happy to welcome my guests and show them in.,No Pets allowed. No smoking in the room. No m...,https://a0.muscache.com/im/pictures/14084884/b...,8028,https://www.airbnb.com/users/show/8028,Sylvia,2009-02-16,"Austin, Texas, United States",I am a licensed Real Estate Broker and owner o...,within an hour,100%,98%,t,https://a0.muscache.com/im/users/8028/profile_...,https://a0.muscache.com/im/users/8028/profile_...,East Downtown,1.0,1.0,"['email', 'phone', 'reviews', 'kba']",t,t,"Austin, TX, United States",East Downtown,78702,Austin,TX,78702.0,Austin,"Austin, TX",US,United States,30.261120,-97.734480,t,Guesthouse,Entire home/apt,3,1.0,1.0,2.0,Real Bed,"{TV,Wifi,""Air conditioning"",Kitchen,""Pets live...",,$95.00,,,$100.00,,2,$45.00,2,90,2,2,90,90,2.0,90.0,6 days ago,t,23,46,70,334,2020-03-17,529,53,2009-03-08,2020-03-01,97.0,10.0,10.0,10.0,10.0,10.0,10.0,f,,"{""Texas State""}",f,f,strict_14_with_grace_period,f,t,1,1,0,0,3.94
3,5769,https://www.airbnb.com/rooms/5769,20200317143754,2020-03-17,NW Austin Room,,Looking for a comfortable inexpensive room to ...,Looking for a comfortable inexpensive room to ...,none,Quiet neighborhood with lots of trees and good...,,We are approximately 16 miles from downtown Au...,Gravel Parking Kitchen,We interact with our guests as little or as mu...,I will need to see identification at check in....,https://a0.muscache.com/im/pictures/23822033/a...,8186,https://www.airbnb.com/users/show/8186,Elizabeth,2009-02-19,"Austin, Texas, United States",We're easygoing professionals that enjoy meeti...,,,92%,t,https://a0.muscache.com/im/users/8186/profile_...,https://a0.muscache.com/im/users/8186/profile_...,SW Williamson Co.,1.0,1.0,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"Austin, TX, United States",SW Williamson Co.,78729,Austin,TX,78729.0,Austin,"Austin, TX",US,United States,30.456970,-97.784220,t,House,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,""Air conditioning...",,$40.00,$160.00,,,,2,$0.00,1,14,1,1,14,14,1.0,14.0,7 weeks ago,t,0,0,14,14,2020-03-17,257,16,2010-04-10,2019-11-03,98.0,10.0,10.0,10.0,10.0,10.0,10.0,f,,"{""Texas State""}",f,f,moderate,t,t,1,0,1,0,2.12
4,6413,https://www.airbnb.com/rooms/6413,20200317143754,2020-03-17,Gem of a Studio near Downtown,"Great studio apartment, perfect for couples or...",!!!!! SXSW info !!!!! Presently open (again!) ...,"Great studio apartment, perfect for couples or...",none,Travis Heights is one of the oldest neighborho...,Our calendar only extends a few months. If you...,"Parking for our place is on the street, roughl...",Private patio with lounge chairs and umbrella.,"You may see us during your stay, but you'll ma...",Posted prices include the 11% in local occupan...,https://a0.muscache.com/im/pictures/349818/97e...,13879,https://www.airbnb.com/users/show/13879,Todd,2009-04-17,"Austin, Texas, United States","We're a young family that likes to travel, we ...",within an hour,100%,100%,t,https://a0.muscache.com/im/pictures/user/4f35e...,https://a0.muscache.com/im/pictures/user/4f35e...,Travis Heights,1.0,1.0,"['email', 'phone', 'reviews', 'jumio', 'offlin...",t,f,"Austin, TX, United States",Travis Heights,78704,Austin,TX,78704.0,Austin,"Austin, TX",US,United States,30.248290,-97.737260,t,Guesthouse,Entire home/apt,2,1.0,0.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,""Air conditioning...",550.0,$99.00,$700.00,"$1,900.00",,$50.00,2,$25.00,3,365,3,3,1125,1125,3.0,1125.0,yesterday,t,0,0,0,0,2020-03-17,112,23,2009-12-14,2020-03-08,99.0,10.0,10.0,10.0,10.0,10.0,10.0,f,32041657928,"{""Texas State""}",t,f,strict_14_with_grace_period,f,f,1,1,0,0,0.90
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11663,42929742,https://www.airbnb.com/rooms/42929742,20200317143754,2020-03-17,Large 1/1 South Congress Prime Location Fun Area,Come stay at my spacious 1/1 right in the midd...,,Come stay at my spacious 1/1 right in the midd...,none,,,,,,,https://a0.muscache.com/im/pictures/f3b8b669-e...,6861434,https://www.airbnb.com/users/show/6861434,Jen,2013-06-11,"Voorhees Township, New Jersey, United States",I love my family and like to travel :),,,,f,https://a0.muscache.com/im/users/6861434/profi...,https://a0.muscache.com/im/users/6861434/profi...,,6.0,6.0,"['email', 'phone', 'reviews', 'kba']",t,t,"Austin, TX, United States",South Congress,78704,Austin,TX,78704.0,Austin,"Austin, TX",US,United States,30.252145,-97.749723,f,Apartment,Entire home/apt,1,1.0,1.0,1.0,Real Bed,"{TV,Wifi,""Air conditioning"",Pool,Kitchen,""Free...",,$60.00,,,,$50.00,1,$0.00,7,1125,1,7,1125,1125,6.9,1125.0,today,t,30,59,59,59,2020-03-17,0,0,,,,,,,,,,f,,"{""Texas State""}",f,f,strict_14_with_grace_period,f,f,5,5,0,0,
11664,42930229,https://www.airbnb.com/rooms/42930229,20200317143754,2020-03-17,Spacious 1/1 convenient location and close to fun,,,,none,,,,,,,https://a0.muscache.com/im/pictures/028fc533-f...,296602448,https://www.airbnb.com/users/show/296602448,Lana,2019-09-20,"Austin, Texas, United States","Lover of traveling, making new friends, food, ...",a few days or more,0%,0%,f,https://a0.muscache.com/im/pictures/user/aa488...,https://a0.muscache.com/im/pictures/user/aa488...,Parker Lane,5.0,5.0,"['phone', 'offline_government_id', 'selfie', '...",t,f,"Austin, TX, United States",Parker Lane,78741,Austin,TX,78741.0,Austin,"Austin, TX",US,United States,30.229216,-97.735202,f,Apartment,Entire home/apt,3,1.0,1.0,1.0,Real Bed,"{TV,Wifi,""Air conditioning"",Pool,Kitchen,""Free...",,$40.00,,,,$50.00,1,$0.00,7,1125,1,7,1125,1125,6.9,1125.0,today,t,28,57,57,57,2020-03-17,0,0,,,,,,,,,,f,,"{""Texas State""}",t,f,strict_14_with_grace_period,f,f,5,5,0,0,
11665,42930678,https://www.airbnb.com/rooms/42930678,20200317143754,2020-03-17,Cozy & sleek 1/1 w cool amenities & easy DT ac...,,,,none,,,,,,,https://a0.muscache.com/im/pictures/6f0b16e4-d...,296602448,https://www.airbnb.com/users/show/296602448,Lana,2019-09-20,"Austin, Texas, United States","Lover of traveling, making new friends, food, ...",a few days or more,0%,0%,f,https://a0.muscache.com/im/pictures/user/aa488...,https://a0.muscache.com/im/pictures/user/aa488...,Parker Lane,5.0,5.0,"['phone', 'offline_government_id', 'selfie', '...",t,f,"Austin, TX, United States",East Riverside,78741,Austin,TX,78741.0,Austin,"Austin, TX",US,United States,30.233525,-97.732541,f,Apartment,Entire home/apt,3,1.0,1.0,1.0,Real Bed,"{TV,Wifi,""Air conditioning"",Pool,Kitchen,""Free...",,$40.00,,,,$50.00,1,$0.00,7,1125,1,7,1125,1125,6.9,1125.0,today,t,29,58,58,58,2020-03-17,0,0,,,,,,,,,,f,,"{""Texas State""}",t,f,strict_14_with_grace_period,f,f,5,5,0,0,
11666,42930768,https://www.airbnb.com/rooms/42930768,20200317143754,2020-03-17,Clean/Comfy 1/1 Travis Heights area w I-35 access,,,,none,,,,,,,https://a0.muscache.com/im/pictures/c9d296b8-c...,296602448,https://www.airbnb.com/users/show/296602448,Lana,2019-09-20,"Austin, Texas, United States","Lover of traveling, making new friends, food, ...",a few days or more,0%,0%,f,https://a0.muscache.com/im/pictures/user/aa488...,https://a0.muscache.com/im/pictures/user/aa488...,Parker Lane,5.0,5.0,"['phone', 'offline_government_id', 'selfie', '...",t,f,"Austin, TX, United States",Travis Heights,78704,Austin,TX,78704.0,Austin,"Austin, TX",US,United States,30.239666,-97.741078,t,Apartment,Entire home/apt,3,1.0,1.0,1.0,Real Bed,"{TV,Wifi,""Air conditioning"",Pool,Kitchen,""Free...",,$45.00,,,,$50.00,1,$0.00,7,1125,1,7,1125,1125,6.9,1125.0,today,t,30,59,59,59,2020-03-17,0,0,,,,,,,,,,f,,"{""Texas State""}",f,f,strict_14_with_grace_period,f,f,5,5,0,0,


In [27]:
# check the shape of the dataframe after cleaning the data
check_shape(sanfrancisco_listings)

(8138, 102)

In [28]:
# Check the number of missing values in each column of San Francisco's calendar data
sanfrancisco_calendar.isna().sum()

listing_id        0
date              0
available         0
price             0
adjusted_price    0
minimum_nights    0
maximum_nights    0
dtype: int64

In [29]:
# Check the sum of Missing values of particular columns in Austin's calendar data
austin_calendar.isna().sum()

listing_id        0
date              0
available         0
price             0
adjusted_price    0
minimum_nights    0
maximum_nights    0
dtype: int64

In [30]:
# Check the sum of Missing values of particular columns in San Francisco's listings data
check_null_columns(sanfrancisco_listings)

Unnamed: 0,Number of Nulls,% of Nulls
id,0,0.000000
street,0,0.000000
neighbourhood_cleansed,0,0.000000
city,0,0.000000
availability_60,0,0.000000
...,...,...
access,3274,40.231015
license,3302,40.575080
weekly_price,7168,88.080609
monthly_price,7176,88.178914


In [31]:
# Check the sum of Missing values of particular columns in Austin's listings data
check_null_columns(austin_listings)

Unnamed: 0,Number of Nulls,% of Nulls
id,0,0.000000
street,0,0.000000
availability_60,0,0.000000
neighbourhood_cleansed,0,0.000000
availability_30,0,0.000000
...,...,...
notes,6115,52.408296
weekly_price,10793,92.500857
monthly_price,10931,93.683579
square_feet,11476,98.354474
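
<p>The two tables above come from the notebook's check_null_columns helper, which is defined elsewhere in this notebook. As a rough sketch of how such a summary can be computed (the name null_summary and the toy data are illustrative assumptions, and the notebook's own helper may differ), the percentage is the null count divided by the number of rows, times 100:</p>

```python
import pandas as pd

def null_summary(df):
    """Hypothetical helper: count and percentage of missing values per column."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        'Number of Nulls': counts,
        '% of Nulls': counts / len(df) * 100,
    })
    # Least-missing columns first, mirroring the ordering of the tables above
    return summary.sort_values('Number of Nulls')

# Toy data (not taken from the Airbnb files)
toy = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'weekly_price': [None, '$160.00', None, None],
    'square_feet': [None, None, None, None],
})
print(null_summary(toy))
```

<p>Columns near the bottom of such a table, like square_feet in Austin at roughly 98% missing, carry too little information to impute reliably.</p>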


## Highlighted Features
<p>To avoid unnecessary complications, we now select only the features relevant to our questions.</p>

<p>The features city, state, beds, bathrooms, accommodates, guests_included, bedrooms, zipcode, neighbourhood, review_scores_rating, host_since, security_deposit, host_acceptance_rate, host_response_time, host_identity_verified, property_type, room_type, host_response_rate, price, instant_bookable, calculated_host_listings_count_entire_homes, cancellation_policy and calendar_updated are selected because they are useful indicators for our analysis.</p>

In [32]:
def df_columns_highlighted(df):
    '''Function to select specific columns that serve as useful indicators for the project
    
    Input:
        df: Dataframe

    Returns:
        Dataframe: a dataframe with selected columns 
    '''
    
    columns_highlighted = df[['city', 'state', 'beds', 'bathrooms', 'accommodates', 'guests_included', 
                            'bedrooms', 'zipcode', 'neighbourhood', 'review_scores_rating', 
                            'host_since', 'security_deposit', 'host_acceptance_rate', 'host_response_time', 
                            'host_identity_verified', 'property_type','room_type', 'host_response_rate', 
                            'price', 'instant_bookable', 'calculated_host_listings_count_entire_homes', 
                            'cancellation_policy', 'calendar_updated']].copy()
    
    return columns_highlighted
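
<p>The function above ends the selection with .copy(). A small sketch of why that matters, using toy data and made-up variable names rather than the real listings: later cleaning steps change the selected frame, and the copy keeps those changes from reaching the original dataframe.</p>

```python
import pandas as pd

# Toy stand-in for a listings dataframe (values are illustrative only)
listings = pd.DataFrame({
    'city': ['Austin', None],
    'price': ['$40.00', '$99.00'],
    'listing_url': ['u1', 'u2'],
})

# Select a subset of columns as an independent copy
subset = listings[['city', 'price']].copy()

# Cleaning the subset does not alter the source dataframe
subset['city'] = subset['city'].fillna('Austin')

print(listings['city'].isna().sum())  # the original still has its missing value
print(subset['city'].isna().sum())    # the copy has been filled
```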

In [33]:
# Dataframe with columns highlighted for sanfrancisco_listings
df_sf_columns_highlight = df_columns_highlighted(sanfrancisco_listings)

In [34]:
# Dataframe with columns highlighted for austin_listings
df_aux_columns_highlight = df_columns_highlighted(austin_listings)

<p><b>We will leave the remaining data columns with missing values untouched. As we proceed, if we need insight from 
a column with missing values, we will decide then how to handle them.</b></p>

In [35]:
def fillWithMean(df):
    '''Function to fill missing values with the mean
    
    Input:
        df: numeric Series or Dataframe

    Returns:
        The input with missing values replaced by the mean 
        (computed per column for a Dataframe) 
    '''
    return df.fillna(df.mean()).dropna(how='all')
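
<p>To make the effect of mean imputation concrete, here is a minimal sketch on a toy Series (the values are invented for illustration). The overall mean is preserved, but the spread of the column shrinks, which is worth keeping in mind when the filled column is later used to explain price or ratings.</p>

```python
import numpy as np
import pandas as pd

# Toy deposits with two missing entries
deposits = pd.Series([100.0, np.nan, 200.0, np.nan])

# Same operation fillWithMean performs on a Series
filled = deposits.fillna(deposits.mean())

print(filled.tolist())                 # [100.0, 150.0, 200.0, 150.0]
print(deposits.mean(), filled.mean())  # the mean is unchanged
print(deposits.std(), filled.std())    # the standard deviation gets smaller
```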

In [36]:
def clean_highlighted_feature_data(df):
    '''Function to perform data cleaning and wrangling
    
    Input:
        df: Dataframe

    Returns:
        Dataframe: a dataframe with highlighted features of clean datasets without missing values 
    '''
    # Drop rows with a missing host_since value because there are only a few of them
    df.dropna(subset=['host_since'], inplace=True)
    
    # Fill missing values in state, city, zipcode, host_response_time, review_scores_rating
    # and neighbourhood with the most frequently occurring value (the mode) of each column.
    # Assigning the result back avoids relying on in-place fillna on a selected column.
    for column in ['state', 'city', 'zipcode', 'host_response_time', 
                   'review_scores_rating', 'neighbourhood']:
        df[column] = df[column].fillna(df[column].value_counts().index[0])

    # Fill all null values in the bathrooms, bedrooms and beds columns with zero. 
    # Here zero means that the value was not provided.
    df['bathrooms'] = df['bathrooms'].fillna(0)
    df['bedrooms'] = df['bedrooms'].fillna(0)
    df['beds'] = df['beds'].fillna(0)
    # Treat a missing host_identity_verified flag as 'f' (not verified)
    df['host_identity_verified'] = df['host_identity_verified'].fillna('f')
    
    # Retain the year info of the host_since
    df['host_since'] = df['host_since'].str[:4]
    df['host_since'] = df['host_since'].astype(int)
    
    # Remove percentage sign from host_response_rate and convert to float type
    df['host_response_rate'] = df['host_response_rate'].str.replace('%', '')
    df['host_response_rate'] = df['host_response_rate'].astype(float)
    df['host_response_rate'] = fillWithMean(df['host_response_rate'])
    
    # Remove percentage sign from host_acceptance_rate and convert to float type
    df['host_acceptance_rate'] = df['host_acceptance_rate'].str.replace('%', '')
    df['host_acceptance_rate'] = df['host_acceptance_rate'].astype(float)
    df['host_acceptance_rate'] = fillWithMean(df['host_acceptance_rate'])
    
    # Remove dollar sign and thousands separator from security_deposit and convert to float type.
    # regex=False treats '$' as a literal character instead of the regex end-of-string anchor.
    df['security_deposit'] = df['security_deposit'].str.replace('$', '', regex=False)
    df['security_deposit'] = df['security_deposit'].str.replace(',', '', regex=False)
    df['security_deposit'] = df['security_deposit'].astype(float)
    df['security_deposit'] = fillWithMean(df['security_deposit'])
   
    return df 
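
<p>The string-to-number conversions in the function above can be tried in isolation. The sketch below uses a hypothetical helper, to_number, and toy values shaped like the listings data. It passes regex=False explicitly because '$' is a regular-expression metacharacter and the default for regex has differed across pandas versions.</p>

```python
import pandas as pd

# Toy values in the same text format as the listings columns
raw = pd.DataFrame({
    'security_deposit': ['$500.00', '$1,900.00', None],
    'host_response_rate': ['100%', None, '92%'],
})

def to_number(series, strip_chars):
    """Hypothetical helper: remove literal characters, then convert to float."""
    cleaned = series
    for ch in strip_chars:
        # regex=False: treat each character literally, never as a pattern
        cleaned = cleaned.str.replace(ch, '', regex=False)
    return cleaned.astype(float)

deposit = to_number(raw['security_deposit'], ['$', ','])
rate = to_number(raw['host_response_rate'], ['%'])

print(deposit.tolist())
print(rate.tolist())
```

<p>Missing entries stay missing after the conversion, so they can still be imputed afterwards, as the function does with fillWithMean.</p>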

In [37]:
# Return clean datasets for sanfrancisco listings
df_sf_highlights = clean_highlighted_feature_data(df_sf_columns_highlight)
df_sf_highlights.head()

Unnamed: 0,city,state,beds,bathrooms,accommodates,guests_included,bedrooms,zipcode,neighbourhood,review_scores_rating,host_since,security_deposit,host_acceptance_rate,host_response_time,host_identity_verified,property_type,room_type,host_response_rate,price,instant_bookable,calculated_host_listings_count_entire_homes,cancellation_policy,calendar_updated
0,San Francisco,CA,2.0,1.0,3,2,1.0,94117,Lower Haight,97.0,2008,100.0,98.0,within an hour,t,Apartment,Entire home/apt,90.0,$170.00,t,1,moderate,3 weeks ago
1,San Francisco,CA,3.0,1.0,5,2,2.0,94110,Bernal Heights,98.0,2009,458.276754,81.0,within a day,t,Apartment,Entire home/apt,100.0,$235.00,f,1,strict_14_with_grace_period,2 weeks ago
2,San Francisco,CA,1.0,4.0,2,1,1.0,94117,Cole Valley,84.0,2009,200.0,86.0,within an hour,t,Apartment,Private room,100.0,$65.00,f,0,strict_14_with_grace_period,5 months ago
3,San Francisco,CA,1.0,4.0,2,1,1.0,94117,Cole Valley,93.0,2009,200.0,86.0,within an hour,t,Apartment,Private room,100.0,$65.00,f,0,strict_14_with_grace_period,9 months ago
4,San Francisco,CA,2.0,1.5,4,2,2.0,94117,Western Addition/NOPA,97.0,2009,0.0,43.0,within a few hours,t,Condominium,Entire home/apt,100.0,$703.00,f,2,moderate,4 months ago


In [38]:
# Check the sum of Missing values of the selected specific columns in San Francisco's listings data
check_null_columns(df_sf_highlights)

Unnamed: 0,Number of Nulls,% of Nulls
city,0,0.0
calculated_host_listings_count_entire_homes,0,0.0
instant_bookable,0,0.0
price,0,0.0
host_response_rate,0,0.0
room_type,0,0.0
property_type,0,0.0
host_identity_verified,0,0.0
host_response_time,0,0.0
host_acceptance_rate,0,0.0


In [39]:
# Return clean datasets for austin listings
df_aux_highlights = clean_highlighted_feature_data(df_aux_columns_highlight)
df_aux_highlights.head()

Unnamed: 0,city,state,beds,bathrooms,accommodates,guests_included,bedrooms,zipcode,neighbourhood,review_scores_rating,host_since,security_deposit,host_acceptance_rate,host_response_time,host_identity_verified,property_type,room_type,host_response_rate,price,instant_bookable,calculated_host_listings_count_entire_homes,cancellation_policy,calendar_updated
0,Austin,TX,2.0,2.0,4,4,2.0,78702.0,East Downtown,93.0,2008,500.0,100.0,within an hour,t,House,Entire home/apt,96.857665,$225.00,f,2,strict_14_with_grace_period,7 months ago
1,Austin,TX,2.0,1.0,2,2,1.0,78702.0,East Downtown,91.0,2008,500.0,100.0,within an hour,t,House,Private room,96.857665,$100.00,f,2,strict_14_with_grace_period,9 months ago
2,Austin,TX,2.0,1.0,3,2,1.0,78702.0,East Downtown,97.0,2009,100.0,98.0,within an hour,t,Guesthouse,Entire home/apt,100.0,$95.00,f,1,strict_14_with_grace_period,6 days ago
3,Austin,TX,1.0,1.0,2,2,1.0,78729.0,SW Williamson Co.,98.0,2009,288.011905,92.0,within an hour,t,House,Private room,96.857665,$40.00,f,0,moderate,7 weeks ago
4,Austin,TX,1.0,1.0,2,2,0.0,78704.0,Travis Heights,99.0,2009,288.011905,100.0,within an hour,f,Guesthouse,Entire home/apt,100.0,$99.00,t,1,strict_14_with_grace_period,yesterday


In [40]:
# Check again the sum of missing values of the selected columns in Austin's listings data

check_null_columns(df_aux_highlights)

Unnamed: 0,Number of Nulls,% of Nulls
city,0,0.0
calculated_host_listings_count_entire_homes,0,0.0
instant_bookable,0,0.0
price,0,0.0
host_response_rate,0,0.0
room_type,0,0.0
property_type,0,0.0
host_identity_verified,0,0.0
host_response_time,0,0.0
host_acceptance_rate,0,0.0
