# Bay Area, CA - Airbnb Data


## Context

Since its inception in 2008, Airbnb has disrupted the hospitality industry by allowing almost anyone to rent out a spare room and host travelers looking for an overnight stay. 

While Airbnb has publicly available data from many locations, **Santa Clara County is of particular interest as it is one of the major counties in the Bay Area and home to many prominent tech companies and startups in Silicon Valley.**
As such, home prices and cost of living are exceptionally high in this area, as one will see in the data.


## Content

All data is publicly available under the Creative Commons "Public Domain Dedication" license and has been updated as of June 12th, 2020. Provided are metrics that are publicly visible for each listing (e.g. name, description, price, reviews, etc.), but the dataset contains some of Airbnb's internal metrics as well (review score accuracy, host acceptance rate, etc.).

## Acknowledgements

Acknowledgments go to Airbnb for their publicly released datasets that are available at this website.




## Inspiration & ideas:

1. Create a price-suggestion model for new Airbnb hosts who might not know the value of their listing.
    - Word cloud
2. Can we predict the rating of an Airbnb listing utilizing **NLP** of the description columns?
3. How have Airbnb prices changed over time? Are prices seasonal?
   - Time series analysis?
4. Which areas in Santa Clara County are most "popular"?
   - Region analysis
5. Which features of an Airbnb listing are important to add to its perceived value?
   - Feature analysis, PCA
   - Correlation analysis
   

## Import libraries and data

In [60]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [61]:
data = pd.read_csv('./data/Airbnb_Listings.csv')
data.head()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,4952,https://www.airbnb.com/rooms/4952,20200530151957,2020-05-30,Butterfly Inn - Graceful Living !,Lovely garden setting in a serene and art-fill...,Very comfortable Queen bed and small desk in b...,Lovely garden setting in a serene and art-fill...,none,"Located in Professorville, Palo Alto, one of t...",...,f,f,moderate,f,f,5,0,5,0,0.57
1,11464,https://www.airbnb.com/rooms/11464,20200530151957,2020-05-31,Deluxe Private Studio-custom int.,Custom built Studio with exquisite design. Rea...,Description A favorite for international corpo...,Custom built Studio with exquisite design. Rea...,none,10 min. to Lucky's and Starbucks at El Camino ...,...,f,f,moderate,f,t,12,12,0,0,0.08
2,17884,https://www.airbnb.com/rooms/17884,20200530151957,2020-05-31,Silicon Valley Suite,"A guest suite for one or two, in a house in a ...",This is a private suite at the rear of a house...,"A guest suite for one or two, in a house in a ...",none,"This is a very quiet family neighborhood, but ...",...,f,f,strict_14_with_grace_period,f,f,2,2,0,0,0.11
3,21373,https://www.airbnb.com/rooms/21373,20200530151957,2020-05-30,Bonsai Garden Inn in Professorville,Room in gracious home with beautiful garden. ...,"Bright, garden-facing room in beautiful home. ...",Room in gracious home with beautiful garden. ...,none,This room is in an ultra convenient location i...,...,f,f,moderate,f,f,5,0,5,0,2.17
4,37512,https://www.airbnb.com/rooms/37512,20200530151957,2020-05-31,Private room - Parking 3 carport,We live in a safe community close to public tr...,I have a really nice room in a quiet neighborh...,We live in a safe community close to public tr...,none,Our community is a safe environment and at nig...,...,f,f,moderate,t,t,2,0,2,0,1.53


In [62]:
reviews = pd.read_csv('./data/reviews.csv')
reviews.head()

Unnamed: 0,listing_id,date
0,4952,2009-08-02
1,4952,2009-09-04
2,4952,2009-10-16
3,4952,2009-12-10
4,4952,2010-06-08
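
Since `reviews.csv` carries one row per review with a date, review volume per period can serve as a rough activity proxy for the seasonality question raised in the ideas list. The following is only a sketch on a toy frame that copies the five rows shown above; it says nothing about prices, and review counts are at best an indirect signal of bookings.

```python
import pandas as pd

# Toy stand-in for reviews.csv, using the five rows displayed by reviews.head().
reviews_toy = pd.DataFrame({
    "listing_id": [4952, 4952, 4952, 4952, 4952],
    "date": ["2009-08-02", "2009-09-04", "2009-10-16", "2009-12-10", "2010-06-08"],
})

# Parse the date strings so the .dt accessor becomes available.
reviews_toy["date"] = pd.to_datetime(reviews_toy["date"])

# Number of reviews per calendar month, in chronological order.
per_month = reviews_toy["date"].dt.to_period("M").value_counts().sort_index()

# Number of reviews per year.
per_year = reviews_toy.groupby(reviews_toy["date"].dt.year).size()

print(per_month)
print(per_year)
```

On the full file, the same two lines applied to `reviews` would give a monthly series that can be plotted to look for recurring peaks.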


In [63]:
reviews['listing_id'].nunique()

5795

In [64]:
neighbors = pd.read_csv('./data/neighbourhoods.csv')
neighbors.shape

(16, 2)

In [65]:
neighbors

Unnamed: 0,neighbourhood_group,neighbourhood
0,,Campbell
1,,Cupertino
2,,Gilroy
3,,Los Altos
4,,Los Altos Hills
5,,Los Gatos
6,,Milpitas
7,,Monte Sereno
8,,Morgan Hill
9,,Mountain View


In [66]:
data['id'].nunique()

7221

In [67]:
data.shape

(7221, 106)

In [68]:
data.info  # no parentheses, so this echoes the bound method; data.info() prints the column summary

<bound method DataFrame.info of             id                            listing_url       scrape_id  \
0         4952      https://www.airbnb.com/rooms/4952  20200530151957   
1        11464     https://www.airbnb.com/rooms/11464  20200530151957   
2        17884     https://www.airbnb.com/rooms/17884  20200530151957   
3        21373     https://www.airbnb.com/rooms/21373  20200530151957   
4        37512     https://www.airbnb.com/rooms/37512  20200530151957   
...        ...                                    ...             ...   
7216  43567384  https://www.airbnb.com/rooms/43567384  20200530151957   
7217  43568844  https://www.airbnb.com/rooms/43568844  20200530151957   
7218  43579879  https://www.airbnb.com/rooms/43579879  20200530151957   
7219  43580120  https://www.airbnb.com/rooms/43580120  20200530151957   
7220  43591340  https://www.airbnb.com/rooms/43591340  20200530151957   

     last_scraped                                              name  \
0      2020-05-30   

In [69]:
cols = pd.Series(data.columns)
cols

0                                                id
1                                       listing_url
2                                         scrape_id
3                                      last_scraped
4                                              name
                           ...                     
101                  calculated_host_listings_count
102     calculated_host_listings_count_entire_homes
103    calculated_host_listings_count_private_rooms
104     calculated_host_listings_count_shared_rooms
105                               reviews_per_month
Length: 106, dtype: object

In [70]:
# collect columns that are entirely missing or hold a single unique value:
del_cols = []

for col in data.columns:
    if data[col].isna().all():
        del_cols.append(col)
    elif data[col].nunique() == 1:
        del_cols.append(col)
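
The loop above can also be written without explicit iteration. This is a minimal sketch, assuming the same rule (drop a column if it is all NaN or has exactly one distinct non-null value); the helper name `constant_or_empty_columns` and the toy frame are hypothetical.

```python
import pandas as pd

def constant_or_empty_columns(df: pd.DataFrame) -> list:
    """Return columns that are all NaN or hold at most one distinct non-null value."""
    # nunique() ignores NaN by default, so an all-NaN column reports 0.
    n_unique = df.nunique()
    return n_unique[n_unique <= 1].index.tolist()

# Toy frame standing in for the listings data (values are made up).
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "scrape_id": [7, 7, 7],          # single value
    "license": [None, None, None],   # entirely missing
    "price": ["$10", "$20", None],
})

to_drop = constant_or_empty_columns(toy)
print(to_drop)
```

Because `nunique()` skips NaN, the single comparison `<= 1` covers both branches of the original `if`/`elif`.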

In [71]:
del_cols

['scrape_id',
 'experiences_offered',
 'thumbnail_url',
 'medium_url',
 'xl_picture_url',
 'neighbourhood_group_cleansed',
 'country_code',
 'country',
 'has_availability',
 'requires_license',
 'license',
 'is_business_travel_ready']

In [72]:
data = data.drop(del_cols, axis=1)

In [73]:
data.shape

(7221, 94)

In [74]:
cols = pd.Series(data.columns)
cols

0                                               id
1                                      listing_url
2                                     last_scraped
3                                             name
4                                          summary
                          ...                     
89                  calculated_host_listings_count
90     calculated_host_listings_count_entire_homes
91    calculated_host_listings_count_private_rooms
92     calculated_host_listings_count_shared_rooms
93                               reviews_per_month
Length: 94, dtype: object

In [75]:
data['id'].isna().sum()

0

In [76]:
# URL columns to drop:
urls = [
    'listing_url', 'picture_url', 'host_url', 'host_thumbnail_url',
    'host_picture_url'
]


In [77]:
data = data.drop(urls, axis=1)

In [78]:
cols = data.columns
# cols

In [79]:
data['property_type'].unique()


array(['Villa', 'Apartment', 'Guest suite', 'Bungalow', 'House',
       'Guesthouse', 'Loft', 'Other', 'Condominium', 'Townhouse',
       'Cottage', 'Bed and breakfast', 'Cabin', 'Camper/RV', 'Tiny house',
       'Serviced apartment', 'Treehouse', 'Tent', 'Train', 'Barn', 'Yurt',
       'Boutique hotel', 'Lighthouse', 'Farm stay', 'Campsite',
       'Earth house', 'Aparthotel', 'Chalet'], dtype=object)

In [80]:
data['property_type'].value_counts()

House                 3715
Apartment             1152
Serviced apartment     504
Townhouse              429
Guest suite            363
Guesthouse             318
Condominium            303
Villa                  129
Bungalow               115
Loft                    41
Cottage                 29
Camper/RV               27
Boutique hotel          23
Tiny house              16
Other                   13
Bed and breakfast       12
Cabin                    7
Farm stay                6
Tent                     5
Treehouse                3
Yurt                     3
Barn                     2
Train                    1
Lighthouse               1
Campsite                 1
Earth house              1
Aparthotel               1
Chalet                   1
Name: property_type, dtype: int64

In [81]:
data['room_type'].unique()

array(['Private room', 'Entire home/apt', 'Shared room', 'Hotel room'],
      dtype=object)

In [82]:
accomo_type = data['accommodates'].unique()
accomo_type.sort()
accomo_type

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16])

In [83]:
data['bed_type'].unique()

array(['Real Bed', 'Futon', 'Pull-out Sofa', 'Airbed', 'Couch'],
      dtype=object)

In [84]:
guests_included = data['guests_included'].unique()
guests_included.sort()
guests_included

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 12, 13, 14, 15, 16])

In [85]:
compare_acc_guests = data['accommodates'] == data['guests_included']

In [86]:
compare_acc_guests

0       False
1        True
2       False
3        True
4       False
        ...  
7216     True
7217    False
7218     True
7219     True
7220    False
Length: 7221, dtype: bool

In [87]:
data['jurisdiction_names'].unique()

array(['{"PALO ALTO"}', '{"SANTA CLARA"}', '{"Mountain View"," CA"}',
       '{"SAN JOSE"}', '{Cupertino," CA"}', '{Sunnyvale," CA"}', nan,
       '{Milpitas," CA"}', '{"Morgan Hill"," CA"}', '{"Los Gatos"," CA"}',
       '{"San Benito County"," CA"}'], dtype=object)

In [88]:
data['jurisdiction_names'].isna().sum()

979

In [89]:
data['cancellation_policy'].unique()

array(['moderate', 'strict_14_with_grace_period', 'flexible',
       'super_strict_60'], dtype=object)

In [90]:
data['amenities'].unique()

array(['{TV,"Cable TV",Internet,Wifi,Kitchen,Heating,Washer,Dryer,"Smoke alarm","Carbon monoxide alarm","First aid kit","Safety card","Fire extinguisher",Essentials,Shampoo,"Lock on bedroom door",Hangers,"Hair dryer",Iron,"Laptop-friendly workspace","translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50","Private entrance","Hot water","Bed linens","Extra pillows and blankets",Microwave,"Coffee maker",Refrigerator,Dishwasher,"Dishes and silverware","Cooking basics",Oven,Stove,"Patio or balcony","Garden or backyard","Long term stays allowed","Host greets you"}',
       '{TV,"Cable TV",Internet,Wifi,"Air conditioning",Pool,Kitchen,"Free parking on premises","Pets allowed","Pets live on this property",Dog(s),Cat(s),"Free street parking",Heating,Washer,Dryer,"Smoke alarm","Carbon monoxide alarm","Fire extinguisher",Hangers,"Hair dryer",Iron,"Laptop-friendly workspace","Self check-in","Building staff","Hot water",Microwave,"Coffee maker",Refrigerator,"Dishes
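
Each `amenities` value is a single string holding a brace-wrapped, comma-separated list in which multi-word items are double-quoted. One possible way to turn such a string into a Python list is sketched below using the standard `csv` module; the helper name `parse_braced_list` is hypothetical, and the sample string is a shortened version of the first value above.

```python
import csv

def parse_braced_list(raw: str) -> list:
    """Split a string like '{TV,"Cable TV",Wifi}' into ['TV', 'Cable TV', 'Wifi']."""
    inner = raw.strip().strip("{}")
    if not inner:
        return []
    # csv.reader understands the double-quoted items, so quoted names stay intact.
    return next(csv.reader([inner]))

sample = '{TV,"Cable TV",Internet,Wifi,"Smoke alarm"}'
amenities = parse_braced_list(sample)
print(amenities)
```

Applied to the column, this would look like `data['amenities'].apply(parse_braced_list)`. The same helper should also handle the `jurisdiction_names` format, but that column contains NaN, so missing values would need to be skipped or filled first.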

In [91]:
data['city'].unique()

array(['Palo Alto', 'Santa Clara', 'Mountain View', 'San Jose',
       'Cupertino', 'Sunnyvale', 'Campbell', 'Milpitas', 'Saratoga',
       'Morgan Hill', 'Los Altos', 'Los Gatos', 'Menlo Park',
       'Los Altos Hills', 'Monte Sereno', 'Hollister', 'Gilroy',
       'San Jose ', 'San Martin', 'Stanford', 'Santa Clara County',
       'Sunnyvale ', 'Palo Alto ', 'Milpitas ', 'Los Gatos ', 'san jose',
       'Mountain View ', 'Danville ', nan, 'Campbell ', '洛斯阿尔托斯',
       'santa clara', 'Fremont', '圣何塞', 'Watsonville'], dtype=object)

In [92]:
data['state'].unique()

array(['CA', 'Ca', 'ca', nan, 'California '], dtype=object)

In [93]:
data['state'].isna().sum()

1
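
The `city` and `state` outputs above show the same place spelled several ways (trailing spaces, different casing, `'California '` next to `'CA'`). A minimal normalization sketch follows, run on toy Series rather than the real columns; the helper name `normalize_place` is hypothetical.

```python
import pandas as pd

def normalize_place(s: pd.Series) -> pd.Series:
    """Trim surrounding whitespace and apply title case; missing values stay missing."""
    return s.str.strip().str.title()

# Toy values copied from the kinds of variants seen in data['city'].unique().
city = pd.Series(["San Jose", "San Jose ", "san jose", "Palo Alto ", None])
clean_city = normalize_place(city)

# For state, upper-case everything and fold the spelled-out name into the code.
state = pd.Series(["CA", "Ca", "ca", "California ", None])
clean_state = state.str.strip().str.upper().replace({"CALIFORNIA": "CA"})

print(clean_city.unique())
print(clean_state.unique())
```

String normalization alone will not merge the city names written in Chinese characters with their English spellings; those would need an explicit mapping.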

In [94]:
data['zipcode'].isna().sum()

55

In [95]:
len(data.columns)

89

In [96]:
# unique() counts NaN as a distinct value, so these totals can exceed nunique() by one
for col in data.columns:
    print((col, len(data[col].unique())))

('id', 7221)
('last_scraped', 2)
('name', 7078)
('summary', 6052)
('space', 4403)
('description', 6440)
('neighborhood_overview', 3509)
('notes', 2747)
('transit', 3200)
('access', 3378)
('interaction', 3166)
('house_rules', 3718)
('host_id', 3533)
('host_name', 2103)
('host_since', 2085)
('host_location', 242)
('host_about', 2030)
('host_response_time', 5)
('host_response_rate', 44)
('host_acceptance_rate', 78)
('host_is_superhost', 3)
('host_neighbourhood', 145)
('host_listings_count', 62)
('host_total_listings_count', 62)
('host_verifications', 240)
('host_has_profile_pic', 3)
('host_identity_verified', 3)
('street', 37)
('neighbourhood', 37)
('neighbourhood_cleansed', 16)
('city', 35)
('state', 5)
('zipcode', 83)
('market', 5)
('smart_location', 37)
('latitude', 5882)
('longitude', 6357)
('is_location_exact', 2)
('property_type', 28)
('room_type', 4)
('accommodates', 16)
('bathrooms', 16)
('bedrooms', 11)
('beds', 21)
('bed_type', 5)
('amenities', 6396)
('square_feet', 25)
('price'

In [97]:
# find binary columns:
for col in data.columns:
    if data[col].nunique() == 2:
        print(col)

last_scraped
host_is_superhost
host_has_profile_pic
host_identity_verified
is_location_exact
calendar_last_scraped
instant_bookable
require_guest_profile_picture
require_guest_phone_verification


In [98]:
for col in data.columns:
    if data[col].nunique() == 3:
        print(col)

In [99]:
for col in data.columns:
    if data[col].nunique() == 4:
        print(col)

host_response_time
state
market
room_type
cancellation_policy


In [100]:
for col in data.columns:
    if data[col].nunique() == 5:
        print(col)

bed_type


In [101]:
for col, n in zip(data.columns, data.nunique()):
    if 1 < n < 10:
        print(col, n)

last_scraped 2
host_response_time 4
host_is_superhost 2
host_has_profile_pic 2
host_identity_verified 2
state 4
market 4
is_location_exact 2
room_type 4
bed_type 5
calendar_last_scraped 2
review_scores_accuracy 8
review_scores_cleanliness 8
review_scores_checkin 8
review_scores_communication 8
review_scores_location 7
review_scores_value 8
instant_bookable 2
cancellation_policy 4
require_guest_profile_picture 2
require_guest_phone_verification 2
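
Several of the two-valued columns listed above store flags as the strings `'t'` and `'f'` (see `instant_bookable` in the earlier `head()` output). A sketch of converting such columns to a proper boolean type is given below on a toy frame; the helper `is_flag` is hypothetical, and which real columns actually follow the `'t'`/`'f'` convention should be checked rather than assumed.

```python
import pandas as pd

toy = pd.DataFrame({
    "instant_bookable": ["f", "t", "f"],
    "host_is_superhost": ["t", None, "f"],
    "room_type": ["Private room", "Entire home/apt", "Shared room"],
})

def is_flag(col: pd.Series) -> bool:
    """True if the column has at least one value and all non-null values are 't' or 'f'."""
    values = set(col.dropna().unique())
    return bool(values) and values <= {"t", "f"}

flag_cols = [c for c in toy.columns if is_flag(toy[c])]

for c in flag_cols:
    # The nullable "boolean" dtype keeps missing entries as <NA>.
    toy[c] = toy[c].map({"t": True, "f": False}).astype("boolean")

print(flag_cols)
print(toy.dtypes)
```

Date-like columns such as `last_scraped` also have two unique values here but are not flags, which is why the sketch checks the values themselves instead of relying on `nunique() == 2`.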


In [102]:
data_null = data.isnull().sum()

In [103]:
data_null.sort_values(ascending=False)

square_feet               7192
weekly_price              6647
monthly_price             6629
notes                     3529
access                    3117
                          ... 
maximum_minimum_nights       0
minimum_maximum_nights       0
maximum_maximum_nights       0
minimum_nights_avg_ntm       0
bed_type                     0
Length: 89, dtype: int64

In [104]:
data = data.drop(['square_feet', 'weekly_price', 'monthly_price'], axis=1)
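
The three dropped columns are the ones missing in more than 90% of the 7221 rows (7192, 6647 and 6629 nulls), while the next column, `notes`, is missing in roughly half. Rather than naming columns by hand, the same choice could be expressed as a threshold on the missing share. This is a sketch only; the helper name and the 0.9 default are assumptions, and the toy frame is made up.

```python
import pandas as pd

def mostly_missing_columns(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return columns whose share of missing values is strictly above `threshold`."""
    # isna() gives booleans; their mean is the fraction of missing entries per column.
    missing_share = df.isna().mean()
    return missing_share[missing_share > threshold].index.tolist()

# Toy frame: 20 rows, with 'square_feet' missing in 19 of them.
toy = pd.DataFrame({
    "square_feet": [None] * 19 + [500.0],
    "notes": [None] * 10 + ["text"] * 10,
    "price": ["$100"] * 20,
})

sparse_cols = mostly_missing_columns(toy)
print(sparse_cols)
```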


In [105]:
data.head()

Unnamed: 0,id,last_scraped,name,summary,space,description,neighborhood_overview,notes,transit,access,...,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,4952,2020-05-30,Butterfly Inn - Graceful Living !,Lovely garden setting in a serene and art-fill...,Very comfortable Queen bed and small desk in b...,Lovely garden setting in a serene and art-fill...,"Located in Professorville, Palo Alto, one of t...","Housekeeping every Monday, leave your door ope...",Walking distance to Stanford University (30 mi...,"Kitchen, laundry, family/TV room, garden, free...",...,"{""PALO ALTO""}",f,moderate,f,f,5,0,5,0,0.57
1,11464,2020-05-31,Deluxe Private Studio-custom int.,Custom built Studio with exquisite design. Rea...,Description A favorite for international corpo...,Custom built Studio with exquisite design. Rea...,10 min. to Lucky's and Starbucks at El Camino ...,Pet Policy: - Well Behaved pet up to 25 lb. of...,Public transportation at Homestead and Pomeroy...,Complimentary Wifi-internet + Basic Cable,...,"{""SANTA CLARA""}",f,moderate,f,t,12,12,0,0,0.08
2,17884,2020-05-31,Silicon Valley Suite,"A guest suite for one or two, in a house in a ...",This is a private suite at the rear of a house...,"A guest suite for one or two, in a house in a ...","This is a very quiet family neighborhood, but ...","I cannot accommodate cats, sorry. I can usuall...","The CalTrain station is in walking distance, a...",Private outdoor patio. Shared washer/dryer on ...,...,"{""Mountain View"","" CA""}",f,strict_14_with_grace_period,f,f,2,2,0,0,0.11
3,21373,2020-05-30,Bonsai Garden Inn in Professorville,Room in gracious home with beautiful garden. ...,"Bright, garden-facing room in beautiful home. ...",Room in gracious home with beautiful garden. ...,This room is in an ultra convenient location i...,The family room has a flat panel tv and desk f...,The Stanford Shopping Center is 20 minutes wal...,"Kitchen, Laundry, Garden, Family Room with TV,...",...,"{""PALO ALTO""}",f,moderate,f,f,5,0,5,0,2.17
4,37512,2020-05-31,Private room - Parking 3 carport,We live in a safe community close to public tr...,I have a really nice room in a quiet neighborh...,We live in a safe community close to public tr...,Our community is a safe environment and at nig...,Please remember that you are in my home and re...,We have a bus stop right outside our community...,The kitchen and all that it offers. The laund...,...,"{""SAN JOSE""}",f,moderate,t,t,2,0,2,0,1.53


What we have so far:

- Preliminary data cleaning
- The neighbourhoods in neighbors do not match the neighbourhood values in data
  - Subset the data by region?
- Location: clean later
  -  ('latitude', 5882)
  -  ('longitude', 6357)

 

In [106]:
# TODO: more data cleaning & preprocessing before going further

## Fusion with reviews:

In [107]:
rev_row = reviews['listing_id'].unique()

In [108]:
data_rev = data.loc[data['id'].isin(rev_row)]
data_rev.head()

Unnamed: 0,id,last_scraped,name,summary,space,description,neighborhood_overview,notes,transit,access,...,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,4952,2020-05-30,Butterfly Inn - Graceful Living !,Lovely garden setting in a serene and art-fill...,Very comfortable Queen bed and small desk in b...,Lovely garden setting in a serene and art-fill...,"Located in Professorville, Palo Alto, one of t...","Housekeeping every Monday, leave your door ope...",Walking distance to Stanford University (30 mi...,"Kitchen, laundry, family/TV room, garden, free...",...,"{""PALO ALTO""}",f,moderate,f,f,5,0,5,0,0.57
1,11464,2020-05-31,Deluxe Private Studio-custom int.,Custom built Studio with exquisite design. Rea...,Description A favorite for international corpo...,Custom built Studio with exquisite design. Rea...,10 min. to Lucky's and Starbucks at El Camino ...,Pet Policy: - Well Behaved pet up to 25 lb. of...,Public transportation at Homestead and Pomeroy...,Complimentary Wifi-internet + Basic Cable,...,"{""SANTA CLARA""}",f,moderate,f,t,12,12,0,0,0.08
2,17884,2020-05-31,Silicon Valley Suite,"A guest suite for one or two, in a house in a ...",This is a private suite at the rear of a house...,"A guest suite for one or two, in a house in a ...","This is a very quiet family neighborhood, but ...","I cannot accommodate cats, sorry. I can usuall...","The CalTrain station is in walking distance, a...",Private outdoor patio. Shared washer/dryer on ...,...,"{""Mountain View"","" CA""}",f,strict_14_with_grace_period,f,f,2,2,0,0,0.11
3,21373,2020-05-30,Bonsai Garden Inn in Professorville,Room in gracious home with beautiful garden. ...,"Bright, garden-facing room in beautiful home. ...",Room in gracious home with beautiful garden. ...,This room is in an ultra convenient location i...,The family room has a flat panel tv and desk f...,The Stanford Shopping Center is 20 minutes wal...,"Kitchen, Laundry, Garden, Family Room with TV,...",...,"{""PALO ALTO""}",f,moderate,f,f,5,0,5,0,2.17
4,37512,2020-05-31,Private room - Parking 3 carport,We live in a safe community close to public tr...,I have a really nice room in a quiet neighborh...,We live in a safe community close to public tr...,Our community is a safe environment and at nig...,Please remember that you are in my home and re...,We have a bus stop right outside our community...,The kitchen and all that it offers. The laund...,...,"{""SAN JOSE""}",f,moderate,t,t,2,0,2,0,1.53


In [109]:
data_rev.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5734 entries, 0 to 7214
Data columns (total 86 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            5734 non-null   int64  
 1   last_scraped                                  5734 non-null   object 
 2   name                                          5734 non-null   object 
 3   summary                                       5563 non-null   object 
 4   space                                         4571 non-null   object 
 5   description                                   5606 non-null   object 
 6   neighborhood_overview                         3884 non-null   object 
 7   notes                                         3191 non-null   object 
 8   transit                                       3618 non-null   object 
 9   access                                        3642 non-null   o

In [110]:
data_rev.describe()

Unnamed: 0,id,host_id,host_listings_count,host_total_listings_count,latitude,longitude,accommodates,bathrooms,bedrooms,beds,...,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
count,5734.0,5734.0,5733.0,5733.0,5734.0,5734.0,5734.0,5734.0,5729.0,5707.0,...,5649.0,5648.0,5649.0,5648.0,5648.0,5734.0,5734.0,5734.0,5734.0,5707.0
mean,24618250.0,84285940.0,102.008373,102.008373,37.352121,-121.96665,3.115452,1.371992,1.400768,1.816015,...,9.580457,9.81852,9.796955,9.797805,9.531339,26.45361,22.485699,2.877224,1.058598,1.359872
std,12096880.0,90378140.0,405.285405,405.285405,0.065509,0.109539,2.365797,0.678344,0.981305,1.535468,...,0.848019,0.643308,0.691166,0.584477,0.83767,83.43773,83.821405,6.912066,5.849206,1.535631
min,4952.0,7054.0,0.0,0.0,36.9656,-122.18868,1.0,0.0,0.0,0.0,...,2.0,2.0,2.0,2.0,2.0,1.0,0.0,0.0,0.0,0.01
25%,14890020.0,14809140.0,1.0,1.0,37.32029,-122.043607,2.0,1.0,1.0,1.0,...,9.0,10.0,10.0,10.0,9.0,1.0,0.0,0.0,0.0,0.32
50%,25982720.0,48005490.0,3.0,3.0,37.35774,-121.96194,2.0,1.0,1.0,1.0,...,10.0,10.0,10.0,10.0,10.0,3.0,1.0,1.0,0.0,0.83
75%,35007040.0,135143800.0,9.0,9.0,37.39832,-121.886337,4.0,1.5,2.0,2.0,...,10.0,10.0,10.0,10.0,10.0,8.0,2.0,3.0,0.0,1.81
max,43560330.0,347737200.0,2007.0,2007.0,37.46274,-121.38012,16.0,8.0,9.0,26.0,...,10.0,10.0,10.0,10.0,10.0,411.0,411.0,62.0,48.0,13.12


In [111]:
counts = reviews['listing_id'].value_counts()


In [112]:
counts.head()

7476637     489
52786       478
10814836    445
19641513    429
13828514    427
Name: listing_id, dtype: int64
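
The `isin` filter above keeps only listings that have at least one review (5734 of 7221 rows; the remaining 61 of the 5795 reviewed ids do not appear in `data`). If the per-listing review count is wanted as a feature while keeping unreviewed listings, a left merge is one option. The sketch below uses toy frames, and the column name `n_reviews` is hypothetical.

```python
import pandas as pd

# Toy stand-ins for the listings and reviews tables.
listings = pd.DataFrame({"id": [4952, 11464, 99999]})
reviews_toy = pd.DataFrame({"listing_id": [4952, 4952, 4952, 11464]})

# One row per listing_id with its number of reviews.
review_counts = (
    reviews_toy.groupby("listing_id").size().rename("n_reviews").reset_index()
)

# A left merge keeps every listing; listings without reviews get NaN, filled with 0.
merged = listings.merge(review_counts, how="left", left_on="id", right_on="listing_id")
merged["n_reviews"] = merged["n_reviews"].fillna(0).astype(int)
merged = merged.drop(columns="listing_id")

print(merged)
```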

## Fusion with neighborhood data

In [113]:
data_neighbor = data['neighbourhood'].unique().tolist()
neighbors_hood = neighbors['neighbourhood'].unique().tolist()

In [114]:
same_locations = []

for neighbor in neighbors_hood:
    if neighbor in data_neighbor:
        same_locations.append(neighbor)

In [115]:
same_locations

['Campbell',
 'Cupertino',
 'Los Altos',
 'Los Altos Hills',
 'Mountain View',
 'Palo Alto',
 'Santa Clara',
 'Sunnyvale']

In [116]:
data_neighbor

['Palo Alto',
 'Santa Clara',
 'Mountain View',
 'South San Jose',
 'Cupertino',
 'Sunnyvale',
 'Downtown',
 'Campbell',
 'West Valley',
 'Edenvale',
 nan,
 'Willow Glen',
 'Central San Jose',
 'Los Altos',
 'Berryessa',
 'Cory',
 'College Park',
 'Alum Rock',
 'Cambrian/Pioneer',
 'North San Jose',
 'Los Altos Hills',
 'Burbank/Del Monte',
 'Evergreen',
 'Newhall/Sherwood',
 'Shasta/Hanchett Park',
 'Rose Garden',
 'Five Wounds/Brookwood Terrace',
 'Naglee Park',
 'Forest/Pruneridge',
 'Alviso',
 'Japantown',
 'Stanford',
 'Delmas Park',
 'Chapman/Morse',
 'Menlo Park',
 'Autumn/Montgomery',
 'Vermont/McKendrie']

In [117]:
neighbors_hood

['Campbell',
 'Cupertino',
 'Gilroy',
 'Los Altos',
 'Los Altos Hills',
 'Los Gatos',
 'Milpitas',
 'Monte Sereno',
 'Morgan Hill',
 'Mountain View',
 'Palo Alto',
 'San Jose',
 'Santa Clara',
 'Saratoga',
 'Sunnyvale',
 'Unincorporated Areas']
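
The comparison above can also be done with set operations, which additionally show what is missing on each side. This is a sketch on toy Series with made-up contents. Separately, the unique-count listing earlier reports 16 distinct values for `neighbourhood_cleansed`, the same as the number of rows in `neighbors`, so that column may be the one meant to line up with the file; this has not been verified here.

```python
import pandas as pd

# Toy stand-ins for data['neighbourhood'] and neighbors['neighbourhood'].
listing_hoods = pd.Series(["Palo Alto", "Downtown", None, "Campbell", "Willow Glen"])
reference_hoods = pd.Series(["Campbell", "Gilroy", "Palo Alto", "San Jose"])

# Drop NaN before building sets so a missing value is not treated as a name.
in_listings = set(listing_hoods.dropna())
in_reference = set(reference_hoods.dropna())

shared = sorted(in_listings & in_reference)
only_in_listings = sorted(in_listings - in_reference)
only_in_reference = sorted(in_reference - in_listings)

print(shared)
print(only_in_listings)
print(only_in_reference)
```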