## Table of Contents


- [Store CSV into DataFrame](#Store-CSV-into-DataFrame)
- [Data Exploration](#Data-Exploration)
- [Data Clean-Up Process](#Data-Clean-Up-Process)
- [Database Process](#Database-Process)
- [Queries](#Queries)


### [Project Report](https://docs.google.com/document/d/1qSFce4Ubi3k_l5FdNinCG_Jhylltwt4KJD3ewPj3d9c/edit?usp=sharing)

### Libraries

In [17]:
import pandas as pd
from sqlalchemy import create_engine
import psycopg2
from config import password

### Store CSV into DataFrame

In [18]:
# Load the Cambridge listings CSV file into a DataFrame
cambridge_path_listings = "../ETLProject/Resources/Cambridge_Listings1.csv"
cambridge_listings = pd.read_csv(cambridge_path_listings)
cambridge_listings.head()

|   | id | listing_url | scrape_id | last_scraped | name | summary | space | description | experiences_offered | neighborhood_overview | ... | instant_bookable | is_business_travel_ready | cancellation_policy | require_guest_profile_picture | require_guest_phone_verification | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8521 | https://www.airbnb.com/rooms/8521 | 20191126082128 | 2019-11-26 | SunsplashedSerenity near Harvard $1300/wk flex... | An elegant, sun-splashed, 2 bedroom (+2offices... | An elegant, sun-splashed, 2+ bedroom apartment... | An elegant, sun-splashed, 2 bedroom (+2offices... | none | Huron Village is known for its charm. We have... | ... | f | f | strict_14_with_grace_period | f | f | 3 | 3 | 0 | 0 | 0.26 |
| 1 | 11169 | https://www.airbnb.com/rooms/11169 | 20191126082128 | 2019-11-26 | Lovely Studio Room: Thu-Mons Near Universities! | Large sunny room which comfortably fits a coup... | We have a peaceful, large, sunny room w/ attac... | Large sunny room which comfortably fits a coup... | none | The neighborhood is quiet and friendly and our... | ... | f | f | strict_14_with_grace_period | t | t | 4 | 1 | 3 | 0 | 1.22 |
| 2 | 11945 | https://www.airbnb.com/rooms/11945 | 20191126082128 | 2019-11-26 | Near Harvard: Safe & Lovely Room | Room next to kitchen and living room in wonder... | Quiet peaceful room w/shared marble bath, wifi... | Room next to kitchen and living room in wonder... | none | Amazing neighborhood: Quiet yet close walk to ... | ... | f | f | strict_14_with_grace_period | t | t | 4 | 1 | 3 | 0 | 0.27 |
| 3 | 19581 | https://www.airbnb.com/rooms/19581 | 20191126082128 | 2019-11-26 | Furnished suite, Windsor | Welcome to Area IV! We are located, convenient... | Furnished suite at the Windsor Inn, Cambridge.... | Welcome to Area IV! We are located, convenient... | none |  | ... | f | f | strict_14_with_grace_period | f | f | 3 | 0 | 3 | 0 | 0.05 |
| 4 | 22006 | https://www.airbnb.com/rooms/22006 | 20191126082128 | 2019-11-26 | B & B near Harvard's Quad Houses | Two comfortable guest rooms in quiet, tree-fil... | Comfortable, convenient B&B at the north end o... | Two comfortable guest rooms in quiet, tree-fil... | none | We're in a beautiful neighborhood, with nearby... | ... | f | f | moderate | f | f | 1 | 0 | 1 | 0 | 0.89 |


In [19]:
# Load the Cambridge reviews CSV file into a DataFrame
cambridge_path_reviews = "../ETLProject/Resources/Cambridge_Reviews.csv"
cambridge_reviews = pd.read_csv(cambridge_path_reviews)
cambridge_reviews.head()

|   | listing_id | id | date | reviewer_id | reviewer_name | comments |
|---|---|---|---|---|---|---|
| 0 | 8521 | 6009 | 2009-07-23 | 25629 | Mara | This is a fabulous apartment! Great neighborh... |
| 1 | 8521 | 6386 | 2009-07-30 | 26357 | Stephanie | Wonderful host and hostess, great home, locati... |
| 2 | 8521 | 6910163 | 2013-08-30 | 5747319 | Lynette | We had a delightful 2 week stay in Cambridge. ... |
| 3 | 8521 | 37307811 | 2015-07-06 | 27670573 | Salina | We had a great time during our short stay in C... |
| 4 | 8521 | 38475589 | 2015-07-16 | 10811805 | Benoni | Janet's place is as beautiful as portrayed. Ni... |


### Data Exploration

In [20]:
# Data Types - Listings
cambridge_listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1277 entries, 0 to 1276
Columns: 106 entries, id to reviews_per_month
dtypes: float64(22), int64(22), object(62)
memory usage: 1.0+ MB


In [21]:
# Looking for Missing Values --> Listings
cambridge_listings.isnull().sum()

id                                                 0
listing_url                                        0
scrape_id                                          0
last_scraped                                       0
name                                               0
summary                                           29
space                                            263
description                                       10
experiences_offered                                0
neighborhood_overview                            339
notes                                            623
transit                                          339
access                                           506
interaction                                      435
house_rules                                      430
thumbnail_url                                   1277
medium_url                                      1277
picture_url                                        0
xl_picture_url                                

In [22]:
# Data Types - Reviews
cambridge_reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64199 entries, 0 to 64198
Data columns (total 6 columns):
listing_id       64199 non-null int64
id               64199 non-null int64
date             64199 non-null object
reviewer_id      64199 non-null int64
reviewer_name    64197 non-null object
comments         64169 non-null object
dtypes: int64(3), object(3)
memory usage: 2.9+ MB
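
The `info()` output above reports the `date` column as `object`, meaning the review dates are held as plain strings. A minimal sketch of converting them to real datetimes follows. It assumes every value uses the `YYYY-MM-DD` layout seen in the preview; the `reviews` frame and `reviews_per_year` name are illustrative stand-ins, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for cambridge_reviews, using dates shown in the preview.
reviews = pd.DataFrame({
    "listing_id": [8521, 8521, 8521],
    "date": ["2009-07-23", "2009-07-30", "2013-08-30"],
})

# Convert the string column to datetime64. With the default errors="raise",
# a malformed date stops the conversion instead of silently becoming NaT.
reviews["date"] = pd.to_datetime(reviews["date"], format="%Y-%m-%d")

# A datetime column supports date-based grouping, e.g. reviews per year.
reviews_per_year = reviews.groupby(reviews["date"].dt.year).size()
```

The same effect could likely be had at load time by passing `parse_dates=["date"]` to `pd.read_csv`.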


In [23]:
# Looking for Missing Values --> Reviews
cambridge_reviews.isnull().sum()

listing_id        0
id                0
date              0
reviewer_id       0
reviewer_name     2
comments         30
dtype: int64
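
Raw null counts are easier to compare as a share of rows. In the listings output, for instance, `thumbnail_url` and `medium_url` are missing in all 1277 rows. The sketch below ranks columns by their missing share and drops those that are entirely empty. The `listings` frame is a toy stand-in with made-up values, and whether to drop such columns is a judgment call the notebook itself does not make.

```python
import pandas as pd

# Toy frame standing in for cambridge_listings; values are illustrative only.
listings = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "summary": ["a", None, "c", "d"],
    "thumbnail_url": [None, None, None, None],
})

# Fraction of missing values per column, highest first.
missing_share = listings.isnull().mean().sort_values(ascending=False)

# Names of columns that contain no values at all.
empty_columns = missing_share[missing_share == 1.0].index.tolist()

# Drop the columns that are entirely empty.
listings_trimmed = listings.dropna(axis="columns", how="all")
```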

### Data Clean-Up Process

In [24]:
# Split the listings into one narrower DataFrame per target table
listings_locations_df = cambridge_listings[['id', 'city', 'state','zipcode','latitude','longitude']].copy()
listing_details_df = cambridge_listings[['id', 'name', 'summary', 'property_type','bathrooms','bedrooms', 'beds']].copy()
Url_df = cambridge_listings[['id', 'listing_url', 'picture_url']].copy()
hosts_df = cambridge_listings[['id', 'host_id', 'host_url','host_name','host_since','host_location','host_response_rate']].copy()

In [25]:
# Preview Listing Locations
listings_locations_df.head()

|   | id | city | state | zipcode | latitude | longitude |
|---|---|---|---|---|---|---|
| 0 | 8521 | Cambridge | MA | 2138 | 42.38329 | -71.13617 |
| 1 | 11169 | Cambridge | MA | 2140 | 42.39469 | -71.13223 |
| 2 | 11945 | Cambridge | MA | 2140 | 42.39454 | -71.13431 |
| 3 | 19581 | Cambridge | MA | 2139 | 42.36276 | -71.09765 |
| 4 | 22006 | Cambridge | MA | 2140 | 42.3867 | -71.12387 |
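
The preview shows `zipcode` values such as `2138`, while Cambridge ZIP codes are five digits with a leading zero (`02138`). This suggests the column was read as a number. A minimal sketch of restoring the padding follows, assuming the column holds whole integers with no missing values; the notebook does not show the column's dtype, so that is an assumption, and the `locations` frame is a stand-in.

```python
import pandas as pd

# Toy stand-in for listings_locations_df with integer ZIP codes.
locations = pd.DataFrame({"id": [8521, 11169], "zipcode": [2138, 2140]})

# Render as text and left-pad with zeros to five characters.
# Caveat: a float column would render as "2138.0" and need cleaning first.
locations["zipcode"] = locations["zipcode"].astype(str).str.zfill(5)
```

Reading the CSV with `dtype={"zipcode": str}` may avoid the problem at the source, provided the file itself still contains the leading zeros.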


In [26]:
# Preview Listing Details
listing_details_df.head()

|   | id | name | summary | property_type | bathrooms | bedrooms | beds |
|---|---|---|---|---|---|---|---|
| 0 | 8521 | SunsplashedSerenity near Harvard $1300/wk flex... | An elegant, sun-splashed, 2 bedroom (+2offices... | Apartment | 1.0 | 2 | 2.0 |
| 1 | 11169 | Lovely Studio Room: Thu-Mons Near Universities! | Large sunny room which comfortably fits a coup... | House | 1.0 | 1 | 1.0 |
| 2 | 11945 | Near Harvard: Safe & Lovely Room | Room next to kitchen and living room in wonder... | Condominium | 1.0 | 1 | 1.0 |
| 3 | 19581 | Furnished suite, Windsor | Welcome to Area IV! We are located, convenient... | Bed and breakfast | 1.0 | 1 | 1.0 |
| 4 | 22006 | B & B near Harvard's Quad Houses | Two comfortable guest rooms in quiet, tree-fil... | House | 2.5 | 1 | 1.0 |


In [11]:
# Preview Url
Url_df.head()

|   | id | listing_url | picture_url |
|---|---|---|---|
| 0 | 8521 | https://www.airbnb.com/rooms/8521 | https://a0.muscache.com/im/pictures/30536/072e... |
| 1 | 11169 | https://www.airbnb.com/rooms/11169 | https://a0.muscache.com/im/pictures/75383179/7... |
| 2 | 11945 | https://www.airbnb.com/rooms/11945 | https://a0.muscache.com/im/pictures/88bf993e-1... |
| 3 | 19581 | https://www.airbnb.com/rooms/19581 | https://a0.muscache.com/im/pictures/188f1b4b-f... |
| 4 | 22006 | https://www.airbnb.com/rooms/22006 | https://a0.muscache.com/im/pictures/10277743/5... |


In [12]:
# Preview Host Info
hosts_df.head()

|   | id | host_id | host_url | host_name | host_since | host_location | host_response_rate |
|---|---|---|---|---|---|---|---|
| 0 | 8521 | 306681 | https://www.airbnb.com/users/show/306681 | Janet | 2010-12-01 | Cambridge, Massachusetts, United States | 100% |
| 1 | 11169 | 40965 | https://www.airbnb.com/users/show/40965 | Mazzy | 2009-09-24 | Cambridge, Massachusetts, United States | 100% |
| 2 | 11945 | 40965 | https://www.airbnb.com/users/show/40965 | Mazzy | 2009-09-24 | Cambridge, Massachusetts, United States | 100% |
| 3 | 19581 | 74249 | https://www.airbnb.com/users/show/74249 | Marc And Patty | 2010-01-27 | Cambridge, Massachusetts, United States | 100% |
| 4 | 22006 | 84280 | https://www.airbnb.com/users/show/84280 | Blue | 2010-02-22 | Cambridge, Massachusetts, United States | 100% |
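
Two things stand out in the host preview: host `40965` appears once per listing (11169 and 11945), and `host_response_rate` is text such as `100%`. The sketch below shows one possible clean-up, assuming a one-row-per-host table is wanted. The notebook itself keeps one row per listing, so this is an alternative rather than the author's approach, and `unique_hosts` is a hypothetical name.

```python
import pandas as pd

# Toy stand-in for hosts_df, built from rows visible in the preview.
hosts = pd.DataFrame({
    "id": [8521, 11169, 11945],
    "host_id": [306681, 40965, 40965],
    "host_name": ["Janet", "Mazzy", "Mazzy"],
    "host_response_rate": ["100%", "100%", "100%"],
})

# Keep one row per host: drop the listing id, then duplicate host_id rows.
unique_hosts = hosts.drop(columns="id").drop_duplicates(subset="host_id").copy()

# Strip the percent sign and convert to a number; values that cannot be
# parsed become NaN because of errors="coerce".
unique_hosts["host_response_rate"] = pd.to_numeric(
    unique_hosts["host_response_rate"].str.rstrip("%"), errors="coerce"
)
```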


### Database Process

In [13]:
# Connect to local database
engine = create_engine(f'postgresql://postgres:{password}@localhost:5432/Airbnb')
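
Interpolating the password straight into the URL works only while it contains no characters that are special in URLs, such as `@` or `/`. A minimal sketch of percent-encoding the credentials with the standard library follows; `build_postgres_url` is a hypothetical helper and the example password is made up.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, port, database):
    """Return a postgresql:// URL with the credentials percent-encoded."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical password containing characters that are special in URLs.
example_url = build_postgres_url("postgres", "p@ss/word", "localhost", 5432, "Airbnb")
```

The result can be passed to `create_engine` in place of the f-string, for example `create_engine(build_postgres_url("postgres", password, "localhost", 5432, "Airbnb"))`.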

In [14]:
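# List the tables that already exist in the database
# (engine.table_names() is deprecated in SQLAlchemy 1.4 and removed in 2.0)
from sqlalchemy import inspect
inspect(engine).get_table_names()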

['hosts', 'listings_locations', 'listings_details', 'url']

In [None]:
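# Load each DataFrame into its matching table. The tables already exist,
# so if_exists='append' is needed; the default ('fail') raises a ValueError.
# Re-running these cells appends the same rows again.
listings_locations_df.to_sql(name='listings_locations', con=engine, index=False, if_exists='append')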


In [None]:
listing_details_df.to_sql(name='listings_details', con=engine, index=False, if_exists='append')


In [None]:
Url_df.to_sql(name='url', con=engine, index=False, if_exists='append')


In [None]:
hosts_df.to_sql(name='hosts', con=engine, index=False, if_exists='append')
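
After a load it can help to confirm that each table received as many rows as its DataFrame holds. The sketch below does this in a loop. To stay self-contained it uses an in-memory SQLite connection as a stand-in for the Postgres engine, and the frames are toy data; both are assumptions for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the Postgres engine used in the notebook.
conn = sqlite3.connect(":memory:")

# Toy frames keyed by target table name.
frames = {
    "listings_locations": pd.DataFrame(
        {"id": [8521, 11169], "city": ["Cambridge", "Cambridge"]}
    ),
    "hosts": pd.DataFrame({"id": [8521, 11169], "host_id": [306681, 40965]}),
}

row_counts = {}
for table_name, frame in frames.items():
    # Append the rows, creating the table if it does not exist yet.
    frame.to_sql(name=table_name, con=conn, index=False, if_exists="append")
    # Read the row count back from the database and compare.
    loaded = pd.read_sql(f"SELECT COUNT(*) AS n FROM {table_name}", conn)
    row_counts[table_name] = int(loaded["n"].iloc[0])
    assert row_counts[table_name] == len(frame)

conn.close()
```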

### Queries
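
As an illustration of the kind of query the loaded tables support, the sketch below counts listings per host by joining `hosts` to `listings_locations` on the listing `id`. It runs against an in-memory SQLite stand-in with toy rows rather than the Postgres database, and the query is an example, not one taken from the project.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the Postgres database.
conn = sqlite3.connect(":memory:")

# Toy rows mirroring the column names used in the notebook.
pd.DataFrame({
    "id": [8521, 11169, 11945],
    "host_id": [306681, 40965, 40965],
    "host_name": ["Janet", "Mazzy", "Mazzy"],
}).to_sql("hosts", conn, index=False)
pd.DataFrame({
    "id": [8521, 11169, 11945],
    "city": ["Cambridge", "Cambridge", "Cambridge"],
}).to_sql("listings_locations", conn, index=False)

# Count listings per host, joining on the listing id shared by both tables.
query = """
SELECT h.host_name, COUNT(*) AS listing_count
FROM hosts AS h
JOIN listings_locations AS l ON l.id = h.id
GROUP BY h.host_id, h.host_name
ORDER BY listing_count DESC, h.host_name
"""
listings_per_host = pd.read_sql(query, conn)
conn.close()
```

Against the real database, the same SQL could be passed to `pd.read_sql` with the notebook's `engine` as the connection.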