## Part 1 - Scenario 1 – Chicago Airbnb

You and a group of friends are considering purchasing a property in Chicago as an investment. You have heard that other people have made a lot of money by renting out either a room or an entire unit (apartment or house). Your friends ask you to analyze data so that they can understand how much they could charge per night based on the type of dwelling they purchase.

In [1]:
#Loading packages
import seaborn as sns
import pandas as pd
import numpy as np
import string
import sqlite3
from sqlalchemy import create_engine

import warnings
warnings.filterwarnings('ignore')

We will start by importing the data from a CSV file.

In [2]:
#Importing dataset from csv
df = pd.read_csv('listings.csv')
df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2384,"Hyde Park - Walk to UChicago, 10 min to McCormick",2613,Rebecca,,Hyde Park,41.7879,-87.5878,Private room,60,2,178,2019-12-15,2.56,1,353
1,4505,394 Great Reviews. 127 y/o House. 40 yds to tr...,5775,Craig & Kathleen,,South Lawndale,41.85495,-87.69696,Entire home/apt,105,2,395,2020-07-14,2.81,1,155
2,7126,Tiny Studio Apartment 94 Walk Score,17928,Sarah,,West Town,41.90289,-87.68182,Entire home/apt,60,2,384,2020-03-08,2.81,1,321
3,9811,Barbara's Hideaway - Old Town,33004,At Home Inn,,Lincoln Park,41.91769,-87.63788,Entire home/apt,65,4,49,2019-10-23,0.63,9,300
4,10610,3 Comforts of Cooperative Living,2140,Lois,,Hyde Park,41.79612,-87.59261,Private room,21,1,44,2020-02-14,0.61,5,168


We will check the shape of the data to confirm the number of rows before extracting 100 rows from the dataset.

In [3]:
#Looking at the data shape
df.shape

(6397, 16)

We will create a new dataset from 100 randomly sampled rows of our listings dataset.

In [4]:
#Sample 100 random rows into a new dataframe (saved to csv in a later cell)
split100 = df.sample(n=100)
split100.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
3233,30284277,"City Flat; Sleeps 6; Steps to Zoo, Near Downtown",142047837,Kate,,Lincoln Park,41.9207,-87.64236,Entire home/apt,154,3,48,2020-03-27,2.4,2,0
1793,20235190,302 Sweet Sensation room Andersonville Free wifi,127235673,Alex,,Edgewater,41.98532,-87.67212,Private room,36,1,37,2020-07-23,0.97,11,351
1518,18123049,Fun & Funky Musician Condo (Entire Home),36370758,Rebecca,,Irving Park,41.96015,-87.71004,Entire home/apt,120,3,16,2019-08-12,0.39,1,0
2792,27398743,"Modern Urban Suite - West Town, Centrally Located",29622436,Dina,,West Town,41.8955,-87.66042,Entire home/apt,79,2,81,2020-08-29,3.11,1,0
2498,25184769,Sonder | 943 Crosby | Lively 1BR,12243051,Sonder,,Near North Side,41.89917,-87.64299,Entire home/apt,84,30,4,2020-09-05,0.2,47,79
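Because `df.sample(n=100)` draws a different set of rows on every run, the hold-out file changes whenever the notebook is re-executed. A minimal sketch of a reproducible draw follows, assuming a fixed seed is acceptable for this exercise. The `toy` dataframe and the seed value 42 are illustrative stand-ins, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the listings data (hypothetical values, not the real file)
toy = pd.DataFrame({
    "id": range(1000),
    "price": [50 + (i % 200) for i in range(1000)],
})

# A fixed random_state makes pandas return the same 100 rows on every run
sample_a = toy.sample(n=100, random_state=42)
sample_b = toy.sample(n=100, random_state=42)

# Both draws select exactly the same index labels, in the same order
same_rows = sample_a.index.equals(sample_b.index)
print(len(sample_a), same_rows)
```

Applied to the notebook, this would amount to passing `random_state` to the existing `df.sample` call. Whether a fixed seed is wanted depends on how the hold-out rows are used later.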


We will check the shape of split100 to confirm it has 100 rows.

In [5]:
#Looking at the shape
split100.shape

(100, 16)

We will save split100 to a CSV file.

In [6]:
#Saving a csv
split100.to_csv('split100.csv', index=False)

Now, we will remove the 100 sampled rows from our dataframe and save the remaining rows in the "RawData" schema.

In [7]:
#Remove the split100 from the original dataset
df = df.drop(split100.index)

In [8]:
#Ensuring that the 100 rows have been removed
df.shape

(6297, 16)
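The shape check confirms that 100 rows are gone, but not that they are the same 100 rows that were sampled. A short sketch of a stricter check is shown here on a toy dataframe. The helper name `split_holdout` and the toy data are hypothetical. Note that dropping by index only works as intended when the index labels are unique, which holds for the default integer index produced by `pd.read_csv`.

```python
import pandas as pd

def split_holdout(frame, n, seed=None):
    """Return (holdout, remaining), where holdout is n random rows of frame."""
    holdout = frame.sample(n=n, random_state=seed)
    # Drop by index label; assumes the index of frame is unique
    remaining = frame.drop(holdout.index)
    return holdout, remaining

# Toy stand-in for the listings data (hypothetical values)
toy = pd.DataFrame({"id": range(500)})
holdout, remaining = split_holdout(toy, n=100, seed=0)

# The two parts should share no rows and together cover the original
overlap = remaining.index.intersection(holdout.index)
print(len(holdout), len(remaining), len(overlap))
```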

Next, we will save the remainder of our dataframe to PostgreSQL.

In [9]:
#Information from PostgreSQL
host = r'localhost' 
db = r'MSDS610' 
user = r'postgres' 
pw = r'82328' 
port = r'5432' 
schema = r'RawData' #Must match the schema name used when writing the table (the comparison is case-sensitive)

In [10]:
#Creating a connection
db_conn = create_engine("postgresql://{}:{}@{}:{}/{}".format(user, pw, host, port, db))
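Formatting the connection string directly works for simple credentials, but a password containing characters such as `@`, `:` or `/` would break the URL, and a hard-coded password ends up in the saved notebook. The sketch below is one possible alternative, not the notebook's method: it reads the settings from environment variables and percent-encodes the user and password. The helper name `build_pg_url` is hypothetical, and the use of the `PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT` and `PGDATABASE` variables is an assumption about how the environment is set up.

```python
import os
from urllib.parse import quote_plus

def build_pg_url(user, password, host, port, database):
    """Build a PostgreSQL URL, escaping special characters in the credentials."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )

# Fall back to the notebook's values when a variable is not set;
# the password has no default so it never needs to be written in the notebook
url = build_pg_url(
    user=os.environ.get("PGUSER", "postgres"),
    password=os.environ.get("PGPASSWORD", ""),
    host=os.environ.get("PGHOST", "localhost"),
    port=os.environ.get("PGPORT", "5432"),
    database=os.environ.get("PGDATABASE", "MSDS610"),
)
```

The resulting string could then be passed to `create_engine(url)` in place of the formatted literal.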

In [11]:
#Listing the tables in the schema
sql = "select tables.table_name from information_schema.tables where (table_schema = '" + schema + "') order by 1;"
tbl_df = pd.read_sql(sql, db_conn, index_col=None)
tbl_df

Unnamed: 0,table_name


In [12]:
#Table name
table_name= r'listings'

In [13]:
schema = r'RawData' 

df.to_sql(table_name, con=db_conn, if_exists='replace', index=False, schema=schema, chunksize=1000, method='multi')

6297
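The value returned by `to_sql` reports the number of rows written, and reading the table back is a simple way to confirm it. The sketch below illustrates that round trip with an in-memory SQLite database from the standard library, swapped in for PostgreSQL so that it runs without a server. The toy rows are hypothetical, and the `schema` argument is omitted because it does not apply to this SQLite setup.

```python
import sqlite3
import pandas as pd

# Toy stand-in for the listings data (hypothetical rows)
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "room_type": ["Private room", "Entire home/apt", "Private room"],
    "price": [60, 105, 21],
})

# In-memory SQLite database used here in place of the PostgreSQL engine
conn = sqlite3.connect(":memory:")
toy.to_sql("listings", con=conn, if_exists="replace", index=False)

# Read the table back and compare the row count with the source dataframe
row_count = pd.read_sql("SELECT COUNT(*) AS n FROM listings", conn)["n"].iloc[0]
round_trip = pd.read_sql("SELECT * FROM listings", conn)
conn.close()

print(row_count == len(toy))
```

Against the real database, the same idea would be a `SELECT COUNT(*)` on the listings table in the "RawData" schema, compared with `df.shape[0]`.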

The dataframe is now saved as the listings table in the "RawData" schema.

<img align="left" style="padding-right:15px;" src="rawdata-screenshot.png" width=350><br>