# 4 Pre-Processing and Training Data<a id='4_Pre-Processing_and_Training_Data'></a>

## 4.1 Contents<a id='4.1_Contents'></a>
* [4 Pre-Processing and Training Data](#4_Pre-Processing_and_Training_Data)
  * [4.1 Contents](#4.1_Contents)
  * [4.2 Introduction](#4.2_Introduction)
  * [4.3 Imports](#4.3_Imports)
  * [4.4 Load Data](#4.4_Load_Data)
  * [4.5 Create dummy features for room_type, bathrooms, and home_type](#4.5_Create_dummy_features_for_room_type,_ bathrooms,_home_type)
  * [4.6 Standardize numeric features using a scaler](#4.6_Standardize_numeric_features_using_a_scaler)
  * [4.7 Train/Test Split](#4.7_Train/Test_Split)

## 4.2 Introduction<a id='4.2_Introduction'></a>

In preceding notebooks, you performed preliminary assessments of data quality and refined the question to be answered. You found a small number of data values that gave clear choices about whether to replace values or drop a whole row. You determined that predicting the price was your primary aim. You threw away records with missing price data, but not before making the most of the other available data to look for any patterns between the regions. You didn't see any and decided to treat all states equally; the region label didn't seem to be particularly useful.

In this notebook you'll start to build machine learning models. Before fitting any model, however, start by considering how useful the mean value is as a predictor. This is more than just a pedagogical device. You never want to go to stakeholders with a machine learning model only to have the CEO point out that it performs worse than just guessing the average! Your first model is a baseline performance comparator for any subsequent model. You then build up the process of efficiently and robustly creating and assessing models against it. The development laid out here may be a little slower than in the real world, but this step of the capstone is definitely more than just instructional. It is good practice to build up an understanding that the machine learning pipelines you build work as expected. You can validate steps with your own functions for checking expected equivalence between, say, pandas and sklearn implementations.
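
As a minimal sketch of the baseline idea, the snippet below checks that sklearn's `DummyRegressor` with `strategy="mean"` predicts exactly the mean that pandas computes. The values in `y_toy` and the names `X_toy` and `baseline` are made up for illustration and are not taken from the Airbnb data.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor

# Hypothetical toy target values (not from the Airbnb data set)
y_toy = pd.Series([100.0, 150.0, 200.0, 250.0], name="price")
X_toy = np.zeros((len(y_toy), 1))  # DummyRegressor ignores the features

baseline = DummyRegressor(strategy="mean")
baseline.fit(X_toy, y_toy)

pandas_mean = y_toy.mean()
sklearn_pred = baseline.predict(X_toy)

# Every prediction should equal the mean of the training target
print(pandas_mean, sklearn_pred)
```

Any later model can then be compared against this constant prediction using the same metric.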

## 4.3 Imports<a id='4.3_Imports'></a>

In [1]:
import pandas as pd
import numpy as np
import os
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import __version__ as sklearn_version
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
from sklearn.model_selection import train_test_split, cross_validate, GridSearchCV, learning_curve
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, f_regression
import datetime
from sklearn import preprocessing

#from library.db_utils import save_file

## 4.4 Load Data<a id='4.4_Load_Data'></a>

In [2]:
airbnb_data_cleaned_v2 = pd.read_csv('airbnb_data_cleaned_v2.csv')
neighbourhood_summary_v2 = pd.read_csv('neighbourhood_summary_v2.csv')
airbnb_data_cleaned_v2.head().T

Unnamed: 0,0,1,2,3,4
Unnamed: 0,0,1,2,3,4
neighbourhood_cleansed,North Park Hill,Hale,Five Points,West Colfax,Sunnyside
Unnamed: 0_x,0,1,3,4,5
id,607000000000000000.0,546000000000000000.0,52429527.0,632000000000000000.0,688000000000000000.0
description,Home in Denver · 1 bedroom · 1 bed · 1 shared ...,Rental unit in Denver · 2 bedrooms · 3 beds · ...,Townhouse in Denver · ★4.78 · 3 bedrooms · 4 b...,Townhouse in Denver · ★New · 2 bedrooms · 2 be...,Home in Denver · ★5.0 · 2 bedrooms · 2 beds · ...
host_id,430149575,169214047,107279139,416194740,133612752
host_name,Roye,Jerrod,Kyle And Kimberly,Clayton,Ryan
host_since,11/2/2021,1/22/2018,12/14/2016,7/31/2021,6/5/2017
host_location,,"Chicago, IL","Denver, CO","Colorado, United States","Denver, CO"
host_response_rate,100%,,100%,100%,


The leftover index columns (`Unnamed: 0`, `Unnamed: 0_x`, `Unnamed: 0_y`) carry no information, so they are dropped here.

In [3]:
airbnb_data_cleaned_v2 = airbnb_data_cleaned_v2.drop(columns=['Unnamed: 0', 'Unnamed: 0_x', 'Unnamed: 0_y'])
airbnb_data_cleaned_v2.head(10)

Unnamed: 0,neighbourhood_cleansed,id,description,host_id,host_name,host_since,host_location,host_response_rate,host_acceptance_rate,host_neighbourhood,...,reviews_per_month,home_type,neighborhood_cleansed_id_number,neighborhood_cleansed_price,denver_neighborhoods_population,region_area_sq_miles,id_review_scores_rating_ac_region_ratio,id_review_scores_location_ac_region_ratio,id_bedrooms_count_ac_mean_region_ratio,id_number_of_reviews_ac_region_ratio
0,North Park Hill,6.07e+17,Home in Denver · 1 bedroom · 1 bed · 1 shared ...,430149575,Roye,11/2/2021,,100%,75%,Congress Park,...,0.11,Home,73.0,158.577465,9382.0,1.52,0.616353,0.819121,0.376289,0.000303
1,Hale,5.46e+17,Rental unit in Denver · 2 bedrooms · 3 beds · ...,169214047,Jerrod,1/22/2018,"Chicago, IL",,0%,East,...,,Rental,63.0,116.065574,6936.0,0.73,,,1.657895,0.0
2,Five Points,52429530.0,Townhouse in Denver · ★4.78 · 3 bedrooms · 4 b...,107279139,Kyle And Kimberly,12/14/2016,"Denver, CO",100%,100%,South,...,2.52,Townhouse,415.0,174.136585,12712.0,0.96,0.984566,1.02811,1.649899,0.00194
3,West Colfax,6.32e+17,Townhouse in Denver · ★New · 2 bedrooms · 2 be...,416194740,Clayton,7/31/2021,"Colorado, United States",100%,100%,West,...,,Townhouse,226.0,196.633028,9740.0,1.22,,,0.786713,0.0
4,Sunnyside,6.88e+17,Home in Denver · ★5.0 · 2 bedrooms · 2 beds · ...,133612752,Ryan,6/5/2017,"Denver, CO",,91%,Northwest,...,0.99,Home,144.0,189.361702,9726.0,1.32,1.01833,1.004938,0.773842,0.001195
5,Jefferson Park,43316440.0,Townhouse in Denver · ★4.96 · 3 bedrooms · 5 b...,299373263,Dania,10/1/2019,,100%,95%,Jefferson Park,...,0.67,Townhouse,96.0,298.126316,2552.0,0.45,1.026378,1.021534,2.281553,0.004892
6,Chaffee Park,53892390.0,Guesthouse in Denver · ★4.97 · 1 bedroom · 1 b...,436546995,Conor,12/18/2021,"Denver, CO",100%,100%,Chaffee Park,...,2.61,Guesthouse,42.0,172.0,3874.0,0.87,1.010487,1.018397,0.428571,0.027089
7,Five Points,8.42e+17,Home in Denver · ★4.88 · 3 bedrooms · 6 beds ·...,456393682,Jasmine,4/27/2022,"Denver, CO",100%,98%,Five Points,...,2.14,Home,415.0,174.136585,12712.0,0.96,1.005164,1.030195,2.474849,0.000456
8,Sunnyside,7.18e+17,Home in Denver · 2 bedrooms · 2 beds · 2 baths,110328442,David,1/7/2017,"Denver, CO",,,Northwest,...,,Home,144.0,189.361702,9726.0,1.32,,,0.773842,0.0
9,West Colfax,9.92e+17,Rental unit in Denver · Studio · 1 bed · 1 bath,263502162,Landing,5/22/2019,"San Francisco, CA",91%,98%,Five Points South,...,,Rental,226.0,196.633028,9740.0,1.22,,,0.393357,0.0
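
Columns named `Unnamed: 0` usually appear when a DataFrame is written with `to_csv` using the default `index=True` and then read back without `index_col`; the `_x` and `_y` suffixes are what `merge` appends to clashing column names. Whether that is how these particular files were produced is an assumption. The sketch below reproduces the effect on a made-up frame (`df_toy` is hypothetical) and shows that `index=False` avoids it.

```python
import io
import pandas as pd

# Hypothetical toy frame, used only to demonstrate the round trip
df_toy = pd.DataFrame({"price": [100.0, 150.0], "bedrooms": [1, 2]})

# Default to_csv writes the index as an unnamed first column
buf_with_index = io.StringIO()
df_toy.to_csv(buf_with_index)
buf_with_index.seek(0)
reloaded_with_index = pd.read_csv(buf_with_index)

# Writing with index=False leaves only the real columns
buf_no_index = io.StringIO()
df_toy.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
reloaded_no_index = pd.read_csv(buf_no_index)

print(list(reloaded_with_index.columns))  # ['Unnamed: 0', 'price', 'bedrooms']
print(list(reloaded_no_index.columns))    # ['price', 'bedrooms']
```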


In [4]:
airbnb_data_cleaned_v2.head().T

Unnamed: 0,0,1,2,3,4
neighbourhood_cleansed,North Park Hill,Hale,Five Points,West Colfax,Sunnyside
id,607000000000000000.0,546000000000000000.0,52429527.0,632000000000000000.0,688000000000000000.0
description,Home in Denver · 1 bedroom · 1 bed · 1 shared ...,Rental unit in Denver · 2 bedrooms · 3 beds · ...,Townhouse in Denver · ★4.78 · 3 bedrooms · 4 b...,Townhouse in Denver · ★New · 2 bedrooms · 2 be...,Home in Denver · ★5.0 · 2 bedrooms · 2 beds · ...
host_id,430149575,169214047,107279139,416194740,133612752
host_name,Roye,Jerrod,Kyle And Kimberly,Clayton,Ryan
host_since,11/2/2021,1/22/2018,12/14/2016,7/31/2021,6/5/2017
host_location,,"Chicago, IL","Denver, CO","Colorado, United States","Denver, CO"
host_response_rate,100%,,100%,100%,
host_acceptance_rate,75%,0%,100%,100%,91%
host_neighbourhood,Congress Park,East,South,West,Northwest


In [5]:
neighbourhood_summary_v2 = neighbourhood_summary_v2.drop(columns=['Unnamed: 0.1', 'Unnamed: 0'])
neighbourhood_summary_v2.head()

Unnamed: 0,neighbourhood_cleansed,neighborhood_cleansed_id_number,neighborhood_cleansed_review_scores_rating,neighborhood_cleansed_review_scores_location,neighborhood_cleansed_bedrooms,neighborhood_cleansed_number_of_reviews,neighborhood_cleansed_price,denver_neighborhoods_population,region_area_sq_miles
0,Athmar Park,55,4.832766,4.713191,2.444444,1952,142.472727,8898,1.53
1,Auraria,4,4.996667,4.996667,1.75,90,163.75,705,0.32
2,Baker,118,4.815963,4.84633,1.846154,6018,122.119658,4879,1.26
3,Barnum,34,4.769259,4.658148,2.176471,1043,121.264706,6111,1.47
4,Barnum West,26,4.834545,4.713182,2.115385,1330,115.230769,5376,0.74


In [6]:
airbnb_data_cleaned_v2.shape

(4889, 41)

In [7]:
airbnb_data_cleaned_v2.columns

Index(['neighbourhood_cleansed', 'id', 'description', 'host_id', 'host_name',
       'host_since', 'host_location', 'host_response_rate',
       'host_acceptance_rate', 'host_neighbourhood', 'latitude', 'longitude',
       'property_type', 'room_type', 'bathrooms', 'bedrooms', 'price',
       'minimum_minimum_nights', 'maximum_maximum_nights',
       'calendar_last_scraped', 'number_of_reviews', 'first_review',
       'last_review', 'last_scraped', 'review_scores_rating',
       'review_scores_accuracy', 'review_scores_cleanliness',
       'review_scores_checkin', 'review_scores_communication',
       'review_scores_location', 'review_scores_value', 'reviews_per_month',
       'home_type', 'neighborhood_cleansed_id_number',
       'neighborhood_cleansed_price', 'denver_neighborhoods_population',
       'region_area_sq_miles', 'id_review_scores_rating_ac_region_ratio',
       'id_review_scores_location_ac_region_ratio',
       'id_bedrooms_count_ac_mean_region_ratio',
       'id_number_

## 4.5 Create dummy features for room_type, bathrooms, and home_type<a id='4.5_Create_dummy_features_for_room_type,_ bathrooms,_home_type'></a>

In [8]:
scaled_airbnb_data_cleaned_v2 = airbnb_data_cleaned_v2

In [9]:
scaled_airbnb_data_cleaned_v2.drop(columns=['host_name', 'host_since', 'host_location', 'host_response_rate',
                                                     'host_acceptance_rate', 'property_type', 'minimum_minimum_nights', 
                                                              'maximum_maximum_nights', 'calendar_last_scraped','first_review',
                                                              'last_review', 'host_neighbourhood']
                                                     , inplace=True)
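
A side note on the two cells above: plain assignment does not copy a DataFrame, it only binds a second name to the same object. An in-place `drop` through either name therefore changes both, which is consistent with `airbnb_data_cleaned_v2.shape` reporting 29 columns further down instead of the original 41. The toy frame below (all names hypothetical) sketches the difference between an alias and `.copy()`.

```python
import pandas as pd

# Hypothetical toy frame
original = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

alias = original               # same object under a second name
independent = original.copy()  # separate object with its own data

# Dropping in place through the alias also changes `original`
alias.drop(columns=["b"], inplace=True)

print(alias is original)          # True
print(list(original.columns))     # ['a']
print(list(independent.columns))  # ['a', 'b']
```

If the unmodified frame is needed later, `.copy()` is the safer choice.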

In [10]:
scaled_airbnb_data_cleaned_v2['last_scraped'] = pd.to_datetime(airbnb_data_cleaned_v2['last_scraped'])

In [11]:
scaled_airbnb_data_cleaned_v2.dtypes

neighbourhood_cleansed                               object
id                                                  float64
description                                          object
host_id                                               int64
latitude                                            float64
longitude                                           float64
room_type                                            object
bathrooms                                            object
bedrooms                                            float64
price                                               float64
number_of_reviews                                     int64
last_scraped                                 datetime64[ns]
review_scores_rating                                float64
review_scores_accuracy                              float64
review_scores_cleanliness                           float64
review_scores_checkin                               float64
review_scores_communication             

In [12]:
scaled_airbnb_data_cleaned_v2['room_type'].unique()

array(['Private room', 'Entire home/apt', 'Hotel room', 'Shared room'],
      dtype=object)

In [13]:
scaled_airbnb_data_cleaned_v2 = pd.get_dummies(scaled_airbnb_data_cleaned_v2, columns=['room_type'], prefix='room')

In [14]:
scaled_airbnb_data_cleaned_v2['bathrooms'].unique()

array(['1 shared bath', '2 baths', '2.5 baths', '1 bath', '4 baths',
       '3.5 baths', '1 private bath', '1.5 baths', '3 baths',
       '2 shared baths', '4.5 baths', '6 baths', '5 baths',
       '1.5 shared baths', '5.5 baths', '3 shared baths',
       '4 shared baths', '17 shared baths', '2.5 shared baths',
       '6.5 baths', '0 baths', 'Half-bath', nan], dtype=object)

In [15]:
scaled_airbnb_data_cleaned_v2 = pd.get_dummies(scaled_airbnb_data_cleaned_v2, columns=['bathrooms'], prefix='bath')
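
The `bathrooms` column contains `nan`, so it is worth knowing how `pd.get_dummies` treats missing values. By default a missing value gets no indicator column, so its row is zero in every `bath_` column; passing `dummy_na=True` adds an explicit indicator. The sketch below uses a hypothetical four-row frame and sets `dtype=int` so the output looks the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value
toy = pd.DataFrame({"bathrooms": ["1 bath", "2 baths", np.nan, "1 bath"]})

# Default behaviour: the NaN row is all zeros across the dummy columns
default_dummies = pd.get_dummies(
    toy, columns=["bathrooms"], prefix="bath", dtype=int
)

# dummy_na=True adds one extra indicator column for missing values
with_na = pd.get_dummies(
    toy, columns=["bathrooms"], prefix="bath", dummy_na=True, dtype=int
)

print(default_dummies)
print(with_na)
```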

In [16]:
scaled_airbnb_data_cleaned_v2['home_type'].unique()

array(['Home', 'Rental', 'Townhouse', 'Guesthouse', 'Condo', 'Bungalow',
       'Guest', 'Boutique', 'Hostel', 'Bed', 'Loft', 'casa', 'Hotel',
       'Aparthotel', 'Place', 'Serviced', 'Tiny', 'Cottage', 'Tent',
       'Vacation', 'Villa', 'Castle', 'Camper/RV'], dtype=object)

In [17]:
scaled_airbnb_data_cleaned_v2 = pd.get_dummies(scaled_airbnb_data_cleaned_v2, columns=['home_type'], prefix='home')

In [18]:
scaled_airbnb_data_cleaned_v2.columns

Index(['neighbourhood_cleansed', 'id', 'description', 'host_id', 'latitude',
       'longitude', 'bedrooms', 'price', 'number_of_reviews', 'last_scraped',
       'review_scores_rating', 'review_scores_accuracy',
       'review_scores_cleanliness', 'review_scores_checkin',
       'review_scores_communication', 'review_scores_location',
       'review_scores_value', 'reviews_per_month',
       'neighborhood_cleansed_id_number', 'neighborhood_cleansed_price',
       'denver_neighborhoods_population', 'region_area_sq_miles',
       'id_review_scores_rating_ac_region_ratio',
       'id_review_scores_location_ac_region_ratio',
       'id_bedrooms_count_ac_mean_region_ratio',
       'id_number_of_reviews_ac_region_ratio', 'room_Entire home/apt',
       'room_Hotel room', 'room_Private room', 'room_Shared room',
       'bath_0 baths', 'bath_1 bath', 'bath_1 private bath',
       'bath_1 shared bath', 'bath_1.5 baths', 'bath_1.5 shared baths',
       'bath_17 shared baths', 'bath_2 baths', 'bat

In [19]:
airbnb_data_cleaned_v2.shape

(4889, 29)

In [20]:
name_list = ['id', 'neighbourhood_cleansed', 'description', 'host_id', 'last_scraped']

In [21]:
scaled_airbnb_data_cleaned_v2.drop(columns=name_list, inplace=True)

In [22]:
scaled_airbnb_data_cleaned_v2.head()

Unnamed: 0,latitude,longitude,bedrooms,price,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,...,home_Loft,home_Place,home_Rental,home_Serviced,home_Tent,home_Tiny,home_Townhouse,home_Vacation,home_Villa,home_casa
0,39.76039,-104.92968,1.0,35.0,2,3.0,2.5,2.5,3.0,3.5,...,0,0,0,0,0,0,0,0,0,0
1,39.72785,-104.93783,3.0,149.0,0,,,,,,...,0,0,1,0,0,0,0,0,0,0
2,39.75852,-104.98846,4.0,190.0,68,4.78,4.88,4.62,4.78,4.78,...,0,0,0,0,0,0,1,0,0,0
3,39.736019,-105.05072,2.0,87.0,0,,,,,,...,0,0,0,0,0,0,1,0,0,0
4,39.77143,-105.02028,2.0,300.0,12,5.0,5.0,5.0,5.0,5.0,...,0,0,0,0,0,0,0,0,0,0


## 4.6 Standardize numeric features using a scaler<a id='4.6_Standardize_numeric_features_using_a_scaler'></a>

Create a `StandardScaler` object.

In [23]:
scaler = preprocessing.StandardScaler()

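Fit the scaler and transform the data in one step. `fit_transform` returns a NumPy array, so the result is wrapped back into a DataFrame below; the column names are not carried over, and the target column `price` is scaled along with the features.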

In [24]:
scaled_df = scaler.fit_transform(scaled_airbnb_data_cleaned_v2)

In [25]:
scaled_df = pd.DataFrame(scaled_df)

In [26]:
scaled_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,60,61,62,63,64,65,66,67,68,69
0,0.585269,0.790847,-0.772870,-0.137443,-0.543745,-5.376181,-7.371623,-6.532121,-6.155251,-4.325604,...,-0.129796,-0.035054,-0.503258,-0.053589,-0.014303,-0.040485,-0.322661,-0.024779,-0.02023,-0.02023
1,-0.451397,0.654544,0.494931,-0.046506,-0.562495,,,,,,...,-0.129796,-0.035054,1.987053,-0.053589,-0.014303,-0.040485,-0.322661,-0.024779,-0.02023,-0.02023
2,0.525694,-0.192206,1.128832,-0.013801,0.074997,-0.186709,0.049536,-0.588628,-0.380996,-0.367341,...,-0.129796,-0.035054,-0.503258,-0.053589,-0.014303,-0.040485,3.099227,-0.024779,-0.02023,-0.02023
3,-0.191147,-1.233460,-0.138970,-0.095963,-0.562495,,,,,,...,-0.129796,-0.035054,-0.503258,-0.053589,-0.014303,-0.040485,3.099227,-0.024779,-0.02023,-0.02023
4,0.936983,-0.724373,-0.138970,0.073946,-0.449996,0.454686,0.423712,0.476715,0.332675,0.312985,...,-0.129796,-0.035054,-0.503258,-0.053589,-0.014303,-0.040485,-0.322661,-0.024779,-0.02023,-0.02023
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4884,1.022363,-0.679719,-0.138970,-0.103143,-0.515620,0.454686,0.423712,0.476715,0.332675,0.312985,...,-0.129796,-0.035054,-0.503258,-0.053589,-0.014303,-0.040485,-0.322661,-0.024779,-0.02023,-0.02023
4885,0.321483,-0.296399,-0.138970,-0.065651,-0.562495,,,,,,...,-0.129796,-0.035054,-0.503258,-0.053589,-0.014303,-0.040485,-0.322661,-0.024779,-0.02023,-0.02023
4886,0.328173,-0.443238,-0.138970,0.073946,-0.562495,,,,,,...,-0.129796,-0.035054,-0.503258,-0.053589,-0.014303,-0.040485,-0.322661,-0.024779,-0.02023,-0.02023
4887,1.969826,1.658503,-0.772870,-0.057674,-0.534370,0.454686,0.423712,0.476715,0.332675,0.312985,...,-0.129796,-0.035054,-0.503258,-0.053589,-0.014303,-0.040485,-0.322661,-0.024779,-0.02023,-0.02023
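
In the spirit of checking pandas against sklearn, the sketch below standardizes a small made-up frame with `StandardScaler`, keeps the column names and index when rebuilding the DataFrame, and compares the result with a manual pandas calculation. `StandardScaler` divides by the population standard deviation, hence `ddof=0`, and it ignores missing values when fitting while leaving them as `NaN` in the output. The frame `toy` and its values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame with one missing rating
toy = pd.DataFrame(
    {"bedrooms": [1.0, 3.0, 4.0, 2.0], "rating": [3.0, np.nan, 4.78, 5.0]},
    index=[10, 11, 12, 13],
)

toy_scaler = StandardScaler()
scaled = pd.DataFrame(
    toy_scaler.fit_transform(toy),
    columns=toy.columns,  # keep the feature names
    index=toy.index,      # keep the row labels
)

# pandas equivalent: subtract the mean, divide by the population std
manual = (toy - toy.mean()) / toy.std(ddof=0)

matches = np.allclose(scaled.to_numpy(), manual.to_numpy(), equal_nan=True)
print(matches)
```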


## 4.7 Train/Test Split<a id='4.7_Train/Test_Split'></a>

So far, you've treated the Airbnb data as a single entity. In machine learning, when you train your model on all of your data, you end up with no data set aside to evaluate model performance. You could keep making more and more complex models that fit the data better and better and not realise you were overfitting to that one set of samples. By partitioning the data into training and testing splits, without letting a model (or missing-value imputation) learn anything about the test split, you have a somewhat independent assessment of how your model might perform in the future.

An often overlooked subtlety here is that people all too frequently use the test set to assess model performance _and then compare multiple models to pick the best_. This means their overall model selection process is fitting to one specific data set, now the test split. You could keep going, trying to get better and better performance on that one data set, which is where cross-validation becomes especially useful: it lets you compare models using the training data alone. The test split is then kept as a final check on expected future performance.

What partition sizes would you have with an 80/20 train/test split?

In [27]:
len(scaled_airbnb_data_cleaned_v2) * .8, len(scaled_airbnb_data_cleaned_v2) * .2

(3911.2000000000003, 977.8000000000001)
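
The fractional sizes above have to become whole rows. The sketch below assumes that `train_test_split`, when given a float `test_size` and no `train_size`, rounds the test count up and assigns the remaining rows to training; that assumption agrees with the shapes reported by the split that follows.

```python
import math

n_samples = 4889  # number of rows, from the shape reported earlier
test_size = 0.2

# Assumed rule: round the test count up, give the rest to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 3911 978
```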

In [28]:
X_train, X_test, y_train, y_test = train_test_split(scaled_airbnb_data_cleaned_v2.drop(columns='price'), 
                                                    scaled_airbnb_data_cleaned_v2.price, test_size=0.2, 
                                                    random_state=47)

In [29]:
X_train.shape, X_test.shape

((3911, 69), (978, 69))

In [30]:
y_train.shape, y_test.shape

((3911,), (978,))
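
As noted at the start of this section, neither the model nor any imputation or scaling step should learn from the test split. One common way to arrange this is to put the preprocessing inside a pipeline, so that it is fitted only on whatever data the pipeline is trained on. The sketch below is a hedged illustration on synthetic data, not the workflow used in this notebook: the frame `X`, the target `y`, the names `X_tr`, `X_te`, `y_tr`, `y_te`, `pipe`, and the choice of median imputation with linear regression are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical, not the Airbnb data set)
rng = np.random.default_rng(47)
X = pd.DataFrame({
    "bedrooms": rng.integers(1, 5, size=200).astype(float),
    "rating": rng.uniform(3.0, 5.0, size=200),
})
y = 50 * X["bedrooms"] + 20 * X["rating"] + rng.normal(0, 10, size=200)

# Blank out 10% of the ratings to mimic listings without reviews
missing_idx = X.sample(frac=0.1, random_state=47).index
X.loc[missing_idx, "rating"] = np.nan

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=47
)

pipe = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LinearRegression(),
)

# Cross-validation refits the imputer and scaler inside each training fold
cv_results = cross_validate(pipe, X_tr, y_tr, cv=5, scoring="r2")

# Final check: fit on the whole training split, score once on the test split
pipe.fit(X_tr, y_tr)
test_r2 = pipe.score(X_te, y_te)

print(cv_results["test_score"].mean(), test_r2)
```

Model comparison would use the cross-validation scores, leaving the single test score for the end.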

In [31]:
scaled_airbnb_data_cleaned_v2.columns

Index(['latitude', 'longitude', 'bedrooms', 'price', 'number_of_reviews',
       'review_scores_rating', 'review_scores_accuracy',
       'review_scores_cleanliness', 'review_scores_checkin',
       'review_scores_communication', 'review_scores_location',
       'review_scores_value', 'reviews_per_month',
       'neighborhood_cleansed_id_number', 'neighborhood_cleansed_price',
       'denver_neighborhoods_population', 'region_area_sq_miles',
       'id_review_scores_rating_ac_region_ratio',
       'id_review_scores_location_ac_region_ratio',
       'id_bedrooms_count_ac_mean_region_ratio',
       'id_number_of_reviews_ac_region_ratio', 'room_Entire home/apt',
       'room_Hotel room', 'room_Private room', 'room_Shared room',
       'bath_0 baths', 'bath_1 bath', 'bath_1 private bath',
       'bath_1 shared bath', 'bath_1.5 baths', 'bath_1.5 shared baths',
       'bath_17 shared baths', 'bath_2 baths', 'bath_2 shared baths',
       'bath_2.5 baths', 'bath_2.5 shared baths', 'bath_3 bath

In [32]:
#Check the `dtypes` attribute of `X_train` to verify all features are numeric
X_train.dtypes.unique()

array([dtype('float64'), dtype('int64'), dtype('uint8')], dtype=object)

In [33]:
#Repeat this check for the test split in `X_test`
X_test.dtypes.unique()

array([dtype('float64'), dtype('int64'), dtype('uint8')], dtype=object)

In [34]:
airbnb_data_cleaned_v2.head()

Unnamed: 0,neighbourhood_cleansed,id,description,host_id,latitude,longitude,room_type,bathrooms,bedrooms,price,...,reviews_per_month,home_type,neighborhood_cleansed_id_number,neighborhood_cleansed_price,denver_neighborhoods_population,region_area_sq_miles,id_review_scores_rating_ac_region_ratio,id_review_scores_location_ac_region_ratio,id_bedrooms_count_ac_mean_region_ratio,id_number_of_reviews_ac_region_ratio
0,North Park Hill,6.07e+17,Home in Denver · 1 bedroom · 1 bed · 1 shared ...,430149575,39.76039,-104.92968,Private room,1 shared bath,1.0,35.0,...,0.11,Home,73.0,158.577465,9382.0,1.52,0.616353,0.819121,0.376289,0.000303
1,Hale,5.46e+17,Rental unit in Denver · 2 bedrooms · 3 beds · ...,169214047,39.72785,-104.93783,Entire home/apt,2 baths,3.0,149.0,...,,Rental,63.0,116.065574,6936.0,0.73,,,1.657895,0.0
2,Five Points,52429530.0,Townhouse in Denver · ★4.78 · 3 bedrooms · 4 b...,107279139,39.75852,-104.98846,Entire home/apt,2.5 baths,4.0,190.0,...,2.52,Townhouse,415.0,174.136585,12712.0,0.96,0.984566,1.02811,1.649899,0.00194
3,West Colfax,6.32e+17,Townhouse in Denver · ★New · 2 bedrooms · 2 be...,416194740,39.736019,-105.05072,Entire home/apt,2.5 baths,2.0,87.0,...,,Townhouse,226.0,196.633028,9740.0,1.22,,,0.786713,0.0
4,Sunnyside,6.88e+17,Home in Denver · ★5.0 · 2 bedrooms · 2 beds · ...,133612752,39.77143,-105.02028,Entire home/apt,1 bath,2.0,300.0,...,0.99,Home,144.0,189.361702,9726.0,1.32,1.01833,1.004938,0.773842,0.001195


In [35]:
scaled_airbnb_data_cleaned_v2.to_csv('scaled_airbnb_data_cleaned_v2.csv')
airbnb_data_cleaned_v2.to_csv('airbnb_data_cleaned_v22.csv')

