# Capstone: Airbnb Price Listing Prediction
## Part 5 Production Model

_Authors: Evonne Tham_

___Example___

_In this notebook, I will calculate the R² score and RMSE of the production model on the train set. I will then apply the same model to the test dataset to obtain the predicted prices, which will be used for the Kaggle submission. I will also state the insights gleaned from the model and the resulting business recommendations._

## Contents of this notebook
1. [Import Necessary Libraries and Load Data](#1.-Import-Necessary-Libraries-and-Load-Data)
2. [Model Prep](#2.-Model-Prep)
3. [Model Evaluation](#3.-Model-Evaluation)
4. [](#4.-) 


## 1. Import Necessary Libraries and Load Data

In [None]:
import pandas as pd
import numpy as np
from geopy.distance import great_circle
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# modelling
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import LinearRegression, ElasticNetCV
from sklearn.svm import SVR
from xgboost import XGBRegressor

# Hide warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load in Data 
df = pd.read_csv('../datasets/df_dummies.csv')
print(f"Total Number of Listings: {df.shape[0]} | Total Number of Features: {df.shape[1]}")
df.head().T

---
##  2. Model Prep

In [None]:
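# Create X and y variables (the data was loaded into `df` above)
# Exclude the target, its log transform, and identifier columns from the features
excluded = ['price', 'log_price', 'id', 'host_id']
features = [col for col in df._get_numeric_data().columns
            if col not in excluded]

X = df[features]
y = df['price']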

# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size = 0.25, 
                                                    random_state = 42) 

# Scale 
rs = RobustScaler()
X_train_rs = rs.fit_transform(X_train)
X_test_rs = rs.transform(X_test)

# Instantiate Best Model
xgb = XGBRegressor(gamma = 0,
                   learning_rate = 0.05, 
                   max_depth = 5, 
                   n_estimators = 1000, 
                   subsample = 0.5)

# Fit Model
xgb.fit(X_train_rs, y_train)

# Predict
y_pred_train = xgb.predict(X_train_rs)
y_pred_test = xgb.predict(X_test_rs)

# Model Evaluation (on the held-out test split)
r2 = r2_score(y_test, y_pred_test)
print(f"r2: {round(r2, 4)}")

mse = mean_squared_error(y_test, y_pred_test)

# RMSE is the square root of the MSE, in the same units as price
rmse = np.sqrt(mse)
print(f"RMSE: {round(rmse, 4)}")


--- 
## 3. Model Evaluation

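The model is evaluated with two metrics: mean squared error (the loss, reported as its square root, RMSE, so that the error is in the same units as price) and R², the coefficient of determination, which measures the proportion of the variance in price that the model explains.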
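
To make the two metrics concrete, here is a minimal sketch that computes them by hand using only the standard library. The helper names `rmse` and `r_squared` and the four price values are made up for illustration and do not come from the dataset. In the notebook itself the equivalent values come from scikit-learn's `mean_squared_error` and `r2_score`.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, the share of variance explained."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical prices, for illustration only
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]

print(f"RMSE: {rmse(y_true, y_pred):.4f}")      # RMSE: 10.0000
print(f"r2: {r_squared(y_true, y_pred):.4f}")   # r2: 0.9680

# Predicting the mean price for every listing gives R² = 0,
# which is the reference point a useful model has to beat.
mean_price = sum(y_true) / len(y_true)
baseline_r2 = r_squared(y_true, [mean_price] * len(y_true))
print(f"baseline r2: {baseline_r2:.4f}")        # baseline r2: 0.0000
```

Every prediction in this toy example is off by exactly 10, so the RMSE is 10 in price units, while R² stays close to 1 because those errors are small relative to the spread of the prices.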