#### Model Accuracy Testing

This file tests the accuracy of our machine learning model (XGBoost). We evaluate it with six metrics: accuracy, precision, recall, ROC AUC, MAE, and RMSE.

In [12]:
from backend.dec_tree_trainer import train_dec_tree
from sklearn.metrics import mean_absolute_error, mean_squared_error, precision_score, recall_score, roc_auc_score, accuracy_score
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
import xgboost as xgb
import category_encoders as ce
from sklearn.preprocessing import OneHotEncoder


In [14]:
rest_data_path = 'yelp/restaurantsTest.csv'
user_data_path = 'yelp/scrapedUsers.csv'

user_raw_data = pd.read_csv(user_data_path)
rest_raw_data = pd.read_csv(rest_data_path)

total_input = pd.merge(user_raw_data, rest_raw_data, on='rest_id')
total_input_cleaned = total_input.drop(['name', 'rest_id', 'website', 'going'], axis='columns')
output = total_input['going']

total_input_cleaned.to_csv('accuracyTest.csv')

# split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(total_input_cleaned, output, test_size=0.2, random_state=42)

# encode categorical features using one-hot encoding
# (scikit-learn 1.2+ renames `sparse` to `sparse_output`; the old name was removed in 1.4)
onehot_encoder = OneHotEncoder(sparse=False, handle_unknown='ignore')
X_train_encoded = onehot_encoder.fit_transform(X_train)
X_test_encoded = onehot_encoder.transform(X_test)

# create and fit XGBoost model on training data
dec_tree = xgb.XGBClassifier()
dec_tree.fit(X_train_encoded, y_train)

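# make predictions on testing data and compute evaluation metrics
y_pred = dec_tree.predict(X_test_encoded)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
# note: AUC is computed here from hard class labels, not predicted probabilities
auc = roc_auc_score(y_test, y_pred)
# for binary 0/1 labels, MAE equals the error rate (1 - accuracy) and RMSE is its square root
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))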

# print accuracy metrics
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("AUC:", auc)
print("MAE:", mae)
print("RMSE:", rmse)


Accuracy: 0.9170638703527169
Precision: 0.9240414757656137
Recall: 0.9914618369987064
AUC: 0.5199001027893835
MAE: 0.08293612964728313
RMSE: 0.2879863358690532


Accuracy: The fraction of all predictions that are correct (about 91.7% here).

Precision: The fraction of positive predictions that are actually positive (about 92.4% here).

Recall: The fraction of actual positive instances that the model predicted as positive (about 99.1% here).
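
To make these three definitions concrete, here is a minimal sketch that computes them from confusion-matrix counts in plain Python. The labels are made-up toy data and the helper name `confusion_counts` is hypothetical; neither comes from the dataset used above.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn

# toy data (illustrative only): 8 actual positives, 2 actual negatives
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # correct predictions / all predictions
precision = tp / (tp + fp)           # correct positives / predicted positives
recall = tp / (tp + fn)              # correct positives / actual positives

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

Because only two of the ten toy instances are negative, accuracy stays high even though half of the negatives are misclassified, which is why accuracy alone can be misleading on imbalanced data.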

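AUC: The Area Under the Receiver Operating Characteristic Curve (ROC AUC) measures how well the model ranks positive instances above negative ones: 0.5 corresponds to random ranking and 1.0 to perfect separation. The code above passes hard class predictions (`y_pred`) rather than predicted probabilities, so the reported value is the mean of recall and specificity. With a recall of about 0.99, an AUC of about 0.52 implies a specificity of only about 0.05, meaning the model labels almost every instance as positive.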
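
The difference between scoring probabilities and scoring hard labels can be sketched in plain Python. The function `roc_auc` below is a hypothetical helper that uses the pairwise-ranking definition of AUC (it is not the scikit-learn implementation), and the labels and probabilities are toy values chosen for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive scores above a
    random negative, with ties counted as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# toy data (illustrative only)
y_true = [1, 1, 1, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3]          # hypothetical predicted probabilities
labels = [1 if p >= 0.5 else 0 for p in probs]   # hard predictions at threshold 0.5

auc_from_probs = roc_auc(y_true, probs)    # every positive outranks every negative
auc_from_labels = roc_auc(y_true, labels)  # thresholding discards the ranking

print("AUC from probabilities:", auc_from_probs)
print("AUC from hard labels:", auc_from_labels)
```

In this toy case the probabilities rank perfectly, yet the thresholded labels give a lower score because one negative is pushed into the positive class. With scikit-learn, the usual way to score probabilities is to pass the positive-class column, for example `dec_tree.predict_proba(X_test_encoded)[:, 1]`, to `roc_auc_score`; the result for this model would have to be measured, not assumed.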

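MAE: Mean Absolute Error measures the average absolute difference between the actual and predicted values. For binary 0/1 labels every error has size 1, so MAE is the error rate, 1 - accuracy (0.0829 = 1 - 0.9171 here).

RMSE: Root Mean Squared Error measures the square root of the average of the squared differences between the actual and predicted values. For binary 0/1 labels each squared error equals the absolute error, so RMSE is the square root of MAE and adds no information beyond accuracy.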
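
As a quick check of how MAE and RMSE behave on class labels, the following sketch computes both on toy 0/1 data in plain Python and compares them with accuracy. The data is invented for illustration and assumes labels are encoded as 0 and 1.

```python
import math

# toy data (illustrative only): 3 of the 10 predictions are wrong
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 1, 1]

n = len(y_true)
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n

print("MAE:", mae)
print("RMSE:", rmse)
print("1 - accuracy:", 1 - accuracy)

# with 0/1 labels, MAE is the error rate and RMSE is sqrt(MAE)
print(math.isclose(mae, 1 - accuracy))
print(math.isclose(rmse, math.sqrt(mae)))
```

Both comparisons hold for any 0/1 labels, so for a classifier these two metrics restate accuracy; they are more informative when the model outputs continuous values.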