# Classification Modeling - Restaurants

Fit and cross-validate binary classification models that predict the usefulness of restaurant reviews. I choose a logistic regression model that classifies restaurant reviews with about 89.6% test accuracy. To generalize the classification model to a wider business space, I demonstrate predictions and evaluation only for the model trained on non-restaurant businesses.

## Import modules

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import time
import pickle

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC

## Load features and target

In [2]:
features = np.load('../data/rests_train_features.npy')
target = np.load('../data/rests_target.npy')

## Train/test split and scale data

In [3]:
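# Hold out a test set (train_test_split defaults to a 75/25 split)
X_train, X_test, y_train, y_test = train_test_split(features, target)

# Fit the scaler on the training split only, then reuse it on the test split
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)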



## Check baseline accuracy

In [4]:
target.mean()  # fraction of reviews in the positive class

0.18264464754800014
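
The value above is the share of positive labels, so the accuracy a model has to beat is the majority-class rate, roughly 1 - 0.183 ≈ 0.817. A minimal sketch of that calculation, assuming the target is coded 0/1 (the helper `majority_baseline` and the toy array are illustrative and not part of the original notebook):

```python
import numpy as np

def majority_baseline(y):
    """Accuracy of always predicting the most common class in a 0/1 target."""
    positive_rate = float(np.mean(y))
    return max(positive_rate, 1.0 - positive_rate)

# Hypothetical toy target: 2 positives out of 10 labels
toy_target = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0])
baseline = majority_baseline(toy_target)
```

Calling `majority_baseline(target)` on the restaurant target would give the figure to compare the model scores against.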

## Logistic Regression

In [5]:
# Only C is actually searched; the single-value entries just pin fixed settings.
lr = GridSearchCV(LogisticRegression(), param_grid={'random_state': [32],
                                                    'C': [1e-4, 1e-3, 1e-2, 1e-1],
                                                    'solver': ['saga'],
                                                    'penalty': ['l2'],
                                                    'n_jobs': [-1],
                                                    'verbose': [1]})

In [None]:
lr.fit(X_train, y_train)

In [7]:
lr.score(X_train, y_train), lr.score(X_test, y_test)

(0.89579221852302415, 0.8965074982203125)

In [4]:
lr.best_params_

{'C': 0.1,
 'n_jobs': -1,
 'penalty': 'l2',
 'random_state': 32,
 'solver': 'saga',
 'verbose': 1}
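
The fixed settings show up in `best_params_` because they were passed through the grid as single-value lists. One possible alternative, sketched here on synthetic stand-in data, is to set them on the estimator so that the grid holds only the values being searched. The names `X_demo`, `y_demo`, `base_model`, and `search` are hypothetical, and `max_iter=1000` is an assumption added to help `saga` converge on the toy data; the L2 penalty is left at its scikit-learn default.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical), not the restaurant features
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

# Fixed settings go on the estimator; the L2 penalty is the default
base_model = LogisticRegression(solver='saga', random_state=32, max_iter=1000)

# Only the regularization strength C is searched
search = GridSearchCV(base_model, param_grid={'C': [1e-4, 1e-3, 1e-2, 1e-1]})
search.fit(X_demo, y_demo)

best = search.best_params_  # contains only the 'C' key
```

With this layout, `GridSearchCV` also accepts its own `n_jobs` argument if the folds should be fit in parallel.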

In [10]:
with open('../models/logisticreg_rests.pkl', 'wb') as model:
    pickle.dump(lr, model)
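
To reuse the saved model later, the pickle can be read back with `pickle.load`. A minimal round-trip sketch, assuming a small synthetic model and a temporary file in place of the real `lr` object and the `../models/` path (all names below are hypothetical):

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in model (hypothetical), in place of the fitted grid search
X_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression().fit(X_demo, y_demo)

# Write to a temporary path rather than the notebook's models directory
path = os.path.join(tempfile.mkdtemp(), 'demo_model.pkl')
with open(path, 'wb') as f:
    pickle.dump(model, f)

# Read it back; the restored object should predict like the original
with open(path, 'rb') as f:
    restored = pickle.load(f)

same_predictions = bool((restored.predict(X_demo) == model.predict(X_demo)).all())
```

Note that the notebook pickles only `lr`, not the fitted scaler `ss`, so new data would need the same scaling applied before calling `predict`. Pickled scikit-learn models are also best loaded with the same library version that saved them.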