# Classification Modeling - Restaurants

Fit and cross-validate binary classification models that predict whether a restaurant review is useful. We choose a logistic regression model that predicts review usefulness with 82% accuracy.

## Import modules

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import time
import pickle

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC

## Load features and target

In [2]:
features = np.load('../data/rests_train_features.npy')
target = np.load('../data/rests_target.npy')

## Train/test split and scale data

In [3]:
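# Hold out a test set (default test_size=0.25; no random_state, so the split differs on each run)
X_train, X_test, y_train, y_test = train_test_split(features, target)

# Fit the scaler on the training set only, then apply the same transform to the test set
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)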
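
The split above uses the defaults, so it is neither reproducible nor stratified. Since only about 18% of reviews are labeled useful (per the target mean reported in this notebook), a stratified, seeded split may be worth considering. The following is a minimal sketch on synthetic stand-in data; the array shapes, the seed values, and the 0/1 encoding of the target are assumptions, not taken from the original data files.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real features/target (assumed 0/1 labels, ~18% positives)
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 5))
target = (rng.random(1000) < 0.18).astype(int)

# stratify keeps the class ratio nearly equal in both splits;
# random_state makes the split repeatable across runs
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, stratify=target, random_state=42
)

# Same scaling pattern as the notebook: fit on train, reuse on test
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

print(y_train.mean(), y_test.mean())
```

With stratification the two printed positive rates should be nearly identical, which makes train and test scores easier to compare.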



## Check baseline accuracy

In [4]:
target.mean()

0.18264464754800014
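
Assuming the target is encoded as 0/1 with 1 meaning "useful", `target.mean()` is the share of useful reviews rather than an accuracy. The majority-class baseline is the accuracy of always predicting the more common class. A small sketch of that arithmetic, using the mean printed above (the helper name `majority_baseline` is hypothetical):

```python
import numpy as np

def majority_baseline(target):
    """Accuracy of always predicting the most common class of a 0/1 target."""
    positive_rate = float(np.mean(target))
    return max(positive_rate, 1.0 - positive_rate)

# Positive rate reported by target.mean() in this notebook
positive_rate = 0.18264464754800014
baseline = max(positive_rate, 1.0 - positive_rate)
print(round(baseline, 4))  # 0.8174

# Toy check of the helper: four negatives, one positive
toy_baseline = majority_baseline(np.array([0, 0, 0, 0, 1]))
```

A model's accuracy is only informative relative to this baseline of roughly 82%.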

## Logistic Regression

In [5]:
# Fixed settings go on the estimator; only C is searched over
lr = GridSearchCV(
    LogisticRegression(random_state=32, solver='saga', penalty='l2',
                       n_jobs=-1, verbose=1),
    param_grid={'C': [1e-4, 1e-3, 1e-2, 1e-1]},
)
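
Because the scaler was fit on the full training set before the grid search, each cross-validation fold is scaled with statistics that include its own validation rows. One common alternative is to put the scaler and the classifier in a `Pipeline` so that scaling is refit inside every fold. This is a sketch on synthetic data, not the notebook's method; the step names `scale` and `clf`, the data, `cv=3`, and `max_iter=1000` are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: unscaled features, binary target
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(600, 8))
y = (X[:, 0] + rng.normal(size=600) > 5.0).astype(int)

# The scaler is refit on the training part of every CV fold
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(solver='saga', max_iter=1000, random_state=32)),
])

# Pipeline parameters are addressed as <step name>__<parameter>
search = GridSearchCV(pipe, param_grid={'clf__C': [1e-4, 1e-3, 1e-2, 1e-1]}, cv=3)
search.fit(X, y)

print(search.best_params_)
```

With this layout the raw, unscaled training features are passed to `fit`, and the fitted pipeline applies the same scaling at prediction time.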

In [None]:
lr.fit(X_train, y_train)

convergence after 20 epochs took 85 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  1.4min finished
convergence after 38 epochs took 159 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  2.6min finished
convergence after 38 epochs took 159 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  2.6min finished
convergence after 51 epochs took 218 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  3.6min finished
max_iter reached after 425 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  7.1min finished
max_iter reached after 419 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  7.0min finished
convergence after 75 epochs took 314 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  5.2min finished
max_iter reached after 418 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  7.0min finished
max_iter reached after 419 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  7.0min finished
convergence after 79 epochs took 331 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  5.5min finished
max_iter reached after 418 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  7.0min finished
max_iter reached after 418 seconds
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  7.0min finished


In [None]:
lr.score(X_train, y_train), lr.score(X_test, y_test)
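
Several fits in the log above stopped with "max_iter reached", meaning the solver hit its iteration cap before meeting its tolerance. One way to check this programmatically is to compare the fitted model's `n_iter_` with `max_iter`, and to read the train/test accuracies next to the majority-class baseline. The sketch below uses synthetic data from `make_classification`; the data, the value `max_iter=1000`, and the choice `C=1e-1` are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical imbalanced data, roughly 18% positives
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.82, 0.18], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

max_iter = 1000  # assumed cap, higher than scikit-learn's default of 100
clf = LogisticRegression(C=1e-1, solver='saga', max_iter=max_iter, random_state=32)
clf.fit(X_train, y_train)

# n_iter_ equals max_iter when the solver stopped at the cap
converged = bool(clf.n_iter_.max() < max_iter)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
baseline = max(y_test.mean(), 1 - y_test.mean())

print(f"converged={converged} train={train_acc:.3f} "
      f"test={test_acc:.3f} baseline={baseline:.3f}")
```

If `converged` is false, raising `max_iter` or loosening `tol` are the usual options, at the cost of longer fits.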