## Comparison
Each of the five ML algorithms (logistic regression, support vector machines, multilayer perceptron, random forest, and boosted trees) is compared on the following criteria:
1. Problem Type
2. Train Speed
3. Predict Speed
4. Interpretability
5. Performance
6. Performance with Limited Data
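
Train speed and predict speed (criteria 2 and 3) both come down to timing a call. Timing one call with a single pair of clock readings is noisy. The Python docs describe `time.perf_counter` as the highest-resolution clock available for measuring short durations. Below is a minimal sketch of a helper that times any callable several times and reports the median in milliseconds. It assumes you can pass in something like a model's `fit` or `predict`. The helper name `median_latency_ms` and the default of five repeats are illustrative choices and do not come from this notebook.

```python
import statistics
import time


def median_latency_ms(fn, *args, repeats=5, **kwargs):
    """Run fn(*args, **kwargs) `repeats` times; return the median wall-clock time in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution clock intended for short durations
        fn(*args, **kwargs)
        timings_ms.append((time.perf_counter() - start) * 1000.0)  # seconds -> milliseconds
    # The median is less sensitive than the mean to a single slow run (e.g. a cold cache).
    return statistics.median(timings_ms)


if __name__ == "__main__":
    # Stand-in workload; with a fitted estimator this could be median_latency_ms(model.predict, X).
    print(f"median latency: {median_latency_ms(sum, range(100_000)):.3f} ms")
```

For train speed, the same helper could wrap `model.fit` with the training data. Each repeat then refits the model from scratch, which is what you want when comparing training cost.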

### Read in Data

In [1]:
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score
from time import time

val_features = pd.read_csv('output/val_features.csv')
val_labels = pd.read_csv('output/val_labels.csv')

te_features = pd.read_csv('output/test_features.csv')
te_labels = pd.read_csv('output/test_labels.csv')

### Read in Models 

In [2]:
models = {}

for mdl in ['lr', 'svm', 'mlp', 'rf', 'gb']:
    models[mdl] = joblib.load('output/{}_model.pkl'.format(mdl))
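
The next cell scores each model with `accuracy_score`, `precision_score` and `recall_score`. Those calls use sklearn's default `average='binary'` with `pos_label=1`, which implies a binary target. In that setting the three metrics reduce to confusion-matrix ratios: precision = TP / (TP + FP) and recall = TP / (TP + FN). Below is a dependency-free sketch for reference. It assumes labels encoded as 0/1, and `binary_metrics` is a hypothetical helper that the notebook itself does not use.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary problem, from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one prediction")
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Zero denominators map to 0.0, matching sklearn's default zero_division behaviour (which also warns).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
    # prints: (0.6, 0.6666666666666666, 0.6666666666666666)
```

Precision asks how many predicted positives were correct. Recall asks how many actual positives were found. This is why, for example, the SVM below can reach 0.831 precision with only 0.686 recall.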



In [3]:
def evaluate_model(name, model, features, labels):
    # Time only the prediction call; time() returns seconds, not milliseconds.
    start = time()
    pred = model.predict(features)
    end = time()
    # Binary-classification metrics, rounded to three decimal places.
    accuracy = round(accuracy_score(labels, pred), 3)
    precision = round(precision_score(labels, pred), 3)
    recall = round(recall_score(labels, pred), 3)
    latency = round(end - start, 3)  # seconds
    print('{}\nAccuracy: {}\nPrecision: {}\nRecall: {}\nLatency: {}s\n\n'.format(
        name, accuracy, precision, recall, latency))

In [4]:
for name, mdl in models.items():
    evaluate_model(name, mdl, val_features, val_labels)

lr
Accuracy: 0.851
Precision: 0.833
Recall: 0.814
Latency: 0.002s


svm
Accuracy: 0.807
Precision: 0.831
Recall: 0.686
Latency: 0.01s


mlp
Accuracy: 0.832
Precision: 0.825
Recall: 0.767
Latency: 0.002s


rf
Accuracy: 0.832
Precision: 0.861
Recall: 0.721
Latency: 0.07s


gb
Accuracy: 0.837
Precision: 0.844
Recall: 0.756
Latency: 0.004s


In [5]:
evaluate_model('Logistic Regression', models['lr'], te_features, te_labels)

Logistic Regression
Accuracy: 0.866
Precision: 0.848
Recall: 0.767
Latency: 0.004s
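
Only logistic regression is evaluated on the test set, and it also has the highest validation accuracy. The sketch below makes that ranking explicit using the validation numbers printed by the loop above. Ranking by accuracy is one option. As an added assumption, it also ranks by F1, the harmonic mean of precision and recall, which weighs the two error types together. The dictionary and helper names (`val_scores`, `f1`, `best_model`) are illustrative.

```python
# Validation-set scores copied from the output of the evaluation loop above.
val_scores = {
    "lr": {"accuracy": 0.851, "precision": 0.833, "recall": 0.814},
    "svm": {"accuracy": 0.807, "precision": 0.831, "recall": 0.686},
    "mlp": {"accuracy": 0.832, "precision": 0.825, "recall": 0.767},
    "rf": {"accuracy": 0.832, "precision": 0.861, "recall": 0.721},
    "gb": {"accuracy": 0.837, "precision": 0.844, "recall": 0.756},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def best_model(scores, key):
    """Return the model name whose metrics dict maximises key(metrics)."""
    return max(scores, key=lambda name: key(scores[name]))


by_accuracy = best_model(val_scores, lambda m: m["accuracy"])
by_f1 = best_model(val_scores, lambda m: f1(m["precision"], m["recall"]))
print(by_accuracy, by_f1)  # prints: lr lr
```

The model is selected on validation scores, and the test score is reported only for the chosen model, so the test set does not influence the choice. On the test set, logistic regression's accuracy and precision are slightly higher than on validation, while recall falls from 0.814 to 0.767. A single split's numbers are therefore estimates with some variance.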


