# Overview

This notebook supports a study of classifier methods used in contemporary machine learning. The experiment provides Python implementations as a basis for comparing performance on the same dataset with the same evaluation criteria. It is by no means exhaustive, but it illustrates the differences between the methods and indicates which of them perform better on text data of this kind.

# Data

The data used for this experiment is the 20 newsgroups dataset, available through scikit-learn's dataset loaders, which contains roughly 18,000 newsgroup posts on 20 topics. Like the MNIST dataset, it comes cleaned and labeled. A pre-vectorized version is also available, with features that are ready to use with classifiers.

In [1]:
# Examine the category names
from sklearn.datasets import fetch_20newsgroups
textdata = fetch_20newsgroups(subset='train')
print(list(textdata.target_names))

['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc']


For convenience, scikit-learn provides a version of the dataset that is already vectorized. It is used throughout this experiment.

In [2]:
from sklearn.datasets import fetch_20newsgroups_vectorized
# Pass subset by keyword; recent scikit-learn releases reject it as a positional argument
data = fetch_20newsgroups_vectorized(subset='all')

In [3]:
from sklearn.model_selection import train_test_split

In [4]:
newsdata = data
X = newsdata.data
y = newsdata.target

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y, test_size=0.1)

In [6]:
X_train.dtype

dtype('float64')

In [7]:
X_train

<16961x130107 sparse matrix of type '<class 'numpy.float64'>'
	with 2602382 stored elements in Compressed Sparse Row format>

In [8]:
X_train.shape

(16961, 130107)

In [9]:
y_train.dtype

dtype('int64')

In [10]:
y_train.shape

(16961,)
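
The shape reported above, 16961 rows by 130107 columns, reflects one row per post and one column per vocabulary term, with only the non-zero entries stored. As a rough illustration of that idea (not the exact scikit-learn pipeline), the sketch below turns two made-up sentences into unit-length term-count rows using only the standard library. The helper name `vectorize`, the whitespace tokenization, and the toy sentences are assumptions made for this example.

```python
import math
from collections import Counter

def vectorize(docs):
    """Map each document to a sparse {column_index: weight} row."""
    # Build a vocabulary: one column index per distinct lowercase token.
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))

    rows = []
    for doc in docs:
        counts = Counter(vocab[t] for t in doc.lower().split())
        # Scale each row to unit Euclidean length (guard against empty docs).
        norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
        rows.append({col: c / norm for col, c in counts.items()})
    return vocab, rows

# Toy corpus, invented for illustration only.
docs = ["the hockey game", "the space shuttle launch"]
vocab, rows = vectorize(docs)
print(len(docs), "x", len(vocab))  # rows x columns of the implied matrix
print(rows[0])
```

Each row stores only the columns that actually occur in the document, which is why a matrix with over 130,000 columns can still hold only about 2.6 million stored elements.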

# Logistic Regression

In [6]:
from sklearn.linear_model import LogisticRegression
logregress = LogisticRegression(solver='lbfgs')

In [7]:
logregress.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='lbfgs', tol=0.0001,
          verbose=0, warm_start=False)

In [8]:
logregress_predictions = logregress.predict(X_test)

In [9]:
logregress_score = logregress.score(X_test, y_test)
print(logregress_score)

0.820689655172


In [10]:
# Try a different solver (stochastic average gradient)
logregress2 = LogisticRegression(solver='sag')

In [11]:
logregress2.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='sag', tol=0.0001,
          verbose=0, warm_start=False)

In [12]:
logregress_predictions2 = logregress2.predict(X_test)

In [13]:
logregress_score2 = logregress2.score(X_test, y_test)
print(logregress_score2)

0.820689655172


In [14]:
# Additional performance measures
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
print(classification_report(y_test, logregress_predictions))

             precision    recall  f1-score   support

          0       0.86      0.85      0.86        80
          1       0.71      0.81      0.76        97
          2       0.81      0.80      0.80        99
          3       0.76      0.66      0.71        98
          4       0.86      0.78      0.82        96
          5       0.75      0.80      0.77        99
          6       0.72      0.86      0.78        98
          7       0.82      0.93      0.87        99
          8       0.91      0.86      0.89       100
          9       0.84      0.88      0.86        99
         10       0.91      0.92      0.92       100
         11       0.90      0.85      0.88        99
         12       0.83      0.84      0.83        98
         13       0.78      0.80      0.79        99
         14       0.87      0.89      0.88        99
         15       0.77      0.88      0.82       100
         16       0.82      0.88      0.85        91
         17       0.86      0.87      0.87   

In [15]:
# Confusion matrix: rows are true classes, columns are predicted classes.
# Larger values on the diagonal mean more correct predictions.
print("Confusion matrix:\n%s" % confusion_matrix(y_test, logregress_predictions))

Confusion matrix:
[[68  0  0  0  0  0  0  0  0  1  0  1  0  1  0  4  0  2  0  3]
 [ 0 79  3  1  0  6  2  0  0  0  0  0  1  1  1  1  0  1  0  1]
 [ 0  5 79  4  1  5  2  0  1  0  0  0  1  0  1  0  0  0  0  0]
 [ 0  4  8 65  4  1  4  1  1  0  1  1  3  2  1  0  0  1  1  0]
 [ 0  3  1  8 75  1  3  0  0  0  0  0  2  0  0  1  1  1  0  0]
 [ 0  6  7  3  1 79  0  1  1  1  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  1  1  1 84  3  0  2  1  2  1  1  0  1  0  0  0  0]
 [ 0  1  0  0  0  0  2 92  0  0  0  0  1  2  1  0  0  0  0  0]
 [ 0  0  0  0  1  0  3  6 86  1  0  0  0  0  1  0  1  0  1  0]
 [ 0  1  0  1  0  1  2  0  0 87  4  0  0  3  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  4  0  0  3 92  0  0  0  0  0  0  0  1  0]
 [ 0  2  0  0  1  4  2  0  1  2  1 84  0  0  0  0  1  1  0  0]
 [ 0  5  0  2  1  1  2  4  0  0  0  0 82  0  1  0  0  0  0  0]
 [ 0  1  0  0  0  0  3  3  2  1  0  1  1 79  3  1  1  2  1  0]
 [ 1  1  0  0  1  2  2  0  0  0  0  0  0  3 88  1  0  0  0  0]
 [ 1  3  0  0  1  3  0  0  0  0  1  0
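
The confusion matrix and the classification report carry the same information in different forms. For example, row 0 above has 68 on the diagonal out of 80 posts in that row, which matches the recall of 0.85 reported for class 0. The sketch below shows how accuracy, precision, and recall can be derived from such a matrix; the function name `per_class_metrics` and the small 3-class matrix are hypothetical and are not taken from the experiment.

```python
def per_class_metrics(cm):
    """Return accuracy plus per-class precision and recall for a square matrix.

    Rows are true classes and columns are predicted classes.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))  # diagonal entries
    precision, recall = [], []
    for i in range(n):
        predicted_i = sum(cm[r][i] for r in range(n))  # column sum
        actual_i = sum(cm[i])                          # row sum
        precision.append(cm[i][i] / predicted_i if predicted_i else 0.0)
        recall.append(cm[i][i] / actual_i if actual_i else 0.0)
    return correct / total, precision, recall

# Hypothetical 3-class matrix, invented for illustration.
cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
accuracy, precision, recall = per_class_metrics(cm)
print(round(accuracy, 3))
print([round(p, 3) for p in precision])
print([round(r, 3) for r in recall])
```

Off-diagonal entries show which classes are confused with each other, which a single accuracy number hides.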

# Stochastic Gradient Descent

In [21]:
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train)

SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1,
       eta0=0.0, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='hinge', n_iter=5, n_jobs=1,
       penalty='l2', power_t=0.5, random_state=42, shuffle=True, verbose=0,
       warm_start=False)

In [22]:
sgd_predictions = sgd_clf.predict(X_test)

In [23]:
sgd_score = sgd_clf.score(X_test, y_test)
print(sgd_score)

0.872148541114


In [24]:
print(classification_report(y_test, sgd_predictions))

             precision    recall  f1-score   support

          0       0.87      0.85      0.86        80
          1       0.76      0.91      0.83        97
          2       0.82      0.87      0.84        99
          3       0.83      0.73      0.78        98
          4       0.88      0.84      0.86        96
          5       0.95      0.78      0.86        99
          6       0.77      0.91      0.84        98
          7       0.91      0.97      0.94        99
          8       0.97      0.92      0.94       100
          9       0.87      0.98      0.92        99
         10       0.98      0.98      0.98       100
         11       0.95      0.90      0.92        99
         12       0.92      0.82      0.86        98
         13       0.89      0.90      0.89        99
         14       0.89      0.94      0.92        99
         15       0.80      0.89      0.84       100
         16       0.89      0.92      0.91        91
         17       0.78      0.97      0.87   

In [25]:
print("Confusion matrix:\n%s" % confusion_matrix(y_test, sgd_predictions))

Confusion matrix:
[[68  0  0  0  0  0  0  0  0  1  0  1  0  0  0  3  0  2  0  5]
 [ 0 88  2  1  0  2  2  0  0  1  0  0  0  0  0  0  0  1  0  0]
 [ 0  5 86  2  2  1  2  0  0  0  0  0  0  0  1  0  0  0  0  0]
 [ 1  4  5 72  5  0  5  0  1  0  0  0  2  0  2  0  0  1  0  0]
 [ 0  3  2  4 81  0  3  0  0  0  0  0  1  0  0  0  0  2  0  0]
 [ 0  4  8  4  1 77  0  0  0  2  0  0  0  0  1  0  0  2  0  0]
 [ 0  0  1  2  0  0 89  1  0  1  0  1  2  0  1  0  0  0  0  0]
 [ 0  2  0  0  0  0  0 96  1  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  2  4 92  2  0  0  0  0  0  0  0  0  0  0]
 [ 0  1  0  0  0  0  0  0  0 97  1  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  1  0  0  1 98  0  0  0  0  0  0  0  0  0]
 [ 0  2  0  0  0  0  3  1  1  0  0 89  0  1  0  1  0  1  0  0]
 [ 0  3  0  2  2  0  4  3  0  0  0  1 80  1  2  0  0  0  0  0]
 [ 0  0  0  0  0  0  1  1  0  1  0  0  0 89  1  1  0  4  1  0]
 [ 1  2  0  0  1  0  1  0  0  0  0  0  0  1 93  0  0  0  0  0]
 [ 3  2  0  0  0  1  1  0  0  1  0  0

# Support Vector Machine

In [26]:
from sklearn.svm import LinearSVC

scalable_vec_machine = LinearSVC(C=1, loss="hinge")
scalable_vec_machine.fit(X_train, y_train)

LinearSVC(C=1, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='hinge', max_iter=1000, multi_class='ovr',
     penalty='l2', random_state=None, tol=0.0001, verbose=0)

In [27]:
svm_predictions = scalable_vec_machine.predict(X_test)

In [28]:
svm_score = scalable_vec_machine.score(X_test, y_test)
print(svm_score)

0.893899204244


In [29]:
print(classification_report(y_test, svm_predictions))

             precision    recall  f1-score   support

          0       0.84      0.88      0.86        80
          1       0.81      0.88      0.84        97
          2       0.83      0.87      0.85        99
          3       0.81      0.81      0.81        98
          4       0.93      0.86      0.90        96
          5       0.92      0.88      0.90        99
          6       0.84      0.89      0.87        98
          7       0.92      0.98      0.95        99
          8       0.97      0.95      0.96       100
          9       0.92      0.99      0.95        99
         10       0.96      0.99      0.98       100
         11       0.96      0.92      0.94        99
         12       0.91      0.86      0.88        98
         13       0.93      0.89      0.91        99
         14       0.93      0.94      0.93        99
         15       0.78      0.92      0.84       100
         16       0.90      0.95      0.92        91
         17       0.91      0.96      0.93   

In [30]:
print("Confusion matrix:\n%s" % confusion_matrix(y_test, svm_predictions))

Confusion matrix:
[[70  0  0  0  0  0  0  0  0  1  0  1  0  0  0  4  0  0  0  4]
 [ 0 85  2  1  0  4  2  0  0  1  0  0  1  0  0  0  0  1  0  0]
 [ 0  5 86  4  1  1  1  0  0  0  0  0  0  0  1  0  0  0  0  0]
 [ 0  3  5 79  3  1  3  0  1  0  0  0  1  1  1  0  0  0  0  0]
 [ 0  3  2  4 83  0  0  0  0  0  1  0  1  0  0  0  0  2  0  0]
 [ 0  2  6  3  0 87  0  0  0  1  0  0  0  0  0  0  0  0  0  0]
 [ 1  0  0  3  0  0 87  2  0  0  0  1  2  0  1  1  0  0  0  0]
 [ 0  1  0  1  0  0  0 97  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  2  2 95  1  0  0  0  0  0  0  0  0  0  0]
 [ 0  1  0  0  0  0  0  0  0 98  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  1 99  0  0  0  0  0  0  0  0  0]
 [ 0  1  0  0  0  0  3  0  1  0  0 91  0  0  1  1  0  1  0  0]
 [ 0  2  1  2  1  1  2  3  0  0  0  0 84  2  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  2  2  0  1  1  0  0 88  1  1  0  2  1  0]
 [ 1  1  0  0  1  0  1  0  1  0  0  0  0  1 93  0  0  0  0  0]
 [ 3  1  0  0  0  1  0  0  0  0  1  0

# Perceptron

In [16]:
from sklearn.linear_model import Perceptron

In [17]:
perceptron_clf = Perceptron()
perceptron_clf.fit(X_train, y_train)

Perceptron(alpha=0.0001, class_weight=None, eta0=1.0, fit_intercept=True,
      n_iter=5, n_jobs=1, penalty=None, random_state=0, shuffle=True,
      verbose=0, warm_start=False)

In [18]:
perceptron_predictions = perceptron_clf.predict(X_test)

In [19]:
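# Note: this scores the training set, unlike the other sections, which score X_test.
# For a comparable figure, use perceptron_clf.score(X_test, y_test).
perceptron_score = perceptron_clf.score(X_train, y_train)
print(perceptron_score)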

0.98520134426


In [20]:
print(classification_report(y_test, perceptron_predictions))

             precision    recall  f1-score   support

          0       0.88      0.80      0.84        80
          1       0.87      0.80      0.83        97
          2       0.84      0.85      0.84        99
          3       0.74      0.74      0.74        98
          4       0.86      0.83      0.85        96
          5       0.89      0.80      0.84        99
          6       0.86      0.86      0.86        98
          7       0.90      0.94      0.92        99
          8       0.87      0.94      0.90       100
          9       0.94      0.93      0.93        99
         10       0.95      0.99      0.97       100
         11       0.99      0.80      0.88        99
         12       0.71      0.89      0.79        98
         13       0.98      0.83      0.90        99
         14       0.98      0.83      0.90        99
         15       0.95      0.81      0.88       100
         16       0.92      0.93      0.93        91
         17       0.95      0.96      0.95   

In [21]:
print("Confusion matrix:\n%s" % confusion_matrix(y_test, perceptron_predictions))

Confusion matrix:
[[64  0  0  0  0  0  0  0  0  1  0  1  0  0  0  1  0  0  1 12]
 [ 2 78  2  2  0  5  2  0  1  1  0  0  2  0  0  0  0  1  0  1]
 [ 0  5 84  6  2  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  1  2 73  6  0  3  3  1  0  2  0  5  0  1  0  0  0  0  1]
 [ 0  1  2  7 80  1  0  0  1  0  0  0  4  0  0  0  0  0  0  0]
 [ 1  2  6  2  0 79  0  0  2  1  0  0  4  1  0  1  0  0  0  0]
 [ 0  0  0  4  2  1 84  1  1  0  0  0  4  0  0  0  0  0  0  1]
 [ 0  0  0  2  1  0  0 93  0  0  0  0  1  0  0  0  0  0  1  1]
 [ 0  0  0  0  0  0  0  2 94  2  0  0  1  0  0  0  0  0  0  1]
 [ 0  0  0  0  0  0  0  0  1 92  3  0  1  0  0  0  1  1  0  0]
 [ 0  0  0  0  0  0  0  0  0  0 99  0  0  0  0  0  0  1  0  0]
 [ 0  0  0  1  0  0  3  0  2  0  0 79  4  0  0  0  1  0  5  4]
 [ 0  2  1  1  1  0  2  2  0  0  0  0 87  1  0  0  0  0  1  0]
 [ 0  0  1  0  0  0  3  2  0  0  0  0  2 82  0  0  0  1  4  4]
 [ 1  0  0  0  1  2  1  0  1  0  0  0  4  0 82  0  0  0  5  2]
 [ 1  1  1  0  0  1  0  0  1  0  0  0

# Multilayer Perceptron

In [22]:
from sklearn.neural_network import MLPClassifier

In [23]:
mlp_clf = MLPClassifier(solver='sgd', activation='relu', hidden_layer_sizes=(100), alpha=0.0001,
                       learning_rate_init=0.001, learning_rate='adaptive', verbose=True)

In [24]:
mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 3.00582464
Iteration 2, loss = 3.00237079
Iteration 3, loss = 2.99935629
Iteration 4, loss = 2.99666383
Iteration 5, loss = 2.99411789
Iteration 6, loss = 2.99174368
Iteration 7, loss = 2.98948836
Iteration 8, loss = 2.98738209
Iteration 9, loss = 2.98541827
Iteration 10, loss = 2.98355509
Iteration 11, loss = 2.98175783
Iteration 12, loss = 2.98002962
Iteration 13, loss = 2.97835035
Iteration 14, loss = 2.97672849
Iteration 15, loss = 2.97513257
Iteration 16, loss = 2.97355106
Iteration 17, loss = 2.97200609
Iteration 18, loss = 2.97045902
Iteration 19, loss = 2.96892434
Iteration 20, loss = 2.96739260
Iteration 21, loss = 2.96587025
Iteration 22, loss = 2.96435707
Iteration 23, loss = 2.96283463
Iteration 24, loss = 2.96129997
Iteration 25, loss = 2.95975896
Iteration 26, loss = 2.95821770
Iteration 27, loss = 2.95665335
Iteration 28, loss = 2.95508772
Iteration 29, loss = 2.95349620
Iteration 30, loss = 2.95189571
Iteration 31, loss = 2.95026701
Iteration 32, los



MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=100, learning_rate='adaptive',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='sgd', tol=0.0001, validation_fraction=0.1,
       verbose=True, warm_start=False)

In [25]:
mlp_predictions = mlp_clf.predict(X_test)

In [26]:
mlp_score = mlp_clf.score(X_test, y_test)
print(mlp_score)

0.376127320955
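
A quick sanity check helps interpret this low score. With 20 classes, a classifier that assigns equal probability to every class has a cross-entropy loss of ln(20), about 2.996. The sketch below compares that baseline with the loss values visible in the training log above. It assumes the logged loss is dominated by the cross-entropy term, which seems reasonable given the small `alpha`, but that assumption was not verified here.

```python
import math

n_classes = 20

# Cross-entropy of a model that assigns probability 1/20 to every class.
chance_loss = math.log(n_classes)

# First and last complete loss values visible in the training log above.
first_loss = 3.00582464
last_visible_loss = 2.95026701  # iteration 31

print(round(chance_loss, 4))
print(round(first_loss - last_visible_loss, 4))
```

The loss starts near the uniform-guess level and falls by only about 0.06 over the 31 visible iterations, which suggests that plain SGD with `learning_rate_init=0.001` was converging slowly on this data. A larger learning rate or a different solver might help, though neither was tested in this notebook.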


In [27]:
print(classification_report(y_test, mlp_predictions))

             precision    recall  f1-score   support

          0       0.56      0.17      0.27        80
          1       0.33      0.27      0.30        97
          2       0.58      0.58      0.58        99
          3       0.43      0.43      0.43        98
          4       0.59      0.24      0.34        96
          5       0.37      0.34      0.36        99
          6       0.32      0.88      0.47        98
          7       0.75      0.15      0.25        99
          8       0.43      0.45      0.44       100
          9       0.38      0.36      0.37        99
         10       0.40      0.67      0.50       100
         11       0.42      0.60      0.49        99
         12       0.50      0.04      0.08        98
         13       0.27      0.31      0.29        99
         14       0.64      0.32      0.43        99
         15       0.26      0.70      0.38       100
         16       0.42      0.16      0.24        91
         17       0.29      0.55      0.38   


In [28]:
print("Confusion matrix:\n%s" % confusion_matrix(y_test, mlp_predictions))

Confusion matrix:
[[14  0  0  0  1  1  1  0  3 10  0  6  0  2  1 29  1 11  0  0]
 [ 0 26  7  1  2 10 22  0  1  4  3  4  0  7  3  4  0  3  0  0]
 [ 0  2 57  7  0  2 16  0  3  2  2  2  0  4  0  2  0  0  0  0]
 [ 0  7  7 42  5  3 15  0  3  3  2  2  0  4  1  3  0  1  0  0]
 [ 1  1  2 22 23  6 15  1  2  2  5  4  1  7  1  0  1  2  0  0]
 [ 0  6  8  5  0 34 27  0  3  1  1  7  1  3  1  2  0  0  0  0]
 [ 0  2  0  3  1  2 86  0  0  2  0  0  0  1  0  1  0  0  0  0]
 [ 1  4  1  4  4  2  7 15 21  3 12  7  0  3  1  6  1  7  0  0]
 [ 1  1  2  3  0  3 12  1 45  2 11  3  0  8  1  3  3  1  0  0]
 [ 0  1  0  2  0  1 10  0  1 36 29  1  0  2  0 10  1  5  0  0]
 [ 0  0  0  0  0  0  9  0  0  6 67  6  0  0  0  5  1  6  0  0]
 [ 0  2  2  1  1  3  7  0  1  1  2 59  0  6  1  9  0  4  0  0]
 [ 0  7  4  7  2  8 20  2  4  1  7 12  4  8  2  4  0  6  0  0]
 [ 0  8  0  0  0  5  3  0  0  2  8  6  0 31  1 23  0 12  0  0]
 [ 2  4  3  0  0  3  9  0  3  4  4  3  1 10 32 11  0 10  0  0]
 [ 0  3  2  0  0  2  3  0  0  2  1  2
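
# Comparison of Scores

Collecting the accuracy values printed in the sections above gives a quick side-by-side view. The sketch below simply copies those numbers into a dictionary and sorts them; nothing is retrained. The Perceptron value is kept separate because it was computed on the training set rather than the test set, so it is not directly comparable with the others.

```python
# Accuracy values copied from the outputs printed in the sections above.
test_scores = {
    "LogisticRegression (lbfgs)": 0.820689655172,
    "LogisticRegression (sag)": 0.820689655172,
    "SGDClassifier": 0.872148541114,
    "LinearSVC": 0.893899204244,
    "MLPClassifier": 0.376127320955,
}
# Scored on X_train, y_train in the Perceptron section, so listed apart.
train_only_scores = {"Perceptron": 0.98520134426}

# Sort the held-out scores from best to worst.
ranked = sorted(test_scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name:<28} {score:.4f}")
for name, score in train_only_scores.items():
    print(f"{name:<28} {score:.4f} (training set)")
```

On this split, the linear SVM has the highest test accuracy of the models that were scored on held-out data, and the multilayer perceptron as configured has the lowest.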

# Citations

This experiment and notebook use libraries made available through the scikit-learn project, cited as follows: Pedregosa et al., Scikit-learn: Machine Learning in Python, JMLR 12, pp. 2825-2830, 2011.