# Online Passive-Aggressive Algorithms

The Passive-Aggressive series of classifiers is our first example of an **online** algorithm, meaning that training is intended to happen one record at a time instead of in batches.  This works especially well when you learn the true outcome soon after each prediction and want the model to keep learning continuously as new records arrive.
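
To make "one record at a time" concrete, here is a minimal pure-Python sketch of a passive-aggressive update step.  It is not scikit-learn's implementation, just the textbook rule under a few assumptions: labels are +1/-1, there is no intercept, and the function name `pa_update` and the toy `stream` are made up for illustration.  As I understand the scikit-learn documentation, `loss="hinge"` corresponds to the PA-I variant and `loss="squared_hinge"` to PA-II.

```python
def pa_update(weights, x, y, C=1.0, variant="PA-II"):
    """Apply one passive-aggressive step for a single record.

    y must be +1 or -1.  Returns (new_weights, hinge_loss).
    """
    margin = y * sum(w * xi for w, xi in zip(weights, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss on this one record
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        # Passive: the record already has enough margin (or carries
        # no signal at all), so the weights stay as they are.
        return list(weights), loss
    # Aggressive: move just far enough to fix this record, with C
    # limiting how hard a single record can pull on the weights.
    if variant == "PA-I":
        tau = min(C, loss / sq_norm)
    else:  # PA-II
        tau = loss / (sq_norm + 1.0 / (2.0 * C))
    return [w + tau * y * xi for w, xi in zip(weights, x)], loss


# Hypothetical stream of (features, label) records arriving one by one.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([2.0, 0.0], 1)]
weights = [0.0, 0.0]
for x, y in stream:
    weights, loss = pa_update(weights, x, y)
    print(x, y, round(loss, 3), [round(w, 3) for w in weights])
```

The third record is already classified with a comfortable margin, so the model stays passive and the weights don't move.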

In [None]:
import numpy as np
import pandas as pd
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

clf = PassiveAggressiveClassifier(loss="squared_hinge", C=1.0, max_iter=1000, random_state=0, tol=1e-4)


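This classifier takes a new parameter:  `tol` (tolerance).  This acts as the stopping criterion:  training stops once `loss > (previous_loss - tol)`, that is, once a pass over the data fails to improve the loss by at least `tol`.  If that never happens, training stops after `max_iter` passes.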
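
Here is a small sketch of that stopping rule on its own, using a made-up list of per-epoch losses.  The helper name `first_stopping_epoch` is hypothetical, and this only models the simple comparison described above; the real library may layer extra details (such as a patience setting) on top of it.

```python
def first_stopping_epoch(epoch_losses, tol=1e-4):
    """Return the 1-based epoch where `loss > previous_loss - tol`
    first holds, or None if the loss keeps improving by at least tol."""
    previous_loss = float("inf")
    for epoch, loss in enumerate(epoch_losses, start=1):
        if loss > previous_loss - tol:
            return epoch
        previous_loss = loss
    return None


# Illustrative losses: the fourth epoch improves by less than tol.
losses = [0.90, 0.50, 0.30, 0.29995, 0.10]
print(first_stopping_epoch(losses, tol=1e-4))
```

Notice that training would stop at the fourth epoch even though the fifth would have been better.  That is the trade-off of any tolerance-based stopping rule.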

Now let's prep the data.  Because we'll prepare it the same way for each classifier, we only need to do this once.  I'll also remove the bits where we analyze the data, as we've seen it enough times already.

In [None]:
campus_data = "../data/CampusRecruitment.csv"
df = pd.read_csv(campus_data, header=0)
y = df['status']
X = df.drop(['status', 'salary'], axis=1)

## Pre-Processing

For this dataset, we want to use the columns leading up to `status` to determine if different college graduates were placed at a job.  Because the salary is determined by the placement status, we can't use it to predict if a new graduate will be placed, so we'll have to drop that column.  Note that if we were interested in doing a regression analysis, we could try to predict the salary given placement, but we're keeping it classy and sticking to classification algorithms only.

Unlike the heart attack dataset, this dataset includes non-numeric features.

In [None]:
df

Before we can feed this data into the Passive-Aggressive Classifier algorithm (or pretty much any other classification algorithm), we need to convert any text data into numeric data.  There are a few common techniques for encoding.  The technique we will use for our dataset is called one-hot encoding.  What it does is "pivot" the categorical data, so that each distinct categorical value gets its own feature.  For example, `gender` has two values, M and F.  One-hot encoding will create two new features, one for `gender=M` and one for `gender=F`.  We need to do this for each of the non-numeric features.
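
To show the "pivot" idea without any library, here is a minimal pure-Python sketch of one-hot encoding a single column.  The helper `one_hot` and the four sample values are made up for illustration; it sorts the categories so the column order is predictable.

```python
def one_hot(values):
    """Return (categories, rows): one 0/1 column per distinct value."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


genders = ["M", "F", "F", "M"]  # illustrative sample, not the real data
categories, encoded = one_hot(genders)
print(categories)
for row in encoded:
    print(row)
```

Each row has exactly one 1.0 in it, marking which category that record belongs to.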

In [None]:
enc = preprocessing.OneHotEncoder()

# Fit the input features to our encoder
enc.fit(X)

# Perform the transformation on our dataset
X = enc.transform(X).toarray()
X.shape

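The `shape` here shows that we have the same number of rows as before (215), but the number of columns went from 15 in the original data frame to 873.  This huge increase came about because the encoder treats every column we pass in as categorical, so each distinct value in a column, string or numeric, gets a feature of its own.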
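
One way to see where a column count like that comes from:  one-hot encoding emits one column per distinct value in each input column, so the total width is the sum of the distinct counts, and a numeric column counts too if it is handed to the encoder.  Here is a rough sketch on a tiny made-up table; the column names `score` and `work_experience`, the values, and the helper `one_hot_widths` are all hypothetical.

```python
def one_hot_widths(rows):
    """Count the distinct values per column, which is the number of
    one-hot features each column would expand into."""
    columns = list(rows[0].keys())
    return {col: len({row[col] for row in rows}) for col in columns}


rows = [
    {"gender": "M", "score": 67.0, "work_experience": "No"},
    {"gender": "F", "score": 79.33, "work_experience": "Yes"},
    {"gender": "M", "score": 65.0, "work_experience": "No"},
    {"gender": "M", "score": 56.0, "work_experience": "No"},
]
widths = one_hot_widths(rows)
total = sum(widths.values())
print(widths, total)
```

Even in this toy table, the numeric column contributes as many features as there are rows, while each text column contributes only two.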

In [None]:
X

By contrast, I'm going to perform a simple label encoding of the `status`.

In [None]:
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
y
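
For comparison with one-hot encoding, here is a minimal sketch of what label encoding does:  each distinct label is mapped to a single integer, with the classes taken in sorted order.  The helper `label_encode` is made up, and the label strings are an assumption about what the `status` column contains.

```python
def label_encode(labels):
    """Map each distinct label to an integer, in sorted class order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return classes, [index[label] for label in labels]


# Assumed label values, for illustration only.
statuses = ["Placed", "Not Placed", "Placed", "Placed"]
classes, codes = label_encode(statuses)
print(classes)
print(codes)
```

Because there is one output column with integer codes, this works well for a target like `status` but would imply an ordering if used on unordered input features.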

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1740)

In [None]:
X_train

Now let's train the passive-aggressive model.

In [None]:
clf = clf.fit(X_train, y_train)

## How'd We Do?

Let's first use the `accuracy_score` method in sklearn to see just how well we did.

In [None]:
predicted = clf.predict(X_test)
accuracy_score(y_test, predicted)

The top-line accuracy is exactly the same as kNN.  Let's compare how it does in the confusion matrix and classification report.

In [None]:
confusion_matrix(y_test, predicted)

In [None]:
print(classification_report(y_test, predicted))

Our passive-aggressive classifier did not do as well in predicting non-placements as kNN, but it did better in predicting placements.
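
To make the numbers in the confusion matrix and classification report concrete, here is a small pure-Python sketch that computes them for a binary problem.  The labels and predictions are invented for illustration and are not this model's results; the helper `confusion_counts` is hypothetical.  For binary labels, sklearn's `confusion_matrix` is laid out as `[[tn, fp], [fn, tp]]`, with true classes as rows and predicted classes as columns.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for a binary classification problem."""
    tn = fp = fn = tp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    return tn, fp, fn, tp


# Made-up labels: 1 = placed, 0 = not placed.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision_placed = tp / (tp + fp)      # of predicted placements, how many were right
recall_placed = tp / (tp + fn)         # of actual placements, how many we caught
precision_not_placed = tn / (tn + fn)  # same idea for the other class
recall_not_placed = tn / (tn + fp)
print([[tn, fp], [fn, tp]], accuracy)
```

Two models can share the same accuracy while splitting their mistakes differently between `fp` and `fn`, which is why the per-class precision and recall are worth reading.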

Note that online passive-aggressive classifiers are more dependent on the ordering of input data than most other algorithms, so a small change in that ordering could lead to a substantial accuracy difference.  The benefits are that they tend to be very accurate (especially as information changes over time) and you do not need massive amounts of data for retraining.
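
Here is a rough sketch of that ordering sensitivity, using a hand-rolled PA-II style update rather than scikit-learn itself.  It assumes +1/-1 labels and no intercept, and the two records and the helper names `pa2_step` and `train` are made up.  The same two records are fed in both orders, and the final weights come out different.

```python
def pa2_step(weights, x, y, C=1.0):
    """One PA-II style update for a single record (y is +1 or -1)."""
    margin = y * sum(w * xi for w, xi in zip(weights, x))
    loss = max(0.0, 1.0 - margin)
    if loss == 0.0:
        return list(weights)  # passive: nothing to fix
    sq_norm = sum(xi * xi for xi in x)
    tau = loss / (sq_norm + 1.0 / (2.0 * C))
    return [w + tau * y * xi for w, xi in zip(weights, x)]


def train(records):
    """Feed records to the model one at a time, in the order given."""
    weights = [0.0, 0.0]
    for x, y in records:
        weights = pa2_step(weights, x, y)
    return weights


records = [([1.0, 0.0], 1), ([1.0, 1.0], -1)]
w_forward = train(records)
w_reversed = train(list(reversed(records)))
print([round(w, 3) for w in w_forward])
print([round(w, 3) for w in w_reversed])
```

Each update only corrects for the record in front of it, so whichever record arrives last gets the final say.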