# Other models

Scikit-learn (`sklearn`) also offers other models, such as k-Nearest Neighbors, SVM and Decision Trees. They all work similarly, although each model has its own configuration options. Every one of these models has a `fit` method that takes the training inputs X and labels y, and a `predict` method that takes an X, applies the learned model to it and returns predictions.
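
Because every estimator exposes the same `fit`/`predict` pair, code written against that interface works unchanged for any of them. A minimal sketch, assuming the Iris data is loaded with scikit-learn's own `load_iris` rather than the `ml` helper used below; `fit_and_predict` is a hypothetical helper name, not part of either library:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def fit_and_predict(model, train_X, train_y, X):
    """Hypothetical helper: train any scikit-learn classifier, then predict X."""
    model.fit(train_X, train_y)   # learn from the training inputs and labels
    return model.predict(X)       # apply the learned model to new inputs

X, y = load_iris(return_X_y=True)
models = [KNeighborsClassifier(n_neighbors=1),
          SVC(kernel='linear', C=0.025),
          DecisionTreeClassifier(max_depth=5)]

# The same call works for every model; only the constructor options differ.
predictions = {type(m).__name__: fit_and_predict(m, X, y, X[:5]) for m in models}
for name, pred in predictions.items():
    print(name, pred)
```

Here the first five flowers (all class 0) are predicted after training on the full set, purely to show the shared interface, not to measure quality.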

In [3]:
from ml import *

In [4]:
data = iris_classify()

# k-Nearest Neighbor

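You can find out more about the k-Nearest Neighbors classifier by querying its documentation in a notebook, for example with `help(KNeighborsClassifier)` or `KNeighborsClassifier?` once it is imported. The documentation shows that the number of neighbors k is set through the parameter `n_neighbors`.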
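
To list only the configuration options and their defaults, Python's standard `inspect` module can read the constructor signature, and every scikit-learn estimator also offers `get_params()`. A small sketch:

```python
import inspect
from sklearn.neighbors import KNeighborsClassifier

# The constructor signature lists every configurable option with its default.
params = inspect.signature(KNeighborsClassifier).parameters
for name, param in params.items():
    print(f"{name} = {param.default!r}")

# get_params() reports the same options for an existing instance.
print(KNeighborsClassifier(n_neighbors=1).get_params()['n_neighbors'])  # prints 1
```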

In [None]:
from sklearn.neighbors import KNeighborsClassifier

In [5]:
model = KNeighborsClassifier(n_neighbors=1)

In [6]:
model.fit(data.train_X, data.train_y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=1, p=2,
           weights='uniform')

# Evaluation

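To make the evaluation easy to repeat, we write a function `evaluate` that reports the fraction of flowers in $X$ whose predicted class matches the true class $y$. On this first attempt, 98.75% of the flowers are classified correctly. Note that this is measured on the training data, the same flowers the model learned from, so it is an optimistic estimate.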

In [8]:
"""
X: n x m matrix containing the input for n training examples, each having m features
y: n x 1 matrix containing the correct class {0,1} for the n training examples
theta: m x 1 matrix containing the coefficients for the model
Returns percentage correctly predicted cases in X
"""
def evaluate(model, X, y):
    return sum( model.predict(X) == y ) / len(X)

In [8]:
evaluate(model, data.train_X, data.train_y)

0.9875
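
A 1-nearest-neighbor model essentially memorizes its training set, so scoring it on that same set says little about new flowers. A sketch that checks the `evaluate` function against scikit-learn's `accuracy_score` and `score()`, and compares training accuracy with accuracy on a held-out split. The split is made with `train_test_split` on `load_iris`, an assumption, since the `ml` helper's own split is not shown here:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def evaluate(model, X, y):
    """Fraction of rows in X whose predicted class equals the true class in y."""
    return np.mean(model.predict(X) == np.ravel(y))

X, y = load_iris(return_X_y=True)
# Hold out 30% of the flowers; stratify keeps the class balance in both parts.
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = KNeighborsClassifier(n_neighbors=1).fit(train_X, train_y)
train_acc = evaluate(model, train_X, train_y)
test_acc = evaluate(model, test_X, test_y)
print(f"train: {train_acc:.3f}  test: {test_acc:.3f}")

# evaluate() agrees with scikit-learn's built-in metric and the model's score().
assert np.isclose(test_acc, accuracy_score(test_y, model.predict(test_X)))
assert np.isclose(test_acc, model.score(test_X, test_y))
```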

# SVM

You can also view the source code of the `ml` library from Python, for example with `??` in a notebook. Doing so shows that it trains its SVM model with `SVC(kernel='linear', C=0.025)`. The `SVC` documentation explains the available kernels and the regularization parameter `C`.
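
`C` sets the regularization strength: it is inversely proportional to the regularization, so a small `C` such as 0.025 allows more training points to violate the margin and gives a simpler boundary, while `kernel` chooses the shape of that boundary. A sketch comparing a few settings by training accuracy, assuming the Iris data from `load_iris`; the printed numbers depend on the data, so none are claimed here:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Smaller C = stronger regularization: a wider margin that tolerates more
# misclassified training points. The kernel sets the shape of the boundary.
results = {}
for kernel in ['linear', 'rbf']:
    for C in [0.025, 1.0, 100.0]:
        model = SVC(kernel=kernel, C=C).fit(X, y)
        results[(kernel, C)] = model.score(X, y)  # training accuracy
        print(f"kernel={kernel:<6} C={C:<7} train accuracy={results[(kernel, C)]:.3f}")
```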

In [18]:
from sklearn.svm import SVC

In [10]:
model = SVC(kernel='linear', C=0.025)

In [11]:
model.fit(data.train_X, data.train_y)

SVC(C=0.025, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='linear', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

In [12]:
evaluate(model, data.train_X, data.train_y)

0.9125

So on the training data, this SVM (91.25% correct) scores noticeably lower than KNN (98.75%).
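
Both figures are training accuracies, so they reflect how well each model fits flowers it has already seen. A fairer comparison averages accuracy over several held-out folds. A sketch using scikit-learn's `cross_val_score` on `load_iris` (an assumption; the `ml` helper's data is not used):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
candidates = {
    'KNN (k=1)': KNeighborsClassifier(n_neighbors=1),
    'linear SVM (C=0.025)': SVC(kernel='linear', C=0.025),
}

# cv=5: each model is trained on 4/5 of the data and scored on the other 1/5,
# five times over (stratified folds for classifiers).
cv_means = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```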

# Decision Trees



In [13]:
from sklearn.tree import DecisionTreeClassifier

In [14]:
model = DecisionTreeClassifier(max_depth=5)

In [15]:
model.fit(data.train_X, data.train_y)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=5, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')

In [16]:
evaluate(model, data.train_X, data.train_y)

0.9875

# Logistic Regression

In [11]:
from sklearn.linear_model import LogisticRegression

In [5]:
model = LogisticRegression()

In [6]:
model.fit(data.train_X, data.train_y)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [9]:
evaluate(model, data.train_X, data.train_y)

0.925