<img src='img/logo.png'>
<img src='img/title.png'>

# Exercise

Import the iris data set, then use scaling and PCA as `Pipeline` preprocessing steps. Make a `Pipeline` for each of the classifiers imported below, fit each `Pipeline`, and calculate its confusion matrix.

In [None]:
from sklearn import datasets

iris_data = datasets.load_iris()
examples = iris_data.data
classes  = iris_data.target
print(classes.shape)
print(examples.shape)

In [None]:
# discriminant analysis (used as classifiers in this exercise)
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# classifiers
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB

<button data-toggle="collapse" data-target="#soln1" class='btn btn-primary'>Show solution</button>

<div id="soln1" class="collapse">

Import some classes for pipelines and scaling, and create a collection of classifiers:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, accuracy_score
# make a list of classifiers 
classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="linear", C=0.025),
    SVC(gamma=2, C=1),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    AdaBoostClassifier(),
    GaussianNB(),
    LDA(),
    QDA()]
```
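
To see what the two preprocessing steps do before a classifier is attached, here is a minimal sketch that chains only `StandardScaler` and `PCA` and inspects the result. The names `preprocess`, `projected`, and `ratios` are illustrative and not part of the exercise.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = datasets.load_iris()

# A pipeline made only of transformers exposes fit_transform
preprocess = Pipeline([('standard', StandardScaler()),
                       ('pca', PCA())])
projected = preprocess.fit_transform(iris.data)

# PCA() with no arguments keeps all components, so the shape is unchanged
print(projected.shape)

# Fraction of the (scaled) variance captured by each principal component
ratios = preprocess.named_steps['pca'].explained_variance_ratio_
print(np.round(ratios, 3))
```

Because `PCA()` keeps every component here, the step rotates the scaled features rather than reducing them; passing `n_components` would be one way to actually drop dimensions.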

Set up some scaffolding to look at the confusion matrices of various classifiers:

```python
# NumPy is used to shuffle the indices for the train/test split
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
# Feature matrix (one row per flower) and integer class labels
examples = iris.data
classes = iris.target

# Create a training and a testing set from this data by choosing indices
# (manually, not with higher-level APIs)
def test_once(examples, classes, estimator):
    # Random order of indices
    n_examples = len(examples)
    shuffled_indices = np.random.permutation(n_examples)

    # Pick a training/testing split
    train_pct = 0.8
    train_ct  = int(n_examples * train_pct)

    # Select indices for training and testing
    train_idx, test_idx = shuffled_indices[:train_ct], shuffled_indices[train_ct:]
    model = Pipeline([('standard', StandardScaler()), 
                       ('pca', PCA()), 
                       ('clf', estimator)])
    model.fit(examples[train_idx], classes[train_idx])
    predictions = model.predict(examples[test_idx])
    # scikit-learn metrics expect the true labels first, then the predictions
    confusion = confusion_matrix(classes[test_idx], predictions)
    accuracy = accuracy_score(classes[test_idx], predictions)
    return confusion, accuracy
```

Compare the classifiers by confusion matrix and accuracy. Each call to `test_once` draws a new random split, so the numbers vary from run to run:

```python
pretty_name = lambda classifier: classifier.__class__.__name__
# now we can loop over the classifiers
accuracies = []
for classifier in classifiers:
    print('With classifier', pretty_name(classifier))
    confusion, accuracy = test_once(examples, classes, estimator=classifier)
    print('Confusion matrix:')
    print(confusion)
    print('Accuracy', accuracy)
    accuracies.append(accuracy)
    print()
idx = int(np.argmax(accuracies))
best = classifiers[idx]
print('Best model', best, 'accuracy', accuracies[idx])
```
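
A single 80/20 split of 150 examples leaves only 30 test points, so the "best" model can change between runs. One way to get a steadier comparison is cross-validation. The following is a rough sketch under that assumption, using a shortened, illustrative list of classifiers and a hypothetical `mean_scores` dictionary.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

iris = datasets.load_iris()

# Five shuffled, stratified folds; the fixed seed makes the folds repeatable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

candidates = [KNeighborsClassifier(3), GaussianNB(),
              SVC(kernel='linear', C=0.025)]

mean_scores = {}
for clf in candidates:
    model = Pipeline([('standard', StandardScaler()),
                      ('pca', PCA()),
                      ('clf', clf)])
    # The whole pipeline is refit on each training fold, so the scaler
    # and PCA never see the held-out fold
    scores = cross_val_score(model, iris.data, iris.target, cv=cv)
    mean_scores[type(clf).__name__] = scores.mean()
    print(type(clf).__name__, 'mean accuracy', scores.mean())
```

Passing the full `Pipeline` to `cross_val_score`, rather than pre-scaled data, keeps the preprocessing inside each fold.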

</div>

<img src='img/copyright.png'>