```diff
+ The following section provides an opportunity for the student to create a classifier model from scratch, and then modify it to improve performance.
```

In [None]:
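from sklearn import datasets, neighbors, metrics, model_selection
# cross_validation and grid_search were removed in scikit-learn 0.20; their contents
# now live in model_selection, so alias the old names for cells written against them.
cross_validation = grid_search = model_selection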
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
iris = datasets.load_iris()
irisdf = pd.DataFrame(iris.data, columns=iris.feature_names)
irisdf['target'] = iris.target
cmap = {0: 'r', 1: 'g', 2: 'b'}
irisdf['ctarget'] = irisdf.target.map(cmap)
irisdf.plot('petal length (cm)', 'petal width (cm)', kind='scatter', c=irisdf.ctarget)
print(irisdf.describe())

def my_classifier(row):
    if row['petal length (cm)'] < 2:
        return 0
    else:
        return 1

predictions = irisdf.apply(my_classifier, axis=1)

In [None]:
irisdf['predictions'] = predictions

print((irisdf.target == irisdf.predictions).mean())  # fraction of rows predicted correctly
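
The accuracy computed above can be cross-checked with `sklearn.metrics`, which the first cell imports but does not otherwise use. The sketch below rebuilds the baseline rule in vectorized form (an assumption made so the block runs on its own) and also prints a confusion matrix, which shows where the errors fall.

```python
import numpy as np
from sklearn import datasets, metrics

iris = datasets.load_iris()
petal_length = iris.data[:, 2]  # column 2 is 'petal length (cm)'

# Same rule as my_classifier above, vectorized: class 0 below 2 cm, otherwise class 1.
baseline_predictions = np.where(petal_length < 2, 0, 1)

baseline_accuracy = metrics.accuracy_score(iris.target, baseline_predictions)
baseline_confusion = metrics.confusion_matrix(iris.target, baseline_predictions)

print(baseline_accuracy)
print(baseline_confusion)  # rows are true classes, columns are predicted classes
```

Because the rule never returns 2, every class 2 row ends up in the class 1 column, which is the first thing an improved classifier needs to address.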


### Starter Code

Work on improving the classifier below.

In [None]:
def my_classifier(row):
    if row['petal length (cm)'] < 2:
        return 0
    else:
        return 2

predictions = irisdf.apply(my_classifier, axis=1)

irisdf['predictions'] = predictions

print((irisdf.target == irisdf.predictions).mean())  # fraction of rows predicted correctly
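
One possible direction, sketched under assumptions: add a second threshold so the rule can return all three classes. The 1.75 cm cut on petal width is an illustrative value, not something the exercise prescribes, and `three_class_rule` is a hypothetical name.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

def three_class_rule(row):
    # Hypothetical thresholds, for illustration only.
    if row['petal length (cm)'] < 2:
        return 0
    elif row['petal width (cm)'] < 1.75:
        return 1
    else:
        return 2

df['predictions'] = df.apply(three_class_rule, axis=1)
rule_accuracy = (df.target == df.predictions).mean()
print(rule_accuracy)
```

Hand-picked thresholds like these are evaluated on the same rows they were tuned on, so the printed accuracy is likely optimistic compared with a score on held-out data.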


### Using distance: KNN implementation

```diff
+ The following section provides an opportunity for the student to use an existing sklearn model, KNN, and then try adjusting its parameters.
```

In [None]:
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = cross_validation.train_test_split(iris.data, iris.target, test_size=0.3)

# n_neighbors is the K in KNN: the number of nearest neighbors that vote on each prediction.
# We'll tune this value to try to improve our predictions.
knn = neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform')

knn.fit(x_train, y_train)
prediction = knn.predict(x_test)
print(prediction)
print(y_test)

print(knn.score(x_test, y_test))
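
To make the role of distance concrete, here is a minimal from-scratch sketch of what a uniform-weight KNN prediction does: measure the Euclidean distance to every training point, take the k closest, and let them vote. `knn_predict` is a hypothetical helper written for illustration; it is not how scikit-learn implements the classifier internally, and the toy points are made up.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=5):
    """Predict a label for each query row by majority vote of its k nearest training rows."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for point in np.asarray(x_query, dtype=float):
        # Euclidean distance from this query point to every training row.
        distances = np.sqrt(((x_train - point) ** 2).sum(axis=1))
        nearest = np.argsort(distances)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(predictions)

# Two made-up clusters: class 0 near the origin, class 1 near (5, 5).
toy_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
toy_y = [0, 0, 0, 1, 1, 1]
toy_predictions = knn_predict(toy_x, toy_y, [[0.5, 0.5], [5.5, 5.5]], k=3)
print(toy_predictions)
```

With `weights='distance'`, each neighbor's vote would instead be scaled by the inverse of its distance, so closer points count for more.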


Do we see a change in performance when using the distance weight?

In [None]:
knn = neighbors.KNeighborsClassifier(n_neighbors=5, weights='distance')  # closer neighbors get a larger vote
knn.fit(x_train, y_train)
print(knn.predict(x_test))
print(y_test)

print(knn.score(x_test, y_test))
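
A single 70/30 split can make the two weightings look better or worse by chance. As a hedged sketch, the comparison can be repeated with 5-fold cross-validation; the shuffle seed and the use of `model_selection` (scikit-learn 0.18 or later) are assumptions of this example.

```python
from sklearn import datasets, model_selection, neighbors

iris = datasets.load_iris()
# Shuffle before splitting because the iris rows are ordered by class.
folds = model_selection.KFold(n_splits=5, shuffle=True, random_state=0)

weight_scores = {}
for weights in ('uniform', 'distance'):
    knn = neighbors.KNeighborsClassifier(n_neighbors=5, weights=weights)
    scores = model_selection.cross_val_score(knn, iris.data, iris.target, cv=folds)
    weight_scores[weights] = scores.mean()
    print(weights, weight_scores[weights])
```

Averaging over folds gives a steadier basis for answering the question above than one random split does.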


### Solution: solving for K
```diff
+ The following section provides an opportunity for the student to implement grid search (a central theme from the previous class) in order to find the optimal value for k, given this particular data set.
```
This is only one approach to the problem. Switching `weights` from 'uniform' to 'distance' is an independent change that can be layered on top of the search for K. To include it in the grid search, the code below needs some editing to add `weights` to the parameter grid; alternatively, set it directly on the estimator.

In [None]:
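# recall: what's an effective way to create a numerical list in python?
from sklearn import model_selection  # home of KFold and GridSearchCV since scikit-learn 0.18

params = {'n_neighbors': list(range(1, 31))}  # example range of K values; adjust as needed

# shuffle=True because the iris rows are sorted by class
kf = model_selection.KFold(n_splits=5, shuffle=True, random_state=0)
gs = model_selection.GridSearchCV(
    estimator=neighbors.KNeighborsClassifier(),
    param_grid=params,
    cv=kf,
)
gs.fit(iris.data, iris.target)
gs.cv_results_['mean_test_score']  # replaces grid_scores_, which was removed in scikit-learn 0.20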
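
The note above about handling `weights` inside the grid search can be sketched as follows. This assumes scikit-learn 0.18 or later, where `GridSearchCV` lives in `model_selection`; the candidate range for `n_neighbors` and the shuffle seed are illustrative choices.

```python
from sklearn import datasets, model_selection, neighbors

iris = datasets.load_iris()

# Search over both K and the weighting scheme; the ranges are illustrative.
joint_params = {
    'n_neighbors': list(range(1, 31)),
    'weights': ['uniform', 'distance'],
}
folds = model_selection.KFold(n_splits=5, shuffle=True, random_state=0)
joint_search = model_selection.GridSearchCV(
    estimator=neighbors.KNeighborsClassifier(),
    param_grid=joint_params,
    cv=folds,
)
joint_search.fit(iris.data, iris.target)

print(joint_search.best_params_)
print(joint_search.best_score_)
```

The grid search fits one model per combination per fold, so adding `weights` doubles the work compared with searching over `n_neighbors` alone.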
