# Session 08: Texture

In this session we extend the set of features that we extract from image
data by adding a simple measure of texture.

## Setup

Modules have to be loaded in each notebook. Here, we load the
same set as in the previous session.

In [None]:
%pylab inline

import numpy as np
import scipy as sp
import pandas as pd
import sklearn
# Import every sklearn submodule used below explicitly
from sklearn import linear_model, metrics, model_selection, neighbors
import urllib

import os
from os.path import join

In [None]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches

plt.rcParams["figure.figsize"] = (8,8)

## Cats and dogs

Read in the cats and dogs dataset once again:

In [None]:
df = pd.read_csv(join("..", "data", "catdog.csv"))
df

Let's create a grayscale (black and white) image and subtract from each
pixel the value of the pixel to its lower right. What does this show us?

In [None]:
img = imread(join('..', 'images', 'catdog', df.filename[2]))
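# Sum the channels as floats so the subtraction below cannot wrap around
# if the image was loaded as unsigned integers
img_bw = np.sum(img, axis=2, dtype=np.float64)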

img_text = img_bw[:-1, :-1] - img_bw[1:, 1:]
plt.imshow(img_text, cmap='gray')
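
To see what the diagonal difference responds to, it can help to apply the same slicing to a tiny synthetic image where the answer can be read off by eye. The sketch below uses a made-up 6x6 array (the name `toy` is hypothetical and not part of the dataset): flat regions come out as zero, and only pixels near the border of the bright square are nonzero.

```python
import numpy as np

# Synthetic 6x6 grayscale image: a bright 2x2 square on a dark background.
# This array is illustrative only; it is not taken from the cats and dogs data.
toy = np.zeros((6, 6), dtype=np.float64)
toy[2:4, 2:4] = 10.0

# Same slicing as in the cell above: each pixel minus the pixel
# one row down and one column to the right.
toy_text = toy[:-1, :-1] - toy[1:, 1:]

# The result is one row and one column smaller than the input.
print(toy_text.shape)
print(toy_text)
```

Negative values mark places where the image gets brighter toward the lower right, positive values where it gets darker.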

## Texture features for learning a model

Let's try to use these features in a machine learning model:

In [None]:
X = np.zeros((len(df), 3))

for i in range(len(df)):
    img = imread(join("..", "images", "catdog", df.filename[i]))
    # Sum as floats so the difference cannot wrap around for integer images
    img_bw = np.sum(img, axis=2, dtype=np.float64)
    img_hsv = matplotlib.colors.rgb_to_hsv(img)
    img_text = img_bw[:-1, :-1] - img_bw[1:, 1:]
    
    X[i, 0] = np.mean(img_hsv[:, :, 1])
    X[i, 1] = np.mean(img_hsv[:, :, 2])
    X[i, 2] = np.mean(img_text)
    if i % 25 == 0:
        print("Done with {0:d} of {1:d}".format(i, len(df)))
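
One thing worth checking about the third feature: signed differences largely cancel when averaged, so `np.mean(img_text)` may carry little information about texture. The sketch below compares the signed mean with the mean absolute difference on two synthetic images that are not part of the dataset. Using `np.abs` is a suggested variation, not what the notebook computes, and the helper name `diagonal_diff` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Two synthetic 64x64 grayscale images (illustrative only):
flat = np.full((64, 64), 0.5)             # no texture at all
noisy = rng.uniform(0.0, 1.0, (64, 64))   # heavy texture

def diagonal_diff(img_bw):
    """Each pixel minus the pixel one row down and one column to the right."""
    return img_bw[:-1, :-1] - img_bw[1:, 1:]

for name, im in [("flat", flat), ("noisy", noisy)]:
    d = diagonal_diff(im)
    print(name,
          "signed mean:", np.mean(d),
          "absolute mean:", np.mean(np.abs(d)))
```

If the absolute mean separates the two images while the signed mean does not, replacing `np.mean(img_text)` with `np.mean(np.abs(img_text))` in the loop is one option to try.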

We will also build an array that is equal to 0 for cats and 1 for dogs:

In [None]:
y = np.int32(df.animal.values == "dog")
y

We'll make a training and testing split one more time:

In [None]:
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y)

Then we fit a linear regression to the training data, threshold its
predictions at 0.5 to get class labels, and compute the accuracy on the test set:

In [None]:
model = sklearn.linear_model.LinearRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)
yhat = np.int32(pred > 0.5)
sklearn.metrics.accuracy_score(y_test, yhat)
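
The idea of using a regression as a classifier can be sketched with plain NumPy: fit least squares to the 0/1 labels, then call anything above 0.5 a dog. The example below uses synthetic two-cluster data (all names ending in `_demo` are hypothetical), and it reports in-sample accuracy only, so it illustrates the mechanism rather than reproducing the notebook's result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two Gaussian clusters with two features each.
# Class 0 is centred at 0, class 1 at 3 (illustrative values only).
X_demo = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
                    rng.normal(3.0, 1.0, (50, 2))])
y_demo = np.repeat([0, 1], 50)

# Add a column of ones so the fit includes an intercept.
A_demo = np.column_stack([np.ones(len(X_demo)), X_demo])

# Ordinary least squares on the 0/1 labels.
coef_demo, *_ = np.linalg.lstsq(A_demo, y_demo, rcond=None)

# Continuous predictions, then a 0.5 threshold to get class labels.
pred_demo = A_demo @ coef_demo
yhat_demo = np.int32(pred_demo > 0.5)

acc_demo = np.mean(yhat_demo == y_demo)
print("in-sample accuracy:", acc_demo)
```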

Let's also look at the ROC curve, which is computed from the continuous
predictions rather than the thresholded labels.

In [None]:
fpr, tpr, _ = sklearn.metrics.roc_curve(y_test, pred)

In [None]:
plt.plot(fpr, tpr, 'b')
plt.plot([0,1],[0,1],'r--')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

In [None]:
sklearn.metrics.auc(fpr, tpr)
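
One way to read the AUC value: it is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below checks this on four made-up scores (`y_true` and `scores` here are illustrative, not model output).

```python
import numpy as np

# Four illustrative examples: two negatives (0) and two positives (1).
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Compare every positive score with every negative score.
diff = pos[:, None] - neg[None, :]

# Correctly ordered pairs count 1, ties count 0.5, the rest count 0.
auc_pairs = np.mean((diff > 0) + 0.5 * (diff == 0))
print(auc_pairs)  # 0.75: three of the four pairs are ordered correctly
```

An AUC of 0.5 corresponds to the dashed diagonal in the plot, where the ordering is no better than chance.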

We can also try this with a nearest neighbors model.

In [None]:
model = sklearn.neighbors.KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
yhat = model.predict(X_test)
sklearn.metrics.accuracy_score(y_test, yhat)

Once again, try changing the number of neighbors to improve the model. You
should be able to reach an accuracy similar to that of the linear regression.
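
A loop over several values of `n_neighbors` is one way to run this experiment. The sketch below uses synthetic data in place of the image features (names ending in `_demo` are hypothetical, and the candidate values of `k` are arbitrary). In the notebook, the same loop could be run on `X_train`, `X_test`, `y_train`, and `y_test`.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature matrix: two overlapping clusters,
# three features each, 100 examples per class.
X_demo = np.vstack([rng.normal(0.0, 1.0, (100, 3)),
                    rng.normal(1.5, 1.0, (100, 3))])
y_demo = np.repeat([0, 1], 100)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# Fit one model per candidate k and record its test accuracy.
knn_scores = {}
for k in [1, 3, 5, 11, 21]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_tr, y_tr)
    knn_scores[k] = accuracy_score(y_te, knn.predict(X_te))

for k, acc in knn_scores.items():
    print("k = {0:d}: accuracy {1:.3f}".format(k, acc))
```

Picking `k` by test accuracy makes the reported score slightly optimistic. A separate validation split would be a cleaner choice if the number matters.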