# Evaluation: Precision & Recall
## Using the evaluation metrics we have learned, we are going to compare how different types of classifiers perform on several evaluation metrics
### We are going to use MNIST, a dataset of handwritten digits that we can download with scikit-learn. Run the code below to do so.


In [None]:
import numpy as np
from sklearn.datasets import fetch_openml

# fetch_mldata was removed from scikit-learn; fetch_openml downloads the same 70000 MNIST images.
# as_frame=False returns NumPy arrays, so positional indexing such as X[36000] works below.
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist['data'], mnist['target']

### Now take a look at the shapes of the X and y matrices

In [None]:
print(X.shape, y.shape)

### Now, let's pick one entry and see which digit is written. Use indexing to pick the digit at index 36000

In [None]:
X[36000]

In [None]:
y[36000]

### Use .reshape(28, 28) and plt.imshow() with the parameters cmap=matplotlib.cm.binary and interpolation="nearest" to plot the digit. Be sure to import matplotlib!

In [None]:
import matplotlib  # needed for matplotlib.cm.binary
import matplotlib.pyplot as plt
plt.imshow(X[36000].reshape(28, 28), cmap=matplotlib.cm.binary, interpolation="nearest")
plt.show()
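The reshape works because each row of `X` stores the 28x28 image one pixel row after another (NumPy's default row-major order), so pixel `(row, col)` comes from flat index `row * 28 + col`. A minimal sketch, using the numbers 0 to 783 as a made-up stand-in for one image's pixel values:

```python
import numpy as np

# Made-up stand-in for one MNIST row: the values 0..783 instead of pixel intensities
flat = np.arange(784)

# reshape(28, 28) fills the image one row of 28 pixels at a time (row-major order)
image = flat.reshape(28, 28)

row, col = 3, 5
# Pixel (row, col) of the image comes from flat index row * 28 + col
print(image[row, col], row * 28 + col)   # 89 89
```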

### Use indexing to check whether the digit in the plot matches the label at index 36000

In [None]:
y[36000]

### Now let's split the data into training and test sets to run a classification. Instead of using sklearn, use indexing to select the first 60000 entries for training and the rest for testing.

In [None]:
# There are 70000 entries in total: the first 60000 (everything except the last 10000) are for training
X_train = X[:-10000]
X_test = X[-10000:]
y_train = y[:-10000]
y_test = y[-10000:]
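Negative slice bounds count from the end of the array, so with 70000 rows `X[:-10000]` selects the same rows as `X[:60000]`. A quick sketch on a dummy array (only the 70000-row length is taken from MNIST) confirms that the two parts neither overlap nor leave a gap:

```python
import numpy as np

# Dummy array with as many rows as MNIST (70000); one column is enough for this check
data = np.arange(70000).reshape(-1, 1)

train, test = data[:-10000], data[-10000:]

print(train.shape, test.shape)               # (60000, 1) (10000, 1)
print(np.array_equal(train, data[:60000]))   # True
print(train[-1, 0], test[0, 0])              # 59999 60000
```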

### We are going to make a two-class classifier, so let's restrict to just one digit, for example 5. Do this by defining new training and test label sets that are 1 for a 5 and 0 otherwise

In [None]:
import numpy as np

# Labels from OpenML are strings, so compare with "5" rather than the integer 5
y_5_train = np.where(y_train == "5", 1, 0)
y_5_test = np.where(y_test == "5", 1, 0)
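The comparison uses the string `"5"` because `fetch_openml` returns the MNIST labels as strings. A small sketch with hypothetical labels in that form, plus an equivalent version that converts them to integers first:

```python
import numpy as np

# Hypothetical labels in the same form fetch_openml returns them for MNIST: strings
labels = np.array(["5", "0", "4", "1", "5", "9"])

# 1 where the label is the string "5", 0 elsewhere
is_five = np.where(labels == "5", 1, 0)
print(is_five)                                  # [1 0 0 0 1 0]

# Equivalent alternative: convert the labels to integers, then compare with 5
is_five_int = np.where(labels.astype(int) == 5, 1, 0)
print(np.array_equal(is_five, is_five_int))     # True
```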

### Let's train a logistic regression to predict whether a digit is a 5 (remember to use the 'just 5s' training labels!)

In [None]:
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(solver = "liblinear", random_state=0)
logreg.fit(X_train, y_5_train)

### Does the classifier correctly predict the digit at index 36000 that we picked before?

In [None]:
y_pred = logreg.predict(X_train)

In [None]:
y_pred[36000]
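Predicting all 60000 training images just to read one entry works, but a single image can also be passed to `predict` directly, as long as it is shaped as a one-row 2-D array with `reshape(1, -1)`, since scikit-learn estimators expect input of shape `(n_samples, n_features)`. A self-contained sketch on synthetic data from `make_classification` (a stand-in for MNIST, chosen so it runs quickly):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the MNIST features: 200 samples, 10 features, 2 classes
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)
clf = LogisticRegression(solver="liblinear", random_state=0).fit(X_demo, y_demo)

sample = X_demo[36]                  # a single 1-D row of shape (10,)
one_row = sample.reshape(1, -1)      # shape (1, 10): one sample, all features

single_pred = clf.predict(one_row)   # array holding one predicted label
batch_pred = clf.predict(X_demo)     # predictions for every row

# The single prediction agrees with the matching entry of the batch prediction
print(single_pred[0] == batch_pred[36])   # True
```

On the notebook's own data the same idea would be `logreg.predict(X[36000].reshape(1, -1))`.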

### To make some comparisons, we are going to make a very dumb classifier that never predicts 5s. Build the classifier with the code below, and instantiate it with: never_5_clf = Never5Classifier()

In [None]:
from sklearn.base import BaseEstimator
class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        pass
    def predict(self, X):
        return np.zeros((len(X), 1), dtype=bool)

never_5_clf = Never5Classifier()

### Now let's fit and predict on the test set using our Never5Classifier

In [None]:
never_5_clf.fit(X_train)
never_5_clf.predict(X_test)
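Why compare against such a dumb baseline? When positives are rare, a model that always answers "no" already reaches a high accuracy while finding none of them. A sketch with hypothetical labels in which 10% of the samples are 5s (roughly MNIST's share):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 10 fives among 100 digits (1 = "is a 5")
y_true = np.array([1] * 10 + [0] * 90)

# A never-5 style prediction: every digit is labelled "not a 5"
y_never = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_never)
rec = recall_score(y_true, y_never)
print(acc, rec)   # 0.9 0.0
```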

### Let's compare this to the logistic regression. Examine the confusion matrix, precision, recall, and F1 score for each. What probability cutoff are you using to decide the classes?

In [None]:
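from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score, recall_score, f1_score

# Evaluate on the held-out test set. predict() labels a digit as a 5 when the
# predicted probability of class 1 is above 0.5 (equivalently, decision_function > 0)
y_pred_test = logreg.predict(X_test)

cm = confusion_matrix(y_5_test, y_pred_test)
precision = precision_score(y_5_test, y_pred_test)
recall = recall_score(y_5_test, y_pred_test)
f1 = f1_score(y_5_test, y_pred_test)  # a new name, so the f1_score function is not overwritten
print(cm)
print(precision)
print(recall)
print(f1)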

In [None]:
cm
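On the cutoff question: for a binary `LogisticRegression`, `predict` returns class 1 when the predicted probability of class 1 is above 0.5, which is the same as `decision_function` being positive. A sketch on synthetic, imbalanced data (a stand-in for MNIST; the 0.8 cutoff is an arbitrary example) checks this and shows that a stricter cutoff can only keep or lower recall:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic stand-in data with about 10% positives, not MNIST
X_demo, y_demo = make_classification(n_samples=500, n_features=20, weights=[0.9], random_state=0)
clf = LogisticRegression(solver="liblinear", random_state=0).fit(X_demo, y_demo)

proba = clf.predict_proba(X_demo)[:, 1]   # estimated probability of the positive class
default_pred = clf.predict(X_demo)

# predict() matches a 0.5 cutoff on the probability and a 0 cutoff on decision_function
print(np.array_equal(default_pred, (proba > 0.5).astype(int)))
print(np.array_equal(default_pred, (clf.decision_function(X_demo) > 0).astype(int)))

# A stricter, arbitrarily chosen cutoff
strict_pred = (proba > 0.8).astype(int)

prec_default = precision_score(y_demo, default_pred, zero_division=0)
rec_default = recall_score(y_demo, default_pred)
prec_strict = precision_score(y_demo, strict_pred, zero_division=0)
rec_strict = recall_score(y_demo, strict_pred)
print(prec_default, rec_default)
print(prec_strict, rec_strict)
```

Raising the cutoff usually raises precision, but that is not guaranteed; recall can never go up, because the stricter positives are a subset of the default ones.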

In [None]:
#logreg = LogisticRegression(solver = "liblinear", random_state=0)
#logreg.fit(X_train, y_5_train)
#y_pred = logreg.predict(X_train)

logreg = LogisticRegression(solver = "liblinear", random_state=0)
never_5_clf.fit(X_train)
five_predict = never_5_clf.predict(X_train)

cm = confusion_matrix(y_5_train, five_predict)
precision = precision_score(y_5_train, five_predict)
recall = recall_score(y_5_train, five_predict)
#print(precision)
#print(recall)
#print(f1_score)

In [None]:
cm

In [None]:
precision

In [None]:
recall

In [None]:
f1_score(y_5_train, five_predict)
#why?
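All three metrics can be read off the confusion matrix. `confusion_matrix` puts the true classes in rows and the predicted classes in columns, so for 0/1 labels `ravel()` returns TN, FP, FN, TP in that order. A sketch with made-up labels that checks the hand-computed values against scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Made-up labels and predictions (1 = "is a 5")
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_hat = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

# Rows are true classes, columns are predicted classes, so ravel() gives TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
print(tn, fp, fn, tp)   # 4 2 1 3

prec_demo = tp / (tp + fp)   # share of predicted 5s that really are 5s
rec_demo = tp / (tp + fn)    # share of real 5s that were found
f1_demo = 2 * prec_demo * rec_demo / (prec_demo + rec_demo)   # harmonic mean of the two

# The hand-computed values agree with scikit-learn's functions
print(np.isclose(prec_demo, precision_score(y_true, y_hat)),
      np.isclose(rec_demo, recall_score(y_true, y_hat)),
      np.isclose(f1_demo, f1_score(y_true, y_hat)))
```

For the never-5 classifier, `tp` and `fp` are both 0, so precision is 0/0 and recall is 0, which is why its F1 score comes out as 0.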

### What are the differences you see? Without knowing what each model is, what can these metrics tell you about how well each works?

In [None]:
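# Logistic regression works much better. The never-5 classifier never predicts a 5, so it has
# no true positives: its recall and F1 are 0 and its precision is 0/0 (reported as 0).
# Its accuracy would still look high, roughly 90%, because only about one digit in ten is a 5.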

### Now let's examine the ROC curve for each. Use the roc_curve function from sklearn.metrics to plot the curve for each classifier

In [None]:
import matplotlib.pyplot as plt
import sklearn.metrics as metrics
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# (Re)fit so that `logreg` is guaranteed to be a trained model in this cell
logreg = LogisticRegression(solver="liblinear", random_state=0)
logreg.fit(X_train, y_5_train)

# roc_curve needs continuous scores, not hard 0/1 predictions:
# use the predicted probability of "5" on the test set
y_scores_log = logreg.predict_proba(X_test)[:, 1]

# Logistic regression
fpr, tpr, threshold = roc_curve(y_5_test, y_scores_log)
plt.plot(fpr, tpr, 'b')
plt.plot([0, 1], [0, 1], 'r--')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

In [None]:
# Never5Classifier outputs the same value for every sample, so its ROC curve
# has no intermediate points: it is the diagonal from (0, 0) to (1, 1)
never_scores = never_5_clf.predict(X_test).ravel().astype(float)
fpr, tpr, threshold = roc_curve(y_5_test, never_scores)
plt.plot(fpr, tpr, 'b')
plt.plot([0, 1], [0, 1], 'r--')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

### Now find the roc_auc_score for each.

In [None]:
roc_score_log = metrics.roc_auc_score(y_5_test, y_scores_log)
roc_score_log

In [None]:
roc_score_never = metrics.roc_auc_score(y_5_test, never_scores)
roc_score_never
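`roc_curve` sweeps a decision threshold over the distinct values of the scores it is given, so it needs continuous scores such as predicted probabilities. Hard 0/1 predictions have only two distinct values, which yields a single point between (0, 0) and (1, 1). A sketch with made-up labels and scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up labels (1 = "is a 5") and predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
hard = (scores > 0.5).astype(int)   # the same predictions turned into 0/1 labels

fpr_s, tpr_s, thr_s = roc_curve(y_true, scores)
fpr_h, tpr_h, thr_h = roc_curve(y_true, hard)

# The probability scores give several points; the hard labels give only three
print(len(fpr_s), len(fpr_h))
print(fpr_h, tpr_h)
```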

### What does this metric tell you? Which classifier works better with this metric in mind?

In [None]:
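# Logistic regression is clearly superior: its AUC is well above 0.5, while the never-5
# classifier's constant predictions give an AUC of exactly 0.5, no better than random guessing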
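One way to read the AUC: it is the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting as half. A sketch that checks this pairwise count against `roc_auc_score` on made-up scores, and shows that constant scores, like the never-5 classifier's, give 0.5:

```python
import itertools

import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels (1 = "is a 5") and scores
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Fraction of (positive, negative) pairs ranked correctly; ties count as half
pairs = list(itertools.product(pos, neg))
auc_by_pairs = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)

print(auc_by_pairs, roc_auc_score(y_true, scores))   # both equal 8/9

# Constant scores rank nothing, so the AUC is 0.5
print(roc_auc_score(y_true, np.zeros(len(y_true))))  # 0.5
```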