# Logistic Regression

Based on this [blog post](https://towardsdatascience.com/logistic-regression-using-python-sklearn-numpy-mnist-handwriting-recognition-matplotlib-a6b31e2b166a) and this [video](https://www.youtube.com/watch?v=71iXeuKFcQM) on multiclass logistic regression and confusion matrices.

### Loading the data (digits dataset) 

In [None]:
%matplotlib inline
from sklearn.datasets import load_digits
digits = load_digits()

In [None]:
# print to show there are 1,797 images (8 by 8 images for a dimensionality of 64)
print("Image Data Shape", digits.data.shape)

# print to show there are 1,797 labels (integers from 0-9)
print("Label Data Shape", digits.target.shape)

### Showing the images and labels

In [None]:
import numpy as np 
import matplotlib.pyplot as plt

plt.figure(figsize=(20,4))
for index, (image, label) in enumerate(zip(digits.data[0:5], digits.target[0:5])):
    plt.subplot(1, 5, index + 1)
    plt.imshow(np.reshape(image, (8,8)), cmap=plt.cm.gray)
    plt.title('Training: %i\n' % label, fontsize = 20)

### Splitting the data into training and test sets 

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=0)
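
As a quick sanity check on the split, the sketch below repeats it and prints the resulting shapes. It only assumes the same `load_digits` data and the same `test_size` and `random_state` used above; with `test_size=0.25`, roughly a quarter of the 1,797 images should end up in the test set.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

# Same split as above: 25% of the samples are held out for testing.
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Each row is one flattened 8x8 image (64 features).
print("Train:", x_train.shape, y_train.shape)
print("Test: ", x_test.shape, y_test.shape)
```

Fixing `random_state` makes the split reproducible, so the accuracy reported later does not change between runs.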

### Scikit-learn modeling in four steps 

**Step 1.** Import the model you want to use. In scikit-learn, all machine learning models are implemented as Python classes:

In [None]:
from sklearn.linear_model import LogisticRegression

**Step 2.** Make an instance of the model (with the multiclass option set to one-vs-rest, `multi_class='ovr'`, and the `liblinear` solver):

In [None]:
logisticRegr = LogisticRegression(multi_class='ovr', solver='liblinear')
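
To make the one-vs-rest idea concrete, here is a minimal sketch that builds the same kind of model explicitly with `OneVsRestClassifier`, which trains one binary "this digit vs. all other digits" classifier per class. This is an alternative way of expressing the scheme, not the code used in this notebook. As far as I can tell, recent scikit-learn releases deprecate the `multi_class` argument, in which case this wrapper is the suggested replacement; treat that as an assumption and check the documentation for your installed version.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

digits = load_digits()

# Wrap a binary logistic regression so that one copy is trained per class.
ovr = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
ovr.fit(digits.data, digits.target)

# One fitted binary classifier per digit class.
print("Number of binary classifiers:", len(ovr.estimators_))
print("Classes:", ovr.classes_)
```

At prediction time, the class whose binary classifier gives the highest score is chosen.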

**Step 3.** Train the model on the data, storing the information learned from the data:

In [None]:
logisticRegr.fit(x_train, y_train)

**Step 4.** Predict the labels of new data (new images):

In [None]:
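# make a prediction for one observation (image)
# reshape(1, -1) turns the 64-pixel vector into a single-row 2D array, which predict expects
# predict returns a NumPy array of labels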
logisticRegr.predict(x_test[0].reshape(1,-1))

In [None]:
# make a prediction for multiple observations (images) at once
logisticRegr.predict(x_test[0:10])

In [None]:
# make predictions on the entire test data
predictions = logisticRegr.predict(x_test)

### Measuring accuracy

Next, we measure how the model performs on unseen data (the test set). Recall that accuracy = correct predictions / total number of predictions.

In [None]:
# use the score method to get the accuracy of the model
score = logisticRegr.score(x_test, y_test)
print(score)
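
The accuracy formula can also be checked by hand. The sketch below uses small made-up label arrays (`y_true_toy` and `y_pred_toy` are hypothetical, not taken from the digits data) and compares a manual computation with `accuracy_score`, which computes the same mean accuracy that `score` reports for classifiers.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels, for illustration only.
y_true_toy = np.array([3, 8, 1, 8, 0, 5])
y_pred_toy = np.array([3, 1, 1, 8, 0, 6])

# Fraction of positions where the prediction matches the true label.
manual_accuracy = np.mean(y_true_toy == y_pred_toy)
library_accuracy = accuracy_score(y_true_toy, y_pred_toy)

print(manual_accuracy, library_accuracy)
```

Four of the six toy predictions are correct, so both values are 4/6.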

### Plotting a confusion matrix

A confusion matrix is a table that describes the performance of a classifier on a set of test data for which the true values are known.

In [None]:
import seaborn as sns  # Seaborn is a useful package for creating nice-looking graphics
from sklearn import metrics

In [None]:
cm = metrics.confusion_matrix(y_test, predictions)
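
To see how such a matrix is laid out, here is a small hand-checkable sketch with made-up labels (the `_toy` names are hypothetical). In `confusion_matrix`, rows correspond to actual labels and columns to predicted labels, so the diagonal counts correct predictions.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a 3-class problem, for illustration only.
y_true_toy = [0, 1, 2, 2, 1]
y_pred_toy = [0, 2, 2, 2, 1]

cm_toy = confusion_matrix(y_true_toy, y_pred_toy)
print(cm_toy)

# Correct predictions sit on the diagonal, so accuracy is trace / total.
accuracy_from_cm = cm_toy.trace() / cm_toy.sum()
print(accuracy_from_cm)
```

Here the single off-diagonal entry, in row 1 and column 2, records the one sample of class 1 that was predicted as class 2.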

In [None]:
plt.figure(figsize=(9,9))
# the cells hold integer counts, so format the annotations as integers
sns.heatmap(cm, annot=True, fmt="d", linewidths=.5, square=True, cmap='Blues_r')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_sample_title = 'Accuracy Score: {0}'.format(score)
plt.title(all_sample_title, size = 15)
plt.savefig('toy_Digits_ConfusionSeabornCodementor.png')
#plt.show()

For instance, here we can see that some digits that were actually 8s were misclassified as 1s.
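
Reading the largest mistake off a heatmap by eye can be automated. The sketch below is one possible approach: zero the diagonal and look for the largest remaining count. It uses a made-up matrix (`cm_toy` holds hypothetical numbers, not this notebook's results); the same steps should apply to `cm`.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
cm_toy = np.array([[10, 0, 1],
                   [ 2, 8, 0],
                   [ 0, 4, 9]])

# Ignore correct predictions by zeroing the diagonal of a copy.
off_diag = cm_toy.copy()
np.fill_diagonal(off_diag, 0)

# Locate the largest misclassification count.
actual, predicted = np.unravel_index(np.argmax(off_diag), off_diag.shape)
count = off_diag[actual, predicted]
print(f"Most common error: actual {actual} predicted as {predicted} ({count} times)")
```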