# Super-duper quick 'n' dirty character analysis, from scratch
My attempt at following along with the ["Hello, Deep Learning" tutorial](https://berthub.eu/articles/posts/hello-deep-learning/) by Bert Hubert.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torchvision

## Reading the data
Before we can do any maths, we need to open the [EMNIST dataset](https://www.nist.gov/itl/products-and-services/emnist-dataset) in a form
Python can use. NIST only supplies the data in a janky, custom binary format (or MATLAB format) and parsing it is a mug's game. Instead, let's just use PyTorch's built-in methods to download and parse the dataset (documentation can be found [here](https://pytorch.org/vision/main/generated/torchvision.datasets.EMNIST.html#torchvision.datasets.EMNIST)):

In [None]:
training_dataset = torchvision.datasets.EMNIST("./emnist",split='digits', train=True, download=True)

This data is stored in a TorchVision `EMNIST` dataset object. Let's inspect its members:

In [None]:
dir(training_dataset)

We can get the data from the `data` attribute and the labels from the `targets` attribute, both of which are PyTorch `Tensor` objects. We can also convert these `Tensor`s to NumPy arrays by calling their `.numpy()` method (I'm working with plain NumPy arrays for now since I don't want to use PyTorch *too* much during the learning stage).

In [None]:
image_data = training_dataset.data.numpy()
image_labels = training_dataset.targets.numpy()
num_images = image_labels.shape[0]
print(f"Read {num_images} training images.")

Now we can go through and select just the images that correspond to the "threes" and "sevens". The image data is a 3D array: the first index selects the image (and matches the index into the label array), while the other two indices are the pixel coordinates within that image (images are always 28x28 in size). The labels are a flat array, so `image_data[i]` corresponds to `image_labels[i]`. Let's put this into action by picking out just the images we want and then plotting a sample one using matplotlib (don't forget that the training data are rotated and mirrored relative to how a human would write them).

In [None]:
threes_indices = []
sevens_indices = []
for i in range(len(image_labels)):
  if(image_labels[i] == 3):
    threes_indices.append(i)
  elif(image_labels[i] == 7):
    sevens_indices.append(i)

threes = image_data[threes_indices]
sevens = image_data[sevens_indices]
num_threes = threes.shape[0]
num_sevens = sevens.shape[0]

# Quick 'n' dirty plot to check it worked (just one because I can't be bothered to set up multiplots right now)
plt.imshow(sevens[1], interpolation="nearest", cmap="gray")
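
The index-collecting loop above can also be written with NumPy boolean masks. This is a minimal sketch on made-up toy arrays (`toy_labels` and `toy_images` are hypothetical stand-ins for `image_labels` and `image_data`, not real EMNIST data), just to show the indexing pattern:

```python
import numpy as np

# Hypothetical stand-ins for image_labels / image_data (six tiny 2x2 "images")
toy_labels = np.array([3, 7, 1, 3, 7, 7])
toy_images = np.arange(6 * 2 * 2).reshape(6, 2, 2)

# Comparing the label array to a digit gives a boolean mask;
# indexing the image array with that mask keeps only the matching images.
toy_threes = toy_images[toy_labels == 3]
toy_sevens = toy_images[toy_labels == 7]

print(toy_threes.shape, toy_sevens.shape)
```

The mask version does the same selection as the loop, it just lets NumPy do the iteration.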

Neato! Now we can actually start processing the data. 

## Processing the training data
We'll want to calculate two arrays containing the average brightness at each pixel for the "seven" and "three" datasets, then take the *difference* between these two averages to get a "weights" matrix.

In [None]:
avg_three = np.mean(threes, axis=0)
avg_seven = np.mean(sevens, axis=0)

# Compute average difference, then normalise by the number of samples
avg_diff = (avg_seven - avg_three)/num_images
plt.imshow(avg_diff, interpolation="nearest", cmap="gray")

## Classifying images
Now, the general idea is that we can use our weights matrix of average differences as a classifier: pixels with large positive weights are more likely to be associated with a "seven", while pixels with negative weights are more likely to be associated with a "three". We can "score" each image by computing its element-wise product with the weights matrix and calculating the sum of all pixels in the resulting array. This gives us a measure of "threeness" or "sevenness". For example, using the seven from above (`sevens[1]`):

In [None]:
# Weight each pixel of the image, then add up the weighted pixels to get a single score
tmp = np.zeros((28,28), np.float32)
np.multiply(sevens[1], avg_diff, out=tmp)
score = np.sum(tmp)
print(f"Sevenness = {score}")
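
Scoring one image at a time is fine for learning, but the same weighted sum can be computed for a whole stack of images in one call. The sketch below uses random toy data (`toy_images` and `toy_weights` are hypothetical, standing in for the digit arrays and `avg_diff`) and checks that `np.tensordot` gives the same scores as an explicit loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: five random 28x28 "images" and a random weights matrix
toy_images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.float64)
toy_weights = rng.normal(size=(28, 28))

# Loop version: multiply element-wise and sum, one image at a time
loop_scores = np.array([np.sum(img * toy_weights) for img in toy_images])

# Vectorised version: contract both pixel axes of the stack against the weights
fast_scores = np.tensordot(toy_images, toy_weights, axes=([1, 2], [0, 1]))

print(np.allclose(loop_scores, fast_scores))
```

Both versions produce one score per image; the second just avoids the Python-level loop.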

We can repeat this for all our training data and calculate the average "threeness" of all "threes" and the "sevenness" of all "sevens". 
This measure will not in general be symmetric around zero, so we can compute the average of the above "threeness" and "sevenness" to get what's called a *bias*.

In [None]:
avg_threeness = 0.0
avg_sevenness = 0.0
# Temporary array to hold the weighted pixels. Totally fine to overwrite this after we calculate each average
tmp = np.zeros((28,28), np.float32)

# Threes
for i in range(num_threes):
  np.multiply(threes[i], avg_diff, out=tmp)
  avg_threeness += np.sum(tmp)

avg_threeness /= num_threes

# Sevens
for i in range(num_sevens):
  np.multiply(sevens[i], avg_diff, out=tmp)
  avg_sevenness += np.sum(tmp)

avg_sevenness /= num_sevens

# The bias is the midpoint (average) of the two class scores
bias = (avg_threeness + avg_sevenness)/2
print(f"Average threeness = {avg_threeness}\nAverage sevenness={avg_sevenness}\nBias = {bias}")
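
To make the "average of threeness and sevenness" idea concrete, here is a tiny worked sketch with made-up scores (the numbers and the helper `midpoint_threshold` are hypothetical, not taken from the EMNIST run). The bias is just the point halfway between the two class averages, so it makes a natural decision threshold:

```python
import numpy as np

def midpoint_threshold(scores_a, scores_b):
    """Return the point halfway between the mean scores of two classes."""
    return (np.mean(scores_a) + np.mean(scores_b)) / 2

# Made-up scores: threes tend to score low, sevens tend to score high
toy_three_scores = np.array([-4.0, -2.0, -3.0])  # mean is -3
toy_seven_scores = np.array([5.0, 7.0, 9.0])     # mean is 7

toy_bias = midpoint_threshold(toy_three_scores, toy_seven_scores)
print(toy_bias)  # 2.0
```

Note that the midpoint sits between the two averages even though neither average is centred on zero, which is exactly why a bias is needed.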



## Validating the model
With the bias calculated, we can repeat this procedure on the validation data: if a given image's weighted pixel-sum (its score) is larger than the bias, we call it a seven, otherwise we call it a three.

Let's give this a try. First, we have to load the testing data and extract the threes and sevens like before.

In [None]:
# Download testing data
testing_dataset = torchvision.datasets.EMNIST("./emnist",split='digits', train=False, download=True)
# Make sure we read from the *testing* dataset, not the training one
testing_data = testing_dataset.data.numpy()
testing_labels = testing_dataset.targets.numpy()
num_testing = testing_labels.shape[0]

# Extract threes and sevens
threes_indices = []
sevens_indices = []
for i in range(len(testing_labels)):
  if(testing_labels[i] == 3):
    threes_indices.append(i)
  elif(testing_labels[i] == 7):
    sevens_indices.append(i)
threes = testing_data[threes_indices]
sevens = testing_data[sevens_indices]
# Count the samples only after both arrays have been replaced with testing data
num_threes = threes.shape[0]
num_sevens = sevens.shape[0]

# Now run through the threes, classify the image and check whether it's correct or not
tmp = np.zeros((28,28), np.float32)
bad_predictions = {3: [], 7:[]}
threes_accuracy = 0.0

for i in range(num_threes):
  np.multiply(threes[i], avg_diff, out=tmp)
  if np.sum(tmp) <= bias:
    threes_accuracy += 1
  else:
    bad_predictions[3].append(i)

threes_accuracy /= num_threes

# And again for sevens
sevens_accuracy = 0.0

for i in range(num_sevens):
  np.multiply(sevens[i], avg_diff, out=tmp)
  if np.sum(tmp) > bias:
    sevens_accuracy += 1
    # Check if it's actually correct!
  else:
    # Save the wrong predictions to plot later
    bad_predictions[7].append(i)
sevens_accuracy /= num_sevens

# The accuracies are fractions between 0 and 1, so format them as percentages
print(f"Classifier accuracy for validation dataset:\nThrees: {threes_accuracy:.2%}\nSevens: {sevens_accuracy:.2%}")
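
The counting loops above boil down to "compare each score to the bias and see how often the answer is right". A minimal sketch of that as a helper, using made-up scores (the function `accuracy` and the toy numbers are hypothetical, for illustration only):

```python
import numpy as np

def accuracy(scores, threshold, expect_above):
    """Fraction of scores on the expected side of the threshold.

    expect_above=True means a correct prediction has score > threshold
    (a "seven" here); False means score <= threshold (a "three").
    """
    predicted_above = scores > threshold
    return np.mean(predicted_above == expect_above)

# Made-up scores for four images that are all really sevens
toy_seven_scores = np.array([1.0, 3.0, 5.0, 4.0])
toy_accuracy = accuracy(toy_seven_scores, threshold=2.0, expect_above=True)
print(toy_accuracy)  # 0.75
```

Three of the four toy scores are above the threshold, so the helper reports 0.75, i.e. 75% once formatted as a percentage.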


Finally, let's draw some of the figures that our classifier got wrong, which should help give an idea of what it's "thinking".

In [None]:
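# Plot the first ten bad predictions for each digit: threes on the top row, sevens on the bottom
# (this assumes the classifier got at least ten of each wrong)
fig, axes = plt.subplots(2, 10)
for i in range(10):
  axes[0][i].imshow(threes[bad_predictions[3][i]], interpolation="nearest", cmap="gray")
for i in range(10):
  axes[1][i].imshow(sevens[bad_predictions[7][i]], interpolation="nearest", cmap="gray")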

This is pretty good accuracy for such a dumb model! It only works for threes vs sevens and won't work with other characters, but that's the price we pay for sticking to elementary linear algebra. If we *do* want to generalise, then we'll need to get more complex, but this basic process is (conceptually) at the heart of how a neural network classifier works.