# Naive Bayesian Networks
#### Image Classification
##### MNIST - Hand Written Digits

**Name** : Kimia Noorbakhsh

**Student ID** : none

**Sources:**
<hr>

### Installing Dependencies

`torchvision` is installed and used for loading the dataset. If you are not comfortable using torch and numpy, feel free to implement your own dataloader. 

In [5]:
!pip install torch===1.5.0 torchvision===0.6.0 -f https://download.pytorch.org/whl/torch_stable.html

Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch===1.5.0
  Downloading https://download.pytorch.org/whl/cu102/torch-1.5.0-cp38-cp38-win_amd64.whl (899.1 MB)
Collecting torchvision===0.6.0
  Downloading https://download.pytorch.org/whl/cu102/torchvision-0.6.0-cp38-cp38-win_amd64.whl (1.2 MB)
Collecting numpy
  Downloading numpy-1.18.5-cp38-cp38-win_amd64.whl (12.8 MB)
Collecting future
  Using cached future-0.18.2.tar.gz (829 kB)
Collecting pillow>=4.1.1
  Downloading Pillow-7.1.2-cp38-cp38-win_amd64.whl (2.0 MB)
Building wheels for collected packages: future
  Building wheel for future (setup.py): started
  Building wheel for future (setup.py): finished with status 'done'
  Created wheel for future: filename=future-0.18.2-py3-none-any.whl size=491062 sha256=e3b2d526189262c67194174eef41b7223aa9ca7d5574fbfc1eea64c2e19d3ddb
  Stored in directory: c:\users\kimia\appdata\local\pip\cache\wheels\8e\70\28\3d6ccd6e315f65f245da085482a2e1c7d14b90b30f239e2cf4

### Importing Libraries

In [1]:
from torchvision import datasets
import numpy as np

<hr>

### Loading Data

The `train_data` and `test_data` variables hold an [MNIST dataset object](https://pytorch.org/docs/stable/torchvision/datasets.html#mnist), which can be used like a regular Python list of `(image <PIL type>, label <integer>)` pairs.

In summary, it is possible to loop through the data like:
```python
for image, label in train_data:
    image = np.array(image) # convert PIL image to numpy array
    # your code here
```
or access single datapoints like `image, label = train_data[i]`. `len(train_data)` can be used to get the number of training points.

In [2]:
train_data = datasets.MNIST('./data', train=True, download=True)
test_data  = datasets.MNIST('./data', train=False, download=True)
# binarize each image: pixels >= 128 become 1, all others 0
new_train_data = []
new_test_data = []
for image, label in train_data:
    image = np.array(image)
    image = np.where(image >= 128, 1, 0)
    new_train_data.append((image, label))
for image, label in test_data:
    image = np.array(image)
    image = np.where(image >= 128, 1, 0)
    new_test_data.append((image, label))

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data\MNIST\raw\train-images-idx3-ubyte.gz
Extracting ./data\MNIST\raw\train-images-idx3-ubyte.gz to ./data\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data\MNIST\raw\train-labels-idx1-ubyte.gz
Extracting ./data\MNIST\raw\train-labels-idx1-ubyte.gz to ./data\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data\MNIST\raw\t10k-images-idx3-ubyte.gz
Extracting ./data\MNIST\raw\t10k-images-idx3-ubyte.gz to ./data\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz
Extracting ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw
Processing...
Done!
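
The thresholding step in the loading cell can be checked on a tiny array before running it over the full dataset. The sketch below is only an illustration: the helper name `binarize` and the toy pixel values are assumptions, not part of the notebook. It applies the same `>= 128` rule.

```python
import numpy as np

def binarize(image, threshold=128):
    """Map grayscale pixels to 1 (>= threshold) or 0 (< threshold)."""
    return (np.asarray(image) >= threshold).astype(int)

# A tiny 2x3 "image" with made-up pixel intensities (hypothetical data).
toy_image = np.array([[0, 127, 128],
                      [200, 255, 64]], dtype=np.uint8)
toy_binary = binarize(toy_image)
print(toy_binary)
```

Values of exactly 128 map to 1, so the boundary case matches `np.where(image >= 128, 1, 0)`.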


<hr>

### Training Model

In [4]:
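def training(data_train):
    """Estimate per-pixel Bernoulli parameters and a class prior for each digit."""
    k = 0.02              # smoothing constant, avoids zero probabilities
    probs = {}            # digit -> (28, 28) array of P(pixel == 1 | digit)
    prob_of_numbers = {}  # digit -> prior P(digit)
    for i in range(10):
        arr = np.zeros((28, 28), dtype=int)  # per-pixel count of "on" pixels
        num = 0                              # number of training images of digit i
        for image, label in data_train:
            if label == i:
                num += 1
                arr = np.add(image, arr)
        # Smoothed estimate of P(pixel == 1 | digit i). Note: textbook Laplace
        # smoothing for a binary feature divides by (num + 2 * k) instead.
        probs[i] = (arr + k) / (num * (k + 1))
        # prior: fraction of the training images that carry label i
        prob_of_numbers[i] = num / len(data_train)
    return probs, prob_of_numbers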
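
The training loop above scans the whole dataset once per digit. As a rough sketch of an alternative, the same counts can be gathered with boolean masks, assuming the images have been stacked into one `(N, H, W)` array and the labels into one `(N,)` array. The function name `training_vectorized`, the `n_classes` parameter, and the toy data are all hypothetical. The smoothing expression is copied from `training` so the estimates should match.

```python
import numpy as np

def training_vectorized(images, labels, k=0.02, n_classes=10):
    """images: (N, H, W) array of 0/1 pixels; labels: (N,) integer array."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    probs = {}
    prob_of_numbers = {}
    for digit in range(n_classes):
        mask = labels == digit           # True for images of this digit
        num = int(mask.sum())
        if num == 0:
            continue                     # assumption: skip classes with no examples
        on_counts = images[mask].sum(axis=0)   # per-pixel count of "on" pixels
        probs[digit] = (on_counts + k) / (num * (k + 1))
        prob_of_numbers[digit] = num / len(labels)
    return probs, prob_of_numbers

# Toy data: four 2x2 binary "images", two per class (hypothetical values).
toy_images = np.array([[[1, 0], [0, 0]],
                       [[1, 1], [0, 0]],
                       [[0, 0], [1, 1]],
                       [[0, 0], [0, 1]]])
toy_labels = np.array([0, 0, 1, 1])
toy_probs, toy_priors = training_vectorized(toy_images, toy_labels, n_classes=2)
print(toy_priors)
```

With the notebook's data, the inputs could be built with `np.stack([img for img, _ in new_train_data])` and `np.array([lbl for _, lbl in new_train_data])`.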

### Evaluating Model

In [5]:
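def predicting(data_test, probs, prob_of_numbers):
    """Classify each test image; return all predictions and the number correct."""
    predicted_labels = []
    correct_predictions = 0
    for image, label in data_test:
        all_probs = {}
        for j in range(10):
            # P(pixel | digit j): probs[j] where the pixel is on, 1 - probs[j] where off
            new_arr = np.where(image == 1, probs[j], 1 - probs[j])
            # naive Bayes: product of the per-pixel likelihoods, times the prior
            number = np.prod(new_arr)
            number = number * (prob_of_numbers[j])
            all_probs[j] = number
        # pick the digit with the highest unnormalized posterior
        predicted_label = max(all_probs, key=all_probs.get)
        predicted_labels.append(predicted_label)  # record every prediction
        if predicted_label == label:
            correct_predictions += 1

    return predicted_labels, correct_predictions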
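
Multiplying 784 probabilities can underflow to zero in double precision when many of them are small. A common remedy is to sum logarithms instead, which leaves the arg-max unchanged because the logarithm is monotonic. The sketch below is one possible variant, not the notebook's method: `predict_log_space` and the toy parameters are hypothetical, and it assumes every entry of `probs` lies strictly between 0 and 1, which the smoothing in `training` is meant to ensure.

```python
import numpy as np

def predict_log_space(image, probs, prob_of_numbers):
    """Return the digit with the highest log-posterior for one binary image."""
    scores = {}
    for digit, p in probs.items():
        # log P(pixel | digit): log(p) where the pixel is on, log(1 - p) where off
        log_likelihood = np.where(image == 1, np.log(p), np.log(1 - p)).sum()
        scores[digit] = np.log(prob_of_numbers[digit]) + log_likelihood
    return max(scores, key=scores.get)

# Underflow demonstration: 784 factors of 1e-3 multiply to exactly 0.0,
# while the corresponding sum of logs stays finite.
tiny = np.full(784, 1e-3)
underflowed = np.prod(tiny)
log_sum = np.log(tiny).sum()
print(underflowed, np.isfinite(log_sum))

# Toy 2x2 model with two classes (hypothetical parameters).
toy_probs = {0: np.array([[0.9, 0.1], [0.1, 0.1]]),
             1: np.array([[0.1, 0.1], [0.1, 0.9]])}
toy_priors = {0: 0.5, 1: 0.5}
pred_a = predict_log_space(np.array([[1, 0], [0, 0]]), toy_probs, toy_priors)
pred_b = predict_log_space(np.array([[0, 0], [0, 1]]), toy_probs, toy_priors)
print(pred_a, pred_b)
```

The function takes the same `probs` and `prob_of_numbers` dictionaries that `training` returns, so it could be called once per test image inside the evaluation loop.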

In [6]:
# training
probs, prob_of_numbers = training(new_train_data)
# prediction
predicted_labels, correct_predictions = predicting(new_test_data, probs, prob_of_numbers)
print("accuracy : {:.2f}%".format(correct_predictions / len(new_test_data) * 100))

accuracy : 84.40%
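
A single accuracy figure hides which digits are confused with one another. Assuming a prediction is available for every test image, a confusion matrix gives a per-digit breakdown. The sketch below is illustrative only: the function name `confusion_matrix` and the toy label lists are hypothetical, and no results for the actual MNIST run are implied.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes=10):
    """matrix[t, p] counts the images of true class t that were predicted as p."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats
    np.add.at(matrix, (np.asarray(true_labels), np.asarray(predicted_labels)), 1)
    return matrix

# Toy example with three classes (hypothetical labels).
toy_true = [0, 0, 1, 1, 2]
toy_pred = [0, 1, 1, 1, 0]
toy_matrix = confusion_matrix(toy_true, toy_pred, n_classes=3)

# Per-class recall: correct predictions divided by the number of true examples.
# A class with no true examples would give a division by zero here.
toy_recall = toy_matrix.diagonal() / toy_matrix.sum(axis=1)
print(toy_matrix)
print(toy_recall)
```

Rows index the true digit and columns the predicted digit, so off-diagonal entries show which pairs of digits the classifier mixes up.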
