# EDS 232 Machine Learning Lab 4a: Deep Learning - Neural Networks
## By Mia Forsline 
## 2022-02-23
### Sourced from [fchollet on GitHub](https://github.com/fchollet/deep-learning-with-python-notebooks/blob/8a30b90fed187aaddaaf1fc868ec8e0ac92bca40/2.1-a-first-look-at-a-neural-network.ipynb)

In [1]:
import keras
keras.__version__


'2.6.0'

# Introduction: A first look at a neural network

- This notebook contains the code samples found in Chapter 2, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff)
- The original text features far more content, in particular further explanations and figures; in this notebook, you will only find source code and related comments.

----

We will now take a look at a first concrete example of a neural network, which makes use of the Python library Keras to learn to classify hand-written digits.

- The problem we are trying to solve here is to **classify grayscale images of handwritten digits (28 pixels by 28 pixels), into their 10 categories (0 to 9)**. 
- The dataset we will use is the *MNIST dataset*.
- It's a set of 60,000 training images, plus 10,000 test images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. 

The MNIST dataset comes pre-loaded in Keras, in the form of a set of four Numpy arrays:

In [2]:
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

- `train_images` and `train_labels` form the "training set", the data that the model will learn from. 
- The model will then be tested on the "test set", `test_images` and `test_labels`. 
- Our images are encoded as Numpy arrays, and the labels are simply an array of digits, ranging from 0 to 9. There is a one-to-one correspondence between the images and the labels.

## Training data:

In [3]:
train_images.shape #check the dimensions

(60000, 28, 28)

In [4]:
len(train_labels) # number of labels (one per training image)

60000

In [5]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

## Test data:

In [6]:
test_images.shape

(10000, 28, 28)

In [7]:
len(test_labels)

10000

In [8]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

## Workflow:
- we will present our neural network with the training data (`train_images` and `train_labels`)
- the network will then learn to associate images and labels
- we will ask the network to produce predictions for `test_images`
- we will verify if these predictions match the labels from `test_labels`
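
The last two steps of this workflow can be sketched with plain NumPy. This is a minimal illustration, not the notebook's own code: the probability rows and labels below are made-up values for a toy 3-class problem, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical network outputs for 4 images over 3 classes (made-up numbers).
predicted_probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
])
true_labels = np.array([1, 0, 2, 1])  # hypothetical ground-truth classes

# The predicted class is the index of the highest score in each row.
predicted_labels = predicted_probs.argmax(axis=1)

# Accuracy is the fraction of predictions that match the true labels.
matches = predicted_labels == true_labels
accuracy = matches.mean()

print(predicted_labels)  # [1 0 2 0]
print(accuracy)          # 0.75
```

The same comparison applies to MNIST, only with 10 classes and 10,000 test images.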

Let's build our network

In [9]:
from tensorflow.keras.utils import to_categorical

from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

## The core building block of neural networks is the **"layer"**
- A layer is a data-processing module which you can think of as a "filter" for data.
- Some data comes in, and it comes out in a more useful form.
- Layers extract _representations_ out of the data fed into them
    - hopefully representations that are more meaningful for the problem at hand.
- Most of deep learning consists of chaining together simple layers in a form of progressive **"data distillation"**.
- A deep learning model is like a sieve for data processing, made of a succession of increasingly refined data filters, AKA the "layers".
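
To make the "filter" idea concrete, here is a rough NumPy sketch of what a single `Dense` layer with a `relu` activation computes: `relu(inputs @ weights + bias)`. The helper name `dense_relu`, the random inputs, and the untrained random weights are all assumptions for illustration; a real Keras layer learns its weights during training.

```python
import numpy as np

def dense_relu(inputs, weights, bias):
    """One densely connected layer: relu(inputs @ weights + bias)."""
    return np.maximum(0.0, inputs @ weights + bias)

rng = np.random.default_rng(0)                 # fixed seed, illustrative only
batch = rng.random((2, 28 * 28))               # 2 fake flattened 28x28 "images"
weights = rng.normal(size=(28 * 28, 512)) * 0.01  # hypothetical untrained weights
bias = np.zeros(512)

# Each 784-value input row is turned into a 512-value representation.
representation = dense_relu(batch, weights, bias)
print(representation.shape)  # (2, 512)
```

Chaining several such transformations, each feeding the next, is what the "succession of filters" refers to.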

Our network consists of a sequence of two `Dense` layers, which are densely-connected (also called "fully-connected") neural layers. 
- The second (and last) layer is a 10-way "softmax" layer, which means it will return an array of 10 probability scores (summing to 1). 
- Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
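
As a small sketch of what "10 probability scores summing to 1" means, the softmax function can be written in a few lines of NumPy. The raw scores below are made-up numbers, and the `softmax` helper is our own illustration rather than the Keras implementation.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores for the 10 digit classes (0 to 9).
raw_scores = np.array([1.0, 0.5, 0.2, 3.0, 0.1, 0.0, 0.3, 2.0, 0.4, 0.6])
probs = softmax(raw_scores)

print(probs.sum())     # very close to 1
print(probs.argmax())  # 3, the class with the largest raw score
```

The largest raw score always receives the largest probability, so the most likely digit is simply the index of the maximum.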

To make our network ready for training, we need to pick 3 more things, as part of the **"compilation"** step:

1. A **loss function**: this is how the network will be able to measure how well it is doing on its training data, and thus how it will be able to improve itself.
2. An **optimizer**: this is the mechanism through which the network will update itself based on the data it sees and its loss function.
3. **Metrics to monitor during training and testing**: for now, we only care about accuracy (the fraction of the images that were correctly classified).

In [10]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
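
To give a feel for what the `categorical_crossentropy` loss measures, here is a hedged NumPy sketch of the formula: the mean over samples of `-sum(target * log(prediction))`. The helper below, its `eps` clipping value, and the toy 3-class data are assumptions for illustration; the Keras implementation may differ in its details.

```python
import numpy as np

def categorical_crossentropy(one_hot_targets, predicted_probs, eps=1e-7):
    """Mean over samples of -sum(target * log(prediction))."""
    clipped = np.clip(predicted_probs, eps, 1.0)  # avoid log(0)
    per_sample = -np.sum(one_hot_targets * np.log(clipped), axis=1)
    return per_sample.mean()

# Two toy samples whose true classes are 1 and 0 (one-hot encoded).
targets = np.array([[0, 1, 0], [1, 0, 0]])
confident = np.array([[0.05, 0.9, 0.05], [0.8, 0.1, 0.1]])  # mostly right
unsure = np.array([[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]])       # mostly wrong

loss_confident = categorical_crossentropy(targets, confident)
loss_unsure = categorical_crossentropy(targets, unsure)

print(loss_confident < loss_unsure)  # True
```

The loss shrinks as the probability assigned to the correct class grows, which is the signal the optimizer uses to adjust the weights.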

## Before training, we will preprocess our data 
- reshape the data into the shape that the network expects
- scale the data so that all values are in the `[0, 1]` interval

Previously, our training images, for instance, were stored in an array of shape `(60000, 28, 28)` of type `uint8` with values in the `[0, 255]` interval. 
- we will transform it into a `float32` array of shape `(60000, 28 * 28)` with values between 0 and 1.

In [11]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
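
The effect of these two steps is easy to check on a small stand-in array. The sketch below uses three random fake "images" instead of MNIST (an assumption made so that it runs without downloading data) and applies the same reshape and rescale.

```python
import numpy as np

# Three fake 28x28 grayscale "images" with uint8 values in [0, 255].
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

flat = fake_images.reshape((3, 28 * 28))  # each image becomes one row of 784 values
scaled = flat.astype('float32') / 255     # rescale to the [0, 1] interval

print(fake_images.shape, fake_images.dtype)  # (3, 28, 28) uint8
print(scaled.shape, scaled.dtype)            # (3, 784) float32
print(scaled.min() >= 0, scaled.max() <= 1)  # True True
```

Reshaping only changes how the pixels are laid out: the rows of each image are placed one after another, so no pixel values are lost or reordered within an image.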

We also need to categorically encode the labels, a step which we explain in chapter 3:

In [12]:
#from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
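
Categorical (one-hot) encoding turns each integer label into a vector of zeros with a single 1 at the label's index. The `one_hot` helper below is a hypothetical stand-in that mimics this behavior for integer labels; it is a sketch, not the `to_categorical` source.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a float32 matrix with a single 1 per row, at the label's index."""
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

labels = np.array([5, 0, 4])  # the first three training labels shown earlier
encoded = one_hot(labels, num_classes=10)

print(encoded.shape)  # (3, 10)
print(encoded[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

This layout matches the 10-way softmax output, so the loss can compare the two vectors position by position.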

## We are now ready to train our network
- to do this in Keras, we call the `fit` method of the network
- in other words, we "fit" the model to its training data.

In [13]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f141850e110>

## During training, we can see 2 values: 
- the "loss" of the network over the training data
- and the accuracy of the network over the training data

We quickly reach an accuracy of 0.989 (i.e. 98.9%) on the training data. 
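
For a sense of how much work those five epochs represent, we can count the weight updates implied by the `fit` call above (`epochs=5`, `batch_size=128`, 60,000 training images). This is simple arithmetic, assuming one update per batch and that the final, smaller batch is also used.

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

# One weight update per batch; the last batch of an epoch may be smaller.
batches_per_epoch = math.ceil(num_samples / batch_size)
total_updates = batches_per_epoch * epochs
last_batch_size = num_samples - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch)  # 469
print(total_updates)      # 2345
print(last_batch_size)    # 96
```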

## Next, let's check that our model performs well on the test set too:

In [14]:
test_loss, test_acc = network.evaluate(test_images, test_labels)



In [15]:
print('test_acc:', test_acc)

test_acc: 0.9800000190734863


## Our test set accuracy is lower than the training set accuracy 
- This gap between training accuracy and test accuracy is an example of **"overfitting"**: the fact that machine learning models tend to perform worse on new data than on their training data. 
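
Using the two accuracies reported in this notebook, the size of the gap can be worked out directly. The variable names are ours, and the values are rounded to three decimals as in the text.

```python
reported_train_acc = 0.989  # training accuracy reported above
reported_test_acc = 0.980   # test accuracy reported above
num_test_images = 10000

gap = reported_train_acc - reported_test_acc
misclassified = round((1 - reported_test_acc) * num_test_images)

print(f"gap: {gap:.3f}")  # gap: 0.009
print(misclassified)      # 200
```

In other words, roughly 200 of the 10,000 test images are misclassified, and the network is about 0.9 percentage points less accurate on unseen data.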

## Conclusion
- we learned how to build and train a neural network to classify handwritten digits in fewer than 20 lines of Python code. 
- next, we will go over every moving piece we just previewed in detail, and clarify what is really going on behind the scenes. 
- we will learn about "tensors", the data-storing objects going into the network; about tensor operations, which layers are made of; and about gradient descent, which allows our network to learn from its training examples.