# Deep Learning Course - Introduction

This is the introduction to the Deep Learning course.

This component will cover:

 * Course Scope
 * Prerequisites Information
 * Course Structure
 * Assignments and Assessment
 * Coding
 * A little bit of background on Linear Algebra

## Course Scope

Let's get to the point. Modern frameworks make it easy to code up solutions for Deep Learning problems. Have a look.

### Tensorflow: Accurate Digit Classification in 14 Lines

In [1]:
import sys
import tensorflow as tf
print(sys.version)
print(tf.__version__)

3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
2.15.0


In [2]:
# Get a copy of the mnist dataset container
mnist = tf.keras.datasets.mnist

# Pull out the training and test data
(x_train, y_train),(x_test, y_test) = mnist.load_data()

# Normalize the training and test datasets
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)

# Create a simple sequential network object
model = tf.keras.models.Sequential()

# Add layers to the network for processing the input data
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(64, activation=tf.nn.sigmoid))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.sigmoid))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))

# Compile the model
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Start the training process
model.fit(x=x_train, y=y_train, epochs=5)

print(model.summary())

# Evaluate the model performance with test data
test_loss, test_acc = model.evaluate(x=x_test, y=y_test,verbose=0)

# Print out the model accuracy
print('\nTest accuracy: ' + str(test_acc*100) + "%" )

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (32, 784)                 0         
                                                                 
 dense (Dense)               (32, 64)                  50240     
                                                                 
 dense_1 (Dense)             (32, 128)                 8320      
                                                                 
 dense_2 (Dense)             (32, 10)                  1290      
                                                                 
Total params: 59850 (233.79 KB)
Trainable params: 59850 (233.79 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None


The code above trains a Deep Neural Network to classify images of handwritten digits, like the ones below, into the digits 0 to 9. It uses a state-of-the-art optimiser, normalises the data, and reaches 97% accuracy in just 5 epochs of training. It takes advantage of the Keras interface to TensorFlow and, provided your training data is organised in the right way, it can easily be adapted to new problems.

<img width="200" src="https://drive.google.com/uc?id=1vNVU20xo-QDsySYSg7mCal3QTgvaksaK"/>

Therefore we can say that it is very easy to take some code that we find online to get a basic Deep Learning model training and producing results. This is great, and the work that the developers of Tensorflow and libraries like it have done to allow users to rapidly prototype models is simply amazing.

That said, if all we do is copy and paste code -- even with alterations -- we seldom learn and find it very difficult to apply learnings to novel or more challenging situations. For this reason, this course is not simply a guide to copying and pasting -- you don't need help to do that -- instead this course tries to give you a good understanding of some of the most important concepts that underpin the Deep Learning approach. Putting it simply, I want to give you an understanding of why things are the way that they are.

Of course theory is useless without practice, so we will keep theory and practice interleaved for you.

### Content Covered

In this Deep Learning course we will trace through the most important concepts in Deep Learning such as the Backpropagation Algorithm, Convolutional Neural Networks, Recurrent Neural Networks and Reinforcement Learning.

The course will place emphasis on introducing concepts that are commonly used in practical Deep Learning research and application, and on running examples. As such most lecture notes will be provided in the style of Jupyter notebooks that a student can download, edit and run on their own machines. I may also provide lecture slides on a week to week basis, but the Jupyter notebooks are the main source of information.

With the exception of Week 1, notes for each week are provided in advance of the lecture, and students are encouraged to study in advance for the class. The class - lecture and tutorial time - will be used to revise the material that has already been given out and will give you an opportunity to ask questions about it. This is commonly referred to as a flipped classroom delivery.

Some content commonly found in other courses will not be covered here. For example the following will not be addressed:

 * The history of neural networks
 * The biological inspiration of neural networks
 * Unsupervised learning of neural networks
 * Ethical Issues and Explainability

The interested student is directed to the many good external resources such as Geoff Hinton's online course on neural networks for detail on these topics.

Speaking of videos, this course is not intended to be self-contained. While detailed notes with working examples will be given from week to week as Jupyter notebooks, links to external resources such as videos from Geoff Hinton and Andrew Ng will also be suggested for additional coverage or for a different perspective on a given question.

## Prerequisites / corequisites

This course is ideally intended as an Introduction to Deep Learning for students who have already completed undergraduate or Masters level modules on:

 * Artificial Intelligence, and/or
 * Machine Learning

A Machine Learning course is a minimum co-requisite for this course. In other words, while it would be ideal that the Machine Learning module be taken already, it is also acceptable if you are taking Machine Learning in parallel to this module.  

In the first few weeks the essential background topics of Linear and Logistic Regression will be covered. While these elements do overlap with the content of many Machine Learning modules, they are so essential that we must make sure to cover them properly here. No other background Machine Learning concepts, such as the distinctions between learning types or the benchmarking of machine learning performance, will be addressed.

The module is suitable for someone who has already taken an online module in Deep Learning or otherwise consulted books etc. While no new material will be covered, the course will give you an opportunity to reflect on important topics with the support of fellow students.

All assignments must be coded up using Python with related libraries. If you have not already coded in Python, now is a good time to start. Introductory tutorials to Python and the use of the scikit-learn (sklearn) package should be consulted. In short, make sure that you can load a data set, perform k-fold cross validation of an SVM based classifier, and then run and present metrics such as the F1 score.
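
As a rough self-check of that baseline, the following sketch loads a small data set, runs 5-fold cross validation of an SVM classifier, and reports the F1 score. The choice of the Iris data set, the default `SVC` settings and the macro-averaged `f1_macro` scorer are assumptions made for illustration, not course requirements.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Load a small built-in data set (an assumed stand-in for your own data)
X, y = load_iris(return_X_y=True)

# An SVM classifier with default settings
clf = SVC()

# 5-fold cross validation, scoring each fold with the macro-averaged F1 score
scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")

print("F1 per fold:", scores)
print("Mean F1: %.3f" % scores.mean())
```

If you can read and modify a script like this comfortably, you have the Python background needed for the early weeks.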


### Learning Outcomes

Our Learning Outcomes cover the following:

1. Use linear and logistic regression to build supervised machine learning models.
2. Use Deep Learning frameworks to implement Deep Learning methods for classification and/or regression tasks.
3. Compare and contrast alternative activation types and cost functions in practical regression and classification tasks.
4. Apply convolutional networks for image and/or text classification tasks.
5. Critically evaluate Recurrent Neural Networks against contemporary architectures for language processing tasks.
6. Compare stochastic gradient descent to other methods such as the Adam optimizer for training Deep Neural Networks.
7. Evaluate the role of hardware in achieving efficient neural network training and deployment.

Knowing what these are in advance isn't particularly useful, but some students do ask for them. Also it is worth noting that we will go beyond these simple few learning outcomes to give you as rounded as possible a view on particular Deep Learning based modelling.

## Course Structure

This is a 13 class programme. The course structure is subject to change and this content should be considered indicative. The earlier weeks focus in detail on low level issues and particularly in the first few weeks we will be purposefully replicating material that you should find in a statistics or intro to ML course. As we move through the course we will move to more high-level descriptions. This will be reflected in the lecture material style. The first half of the course puts a large emphasis on the Jupyter notebooks and detailed Python code, but as we move forward the amount of material covered in the notebooks will ease off as we cover more content at a more abstract level in slide format instead.

###### Week 1 - Introduction
 * Course Scope
 * Prerequisites
 * Course Structure
 * Assignments and Assessment
 * Coding
 * Getting started with Colab

###### Week 2 - Linear Regression
 * Introduction to Linear Regression
 * Fitting functions to data
 * Cost Functions
 * Gradient Descent
 * Normalization of Data

######  Week 3 - Logistic Regression
 * Modeling Binary Data
 * The Logistic Function
 * The Cost Function for Logistic Units
 * Limits for Logistic Function
 * Higher Order Functions
 * From Linear to Non-Linear Logistic Classifiers

###### Week 4 - Neural Network Essentials
 * Units
 * Layers
 * Bias Units
 * Building non-linear functions
 * The Feed Forward Algorithm
 * Loss
 * Overview of Backpropagation Methods
 * Optimization
 * Stochastic Gradient Descent and Variants

###### Week 5 - Training / Refining Backpropagation
 * Review of Backpropagation Methods
 * Deriving the Backpropagation Equations
 * The Backpropagation Algorithm
 * Cross Entropy Loss function
 * Softmax Layers

###### Week 6 - Improving Performance / Preventing Overfitting
 * Alternative Units: Hyperbolic Tangent Units and ReLU
 * Preventing Overfitting
 * Regularisation to Linear and Logistic Regression
 * Regularization in Neural Networks
 * Early Stopping
 * Dropout

###### Week 7 - Convolutional Neural Networks
 * Images as multi-channel data
 * Convolutions
 * Pooling Layers
 * Implementing a CNN
 * Assignment Workshop

###### Week 8 - Representations & Transfer Learning and Test
 * Re-Using Weights
 * Challenges of Fine-Tuning
 * When does Transfer Learning not work?
 * Online Open Book Test.

###### Week 9 - Recurrent Neural Networks and LSTM for Generation
 * Basic topology
 * Motivating examples
 * Long Short Term Memory
 * Language Modelling Example
 * Assignment Workshops

###### Week 10 - Attention & Transformers
 * Intro to Attention Models
 * Generative Models
 * Encoder-Decoder Technologies
 * Transformers
 * More on Transformers -- BERT, ELMo, RoBERTa and their friends
 * Working with Transformers
 * Assignment Workgroups

###### Week 11 - Hardware Optimization and Sustainable AI
 * Momentum in Optimization
 * Randomness in Optimization
 * Hardware Optimisation
 * Training versus Inference Hardware

###### Week 12 - Reinforcement Learning
 * Q-Learning
 * Optimization Policy
 * Deep Q-Learning
 * Other Variants

###### Week 13 - Final Quiz and Interviews

Classes will be delivered approximately weekly on TUESDAY evening between 6pm and 9pm. The class will be delivered in a Hybrid Fashion.

### Delivery Model

The course is a 5 ECTS course at Level 9/10. There are three contact hours per week, but students are expected to engage in a significant amount of self-study each week. The contact hours will be used for content delivery, group discussion, tests, reviewing assignments, and addressing any issues with respect to the delivery of the course. With the exception of Week 1, I will try to make lecture notes available in advance of the class. Therefore, students are strongly encouraged to study the content for each module in advance of that class. The lecture on Tuesday will cover key highlights in the module content provided in Jupyter notebooks, but will not be going through each notebook in fine detail. All content will be made available in the Deep Learning module on Brightspace.

## Assignments and Assessment
This is a 100\% Continuous Assessment course. The assessment of the course will be broken down as follows:

 * 40\% on in-class tests
 * 60\% on Project Work

There will be two in-class tests to encourage students to continuously engage with the material and take on assignments. The first test is in Week 8 (subject to confirmation or change) while the second test is at the end of the semester. These are open book tests that are designed to be challenging by asking questions around problem solving rather than just memorisation. Therefore genuine study will be required to make sure you get a good grade on these tests.

The Project is a coding / modeling task which will allow the student to demonstrate a clear understanding of the concepts covered in the course. A dataset will be provided and students will be required to submit operational code and a short but detailed report on their model. Specific instructions on the project will be provided early in the course. The project will be due at the end of the semester. All students are required to orally present on their project and answer questions on their models / code at the end of the year. The project has an optional group element.

## Coding
In this course we will use Python extensively for all examples and assignments. Rather than using vanilla Python we will where appropriate make use of Python packages that provide enhanced functionality for numerical computing and Deep Learning. Examples have been coded up using Python 3.10. You should set up your environment for Python 3.10 for minimal problems replicating code and examples.

The following packages are some of the most frequently used in this course:
 * **numpy**
 * **scipy** - together with the related **scikit-learn** (sklearn) package
 * **matplotlib**
 * **tensorflow**
 * **pytorch**

If you aren't familiar with either Python or the first three of these specific packages, now is the time to get familiar. We will use tutorial time in week 1 to discuss these and work through installations.

Course notes and all examples will be coded up in the **Jupyter Notebook** environment. Notebooks will be made available on Brightspace for download. Students who do not already use Jupyter Notebook should install Jupyter Notebook. Note however that most notebooks will also be tested and deployed on Google Colab. Again, this will be discussed in the Week 1 tutorial.

For your work environment, you should make use of a virtual environment to manage your Python code. I suggest that you set up your environment using venv and pip. I discourage the use of Anaconda as it takes away too much control from you. For coding, Jupyter is good for short examples. But for longer projects, there are many good IDEs available including Atom (with suitable plugins) and PyCharm.

As indicated, the course will place emphasis on practical understanding - with attention placed on both the practical implementation and use of important model types. As such early models will be explained with two types of examples: **The Hard Way** and **The Easy Way**.

### The Hard Way

In early weeks material will be explained from first principles. We refer to this way of doing things as **The Hard Way**. In these examples we will make use of the **numpy** library for performing numerical operations such as matrix multiplication or transposition. However we will in general be designing and coding important concepts such as neuron types, the backpropagation algorithm etc. with little reliance on well known libraries. The emphasis here will be on understanding how the algorithm works. In these cases many simplifying assumptions will often be made.
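
To give a flavour of the style, here is a minimal sketch of a single logistic unit written directly in numpy. The weights, bias and inputs are made-up values chosen for illustration, and the function names are hypothetical rather than taken from the course notebooks.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def unit_forward(x, w, b):
    """Forward pass of one logistic unit: weighted sum of inputs, then sigmoid."""
    return sigmoid(np.dot(w, x) + b)

# Made-up input features, weights and bias (illustrative values only)
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.0])
b = 0.0

activation = unit_forward(x, w, b)
print(activation)
```

With these particular values the weighted sum is 0, so the unit outputs 0.5, the midpoint of the logistic function.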

### The Easy Way

In many weeks we will also use examples to show how the newly introduced concept can be quickly implemented using well known 3rd party libraries such as **scipy**, **TensorFlow** or **PyTorch**. The emphasis here will often be on more complex examples which demonstrate the true power or limitations of the models that we are investigating. The easy way tends to refer to classic TensorFlow type operations where we see the computational graph that is being built up.

Even within the Easy Way of coding, there can be variants in how we tackle certain types of models. Pure Tensorflow is very cumbersome -- although it makes it very clear what is actually going on in training and inference. Wrappers for underlying logic like Tensorflow are common. Keras is the best known example of such a wrapper. Keras allows us to rapidly prototype models in Tensorflow (and in other Deep Learning libraries) but it is not always transparent about what is going on 'under the hood'.

### Hardware Required

Students are expected to have working installations of scipy and TensorFlow on their own machines for testing. Your machine does not have to be particularly fast for most problems. For the first half of the course in particular, you should be able to run sample code on a moderate machine. A GPU is not needed for that code. However for some models discussed in the 2nd part of the course, more serious hardware would be required.

Google Colab is a nice platform where you can run even big models for free -- though you are limited in the amount of time the model can be left running for.


# Appendices

Below you will find a couple of useful reference posts for issues such as notation and linear algebra in python.

## Appendix A - Notation

Given a training set we talk about:
 * $m$ = number of training examples
 * $x$'s = input variables or features
 * $y$'s = output variable or target
 * $(x,y)$ = one training example
 * $(x^{i},y^{i})$ = the $i^{th}$ training case
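
As a small illustration of this notation, the sketch below stores a made-up training set in numpy arrays. The values are invented, and note that Python indexes from 0, so the first training case is `X[0]`.

```python
import numpy as np

# A made-up training set: each row of X holds the input features of one example
X = np.array([[0.5, 1.2],
              [1.0, 0.7],
              [1.5, 2.1]])
# One target value per example
y = np.array([0, 1, 1])

m = X.shape[0]          # m: number of training examples
i = 1                   # pick one training case (0-based index)
x_i, y_i = X[i], y[i]   # the pair (x^i, y^i)

print(m)
print(x_i, y_i)
```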

## Appendix B - Linear Algebra in Python

In python we can use the `numpy` library to easily define and perform operations on vectors and matrices.

### Matrices
The term **matrix** refers to a 2D rectangular array of numbers.

In [3]:
import numpy as np

# We can define a matrix directly from a string of numbers where rows are delimited by semi-colons
A = np.matrix('1 2 3; 4 5 6')
print(A)

# Alternatively we can define the same matrix from a series of vectors where each vector defines a row of the matrix
B = np.matrix([[1, 2, 3], [4, 5, 6]])
print(B)

# Alternatively we can use numpy's array constructor to create a 2D array, i.e., our matrix
B = np.array([[1, 2, 3], [4, 5, 6]])
print(B)

[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]


A matrix will have a certain *dimensionality* defined in terms of the number of rows and the number of columns. We can use standard set notation to define the dimensionality of the matrix, giving the number of rows first. For example we can define the set of matrices of real numbers with 2 rows and 3 columns as $R^{2 \times 3}$.

The **shape** attribute of a numpy array or matrix gives its number of rows and columns.

In [4]:
print(A.shape)

(2, 3)


We refer to individual objects within the matrix as **elements**. We use subscript notation on the matrix name in order to refer to individual objects. In general $M_{i,j}$ will refer to the element found on the $i^{th}$ row and $j^{th}$ column of M.

Numpy supports a wide range of methods for indexing and slicing arrays which we will not cover here. In the simple case we can however use indices to operate directly on the matrix as follows.

In [5]:
print(A[1,1])

5


Note that indexing on the rows and columns of numpy matrices begins at 0.
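
Beyond single elements, a few basic slices are worth knowing. This is a short sketch using a plain 2D `np.array` with the same values as `A` above; the variable name `M` is chosen here only to avoid clashing with the notebook's variables.

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])

first_row = M[0, :]      # all columns of row 0
last_col = M[:, -1]      # all rows of the last column
block = M[:, 0:2]        # every row, columns 0 and 1

print(first_row)
print(last_col)
print(block)
```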

### Vectors
A vector is a 1D array and as such can be thought of as a special case of a matrix with only one column.

While the vector is in general a special case of a matrix, we typically use different operators to create and work with vectors.

In [6]:
# create a numpy vector by way of a standard python list fed into the array constructor
v1 = np.array([2,3,1,0])
print(v1)

# note that this is not equivalent to attempting to create the array as a single row of a matrix
v2 = np.matrix([[2,3,1,0]])
print(v2)

# index an element in the vector
print(v1[1])

[2 3 1 0]
[[2 3 1 0]]
3
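
The difference between the two forms is easiest to see from their shapes. A minimal sketch, with illustrative variable names: a 1D array has a single axis, while a row or column stored as a 2D array always has two.

```python
import numpy as np

vec = np.array([2, 3, 1, 0])             # 1D array
row = np.array([[2, 3, 1, 0]])           # 2D array with a single row
col = vec.reshape(-1, 1)                 # 2D array with a single column

print(vec.shape)   # one axis of length 4
print(row.shape)   # 1 row, 4 columns
print(col.shape)   # 4 rows, 1 column
```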


### Matrix and Vector Basic Operations

We can add and subtract matrices which are of the same dimensionality to result in a new matrix which is of that same dimensionality.

In [7]:
C = A + B
print(C)
D = C - A
print(D)

[[ 2  4  6]
 [ 8 10 12]]
[[1 2 3]
 [4 5 6]]


We can do the same for vectors.

In [8]:
v2 = np.array([10,20,30,40])

v3 = v1 + v2
print(v3)
v3 = v2 - v1
print(v3)

[12 23 31 40]
[ 8 17 29 40]


We can also directly apply scalar multiplication and division operations to matrices and vectors.  

In [9]:
E = A * 2
print(E)
F = A / 2
print(F)
v4 = v2 * 2
print(v4)
v5 = v2 / 2
print(v5)

[[ 2  4  6]
 [ 8 10 12]]
[[0.5 1.  1.5]
 [2.  2.5 3. ]]
[20 40 60 80]
[ 5. 10. 15. 20.]


### Inner / Scalar Product
The Inner Product or Scalar Product of two vectors is the scalar result of summing the pairwise products of elements in two vectors of equal length. Geometrically, the Scalar Product is related to the angle between the two vectors, so it is often interpreted as a measure of similarity between them (it is not a distance metric). This is used, for example, to calculate document similarities and in clustering.

In python we can calculate the scalar product of two vectors using numpy's inner function.

In [10]:
v3 = np.inner(v1,v2)
print(v3)

110


Note that the dot function will also produce this result when applied to two vectors. However the dot function can also be applied to matrices and higher-order arrays.
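
To connect the definition to the code, this sketch computes the scalar product by summing pairwise products and checks it against `np.dot`. It also computes the cosine similarity, which is included here as an assumed example of how the scalar product is commonly normalised when comparing vectors.

```python
import numpy as np

a = np.array([2, 3, 1, 0])
b = np.array([10, 20, 30, 40])

# Definition: sum of the pairwise products of the elements
manual = sum(x * y for x, y in zip(a, b))

# The same value using numpy
builtin = np.dot(a, b)

# Cosine similarity: scalar product divided by the product of the vector lengths
cosine = builtin / (np.linalg.norm(a) * np.linalg.norm(b))

print(manual, builtin)
print(cosine)
```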

### Matrix-Matrix Multiplication
Given two matrices we can calculate the matrix product of these matrices so long as the number of columns in the first matrix equals the number of rows in the second.

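For two matrices A and B, where $R$ and $C$ denote the number of rows and columns respectively, the dimensionality of the resultant matrix product is given by:

\begin{equation}
(R_{A} \times C_{A}) \cdot (R_{B} \times C_{B}) \rightarrow (R_{A} \times C_{B}), \quad C_{A} = R_{B}
\end{equation}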

The operation for calculating the matrix product is straightforward: the entry for row i column j in the resultant matrix C is the dot product of the i$^{th}$ row of A and the j$^{th}$ column of B.

![matrix matrix multiplication](figures/img792.gif)

In python we can use the numpy function **matmul** to perform matrix-matrix multiplication.

In [11]:
G = np.matrix('1 2; 3, 4; 5, 6')
print(G)
H = np.matmul(A,G)
print(H)

[[1 2]
 [3 4]
 [5 6]]
[[22 28]
 [49 64]]
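
In the spirit of The Hard Way, the row-by-column rule can also be written out explicitly. The sketch below is a deliberately slow illustration (the function name `matmul_by_hand` is hypothetical) and is checked against `np.matmul`.

```python
import numpy as np

def matmul_by_hand(P, Q):
    """Multiply two 2D arrays using the row-by-column rule."""
    rows_p, cols_p = P.shape
    rows_q, cols_q = Q.shape
    # The inner dimensions must agree: columns of P == rows of Q
    if cols_p != rows_q:
        raise ValueError("columns of P must equal rows of Q")
    out = np.zeros((rows_p, cols_q))
    for i in range(rows_p):
        for j in range(cols_q):
            # Entry (i, j) is the dot product of row i of P and column j of Q
            out[i, j] = np.dot(P[i, :], Q[:, j])
    return out

P = np.array([[1, 2, 3], [4, 5, 6]])      # 2 x 3
Q = np.array([[1, 2], [3, 4], [5, 6]])    # 3 x 2

by_hand = matmul_by_hand(P, Q)
print(by_hand)
print(np.array_equal(by_hand, np.matmul(P, Q)))
```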


Remember that matrix-matrix multiplication is not, in general, commutative.
\begin{equation}
  A \times B \neq B \times A
\end{equation}

In [12]:
A = np.matrix('1, 2, 3; 4, 5, 6; 7, 8, 9')
B = np.matrix('1, 1, 1; 2, 2, 2; 3, 3, 3')
print(A)
print(B)
print(np.matmul(A,B))
print(np.matmul(B,A))

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 1 1]
 [2 2 2]
 [3 3 3]]
[[14 14 14]
 [32 32 32]
 [50 50 50]]
[[12 15 18]
 [24 30 36]
 [36 45 54]]


### Matrix Identity and Inverse

We can define an identity matrix $I$ for a square matrix $A$. Multiplication by $I$ leaves $A$ unchanged, and in this special case the multiplication does commute.

\begin{equation}
 A \times I = I \times A = A
\end{equation}

Here $I$ is the Identity Matrix, which is a square matrix where all diagonal elements are 1 and all non-diagonal elements are 0.

Numpy allows us to easily define an identity matrix with a specified number of rows and columns.

In [13]:
I = np.identity(3)
print(I)
print(A)
print(np.matmul(A,I))
print(np.matmul(I,A))

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


Just as we can define the inverse for a non-zero real number $X \in R$ as $\frac{1}{X}$, we can also define the inverse for a matrix.

Beginning first with the case of real numbers, we note that:

\begin{equation}
 X \times INV(X) = I
\end{equation}

where I is the identity for real numbers - which is 1.

This gives us an intuition of how the inverse is defined for matrices:  

\begin{equation}
 A \times INV(A) = I
\end{equation}

i.e., the inverse of a matrix A should be defined such that the matrix product of A by it produces an identity matrix.

While we straightforwardly calculate the inverse of a real number X as $\frac{1}{X}$, the calculation of the inverse of a matrix is more complicated and involves calculations of Determinants and Cofactors of matrices which we will not consider here. Fortunately we can of course calculate the inverse directly in numpy.

In [14]:
import numpy.linalg as la
X = np.matrix('100, -200, 403; 44, -5, 607; 27, 98, -59')
B = la.inv(X)
print(B)
print(np.matmul(X,B))

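# note that inversion breaks down when the matrix is singular (its determinant is 0), as A is here;
# depending on rounding, numpy may raise a LinAlgError or return huge, meaningless values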
print("Uncomment lines in the block to run this - but expect an error!")
# B = la.inv(A)
# print(np.matmul(A,B))

[[ 0.00746988 -0.00349497  0.01506633]
 [-0.0023959   0.00211775  0.00542254]
 [-0.00056121  0.00191823 -0.00104746]]
[[ 1.00000000e+00  4.31512465e-17  2.29850861e-17]
 [ 4.47775497e-17  1.00000000e+00 -5.16080234e-17]
 [ 4.32596667e-17 -3.10081821e-17  1.00000000e+00]]
Uncomment lines in the block to run this - but expect an error!
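
Whether inverting a singular matrix raises an error or silently returns huge values can depend on floating point details, so it is safer to test for invertibility first. The sketch below uses the matrix rank as that test; the helper name `is_invertible` is hypothetical, and using the rank is one reasonable choice rather than the only one.

```python
import numpy as np

def is_invertible(M):
    """A square matrix is invertible only if its rank equals its size."""
    M = np.asarray(M)
    if M.ndim != 2 or M.shape[0] != M.shape[1]:
        return False
    return bool(np.linalg.matrix_rank(M) == M.shape[0])

# Rows are linearly dependent, so this matrix is singular
S = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# The invertible matrix used in the cell above
X = np.array([[100, -200, 403], [44, -5, 607], [27, 98, -59]])

print(is_invertible(S))
print(is_invertible(X))

if is_invertible(X):
    X_inv = np.linalg.inv(X)
    # X times its inverse should be the identity, up to rounding error
    print(np.allclose(np.matmul(X, X_inv), np.identity(3)))
```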


### Matrix Transpose
We can also define the transpose of a matrix $A$, written $A^{T}$, as the matrix obtained by swapping the rows and columns of $A$.

\begin{equation}
   (A^{T})_{ij} = A_{ji}
\end{equation}

In [15]:
C = A.T
print(A)
print(C)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 4 7]
 [2 5 8]
 [3 6 9]]
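
Two standard properties of the transpose can be checked numerically: transposing swaps the shape, and the transpose of a product equals the product of the transposes in reverse order. A quick sketch using made-up matrices:

```python
import numpy as np

P = np.array([[1, 2, 3], [4, 5, 6]])      # 2 x 3
Q = np.array([[1, 2], [3, 4], [5, 6]])    # 3 x 2

# Transposing swaps the number of rows and columns
print(P.shape, P.T.shape)

# Element (i, j) of the transpose is element (j, i) of the original
print(P.T[2, 0] == P[0, 2])

# The transpose of a product is the product of the transposes in reverse order
lhs = np.matmul(P, Q).T
rhs = np.matmul(Q.T, P.T)
print(np.array_equal(lhs, rhs))
```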


### Do I need all this Linear Algebra?

Knowledge of the issues above is typically not needed just to run basic Deep Learning modeling tasks. However, in the first number of weeks, while we are figuring out why things work the way that they do, it is essential.