# Lecture 1 notebook
## Introduction to TensorFlow and Deep Learning

## IADS Summer School, 1st August 2022

### Dr Michael Fairbank, University of Essex, UK

- Email: m.fairbank@essex.ac.uk
- This is a Jupyter Notebook to accompany Lecture 1 of the course

### Check Python engine is running

Check that you are running Python 3.6 or above.

In [1]:
print("Hello World")
import sys
print("Python Version",sys.version)

Hello World
Python Version 3.10.5 (tags/v3.10.5:f377153, Jun  6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)]


### Check TensorFlow version

You should see a version > 2.0

In [3]:
import tensorflow as tf
print(tf.__version__)
# importing tensorflow might take a while

2.9.1


# Lecture slide content for Basic Concepts

- Copy the lecture slide contents into the appropriate cells below, run each code block, and check you get the right answer.
- There is no need to keep up with all of these in time with the lecture; just do the ones you are curious about for now. Prioritise listening to the lecture and come back to any gaps later.


In [4]:
# Slide Title: Basic concepts- Tensor scalars, and numpy
# Code (TODO)...
a=tf.constant(2,tf.float32) # rank-0 tensor (i.e. a scalar) of type float32 with value of 2
print(a)
b=tf.constant(3,tf.float32) # rank-0 tensor (i.e. a scalar) of type float32 with value of 3
print(b)
c=tf.add(a,b) # addition
print(c) # This just says c is a Tensor of shape() (rank zero), type float32, with value 5.0
print(c.numpy()) # Converts from a TensorFlow datatype to a NumPy datatype (apparently without importing numpy explicitly)

tf.Tensor(2.0, shape=(), dtype=float32)
tf.Tensor(3.0, shape=(), dtype=float32)


In [6]:
# Slide Title: Basic concepts – tensor addition
# Code (TODO)...
a=tf.constant([[1,2],[3,4]])
b=tf.constant([[5,6],[8,9]])
print(a,
      b)
c=tf.add(a,b)
print(c.numpy())

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32) tf.Tensor(
[[5 6]
 [8 9]], shape=(2, 2), dtype=int32)
[[ 6  8]
 [11 13]]


In [10]:
# Slide Title: Basic concepts – tensor multiplication 
# Code (TODO)...
a=tf.constant([[1,2],[3,4]])
b=tf.constant([[5,6],[8,9]])
c=tf.multiply(a,b) # Elementwise multiplication (“Hadamard product”)
print(a.numpy())
print(b.numpy())
print(c.numpy())

[[1 2]
 [3 4]]
[[5 6]
 [8 9]]
[[ 5 12]
 [24 36]]


In [12]:
# Slide Title: Basic concepts – matrix multiplication 
# Code (TODO)...
a=tf.constant([[1,2],[3,4]],tf.float32) # A rank-2 tensor (i.e. a 2*2 matrix)
print(a.numpy())
b=tf.constant([[1],[1]], tf.float32) # A rank-2 tensor (a 2*1 matrix)
print(b.numpy())
c=tf.matmul(a,b) # Matrix multiplication
print(c.numpy())

# Matrix multiplication, worked by hand:
# [[1 2]     [[1]     [[1*1 + 2*1]     [[3]
#          *        =               =
#  [3 4]]     [1]]     [3*1 + 4*1]]     [7]]

[[1. 2.]
 [3. 4.]]
[[1.]
 [1.]]
[[3.]
 [7.]]


In [14]:
# Slide Title: Basic concepts – datatypes: https://www.tensorflow.org/api_docs/python/tf/dtypes/DType
# Code (TODO)...
a=tf.constant(3.2, tf.float32)
print(a)
b=tf.constant(3, tf.int32)
print(b)
c=tf.constant([1,2,3], tf.float32)
print(c)
d=tf.constant(5) # This defaults to int32
print(d)
e=tf.constant(5.0) # This defaults to float32
print(e)

# Also have tf.float64, tf.int64, tf.bool

tf.Tensor(3.2, shape=(), dtype=float32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32)
tf.Tensor(5, shape=(), dtype=int32)
tf.Tensor(5.0, shape=(), dtype=float32)


In [15]:
# Slide Title: Basic concepts – casting datatypes (1): convert one datatype to another
# Code (TODO)...
a=tf.constant([[1,2],[3,-4]],tf.float32)
print(tf.cast(a,tf.int32).numpy()) # This is now an integer tensor

b=tf.constant([True, False, True], tf.bool)
print(tf.cast(b,tf.int32).numpy()) # Bools cast using True=1, False=0

[[ 1  2]
 [ 3 -4]]
[1 0 1]


In [16]:
# Slide Title: Basic concepts – casting datatypes (2)
# Code (TODO)...
# You can’t add datatypes that don’t match --> code will produce error
a=tf.constant(3.0, tf.float32)
b=tf.constant(3, tf.int32)
c=tf.add(a,b)

InvalidArgumentError: cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a int32 tensor [Op:AddV2]

In [17]:
# we have to cast it like this
c=tf.add(a,tf.cast(b,tf.float32))
print(c.numpy())

6.0


In [18]:
# Slide Title: Basic concepts – tensor shape (1): Tensor shapes must match for most operations
# Code (TODO)...
# code will produce error
a=tf.constant([1,2])
b=tf.constant([2,3,1])
f=tf.add(a,b)

InvalidArgumentError: Incompatible shapes: [2] vs. [3] [Op:AddV2]

However, there is a shorthand that relaxes these shape-matching rules.  
When the rank of one tensor is lower than that of the other, TensorFlow tries to combine them in the most sensible way (if possible).  
  
This behaviour is called “broadcasting”: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

In [None]:
# Slide Title: Basic concepts – tensor shape (2)
# Code (TODO)...
a=tf.constant([1,2])
b=tf.constant(1)
print(tf.add(a,b).numpy())

In [None]:
# Slide Title: Basic concepts – tensor shape (3)
# Code (TODO)...
a=tf.constant([[1,2],[3,4]])
b=tf.constant([10,20])
print(tf.add(a,b).numpy())
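
The shape rule behind broadcasting in the two cells above can be sketched in plain Python. This is an illustrative helper, not TensorFlow's implementation: `broadcast_shape` is a hypothetical name, and it encodes the NumPy-style rule described at the link above (align the shapes from the right; two dimensions are compatible when they are equal or when one of them is 1).

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the shape produced by broadcasting two shapes, or raise ValueError."""
    result = []
    # Compare dimensions from the right; a missing dimension counts as 1.
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"Incompatible shapes: {shape_a} vs. {shape_b}")
    return tuple(reversed(result))

print(broadcast_shape((2,), ()))      # vector + scalar, as in "tensor shape (2)"
print(broadcast_shape((2, 2), (2,)))  # matrix + vector, as in "tensor shape (3)"
```

Calling `broadcast_shape((2,), (3,))` raises a `ValueError`, which mirrors the "Incompatible shapes" error from the "tensor shape (1)" cell.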

In [20]:
# Slide Title: Elementwise Tensor operations
# Code (TODO)...
# These elementwise operations produce a tensor of equal size to the input

a=tf.constant([[1,2],[3,-4]],tf.float32)
print(tf.square(a).numpy())
print(tf.abs(a).numpy())

print(tf.tanh(a).numpy())

[[ 1.  4.]
 [ 9. 16.]]
[[1. 2.]
 [3. 4.]]
[[ 0.7615942  0.9640276]
 [ 0.9950547 -0.9993292]]


In [22]:
# Slide Title: Comparison Tensor operations: https://www.tensorflow.org/api_docs/python/tf/math/greater
# Code (TODO)...

a=tf.constant([1,2,3])
b=tf.constant([5,1,7])
print(tf.greater(a,b).numpy())

print(tf.greater(a,1).numpy()) # This is “broadcasting” the mismatching tensor sizes, i.e., changing 1 to [1,1,1] before comparison

[False  True False]
[False  True  True]


In [25]:
# Slide Title: Basic concepts – operator shorthand
# Code (TODO)...
# a+b --> tf.add(a,b)
# a-b --> tf.subtract(a,b)
# a*b --> tf.multiply(a,b) 
# a@b --> tf.matmul(a,b)
# a>b --> tf.greater(a,b)

a=tf.constant([[1,2],[3,4]])
b=tf.constant([[5,6],[8,9]])
print((a+b).numpy())
print((a*b).numpy())
print((a@b).numpy())

[[ 6  8]
 [11 13]]
[[ 5 12]
 [24 36]]
[[21 24]
 [47 54]]
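
Since `a*b` and `a@b` are easy to mix up, here is a plain-Python sketch of what each one computes for the two matrices above. It uses nested lists rather than tensors, purely to spell out the arithmetic; the variable names `hadamard` and `matmul` are chosen for illustration.

```python
a = [[1, 2], [3, 4]]
b = [[5, 6], [8, 9]]

# Elementwise (Hadamard) product: multiply entries in matching positions.
hadamard = [[a[i][j] * b[i][j] for j in range(2)] for i in range(2)]

# Matrix product: entry (i, j) is the dot product of row i of a with column j of b.
matmul = [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]

print(hadamard)
print(matmul)
```

The two results should agree with the `a*b` and `a@b` outputs printed by the cell above.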


In [26]:
# Slide Title: Basic concepts – variables vs. constants (1): 
# Code (TODO)...
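# Unlike “constants”, all “Variables” can be updated in place. 
W = tf.Variable([0.3], dtype=tf.float32)
W.assign([-1.0])
print(W.numpy())

# Constants, however, cannot be updated once created (they have no assign method).
# The second line below does not change the first constant; it creates a new one and rebinds the Python name x to it.
x = tf.constant([0.3], tf.float32)
x = tf.constant([-1], tf.float32)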
print(x.numpy())

[-1.]
[-1.]


In [27]:
# Slide Title: Basic concepts – variables vs. constants (2): Variables also have assign_add and assign_sub
# Code (TODO)...
W = tf.Variable([0.3], dtype=tf.float32) # pass dtype by keyword: the second positional argument of tf.Variable is "trainable"
W.assign_add([1.0]) # Analogous to W+=1
print(W.numpy())
W.assign_sub([1.0])
print(W.numpy())

[1.3]
[0.29999995]
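
The second value printed above is 0.29999995 rather than 0.3 because float32 arithmetic rounds after every operation. A minimal sketch of the same effect in plain Python follows. It emulates float32 rounding with the standard `struct` module; the helper name `to_float32` is made up for illustration, and this mimics the rounding rather than reproducing TensorFlow's internals.

```python
import struct

def to_float32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]

w = to_float32(0.3)        # 0.3 is not exactly representable in float32
w = to_float32(w + 1.0)    # like assign_add: low-order bits are lost near 1.3
w = to_float32(w - 1.0)    # like assign_sub: the lost bits do not come back
print(w)
print(w == to_float32(0.3))
```

The round trip does not return to the starting value because float32 has coarser spacing near 1.3 than near 0.3.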


In [28]:
# Slide Title: Aggregation functions (1)
# Code (TODO)...
a=tf.constant([[1,2],[3,4]],tf.float32)
print(tf.reduce_sum(a).numpy()) # sums up all values in the tensor and REDUCES it to a single number
print(tf.reduce_mean(a).numpy()) # computes mean of all values in tensor
print(tf.reduce_max(a).numpy()) # computes max of all values in tensor

10.0
2.5
4.0


In [29]:
# Slide Title: Aggregation functions (2)
# Code (TODO)...
a=tf.constant([[1,2],[3,4]],tf.float32)
print(tf.reduce_sum(tf.cast(a>1,tf.float32)).numpy())

3.0


In [30]:
# Slide Title: Aggregation functions (3)
# Code (TODO)...
a=tf.constant([4,0,5,-4],tf.float32)
print(tf.reduce_max(a).numpy())
print(tf.argmax(a).numpy()) # Argmax returns the **index** at which the max element appears

5.0
2


In [31]:
# Slide Title: Aggregation functions across an axis (1)
# Code (TODO)...
a=tf.constant([[1,2],[3,4]])
print(tf.reduce_sum(a, axis=0).numpy()) # axis=0: sum down each column (one result per column)
print(tf.reduce_sum(a, axis=1).numpy()) # axis=1: sum along each row (one result per row)

[4 6]
[3 7]


In [32]:
# Slide Title: Aggregation functions across an axis (2)
# Code (TODO)...
a=tf.constant([[5,10,0],[3,4,12]])
print(tf.reduce_max(a, axis=0).numpy())
print(tf.argmax(a, axis=1).numpy())

[ 5 10 12]
[1 2]
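
Which axis gets collapsed is a common source of confusion, so here is a plain-Python sketch of the same reductions on nested lists. It is only meant to show which numbers are grouped together; the variable names are chosen for illustration.

```python
a = [[5, 10, 0],
     [3, 4, 12]]

# axis=0 collapses the rows: one result per column.
max_axis0 = [max(column) for column in zip(*a)]

# axis=1 collapses the columns: one result per row.
argmax_axis1 = [row.index(max(row)) for row in a]
sum_axis1 = [sum(row) for row in a]

print(max_axis0)
print(argmax_axis1)
print(sum_axis1)
```

The first two results should agree with the `tf.reduce_max(a, axis=0)` and `tf.argmax(a, axis=1)` outputs above.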


### Automatic differentiation

Autodiff is fast and exact differentiation:
- Not numerical differentiation (which is neither exact nor fast)
- Not symbolic differentiation either
- It’s something in between
- If you give it code to compute a function f(x), it will write corresponding program code for you that calculates df/dx
  
https://justindomke.wordpress.com/2009/02/17/automatic-differentiation-the-most-criminally-underused-tool-in-the-potential-machine-learning-toolbox/
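
To make the contrast with numerical differentiation concrete, here is a small sketch in plain Python. It uses dual numbers, which give forward-mode autodiff; this is a different flavour from the reverse-mode tape that TensorFlow uses, and the `Dual` class and the test function `f` are illustrative inventions rather than anything from the lecture.

```python
class Dual:
    """A number value + deriv*eps with eps**2 == 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def _lift(self, other):
        # Treat plain numbers as constants (derivative zero).
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def f(x):
    return x * x * 3.0 + 2.0  # f(x) = 3x^2 + 2, so df/dx = 6x


# Autodiff: seed the input with derivative 1 and read the derivative off the result.
exact = f(Dual(4.0, 1.0)).deriv

# Numerical differentiation: a forward finite difference with step h.
h = 1e-4
numerical = (f(4.0 + h) - f(4.0)) / h

print(exact, numerical)
```

The autodiff result is exact (up to floating-point rounding) from a single pass through `f`, whereas the finite difference carries an error that depends on the step size `h` and needs an extra evaluation of `f` per input.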

In [34]:
# Slide Title: Automatic differentiation (Autodiff) (1)
# Code (TODO)...
x = tf.Variable(3.0, dtype=tf.float32)
with tf.GradientTape() as tape:
    y=tf.pow(x,2.0)
dydx=tape.gradient(y, x) # ask for the gradient after leaving the tape's context
print(dydx.numpy())

6.0


Autodiff also works when there is more than one input variable:  
https://en.m.wikipedia.org/wiki/Partial_derivative

In [35]:
# Slide Title: Automatic differentiation (Autodiff) (3)
# Code (TODO)...
x=tf.Variable(4.0,dtype=tf.float32)
y=tf.Variable(2.0,dtype=tf.float32)
with tf.GradientTape() as tape:
    f=tf.pow(x,2.0)*3.0+y
[dfdx, dfdy]=tape.gradient(f, [x,y]) # Fetching two derivatives at once
print(dfdx.numpy(), dfdy.numpy())

24.0 1.0
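
As a check on the two values printed above, the partial derivatives can be worked out by hand for the function used in that cell, evaluated at $x = 4$, $y = 2$:

$$
f(x, y) = 3x^2 + y, \qquad
\frac{\partial f}{\partial x} = 6x = 24, \qquad
\frac{\partial f}{\partial y} = 1
$$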


If you want a derivative w.r.t. a constant, then you need tape.watch(…)

Autodiff also works when the input variables are higher-rank tensors.  
Autodiff makes neural-network training much easier to program:
- Autodiff replaces hand-written “backpropagation” code

In [36]:
# Slide Title: Automatic differentiation (Autodiff) (4)
# Code (TODO)...
x=tf.constant(4.0,tf.float32) # A “constant” requires watching
with tf.GradientTape() as tape:
    tape.watch(x) # A “constant” requires watching
    f=tf.pow(x,3.0)
dfdx=tape.gradient(f, x)
print(dfdx.numpy())

48.0


# Gradient Descent Exercise

- In this exercise we will build a gradient descent script to minimise $y = x^2 - 4x + 4$ with respect to $x$.
- Please tackle this exercise carefully - this is the main exercise of this lecture!

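- Gradient descent is an iterative optimisation algorithm: starting from an initial guess for $x$, it repeatedly moves $x$ a small step in the direction that decreases $y$ (against the gradient), until it settles near the value of $x$ that minimises $y$.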

#### Exercise 1:

In [None]:
# Slide Title: Example 1D Gradient Descent problem
# Complete the 3 TODOs below and run the code to solve the minimisation challenge...
import tensorflow as tf
eta = 0.1 # learning rate
x = tf.Variable(10.0, dtype=tf.float32) # arbitrary initial value

# Exercise 1: Use gradient descent to find the minimum of y = x^2 - 4x + 4
for i in range(50):
    with tf.GradientTape() as tape:
        y=tf.pow(x,2.0)-tf.multiply(x,4)+4 # TODO put in formula for y in terms of x here
    dydx=tape.gradient(y,x) # TODO finish this line
    x.assign(tf.subtract(x,tf.multiply(eta, dydx))) # TODO finish x_(t+1)=x_t-eta*dydx
    # note: We didn’t need to give the iterative variable’s steps different variable names x1, x2, … We just called them all “x”
    print("iteration:",i, "x:", x.numpy(), "y:", y.numpy())
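
As a cross-check on the loop above, the same update can be run in plain Python with the derivative worked out by hand, $dy/dx = 2x - 4$. This is a sketch for comparison only, not part of the exercise; the closed-form line relies on the observation that each step multiplies the distance from the minimum at $x = 2$ by $(1 - 2\eta)$.

```python
eta = 0.1   # learning rate, as in the exercise
x = 10.0    # same initial value as in the exercise

for i in range(50):
    dydx = 2.0 * x - 4.0   # hand-derived gradient of y = x^2 - 4x + 4
    x = x - eta * dydx     # x_(t+1) = x_t - eta * dydx

# Each step scales (x - 2) by (1 - 2*eta), so after 50 steps:
closed_form = 2.0 + (10.0 - 2.0) * (1.0 - 2.0 * eta) ** 50

print(x, closed_form)
```

Both numbers should sit very close to 2, and the TensorFlow loop should approach the same value, up to float32 rounding.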

## Optimised versions of Gradient Descent Exercise

- Now we will repeat the above exercise (and hopefully get exactly the same results again)
- But now we will use some higher-level TensorFlow functions...

#### Exercise 1, Version 2: @tf.function

Inefficiency of the version above:
- It recalculates the automatic differentiation formula at every step of the loop!
  
Fix this by:
1. Pulling the body of the main loop out into a separate Python function  
    def do_update():
2. Decorating this function with “@tf.function”

In [None]:
# Slide Title: Optimised Version (Exercise 1, Version 2)
# Code (TODO)...
import tensorflow as tf

eta = 0.1 # learning rate
x = tf.Variable(10.0, dtype=tf.float32) # arbitrary initial value

@tf.function
def do_update(x):
    with tf.GradientTape() as tape:
        y=tf.pow(x,2.0)-4.0*x+4.0
    dydx=tape.gradient(y, x)
    x.assign(x-dydx*eta)
    return y

for i in range(50):
    y=do_update(x)
    print("iteration:",i, "x:", x.numpy(), "y:", y.numpy())

- Use the @tf.function decorator to speed up execution, take advantage of the GPU, and allow models to be saved
    - It allows TensorFlow to cache the graph of computations so that it doesn’t have to rebuild them (or the derivatives) every iteration.
- Adding @tf.function in this task sped things up by around 4 times  
- Put @tf.function around the main functionality of your training loop 
    - After you’ve debugged things
    - Warning: the decorated function must not refer to any global variables (unless they are strictly constants), because the values of Python globals are fixed into the graph when the function is first traced

#### Exercise 1, Version 3: Using a built-in optimizer

In [None]:
# Slide Title: Using a built-in optimizer
# Code (TODO)...
import tensorflow as tf

eta = 0.1 # learning rate
x = tf.Variable(10.0, dtype=tf.float32) # arbitrary initial value

optimizer = tf.keras.optimizers.SGD(eta) # SGD = Stochastic Gradient Descent
def calc_y():
    y=tf.pow(x,2.0)-4.0*x+4.0
    return y

for i in range(50):
    optimizer.minimize(calc_y, [x])
    print("iteration:",i, "x:", x.numpy(), "y:", calc_y().numpy())

Several built-in optimizers are available; Adam and RMSprop often work better than plain SGD with neural networks:
- optimizer = tf.keras.optimizers.SGD(eta)
- optimizer = tf.keras.optimizers.Adam()
- optimizer = tf.keras.optimizers.RMSprop()


In [None]:
# Slide Title: Using a built-in optimizer
# Code (TODO)...
import tensorflow as tf

x = tf.Variable(10.0, dtype=tf.float32) # arbitrary initial value

optimizer = tf.keras.optimizers.Adam()
def calc_y():
    y=tf.pow(x,2.0)-4.0*x+4.0
    return y

for i in range(50):
    optimizer.minimize(calc_y, [x])
    print("iteration:",i, "x:", x.numpy(), "y:", calc_y().numpy())
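
When running the Adam cell above, x moves much more slowly than it did with SGD. The plain-Python sketch below suggests why. It implements the textbook Adam update, assuming what I understand to be the Keras default hyperparameters (learning rate 0.001, beta_1 0.9, beta_2 0.999, epsilon 1e-7); it is an approximation for illustration, not the Keras implementation, which may differ in small details.

```python
import math

# Assumed defaults of tf.keras.optimizers.Adam()
lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-7

x = 10.0          # same initial value as the cell above
m, v = 0.0, 0.0   # running averages of the gradient and the squared gradient

for t in range(1, 51):
    g = 2.0 * x - 4.0                    # hand-derived gradient of y = x^2 - 4x + 4
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    m_hat = m / (1.0 - beta1 ** t)       # bias-corrected averages
    v_hat = v / (1.0 - beta2 ** t)
    x -= lr * m_hat / (math.sqrt(v_hat) + eps)

print(x)
```

Because Adam normalises the gradient by its recent magnitude, each step here has a size of roughly the learning rate, so 50 steps move x by only about 0.05. Passing a larger learning rate to the optimizer, or running more iterations, would be needed to get close to the minimum at x = 2.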

## End of lecture 1
