In [2]:
import tensorflow as tf
import numpy as np

We first create 20 values evenly spaced between 0 and 100, then create the targets by multiplying each value by $\pi$.

Our dataset is therefore sampled from the function $f(x) = \pi x$.

Now we want to approximate this function with a linear model that has the form $f_{model}(x) = a x$, where $a$ is the parameter that we want to learn.

To do this, we use the 20 data points, initialize the parameter $a$ to a value far from $\pi$, and run gradient descent on one example at a time (also called stochastic gradient descent). We show the entire dataset to the model only once; in other words, we train for a single epoch. We use TensorFlow's tf.GradientTape for automatic differentiation to obtain the gradient of the squared-error loss with respect to the trainable parameter $a$.
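
The update performed by one such step can also be written out by hand. The following uses the squared-error loss for a single example $(x, y)$ and a learning rate written as $\eta$ (a symbol introduced here only for illustration):

$$
L(a) = (a x - y)^2, \qquad \frac{\partial L}{\partial a} = 2x\,(a x - y), \qquad a_{\text{new}} = a - \eta \cdot 2x\,(a x - y)
$$

Because $y = \pi x$, this is equivalent to $a_{\text{new}} - \pi = (1 - 2 \eta x^2)\,(a - \pi)$, so each example shrinks the distance to $\pi$ as long as $0 < 2 \eta x^2 < 2$.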


The use of tf.GradientTape() shown in this notebook carries over to any data and any model. The same general structure yields gradients with respect to millions of parameters; the only difference is that we then get the list of trainable variables from the ".trainable_variables" attribute of a tf.keras.Model instance.
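
As a minimal sketch of that general structure, the example below swaps the hand-written model function for a small Keras model. The model (a single Dense unit without bias), the three-point batch, the learning rate of 0.05, and the names keras_model, x_batch, and y_batch are all assumptions made for illustration and are not part of this notebook.

```python
import math
import tensorflow as tf

# Hypothetical stand-in for a larger network: one Dense unit without bias computes a * x.
keras_model = tf.keras.Sequential([tf.keras.layers.Dense(1, use_bias=False)])

# Small illustrative batch sampled from f(x) = pi * x
x_batch = tf.constant([[1.0], [2.0], [3.0]])
y_batch = x_batch * math.pi

learning_rate = 0.05  # illustrative value chosen for this batch

for step in range(100):
    with tf.GradientTape() as tape:
        prediction = keras_model(x_batch)
        loss = tf.reduce_mean((prediction - y_batch) ** 2)

    # tape.gradient returns one gradient per trainable variable, in the same order
    gradients = tape.gradient(loss, keras_model.trainable_variables)

    # plain gradient descent step on every trainable variable
    for variable, gradient in zip(keras_model.trainable_variables, gradients):
        variable.assign_sub(learning_rate * gradient)

# the single weight of the Dense layer plays the role of the parameter a
learned_a = float(keras_model.trainable_variables[0][0, 0])
print(learned_a)
```

The loop body does not depend on how many variables the model has, which is why the same pattern scales from one parameter to many.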

In [195]:
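# 20 evenly spaced inputs; float64 bounds make the dtype match the float64 parameter variable
x = tf.linspace(tf.constant(0.0, dtype=tf.float64), tf.constant(100.0, dtype=tf.float64), 20)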

In [196]:
y = x * np.pi

In [197]:
# a simple linear univariate model function without bias
def model(x, parameter):
    return x * parameter

In [198]:
# set learning rate
learning_rate = tf.constant(0.00005, dtype=tf.float64)

# initialize parameter variable to a value far away from pi
parameter_estimate = tf.Variable(520.5, trainable=True, dtype=tf.float64)


# iterate over epochs
for epoch in range(1):

    # iterate over training examples; x_i and y_i avoid overwriting the dataset tensors x and y
    for x_i, y_i in zip(x, y):

        # within GradientTape context manager, calculate loss between target and prediction
        with tf.GradientTape() as tape:

            prediction = model(x_i, parameter_estimate)

            loss = (prediction - y_i)**2

        # outside of context manager, obtain gradients with respect to list of trainable variables
        gradients = tape.gradient(loss, [parameter_estimate])

        # tape.gradient returns one gradient per listed variable; scale ours by the learning rate
        new_parameter_val = parameter_estimate - learning_rate * gradients[0]

        # assign new parameter value
        parameter_estimate.assign(new_parameter_val)

In [204]:
parameter_estimate

<tf.Variable 'Variable:0' shape=() dtype=float64, numpy=3.141592653589793>
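
The result can be cross-checked without automatic differentiation. The sketch below is a NumPy-only re-implementation that assumes the same data, learning rate, and starting value as above, and replaces the gradient tape with the hand-derived gradient $2x(ax - y)$ of the squared error. The names x_values, y_values, and a are chosen here for illustration.

```python
import numpy as np

# same dataset as above: 20 points sampled from f(x) = pi * x
x_values = np.linspace(0.0, 100.0, 20)
y_values = x_values * np.pi

learning_rate = 0.00005
a = 520.5  # same starting value, far away from pi

# one epoch of stochastic gradient descent, one example per step
for x_i, y_i in zip(x_values, y_values):
    # analytic derivative of (a * x_i - y_i)**2 with respect to a
    gradient = 2.0 * x_i * (a * x_i - y_i)
    a = a - learning_rate * gradient

print(a)
```

Each step multiplies the distance to $\pi$ by $1 - 2 \cdot 0.00005 \cdot x^2$, and this factor is zero for the last example ($x = 100$). That is why a single epoch ends at $\pi$ up to floating-point rounding; with a different learning rate or data range, one pass would generally leave a remaining error.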