The following additional libraries are needed to run this
notebook.

In [1]:
!pip install d2l==0.17.2

In this section, we will show you how to implement the linear regression model concisely by using high-level APIs of deep learning frameworks.

## 3.3.1. Generating the Dataset

In [2]:
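import numpy as np
import tensorflow as tf
from d2l import tensorflow as d2l

# Ground-truth parameters; training should later recover them approximately
true_w = tf.constant([2, -3.4])
true_b = 4.2
# 1000 examples with 2 features each, labeled using true_w and true_b
features, labels = d2l.synthetic_data(true_w, true_b, 1000)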
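The helper d2l.synthetic_data lives in the book's d2l package and its body is not shown here. The NumPy sketch below is a hypothetical stand-in that illustrates the kind of data it returns, under our own assumptions: standard-normal features, labels y = Xw + b, and additive Gaussian noise with standard deviation 0.01. The name synthetic_data_np and its noise_std and seed parameters are ours, not part of d2l.

```python
import numpy as np

def synthetic_data_np(w, b, num_examples, noise_std=0.01, seed=0):
    """Hypothetical NumPy stand-in for d2l.synthetic_data: y = Xw + b + noise."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=np.float32)
    # Features drawn from a standard normal distribution, one column per weight
    X = rng.normal(size=(num_examples, len(w))).astype(np.float32)
    # Linear model plus small Gaussian noise (assumed noise level)
    y = X @ w + b + rng.normal(scale=noise_std, size=num_examples)
    return X, y.reshape(-1, 1).astype(np.float32)

features_np, labels_np = synthetic_data_np([2, -3.4], 4.2, 1000)
print(features_np.shape, labels_np.shape)  # (1000, 2) (1000, 1)
```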

## 3.3.2. Reading the Dataset

Rather than writing our own iterator, we can use the framework's existing data API to read data. We pass in features and labels as arguments and specify batch_size when constructing the data iterator object. In addition, the Boolean is_train indicates whether the data iterator should shuffle the data on each epoch (a full pass through the dataset).

In [3]:
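def load_array(data_arrays, batch_size, is_train=True):
    """Construct a TensorFlow data iterator."""
    # Slice the (features, labels) tuple along the first axis into examples
    dataset = tf.data.Dataset.from_tensor_slices(data_arrays)
    if is_train:
        # Shuffle before batching; this buffer covers all 1000 examples
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(batch_size)
    return dataset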

batch_size = 10
data_iter = load_array((features, labels), batch_size)
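For comparison, here is roughly what a hand-rolled iterator would have to do: shuffle the example indices (when training) and slice off consecutive minibatches. This is a hypothetical NumPy sketch (numpy_data_iter, toy_X, and toy_y are our own names), not how tf.data is implemented. Two tf.data details are worth noting: because buffer_size=1000 matches the number of examples, Dataset.shuffle draws from the entire dataset and, by default, reshuffles on every pass; Dataset.batch keeps a smaller final batch unless drop_remainder=True is passed.

```python
import numpy as np

def numpy_data_iter(features, labels, batch_size, is_train=True, seed=None):
    """Hypothetical hand-rolled iterator: shuffle indices, then yield minibatches."""
    indices = np.arange(len(features))
    if is_train:
        # A fresh permutation each time the generator is created (one epoch)
        np.random.default_rng(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield features[batch], labels[batch]

# Toy data: 25 examples, so the last minibatch is smaller
toy_X = np.arange(25, dtype=np.float32).reshape(-1, 1)
toy_y = 2 * toy_X
batches = list(numpy_data_iter(toy_X, toy_y, batch_size=10, seed=0))
print([len(Xb) for Xb, _ in batches])  # [10, 10, 5]
```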

Now we can use data_iter. To verify that it is working, we can read and print the first minibatch of examples.

In [4]:
next(iter(data_iter))

(<tf.Tensor: shape=(10, 2), dtype=float32, numpy=
 array([[-0.26172063,  0.38331038],
        [-1.1276855 , -1.6509379 ],
        [ 0.9434952 ,  0.90271693],
        [ 0.62092966,  1.8308698 ],
        [ 1.1200039 , -0.5776032 ],
        [ 0.9453936 , -0.16542035],
        [ 1.5894574 ,  0.67671317],
        [-0.48715246, -0.36461625],
        [-0.87483805,  0.87743634],
        [ 0.43445507, -1.0172988 ]], dtype=float32)>,
 <tf.Tensor: shape=(10, 1), dtype=float32, numpy=
 array([[ 2.3603122],
        [ 7.5584755],
        [ 3.0224087],
        [-0.781395 ],
        [ 8.392234 ],
        [ 6.6481028],
        [ 5.0897408],
        [ 4.480731 ],
        [-0.5316329],
        [ 8.524435 ]], dtype=float32)>)
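As a quick sanity check on the printed minibatch, we can plug the ten rows shown above into the true parameters (w = [2, -3.4], b = 4.2) and look at the residuals. Assuming the generator adds Gaussian noise with a standard deviation of about 0.01, the residuals should all be on that scale. The values are copied from the output above; a fresh run shuffles differently and prints different rows.

```python
import numpy as np

# First minibatch printed above (values copied from that output)
X_batch = np.array([
    [-0.26172063,  0.38331038], [-1.1276855 , -1.6509379 ],
    [ 0.9434952 ,  0.90271693], [ 0.62092966,  1.8308698 ],
    [ 1.1200039 , -0.5776032 ], [ 0.9453936 , -0.16542035],
    [ 1.5894574 ,  0.67671317], [-0.48715246, -0.36461625],
    [-0.87483805,  0.87743634], [ 0.43445507, -1.0172988 ]])
y_batch = np.array([2.3603122, 7.5584755, 3.0224087, -0.781395, 8.392234,
                    6.6481028, 5.0897408, 4.480731, -0.5316329, 8.524435])

# Residuals with respect to the true parameters w = [2, -3.4], b = 4.2
residuals = y_batch - (X_batch @ np.array([2.0, -3.4]) + 4.2)
print(np.round(residuals, 4))
```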

## 3.3.3. Defining the Model

For standard operations, we can use a framework’s predefined layers, which let us focus on which layers to use to construct the model rather than on how to implement them.

Recall the architecture of a single-layer network. The layer is said to be fully-connected because each of its inputs is connected to each of its outputs by means of a matrix-vector multiplication.

In Keras, the fully-connected layer is defined in the Dense class. Since we only want to generate a single scalar output, we set its number of output units (the first argument, units) to 1.

In [5]:
# `keras` is the high-level API for TensorFlow
net = tf.keras.Sequential()
net.add(tf.keras.layers.Dense(1))
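To see what Dense(1) computes, the sketch below applies a stand-alone layer to a random batch and recomputes the output by hand as XW + b, using the weights returned by get_weights(). The names layer and X_demo are ours; the check relies on Dense applying no activation by default.

```python
import numpy as np
import tensorflow as tf

# A stand-alone Dense layer with one output unit, like the one in net
layer = tf.keras.layers.Dense(1)
X_demo = tf.random.normal((4, 2))  # 4 examples, 2 features
out = layer(X_demo)                # first call creates kernel (2, 1) and bias (1,)

# Recompute the output by hand: X W + b
kernel, bias = layer.get_weights()
manual = X_demo.numpy() @ kernel + bias
print(kernel.shape, bias.shape)    # (2, 1) (1,)
np.testing.assert_allclose(np.asarray(out), manual, rtol=1e-5, atol=1e-6)
```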

## 3.3.4. Initializing Model Parameters

Before using net, we need to initialize the model parameters, such as the weights and bias in the linear regression model.

Here we specify that each weight parameter should be randomly sampled from a normal distribution with mean 0 and standard deviation 0.01. The bias parameter will be initialized to zero.

The easiest way to specify the initialization method in Keras is to pass kernel_initializer when creating the layer, so here we recreate net. The bias keeps Keras’s default initializer, which sets it to zero.

In [6]:
initializer = tf.initializers.RandomNormal(stddev=0.01)
net = tf.keras.Sequential()
net.add(tf.keras.layers.Dense(1, kernel_initializer=initializer))
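As a quick check of this configuration, the sketch below draws samples from a RandomNormal initializer with stddev=0.01 and builds a layer explicitly for two input features so that its weights exist. We use the tf.keras.initializers namespace and a fixed seed for reproducibility; init_demo and init_layer are illustrative names.

```python
import numpy as np
import tensorflow as tf

# Same distribution as above: N(0, 0.01^2); the seed is only for reproducibility
init_demo = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01, seed=0)
samples = np.asarray(init_demo(shape=(10000, 1)))
print(samples.mean(), samples.std())  # close to 0 and 0.01

# Building the layer for 2 input features creates its weights immediately
init_layer = tf.keras.layers.Dense(1, kernel_initializer=init_demo)
init_layer.build((None, 2))
kernel_demo, bias_demo = init_layer.get_weights()
print(kernel_demo.shape, bias_demo)  # bias starts at zero (Keras default)
```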

We are initializing parameters for a network even though Keras does not yet know how many dimensions the input will have! It might be 2 as in our example or it might be 2000. Keras lets us get away with this because, behind the scenes, initialization is deferred: the real initialization takes place only the first time we pass data through the network. Just remember that, since the parameters have not been initialized yet, we cannot access or manipulate them before then.

## 3.3.5. Defining the Loss Function

The MeanSquaredError class computes the mean squared error. By default it returns the average loss over examples.

In [7]:
loss = tf.keras.losses.MeanSquaredError()
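Concretely, MeanSquaredError averages the squared differences between labels and predictions, with no factor of 1/2. The sketch below checks this on three hand-picked values whose squared errors are 0.25, 0, and 1, so the mean is 1.25/3. It also computes a half-squared loss for contrast: its gradient is half as large, so the same learning rate takes smaller steps with it than with MeanSquaredError. Keras losses take their arguments in the order (y_true, y_pred).

```python
import numpy as np
import tensorflow as tf

y_true = np.array([[1.0], [2.0], [3.0]], dtype=np.float32)
y_pred = np.array([[1.5], [2.0], [2.0]], dtype=np.float32)

mse = tf.keras.losses.MeanSquaredError()
keras_mse = float(mse(y_true, y_pred))                       # order: (y_true, y_pred)
manual_mse = float(np.mean((y_true - y_pred) ** 2))          # (0.25 + 0 + 1) / 3
half_squared = float(np.mean(0.5 * (y_true - y_pred) ** 2))  # for contrast only
print(keras_mse, manual_mse, half_squared)
```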

## 3.3.6. Defining the Optimization Algorithm

Minibatch stochastic gradient descent is a standard tool for optimizing neural networks, so Keras supports it, along with a number of variations on the algorithm, in the optimizers module. For minibatch stochastic gradient descent we only need to set learning_rate, which is 0.03 here.

In [8]:
trainer = tf.keras.optimizers.SGD(learning_rate=0.03)
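With its default settings (no momentum), SGD replaces each parameter θ by θ - η·g, where η is learning_rate and g is the gradient of the minibatch loss. The NumPy sketch below writes this step out for linear regression with the mean squared error, whose gradients are (2/n)Xᵀ(Xw + b - y) for w and 2/n times the summed residuals for b. sgd_step is a hypothetical helper for illustration. With x = [1, 2], y = 3, and zero initial parameters, the gradients are [-6, -12] and -6, so one step with η = 0.03 gives w = [0.18, 0.36] and b = 0.18.

```python
import numpy as np

def sgd_step(w, b, X, y, lr):
    """Hypothetical helper: one plain SGD step on mean((Xw + b - y)^2)."""
    err = X @ w + b - y                 # residuals, shape (batch, 1)
    n = X.shape[0]
    grad_w = 2.0 / n * X.T @ err        # gradient w.r.t. w, shape (features, 1)
    grad_b = 2.0 / n * err.sum(axis=0)  # gradient w.r.t. b, shape (1,)
    return w - lr * grad_w, b - lr * grad_b

# One step from zero parameters on the single example x = [1, 2], y = 3
w_new, b_new = sgd_step(np.zeros((2, 1)), np.zeros(1),
                        np.array([[1.0, 2.0]]), np.array([[3.0]]), lr=0.03)
print(w_new.ravel(), b_new)  # [0.18 0.36] [0.18]
```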

## 3.3.7. Training

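Expressing our model through high-level APIs took comparatively few lines of code: we did not have to allocate parameters individually, define the loss function by hand, or implement minibatch stochastic gradient descent ourselves. Once we start working with much more complex models, the advantages of high-level APIs will grow considerably.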

To refresh your memory: for some number of epochs, we make a complete pass over the dataset (data_iter), iteratively grabbing one minibatch of inputs and the corresponding ground-truth labels. For each minibatch, we go through the following ritual:

* Generate predictions by calling net(X) and calculate the loss l (the forward propagation).

* Calculate gradients by running backpropagation.

* Update the model parameters by invoking our optimizer.

For good measure, we compute the loss on the full dataset after each epoch and print it to monitor progress.
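The backpropagation step in the training loop is handled by tf.GradientTape. As a sanity check, this sketch records the forward pass of a fresh Dense layer on random data, asks the tape for the gradients, and compares them with the analytic gradients of the mean squared error, (2/n)Xᵀ(Xw + b - y) and (2/n)Σ(Xw + b - y). The names demo_layer, demo_loss, X_demo, and y_demo are ours.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
X_demo = tf.random.normal((10, 2))
y_demo = tf.random.normal((10, 1))
demo_layer = tf.keras.layers.Dense(1)
demo_loss = tf.keras.losses.MeanSquaredError()

# Forward pass recorded by the tape (the layer is built on this first call)
with tf.GradientTape() as tape:
    l_demo = demo_loss(y_demo, demo_layer(X_demo))
# Backpropagation: gradients w.r.t. [kernel, bias]
grad_w, grad_b = tape.gradient(l_demo, demo_layer.trainable_variables)

# Analytic gradients of mean((Xw + b - y)^2)
W, b0 = demo_layer.get_weights()
err = X_demo.numpy() @ W + b0 - y_demo.numpy()
n = X_demo.shape[0]
np.testing.assert_allclose(grad_w.numpy(), 2 / n * X_demo.numpy().T @ err, atol=1e-5)
np.testing.assert_allclose(grad_b.numpy(), 2 / n * err.sum(axis=0), atol=1e-5)
print(grad_w.numpy().ravel(), grad_b.numpy())
```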

In [9]:
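num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        with tf.GradientTape() as tape:
            # Forward pass; Keras losses take (y_true, y_pred)
            l = loss(y, net(X, training=True))
        # Backpropagation: gradients of the loss w.r.t. weights and bias
        grads = tape.gradient(l, net.trainable_variables)
        # Minibatch SGD update
        trainer.apply_gradients(zip(grads, net.trainable_variables))
    # Loss on the full dataset after each epoch
    l = loss(labels, net(features))
    print(f'epoch {epoch + 1}, loss {l:f}')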

epoch 1, loss 0.000261
epoch 2, loss 0.000099
epoch 3, loss 0.000098
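Keras also offers an even more compact route that this section does not use: compile the model with a loss and an optimizer, then call fit, which runs a comparable training loop for us. The sketch below regenerates data of the same form with plain TensorFlow ops; w_star, b_star, X_fit, y_fit, fit_data, and fit_net are our own names, and the noise level 0.01 is an assumption. One difference: the loss that fit reports for each epoch is the average over that epoch's minibatches, not the full-dataset loss computed after the epoch as in the loop above.

```python
import tensorflow as tf

tf.random.set_seed(0)
w_star = tf.constant([[2.0], [-3.4]])
b_star = 4.2
X_fit = tf.random.normal((1000, 2))
# Labels from the linear model plus assumed Gaussian noise (std 0.01)
y_fit = tf.matmul(X_fit, w_star) + b_star + tf.random.normal((1000, 1), stddev=0.01)
fit_data = tf.data.Dataset.from_tensor_slices((X_fit, y_fit)).shuffle(1000).batch(10)

fit_net = tf.keras.Sequential([tf.keras.layers.Dense(
    1, kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01))])
# compile() attaches the loss and optimizer; fit() runs the training loop
fit_net.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.03),
                loss=tf.keras.losses.MeanSquaredError())
history = fit_net.fit(fit_data, epochs=3, verbose=0)
for epoch, value in enumerate(history.history['loss'], start=1):
    print(f'epoch {epoch}, loss {value:f}')
```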


Below, we compare the model parameters learned by training on finite data with the actual parameters that generated our dataset. net.get_weights() returns the layer’s weight (kernel) and bias as a list of NumPy arrays.

In [10]:
w = net.get_weights()[0]
print('error in estimating w', true_w - tf.reshape(w, true_w.shape))
b = net.get_weights()[1]
print('error in estimating b', true_b - b)

error in estimating w tf.Tensor([-0.00031352  0.00027657], shape=(2,), dtype=float32)
error in estimating b [9.8228455e-05]
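The estimation errors are not zero because the labels contain noise and we only have 1000 examples. As a point of reference, the sketch below solves the same kind of problem exactly with ordinary least squares (np.linalg.lstsq) on freshly generated data, assuming standard-normal features and noise with standard deviation 0.01. Even this exact minimizer is typically off by roughly the noise level divided by √1000, about 3e-4, the same order of magnitude as the errors printed above. w_true_np, b_true_np, X_ls, and y_ls are our own names.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true_np, b_true_np = np.array([2.0, -3.4]), 4.2
X_ls = rng.normal(size=(1000, 2))
y_ls = X_ls @ w_true_np + b_true_np + rng.normal(scale=0.01, size=1000)

# Append a column of ones so the bias is estimated together with the weights
A = np.hstack([X_ls, np.ones((1000, 1))])
theta, *_ = np.linalg.lstsq(A, y_ls, rcond=None)
print('error in estimating w', w_true_np - theta[:2])
print('error in estimating b', b_true_np - theta[2])
```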


## 3.3.8. Summary

* Using TensorFlow’s high-level APIs, we can implement models much more concisely.

* In TensorFlow, the data module provides tools for data processing, and the keras module defines a large number of neural network layers and common loss functions.

* TensorFlow’s initializers module provides various methods for initializing model parameters.

* Dimensionality and storage are automatically inferred (but be careful not to attempt to access parameters before they have been initialized).