# Predicting the quality of wines
Before we move on to implementing RL algorithms with neural networks I want to give you a brief overview of how we will be using Tensorflow. Tensorflow is a machine learning library. It lets us easily implement neural networks and perfrom operations such as stochastic gradient descent. 

I chose to use Tensorflow for my tutorials but another very popular alternative is PyTorch.

In [6]:
%%capture
!pip install tensorflow
!pip install tensorflow_datasets

## Tensors
In Tensorflow we perform mathematical operations on Tensors. Tensors are simply like higher dimensional arrays. We can for example make a 1-D tensor by wrapping a list in a Tensor. A 2-D tensor would be a matrix. We can then make 3-D, 4-D etc. tensors.

Here is an example of how we construct tensors using Tensorflow. It is very easy to convert a numpy array into a tensor.

In [1]:
import numpy as np
import tensorflow as tf

my_list_of_lists = [[1,2,3], [4,5,6], [7,8,9]]
my_np_array_of_arrays = np.array(my_list_of_lists)
my_2d_tensor = tf.convert_to_tensor(my_np_array_of_arrays)

print("Shape of Tensor:", my_2d_tensor.shape)
print("Tensor:", my_2d_tensor)

2021-08-15 19:37:58.547790: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-08-15 19:37:58.547867: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


Shape of Tensor: (3, 3)
Tensor: tf.Tensor(
[[1 2 3]
 [4 5 6]
 [7 8 9]], shape=(3, 3), dtype=int64)


2021-08-15 19:38:01.188533: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-08-15 19:38:01.188563: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-08-15 19:38:01.188582: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (pop-os): /proc/driver/nvidia/version does not exist
2021-08-15 19:38:01.189631: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


Next we will make a 3-D tensor of shape `(100,8,8)`. This is a common way that batches of images are stored as a tensor. The first dimension is known as the batch dimension, in this case `100`. That means we have a batch of 100 images. The second and third dimensions are like the x and y coordinates of the pixels in the image.

Below we make a tensor filled with zeros of shape `(100,8,8)` e.g. a batch of 100 images that have dimension `8x8`.

In [3]:
my_3d_tensor = tf.zeros(shape=(100,8,8))

my_first_image = my_3d_tensor[0]
print("First image shape:", my_first_image.shape)

first_image_first_row = my_3d_tensor[0,0]
print("First row shape:", first_image_first_row.shape)
print("Row value:", first_image_first_row)

first_pixel = my_3d_tensor[0,0,0]
print("First pixel shape:", first_pixel.shape)
print("Pixel value:", first_pixel)

First image shape: (8, 8)
First row shape: (8,)
Row value: tf.Tensor([0. 0. 0. 0. 0. 0. 0. 0.], shape=(8,), dtype=float32)
First pixel shape: ()
Pixel value: tf.Tensor(0.0, shape=(), dtype=float32)


Tensors are like numpy arrays but they have several other atributes that make them apropriate for machine learning. First of all, Tensorflow Tensors can be used on a GPU instead of just your computers CPU. Performing computations on a GPU can dramatically speed up your programs. If your PC does not have a GPU then you can use one for free on a Google Colab instance. A quick google search should help you figure out how to use a GPU on Google Colab.

The second reason we use Tensorflow tensors is because they can keep track of the computations that were performed on them. This makes it possible to compute the gradients of mathematical operations performed on the tensors. This is especially useful for training neural networks because it is what makes it possible to perform stochastic gradient decent on the parameters of the neural network.

In summary, whenever we want to pass values to our neural network we need to convert them into a tensor first.

## Back to wine
We are going to train a neural network to predict the rating of wines given a vector of information about the wines. The dataset we are going to use is available from Tensorflow itself. Below we will load the dataset. Visit the webpage to get information on what each value in the wines feature vector represents. [Webpage](https://www.tensorflow.org/datasets/catalog/wine_quality)

In [38]:
## This cell is not important.
# I just wrote a function to cenvert the data from the dataset into
# into something more usable. You don't really need to know how it works
# but reading through it may behelpful because I demonstrate how to
# reshape, cast and concatenate tensors.

def preprocess_batch(batch):
    feature_dict = batch[0]
    labels = batch[1]

    # Next we convert the dict of features
    # into a list of tensors with the right 
    # dimensions.
    list_of_tensors = []
    for tensor in feature_dict.values():
        # Reshape tensor (64,) -> (64, 1)
        tensor = tf.reshape(tensor, shape=(-1, 1))
        # Cast all tensors to dtype=float32
        tensor = tf.cast(tensor, dtype='float32')
        # Append tensor to the list
        list_of_tensors.append(tensor)

    # Concatenate list of tensors into one big tensor.
    feature_tensor = tf.concat(list_of_tensors, axis=1)

    # One hot encode labels
    # i.e. the integer 2 becomes the vector [0,0,1,0,0,0,0,0,0,0]
    on_hot_labels = tf.one_hot(indices=labels, depth=10)

    # Return features and labels
    return feature_tensor, on_hot_labels



In [39]:
import tensorflow_datasets as tfds

# Download the dataset
wine_dataset = tfds.load('wine_quality', split='train', shuffle_files=True, as_supervised=True)

# Break the dataset up into batches of 64
wine_dataset = wine_dataset.batch(64)

# Loop through the dataset and
# count  how many batches there are.
cnt = 0
for batch in wine_dataset:
    # Lets print the shape of the first batch
    # to see what the data looks like.
    if cnt == 0:
        # Preprocess
        features, labels = preprocess_batch(batch)

        # Lets print the features shape.
        print("Feature shape:", features.shape)

        # Lets print one feature vector from the 
        # batch and its rating.
        print("Wine feature vector:", features[0])
        print("Wine rating:", labels[0])

    # increment counter
    cnt += 1

print("Number of batches in the dataset:", cnt)



Feature shape: (64, 11)
Wine feature vector: tf.Tensor(
[1.220e+01 6.300e-02 4.900e-01 9.911e-01 6.300e+00 3.500e+01 3.380e+00
 1.200e+00 4.200e-01 9.200e+01 2.700e-01], shape=(11,), dtype=float32)
Wine rating: tf.Tensor([0. 0. 0. 0. 0. 0. 1. 0. 0. 0.], shape=(10,), dtype=float32)
Number of batches in the dataset: 77


So, you should see from the above that our batch dimension is `64`. Our wine feature vector has length 11 and wines are rated on a scale from 1 to 10.

## Neural Networks
Now lets create a simple neural network which accepts our feature vector as input and outputs a final layer with 10 nodes. We will then say that the node with the highest value on the final layer is the networks predicted rating for the wine, i.e. if the final layer looks like `[100, 2, 3, -40, 44, 22, 1, 6, 7, 9]` then because 100 is the largest number and is in the first position, the network predicts that the wine has a rating of `1`. 

Defining a feedforward neural network in Tensorflow is simple.

In [36]:
hidden_layer = tf.keras.layers.Dense(units=100)
activation = tf.keras.layers.ReLU()
output_layer = tf.keras.layers.Dense(units=10)

list_of_layers = [hidden_layer, activation, output_layer]

network = tf.keras.Sequential(layers=list_of_layers)

# We can then send dummy data through the network 
# to initialize it. Since our feature vectors have 
# dimension 11 we can create the dummy input as follows.
# We include a dummy batch dimension of 64.
dummy_input = tf.ones(shape=(1,11), dtype='float32')

dummy_output = network(dummy_input)
print("Dummy output:", dummy_output)

print(network.summary())

Dummy output: tf.Tensor(
[[ 0.19892643 -0.36783198 -0.24432595  0.04384113  0.02561705 -0.58986026
   0.31614754 -0.6132946   0.27988076 -0.04619189]], shape=(1, 10), dtype=float32)
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_8 (Dense)              (1, 100)                  1200      
_________________________________________________________________
re_lu_4 (ReLU)               (1, 100)                  0         
_________________________________________________________________
dense_9 (Dense)              (1, 10)                   1010      
Total params: 2,210
Trainable params: 2,210
Non-trainable params: 0
_________________________________________________________________
None


Now all we need to train our model is to define a loss function to measure how good the networks predictions are. Since this is a multi-class classification problem (with 10 classes) we can use the Categorical Crossentropy loss.

In [46]:
loss_function = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# Let the prediction be given by the neural network
y_pred = dummy_output
# Suppose the true label is 7
# Dont worry about this confusing line. 
# It just computes the one-hot encoding of 7 with a batch dimension.
y_true = tf.reshape(tf.one_hot(tf.convert_to_tensor(7), depth=10), shape=(1, -1))
# We compute loss like this.
loss = loss_function(y_true, y_pred)
print("Loss:", loss)


Loss: tf.Tensor(2.8658438, shape=(), dtype=float32)


Now that we have a neural network to make predictions and a loss function to measure how good thos predictions are, all we need now is an optimizer that can perform gradient decent on the loss function and the parameters of the neural network to train our model. Lets quickly look at how we compute gradients in Tensordlow.

In [53]:
# All computations performed inside the scope 
# of this `with` statement will record their gradients 
# on the `tape`.
with tf.GradientTape() as tape:
    # Input.
    dummy_input = tf.ones(shape=(1,11), dtype='float32')

    # Pass through the network.
    dummy_output = network(dummy_input)

    # Dummy label 8.
    dummy_label = tf.reshape(tf.one_hot(tf.convert_to_tensor(8), depth=10), shape=(1, -1))

    # Compute the loss.
    loss = loss_function(dummy_label, dummy_output)

# Now that we are done with computing the loss 
# we can collect the trainable parameters in the neural network.
trainable_variables = network.trainable_variables

# We then retrieve the gradients from the `tape` by passing in the 
# loss and the variables with respect to which we want to compute 
# gradients with.
gradients = tape.gradient(loss, trainable_variables)

# We then create an optimizer.
# In this case the Adam optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

# We can then apply the gradients to the trainable parameters like this.
gradients_and_variables = zip(gradients, trainable_variables)
optimizer.apply_gradients(gradients_and_variables)

print("Loss:", loss)


    


Loss: tf.Tensor(1.896747, shape=(), dtype=float32)


The above computation was obviously nonesense but it should hopefully give you an idea of how we compute gradients and apply them to the neural network with an optimizer. Now that we have all of these moving parts, lets create a loop which takes a batch of feature vectors and labels, makes a prediction, computes the loss and applies a gradient update.

In [55]:
def learn_on_batch(network, loss_function, optimizer, features, labels):
    with tf.GradientTape() as tape:
        prediction = network(features)

        loss = loss_function(prediction, labels)

        # Average the loss over the batch
        loss = tf.reduce_mean(loss)

    # Get gradients.
    variables = network.trainable_variables
    gradients = tape.gradient(loss, variables)

    # Apply gradients
    optimizer.apply_gradients(zip(gradients, variables))

    return loss



Lets iterate over the dataset and perform updates to the networks parameters using the above function.

In [56]:
for batch in wine_dataset:
    # Preprocess
    features, labels = preprocess_batch(batch)

    loss = learn_on_batch(network, loss_function, optimizer, features, labels)

    print("Batch loss:", loss)

print("Done.")

Batch loss: tf.Tensor(-77.6736, shape=(), dtype=float32)
Batch loss: tf.Tensor(-91.37077, shape=(), dtype=float32)
Batch loss: tf.Tensor(-120.694145, shape=(), dtype=float32)
Batch loss: tf.Tensor(-149.69974, shape=(), dtype=float32)
Batch loss: tf.Tensor(-177.73682, shape=(), dtype=float32)
Batch loss: tf.Tensor(-207.77567, shape=(), dtype=float32)
Batch loss: tf.Tensor(-218.18834, shape=(), dtype=float32)
Batch loss: tf.Tensor(-242.65567, shape=(), dtype=float32)
Batch loss: tf.Tensor(-281.09348, shape=(), dtype=float32)
Batch loss: tf.Tensor(-315.66162, shape=(), dtype=float32)
Batch loss: tf.Tensor(-325.54547, shape=(), dtype=float32)
Batch loss: tf.Tensor(-359.47186, shape=(), dtype=float32)
Batch loss: tf.Tensor(-413.8739, shape=(), dtype=float32)
Batch loss: tf.Tensor(-431.97437, shape=(), dtype=float32)
Batch loss: tf.Tensor(-453.40784, shape=(), dtype=float32)
Batch loss: tf.Tensor(-476.06952, shape=(), dtype=float32)
Batch loss: tf.Tensor(-525.8874, shape=(), dtype=float32)
B

## Tensorboard
Another useful feature I want to share with you is Tensorboard. Tensorboard is a tool which you can use to log information to during training so that you can monitor things. Below is a class I define to log information to tensorboard.

In [57]:
import datetime

class Logger():

    def __init__(self, logdir="./logs/"):
        current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        logdir = logdir + current_time

        self.summary_writer = tf.summary.create_file_writer(logdir)

    def write(self, step, logs):
        """Write logs to tensorboard.

        Args:
            step (Int): Training step of the logs.
            logs (Dict[str, float]): Dictionary of logs to be written to tensorboard.
        """
        with self.summary_writer.as_default():
            for key, value in logs.items():
                tf.summary.scalar(key, value, step=step)


## Model training
Lets pull together everything we have done so far. We will train a network on the wine dataset for 100 epochs. We will log the losses to tensorboard and monitor the process of training on tensorboard. We run the cell below to start tensorboard. If tensorboard does not apear, run the cell again.

In [59]:
%load_ext tensorboard
%tensorboard --logdir logs/

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


Reusing TensorBoard on port 6006 (pid 46409), started 0:00:20 ago. (Use '!kill 46409' to kill it.)

In [61]:
logger = Logger()
num_epochs = 100

for e in range(num_epochs):
    epoch_losses = []
    for batch in wine_dataset:
        # Preprocess
        features, labels = preprocess_batch(batch)

        loss = learn_on_batch(network, loss_function, optimizer, features, labels)

        epoch_losses.append(loss.numpy())
    
    epoch_avg_loss = np.mean(epoch_losses)

    logger.write(e, {"loss": epoch_avg_loss})

        

Once you have run the cell above, go refresh tensorboard and monitor training. You will need to keep refreshin tensorboard to get the latest logs. You should hopefully see that the loss is going down, that suggests that our model is starting to get better at predicting the wine ratings. To verify that our model is really getting better we would need to evaluate it on a test dataset. But this tutorial is not really a lesson on supervised learning so I am gonna leave it here. Lets move on to RL now.