<div style="text-align: right"><a href="http://ml-school.uni-koeln.de">Summer School "Deep Learning for
    Language Analysis"</a> <br/><strong>Text Analysis with Deep Learning</strong><br/>Sep 5 - 9, 2022<br/>Nils Reiter<br/><a href="mailto:nils.reiter@uni-koeln.de">nils.reiter@uni-koeln.de</a></div>

# Exercise 0: Getting Started

This is the first exercise for you to solve on your own, working in groups of about three students. Feel free to ask us for support.

## The Task

Each neural network is trained to solve one or more specific task(s). You can think of a task as a function that takes an input and generates an output. Ideally, the output depends on the input. The network in this exercise solves a very simple task: Assuming our input consists of sequences of ones and zeros, we want the network to also produce output consisting of ones and zeros -- but shifted two positions to the right, with zeros filling the first two positions. I.e., the input `[0,1,0,0]` results in `[0,0,0,1]` as output.
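
As a minimal sketch of the task in plain Python: shifting by two positions means putting two zeros in front and dropping the last two symbols. The function name `shift_right` and its parameters are our own choices for illustration and are not used elsewhere in this exercise.

```python
def shift_right(sequence, positions=2, fill=0):
    """Shift a list to the right, padding the front with `fill`."""
    # Never shift by more than the sequence length.
    positions = min(positions, len(sequence))
    # Keep everything except the last `positions` symbols.
    kept = list(sequence[:len(sequence) - positions])
    return [fill] * positions + kept


print(shift_right([0, 1, 0, 0]))  # [0, 0, 0, 1]
```

The output always has the same length as the input, which is what the network will have to learn as well.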

This is obviously a trivial task, and even with moderate Python skills, we can write a function that solves it. Nevertheless, we will use a neural network to solve this task -- and in fact, we will use such a function to generate training data. But first, let's import the relevant libraries (and verify that they can be imported).

In [1]:
import keras
import numpy as np
import pandas as pd
import tensorflow as tf

from tensorflow.keras import models, layers, optimizers

# limit GPU memory to 4 GB
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])
    except RuntimeError as e:
        print(e)


It is good practice to define the basic settings for any task in variables. This way, they can be easily changed, even if they are used in multiple places in our code.

In [2]:
# The number of sequences to generate
number_of_sequences = 40

# The number of symbols to distinguish
number_of_symbols = 2

# The length of each sequence
length_of_sequences = 15

In [3]:
# initialize the random number generator
rng = np.random.default_rng()

# create the sequences randomly
x_train = np.array([rng.integers(0,number_of_symbols,length_of_sequences) for i in range(number_of_sequences)])

# show them
x_train

array([[1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0],
       [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
       [0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0],
       [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
       [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0],
       [1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1],
       [0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0],
       [1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1],
       [1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1],
       [0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0],
       [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0],
       [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0],
       [1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
       [1, 0, 0, 0, 1, 0, 0,
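
If you want the same sequences on every run, the generator can be given a seed. The following is a sketch under that assumption; the seed value `42` and the variable name `x_demo` are arbitrary choices for illustration. It also passes a `size` tuple to `integers`, which produces the whole matrix in one call instead of one row at a time.

```python
import numpy as np

number_of_sequences = 40
number_of_symbols = 2
length_of_sequences = 15

# A fixed seed makes the "random" data reproducible (42 is arbitrary).
rng = np.random.default_rng(42)

# The upper bound is exclusive, so the values are 0 or 1.
x_demo = rng.integers(0, number_of_symbols,
                      size=(number_of_sequences, length_of_sequences))

print(x_demo.shape)
```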

### Technical background
(you may skip this if you're relatively new to Python)

This (`x_train`) is actually a single object -- a [numpy](https://numpy.org) array. Despite their name, numpy arrays are multidimensional data structures and are used extensively in deep learning. The numpy array we have generated is actually a two-dimensional array (i.e., a matrix). You can find out how many dimensions a numpy array has by asking its property [`ndim`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.ndim.html) (e.g., `x_train.ndim`). You can also find out how large each dimension is by asking the property [`shape`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html) (e.g., `x_train.shape`).
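
A small sketch of the two properties, using a stand-in array named `demo` (our own name) that has the same shape as `x_train` with the settings above:

```python
import numpy as np

# 40 sequences with 15 symbols each, filled with zeros for illustration.
demo = np.zeros((40, 15), dtype=int)

print(demo.ndim)   # 2
print(demo.shape)  # (40, 15)
```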

We have now created the input training data, which are often labeled `x`. What we also need is the output training data, i.e., the shifted sequences. Output sequences are usually labeled `y`, such that the neural network can be thought of as a function that goes from $x$ to $y$: $f(x) = y$.

This of course is the place where we already solve the task, using slicing and a list comprehension.

In [4]:
y_train = np.array([np.insert(x_seq[:length_of_sequences-2],0,[0,0]) for x_seq in x_train])

# shifting should be visible here:
print(x_train[1])
print(y_train[1])

[1 1 0 1 1 0 1 1 0 1 1 1 1 0 0]
[0 0 1 1 0 1 1 0 1 1 0 1 1 1 1]
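
The same targets can be built without a list comprehension. As a sketch (the names `x_demo` and `y_demo` are ours), one can allocate an array of zeros and copy all but the last two columns of the input into it, which shifts every row at once. The example row is the one printed above.

```python
import numpy as np

x_demo = np.array([[1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0]])

# Start with zeros, then fill columns 2.. with the input minus its last two columns.
y_demo = np.zeros_like(x_demo)
y_demo[:, 2:] = x_demo[:, :-2]

print(x_demo[0])
print(y_demo[0])
```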


Up to now, the input sequences consist of scalar integers. Neural networks expect the elements to be vectors -- to allow including multiple features (e.g., from an embedding). We will therefore [`reshape`](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) the n-dimensional arrays we have. This is also a function that numpy provides.

The result of this operation is that each sequence contains one-dimensional vectors instead of scalar values.

In [5]:
# use the setting defined above instead of a hard-coded 40
x_train = x_train.reshape(number_of_sequences, length_of_sequences, 1)
y_train = y_train.reshape(number_of_sequences, length_of_sequences, 1)
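
To see what the reshape does to a single sequence, here is a small sketch with two sequences of length three (the names `sequences` and `vectors` are ours):

```python
import numpy as np

sequences = np.array([[0, 1, 0],
                      [1, 1, 0]])          # shape (2, 3): each element is a scalar
vectors = sequences.reshape(2, 3, 1)       # shape (2, 3, 1): each element is a vector of length 1

print(sequences[0])  # [0 1 0]
print(vectors[0])    # the same values, each wrapped in its own array
```

The values themselves do not change; only the nesting does.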

## Model Building

The next block defines the neural network we will be using. Don't worry, you will understand what you are doing later this week.

Keras, the library that we are using, offers two APIs to specify a neural network: The [*Sequential API*](https://keras.io/guides/sequential_model/) and the [*Functional* API](https://keras.io/guides/functional_api/). We will be using the sequential API for the moment. Sequential here means that the "network" is actually only a single lane: The output of one layer goes as input into the next. The functional API allows more complex architectures and will be used later on.

The last line of the block generates an overview of the network architecture that we have assembled. This is very handy, because complex architectures are ... complex. The middle column in this table contains the output shape of each layer. The last layer (last row) thus generates sequences of length 15, consisting of one-dimensional arrays.

In [6]:
model = models.Sequential()
# define an input layer
model.add(layers.InputLayer(input_shape=(length_of_sequences,1)))

# define a hidden layer
model.add(layers.Bidirectional(layers.SimpleRNN(5,return_sequences=True)))

# define an output layer
model.add(layers.Dense(1))

model.compile(loss="binary_crossentropy", 
              optimizer="sgd",
             metrics=["accuracy"])

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bidirectional (Bidirectional (None, 15, 10)            70        
_________________________________________________________________
dense (Dense)                (None, 15, 1)             11        
Total params: 81
Trainable params: 81
Non-trainable params: 0
_________________________________________________________________
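
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the usual layout of these layers: a simple recurrent layer has one weight per input-unit pair, one weight per unit-unit pair, and one bias per unit; a dense layer has one weight per input-unit pair plus one bias per unit. The helper names are ours, not Keras functions.

```python
def simple_rnn_params(input_dim, units):
    """Input weights + recurrent weights + biases."""
    return input_dim * units + units * units + units


def dense_params(input_dim, units):
    """Weights + biases."""
    return input_dim * units + units


rnn_units = 5

# The bidirectional wrapper holds two RNNs (forward and backward).
bidirectional = 2 * simple_rnn_params(input_dim=1, units=rnn_units)

# Their outputs are concatenated, so the dense layer sees 2 * 5 = 10 values.
dense = dense_params(input_dim=2 * rnn_units, units=1)

print(bidirectional, dense, bidirectional + dense)  # 70 11 81
```

The concatenation also explains the output shape `(None, 15, 10)` of the bidirectional layer.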


## Training

The actual training is done using the [`fit()`](https://keras.io/api/models/model_training_apis/) function, provided by Keras. We will talk about each of the parameters later, but feel free to play around with them.

In any case, the function needs the input and output data to train on ($x$, `x_train` and $y$, `y_train`).

In [7]:
model.fit(x_train, y_train, epochs=150, batch_size=10, verbose=1)

Epoch 1/150
Epoch 2/150
...

<tensorflow.python.keras.callbacks.History at 0x13e959610>
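
To get a feeling for `epochs` and `batch_size`, it helps to count how often the weights are updated. This is a back-of-the-envelope sketch, assuming one update per batch and that a last, possibly smaller, batch is also used:

```python
import math

number_of_sequences = 40
batch_size = 10
epochs = 150

# One epoch is one pass over all training sequences.
updates_per_epoch = math.ceil(number_of_sequences / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)  # 4 600
```

A smaller batch size means more updates per epoch, each based on fewer sequences.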

## Application and Testing

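Finally, we can use the trained model to predict a new sequence, using the `predict()` function. It expects data that is structurally similar to the training data, i.e., an array with the same number of dimensions and the same sequence length. The loop at the end of the cell prints four columns for each position: the index, the input symbol, the expected output, and the value predicted by the model.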

In [17]:
x_test = np.array([[1,0,0,1,0,0,0,1,0,0,0,0,0,0,0]]).reshape(1,length_of_sequences,1)
y_test = np.array([np.insert(x_seq[:length_of_sequences-2],0,[0,0]) for x_seq in x_test])
y_prediction = model.predict(x_test)
y_prediction = y_prediction.reshape(length_of_sequences)

for i in range(length_of_sequences):
    print(f"{i:02d}", end=" ")
    print(x_test[0][i][0], end=" ")
    print(y_test[0][i], end=" ")
    print(y_prediction[i])



00 1 0 -0.05330883
01 0 0 -0.25208867
02 0 1 1.5137744
03 1 0 -0.43100852
04 0 0 -0.66855615
05 0 1 1.4111977
06 0 0 -0.046335146
07 1 0 -0.62494653
08 0 0 -0.60186344
09 0 1 1.3038105
10 0 0 -0.53792495
11 0 0 -1.0049039
12 0 0 -0.67612857
13 0 0 -0.48061174
14 0 0 -0.65284264
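
The predicted values are real numbers, not zeros and ones. One simple way to turn them into symbols is to apply a threshold. The sketch below uses the values printed above; the threshold of `0.5` is our assumption for illustration, not something the exercise prescribes.

```python
# Predicted values and expected outputs, copied from the table above.
predictions = [-0.05330883, -0.25208867, 1.5137744, -0.43100852, -0.66855615,
               1.4111977, -0.046335146, -0.62494653, -0.60186344, 1.3038105,
               -0.53792495, -1.0049039, -0.67612857, -0.48061174, -0.65284264]
expected = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

threshold = 0.5  # assumed cut-off between the two symbols
predicted_symbols = [1 if value > threshold else 0 for value in predictions]

# Count the positions where prediction and expectation agree.
correct = sum(p == e for p, e in zip(predicted_symbols, expected))

print(predicted_symbols)
print(f"{correct}/{len(expected)} positions correct")
```

With this threshold, the predicted symbols match the expected output at every position of this test sequence.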
