
#1.2 Generate a Tensorflow Dataset
You can of course now go on and compute a big bunch of examples and store
them in a huge array or dataframe or whatever you like. However, that would
be rather inefficient - as we work with random noise input, we can compute the
data on the fly.
Thus, we would like you to work with generators and the tensorflow
function tf.data.Dataset.from_generator().
There are a few simple steps to take:

• Write an integration_task(seq_len, num_samples) generator function: This function should, num_samples times, yield a random noise
signal of size seq_len (the sequence length, i.e. the number of time steps
we are considering) and a target, namely whether the sum of the noise signal is
greater or smaller than 1.
You make your life easier if you do not try to handle empty dimensions!

• Write a wrapper generator my_integration_task(): As it is easier to
pass a generator to the tensorflow tf.data.Dataset.from_generator()
method that does not take any arguments, write a wrapper function
which internally iterates over integration_task with a specified seq_len and
num_samples and yields the function's yield (yes, sounds stupid and complicated, but it's really easier that way).
Choose num_samples to be rather high, as you do not want to overfit, and
you may want to start with a small sequence length for easier debugging.
In the end, you should however be able to deal with a sequence length of
25 at least.

• Create the Tensorflow Dataset: Pass on your wrapper generator to
tf.data.Dataset.from_generator(..., output_signature=tf.TensorSpec(shape,
dtype)).
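
Before handing a generator to TensorFlow, it can help to pull a few elements from it directly and check the shapes it yields. The following is a NumPy-only sketch of the two generator steps above, with deliberately tiny sizes. The `threshold` parameter is an illustrative assumption (its default follows the task text), and the fixed settings inside the wrapper are arbitrary debugging values.

```python
import numpy as np

def integration_task(seq_len, num_samples, threshold=1.0):
    """Yield (noise_signal, target) pairs.

    noise_signal has shape (seq_len, 1), target has shape (1,).
    `threshold` is an illustrative parameter, not part of the task statement.
    """
    for _ in range(num_samples):
        noise_signal = np.random.normal(size=(seq_len, 1)).astype(np.float32)
        # 1.0 if the summed signal exceeds the threshold, else 0.0
        target = np.array([float(noise_signal.sum() > threshold)], dtype=np.float32)
        yield noise_signal, target

def my_integration_task():
    """Argument-free wrapper with fixed, debugging-sized settings."""
    for noise_signal, target in integration_task(seq_len=5, num_samples=4):
        yield noise_signal, target

samples = list(my_integration_task())
for noise_signal, target in samples:
    print(noise_signal.shape, target.shape, target)
```

Because neither array has an empty dimension, the same shapes can be written down directly as the `shape` arguments of the two `tf.TensorSpec` entries.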



In [1]:
import scipy.integrate as integrate
import random
import numpy as np
import tensorflow as tf


In [2]:
# Dataset Setup

'''
def integration_task(seq_len, num_samples): 

  for sample in range(num_samples):
    f = lambda x: np.random.normal(loc=0, scale=2)
    i = integrate.quad(f,0,seq_len)
    print(i[0])

integration_task(5, 1)

'''
seq_len = 3
num_samples = 80000

def integration_task(seq_len, num_samples):
  """Yield num_samples pairs of (noise signal, binary target)."""
  for _ in range(num_samples):
    # draw the whole noise signal at once, shape (seq_len, 1), no empty dimensions
    noise_signal = np.random.normal(loc=0, scale=2, size=(seq_len, 1))
    # target is 1 if the summed signal is positive, else 0, shape (1,)
    target = np.array([int(np.sum(noise_signal) > 0)])
    yield noise_signal, target

def wrapper_generator():
  # argument-free wrapper so the generator can be passed to from_generator
  for noise_signal, target in integration_task(seq_len, num_samples):
    yield noise_signal, target

dataset = tf.data.Dataset.from_generator(
    wrapper_generator,
    output_signature=(tf.TensorSpec(shape=(seq_len, 1), dtype=tf.float32),
                      tf.TensorSpec(shape=(1,), dtype=tf.float32)))

#list(dataset.take(1))

##1.3 Create a Data pipeline
Now you should have a Tensorflow Dataset object and thus should be able to
apply all the necessary preprocessing steps to create an efficient pipeline.
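
To see what batching does to the element shapes without building the full pipeline, the following NumPy-only sketch groups generator elements by hand. Both `toy_generator` and `batch_elements` are hypothetical helpers that only mimic the shape effect of batching; they do no shuffling or prefetching.

```python
import itertools
import numpy as np

def toy_generator(seq_len, num_samples):
    """Yield (signal, target) pairs of shape (seq_len, 1) and (1,)."""
    for _ in range(num_samples):
        signal = np.random.normal(size=(seq_len, 1))
        yield signal, np.array([float(signal.sum() > 0)])

def batch_elements(generator, batch_size):
    """Stack consecutive elements into batches (the last batch may be smaller)."""
    iterator = iter(generator)
    while True:
        chunk = list(itertools.islice(iterator, batch_size))
        if not chunk:
            return
        signals, targets = zip(*chunk)
        yield np.stack(signals), np.stack(targets)

batches = list(batch_elements(toy_generator(seq_len=3, num_samples=10), batch_size=4))
for signals, targets in batches:
    print(signals.shape, targets.shape)
```

Ten samples with a batch size of four give two full batches and one batch of two. A smaller final batch like this is one reason to derive the batch size from the input instead of hard-coding it in the network.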

In [3]:
train_size = int(0.7 * num_samples)
valid_size = int(0.15 * num_samples)

train_ds = dataset.take(train_size)
remaining = dataset.skip(train_size)  
valid_ds = remaining.take(valid_size)
test_ds = remaining.skip(valid_size)

#assert dataset.shape == (80000, 3, 1)

def preprocessing(ds):
  """ apply a preprocessing pipeline to the given dataset
  :param ds: tf.data.Dataset to be preprocessed
  :return: preprocessed dataset
  """
  # shuffle, batch, prefetch
  ds = ds.shuffle(1000)
  ds = ds.batch(32)
  ds = ds.prefetch(20)
  # return preprocessed dataset
  return ds

train_dataset = train_ds.apply(preprocessing)
valid_dataset = valid_ds.apply(preprocessing)
test_dataset = test_ds.apply(preprocessing)

dataset = dataset.apply(preprocessing)

print(dataset)

<PrefetchDataset shapes: ((None, 3, 1), (None, 1)), types: (tf.float32, tf.float32)>


#2 The network
While there are great implementations in tensorflow/keras for LSTMs, this week
you are asked to build your own! So you are restricted from using any inbuilt tensorflow/keras LSTM (or RNN!) layers. You can however use Dense layers for
the matrix multiplications.
To do so, there are several steps you need to take:

1. Implement an LSTM Cell which is called by providing the input for a
single time step and a tuple containing (hidden state, cell state).

• __init__(self, units): Remember again what gates an LSTM consists of and how to parametrize them. Pay special attention to
weight initialization: According to Jozefowicz et al., setting the
bias of the forget gate to one initially is very important for performance in training LSTMs. Think about it for a moment - what
effect would that have on the initial output values of this gate, and
what effect will that have on the recurrent cell state? This is actually implemented in the default LSTM cell in Tensorflow - check the
unit_forget_bias argument there for more information!

• call(self, x, states): To implement the call function of the
LSTM cell, think about the different pathways and how the input
and the different states are combined.

2. Implement an LSTM layer, which is created from one (or multiple for
a multi-layer LSTM) LSTM Cell. This LSTM layer should operate on
inputs with multiple time steps.

• __init__(self, cell): It is easier to start implementing single cell
layers but you may think about ways to implement multi-layer LSTMs.

• call(self, x, states): The call function takes the input over multiple time steps and creates (and returns!) the outputs over multiple
time steps. The input is expected to be of shape [batch_size, seq_len,
input_size], the respective output of shape [batch_size, seq_len, output_size]. To achieve this you will have to "unroll" the LSTM.
To learn more about "unrolling" if you are unsure what it means and
want to know how to implement it efficiently, go to the subsection
about Unrolling.

• zero_states(self, batch_size): Define a function that, given a
batch size, resets the states of the LSTM, thus returns a tuple of
states of the appropriate size filled with zeros.

3. Implement the final Model: This model should be a wrapper around
your LSTM implementation.

• __init__(): You may want to use one or multiple read-in layers and
an output layer that takes the output of your LSTM and transforms
it into your final prediction.

• call(self, x): Here the input is over the whole sequence length.
You should pass it through your read-in layers and then to your
LSTM implementation and finally to your read-out layer.
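
The data flow described in the steps above can be sketched without TensorFlow. The following NumPy-only example is an illustration under assumptions, not the required implementation: the class name `NumpyLSTMCell`, the function `unroll`, the weight scale of 0.1 and the toy sizes are all made up for this sketch. It shows one LSTM step on the concatenation of input and hidden state, and the unrolling over the time axis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class NumpyLSTMCell:
    """Single LSTM step; weights act on the concatenation [x, hidden]."""

    def __init__(self, input_size, units, rng):
        shape = (input_size + units, units)
        self.units = units
        # one weight matrix and bias per gate: forget, input, candidate, output
        self.W = {name: rng.normal(scale=0.1, size=shape) for name in "fico"}
        self.b = {name: np.zeros(units) for name in "fico"}
        self.b["f"] = np.ones(units)  # forget-gate bias initialised to one

    def step(self, x, states):
        hidden, cell = states
        z = np.concatenate([x, hidden], axis=1)          # (batch, input_size + units)
        f = sigmoid(z @ self.W["f"] + self.b["f"])       # forget gate
        i = sigmoid(z @ self.W["i"] + self.b["i"])       # input gate
        c_hat = np.tanh(z @ self.W["c"] + self.b["c"])   # candidate cell state
        o = sigmoid(z @ self.W["o"] + self.b["o"])       # output gate
        new_cell = f * cell + i * c_hat
        new_hidden = o * np.tanh(new_cell)
        return new_hidden, new_cell

def unroll(cell, x):
    """x: (batch, seq_len, input_size) -> outputs: (batch, seq_len, units)."""
    batch_size, seq_len, _ = x.shape
    # zero states sized from the actual batch
    states = (np.zeros((batch_size, cell.units)), np.zeros((batch_size, cell.units)))
    outputs = []
    for t in range(seq_len):
        states = cell.step(x[:, t, :], states)
        outputs.append(states[0])  # collect the hidden state of every time step
    return np.stack(outputs, axis=1)

rng = np.random.default_rng(0)
cell = NumpyLSTMCell(input_size=1, units=4, rng=rng)
outputs = unroll(cell, rng.normal(size=(2, 5, 1)))
print(outputs.shape)  # (2, 5, 4)
print(sigmoid(1.0), sigmoid(0.0))
```

The last line hints at the effect of the bias initialization: with small weights, a forget-gate bias of one puts the gate close to sigmoid(1) ≈ 0.73 instead of sigmoid(0) = 0.5, so more of the previous cell state is carried over at the start of training.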

In [11]:
sig_function = tf.keras.activations.sigmoid
tan_function = tf.keras.activations.tanh

class LSTM_Cell(tf.keras.Model):
  def __init__(self, units):
    super(LSTM_Cell, self).__init__()
    self.units = units
    # forget gate bias starts at one so the old cell state is mostly kept early in training
    self.forget_gate = tf.keras.layers.Dense(units, activation=sig_function, bias_initializer='ones')
    self.input_gate = tf.keras.layers.Dense(units, activation=sig_function)
    self.cell_state_candidates = tf.keras.layers.Dense(units, activation=tan_function)
    self.output_gate = tf.keras.layers.Dense(units, activation=sig_function)

  def call(self, x, states):
    # states is a tuple containing (hidden_state, cell_state)
    hidden_state, cell_state = states
    # x has shape (batch_size, input_size), axis 1 is the feature axis
    concat_inputs = tf.concat((x, hidden_state), axis=1)

    # applying the forget filter to the old cell state Ct-1 via point wise multiplication
    ft = self.forget_gate(concat_inputs)
    cell_state_update = ft * cell_state

    # do the same with our input filter and the candidate cell state C^t selecting new candidates
    it = self.input_gate(concat_inputs)
    ct = self.cell_state_candidates(concat_inputs)
    new_candidate = it * ct

    # we now combine that to form the new cell state Ct
    new_ct = tf.add(cell_state_update, new_candidate)

    # determining the hidden state/output
    output = self.output_gate(concat_inputs)
    new_hidden = output * tf.math.tanh(new_ct)

    # same order as the incoming states tuple
    return new_hidden, new_ct


class LSTM_Layer(tf.keras.Model):
  def __init__(self, units=16):
    super(LSTM_Layer, self).__init__()
    self.cell = LSTM_Cell(units)

  def call(self, x):
    # x has shape (batch_size, seq_len, input_size)
    states = self.zero_states(tf.shape(x)[0])
    output_list = []
    for index in range(x.shape[1]):
      hidden, cell = self.cell(x[:, index, :], states)
      output_list.append(hidden)
      states = (hidden, cell)
    # stack along the time axis: (batch_size, seq_len, units)
    return tf.stack(output_list, axis=1)

  def zero_states(self, batch_size):
    # (hidden_state, cell_state), each of shape (batch_size, units)
    return (tf.zeros([batch_size, self.cell.units]), tf.zeros([batch_size, self.cell.units]))


class Final_Model(tf.keras.Model):
  def __init__(self):
    super(Final_Model, self).__init__()
    self.input_layer = tf.keras.layers.Dense(units=16, activation='sigmoid')
    self.lstm_layer = LSTM_Layer(units=16)
    self.output_layer = tf.keras.layers.Dense(units=1, activation='sigmoid')

  def call(self, x):
    x = self.input_layer(x)
    x = self.lstm_layer(x)
    # one sigmoid output per time step: (batch_size, seq_len, 1)
    x = self.output_layer(x)
    return x

  
'''    
# idea of where this is going

model = LSTM_Layer(units=16)

model(inputs, states)
'''



#3 Training
Training mostly stays the same as for any other binary classification task you have
tackled so far. However, there is a key difference: You get a prediction on the
level of several time steps, but for your loss and your accuracy, we only want to
consider the prediction of the very last time step.
Think of this task as a proof of concept task - getting some reasonable
accuracy is really enough, no need to try and push it this week. So don’t feel
pressured to use any sort of more advanced optimization.
Be aware that training might take longer, especially if you decide not to
make use of graph mode. It is enough if you can see a significant improvement
in loss and accuracy after the second epoch of training, and it is very reasonable
to get to at least 80% accuracy.
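
To make the "very last time step" idea concrete, here is a small NumPy-only sketch with hand-picked numbers. The helper names `last_step`, `binary_cross_entropy` and `binary_accuracy` are hypothetical, and the 0.5 decision threshold is an assumption for a single sigmoid output.

```python
import numpy as np

def last_step(predictions):
    """(batch, seq_len, 1) -> (batch, 1): keep only the final time step."""
    return predictions[:, -1, :]

def binary_cross_entropy(targets, probabilities, eps=1e-7):
    # clip to avoid log(0)
    p = np.clip(probabilities, eps, 1.0 - eps)
    return float(np.mean(-(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))))

def binary_accuracy(targets, probabilities):
    # a prediction counts as class 1 if its probability exceeds 0.5
    return float(np.mean((probabilities > 0.5) == (targets > 0.5)))

# two sequences, three time steps each, one sigmoid output per step
predictions = np.array([[[0.2], [0.4], [0.9]],
                        [[0.6], [0.5], [0.3]]])
targets = np.array([[1.0], [1.0]])

final = last_step(predictions)
accuracy = binary_accuracy(targets, final)
loss = binary_cross_entropy(targets, final)
print(final.shape, accuracy, loss)
```

Only the values 0.9 and 0.3 enter the loss and the accuracy here; the earlier time steps are ignored, so one of the two sequences counts as correct.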


In [5]:
def train_step(model, input, target, loss_function, optimizer):
  """Applies optimizer to all trainable variables of this model to
  minimize the loss (loss_function) between the target output and the
  predicted output.
  :param input: tf.Tensor input to the model
  :param target: target output with respect to the input
  :return: the loss of the model's prediction
   """
  with tf.GradientTape() as tape:
    # only the prediction of the very last time step enters the loss
    prediction = model(input)[:, -1, :]
    loss = loss_function(target, prediction)
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
  return loss

def test(model, test_data, loss_function):
  """Calculate the mean loss and accuracy of the model over all elements
  of test_data.
  :param test_data: model is evaluated for test_data
  :param loss_function: chosen cost function
  :return: mean loss and mean accuracy for all datapoints
  """

  # test over complete test data
  test_accuracy_aggregator = []
  test_loss_aggregator = []

  for (input, target) in test_data:
    # evaluate only the prediction of the very last time step
    prediction = model(input)[:, -1, :]
    sample_test_loss = loss_function(target, prediction)
    # binary accuracy: round the sigmoid output and compare with the 0/1 target
    sample_test_accuracy = np.round(prediction.numpy()) == target.numpy()
    test_loss_aggregator.append(sample_test_loss.numpy())
    test_accuracy_aggregator.append(np.mean(sample_test_accuracy))

  test_loss = tf.reduce_mean(test_loss_aggregator)
  test_accuracy = tf.reduce_mean(test_accuracy_aggregator)

  return test_loss, test_accuracy

In [12]:
tf.keras.backend.clear_session()

### Hyperparameters
num_epochs = 10
learning_rate = 0.001

# Initialize the model.
model = Final_Model()
# Initialize the loss: binary cross entropy matches the single sigmoid output.
cross_entropy_loss = tf.keras.losses.BinaryCrossentropy()
# Initialize the optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate)

# Initialize lists for later visualization.
losses = []
accuracies = []


#check how model performs on train and test data once before we begin
train_loss, _ = test(model, train_dataset, cross_entropy_loss)
_, accuracy = test(model, test_dataset, cross_entropy_loss)
losses.append(train_loss)
accuracies.append(accuracy)

# train for num_epochs epochs.
for epoch in range(num_epochs):
    print(f'Epoch: {epoch} starting with accuracy {accuracies[-1]}')

    #training (and checking in with training)
    epoch_loss_agg = []
    for input,target in train_dataset:
        loss = train_step(model, input, target, cross_entropy_loss, optimizer)
        epoch_loss_agg.append(loss)

    #track training loss
    losses.append(tf.reduce_mean(epoch_loss_agg))

    #track test accuracy
    _, accuracy = test(model, test_dataset, cross_entropy_loss)
    accuracies.append(accuracy)
