# IMPORTANT
1. Shut down all existing kernels (otherwise you will run into memory problems). To do so, go to the navigation bar and click Kernel > Shut Down All Kernels
2. **Start this notebook with the Tensorflow Kernel.** To do so, go to the navigation bar and click Kernel > Change Kernel... > Select Tensorflow > Click Select (alternatively, use the kernel button at the top right of the notebook)

## Binary Judgement Prediction with Bag of Words

Let's finally start with the task of predicting the outcome of a case given the text describing the main facts brought to the attention of the court. As we have just seen, each court case is annotated with a binary judgement label: whether the defendant has (label 1) or has not (label 0) violated any human rights article or protocol. This scenario is similar to the sentiment classification task you worked on previously in this course.

### Set-up
First, we load the necessary Python libraries. As in the sentiment classification example, we will use `keras` and `tensorflow`.

**Exercise:** Fix the random seed of `tensorflow` and `numpy` to ensure reproducibility.
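
To see why seeding matters, here is a minimal sketch that uses only Python's standard-library `random` module as a stand-in. The exercise itself concerns the `tensorflow` and `numpy` generators, which have their own seeding functions. The helper name `draw_numbers` is hypothetical.

```python
import random

def draw_numbers(seed, n=3):
    """Return n pseudo-random floats drawn right after seeding the generator."""
    random.seed(seed)  # fixing the seed fixes the whole sequence that follows
    return [random.random() for _ in range(n)]

first_run = draw_numbers(seed=42)
second_run = draw_numbers(seed=42)
other_seed = draw_numbers(seed=7)

print(first_run == second_run)  # same seed, same sequence
print(first_run == other_seed)  # different seed, different sequence
```

The same principle applies to weight initialisation and data shuffling: with fixed seeds, repeated runs start from the same random draws.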

In [None]:
import numpy as np
import tensorflow as tf
import time
import json
import os

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.layers import Input, TextVectorization, Embedding, Conv1D, MaxPooling1D, Flatten, LSTM, Bidirectional
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.initializers import Constant
from tensorflow.keras.models import load_model

# Initialize random number generators to ensure reproducibility

# set random seed for tensorflow
... # fill in this line

# set random seed for numpy
... # fill in this line

In [None]:
# Convenience functions: prepare data splits in scikit-friendly format
# You don't need to read the code in this cell, but please make sure you execute it.

def load_input_from_ECHR_dataset(dataframe):
    # Input: text
    X_train = dataframe[dataframe.partition == 'train'].text.to_list()
    X_val = dataframe[dataframe.partition == 'dev'].text.to_list()
    X_test = dataframe[dataframe.partition == 'test'].text.to_list()
    return X_train, X_val, X_test

def load_binary_output_from_ECHR_dataset(dataframe):
    # Binary output: violation judgement
    y_train_binary = dataframe[dataframe.partition == 'train'].binary_judgement.to_numpy()
    y_val_binary = dataframe[dataframe.partition == 'dev'].binary_judgement.to_numpy()
    y_test_binary = dataframe[dataframe.partition == 'test'].binary_judgement.to_numpy()
    return y_train_binary, y_val_binary, y_test_binary

def load_regression_output_from_ECHR_dataset(dataframe):
    # Regression output: case importance score
    y_train_regression = dataframe[dataframe.partition == 'train'].importance.astype(float).to_numpy()
    y_val_regression = dataframe[dataframe.partition == 'dev'].importance.astype(float).to_numpy()
    y_test_regression = dataframe[dataframe.partition == 'test'].importance.astype(float).to_numpy()
    return y_train_regression, y_val_regression, y_test_regression

def load_multiclass_output_from_ECHR_dataset(dataframe):
    # Multiclass output: case importance label
    y_train_multiclass = dataframe[dataframe.partition == 'train'].importance.to_numpy()
    y_val_multiclass = dataframe[dataframe.partition == 'dev'].importance.to_numpy()
    y_test_multiclass = dataframe[dataframe.partition == 'test'].importance.to_numpy()
    return y_train_multiclass, y_val_multiclass, y_test_multiclass

def load_ECHR_dataset_for_binary_judgement_classification(dataframe, for_tensorflow=False):
    X_train, X_val, X_test = load_input_from_ECHR_dataset(dataframe)
    y_train, y_val, y_test = load_binary_output_from_ECHR_dataset(dataframe)
    if for_tensorflow:
        train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
        val_ds = tf.data.Dataset.from_tensor_slices((X_val, y_val))
        test_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test))
    else:
        train_ds = {"texts": X_train, "labels": y_train}
        val_ds = {"texts": X_val, "labels": y_val}
        test_ds = {"texts": X_test, "labels": y_test}
    return train_ds, val_ds, test_ds

def load_ECHR_dataset_for_case_importance_regression(dataframe, for_tensorflow=False):
    X_train, X_val, X_test = load_input_from_ECHR_dataset(dataframe)
    y_train, y_val, y_test = load_regression_output_from_ECHR_dataset(dataframe)
    if for_tensorflow:
        train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
        val_ds = tf.data.Dataset.from_tensor_slices((X_val, y_val))
        test_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test))
    else:
        train_ds = {"texts": X_train, "labels": y_train}
        val_ds = {"texts": X_val, "labels": y_val}
        test_ds = {"texts": X_test, "labels": y_test}
    return train_ds, val_ds, test_ds

def load_ECHR_dataset_for_case_importance_classification(dataframe, for_tensorflow=False):
    X_train, X_val, X_test = load_input_from_ECHR_dataset(dataframe)
    y_train, y_val, y_test = load_multiclass_output_from_ECHR_dataset(dataframe)
    if for_tensorflow:
        train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
        val_ds = tf.data.Dataset.from_tensor_slices((X_val, y_val))
        test_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test))
    else:
        train_ds = {"texts": X_train, "labels": y_train}
        val_ds = {"texts": X_val, "labels": y_val}
        test_ds = {"texts": X_test, "labels": y_test}
    return train_ds, val_ds, test_ds

### Loading the data

We now load the data needed for the binary classification task in a model-friendly format, using the convenience functions defined in the cell above. As we have seen, the ECHR dataset comes with a predefined train-validation-test split.

First, we load the data produced in the preparation step.

In [None]:
import pandas as pd
data = pd.read_csv("data.csv")

In [None]:
train_ds, val_ds, test_ds = load_ECHR_dataset_for_binary_judgement_classification(data, for_tensorflow=True)

# Print 3 examples from the dataset
for example, label in train_ds.take(3):
  print("Input: ", example)
  print(10*".")
  print('Target labels: ', label)
  print(50*"-")

### Fit and evaluate

The following piece of code defines a function that trains (fits) a model on the training data and evaluates it on the development set. It then returns a dictionary with the training and validation history.

Please take some time to read this code and to understand all of its components.

In [None]:
def fit_and_eval_binary_classifier(
    train_ds,
    val_ds,
    model, # model to train
    model_name, # name of the model
    learning_rate, # too small, and the model learns slowly; too large, and it might overshoot the optimum (not used inside this function: the rate is set on the optimizer in model.compile())
    buffer_size, # size of the shuffle buffer from which training examples are randomly drawn
    batch_size, # number of training examples used in one gradient update
    n_epochs, # maximum number of passes over the training data
    patience_n_epochs=5
    ):

    # preliminaries
    tf.random.set_seed(42)
    np.random.seed(42)
    tf.config.run_functions_eagerly(True)

    # start timing
    start_time = time.time()

    # train
    history = model.fit(
        train_ds.shuffle(buffer_size=buffer_size).batch(batch_size),
        validation_data=val_ds.batch(batch_size),
        epochs=n_epochs,
        verbose=1,
        callbacks=[EarlyStopping(
            monitor='val_accuracy', patience=patience_n_epochs, verbose=False, restore_best_weights=True
        )]
    )

    # end timing
    end_time = time.time()
    training_time = end_time - start_time

    # Evaluate Training Progress
    history_dict = history.history
    history_dict.keys()

    resDict = {}
    resDict['train_loss'] = history_dict['loss']
    resDict['val_loss'] = history_dict['val_loss']
    resDict['train_accuracy'] = history_dict['accuracy']
    resDict['val_accuracy'] = history_dict['val_accuracy']
    resDict['epochs'] = len(resDict['train_accuracy'])
    resDict['model_name'] = model_name
    resDict['training_time'] = training_time

    return resDict

Now that the data is loaded and the training and evaluation procedure is in place, we can move to modelling.

### Creating Bag-of-Words text representations

We will use [`TextVectorization`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization) to obtain bag-of-words representations of texts.

These representations will depend on two main parameters:
* the vocabulary size `VOCAB_SIZE`, which limits the words considered to the `VOCAB_SIZE` most frequent ones
* the type of bag-of-words representation: based on raw word counts (`count`) or on word counts weighted by inverse document frequency (`tf_idf`)
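
The two representations can be sketched in plain Python on a toy corpus. This is an illustration under stated assumptions, not what `TextVectorization` does internally: the documents, `TOY_VOCAB_SIZE`, and the textbook formula `idf = log(N / df)` are all choices made for this example, and the Keras layer may use a differently smoothed variant.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration)
docs = [
    "the court found a violation",
    "the court found no violation",
    "the applicant complained to the court",
]
tokenized = [doc.split() for doc in docs]

# Keep only the most frequent words, as a vocabulary-size limit would
TOY_VOCAB_SIZE = 4  # hypothetical value for this toy corpus
corpus_counts = Counter(tok for doc in tokenized for tok in doc)
vocab = [word for word, _ in corpus_counts.most_common(TOY_VOCAB_SIZE)]

# Count-based representation: one raw count per vocabulary word
count_vectors = [[doc.count(word) for word in vocab] for doc in tokenized]

# Document frequency: in how many documents does each vocabulary word occur?
n_docs = len(tokenized)
doc_freq = [sum(word in doc for doc in tokenized) for word in vocab]
idf = [math.log(n_docs / df) for df in doc_freq]

# tf-idf representation: counts re-weighted by inverse document frequency
tfidf_vectors = [[c * w for c, w in zip(vec, idf)] for vec in count_vectors]

print(vocab)
for counts, tfidf in zip(count_vectors, tfidf_vectors):
    print(counts, [round(x, 3) for x in tfidf])
```

Words that occur in every document (here "the" and "court") get an idf of zero under this formula, so the tf-idf vectors emphasise the rarer, more discriminative words.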

**Exercise:** Write code to create count-based and tf-idf text representations.

In [None]:
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))

In [None]:
VOCAB_SIZE = 1000

# Create count-based features
# ----------------------------
# encoder_bow_count = ... # fill in this line
# ... # fill in this line

# Create tf-idf features
# ----------------------------
# encoder_bow_tfidf = ... # fill in this line
# ... # fill in this line


In [None]:
# Let's take a peek at the vocabulary
vocab = np.array(encoder_bow_count.get_vocabulary())
vocab[:30]

### Create folder for saving and loading models

In [None]:
folder_path = "models_trained" # create one folder for models trained from scratch
if not os.path.exists(folder_path):
  os.mkdir(folder_path)

### Binary classifier 1
As a first model, we will implement a logistic regression classifier with count-based BOW representations.

**Exercise:** Write code to define the model architecture, the training objectives, and the evaluation metric.
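
As a reminder of what logistic regression computes, here is a minimal pure-Python sketch of the forward pass and of the binary cross-entropy loss. It assumes binary cross-entropy is the training objective, which the `BinaryCrossentropy` import in the set-up cell suggests. The toy features, weights, and function names are made up for illustration and are not the Keras solution to the exercise.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Logistic regression: weighted sum of the features, then a sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for one example; p is the predicted probability of label 1."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# Toy bag-of-words counts for a 3-word vocabulary, with made-up parameters
toy_counts = [2, 0, 1]
toy_weights = [0.5, -1.0, 0.25]
toy_bias = -0.25

p_violation = predict_proba(toy_counts, toy_weights, toy_bias)
loss_if_violation = binary_cross_entropy(1, p_violation)
loss_if_no_violation = binary_cross_entropy(0, p_violation)

print(round(p_violation, 4))
print(round(loss_if_violation, 4), round(loss_if_no_violation, 4))
```

In a neural-network framing, this corresponds to a single output unit with a sigmoid activation applied on top of the bag-of-words vector.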

In [None]:
# Define main hyperparameters
# --------------------------------------------------------------
LEARNING_RATE = 0.005
N_EPOCHS = 20
BUFFER_SIZE = 10000
BATCH_SIZE = 50

# Define model architecture
# --------------------------------------------------------------
binary_classifier_1 = Sequential(
    name = f'Logistic regression, count-based BOW, |V| = {VOCAB_SIZE}'
)
# binary_classifier_1.add(...)  # fill in this line
# binary_classifier_1.add(...)  # fill in this line
# binary_classifier_1.add(...)  # fill in this line

# Define training objective, evaluation metric, and optimizer
# --------------------------------------------------------------
binary_classifier_1.compile(
    # loss='...', # fill in this line
    # metrics=['...'], # fill in this line
    optimizer=Adam(learning_rate=LEARNING_RATE)
)
print(binary_classifier_1.summary())

Let's evaluate the first classifier. You have two options: train it from scratch or load the pretrained model.

In [None]:
train_from_scratch = True # if set to True, we train clf from scratch, otherwise load pretrained model

modelFile_classifier_1 = os.path.join(folder_path, 'models_BoW/logistic_regression_count_based_BOW_V_1000.keras')
historyFile_classifier_1 = os.path.join(folder_path, 'models_BoW/logistic_regression_count_based_BOW_V_1000.json')

# train from scratch
if train_from_scratch:
  # fit_and_eval_binary_classifier returns the training history
  res_dict_classifier_1 =  fit_and_eval_binary_classifier(
      train_ds=train_ds,
      val_ds=val_ds,
      model=binary_classifier_1,
      model_name=binary_classifier_1.name,
      learning_rate=LEARNING_RATE,
      buffer_size=BUFFER_SIZE,
      batch_size=BATCH_SIZE,
      n_epochs=N_EPOCHS,
      patience_n_epochs=N_EPOCHS
  )
  # save trained model
  folder_name_bow = "models_BoW"
  folder_path_bow = os.path.join(folder_path, folder_name_bow) # concatenate the directory path and folder name
  if not os.path.exists(folder_path_bow):
    os.mkdir(folder_path_bow)

  binary_classifier_1.save(modelFile_classifier_1) # save classifier
  with open(historyFile_classifier_1, 'w') as json_file: # save training history
    json.dump(res_dict_classifier_1, json_file)

# load pretrained model either from zipfile or from folder
else:
  if not os.path.exists(modelFile_classifier_1) or not os.path.exists(historyFile_classifier_1):
    !tar -xvzf ./pretrained/models_BoW.tar.gz -C $folder_path # the path to the zip file and the destination directory

  binary_classifier_1 = load_model(modelFile_classifier_1)
  with open(historyFile_classifier_1, 'r') as json_file:
    res_dict_classifier_1 = json.load(json_file)

train_acc_model_1 = res_dict_classifier_1['train_accuracy']
val_acc_model_1 = res_dict_classifier_1['val_accuracy']

These are its training and validation accuracy over epochs. (Note that training stops early once the validation accuracy has failed to improve for `patience_n_epochs` consecutive epochs. We set this value equal to the number of epochs, so the model completes them all. However, you can set this parameter to a lower value for more efficient training, which is what you would likely do in practice.)
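
The patience mechanism can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the actual `EarlyStopping` implementation; the function name and the toy accuracy values are hypothetical.

```python
def epochs_run_with_early_stopping(val_scores, patience):
    """Return (number of epochs run, epoch with the best validation score).

    Training stops as soon as the score has failed to improve
    for `patience` consecutive epochs.
    """
    best = float("-inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_scores), best_epoch

# Made-up validation accuracies, one per epoch
toy_val_accuracy = [0.60, 0.70, 0.69, 0.68, 0.75, 0.74]

print(epochs_run_with_early_stopping(toy_val_accuracy, patience=2))
print(epochs_run_with_early_stopping(toy_val_accuracy, patience=6))
```

With a patience of 2, training halts after epoch 4 and misses the later improvement at epoch 5; with a patience equal to the number of epochs, all epochs are completed. This is the trade-off between training time and the risk of stopping too soon.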

In [None]:
import matplotlib.pyplot as plt

plt.plot(
    range(1, len(train_acc_model_1) + 1),  # the epochs for the x-axis
    train_acc_model_1,  # the training accuracy
    'b:',  # for dotted blue line
    label='Logreg count-based BOW, Training acc'
)
plt.plot(
    range(1, len(val_acc_model_1) + 1),  # the epochs for the x-axis
    val_acc_model_1,  # the validation accuracy
    'b',  # for solid blue line
    label='Logreg count-based BOW, Validation acc'
)
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()

### Binary classifier 2
Next is a logistic regression classifier with tfidf-based BOW representations.


**Exercise:** Write code to define the model architecture, the training objectives, and the evaluation metric.

In [None]:
# Define main hyperparameters
# --------------------------------------------------------------
LEARNING_RATE = 0.005
N_EPOCHS = 20
BUFFER_SIZE = 10000
BATCH_SIZE = 50

# Define model architecture
# --------------------------------------------------------------
binary_classifier_2 = Sequential(
    name = f'Logistic regression, tfidf-based BOW, |V| = {VOCAB_SIZE}'
)
#binary_classifier_2.add(...)  # fill in this line
#binary_classifier_2.add(...)  # fill in this line
#binary_classifier_2.add(...)  # fill in this line

# Define training objective, evaluation metric, and optimizer
# --------------------------------------------------------------
binary_classifier_2.compile(
    #loss='...',  # fill in this line
    #metrics=['...'],  # fill in this line
    optimizer=Adam(learning_rate=LEARNING_RATE)
)
print(binary_classifier_2.summary())

You have two options: either train the model from scratch or load a pretrained model.

In [None]:
train_from_scratch = True # if set to True, we train clf from scratch, otherwise load pretrained model

modelFile_classifier_2 = os.path.join(folder_path, 'models_BoW/logistic_regression_tfidf_based_BOW_V_1000.keras')
historyFile_classifier_2 = os.path.join(folder_path, 'models_BoW/logistic_regression_tfidf_based_BOW_V_1000.json')

# train from scratch
if train_from_scratch:
  # fit_and_eval_binary_classifier returns the training history
  res_dict_classifier_2 =  fit_and_eval_binary_classifier(
      train_ds=train_ds,
      val_ds=val_ds,
      model=binary_classifier_2,
      model_name=binary_classifier_2.name,
      learning_rate=LEARNING_RATE,
      buffer_size=BUFFER_SIZE,
      batch_size=BATCH_SIZE,
      n_epochs=N_EPOCHS,
      patience_n_epochs=N_EPOCHS
  )
  # save trained model
  folder_name_bow = "models_BoW"
  folder_path_bow = os.path.join(folder_path, folder_name_bow) # concatenate the directory path and folder name
  if not os.path.exists(folder_path_bow):
    os.mkdir(folder_path_bow)

  binary_classifier_2.save(modelFile_classifier_2) # save classifier
  with open(historyFile_classifier_2, 'w') as json_file:
    json.dump(res_dict_classifier_2, json_file)

# load pretrained model either from zipfile or from folder
else:
  if not os.path.exists(modelFile_classifier_2) or not os.path.exists(historyFile_classifier_2):
    !tar -xvzf ./pretrained/models_BoW.tar.gz -C $folder_path # the path to the zip file and the destination directory

  binary_classifier_2 = load_model(modelFile_classifier_2)
  with open(historyFile_classifier_2, 'r') as json_file:
    res_dict_classifier_2 = json.load(json_file)

train_acc_model_2 = res_dict_classifier_2['train_accuracy']
val_acc_model_2 = res_dict_classifier_2['val_accuracy']

In [None]:
plt.plot(
    range(1, len(train_acc_model_2) + 1),
    train_acc_model_2,
    'g:',
    label='Logreg tfidf-based BOW, Training acc'
)
plt.plot(
    range(1, len(val_acc_model_2) + 1),
    val_acc_model_2,
    'g',
    label='Logreg tfidf-based BOW, Validation acc'
)
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()

### Model comparison

**Exercise:** To compare the two models visually, ***plot the training and validation accuracy of the two bag-of-words models.***

In [None]:
...  # fill in this line
...  # fill in this line
...  # fill in this line
...  # fill in this line
#plt.title('Training and validation accuracy')
#plt.xlabel('Epochs')
#plt.ylabel('Accuracy')
#plt.legend(loc='lower right')
#plt.grid(True)
#plt.show()

**Exercise:** Briefly describe the results.

*Enter your response here (two or three sentences should suffice).*

### Analysis
To understand the models' performance beyond the evaluation scores, it is useful to carry out what could be called an *interpretability analysis*.

We interpret what the model has learned by analysing its weights.

**Exercise:** Write code to extract the weights from the two classifiers above and to obtain the vocabulary entries with the highest weights.
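
The sorting step can be illustrated in plain Python on a toy vocabulary. The weights below are made up; extracting the real weights from the trained classifiers is left to the exercise. Sorting the indices in ascending order of weight plays the role of an argsort, so the last entries point to the largest weights.

```python
# Toy vocabulary and made-up weights, one weight per vocabulary entry
toy_vocab = ["", "[UNK]", "the", "violation", "dismissed", "court"]
toy_weights = [0.0, 0.1, -0.05, 1.2, -0.9, 0.3]

# Indices of the vocabulary, sorted from lowest to highest weight
sorted_indices = sorted(range(len(toy_weights)), key=lambda i: toy_weights[i])

# Largest weights push the prediction towards label 1 (violation)
top_words = [toy_vocab[i] for i in sorted_indices[-2:]]
# Smallest (most negative) weights push it towards label 0
bottom_words = [toy_vocab[i] for i in sorted_indices[:2]]

print("Highest weights:", top_words)
print("Lowest weights:", bottom_words)
```

Slicing with `[-k:]` and `[:k]` then gives the `k` entries at either end of the ranking, in the same way as the print statements in the next cell.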

In [None]:
vocab1 = np.array(encoder_bow_count.get_vocabulary())
vocab2 = np.array(encoder_bow_tfidf.get_vocabulary())

# Extract the classifier weights
# classifier_1_vocab_weights = ...  # fill in this line
# classifier_2_vocab_weights = ...  # fill in this line

# Sort the weights and get the correspondingly sorted vocabulary indices
# classifier_1_vocab_weights_sorted = ...  # fill in this line
# classifier_2_vocab_weights_sorted = ...  # fill in this line

# The indices with the largest values indicate which words are most indicative of violations
print("Words predictive of violations")
print("Model 1:\n", vocab1[classifier_1_vocab_weights_sorted[-10:]])
print()
print("Model 2:\n", vocab2[classifier_2_vocab_weights_sorted[-10:]])

# ... and vice versa
print("\n\nWords predictive of absolution")
print("Model 1:\n", vocab1[classifier_1_vocab_weights_sorted[:10]])
print("Model 2:\n", vocab2[classifier_2_vocab_weights_sorted[:10]])

Do the words with the highest weights correspond to sensible violation or absolution cues?

## Binary Judgement Prediction with LSTMs

As a next model class, we will consider recurrent neural models, in particular LSTMs. As you have learned, these models are able to take into account the order of words in sentences, which is in principle a big advantage over bag-of-words models. "The woman sued Switzerland" is not the same as "Switzerland sued the woman"!
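
A quick check in plain Python makes the limitation concrete: the two sentences have identical bags of words but different word sequences. The helper `bag_of_words` is a hypothetical name, and lower-casing plus whitespace splitting is a simplifying assumption about tokenisation.

```python
from collections import Counter

def bag_of_words(sentence):
    """Lower-case, split on whitespace, and count each word."""
    return Counter(sentence.lower().split())

s1 = "The woman sued Switzerland"
s2 = "Switzerland sued the woman"

same_bag = bag_of_words(s1) == bag_of_words(s2)
same_sequence = s1.lower().split() == s2.lower().split()

print("Same bag of words:", same_bag)
print("Same word sequence:", same_sequence)
```

A bag-of-words classifier therefore receives exactly the same input for both sentences, whereas a sequence model sees two different inputs.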

### BiLSTM with embeddings trained from scratch

First, we'll design a simple one-layer bidirectional LSTM with word embeddings learned from scratch.

**Exercise:** Write code to create word embeddings for the vocabulary of this dataset.

First, define the right encoder.

In [None]:
EMBEDDING_DIM = 50
VOCAB_SIZE = 1000

# encoder_embed = ... # fill in this line
# ... # fill in this line

In [None]:
# print the vocabulary id of the word "human"
encoder_embed("human").numpy()

Then, create the embedding matrix.
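
Conceptually, an embedding matrix is a lookup table with one row per vocabulary entry and one column per embedding dimension. The sketch below shows the lookup in plain Python. The toy vocabulary, `TOY_EMBEDDING_DIM`, and the initialisation range are assumptions made for illustration; the first two entries stand in for padding and out-of-vocabulary words.

```python
import random

# Toy vocabulary: index 0 for padding, index 1 for unknown words
toy_vocab = ["", "[UNK]", "the", "court", "human"]
TOY_EMBEDDING_DIM = 3  # hypothetical, far smaller than a realistic setting

word_to_id = {word: i for i, word in enumerate(toy_vocab)}

# One randomly initialised row per vocabulary entry
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-0.05, 0.05) for _ in range(TOY_EMBEDDING_DIM)]
    for _ in toy_vocab
]

def embed(sentence):
    """Map each token to its id, then each id to its embedding row."""
    unk_id = word_to_id["[UNK]"]
    ids = [word_to_id.get(tok, unk_id) for tok in sentence.lower().split()]
    return ids, [embedding_table[i] for i in ids]

ids, vectors = embed("the human rights court")
print(ids)
print(len(vectors), "vectors of dimension", len(vectors[0]))
```

The word "rights" is not in the toy vocabulary, so it is mapped to the unknown-word id. During training, the rows of the table are updated like any other weights when the layer is trainable.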

In [None]:
embedding_layer = Embedding(
    # input_dim=...,  # fill in this line
    # output_dim=...,  # fill in this line
    embeddings_initializer="uniform",
    trainable=True,
)

**Exercise: *Write code to define the model architecture***. Remember, this should include an input layer, encoder and embedding layers, a bidirectional LSTM layer and an output layer. Keep the dimensionality of the LSTM layer low (for example, 16).

In [None]:
LEARNING_RATE = 0.005
BATCH_SIZE = 50
BUFFER_SIZE = 10000
N_EPOCHS = 5 # can be increased to get better results if computational resources available, e.g. to 20

binary_classifier_3 = Sequential(
    name=f"1-layer BiLSTM, embeddings from scratch"
)
# binary_classifier_3.add(...)  # fill in this line
# binary_classifier_3.add(...)  # fill in this line
# binary_classifier_3.add(...)  # fill in this line
# binary_classifier_3.add(...)  # fill in this line
# binary_classifier_3.add(...)  # fill in this line

binary_classifier_3.compile(
    #loss='...',  # fill in this line
    #metrics=['...'],  # fill in this line
    optimizer=Adam(learning_rate=LEARNING_RATE)
)
print(binary_classifier_3.summary())

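You have two options: either train the model from scratch or load a pretrained model. Note: the LSTM takes longer to train than the logistic regression. We set the patience parameter to 5 to avoid redundant epochs. With `N_EPOCHS = 5`, this patience cannot cut training short; it only takes effect if you increase the number of epochs.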

In [None]:
train_from_scratch = True # if set to True, we train clf from scratch, otherwise load pretrained model

modelFile_classifier_3 = os.path.join(folder_path, 'models_LSTM/1_layer_BiLSTM_embeds_from_scratch.keras')
historyFile_classifier_3 = os.path.join(folder_path, 'models_LSTM/1_layer_BiLSTM_embeds_from_scratch.json')

# train from scratch
if train_from_scratch:
  # fit_and_eval_binary_classifier returns the training history
  res_dict_classifier_3 =  fit_and_eval_binary_classifier(
      train_ds=train_ds,
      val_ds=val_ds,
      model=binary_classifier_3,
      model_name=binary_classifier_3.name,
      learning_rate=LEARNING_RATE,
      buffer_size=BUFFER_SIZE,
      batch_size=BATCH_SIZE,
      n_epochs=N_EPOCHS,
      patience_n_epochs=5
  )
  # save trained model
  folder_name_lstm = "models_LSTM"
  folder_path_lstm = os.path.join(folder_path, folder_name_lstm) # concatenate the directory path and folder name
  if not os.path.exists(folder_path_lstm):
    os.mkdir(folder_path_lstm)

  binary_classifier_3.save(modelFile_classifier_3) # save classifier
  with open(historyFile_classifier_3, 'w') as json_file: # save history
    json.dump(res_dict_classifier_3, json_file)

# load pretrained model either from zipfile or from folder
else:
  if not os.path.exists(modelFile_classifier_3) or not os.path.exists(historyFile_classifier_3):
    !tar -xvzf ./pretrained/models_LSTM.tar.gz -C $folder_path # the path to the zip file and the destination directory

  binary_classifier_3 = load_model(modelFile_classifier_3)
  with open(historyFile_classifier_3, 'r') as json_file:
    res_dict_classifier_3 = json.load(json_file)

train_acc_model_3 = res_dict_classifier_3['train_accuracy']
val_acc_model_3 = res_dict_classifier_3['val_accuracy']

### Deeper network
Next, let's try a deeper, two-layer LSTM network. Word embeddings will still be learned from scratch.


**Exercise: *Define the full two-layer Bidirectional LSTM in the cell below.***  This is identical to the one-layer model, but with an extra Bidirectional LSTM layer. Again, keep the dimensionality of the LSTM layers low.

In [None]:
# encoder_embed = ... # fill in this line
# ... # fill in this line

embedding_layer = Embedding(
    # input_dim=...,  # fill in this line
    # output_dim=...,  # fill in this line
    embeddings_initializer="uniform",
    trainable=True,
)

LEARNING_RATE = 0.005
BATCH_SIZE = 50
BUFFER_SIZE = 10000
N_EPOCHS = 5 # can be increased to get better results if computational resources available, e.g. to 20

binary_classifier_4 = Sequential(
    name=f"2-layer BiLSTM, embeddings from scratch"
)
# binary_classifier_4.add(...)  # fill in this line
# binary_classifier_4.add(...)  # fill in this line
# binary_classifier_4.add(...)  # fill in this line
# binary_classifier_4.add(...)  # fill in this line
# binary_classifier_4.add(...)  # fill in this line
# binary_classifier_4.add(...)  # fill in this line

#compile the model before training
binary_classifier_4.compile(
    #loss='...',  # fill in this line
    #metrics=['...'],  # fill in this line
    optimizer=Adam(learning_rate=LEARNING_RATE)
)

In [None]:
print(binary_classifier_4.summary())

You have two options: either train the model from scratch or load a pretrained model.

In [None]:
train_from_scratch = True # if set to True, we train clf from scratch, otherwise load pretrained model

modelFile_classifier_4 = os.path.join(folder_path, 'models_LSTM/2_layer_BiLSTM_embeds_from_scratch.keras')
historyFile_classifier_4 = os.path.join(folder_path, 'models_LSTM/2_layer_BiLSTM_embeds_from_scratch.json')

# train from scratch
if train_from_scratch:
  # fit_and_eval_binary_classifier returns the training history
  res_dict_classifier_4 =  fit_and_eval_binary_classifier(
      train_ds=train_ds,
      val_ds=val_ds,
      model=binary_classifier_4,
      model_name=binary_classifier_4.name,
      learning_rate=LEARNING_RATE,
      buffer_size=BUFFER_SIZE,
      batch_size=BATCH_SIZE,
      n_epochs=N_EPOCHS,
      patience_n_epochs=5
  )
  # save trained model
  folder_name_lstm = "models_LSTM"
  folder_path_lstm = os.path.join(folder_path, folder_name_lstm) # concatenate the directory path and folder name
  if not os.path.exists(folder_path_lstm):
    os.mkdir(folder_path_lstm)

  binary_classifier_4.save(modelFile_classifier_4) # save classifier
  with open(historyFile_classifier_4, 'w') as json_file: # save history
    json.dump(res_dict_classifier_4, json_file)

# load pretrained model either from zipfile or from folder
else:
  if not os.path.exists(modelFile_classifier_4) or not os.path.exists(historyFile_classifier_4):
    !tar -xvzf ./pretrained/models_LSTM.tar.gz -C $folder_path # the path to the zip file and the destination directory

  binary_classifier_4 = load_model(modelFile_classifier_4)
  with open(historyFile_classifier_4, 'r') as json_file:
    res_dict_classifier_4 = json.load(json_file)

train_acc_model_4 = res_dict_classifier_4['train_accuracy']
val_acc_model_4 = res_dict_classifier_4['val_accuracy']

**Exercise:** To compare the two bidirectional LSTMs, plot the training and validation accuracy of the two models.

In [None]:
# ...  # fill in this line
# ...  # fill in this line
# ...  # fill in this line
# ...  # fill in this line

plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()

### Pre-trained word embeddings

The dataset at hand is very domain-specific and not particularly large, so it is unlikely that the model will be able to learn the general meaning of words. Luckily, the network can be initialised with pre-trained word embeddings, which were trained on general-purpose corpora to capture the meaning of the words in their vocabulary. We will download pre-trained GloVe embeddings of dimensionality 50, trained on a corpus of 6 billion tokens.

In [None]:
folder_path_glove = os.path.join(folder_path, 'models_glove') # concatenate the directory path and folder name
if not os.path.isfile('glove.6B.zip'):
    !wget http://nlp.stanford.edu/data/glove.6B.zip
    !unzip -q glove.6B.zip -d $folder_path_glove

In [None]:
# Load pre-trained GloVe embeddings
# ----------------------------------
file_path_glove_50d = os.path.join(folder_path_glove, 'glove.6B.50d.txt')
EMBEDDING_DIM = 50

embeddings_index = {}
with open(file_path_glove_50d, encoding="utf-8") as f:
    for line in f:
        word, coefs = line.split(maxsplit=1)
        coefs = np.fromstring(coefs, "f", sep=" ")
        embeddings_index[word] = coefs

print("Found %s word vectors." % len(embeddings_index))

hits = 0
misses = 0

# Prepare embedding matrix
embedding_matrix = np.zeros((VOCAB_SIZE, EMBEDDING_DIM))
for i, word in enumerate(encoder_embed.get_vocabulary()):
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
        hits += 1
    else:
        # Words not found in the embedding index keep their all-zero row.
        # This includes the representations for "padding" and "OOV".
        misses += 1
        print(word)

print("Converted %d words (%d misses)" % (hits, misses))

#### Frozen embeddings

Here, we are going to leave the word embeddings "frozen". That is, they will not be updated throughout the training of the LSTM. In this way, the embeddings will remain general representations of word meaning while the rest of the network will specialise for the legal judgement prediction task.
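
What "frozen" means for the parameter updates can be sketched with a toy gradient step in plain Python. Plain SGD is used here for simplicity, whereas the notebook trains with Adam. The parameter values, gradients, and the function name `sgd_step` are made up for illustration.

```python
def sgd_step(params, grads, trainable, learning_rate=0.1):
    """Apply one gradient step, leaving non-trainable parameters untouched."""
    return [
        p - learning_rate * g if is_trainable else p
        for p, g, is_trainable in zip(params, grads, trainable)
    ]

# Pretend parameters: [embedding weight, LSTM weight, output weight]
params = [0.5, -0.2, 1.0]
grads = [0.3, 0.1, -0.4]

# Frozen embeddings: only the rest of the network is updated
frozen_flags = [False, True, True]
updated = sgd_step(params, grads, frozen_flags)

print(updated)
```

The embedding weight keeps its pre-trained value while the other two parameters move against their gradients. Setting the first flag to `True` would let the embedding adapt to the task as well.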

**Exercise: *Define a Bidirectional LSTM with frozen, pre-trained word embeddings.*** You can use a single LSTM layer for faster training. If more computational power is available, you can use more layers.

In [None]:
LEARNING_RATE = 0.005
BATCH_SIZE = 50
BUFFER_SIZE = 10000
N_EPOCHS = 5 # can be increased to get better results if computational resources available, e.g. to 20

pretrained_embedding_layer_frozen = Embedding(
    #input_dim=...,  # fill in this line
    #output_dim=...,  # fill in this line
    #embeddings_initializer=Constant(embedding_matrix),
    #trainable=...,  # fill in this line
)

binary_classifier_5 = Sequential(
    name=f"1-layer BiLSTM classifier (frozen pre-trained embeddings)"
)
# Fill in the following lines to build the LSTM
#binary_classifier_5.add(...)
#binary_classifier_5.add(...)
#binary_classifier_5.add(...)
#binary_classifier_5.add(...)
#binary_classifier_5.add(...)

#compile the model before training
binary_classifier_5.compile(
    #loss='...',  # fill in this line
    #metrics=['...'],  # fill in this line
    optimizer=Adam(learning_rate=LEARNING_RATE)
)

In [None]:
binary_classifier_5.summary()

Again, you have two options: either train the model from scratch or load a pretrained model.

In [None]:
train_from_scratch = True # if set to True, we train clf from scratch, otherwise load pretrained model

modelFile_classifier_5 = os.path.join(folder_path, 'models_LSTM/1_layer_BiLSTM_embeds_pretrained_frozen.keras')
historyFile_classifier_5 = os.path.join(folder_path, 'models_LSTM/1_layer_BiLSTM_embeds_pretrained_frozen.json')

# train from scratch
if train_from_scratch:
  # fit_and_eval_binary_classifier returns the training history
  res_dict_classifier_5 =  fit_and_eval_binary_classifier(
    train_ds=train_ds,
    val_ds=val_ds,
    model=binary_classifier_5,
    model_name=binary_classifier_5.name,
    learning_rate=LEARNING_RATE,
    buffer_size=BUFFER_SIZE,
    batch_size=BATCH_SIZE,
    n_epochs=N_EPOCHS,
    patience_n_epochs=5
  )

  # save trained model
  folder_name_lstm = "models_LSTM"
  folder_path_lstm = os.path.join(folder_path, folder_name_lstm) # concatenate the directory path and folder name
  if not os.path.exists(folder_path_lstm):
    os.mkdir(folder_path_lstm)

  binary_classifier_5.save(modelFile_classifier_5) # save classifier
  with open(historyFile_classifier_5, 'w') as json_file: # save dictionary with history
    json.dump(res_dict_classifier_5, json_file)

# load pretrained model either from zipfile or from folder
else:
  if not os.path.exists(modelFile_classifier_5) or not os.path.exists(historyFile_classifier_5):
    !tar -xvzf ./pretrained/models_LSTM.tar.gz -C $folder_path # the path to the zip file and the destination directory

  binary_classifier_5 = load_model(modelFile_classifier_5)
  with open(historyFile_classifier_5, 'r') as json_file:
    res_dict_classifier_5 = json.load(json_file)

train_acc_model_5 = res_dict_classifier_5['train_accuracy']
val_acc_model_5 = res_dict_classifier_5['val_accuracy']

#### Adaptive embeddings

Now let's unfreeze the embeddings and allow them to be updated throughout training. To do so, we need to set `trainable=True` when defining the embedding layer.

**Exercise: *Define a Bidirectional LSTM with adaptive embeddings.***

In [None]:
LEARNING_RATE = 0.005
BATCH_SIZE = 50
BUFFER_SIZE = 10000
N_EPOCHS = 5 # can be increased to get better results if computational resources available, e.g. to 20

pretrained_embedding_layer_adaptive = Embedding(
    #input_dim=...,  # fill in this line
    #output_dim=...,  # fill in this line
    #embeddings_initializer=Constant(embedding_matrix),
    #trainable=...,  # fill in this line
)

binary_classifier_6 = Sequential(
    name="1-layer BiLSTM, adaptive pre-trained embeddings"
)
# Fill in the following lines to build the LSTM
#binary_classifier_6.add(...)
#binary_classifier_6.add(...)
#binary_classifier_6.add(...)
#binary_classifier_6.add(...)
#binary_classifier_6.add(...)

# compile the model before training
binary_classifier_6.compile(
    #loss='...',  # fill in this line
    #metrics=['...'],  # fill in this line
    optimizer=Adam(learning_rate=LEARNING_RATE)
)

In [None]:
binary_classifier_6.summary()

You have two options: either train the model from scratch or load a pretrained model.
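
The train-or-load pattern used in these cells can be sketched in isolation with the standard library only. The helper `train_or_load_history` and the stand-in `fake_training` are hypothetical names introduced for illustration, and the accuracy values are placeholders rather than results from this notebook. Unlike the notebook cell, this sketch retrains when no cached file exists instead of extracting an archive.

```python
import json
import os
import tempfile

def train_or_load_history(history_file, train_fn, train_from_scratch):
    """Return a history dict, training only if requested or if nothing is cached."""
    if train_from_scratch or not os.path.exists(history_file):
        history = train_fn()
        # create the target folder (and any missing parents) before writing
        os.makedirs(os.path.dirname(history_file), exist_ok=True)
        with open(history_file, "w") as json_file:
            json.dump(history, json_file)
        return history
    with open(history_file, "r") as json_file:
        return json.load(json_file)

def fake_training():
    """Stand-in for a real training run; the numbers are placeholders."""
    return {"train_accuracy": [0.6, 0.7], "val_accuracy": [0.55, 0.65]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "models_demo", "history.json")
    first = train_or_load_history(path, fake_training, train_from_scratch=True)
    second = train_or_load_history(path, fake_training, train_from_scratch=False)
    print("loaded history matches trained history:", first == second)
```

Storing the history as JSON next to the model file is what allows the accuracy curves to be plotted later without retraining.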

In [None]:
train_from_scratch = True # if set to True, we train clf from scratch, otherwise load pretrained model

modelFile_classifier_6 = os.path.join(folder_path, 'models_LSTM/1_layer_BiLSTM_embeds_pretrained_adaptive.keras')
historyFile_classifier_6 = os.path.join(folder_path, 'models_LSTM/1_layer_BiLSTM_embeds_pretrained_adaptive.json')

# train from scratch
if train_from_scratch:
  # fit_and_eval_binary_classifier returns the training history
  res_dict_classifier_6 =  fit_and_eval_binary_classifier(
    train_ds=train_ds,
    val_ds=val_ds,
    model=binary_classifier_6,
    model_name=binary_classifier_6.name,
    learning_rate=LEARNING_RATE,
    buffer_size=BUFFER_SIZE,
    batch_size=BATCH_SIZE,
    n_epochs=N_EPOCHS,
    patience_n_epochs=5
  )

  # save trained model
  folder_name_lstm = "models_LSTM"
  folder_path_lstm = os.path.join(folder_path, folder_name_lstm) # concatenate the directory path and folder name
  os.makedirs(folder_path_lstm, exist_ok=True) # create the folder if it does not exist yet

  binary_classifier_6.save(modelFile_classifier_6) # save classifier
  with open(historyFile_classifier_6, 'w') as json_file: # save dictionary with history
    json.dump(res_dict_classifier_6, json_file)

# load pretrained model, extracting it from the tar archive first if it is not in the folder yet
else:
  if (not os.path.exists(modelFile_classifier_6)) or (not os.path.exists(historyFile_classifier_6)):
    !tar -xvzf ./pretrained/models_LSTM.tar.gz -C $folder_path # extract the tar archive into the destination directory

  binary_classifier_6 = load_model(modelFile_classifier_6)
  with open(historyFile_classifier_6, 'r') as json_file:
    res_dict_classifier_6 = json.load(json_file)

train_acc_model_6 = res_dict_classifier_6['train_accuracy']
val_acc_model_6 = res_dict_classifier_6['val_accuracy']

**Exercise: *Plot the training and validation accuracy of the four LSTM models.***

First, we plot the two models without pre-trained embeddings (the 1-layer and the 2-layer BiLSTM).
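
One way to keep such plots readable is a small helper that draws the training curve (dotted) and the validation curve (solid) of a model in the same colour. The following is only a sketch: the helper `plot_accuracy_curves` is a hypothetical name, and the accuracy lists are made-up placeholders, not results from this notebook.

```python
import matplotlib.pyplot as plt

def plot_accuracy_curves(ax, train_acc, val_acc, color, name):
    """Hypothetical helper: dotted line for training, solid line for validation."""
    epochs = range(1, len(train_acc) + 1)
    ax.plot(epochs, train_acc, color + ':', label=f'{name}, Training acc')
    ax.plot(epochs, val_acc, color, label=f'{name}, Validation acc')

# Placeholder accuracy histories (illustrative values only)
demo_train_a, demo_val_a = [0.70, 0.80, 0.85], [0.68, 0.75, 0.78]
demo_train_b, demo_val_b = [0.72, 0.83, 0.90], [0.69, 0.76, 0.77]

fig, ax = plt.subplots()
plot_accuracy_curves(ax, demo_train_a, demo_val_a, 'b', 'Model A')
plot_accuracy_curves(ax, demo_train_b, demo_val_b, 'g', 'Model B')
ax.set_xlabel('Epochs')
ax.set_ylabel('Accuracy')
ax.legend(loc='lower right')
ax.grid(True)
plt.show()
```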

In [None]:
... # fill in this code block
#plt.plot(...)
#plt.plot(...)
#plt.plot(...)
#plt.plot(...)

plt.title('Training and validation accuracy without pre-trained embeddings')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()

Next, we plot the accuracy curves of the two models with pre-trained embeddings (frozen and adaptive):

In [None]:
... # fill in this code block
#plt.plot(...)
#plt.plot(...)
#plt.plot(...)
#plt.plot(...)

plt.title('Training and validation accuracy with pre-trained embeddings')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()

In [None]:
plt.plot(range(1, len(train_acc_model_1) + 1), train_acc_model_1, 'k:', label='Logreg count-based BOW, Training acc')
plt.plot(range(1, len(val_acc_model_1) + 1), val_acc_model_1, 'k',  label='Logreg count-based BOW, Validation acc')
plt.plot(range(1, len(train_acc_model_2) + 1), train_acc_model_2, 'y:', label='Logreg tfidf-based BOW, Training acc')
plt.plot(range(1, len(val_acc_model_2) + 1), val_acc_model_2, 'y',  label='Logreg tfidf-based BOW, Validation acc')

plt.plot(range(1, len(train_acc_model_3) + 1), train_acc_model_3, 'b:', label='1-layer BiLSTM, Training acc')
plt.plot(range(1, len(val_acc_model_3) + 1), val_acc_model_3, 'b',  label='1-layer BiLSTM, Validation acc')
plt.plot(range(1, len(train_acc_model_4) + 1), train_acc_model_4, 'g:', label='2-layer BiLSTM, Training acc')
plt.plot(range(1, len(val_acc_model_4) + 1), val_acc_model_4, 'g',  label='2-layer BiLSTM, Validation acc')

plt.plot(range(1, len(train_acc_model_5) + 1), train_acc_model_5, 'r:', label='1-layer BiLSTM (frozen pre-trained embeddings), Training acc')
plt.plot(range(1, len(val_acc_model_5) + 1), val_acc_model_5, 'r',  label='1-layer BiLSTM (frozen pre-trained embeddings), Validation acc')
plt.plot(range(1, len(train_acc_model_6) + 1), train_acc_model_6, 'c:', label='1-layer BiLSTM (adaptive pre-trained embeddings), Training acc')
plt.plot(range(1, len(val_acc_model_6) + 1), val_acc_model_6, 'c',  label='1-layer BiLSTM (adaptive pre-trained embeddings), Validation acc')

plt.title('Development of Training and Validation Accuracy: All Considered Models')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.ylim([0.8, 1])
plt.legend(loc='right', bbox_to_anchor=(2.2, 0.5))
plt.grid(True)
plt.show()

# Evaluation on Test Set

In [None]:
# Load test set
data = pd.read_csv("data.csv")
_, _, test_set = load_ECHR_dataset_for_binary_judgement_classification(data)

test_documents = test_set['texts']
test_labels = test_set['labels']

Example evaluation with logistic regression classifiers and LSTMs.
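
The evaluation cell that follows applies `tf.sigmoid` to the model output before thresholding at 0.5, which is appropriate only if the model returns a raw logit. As a minimal plain-Python sketch of why this assumption matters (the helper `sigmoid` and the sample values are illustrative and not taken from the notebook), compare thresholding after one sigmoid with thresholding after the sigmoid has been applied twice:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative raw logits, as a model without a final sigmoid would return them
logits = [-3.0, -0.5, 0.0, 0.5, 3.0]

# Case 1: the model outputs logits, so one sigmoid gives proper probabilities
probs = [sigmoid(z) for z in logits]
labels_from_logits = [1 if p > 0.5 else 0 for p in probs]

# Case 2: the outputs were already probabilities and sigmoid is applied again;
# sigmoid of any value greater than 0 lies above 0.5, so every label becomes 1
double_probs = [sigmoid(p) for p in probs]
labels_after_double_sigmoid = [1 if p > 0.5 else 0 for p in double_probs]

print("one sigmoid: ", labels_from_logits)
print("two sigmoids:", labels_after_double_sigmoid)
```

It is therefore worth checking whether the last layer of the model being evaluated already contains a sigmoid activation before reusing the loop for another model.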

In [None]:
from sklearn.metrics import classification_report
from tqdm import tqdm

# List to store predicted probabilities
predictions = []

# Make a prediction for each test document
for text in tqdm(test_documents):
    # Wrap the raw text in a string tensor with batch size 1
    tokenized_input = tf.constant([text])

    # Forward pass using the predict method
    predicted_prob = binary_classifier_1.predict(tokenized_input, verbose=0)

    # Apply sigmoid to turn the output into a probability and extract it as a Python float
    # (this assumes the model returns a single raw logit rather than a probability)
    predicted_prob = tf.sigmoid(predicted_prob).numpy().squeeze().item()

    # Store the predicted probability
    predictions.append(predicted_prob)

# Turn predicted probabilities into binary class labels using a 0.5 threshold
binary_predictions = [1 if pred > 0.5 else 0 for pred in predictions]

# Evaluate model by comparing its prediction to the gold labels
report = classification_report(y_true=test_labels, y_pred=binary_predictions)
print(report)

By examining the report for the test data, we observe that the classifier performs much better on the positive cases (label 1, a violation was found) than on the negative cases (label 0, no violation). Specifically, the positive class has a recall of 1.0, whereas the negative class has a recall of only 0.03. The recall of a class is the proportion of cases truly belonging to that class that the classifier assigns to it; for the positive class it is also known as the True Positive Rate (TPR). This means that while the classifier recognizes all positive cases, it correctly identifies only 3% of the negative cases.
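
To make the per-class recall concrete, here is a minimal sketch that computes it by hand on a toy example. The helper `per_class_recall` and the label lists are made up for illustration and are unrelated to the ECHR test set; in practice `classification_report` reports the same quantity.

```python
def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted members of a class / all true members."""
    recalls = {}
    for cls in sorted(set(y_true)):
        n_members = sum(1 for t in y_true if t == cls)
        n_correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls[cls] = n_correct / n_members
    return recalls

# Toy data: 6 positive and 4 negative cases (illustrative only)
toy_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# A classifier that almost always predicts the positive class
toy_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

print(per_class_recall(toy_true, toy_pred))  # {0: 0.25, 1: 1.0}
```

A classifier that predicts the positive class for nearly every input reaches perfect recall on the positives while missing most negatives, which is the pattern seen in the report.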