# DL Models Demo Notebook <a class="anchor" id="top"></a>

**Simon Hall**

The aim of this project is to perform a systematic investigation of a number of Deep Learning methods in the context of text processing, and to benchmark these methods against classical methods. The task uses a dataset of articles extracted from *The Guardian* newspaper, and aims to predict the broader section from which an article was taken.

## Contents:
* [1. Download Processed Data](#first-bullet)
    * [i. Download X_train Data](#xtrain-bullet)
    * [ii. Download X_val Data](#xval-bullet)
    * [iii. Download X_test Data](#xtest-bullet)
    * [iv. Download y_train Data](#ytrain-bullet)
    * [v. Download y_val Data](#yval-bullet)
    * [vi. Download y_test Data](#ytest-bullet)
* [2. Download and Run Models](#second-bullet)
    * [i. Download On-The-Fly CNN-LSTM Model](#cnn-bullet)
    * [ii. Download Pre-Trained CNN-LSTM Model](#pte-bullet)

### 1. Download Processed Data <a class="anchor" id="first-bullet"></a>

[TOP ↑](#top)

First, we download the processed training, validation, and test splits from Google Drive using `gdown`, then load each file into a NumPy array.

In [None]:
import numpy as np
import pandas as pd

In [None]:
#!pip install gdown
import gdown

**i. Download X_train Data** <a class="anchor" id="xtrain-bullet"></a>

[TOP ↑](#top)

In [None]:
#X_train
url = 'https://drive.google.com/uc?id=15gi8uu8W4MA-76HIjWMj3HqX8XMari8H'
output = 'X_train.txt'
gdown.download(url, output, quiet=False)

In [None]:
import numpy as np
X_train = np.genfromtxt("X_train.txt")

In [None]:
print(X_train)
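The same download pattern is used for every file in this section, so it could also be written as a loop. The following is a minimal sketch, not part of the original workflow: `DRIVE_FILES`, `drive_url`, and `download_all` are hypothetical names, the file IDs are copied from the download cells in this notebook, and the `downloader` argument is an assumption added so the loop can be exercised without network access.

```python
# File IDs copied from the download cells in this notebook.
DRIVE_FILES = {
    "X_train.txt": "15gi8uu8W4MA-76HIjWMj3HqX8XMari8H",
    "X_val.txt": "1RwIpsyjaKzaFOqerpOZ9M4GLK9rBpC4F",
    "X_test.txt": "1TX70ky8y7DrY_Wu2jOJfbRSjlMzx4G33",
    "y_train.txt": "1sczC8DInj7bRStnscIGxkIGqEG6TN_SK",
    "y_val.txt": "1t8roXY2l7qpwTcHNP37jKF2ZS4C54GRu",
    "y_test.txt": "1GR5qOLJO_vmuP53XnwiLfvpTeaxnMlOM",
}


def drive_url(file_id):
    """Build the download URL in the form used throughout this notebook."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_all(files, downloader=None):
    """Download every (filename -> file ID) entry and return the filenames.

    `downloader` is a hypothetical hook; by default gdown.download is used.
    """
    if downloader is None:
        import gdown  # imported lazily so the helper can be tested offline
        downloader = gdown.download
    fetched = []
    for filename, file_id in files.items():
        downloader(drive_url(file_id), filename, quiet=False)
        fetched.append(filename)
    return fetched
```

Calling `download_all(DRIVE_FILES)` would fetch all six files in one cell; each can then be loaded with `np.genfromtxt` as shown above.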

**ii. Download X_val Data** <a class="anchor" id="xval-bullet"></a>

[TOP ↑](#top)

In [None]:
#X_val

url = 'https://drive.google.com/uc?id=1RwIpsyjaKzaFOqerpOZ9M4GLK9rBpC4F'
output = 'X_val.txt'
gdown.download(url, output, quiet=False)

In [None]:
import numpy as np
X_val = np.genfromtxt("X_val.txt")

In [None]:
print(X_val)

**iii. Download X_test Data** <a class="anchor" id="xtest-bullet"></a>

[TOP ↑](#top)

In [None]:
#X_test

url = 'https://drive.google.com/uc?id=1TX70ky8y7DrY_Wu2jOJfbRSjlMzx4G33'
output = 'X_test.txt'
gdown.download(url, output, quiet=False)

In [None]:
import numpy as np
X_test = np.genfromtxt("X_test.txt")

In [None]:
print(X_test)

**iv. Download y_train Data** <a class="anchor" id="ytrain-bullet"></a>

[TOP ↑](#top)

In [None]:
#y_train

url = 'https://drive.google.com/uc?id=1sczC8DInj7bRStnscIGxkIGqEG6TN_SK'
output = 'y_train.txt'
gdown.download(url, output, quiet=False)

In [None]:
import numpy as np
y_train = np.genfromtxt("y_train.txt")

In [None]:
print(y_train)

**v. Download y_val Data** <a class="anchor" id="yval-bullet"></a>

[TOP ↑](#top)

In [None]:
#y_val

url = 'https://drive.google.com/uc?id=1t8roXY2l7qpwTcHNP37jKF2ZS4C54GRu'
output = 'y_val.txt'
gdown.download(url, output, quiet=False)

In [None]:
import numpy as np
y_val = np.genfromtxt("y_val.txt")

In [None]:
print(y_val)

**vi. Download y_test Data** <a class="anchor" id="ytest-bullet"></a>

[TOP ↑](#top)

In [None]:
#y_test

url = 'https://drive.google.com/uc?id=1GR5qOLJO_vmuP53XnwiLfvpTeaxnMlOM'
output = 'y_test.txt'
gdown.download(url, output, quiet=False)

In [None]:
import numpy as np
y_test = np.genfromtxt("y_test.txt")

In [None]:
print(y_test)
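Before passing the arrays to a model, it may help to confirm that each feature array lines up with its labels. The sketch below assumes the labels are one-hot encoded, as the `*_onehot` variable names in the appendix suggest; `check_split` is a hypothetical helper, and the demo arrays are synthetic, not the real data.

```python
import numpy as np


def check_split(X, y):
    """Return (n_samples, n_classes) after basic consistency checks."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"row mismatch: X has {X.shape[0]}, y has {y.shape[0]}")
    if y.ndim != 2:
        raise ValueError("expected one-hot labels of shape (n_samples, n_classes)")
    # Each one-hot row should contain only 0s and 1s and sum to exactly 1.
    if not (np.isin(y, [0, 1]).all() and np.allclose(y.sum(axis=1), 1)):
        raise ValueError("labels do not look one-hot encoded")
    return X.shape[0], y.shape[1]


# Tiny synthetic example (assumed shapes, not the real dataset).
X_demo = np.zeros((3, 5))
y_demo = np.eye(3)
print(check_split(X_demo, y_demo))  # (3, 3)
```

With the real data, the same check would be run as `check_split(X_train, y_train)`, and likewise for the validation and test splits.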

### 2. Download and Run Models <a class="anchor" id="second-bullet"></a>

[TOP ↑](#top)

**i. Download On-The-Fly Embedding CNN-LSTM Model** <a class="anchor" id="cnn-bullet"></a>

[TOP ↑](#top)

In [None]:
#CNN-LSTM Model with on-the-fly embeddings

url = 'https://drive.google.com/uc?id=1x550rag7FHHMu_wThfl19rCiMpQSUGHJ'
output = 'cnn_lstm_model.h5'
gdown.download(url, output, quiet=False)

In [None]:
from tensorflow import keras

# Load the trained model (embeddings learned on the fly)
cnn_lstm_model = keras.models.load_model('cnn_lstm_model.h5')

Now we can evaluate the model on the test set and print its loss and accuracy:

In [None]:
loss, accuracy = cnn_lstm_model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")
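To make the reported accuracy concrete, the sketch below shows how a categorical accuracy can be computed by hand from one-hot labels and predicted class probabilities. It assumes the model outputs one probability per class and was compiled with a single accuracy metric; `categorical_accuracy` is a hypothetical helper, and the arrays are synthetic stand-ins for `y_test` and the output of `cnn_lstm_model.predict(X_test)`.

```python
import numpy as np


def categorical_accuracy(y_true_onehot, y_prob):
    """Fraction of rows where the highest-probability class matches the label."""
    true_cls = np.argmax(y_true_onehot, axis=1)
    pred_cls = np.argmax(y_prob, axis=1)
    return float(np.mean(true_cls == pred_cls))


# Synthetic example: four articles, three sections.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],  # misclassified: true class is 2, predicted class is 1
    [0.2, 0.5, 0.3],
])
print(categorical_accuracy(y_true, y_prob))  # 0.75
```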

**ii. Download Pre-Trained Embedding CNN-LSTM Model** <a class="anchor" id="pte-bullet"></a>

[TOP ↑](#top)

In [None]:
#CNN-LSTM Model with pre-trained embeddings

url = 'https://drive.google.com/uc?id=1H_4tSFHVNFCgFlHvouPLbSlcZdXQCrcU'
output = 'pte_cnn_lstm_model.h5'
gdown.download(url, output, quiet=False)

In [None]:
from tensorflow import keras

# Load the trained model (built on pre-trained embeddings)
pte_cnn_lstm_model = keras.models.load_model('pte_cnn_lstm_model.h5')

Now we can evaluate the model on the test set and print its loss and accuracy:

In [None]:
loss, accuracy = pte_cnn_lstm_model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")
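A single accuracy figure hides which sections are confused with one another. As an optional follow-up, a confusion matrix could be built from the predictions. This is a minimal sketch under the same assumptions as before (one-hot labels, one predicted probability per class); `confusion_matrix` here is a hypothetical NumPy helper and the data are synthetic.

```python
import numpy as np


def confusion_matrix(y_true_onehot, y_prob):
    """Count matrix with true classes as rows and predicted classes as columns."""
    y_true_onehot = np.asarray(y_true_onehot)
    n_classes = y_true_onehot.shape[1]
    true_cls = np.argmax(y_true_onehot, axis=1)
    pred_cls = np.argmax(np.asarray(y_prob), axis=1)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (true_cls, pred_cls), 1)
    return cm


# Synthetic example: four articles, three sections.
y_true_demo = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob_demo = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
])
cm_demo = confusion_matrix(y_true_demo, y_prob_demo)
print(cm_demo)
```

With the real models, `y_prob_demo` would be replaced by the output of `pte_cnn_lstm_model.predict(X_test)` and `y_true_demo` by `y_test`.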

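**Appendix: Generate Data**

The cell below, left commented out, saves the processed arrays to the text files that are downloaded in Section 1.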

In [None]:
#import numpy as np

#np.savetxt('X_train.txt', X_train)
#np.savetxt('X_test.txt', X_test)
#np.savetxt('X_val.txt', X_val)

#np.savetxt('y_train.txt', y_train_onehot)
#np.savetxt('y_test.txt', y_test_onehot)
#np.savetxt('y_val.txt', y_val_onehot)
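The save and load steps used in this notebook can be checked with a small round trip. The sketch below uses a synthetic one-hot array (`y_demo_onehot` is a hypothetical stand-in, not the real labels) and a temporary directory so that no existing file is overwritten. With default arguments, `np.savetxt` writes floating-point text and `np.genfromtxt` reads it back as `float64`, so integer labels come back as floats.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for a one-hot label array.
y_demo_onehot = np.eye(3, dtype=int)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "y_demo.txt")
    np.savetxt(path, y_demo_onehot)   # written with the default float format
    y_loaded = np.genfromtxt(path)    # read back as float64

print(y_loaded.dtype, y_loaded.shape)           # float64 (3, 3)
print(np.array_equal(y_loaded, y_demo_onehot))  # True
```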