


# **Rob Boswell**

# **Portfolio Project:** "Sequential RNN Text Classification Models for Twitter COVID-19 Misinformation Detection"

# **Models Originally Created:** Apr 18, 2021

---



### **Data Source:**  Shahi, Gautam Kishore, Anne Dirkson, and Tim A. Majchrzak. "An exploratory study of covid-19 misinformation on twitter." Online Social Networks and Media 22 (2021): 100104.

### Using the Twitter dataset from Shahi et al. (2021), which was created in the midst of the COVID-19 pandemic, I built deep learning models for sequential data that use embedding layers, Long Short-Term Memory (LSTM) layers, Gated Recurrent Unit (GRU) layers, bidirectional recurrent layers, layer stacking, and Conv1D layers to classify tweets discussing COVID-19. The goal was to predict which tweets contained accurate information about COVID-19 and which contained inaccurate or misleading information. There were 8,560 tweets in the dataset, with 52% categorized as “real” and 48% categorized as “fake.”

<br>

### After training my deep learning models on these authentic tweets, I wrote artificial tweets, some stating true facts about COVID-19 and some stating falsehoods, on which to evaluate the models.

---

<br>

### **<u>1st Model:</u>**

### Embedding layer with 32 attributes

### 1 LSTM layer with 32 neurons

### Batch Size = 32

<br>

### *Test set metrics:*

### Accuracy: 0.9336

### F1-score: 0.9364

### Precision: 0.9397

### Recall: 0.9330

### ROC-AUC score: 0.9839

<br>

### **<u>2nd Model:</u>**

### Embedding layer with 16 attributes

### 3 LSTM layers with 32 neurons each, and dropout (0.3) and recurrent dropout (0.3) for each layer

### The LSTM layers are bidirectional

### Batch Size = 16

<br>

### *Test set metrics:*

### Accuracy: 0.9336

### F1-score: 0.9374

### Precision: 0.9252

### Recall: 0.9500

### ROC-AUC score: 0.9847



<br>

### **<u>3rd Model:</u>**

### Embedding layer with 150 attributes

### Conv1D - 32 filters (kernel size 7)

### Average Pooling 1D (pool size 5)

### 3 GRU layers with 128 neurons each, and dropout (0.3) and recurrent dropout (0.3) for each layer

### Batch Size = 50

<br>

### *Test set metrics:*

### Accuracy: 0.9089

### F1-score: 0.9140

### Precision: 0.9032

### Recall: 0.9250

### ROC-AUC score: 0.9676

<br>

### **<u>4th Model:</u>**

### Embedding layer with 50 attributes

### Conv1D - 60 filters (kernel size 5)

### Average Pooling 1D (pool size 3)

### 2 GRU layers with 128 neurons each, and dropout (0.2) and recurrent dropout (0.2) for each layer

### Batch Size = 20

<br>

### *Test set metrics:*

### Accuracy: 0.9238

### F1-score: 0.9275

### Precision: 0.9246

### Recall: 0.9304

### ROC-AUC score: 0.9767

<br>

### **<u>5th Model:</u>**

### Embedding layer with 26 attributes

### 2 LSTM layers with 128 neurons each, and dropout (0.15) and recurrent dropout (0.15) for each layer

### 2 GRU layers with 32 neurons each, and dropout (0.15) and recurrent dropout (0.15) for each layer

### The LSTM and GRU layers are bidirectional

### Batch Size = 32

<br>

### *Test set metrics:*

### Accuracy: 0.9379

### F1-score: 0.9411

### Precision: 0.9333

### Recall: 0.9491

### ROC-AUC score: 0.9860

<br>

### **<u>Best Model:</u>**

### As can be seen, my best model in terms of overall test set metrics appears to be the fifth model. It achieved the best accuracy, the best F1-score, the second-best precision, the second-best recall, and the best ROC-AUC score. What sets it apart is that it stacks two LSTM layers followed by two GRU layers, rather than simply stacking layers of the same type. All of these layers are bidirectional. Further, the model’s embedding layer is fairly small, with 26 attributes.
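
### One quick way to check these rankings is to sort the five models on each reported metric. The sketch below copies the test set metrics from the model summaries above into a dictionary; the names `metrics` and `rank_models` are illustrative and are not part of the original notebook.

```python
# Test set metrics copied from the five model summaries above.
metrics = {
    "Model 1": {"accuracy": 0.9336, "f1": 0.9364, "precision": 0.9397, "recall": 0.9330, "roc_auc": 0.9839},
    "Model 2": {"accuracy": 0.9336, "f1": 0.9374, "precision": 0.9252, "recall": 0.9500, "roc_auc": 0.9847},
    "Model 3": {"accuracy": 0.9089, "f1": 0.9140, "precision": 0.9032, "recall": 0.9250, "roc_auc": 0.9676},
    "Model 4": {"accuracy": 0.9238, "f1": 0.9275, "precision": 0.9246, "recall": 0.9304, "roc_auc": 0.9767},
    "Model 5": {"accuracy": 0.9379, "f1": 0.9411, "precision": 0.9333, "recall": 0.9491, "roc_auc": 0.9860},
}


def rank_models(metrics):
    """Return {metric name: [model names ordered from best to worst]}."""
    metric_names = next(iter(metrics.values())).keys()
    return {
        name: sorted(metrics, key=lambda model: metrics[model][name], reverse=True)
        for name in metric_names
    }


rankings = rank_models(metrics)
for name, order in rankings.items():
    position = order.index("Model 5") + 1
    print(f"{name:>9}: Model 5 ranks #{position}; best is {order[0]}")
```

### With these numbers, Model 5 ranks first on accuracy, F1-score, and ROC-AUC, and second on precision (behind Model 1) and recall (behind Model 2).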

---



## **The code below shows a summary of some real and fake COVID-19 tweets in the training set:**

In [None]:
# Source: Fighting an Infodemic: COVID-19 Fake News Dataset, https://github.com/diptamath/covid_fake_news, https://arxiv.org/abs/2011.03327

import pandas as pd
trainingdata=pd.read_csv("https://raw.githubusercontent.com/diptamath/covid_fake_news/main/data/Constraint_Train.csv", usecols = ['tweet','label'])
testdata=pd.read_csv("https://raw.githubusercontent.com/diptamath/covid_fake_news/main/data/english_test_with_labels.csv", usecols = ['tweet','label'])

trainingdata

| | tweet | label |
|---|---|---|
| 0 | The CDC currently reports 99031 deaths. In gen... | real |
| 1 | States reported 1121 deaths a small rise from ... | real |
| 2 | Politically Correct Woman (Almost) Uses Pandem... | fake |
| 3 | #IndiaFightsCorona: We have 1524 #COVID testin... | real |
| 4 | Populous states can generate large case counts... | real |
| ... | ... | ... |
| 6415 | A tiger tested positive for COVID-19 please st... | fake |
| 6416 | ???Autopsies prove that COVID-19 is??� a blood... | fake |
| 6417 | _A post claims a COVID-19 vaccine has already ... | fake |
| 6418 | Aamir Khan Donate 250 Cr. In PM Relief Cares Fund | fake |



---

## **Discussion of the dataset in general terms, and why building a predictive model using this data might be practically useful:**

### The dataset contains 8,560 tweets that have previously been labelled as either "real" (i.e., true) or "fake" (i.e., incorrect/misleading). As seen from the results of the code below, there are 3,360 "real" tweets in the training set, and 1,120 in the test set. Further, there are 3,060 "fake" tweets in the training set, and 1,020 in the test set. Thus, 52% of the data are real tweets and 48% are fake tweets.
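
### The percentages above follow directly from the four class counts. As a small arithmetic check (the variable names are illustrative), the same counts also give the accuracy that a trivial "always predict real" baseline would reach on the test set, which is a useful reference point for the model accuracies reported in this project:

```python
# Class counts reported for the training and test sets.
counts = {
    "train": {"real": 3360, "fake": 3060},
    "test": {"real": 1120, "fake": 1020},
}

total = sum(sum(split.values()) for split in counts.values())
real_total = sum(split["real"] for split in counts.values())
fake_total = sum(split["fake"] for split in counts.values())

real_share = real_total / total
fake_share = fake_total / total

# Accuracy of a baseline that labels every test tweet as "real".
test_size = sum(counts["test"].values())
baseline_accuracy = counts["test"]["real"] / test_size

print(f"total tweets: {total}")
print(f"real: {real_share:.1%}, fake: {fake_share:.1%}")
print(f"always-'real' baseline accuracy on the test set: {baseline_accuracy:.1%}")
```

### The baseline lands at roughly 52%, so the 90%+ accuracies of the models reflect learned signal rather than class imbalance.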

<br>

### Building a highly accurate and predictively strong model based on this data could be very beneficial for helping Twitter (now X) to identify and remove tweets that are misleading/false regarding COVID-19, and thus pose a strong potential health threat to viewers and those to whom they may communicate misinformation learned from Twitter.

<br>

### Using such a model to monitor when spikes in fake tweets are occurring could also help health professionals on Twitter know when to move quickly to counter misinformation by spreading accurate tweets, in the hope that far more Twitter users will be exposed to correct information about COVID-19 than to fake information. Thus, Twitter (now X), the general population of Twitter users, and those with whom Twitter users share information about COVID-19 would stand to benefit from such a model.

---


In [None]:
print(len(trainingdata[trainingdata['label'] == 'real']))
print(len(testdata[testdata['label'] == 'real']))


3360
1120


In [None]:
print(len(trainingdata[trainingdata['label'] == 'fake']))
print(len(testdata[testdata['label'] == 'fake']))

3060
1020



<br>

## Define Preprocessor & Prepare Train and Test Data

In [None]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, Flatten, LSTM, GRU, Bidirectional
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers import RMSprop
from sklearn.metrics import f1_score, roc_auc_score, precision_score, recall_score, accuracy_score
import tensorflow as tf
import numpy as np

# Tokenization and padding
tokenizer = Tokenizer(num_words=25000)
tokenizer.fit_on_texts(trainingdata.tweet)

def preprocessor(data, maxlen):
    sequences = tokenizer.texts_to_sequences(data)
    X = pad_sequences(sequences, maxlen=maxlen)
    return X

X_train = preprocessor(trainingdata.tweet, maxlen=45)
X_test = preprocessor(testdata.tweet, maxlen=45)

# One-hot encode labels
y_train = pd.get_dummies(trainingdata.label)
y_test = pd.get_dummies(testdata.label)

In [None]:
print(X_train.shape)
print(X_test.shape)

(6420, 45)
(2140, 45)



<br>

### By running the following code, we can confirm the 'fake' class takes the 0 (first) index position and that the 'real' class takes the 1 (second) index position. This will be helpful when we make actual predictions on unseen data near the end of this project.

In [None]:
y_train.columns

Index(['fake', 'real'], dtype='object')
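
### The column order matters because `argmax` on a prediction row returns a column index, not a label. `pd.get_dummies` orders its columns by sorted unique value, which is why 'fake' comes before 'real'. Below is a minimal pure-Python sketch of the same mapping; the toy labels and the names `classes`, `class_to_index`, and `decode` are illustrative and are not part of the notebook.

```python
# Toy labels for illustration only (not taken from the dataset).
labels = ["real", "real", "fake", "real", "fake"]

# Mimic get_dummies: one column per class, in sorted order.
classes = sorted(set(labels))
class_to_index = {name: i for i, name in enumerate(classes)}
one_hot = [
    [1 if class_to_index[label] == i else 0 for i in range(len(classes))]
    for label in labels
]


def decode(probabilities):
    """Map a [p_fake, p_real] row back to a class name via argmax."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return classes[best]


print(classes)
print(one_hot[0])
print(decode([0.9, 0.1]))
```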

---

<br>

## Model 1:

In [None]:
# Model architecture
with tf.device('/device:GPU:0'):
    model_1 = Sequential()
    model_1.add(Embedding(25000, 32))  # Removed input_length
    model_1.add(LSTM(32, return_sequences=True))
    model_1.add(Flatten())
    model_1.add(Dense(2, activation='softmax'))

    # Callbacks
    mc = ModelCheckpoint('best_model.keras', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)
    red_lr = ReduceLROnPlateau(monitor='val_accuracy', patience=2, verbose=1, factor=0.05)

    # Compile model with updated accuracy metric and optimizer
    model_1.compile(optimizer=RMSprop(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

    # Train model
    history = model_1.fit(X_train, y_train, epochs=20, batch_size=32, callbacks=[mc, red_lr], validation_split=0.2)


Epoch 1/20
156/161 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7570 - loss: 0.4810
Epoch 1: val_accuracy improved from -inf to 0.89642, saving model to best_model.keras
161/161 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.7599 - loss: 0.4766 - val_accuracy: 0.8964 - val_loss: 0.2581 - learning_rate: 0.0010
Epoch 2/20
157/161 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9265 - loss: 0.1773
Epoch 2: val_accuracy improved from 0.89642 to 0.92056, saving model to best_model.keras
161/161 ━━━━━━━━━━━━━━━━━━━━ 3s 11ms/step - accuracy: 0.9267 - loss: 0.1770 - val_accuracy: 0.9206 - val_loss: 0.2074 - learning_rate: 0.0010
Epoch 3/20
159/161 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9580 - loss: 0.1039
Epoch 3: val_accuracy improved from 0.92056 to 0.92445, saving model to best_model.keras
...

In [None]:
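import pickle

# Serialize the trained model; the context manager closes the file handle
with open('model_1.pkl', 'wb') as f:
    pickle.dump(model_1, f)

In [None]:
# Reload the pickled model
with open('model_1.pkl', 'rb') as f:
    model_1 = pickle.load(f)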

In [None]:
# Generate predictions as labels
y_pred = model_1.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]

# Get the true labels
true_labels = y_test.idxmax(axis=1)

# Compute metrics for binary classification
# Specify the positive label explicitly
accuracy = accuracy_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels, pos_label='real')
precision = precision_score(true_labels, predicted_labels, pos_label='real')
recall = recall_score(true_labels, predicted_labels, pos_label='real')

# Indicate that 'real' is the positive class for calculating the ROC-AUC
roc_auc = roc_auc_score((true_labels == 'real').astype(int), model_1.predict(X_test)[:, 1])

# Display the metrics
print("Accuracy: {:.4f}".format(accuracy))
print("F1-score: {:.4f}".format(f1))
print("Precision: {:.4f}".format(precision))
print("Recall: {:.4f}".format(recall))
print("ROC-AUC score: {:.4f}".format(roc_auc))

67/67 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
67/67 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Accuracy: 0.9336
F1-score: 0.9364
Precision: 0.9397
Recall: 0.9330
ROC-AUC score: 0.9839
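
### To make the relationship between these metrics concrete, the sketch below computes them from confusion-matrix counts, treating 'real' as the positive class as in the code above. The four counts are an assumption: they were back-calculated so that they are consistent with the metrics reported for Model 1 and with the 1,120 real and 1,020 fake test tweets. They were not printed by the notebook.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of tweets predicted 'real', share truly real
    recall = tp / (tp + fn)      # of truly real tweets, share predicted 'real'
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}


# Hypothetical counts consistent with Model 1's reported metrics
# (tp + fn = 1120 real test tweets, fp + tn = 1020 fake test tweets).
model_1_metrics = binary_metrics(tp=1045, fp=67, fn=75, tn=953)
rounded = {name: round(value, 4) for name, value in model_1_metrics.items()}

for name, value in rounded.items():
    print(f"{name}: {value:.4f}")
```

### Under these assumed counts, the model would mislabel 75 real tweets as fake and 67 fake tweets as real.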



<br>

## Model 2:

In [None]:
with tf.device('/device:GPU:0'):
  model_2 = Sequential()
  model_2.add(Embedding(25000, 16))
  model_2.add(Bidirectional(LSTM(32, dropout=0.3, recurrent_dropout=0.3, return_sequences=True)))
  model_2.add(Bidirectional(LSTM(32, dropout=0.3, recurrent_dropout=0.3, return_sequences=True)))
  model_2.add(Bidirectional(LSTM(32, dropout=0.3, recurrent_dropout=0.3)))
  model_2.add(Flatten())
  model_2.add(Dense(2, activation='softmax'))

  mc = ModelCheckpoint('best_model.keras', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)
  red_lr= ReduceLROnPlateau(monitor='val_accuracy', patience=2, verbose=1, factor=0.1)

  model_2.compile(optimizer=RMSprop(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

  history = model_2.fit(X_train, y_train,
                      epochs=20,
                      batch_size=16,
                      callbacks=[mc, red_lr],
                      validation_split=0.2)

Epoch 1/20
321/321 ━━━━━━━━━━━━━━━━━━━━ 0s 263ms/step - accuracy: 0.7322 - loss: 0.5049
Epoch 1: val_accuracy improved from -inf to 0.89408, saving model to best_model.keras
321/321 ━━━━━━━━━━━━━━━━━━━━ 100s 288ms/step - accuracy: 0.7325 - loss: 0.5045 - val_accuracy: 0.8941 - val_loss: 0.2578 - learning_rate: 0.0010
Epoch 2/20
321/321 ━━━━━━━━━━━━━━━━━━━━ 0s 254ms/step - accuracy: 0.9215 - loss: 0.2072
Epoch 2: val_accuracy improved from 0.89408 to 0.91355, saving model to best_model.keras
321/321 ━━━━━━━━━━━━━━━━━━━━ 92s 286ms/step - accuracy: 0.9215 - loss: 0.2072 - val_accuracy: 0.9136 - val_loss: 0.2030 - learning_rate: 0.0010
Epoch 3/20
321/321 ━━━━━━━━━━━━━━━━━━━━ 0s 270ms/step - accuracy: 0.9527 - loss: 0.1367
Epoch 3: val_accuracy improved from 0.91355 to 0.92290, saving model to best_model.keras
...

In [None]:
# Serialize, then reload, the trained model using context managers
with open('model_2.pkl', 'wb') as f:
    pickle.dump(model_2, f)

In [None]:
with open('model_2.pkl', 'rb') as f:
    model_2 = pickle.load(f)

In [None]:
# Generate predictions as labels
y_pred = model_2.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]

# Get the true labels
true_labels = y_test.idxmax(axis=1)

# Compute metrics for binary classification
# Specify the positive label explicitly
accuracy = accuracy_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels, pos_label='real')
precision = precision_score(true_labels, predicted_labels, pos_label='real')
recall = recall_score(true_labels, predicted_labels, pos_label='real')

# Indicate that 'real' is the positive class for calculating the ROC-AUC
roc_auc = roc_auc_score((true_labels == 'real').astype(int), model_2.predict(X_test)[:, 1])

# Display the metrics
print("Accuracy: {:.4f}".format(accuracy))
print("F1-score: {:.4f}".format(f1))
print("Precision: {:.4f}".format(precision))
print("Recall: {:.4f}".format(recall))
print("ROC-AUC score: {:.4f}".format(roc_auc))

67/67 ━━━━━━━━━━━━━━━━━━━━ 5s 75ms/step
67/67 ━━━━━━━━━━━━━━━━━━━━ 8s 126ms/step
Accuracy: 0.9336
F1-score: 0.9374
Precision: 0.9252
Recall: 0.9500
ROC-AUC score: 0.9847



<br>

## Model 3:

In [None]:
from tensorflow.keras import layers

with tf.device('/device:GPU:0'):
  model_3 = Sequential()
  model_3.add(layers.Embedding(25000, 150))
  model_3.add(layers.Conv1D(32, 7, activation='relu'))
  model_3.add(layers.AveragePooling1D(5))
  model_3.add(layers.GRU(128, dropout=0.3, recurrent_dropout=0.3, return_sequences=True))
  model_3.add(layers.GRU(128, dropout=0.3, recurrent_dropout=0.3, return_sequences=True))
  model_3.add(layers.GRU(128, dropout=0.3, recurrent_dropout=0.3))
  model_3.add(Flatten())
  model_3.add(layers.Dense(2, activation='softmax'))

  mc = ModelCheckpoint('best_model.keras', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)
  red_lr = ReduceLROnPlateau(monitor='val_accuracy', patience=2, verbose=1, factor=0.05)

  model_3.compile(optimizer=RMSprop(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

  history = model_3.fit(X_train, y_train,
                      epochs=20,
                      batch_size=50,
                      callbacks=[mc,red_lr],
                      validation_split=0.2)

Epoch 1/20
103/103 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step - accuracy: 0.7086 - loss: 0.5588
Epoch 1: val_accuracy improved from -inf to 0.88162, saving model to best_model.keras
103/103 ━━━━━━━━━━━━━━━━━━━━ 20s 71ms/step - accuracy: 0.7093 - loss: 0.5578 - val_accuracy: 0.8816 - val_loss: 0.3193 - learning_rate: 0.0010
Epoch 2/20
103/103 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.9109 - loss: 0.2373
Epoch 2: val_accuracy improved from 0.88162 to 0.90031, saving model to best_model.keras
103/103 ━━━━━━━━━━━━━━━━━━━━ 6s 56ms/step - accuracy: 0.9110 - loss: 0.2371 - val_accuracy: 0.9003 - val_loss: 0.2690 - learning_rate: 0.0010
Epoch 3/20
103/103 ━━━━━━━━━━━━━━━━━━━━ 0s 65ms/step - accuracy: 0.9550 - loss: 0.1210
Epoch 3: val_accuracy did not improve from 0.90031
...

In [None]:
# Serialize, then reload, the trained model using context managers
with open('model_3.pkl', 'wb') as f:
    pickle.dump(model_3, f)

In [None]:
with open('model_3.pkl', 'rb') as f:
    model_3 = pickle.load(f)

In [None]:
# Generate predictions as labels
y_pred = model_3.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]

# Get the true labels
true_labels = y_test.idxmax(axis=1)

# Compute metrics for binary classification
# Specify the positive label explicitly
accuracy = accuracy_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels, pos_label='real')
precision = precision_score(true_labels, predicted_labels, pos_label='real')
recall = recall_score(true_labels, predicted_labels, pos_label='real')

# Indicate that 'real' is the positive class for calculating the ROC-AUC
roc_auc = roc_auc_score((true_labels == 'real').astype(int), model_3.predict(X_test)[:, 1])

# Display the metrics
print("Accuracy: {:.4f}".format(accuracy))
print("F1-score: {:.4f}".format(f1))
print("Precision: {:.4f}".format(precision))
print("Recall: {:.4f}".format(recall))
print("ROC-AUC score: {:.4f}".format(roc_auc))

67/67 ━━━━━━━━━━━━━━━━━━━━ 2s 18ms/step
67/67 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step
Accuracy: 0.9089
F1-score: 0.9140
Precision: 0.9032
Recall: 0.9250
ROC-AUC score: 0.9676



<br>

## Model 4:

In [None]:
from tensorflow.keras import layers

with tf.device('/device:GPU:0'):
  model_4 = Sequential()
  model_4.add(layers.Embedding(25000, 50))
  model_4.add(layers.Conv1D(60, 5, activation='relu'))
  model_4.add(layers.AveragePooling1D(3))
  model_4.add(layers.GRU(128, dropout=0.2, recurrent_dropout=0.2, return_sequences=True))
  model_4.add(layers.GRU(128, dropout=0.2, recurrent_dropout=0.2))
  model_4.add(Flatten())
  model_4.add(layers.Dense(2, activation='softmax'))

  mc = ModelCheckpoint('best_model.keras', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)
  red_lr = ReduceLROnPlateau(monitor='val_accuracy', patience=2, verbose=1, factor=0.05)

  model_4.compile(optimizer=RMSprop(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

  history = model_4.fit(X_train, y_train,
                      epochs=20,
                      batch_size=20,
                      callbacks=[mc,red_lr],
                      validation_split=0.2)

Epoch 1/20
257/257 ━━━━━━━━━━━━━━━━━━━━ 0s 67ms/step - accuracy: 0.7807 - loss: 0.4483
Epoch 1: val_accuracy improved from -inf to 0.90810, saving model to best_model.keras
257/257 ━━━━━━━━━━━━━━━━━━━━ 22s 72ms/step - accuracy: 0.7810 - loss: 0.4479 - val_accuracy: 0.9081 - val_loss: 0.2685 - learning_rate: 0.0010
Epoch 2/20
257/257 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step - accuracy: 0.9507 - loss: 0.1416
Epoch 2: val_accuracy improved from 0.90810 to 0.91978, saving model to best_model.keras
257/257 ━━━━━━━━━━━━━━━━━━━━ 19s 66ms/step - accuracy: 0.9507 - loss: 0.1416 - val_accuracy: 0.9198 - val_loss: 0.2090 - learning_rate: 0.0010
Epoch 3/20
257/257 ━━━━━━━━━━━━━━━━━━━━ 0s 62ms/step - accuracy: 0.9732 - loss: 0.0707
Epoch 3: val_accuracy did not improve from 0.91978
...

In [None]:
# Serialize, then reload, the trained model using context managers
with open('model_4.pkl', 'wb') as f:
    pickle.dump(model_4, f)

In [None]:
with open('model_4.pkl', 'rb') as f:
    model_4 = pickle.load(f)

In [None]:
# Generate predictions as labels
y_pred = model_4.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]

# Get the true labels
true_labels = y_test.idxmax(axis=1)

# Compute metrics for binary classification
# Specify the positive label explicitly
accuracy = accuracy_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels, pos_label='real')
precision = precision_score(true_labels, predicted_labels, pos_label='real')
recall = recall_score(true_labels, predicted_labels, pos_label='real')

# Indicate that 'real' is the positive class for calculating the ROC-AUC
roc_auc = roc_auc_score((true_labels == 'real').astype(int), model_4.predict(X_test)[:, 1])

# Display the metrics
print("Accuracy: {:.4f}".format(accuracy))
print("F1-score: {:.4f}".format(f1))
print("Precision: {:.4f}".format(precision))
print("Recall: {:.4f}".format(recall))
print("ROC-AUC score: {:.4f}".format(roc_auc))

67/67 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step
67/67 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step
Accuracy: 0.9238
F1-score: 0.9275
Precision: 0.9246
Recall: 0.9304
ROC-AUC score: 0.9767



<br>

## Model 5:

In [None]:
with tf.device('/device:GPU:0'):

  model_5 = Sequential()
  model_5.add(Embedding(25000, 26))
  model_5.add(Bidirectional(LSTM(128, dropout=0.15, recurrent_dropout=0.15, return_sequences=True)))
  model_5.add(Bidirectional(LSTM(128, dropout=0.15, recurrent_dropout=0.15, return_sequences=True)))
  model_5.add(Bidirectional(GRU(32, dropout=0.15, recurrent_dropout=0.15, return_sequences=True)))
  model_5.add(Bidirectional(GRU(32, dropout=0.15, recurrent_dropout=0.15)))
  model_5.add(Flatten())
  model_5.add(Dense(2, activation='softmax'))

  mc = ModelCheckpoint('best_model.keras', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)
  red_lr = ReduceLROnPlateau(monitor='val_accuracy', patience=2, verbose=1, factor=0.05)

  model_5.compile(optimizer=RMSprop(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

  history = model_5.fit(X_train, y_train,
                      epochs=20,
                      batch_size=32,
                      callbacks=[mc,red_lr],
                      validation_split=0.2)

Epoch 1/20
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 659ms/step - accuracy: 0.7368 - loss: 0.5382
Epoch 1: val_accuracy improved from -inf to 0.90732, saving model to best_model.keras
161/161 ━━━━━━━━━━━━━━━━━━━━ 126s 703ms/step - accuracy: 0.7373 - loss: 0.5383 - val_accuracy: 0.9073 - val_loss: 0.2458 - learning_rate: 0.0010
Epoch 2/20
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 625ms/step - accuracy: 0.9321 - loss: 0.1991
Epoch 2: val_accuracy improved from 0.90732 to 0.91667, saving model to best_model.keras
161/161 ━━━━━━━━━━━━━━━━━━━━ 134s 652ms/step - accuracy: 0.9321 - loss: 0.1993 - val_accuracy: 0.9167 - val_loss: 0.2195 - learning_rate: 0.0010
Epoch 3/20
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 615ms/step - accuracy: 0.9535 - loss: 0.1494
Epoch 3: val_accuracy did not improve from 0.91667
...

In [None]:
# Serialize, then reload, the trained model using context managers
with open('model_5.pkl', 'wb') as f:
    pickle.dump(model_5, f)

In [None]:
with open('model_5.pkl', 'rb') as f:
    model_5 = pickle.load(f)

In [None]:
from sklearn.metrics import f1_score, roc_auc_score, precision_score, recall_score, accuracy_score

# Generate predictions as labels
y_pred = model_5.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]

# Get the true labels
true_labels = y_test.idxmax(axis=1)

# Compute metrics for binary classification
# Specify the positive label explicitly
accuracy = accuracy_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels, pos_label='real')
precision = precision_score(true_labels, predicted_labels, pos_label='real')
recall = recall_score(true_labels, predicted_labels, pos_label='real')

# Indicate that 'real' is the positive class for calculating the ROC-AUC
roc_auc = roc_auc_score((true_labels == 'real').astype(int), model_5.predict(X_test)[:, 1])

# Display the metrics
print("Accuracy: {:.4f}".format(accuracy))
print("F1-score: {:.4f}".format(f1))
print("Precision: {:.4f}".format(precision))
print("Recall: {:.4f}".format(recall))
print("ROC-AUC score: {:.4f}".format(roc_auc))

67/67 ━━━━━━━━━━━━━━━━━━━━ 11s 142ms/step
67/67 ━━━━━━━━━━━━━━━━━━━━ 7s 106ms/step
Accuracy: 0.9379
F1-score: 0.9411
Precision: 0.9333
Recall: 0.9491
ROC-AUC score: 0.9860


---

## **Discussion of the best and worst performing models:**

### All of my models use a Keras tokenizer with the num_words argument set to 25000, which restricts the vocabulary considered by the models to roughly the 25,000 most frequently used words in the training set. All of my models also use a maxlen value of 45 in the pad_sequences call inside the preprocessor function. Since Keras models expect every sequence in a batch to have the same length, a maxlen of 45 means that token sequences longer than 45 are truncated to 45 tokens, and token sequences shorter than 45 are padded with zeros. With the default pad_sequences settings used here, both truncation and padding are applied at the start of the sequence.
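
### The effect of maxlen can be illustrated without Keras. The sketch below is a pure-Python imitation of the default `pad_sequences` behavior (`padding='pre'`, `truncating='pre'`); the helper name `pad_or_truncate` and the toy token ids are illustrative and do not come from the notebook.

```python
def pad_or_truncate(sequence, maxlen, value=0):
    """Imitate pad_sequences defaults: pad and truncate at the start."""
    if len(sequence) >= maxlen:
        return sequence[-maxlen:]  # drop the earliest tokens, keep the last maxlen
    return [value] * (maxlen - len(sequence)) + sequence  # left-pad with zeros


# Toy token-id sequences; maxlen=5 keeps the example readable (the notebook uses 45).
toy_sequences = [[3, 8, 2], [1, 2, 3, 4, 5, 6, 7]]
padded = [pad_or_truncate(seq, maxlen=5) for seq in toy_sequences]

for original, result in zip(toy_sequences, padded):
    print(original, "->", result)
```

### The short sequence gains two leading zeros, while the long one loses its first two tokens, so for long tweets it is the end of the text that the model sees.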

<br>

### My 5th model performed the best overall on the test set (93.79% accuracy, 0.9411 F1-score, 0.9333 precision, 0.9491 recall, and 0.9860 ROC-AUC score). This model combined two bidirectional LSTM layers (each having 128 neurons) with two bidirectional GRU layers (each having 32 neurons). All of my models used embeddings; in my 5th model, the embedding layer contained 26 attributes. I also used dropout (0.15) and recurrent dropout (0.15) to try to reduce overfitting. It is possible that the lower dropout rate, compared to the other models that used dropout (Models 2 through 4), contributed to the higher test set performance, although normally the opposite might be expected.

<br>

### Model 3 achieved the lowest test set metrics in every category compared to all other models. It used Conv1D (32 filters with a kernel size of 7) and 1D average pooling with a pool size of 5 to reduce the size of the input passed to the stacked recurrent layers. I then stacked 3 GRU layers back to back, each having 128 neurons, dropout (0.30), and recurrent dropout (0.30). It is notable that this combination of a deep stack, wide layers, and a comparatively high dropout rate did not result in better validation accuracy or better test set performance. This may suggest that simpler models perform better on this dataset. Bigger is not always better.
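
### To see how much the Conv1D and average pooling layers shorten the sequence that reaches the GRU layers, the sketch below applies the standard output-length formulas, assuming the Keras defaults used in the model code ('valid' padding, convolution stride of 1, pooling stride equal to the pool size) and the training maxlen of 45. The helper names are illustrative.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Timesteps left after a Conv1D layer with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def pool1d_output_length(length, pool_size):
    """Timesteps left after 1D pooling when the stride equals pool_size."""
    return (length - pool_size) // pool_size + 1


MAXLEN = 45  # padded tweet length used for training in this project

# (kernel_size, pool_size) for the two convolutional models
configs = {"Model 3": (7, 5), "Model 4": (5, 3)}

timesteps = {}
for name, (kernel_size, pool_size) in configs.items():
    after_conv = conv1d_output_length(MAXLEN, kernel_size)
    timesteps[name] = pool1d_output_length(after_conv, pool_size)
    print(f"{name}: {MAXLEN} -> {after_conv} -> {timesteps[name]} timesteps")
```

### Under these assumptions, the GRU layers in Model 3 see only 7 timesteps per tweet, compared with 13 in Model 4 and the full 45 in the models without convolution, which may be one reason Model 3 trails the others.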


---

## **Using the Best Model for Predictions on Unseen Tweets:**

### Below, I have created a series of tweets (5 real and 5 fake), and used model 5 to predict if the tweets contain real or fake information about COVID-19.

In [None]:
# Fake example #1

print(model_5.predict(preprocessor(["COVID is fake news. It's nothing more than the common flu. This is just anti-Trump propoganda from the Radical Left."], maxlen=60)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 313ms/step
[[9.990910e-01 9.089763e-04]]


In [None]:
# Fake example #2

print(model_5.predict(preprocessor(["COVID-19 is no more deadly than the flu. Don't believe what the 'experts' are telling you. Don't wear a mask if you don't feel like it!"], maxlen=60)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 175ms/step
[[0.9975681  0.00243191]]


In [None]:
# Fake example #3

print(model_5.predict(preprocessor(["Don't let the Antifa radicals convince you that COVID is dangerous. They just want to destroy our economy by making people stay home so that they can steal the election."], maxlen=60)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 176ms/step
[[0.9967694  0.00323056]]


In [None]:
# Fake example #4

print(model_5.predict(preprocessor(["COVID-19 is China's attempt to take over the world. They have been developing biological weapons for decades to unleash on the US. They will kill their own people to do it if necessary."], maxlen=60)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 180ms/step
[[0.98120946 0.01879057]]


In [None]:
# Fake example #5

print(model_5.predict(preprocessor(["COVID-19 mRNA is actually not a vaccine at all, but rather an operating system that will convert our bodies into zombies."], maxlen=60)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 180ms/step
[[0.99219114 0.00780889]]


In [None]:
# Real example #1

print(model_5.predict(preprocessor(["COVID-19 was the 3rd leading cause of death in the US in 2020, with heart disease and cancer being even deadlier."], maxlen=60)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 172ms/step
[[0.96569633 0.03430365]]


In [None]:
# Real example #2

print(model_5.predict(preprocessor(["There is no evidence to back up the claim that COVID-19 increases the chances of women having miscarriages."], maxlen=60)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 170ms/step
[[0.98464453 0.01535552]]


In [None]:
# Real example #3

print(model_5.predict(preprocessor(["People with cancer, kidney disease, lung diseases, dementia, diabetes, or liver disease are more likely to become seriously ill from COVID-19."], maxlen=60)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 169ms/step
[[0.9320307  0.06796932]]


In [None]:
# Real example #4

print(model_5.predict(preprocessor(["Children have been impacted less harmfully by COVID infections, on average, than adults."], maxlen=60)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 171ms/step
[[0.14142035 0.85857964]]


In [None]:
# Real example #5

print(model_5.predict(preprocessor(["People who have substance abuse problems are more likely to experience severe COVID-19 symtoms, if infected, than those who do not."], maxlen=60)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 180ms/step
[[0.16539998 0.8346    ]]



---

## **Results:**

### Recall that in the predictions above, **0 (the first index position)** corresponds to the **fake** class and that **1 (the second index position)** corresponds to the **real** class.

<br>

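### A tweet is assigned to whichever class has the higher predicted probability, which for two classes is equivalent to applying a 0.5 threshold to the "real" probability. On that basis, the model correctly predicts all 5 of the fake tweets as being fake. However, it correctly predicts only 2 of the 5 real tweets (real examples #4 and #5) as being real. This corresponds to an overall accuracy of 70% (7 of 10) on my "unseen" samples. Note that these example tweets were padded with maxlen=60, whereas the models were trained on sequences padded with maxlen=45, so this mismatch may also have influenced the predicted probabilities.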
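
### The tally can be reproduced from the printed outputs. The sketch below copies the ten probability rows shown above, pairs each with its intended label, and applies the same argmax rule; the names `predictions`, `predicted`, and `per_class` are illustrative.

```python
# (intended label, [p_fake, p_real]) copied from the ten prediction outputs above.
predictions = [
    ("fake", [9.990910e-01, 9.089763e-04]),
    ("fake", [0.9975681, 0.00243191]),
    ("fake", [0.9967694, 0.00323056]),
    ("fake", [0.98120946, 0.01879057]),
    ("fake", [0.99219114, 0.00780889]),
    ("real", [0.96569633, 0.03430365]),
    ("real", [0.98464453, 0.01535552]),
    ("real", [0.9320307, 0.06796932]),
    ("real", [0.14142035, 0.85857964]),
    ("real", [0.16539998, 0.8346]),
]
classes = ["fake", "real"]  # index 0 = fake, index 1 = real

# Pick the class with the higher probability for each tweet.
predicted = [classes[max(range(2), key=row.__getitem__)] for _, row in predictions]

correct = sum(guess == truth for guess, (truth, _) in zip(predicted, predictions))
accuracy = correct / len(predictions)

# Correct predictions per intended class.
per_class = {
    name: sum(1 for guess, (truth, _) in zip(predicted, predictions)
              if truth == name and guess == name)
    for name in classes
}

print(f"correct: {correct}/{len(predictions)} (accuracy {accuracy:.0%})")
print(per_class)
```

### The per-class counts show that every error is a real tweet labelled as fake, so on these hand-written samples the model leans toward the fake class.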