**Name:** Izza Yaqoob (RAI_008)

**Base article:** Fake News Classification using transformer based enhanced LSTM
and BERT

**Link:** https://www.sciencedirect.com/science/article/pii/S2666307422000092

In [1]:
import pandas as pd

**Repository Cloning**

In [2]:
import os
import subprocess

# URL of the GitHub repository
repo_url = "https://github.com/KaiDMML/FakeNewsNet"
clone_dir = "FakeNewsNet"

# Check if already cloned
if not os.path.exists(clone_dir):
    print("Cloning FakeNewsNet repository...")
    subprocess.run(["git", "clone", repo_url], check=True)  # raise an error if the clone fails
    print("Repository cloned successfully.")
else:
    print("Repository already exists.")

# Change directory into the repo
os.chdir(clone_dir)
print(f"Changed working directory to: {os.getcwd()}")


Cloning FakeNewsNet repository...
Repository cloned successfully.
Changed working directory to: /content/FakeNewsNet


**Reading both fake and real datasets of gossipcop and politifact**

In [3]:
gossipcop_fake = pd.read_csv("/content/FakeNewsNet/dataset/gossipcop_fake.csv")
gossipcop_real = pd.read_csv("/content/FakeNewsNet/dataset/gossipcop_real.csv")
gossipcop_fake['label'] = 0  # 0 = Fake
gossipcop_real['label'] = 1  # 1 = Real
data1 = pd.concat([gossipcop_fake, gossipcop_real], ignore_index=True)[['title','label']]


In [4]:
data1.isnull().sum()

Unnamed: 0,0
title,0
label,0


In [5]:
data1.head()

Unnamed: 0,title,label
0,Did Miley Cyrus and Liam Hemsworth secretly ge...,0
1,Paris Jackson & Cara Delevingne Enjoy Night Ou...,0
2,Celebrities Join Tax March in Protest of Donal...,0
3,Cindy Crawford's daughter Kaia Gerber wears a ...,0
4,Full List of 2018 Oscar Nominations – Variety,0


In [6]:
politifact_fake = pd.read_csv("/content/FakeNewsNet/dataset/politifact_fake.csv")
politifact_real = pd.read_csv("/content/FakeNewsNet/dataset/politifact_real.csv")
politifact_fake['label'] = 0  # 0 = Fake
politifact_real['label'] = 1  # 1 = Real
data2 = pd.concat([politifact_fake, politifact_real], ignore_index=True)[['title','label']]

In [7]:
data2.head()

Unnamed: 0,title,label
0,BREAKING: First NFL Team Declares Bankruptcy O...,0
1,Court Orders Obama To Pay $400 Million In Rest...,0
2,UPDATE: Second Roy Moore Accuser Works For Mic...,0
3,Oscar Pistorius Attempts To Commit Suicide,0
4,Trump Votes For Death Penalty For Being Gay,0


**Shapes of data**

In [8]:
print('Gossipcop Fake: ',gossipcop_fake.shape)
print('Gossipcop Real: ',gossipcop_real.shape)
print('Politifact Fake: ',politifact_fake.shape)
print('Politifact Real: ',politifact_real.shape)
print('Concatenated Gossipcop: ',data1.shape)
print('Concatenated Politifact: ',data2.shape)

Gossipcop Fake:  (5323, 5)
Gossipcop Real:  (16817, 5)
Politifact Fake:  (432, 5)
Politifact Real:  (624, 5)
Concatenated Gossipcop:  (22140, 2)
Concatenated Politifact:  (1056, 2)


**Reducing the number of samples in the Gossipcop data**

In [9]:
# Keep 528 samples per class: the first 528 rows are fake (label 0), the last 528 are real (label 1)
first_528 = data1.head(528)
last_528= data1.tail(528)
data1 = pd.concat([first_528, last_528], ignore_index=True)
data1.shape

(1056, 2)
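
Taking the first and last 528 rows works here because the concatenated frame lists all fake titles before all real ones, but it always selects the same rows. A minimal sketch of a random, class-balanced alternative is shown below; `balanced_sample` and the toy frame are hypothetical names used only for illustration and are not part of the original notebook.

```python
import pandas as pd

def balanced_sample(df, label_col="label", n_per_class=528, seed=42):
    """Draw n_per_class random rows from each class, then shuffle the result."""
    parts = [
        group.sample(n=n_per_class, random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy frame (assumption): 10 fake and 20 real titles
toy = pd.DataFrame({
    "title": [f"title {i}" for i in range(30)],
    "label": [0] * 10 + [1] * 20,
})
balanced = balanced_sample(toy, n_per_class=5)
print(balanced["label"].value_counts())
```

On the real data this would be called as `balanced_sample(data1)`, assuming each class has at least 528 rows.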

In [10]:
print(data1['title'].isna().sum())

0


In [11]:
import re #built-in regular expressions module

import string

import nltk  #Natural Language Toolkit, a key Python package for NLP tasks

from nltk.corpus import stopwords  # Common words like "the", "is", "in", etc., usually removed from text

from nltk.tokenize import word_tokenize  # Splits text into individual words (tokens)

# Download NLTK resources

nltk.download('punkt')   #Tokenizer models

nltk.download('punkt_tab') # Tokenizer data in the format required by word_tokenize in recent NLTK releases

nltk.download('stopwords') # Predefined list of stopwords for multiple languages

# Custom stopwords list, keeping "not" and "can"

custom_stopwords = set(stopwords.words('english')) - {"not", "can"} # remove stopwords except can and not

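# Function to expand negative contractions: a trailing "'t" becomes " not" (e.g. "can't" -> "can not").
# Leftover stems such as "isn" or "shouldn" are dropped later as stopwords.

def expand_contractions(text):
    return re.sub(r"['’]t\b", " not", text)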

# Function to clean and tokenize text

def preprocess_text(text):

    # Handle non-string inputs (e.g., lists, NaN, etc.)
    if not isinstance(text, str):
        text = str(text)
    # Lowercase the text
    text = text.lower()

    # Expand contractions
    text = expand_contractions(text)

    # Remove @mentions
    text = re.sub(r"@\w+", "", text)  # \w matches a word character (letter, digit, or underscore)

    # Remove special characters, keeping word characters, whitespace, and question marks
    text = re.sub(r"[^\w\s?]", "", text)

    # Remove digits and underscores (optional, depending on context)
    text = re.sub(r"[\d_]", "", text)

    # Remove leading and trailing whitespace
    text = text.strip()

    # Tokenize, explicitly specifying the language
    tokens = word_tokenize(text, language='english')

    # Remove stopwords except "not" and "can"
    filtered_tokens = [word for word in tokens if word not in custom_stopwords]

    return filtered_tokens

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


**Testing Pre-processing steps on sample**

In [12]:
example = "I can’t believe @jack said that! Isn’t it amazing? Honestly, it shouldn't happen."
print(preprocess_text(example))

['can', 'not', 'believe', 'said', 'not', 'amazing', '?', 'honestly', 'not', 'happen']
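
The sample output above contains `not` three times but no `isn` or `shouldn`. The sketch below isolates the contraction step to show why: the regex only rewrites the `'t` suffix, so stems such as `isn` remain until the stopword filter removes them. The filter step itself is not repeated here, and the statement that `isn` and `won` are in the NLTK English stopword list is an assumption consistent with that output.

```python
import re

def expand_contractions(text):
    # Replace an apostrophe (straight or curly) plus "t" at a word end with " not"
    return re.sub(r"['’]t\b", " not", text)

samples = ["can't", "isn’t", "won't"]
expanded = [expand_contractions(s) for s in samples]
print(expanded)
```

Because `can` is deliberately excluded from the custom stopword list, `can't` survives as the two tokens `can` and `not`, while the other two contractions are reduced to `not` alone.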


**Applying pre-processing steps to the datasets**

In [13]:
data1['title'] = data1['title'].apply(preprocess_text)
data1

Unnamed: 0,title,label
0,"[miley, cyrus, liam, hemsworth, secretly, get,...",0
1,"[paris, jackson, cara, delevingne, enjoy, nigh...",0
2,"[celebrities, join, tax, march, protest, donal...",0
3,"[cindy, crawfords, daughter, kaia, gerber, wea...",0
4,"[full, list, oscar, nominations, variety]",0
...,...,...
1051,"[hollywood, film, awards, complete, list, winn...",1
1052,"[jada, pinkett, smith, explains, son, jaden, m...",1
1053,"[tinsley, mortimer, reacts, luann, de, lesseps...",1
1054,"[prince, harry, carries, princess, dianas, leg...",1


In [14]:
data2['title'] = data2['title'].apply(preprocess_text)

In [15]:
data2.tail()

Unnamed: 0,title,label
1051,"[flake, religious, tests, place, senate]",1
1052,"[change, can, believe]",1
1053,"[deputy, director, national, health, statistic...",1
1054,"[romneys, prolife, conversion, myth, reality, ...",1
1055,"[interest, group, ratings]",1


**Splitting the two datasets into train and test sets**

In [16]:
from sklearn.model_selection import train_test_split
X1 = data1.title
X2 = data2.title
Y1= data1.label
Y2 = data2.label
X_train1,X_test1,y_train1,y_test1 = train_test_split(X1,Y1,stratify=Y1,test_size=0.2,random_state=42)
X_train2,X_test2,y_train2,y_test2 = train_test_split(X2,Y2,stratify=Y2,test_size=0.2,random_state=42)
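
Passing `stratify` keeps the fake/real ratio the same in the train and test portions. The following sketch checks this on a toy series; the 40/60 toy labels and the variable names are assumptions chosen for illustration, not the notebook's data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (assumption): 40 fake (0) and 60 real (1) titles
labels = pd.Series([0] * 40 + [1] * 60, name="label")
titles = pd.Series([f"title {i}" for i in range(100)], name="title")

X_tr, X_te, y_tr, y_te = train_test_split(
    titles, labels, stratify=labels, test_size=0.2, random_state=42
)

# Class shares in each portion; both should match the 40/60 ratio of the full data
train_share = y_tr.value_counts(normalize=True).sort_index()
test_share = y_te.value_counts(normalize=True).sort_index()
print(train_share, test_share, sep="\n")
```

The same `value_counts(normalize=True)` check can be run on `y_train1`, `y_test1`, `y_train2`, and `y_test2`.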

In [17]:
X_train1 , y_train1

(123    [report, obama, pleads, jayz, prevent, hip, ho...
 288    [watch, blake, shelton, performs, underwear, ?...
 877       [best, looks, cmt, music, awards, red, carpet]
 615    [priyanka, chopras, makeup, artist, explains, ...
 300    [kim, kardashian, reveals, limit, kids, kanye,...
                              ...                        
 469                           [kate, hudson, net, worth]
 528    [chris, brown, wants, tour, rihanna, beyonce, ...
 892         [royal, wedding, william, harrys, best, man]
 198    [dolly, parton, californias, biblical, disaste...
 397    [nicole, kidman, helping, isabella, cruise, ad...
 Name: title, Length: 844, dtype: object,
 123    0
 288    0
 877    1
 615    1
 300    0
       ..
 469    0
 528    1
 892    1
 198    0
 397    0
 Name: label, Length: 844, dtype: int64)

In [18]:
X_train2 , y_train2

(641                 [week, transcript, adm, mike, mullen]
 936                 [kerrymccain, welcome, massachusetts]
 551     [rd, democratic, debate, transcript, annotated...
 580     [ad, says, obama, apologized, showed, weakness...
 609     [average, cable, tv, bill, cited, article, ind...
                               ...                        
 1035     [latest, political, news, headlines, dc, beyond]
 221       [manager, killed, employees, checkers, st, ave]
 761     [pwned, house, gop, dominates, twitter, youtub...
 1010    [employment, hours, earnings, current, employm...
 418     [senate, report, admits, clinton, gifted, chil...
 Name: title, Length: 844, dtype: object,
 641     1
 936     1
 551     1
 580     1
 609     1
        ..
 1035    1
 221     0
 761     1
 1010    1
 418     0
 Name: label, Length: 844, dtype: int64)

**Installing upgraded tensorflow-hub and tensorflow_text**

In [19]:
!pip install --upgrade tensorflow-hub tensorflow_text

Collecting tensorflow_text
  Downloading tensorflow_text-2.19.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.0 kB)
Collecting tensorflow<2.20,>=2.19.0 (from tensorflow_text)
  Downloading tensorflow-2.19.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Collecting tensorboard~=2.19.0 (from tensorflow<2.20,>=2.19.0->tensorflow_text)
  Downloading tensorboard-2.19.0-py3-none-any.whl.metadata (1.8 kB)
Collecting ml-dtypes<1.0.0,>=0.5.1 (from tensorflow<2.20,>=2.19.0->tensorflow_text)
  Downloading ml_dtypes-0.5.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (21 kB)
INFO: pip is looking at multiple versions of tf-keras to determine which version is compatible with other requirements. This could take a while.
Collecting tf-keras>=2.14.1 (from tensorflow-hub)
  Downloading tf_keras-2.19.0-py3-none-any.whl.metadata (1.8 kB)
Downloading tensorflow_text-2.19.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

**Using BERT to generate embeddings for the data**

In [20]:
import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_text
preprocessor = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3", trainable=True)

In [21]:
X_train1.shape

(844,)

**Main function to generate embeddings**

In [22]:
def get_bert_embeddings(texts):
    # Join the list of words into a single string for each text
    texts = [" ".join(text) for text in texts]
    text_input = tf.constant(texts)
    #print(text_input)
    preprocessed_text = preprocessor(text_input)
    #print(preprocessed_text)
    outputs = encoder(preprocessed_text)
    print(outputs)
    pooled_output = outputs['pooled_output']
    #print(pooled_output)
    return pooled_output

# Get BERT embeddings for the Gossipcop training set
train_embeddings1 = get_bert_embeddings(X_train1)

print(train_embeddings1.shape)
print(type(train_embeddings1))


{'pooled_output': <tf.Tensor: shape=(844, 768), dtype=float32, numpy=
array([[-0.8371907 , -0.40450677, -0.86797774, ..., -0.622965  ,
        -0.5620247 ,  0.7916527 ],
       [-0.9191437 , -0.5064055 , -0.8555118 , ..., -0.86856735,
        -0.67118365,  0.90609837],
       [-0.9168854 , -0.5948156 , -0.9036317 , ..., -0.6395854 ,
        -0.6655246 ,  0.90999407],
       ...,
       [-0.9147055 , -0.55137515, -0.8927793 , ..., -0.47993332,
        -0.6173769 ,  0.9094683 ],
       [-0.83769923, -0.5132386 , -0.8740812 , ..., -0.58981764,
        -0.65165037,  0.8442663 ],
       [-0.86476785, -0.46954104, -0.52548987, ..., -0.3070244 ,
        -0.57126117,  0.88666606]], dtype=float32)>, 'sequence_output': <tf.Tensor: shape=(844, 128, 768), dtype=float32, numpy=
array([[[ 0.00344767,  0.172313  , -0.25955677, ..., -0.47740543,
          0.5818506 ,  0.3834846 ],
        [ 0.669153  , -0.14307511, -0.04766881, ..., -0.31157437,
          0.57115734, -0.01358759],
        [ 0.40376732
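
`get_bert_embeddings` sends all 844 titles through the encoder in one call, which can run out of memory on larger inputs. A minimal sketch of batching is given below. `embed_in_batches` and `fake_embed` are hypothetical names; `fake_embed` is a stand-in for the real encoder so that the example runs without downloading a model.

```python
import numpy as np

def embed_in_batches(texts, embed_fn, batch_size=32):
    """Apply embed_fn to consecutive slices of texts and stack the results."""
    texts = list(texts)
    chunks = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        chunks.append(np.asarray(embed_fn(batch)))
    return np.concatenate(chunks, axis=0)

def fake_embed(batch):
    # Stand-in encoder (assumption): one 768-dim vector per text, filled with the text length
    return np.array([[float(len(text))] * 768 for text in batch])

demo = embed_in_batches([f"headline {i}" for i in range(70)], fake_embed, batch_size=32)
print(demo.shape)
```

In the notebook, `embed_fn` would be a function that joins the tokens, runs the preprocessor and encoder, and returns `pooled_output` for one batch.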

**Embeddings of test data of gossipcop dataset**

In [23]:
test_embeddings1 = get_bert_embeddings(X_test1)

{'pooled_output': <tf.Tensor: shape=(212, 768), dtype=float32, numpy=
array([[-0.8378137 , -0.60682756, -0.8121384 , ..., -0.36962754,
        -0.754748  ,  0.8948909 ],
       [-0.8071678 , -0.43824923, -0.42457506, ...,  0.08271664,
        -0.6499784 ,  0.7996681 ],
       [-0.8127184 , -0.2717708 ,  0.05281968, ...,  0.14045201,
        -0.50047964,  0.8377152 ],
       ...,
       [-0.77719957, -0.45370573, -0.69639367, ..., -0.29009184,
        -0.5233553 ,  0.73492795],
       [-0.9111435 , -0.47009522, -0.9359695 , ..., -0.8110985 ,
        -0.6917372 ,  0.8561151 ],
       [-0.9126613 , -0.6254094 , -0.8357384 , ..., -0.62310475,
        -0.6762251 ,  0.8033462 ]], dtype=float32)>, 'sequence_output': <tf.Tensor: shape=(212, 128, 768), dtype=float32, numpy=
array([[[-0.25806084,  0.02070523,  0.21867874, ..., -0.6767113 ,
          0.17721608, -0.03392202],
        [-0.21495739,  0.2313491 ,  0.2886209 , ..., -0.53640527,
          0.59230274, -0.26333827],
        [-0.08176404

**Embeddings of train data of politifact dataset**

In [24]:
train_embeddings2 = get_bert_embeddings(X_train2)


{'pooled_output': <tf.Tensor: shape=(844, 768), dtype=float32, numpy=
array([[-0.82290787, -0.4204227 , -0.5031359 , ..., -0.19608662,
        -0.63881177,  0.7735298 ],
       [-0.86000097, -0.32396093, -0.14490199, ...,  0.30413076,
        -0.62152237,  0.86706686],
       [-0.8791201 , -0.43847844, -0.05629341, ..., -0.07327621,
        -0.68495274,  0.87504554],
       ...,
       [-0.8678114 , -0.43310788, -0.6586275 , ..., -0.42467803,
        -0.64043266,  0.74260205],
       [-0.8855852 , -0.51629806, -0.6988588 , ..., -0.4211111 ,
        -0.6993052 ,  0.85128236],
       [-0.74211717, -0.26746723,  0.31575504, ...,  0.47686592,
        -0.5087289 ,  0.7079619 ]], dtype=float32)>, 'sequence_output': <tf.Tensor: shape=(844, 128, 768), dtype=float32, numpy=
array([[[-4.40206736e-01,  7.92053044e-02, -1.95380211e-01, ...,
         -2.02376723e-01,  2.55369365e-01,  1.83849663e-01],
        [-3.82508278e-01, -3.35439414e-01, -1.69808447e-01, ...,
          3.21278691e-01,  7.2104

**Embeddings of test data of politifact dataset**

In [25]:
test_embeddings2 = get_bert_embeddings(X_test2)

{'pooled_output': <tf.Tensor: shape=(212, 768), dtype=float32, numpy=
array([[-8.1355649e-01, -3.4807611e-01, -1.7143974e-01, ...,
         2.4381702e-01, -5.8921176e-01,  8.0384022e-01],
       [-7.4156779e-01, -5.2267450e-01, -7.7282786e-01, ...,
        -3.2409102e-01, -6.1932445e-01,  8.1577039e-01],
       [-8.3775419e-01, -3.5651356e-01,  7.9888599e-03, ...,
         3.3396701e-03, -5.2547884e-01,  6.7905915e-01],
       ...,
       [-8.9657831e-01, -4.2703694e-01, -4.4531563e-01, ...,
        -1.0207027e-01, -5.4749423e-01,  8.0256659e-01],
       [-8.9822984e-01, -3.4099507e-01, -2.2409305e-01, ...,
        -4.9051165e-04, -5.5285174e-01,  7.6433003e-01],
       [-8.7605715e-01, -3.6677775e-01, -2.3489900e-01, ...,
        -2.5411528e-02, -6.2245905e-01,  8.4138781e-01]], dtype=float32)>, 'sequence_output': <tf.Tensor: shape=(212, 128, 768), dtype=float32, numpy=
array([[[-3.09293032e-01, -1.48147538e-01, -1.68780342e-01, ...,
         -3.60784411e-01,  3.77532333e-01,  6.23311

In [None]:
train_embeddings1.shape

TensorShape([844, 768])

**Adding Dense layer for classification**

In [31]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Define the model
model = Sequential([
    Dense(1, activation='sigmoid', input_shape=(train_embeddings1.shape[1],))
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Display the model summary
model.summary()

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
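
A single `Dense` unit with a sigmoid activation on the 768-dimensional pooled embedding amounts to logistic regression: the predicted probability is sigmoid(x·w + b). The NumPy sketch below spells out that computation; the zero weights and random inputs are placeholders, not the trained parameters.

```python
import numpy as np

def dense_sigmoid(x, w, b):
    """Forward pass of a one-unit dense layer with sigmoid activation."""
    logits = x @ w + b
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 768))   # four placeholder embeddings
w = np.zeros((768, 1))          # placeholder weights
b = np.zeros(1)                 # placeholder bias

probs = dense_sigmoid(x, w, b)
n_params = w.size + b.size      # 768 weights + 1 bias
print(probs.ravel(), n_params)
```

With all-zero parameters every probability is 0.5, and the parameter count is 769, which is the number of trainable parameters such a layer has.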


In [34]:
METRICS = [
    tf.keras.metrics.BinaryAccuracy(name='accuracy'),
    tf.keras.metrics.Precision(name='prediction'),  # this is precision; the label 'prediction' is a typo that also appears in the logs
    tf.keras.metrics.Recall(name='recall')
]

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=METRICS)


**Training the model for dataset1 with 50 epochs**

In [42]:
model.fit(train_embeddings1, y_train1, epochs=50)

Epoch 1/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7226 - loss: 0.5737 - prediction: 0.6973 - recall: 0.7735
Epoch 2/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7188 - loss: 0.5630 - prediction: 0.7020 - recall: 0.7657
Epoch 3/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step - accuracy: 0.7181 - loss: 0.5720 - prediction: 0.7164 - recall: 0.7473
Epoch 4/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7213 - loss: 0.5770 - prediction: 0.7165 - recall: 0.7280
Epoch 5/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7157 - loss: 0.5692 - prediction: 0.7335 - recall: 0.6612
Epoch 6/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7271 - loss: 0.5553 - prediction: 0.7494 - recall: 0.7378
Epoch 7/50
27/27 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x790836c626d0>

**Evaluating Model performance for Test data of Gossipcop**

In [43]:
# Evaluate the model on the test data
loss, accuracy, precision, recall = model.evaluate(test_embeddings1, y_test1)

print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")
print(f"Test Precision: {precision}")
print(f"Test Recall: {recall}")

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6301 - loss: 0.6187 - prediction: 0.6545 - recall: 0.5852
Test Loss: 0.5926693081855774
Test Accuracy: 0.6650943160057068
Test Precision: 0.6732673048973083
Test Recall: 0.6415094137191772
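
The reported accuracy, precision, and recall all derive from the four confusion-matrix counts. The sketch below computes them, plus F1, from one set of counts that reproduces the Gossipcop test figures above. These counts are inferred, not taken from the notebook: they assume the 212-title test set is split 106/106 by stratification. `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Inferred counts (assumption): 106 real and 106 fake titles in the test set
gossipcop_metrics = binary_metrics(tp=68, fp=33, fn=38, tn=73)
print(gossipcop_metrics)
```

F1 is not reported by `model.evaluate` here, so it is shown only as a derived quantity.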


**Predictions on Test data of Gossipcop**

In [46]:
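# Note: by execution order (In [46] ran after In [44]), the model had also been fitted on Politifact by this point
y_predicted = model.predict(test_embeddings1)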
y_predicted = y_predicted.flatten()

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step


In [51]:
import numpy as np
y_predicted = np.where(y_predicted > 0.9, 1, 0)  # a 0.9 cutoff is stricter than the 0.5 default used by the Keras metrics
y_predicted

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
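
The cutoff applied to the sigmoid outputs decides how many titles are labelled real. The sketch below compares a 0.5 and a 0.9 cutoff on a few made-up probabilities; `apply_threshold` and the values in `probs` are illustrative assumptions, not model outputs.

```python
import numpy as np

def apply_threshold(probs, threshold=0.5):
    """Label a sample 1 (real) when its probability exceeds the threshold."""
    return (np.asarray(probs) > threshold).astype(int)

probs = np.array([0.20, 0.55, 0.70, 0.95])  # made-up sigmoid outputs
at_05 = apply_threshold(probs, 0.5)
at_09 = apply_threshold(probs, 0.9)
print(at_05, at_09)
```

Raising the cutoff moves borderline cases to the fake class, so hard predictions made at 0.9 are not directly comparable with metrics computed at 0.5.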

**Training the model for dataset2 (Politifact) with 50 epochs**

In [44]:
# Note: this continues training the same model that was already fitted on Gossipcop
model.fit(train_embeddings2, y_train2, epochs=50)

Epoch 1/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6118 - loss: 0.6863 - prediction: 0.6353 - recall: 0.7790
Epoch 2/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6228 - loss: 0.7036 - prediction: 0.6362 - recall: 0.8342
Epoch 3/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6546 - loss: 0.6417 - prediction: 0.6806 - recall: 0.7944
Epoch 4/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6697 - loss: 0.6363 - prediction: 0.6711 - recall: 0.8275
Epoch 5/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6588 - loss: 0.6502 - prediction: 0.6720 - recall: 0.8073
Epoch 6/50
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6753 - loss: 0.6262 - prediction: 0.7032 - recall: 0.7305
Epoch 7/50
27/27 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x79083599b450>
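
The cell above calls `fit` on the same `model` object that was trained on Gossipcop, so the Politifact run starts from those weights. If independent classifiers are wanted, one option is to build a fresh model per dataset. The sketch below is one way to do that; `build_classifier`, `model_gossipcop`, and `model_politifact` are hypothetical names, and the metric is named `precision` here rather than `prediction`.

```python
import tensorflow as tf

def build_classifier(input_dim=768):
    """Return a freshly initialised one-unit sigmoid classifier."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[
            tf.keras.metrics.BinaryAccuracy(name="accuracy"),
            tf.keras.metrics.Precision(name="precision"),
            tf.keras.metrics.Recall(name="recall"),
        ],
    )
    return model

# One independent model per dataset, each starting from new random weights
model_gossipcop = build_classifier()
model_politifact = build_classifier()
print(model_gossipcop.count_params())
```

Each model would then be fitted and evaluated only on its own embeddings, for example `model_politifact.fit(train_embeddings2, y_train2, epochs=50)`.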

**Evaluating Model performance for Test data of Politifact**

In [45]:
# Evaluate the model on the test data
loss, accuracy, precision, recall = model.evaluate(test_embeddings2, y_test2)
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")
print(f"Test Precision: {precision}")
print(f"Test Recall: {recall}")

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.8349 - loss: 0.4413 - prediction: 0.8159 - recall: 0.9362
Test Loss: 0.458078533411026
Test Accuracy: 0.8207547068595886
Test Precision: 0.8085106611251831
Test Recall: 0.9120000004768372


**Predictions on Test Data of Politifact**

In [52]:
y_predicted = model.predict(test_embeddings2)
y_predicted = y_predicted.flatten()
y_predicted = np.where(y_predicted > 0.9, 1, 0)
y_predicted

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step


array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
       0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

**Discussion and Results**



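1.   Results on Politifact (test accuracy 0.82, precision 0.81, recall 0.91) are better than on Gossipcop (test accuracy 0.67, precision 0.67, recall 0.64), possibly because the Gossipcop data was reduced to 1,056 samples.
2.   Overall, the precision and recall values are in line with the accuracy on both test sets, which suggests that the model is not overfitting.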
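
When comparing accuracies across the two datasets, it helps to know what a classifier that always predicts the larger class would score. The sketch below computes that majority-class baseline from the class counts shown earlier (528/528 for the reduced Gossipcop set, 432 fake and 624 real for Politifact); `majority_baseline` is a hypothetical helper.

```python
def majority_baseline(n_fake, n_real):
    """Accuracy of always predicting the more frequent class."""
    return max(n_fake, n_real) / (n_fake + n_real)

gossipcop_baseline = majority_baseline(528, 528)
politifact_baseline = majority_baseline(432, 624)
print(round(gossipcop_baseline, 3), round(politifact_baseline, 3))
```

Because Politifact is not balanced, part of its higher accuracy may simply reflect a higher baseline, which is worth keeping in mind when reading the comparison above.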


