## Bachelor Thesis
## "Exploring Various Classification Techniques In Detection of Disinformation."
Ilia Sokolovskiy
HTW SS23

Notebook 5/5 - Model Ensemble + Final Predictions

**Installing all necessary dependencies**

**Importing all necessary libraries**

In [14]:
import os
import pickle

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch

from transformers import (
    AutoModelForTokenClassification,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BertTokenizerFast,
    TrainingArguments,
    Trainer,
    BertForSequenceClassification,
)
from peft import (
    get_peft_config,
    PeftModel,
    PeftConfig,
    get_peft_model,
    LoraConfig,
    TaskType,
)

from torch.utils.data import DataLoader, Dataset, TensorDataset
from torch.optim import AdamW
from tqdm.notebook import tqdm
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

from utils import NewsClassifier

### Load the data and define num_labels, id2label and label2id for BERT

In [15]:
# Load the pickled DataFrame
base_dir = "Data"
pickle_folder = "Pickles"
filename_pickle = "pickle_lg_df_2"

full_path_pickle = os.path.join(base_dir, pickle_folder, filename_pickle)

df = pd.read_pickle(full_path_pickle)

In [16]:
# Strip whitespace before de-duplicating so variants such as "TRUE " and "TRUE" collapse into one label
labels = df['label'].str.strip().unique().tolist()

num_labels = len(labels)
id2label = {idx: label for idx, label in enumerate(labels)}
label2id = {label: idx for idx, label in enumerate(labels)}

print(f"num_labels: {num_labels}")
print(f"id2label: {id2label}")
print(f"label2id: {label2id}")

num_labels: 2
id2label: {0: 'FAKE', 1: 'TRUE'}
label2id: {'FAKE': 0, 'TRUE': 1}
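These mappings are what turn string labels into the integer ids the models predict, and back again. The following is a minimal sketch, using made-up gold labels and predictions (`gold` and `pred_ids` are hypothetical toy data), of how the already imported `accuracy_score` and `precision_recall_fscore_support` could score predictions against `label2id`-encoded labels. Treating FAKE as the positive class is an assumption made here because the task is disinformation detection; the notebook may report metrics differently.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

id2label = {0: "FAKE", 1: "TRUE"}
label2id = {"FAKE": 0, "TRUE": 1}

# Hypothetical toy data: gold labels as raw strings, model outputs as integer ids
gold = pd.Series(["FAKE", "TRUE ", "TRUE", "FAKE"])
pred_ids = np.array([0, 1, 0, 0])

# Encode gold labels with the same stripping used when building label2id
y_true = gold.str.strip().map(label2id).to_numpy()

accuracy = accuracy_score(y_true, pred_ids)
# Assumption: FAKE (id 0) is treated as the positive class for precision/recall/F1
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, pred_ids, average="binary", pos_label=label2id["FAKE"]
)

# Decode predicted ids back into human-readable labels
pred_labels = [id2label[int(i)] for i in pred_ids]
```

Here three of four predictions are correct, so `accuracy` is 0.75; all FAKE articles are caught (recall 1.0), but one TRUE article is misflagged as FAKE (precision 2/3).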


### Loading the three models: SVM (96.4% accuracy), Bi-LSTM (97.35% accuracy) and BERT (99.1% accuracy)
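Since this notebook combines these three classifiers, one common way to form a final prediction is hard majority voting over their predicted class ids. The following is a minimal sketch, not necessarily the ensembling scheme the notebook uses; `majority_vote` and the toy prediction lists are hypothetical.

```python
import numpy as np


def majority_vote(*predictions):
    """Hard-voting ensemble over binary 0/1 predictions from an odd number of models."""
    stacked = np.vstack([np.asarray(p, dtype=int) for p in predictions])  # (n_models, n_samples)
    if stacked.shape[0] % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    votes_for_one = stacked.sum(axis=0)
    # A sample gets label 1 when more than half of the models predict 1
    return (2 * votes_for_one > stacked.shape[0]).astype(int)


# Hypothetical toy predictions, encoded with label2id: 0 = FAKE, 1 = TRUE
svm_preds = [0, 1, 1, 0]
bi_lstm_preds = [0, 1, 0, 0]
bert_preds = [1, 1, 0, 0]
ensemble_preds = majority_vote(svm_preds, bi_lstm_preds, bert_preds)
# ensemble_preds -> array([0, 1, 0, 0])
```

With three voters, the ensemble only overrides BERT when both the SVM and the Bi-LSTM disagree with it. Soft voting (averaging class probabilities) or weighting each model by its validation accuracy are alternatives if the models expose probabilities.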

In [17]:
# Setting path parameters
base_dir = "Models"

svm_dir = "Pickles"
svm_model_pickle = "best_sklearn_model_1.pkl"
svm_scaler_pickle = "best_sklearn_model_scaler_1.pkl"
svm_path_model = os.path.join(base_dir, svm_dir, svm_model_pickle)
svm_path_scaler = os.path.join(base_dir, svm_dir, svm_scaler_pickle)

bi_lstm_dir = "Torches"
bi_lstm_weights_file = "bi_lstm_weights_1.pth"
bi_lstm_path_weights = os.path.join(base_dir, bi_lstm_dir, bi_lstm_weights_file)

peft_model_id = "il1a/BERT_Fake_News_Classification_LoRA_v1"

**Loading SVM**

In [18]:
svm = pd.read_pickle(svm_path_model)
svm_scaler = pd.read_pickle(svm_path_scaler)
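Because the scaler is pickled separately, inference presumably has to apply it to the features before calling the SVM, exactly as during training. The sketch below assumes 300-dimensional document vectors (matching the Bi-LSTM input size), a `StandardScaler` and a linear `SVC`. All three are stand-ins, since this section does not show how the original features, scaler or SVM were configured. `svm_predict` is a hypothetical helper, and the data are random.

```python
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: 300-dimensional document vectors with binary labels
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 300))
y_train = (X_train[:, 0] > 0).astype(int)

# Fit the scaler on training features, then train the SVM on the scaled features
scaler = StandardScaler().fit(X_train)
model = SVC(kernel="linear").fit(scaler.transform(X_train), y_train)

# Round-trip both objects through pickle, as the notebook does with its .pkl files
svm_loaded = pickle.loads(pickle.dumps(model))
scaler_loaded = pickle.loads(pickle.dumps(scaler))


def svm_predict(svm_model, feature_scaler, features):
    """Apply the saved scaler first, because the SVM was trained on scaled inputs."""
    return svm_model.predict(feature_scaler.transform(features))


X_new = rng.normal(size=(5, 300))
preds = svm_predict(svm_loaded, scaler_loaded, X_new)
```

With the objects loaded above, the equivalent call would be `svm_predict(svm, svm_scaler, features)`, where `features` must be built the same way as the SVM's training features.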

**Loading Bi-LSTM**

In [19]:
bi_lstm = NewsClassifier()
# map_location="cpu" lets weights saved on a GPU load on a CPU-only machine
bi_lstm.load_state_dict(torch.load(bi_lstm_path_weights, map_location="cpu"))
bi_lstm.eval()

NewsClassifier(
  (lstm): LSTM(300, 50, batch_first=True, bidirectional=True)
  (fc): Linear(in_features=100, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
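The printed repr shows the layer shapes of `NewsClassifier` but not its `forward` method, which lives in `utils`. The following hypothetical stand-in mirrors those layers: a bidirectional LSTM over 300-dimensional word vectors, a 100 → 1 linear layer and a sigmoid. In this sketch, the final forward and backward hidden states are concatenated before the linear layer. That is an assumption; the real class may instead use the last time step or pooling. The sketch also repeats the save/load pattern used above on an in-memory buffer.

```python
import io

import torch
from torch import nn


class SketchNewsClassifier(nn.Module):
    """Hypothetical stand-in whose layers mirror the printed NewsClassifier repr."""

    def __init__(self, embedding_dim=300, hidden_dim=50):
        super().__init__()
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (batch, seq_len, embedding_dim) of pre-computed word vectors
        _, (h_n, _) = self.lstm(x)
        # h_n: (2, batch, hidden_dim); index 0 = forward, 1 = backward direction
        last_hidden = torch.cat([h_n[0], h_n[1]], dim=1)  # (batch, 2 * hidden_dim)
        return self.sigmoid(self.fc(last_hidden))  # (batch, 1) probability


torch.manual_seed(0)
model = SketchNewsClassifier()

# Save and reload the state_dict in memory, mirroring torch.load(..., map_location="cpu")
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)
restored = SketchNewsClassifier()
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
restored.eval()

x = torch.randn(4, 20, 300)  # 4 toy articles, 20 tokens each
with torch.no_grad():
    probs = restored(x)
# Threshold at 0.5; which class the sigmoid output represents is an assumption here
preds = (probs.squeeze(1) >= 0.5).long()
```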

**Loading the BERT LoRA adapter from Hugging Face 🤗**

In [20]:
peft_config = PeftConfig.from_pretrained(peft_model_id)
bert_inference = AutoModelForSequenceClassification.from_pretrained(
    peft_config.base_model_name_or_path, num_labels=num_labels, id2label=id2label, label2id=label2id
)
bert_tokenizer = BertTokenizerFast.from_pretrained(peft_config.base_model_name_or_path)
bert = PeftModel.from_pretrained(bert_inference, peft_model_id)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model t

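Once the adapter is attached, BERT predictions follow the usual Transformers pattern: tokenize, run the model without gradients, and apply softmax to the logits. The following is a minimal sketch. `bert_predict_proba` is a hypothetical helper. To keep the example runnable offline, it is exercised on a tiny, randomly initialised `BertForSequenceClassification` with a toy vocabulary instead of the real adapter, so its probabilities are meaningless. `max_length=512` is bert-base-uncased's positional limit; the length used during fine-tuning is not shown here.

```python
import os
import tempfile

import numpy as np
import torch
from transformers import BertConfig, BertForSequenceClassification, BertTokenizer


def bert_predict_proba(model, tokenizer, texts, max_length=512, batch_size=16):
    """Return an (n_texts, num_labels) array of class probabilities."""
    model.eval()  # disable dropout for inference
    all_probs = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        enc = tokenizer(batch, padding=True, truncation=True,
                        max_length=max_length, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits
        all_probs.append(torch.softmax(logits, dim=-1).cpu().numpy())
    return np.concatenate(all_probs, axis=0)


# Hypothetical tiny stand-in model and tokenizer so the sketch runs without downloads
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "fake", "news", "true", "story"]
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab))
    tiny_tokenizer = BertTokenizer(vocab_file=vocab_path)

config = BertConfig(
    vocab_size=len(vocab), hidden_size=32, num_hidden_layers=1,
    num_attention_heads=2, intermediate_size=64, num_labels=2,
    id2label={0: "FAKE", 1: "TRUE"}, label2id={"FAKE": 0, "TRUE": 1},
)
tiny_model = BertForSequenceClassification(config)

texts = ["fake news story", "true story", "news"]
probs = bert_predict_proba(tiny_model, tiny_tokenizer, texts, batch_size=2)
pred_labels = [tiny_model.config.id2label[int(i)] for i in probs.argmax(axis=1)]
```

With the objects loaded above, the call would presumably be `bert_predict_proba(bert, bert_tokenizer, texts)`. The PEFT wrapper forwards the tokenizer outputs to the underlying `BertForSequenceClassification`.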