Fake News Detection

Paper evaluating BERT, RoBERTa, DistilBERT, ALBERT and XLNet for detecting the stances of fake news articles.

Goal and Background

In this paper, the two datasets FNC-1 and FNC-1 ARC are used to finetune large pretrained NLP models that classify the stance of an article body towards its respective headline.

The goal is to systematically analyze the following questions:

  1. How well do the models perform in general?
  2. How much hyperparameter tuning is necessary?
  3. Which of the models performs best?

The background of the paper is the Fake News Challenge, which was held in 2017. More details can be found on the official challenge website.

Datasets

In total, two datasets are used to finetune the five models. The first dataset comes from the Fake News Challenge itself, while the second dataset is an extension created by Hanselowski et al. Both datasets consist of article bodies, headlines and class labels. The class label expresses the stance of the article body towards the headline: the article body can either Agree (AGR) or Disagree (DSG) with the headline, it can Discuss (DSC) it, or it can be completely Unrelated (UNR) to it.

Dataset   | Data Source                 | Data Type                  | Instances | AGR  | DSG  | DSC   | UNR
FNC-1     | Fake News Challenge Stage 1 | News articles              | 49,972    | 7.4% | 1.7% | 17.8% | 73.1%
FNC-1 ARC | Review of the Challenge     | News articles + user posts | 64,205    | 7.7% | 3.5% | 15.3% | 73.5%

Data Pre-Processing

Step              | Details
Concatenation     | Headline + article body
Stop word removal | The, the, A, a, An, an
Train-dev split   | 80:20

Models

In total, five models are examined; for each of them, the HuggingFace implementation is used.

Model      | Publication Date | Published By                                                 | Idea in a Nutshell
BERT       | Oct 2018         | Google AI Language                                           | Bidirectional Encoder Representations from Transformers
RoBERTa    | Jul 2019         | Facebook AI & University of Washington                       | Pretrain BERT longer and more robustly
DistilBERT | Aug 2019         | HuggingFace                                                  | Distill BERT into a smaller model
ALBERT     | Sep 2019         | Google Research & Toyota Technological Institute at Chicago | A lite BERT with parameter reduction
XLNet      | Jun 2019         | Carnegie Mellon University & Google Brain                    | Permutation language model

Evaluating Unsupervised Representation Learning

The evaluation is conducted in two steps.

In the first experimental setup, all models are trained for 2 epochs with a learning rate of 3e-5, a sequence length of 512 tokens, a batch size of 8 and a linear learning rate schedule. With this fixed setting of hyperparameters, three runs are conducted per model and dataset. The first run freezes all layers except for the last two (pooling & classification layer). The second run finetunes all layers. The third run freezes all embedding layers.
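For illustration, here is a minimal sketch of how the three freezing variants could be realised with the HuggingFace transformers library (shown for BERT only; the helper function and its name are assumptions, the repository scripts may implement this differently):

    from transformers import BertForSequenceClassification

    def apply_freezing(model, mode):
        # Hypothetical helper mirroring the three --freeze options described in the script remarks below.
        if mode == "freeze":
            # Freeze everything except the pooling and classification layer.
            for name, param in model.named_parameters():
                param.requires_grad = name.startswith(("bert.pooler", "classifier"))
        elif mode == "freeze_embed":
            # Freeze only the embedding layers.
            for name, param in model.named_parameters():
                if "embeddings" in name:
                    param.requires_grad = False
        # mode == "no_freeze": all parameters stay trainable.

    # Four classes: AGR, DSG, DSC, UNR
    model = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=4)
    apply_freezing(model, "freeze")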

The second step consists of an extensive grid search over the hyperparameters learning rate, batch size, sequence length and learning rate schedule, and covers the following grid (a sketch enumerating it follows the table):

Hyperparameter         | Values
Sequence length        | 256, 512
Batch size             | 16, 32 (for sequence length 256); 4, 8 (for sequence length 512)
Learning rate          | 1e-5, 2e-5, 3e-5, 4e-5
Learning rate schedule | constant, linear, cosine
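Coupling the batch sizes to the sequence lengths as laid out in the table gives 4 x 4 x 3 = 48 combinations, which matches the grid search script described below. A minimal sketch (not taken from the repository) that enumerates this grid:

    from itertools import product

    # Batch sizes are coupled to the sequence length (smaller batches for longer sequences).
    seq_and_batch = [(256, 16), (256, 32), (512, 4), (512, 8)]
    learning_rates = [1e-5, 2e-5, 3e-5, 4e-5]
    schedules = ["constant", "linear", "cosine"]

    grid = [
        {"seq_len": s, "batch_size": b, "lr": lr, "schedule": sched}
        for (s, b), lr, sched in product(seq_and_batch, learning_rates, schedules)
    ]
    print(len(grid))  # 48 hyperparameter combinations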

Key Results

  1. RoBERTa performs best
  2. The encoder-based approach of RoBERTa beats the autoregressive approach of XLNet
  3. The learning rate is the most important hyperparameter

Remarks on the Scripts

There are three main scripts:

  • data_prep
  • experiments
  • grid_search

All three scripts are used via the command line.

To execute everything, first create a virtual environment and then install the necessary packages via pip3 install -r requirements.txt.
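For example, on Linux or macOS this could look as follows (the environment name venv is just an example):

    python3 -m venv venv
    source venv/bin/activate
    pip3 install -r requirements.txt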

Details on Data Pre-Processing script

Executing python3 data_prep.py takes the files

  • train_bodies.csv
  • train_stances.csv
  • competition_test_bodies.csv
  • competition_test_stances.csv

for the FNC-1 and FNC-1 ARC datasets and fully processes them.
The processed files can be found under data/processed.
For both datasets, three files are created for training (train), evaluation (dev) and testing (test), respectively.

The main pre-processing steps (sketched in code below the list) are

  1. assign integer values 0,1,2,3 to the four classes AGR, DSG, DSC, UNR
  2. merge headline and article body
  3. remove stop words The, the, A, a, An, an by using the word tokenizer of NLTK
  4. create the split into training and development sets by using the 80:20 split function of the FNC-1
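A minimal sketch of steps 1-3 (the concrete label order and the function name are assumptions; the 80:20 split of step 4 uses the FNC-1 splitting utility and is not shown here):

    from nltk.tokenize import word_tokenize

    # Assumed mapping of the four classes to integer labels; the actual order may differ.
    LABELS = {"agree": 0, "disagree": 1, "discuss": 2, "unrelated": 3}
    STOP_WORDS = {"The", "the", "A", "a", "An", "an"}

    def preprocess(headline, body, stance):
        text = headline + " " + body  # merge headline and article body
        tokens = [t for t in word_tokenize(text) if t not in STOP_WORDS]
        return " ".join(tokens), LABELS[stance]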

The folder data/splits contains the ids for the training and evaluation (hold_out) instances.

Details on Initial Experiments script

Executing python3 experiments.py evaluates the three different freezing techniques when the corresponding freeze flag is set accordingly. All models are trained for two epochs only, and evaluation is done on the evaluation (dev) dataset.

Most important flags (an example invocation follows the list):

  • The --model flag defines whether to use bert, roberta, distilbert, albert or xlnet.
  • The --model_type flag takes the specific pretrained model from HuggingFace, for example bert-base-cased for bert.
  • The --num_epochs flag is set to a default value of 2 epochs and should not be changed.
  • The --dataset_name flag can be used to switch between the FNC-1 and FNC-1 ARC dataset.
  • The --freeze flag sets the freezing technique: freezing all but the finetuned layers (freeze), freezing the embedding layers only (freeze_embed), or freezing nothing, i.e. finetuning all layers (no_freeze).
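An example invocation might look as follows (assuming the dataset identifier matches the name used above):

    python3 experiments.py --model bert --model_type bert-base-cased --dataset_name FNC-1 --freeze no_freeze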

Since the experiment script has to be run several times for each model and dataset, an additional bash script is provided to facilitate this; it can be run via ./experiments.sh in the terminal.

Details on Grid Search script

The script grid_search.py, executed via python3 grid_search.py, conducts the grid search over 48 hyperparameter combinations.
It uses the tune package.

Important: the current learning rate has to be set manually within the script in the search_space dictionary. The storage capacity of the virtual machine only allowed for saving 12 model combinations at the same time. Thus, for each model and dataset, the script grid_search.py had to be run four times, once for each learning rate.
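A minimal sketch of how such a search_space dictionary could look, assuming tune refers to Ray Tune (the key names and the coupling of sequence length and batch size are assumptions; only the manually fixed learning rate per run follows the description above):

    from ray import tune

    # Set manually and changed for each of the four runs.
    LEARNING_RATE = 1e-5

    search_space = {
        "learning_rate": LEARNING_RATE,
        # 4 (sequence length, batch size) pairs x 3 schedules = 12 combinations per run.
        "seq_len_and_batch_size": tune.grid_search([(256, 16), (256, 32), (512, 4), (512, 8)]),
        "schedule": tune.grid_search(["constant", "linear", "cosine"]),
    }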

Go to Details on Initial Experiments script for details on the flags that can be set.

Additional Remarks

The difference between the experiments and the grid search scripts is that the latter relies on the use of tune to speed up training and to perform grid search.

In some cases, a grid_search run did not terminate; in those cases, the evaluation and testing steps were performed separately afterwards.
