
RMIT at PAN-CLEF 2020: Profiling Fake News Spreaders on Twitter

Implementation of our system submitted to the "Profiling Fake News Spreaders on Twitter" shared task at PAN @ CLEF 2020.


If you use this resource, please cite our paper:

Xinhuan Duan, Elham Naghizade, Damiano Spina, and Xiuzhen Zhang. 2020. RMIT at PAN-CLEF 2020: Profiling Fake News Spreaders on Twitter. In CLEF 2020 Labs and Workshops, Notebook Papers. (Sep 2020).


  @inproceedings{duan2020rmit,
    author    = {Duan, Xinhuan and Naghizade, Elham and Spina, Damiano and Zhang, Xiuzhen},
    title     = {{RMIT at PAN-CLEF 2020: Profiling Fake News Spreaders on Twitter}},
    booktitle = {{CLEF 2020 Labs and Workshops, Notebook Papers}},
    year      = {2020}
  }

Data Preparation

Clone or download the files from the GitHub repository, then change into it:

cd path-to-your-repository

You are then ready to reproduce our PAN 2020 submission.

  • Note: if you only want to use our software, you can switch to the software branch, where the software can be used directly once installation is done.
  • Note: your machine needs CUDA and cuDNN support; if you haven't installed CUDA or cuDNN, see the following links:


python3 -m pip install -r requirement.txt

Build the TLSP model

Data preparation

According to PAN restrictions, the training data for this task must not be exposed to the public; to request access to the data, go to this link:

After downloading the data, copy the en and es folders into the relative path /text_classification/data.
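As a quick sanity check, the expected layout can be verified with a short script (a minimal sketch; the path and folder names follow the instructions above, and the script itself is not part of the repository):

```python
from pathlib import Path

# Base folder where the PAN data is expected, per the instructions above.
DATA_DIR = Path("text_classification/data")

def check_data_layout(base: Path = DATA_DIR) -> list:
    """Return the list of expected language folders that are missing."""
    expected = ["en", "es"]  # English and Spanish author folders from PAN
    return [lang for lang in expected if not (base / lang).is_dir()]

if __name__ == "__main__":
    missing = check_data_layout()
    if missing:
        print(f"Missing data folders: {missing}")
    else:
        print("Data layout looks correct.")
```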

Train a single model


After training, you will see a new model file in the folder; this tweet-level model predicts whether a user is a fake news spreader or not.

Reproduce the 10-fold validation

This project involves both tweet-level and user-level classification, so a 10-fold validation must train both levels on the same data splits. The 10-fold validation is therefore implemented manually: all users, together with their tweets, are divided into 10 folds, which are saved in 10 CSV files. The train_model() method in the modelTrainer file then runs the 10-fold validation.
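The manual fold construction described above could be sketched as follows (an illustration only, not the repository's actual modelTrainer code; the user tuples, fold file names, and round-robin assignment are assumptions):

```python
import csv
from pathlib import Path

def split_users_into_folds(users, n_folds=10, out_dir="folds"):
    """Assign each user (with all of their tweets) to exactly one fold,
    so the tweet-level and profile-level models see identical splits,
    then write each fold to its own CSV file."""
    Path(out_dir).mkdir(exist_ok=True)
    folds = [[] for _ in range(n_folds)]
    for i, user in enumerate(users):
        folds[i % n_folds].append(user)  # round-robin assignment
    for k, fold in enumerate(folds):
        with open(Path(out_dir) / f"fold_{k}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["user_id", "label"])
            writer.writerows(fold)
    return folds
```

Because a user's tweets all travel with the user, no tweet from a training user can leak into the corresponding test fold.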

Build the profile-level model


The script produces a confusion matrix with the 10-fold validation results reported in the paper; the features extracted at the user level have already been computed and are written to CSV files in the path:


In lines 56 and 57 of the file, you can add or delete words in the columns list and observe how the performance changes:

  columns = ['median_score','mean_score','score_std','median_compound','mean_compound','compound_std','emoji','hash',

You can edit the user CSV and add your own custom values for evaluation, or use the function in line 58 to add a new feature:

  columns = assemble(columns, 'trump')
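Based on its usage above, assemble appears to register an extra word-based feature column; a guess at its behavior (hypothetical sketch, not the repository's actual implementation, and the count_ column naming is assumed):

```python
def assemble(columns, word):
    """Hypothetical sketch: register a new word-count feature column.

    The repository's assemble() presumably also computes the per-user
    frequency of `word` and stores it in the features CSV; only the
    column bookkeeping is shown here."""
    new_col = f"count_{word}"
    if new_col not in columns:
        columns = columns + [new_col]  # avoid mutating the caller's list
    return columns
```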


For more information, please contact the first author, Xinhuan Duan:

