GitHub

Large Language Models for Propaganda Detection

This repository demonstrates how to reproduce the results of our paper: Large Language Models for Propaganda Detection.

We provide a requirements.txt file that contains the necessary Python packages and dependencies. The installation has been tested on a Linux-based GPU cluster and a 2021 M1 MacBook Pro. To install the requirements, we recommend creating a new virtual environment and running the following command:

pip install -r requirements.txt

Additionally, you need to add the API organization ID and API key to the environment variables:

export OPENAI_ORGANIZATION="your_organization"
export OPENAI_API_KEY="your_api_key"

Below, we outline the step-by-step process to replicate our results:

Create text classification data from the token classification dataset. The original dataset can be found here.

python src/text_classification/helper/create_labels_text_classification.py

Split the articles for fine-tuning the GPT-3 model to fit within the token limit.

python src/text_classification/helper/split_text.py

Create the .json file for GPT-3 fine-tuning using one of the three GPT-3 config files, e.g., src/text_classification/gpt/config/gpt3_instruction_base.yaml. This will generate the dataset under src/text_classification/gpt/train-gpt3-json/gpt-3_instruction_base.json.

python src/text_classification/gpt/create_training_data.py -c src/text_classification/gpt/config/gpt3_instruction_base.yaml

Run the fine-tuning jobs. First, validate the created .json file and split it into training and validation data. Then, train the model. Note that for GPT-3 base and chain of thought, we only fine-tune one model with different prompts at inference.

bash src/text_classification/gpt/fine-tune-gpt-3.sh

Run inference and calculate metrics for each of the five model combinations using the corresponding .yaml file. The results will be saved under src/text_classification/gpt/results. For the GPT-3 models, this process requires training your own model (step 4) and adapting the model_name accordingly within the provided .yaml file.

python src/text_classification/gpt/inference.py -c src/text_classification/gpt/config/gpt4_base.yaml

By following these steps, you should be able to replicate our results as described in the paper.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
data/propaganda		data/propaganda
src/text_classification		src/text_classification
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data/propaganda

data/propaganda

src/text_classification

src/text_classification

.gitignore

.gitignore

README.md

README.md

requirements.txt

requirements.txt

Repository files navigation

Large Language Models for Propaganda Detection

About

Releases

Packages

Contributors 2

Languages

sprenkamp/LLM_propaganda_detection

Folders and files

Latest commit

History

Repository files navigation

Large Language Models for Propaganda Detection

About

Resources

Stars

Watchers

Forks

Languages