Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection

Overview

This repository contains the code and data for the analysis in "Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection". To replicate the analysis, you must first download the datasets described in the paper.

Directory overview

Folder or Filename	Description
`Simplistic Collection and Labeling Practices Limit the Generalizability of Benchmark Datasets for Twitter Bot Detection.ipynb`	Notebook for running experiments and visualizing results.
`gen_tables.ipynb`	Notebook for generating the tables used in the paper.
`data`	Outputs of analysis code.
`data_accessor.py`	Utilities for loading datasets.
`fit_and_score.py`	Utilities for fitting and scoring models.
`preprocess.py`	Utilities for preprocessing data used in `data_accessor`.
`print_table.py`	Utilities for printing LaTeX-ready tables.
`train_on_one_test_on_another.py`	Utilities for training on one dataset and testing on another.
`leave_one_dataset_out.py`	Utilities for experiments training on all but one dataset and leaving one out.
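
The two experiment designs named in the last two rows can be sketched generically. The snippet below is a minimal illustration of the protocols only, not the code in this repository: the function signatures, the one-feature threshold model, and the toy data are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from statistics import mean


def fit_threshold(rows):
    """Toy stand-in model (hypothetical): split one feature halfway between class means.

    rows is a list of (feature_value, label) pairs, with label 1 = bot, 0 = human.
    """
    bots = [x for x, y in rows if y == 1]
    humans = [x for x, y in rows if y == 0]
    threshold = (mean(bots) + mean(humans)) / 2
    bots_are_higher = mean(bots) >= mean(humans)
    return threshold, bots_are_higher


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    threshold, bots_are_higher = model
    correct = 0
    for x, y in rows:
        predicted = int((x > threshold) == bots_are_higher)
        correct += predicted == y
    return correct / len(rows)


def train_on_one_test_on_another(datasets, fit, score):
    """Fit on each dataset in turn and score on every dataset (including itself)."""
    results = {}
    for train_name, train_rows in datasets.items():
        model = fit(train_rows)
        for test_name, test_rows in datasets.items():
            results[(train_name, test_name)] = score(model, test_rows)
    return results


def leave_one_dataset_out(datasets, fit, score):
    """Fit on the union of all datasets except one, then score on the held-out one."""
    results = {}
    for held_out, test_rows in datasets.items():
        train_rows = [
            row
            for name, rows in datasets.items()
            if name != held_out
            for row in rows
        ]
        results[held_out] = score(fit(train_rows), test_rows)
    return results


# Invented toy data: in "c" the feature relates to the label the opposite way.
toy = {
    "a": [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)],
    "b": [(1.5, 0), (2.5, 0), (7.0, 1), (9.5, 1)],
    "c": [(8.0, 0), (9.0, 0), (1.0, 1), (2.0, 1)],
}

pairwise = train_on_one_test_on_another(toy, fit_threshold, accuracy)
held_out_scores = leave_one_dataset_out(toy, fit_threshold, accuracy)

for (train_name, test_name), value in sorted(pairwise.items()):
    print(f"train={train_name} test={test_name} accuracy={value:.2f}")
for name, value in sorted(held_out_scores.items()):
    print(f"held out={name} accuracy={value:.2f}")
```

Passing `fit` and `score` as callables keeps the two protocols independent of any particular model, so the same loops could in principle wrap whatever classifier and metric an experiment uses.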

Setup

Install the required packages:

python3 -m pip install -r requirements.txt

Some of the data used in the paper is available in the OSoMe Bot Repository. For the other datasets, you will need to contact the authors of each dataset's originating paper.

Run Analysis

Start Jupyter Notebook:

jupyter notebook

and select `Simplistic Collection and Labeling Practices Limit the Generalizability of Benchmark Datasets for Twitter Bot Detection.ipynb` to run the analysis, or `gen_tables.ipynb` to generate the tables used in the paper.
