TweBot

In recent years, online social media platforms have become an indispensable part of our daily routines. These online public spaces are getting largely occupied by suspicious and manipulative social media bots, governed by software but disguising as human users. The problem of detecting bots has strong implications. For example, bots have been used to sway political elections by distorting online discourse or to manipulate the stock market.

In this project we tackle the problem, by focusing on data. Our goal was to understand what kind of information is needed to accurately discriminate bot from humans on the Twitter social media platform. To do this we decided to feed our models more and more data from multiple sources to discover what piece of information works best for this task.

We used (nearly) the same model at all steps, with increasingly more data to learn from:

the text from only one tweet per user
the text from one tweet + custom metadata features on that tweet
the text from multiple tweets + custom statistical features on the aggregated tweets
all the above + custom features from the user account information

We decided to start from simple text of a single tweet; in fact, given the same pool of users, a tweet-based bot detection approach would have significantly more labeled examples to exploit at training time, and would be much faster and flexible when actually deployed. However, prior results in bot detection suggested that tweet text alone is not highly predictive of bot accounts and many works on account-level classification have found that user metadata tends to be the best predictor for bot detection. For this reason we tackle both account-level bot detection and tweet-level bot detection, with a custom DL model based on LSTM. Moreover, given the tabular nature of the metadata features, and with a view to achieve better model efficiency and interpretability, we also decided to test the performance of a Random Forest classifier on the detection task.

All the experiments are conducted on the TwiBot-20 dataset.

We demonstrate that, from just one single tweet, is still possible to discriminate bots from humans, with an acceptable level of accuracy. At the same time, the results clearly show that larger amount of information is needed to achieve the best performance.

Results

Model	Accuracy	Recall	F1-score
SingleTweet	0.655	0.682	0.674
SingleTweetAndMetadata	0.656	0.682	0.676
MultiTweetAndMetadata	0.702	0.708	0.705
TweetAndAccount	0.733	0.729	0.730
Random Forest	0.762	0.886	0.801

Name		Name	Last commit message	Last commit date
Latest commit History 83 Commits
data		data
models		models
pres		pres
src		src
.gitignore		.gitignore
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TweBot

Results

About

Releases

Packages

Contributors 2

Languages

riccardodm97/twebot

Folders and files

Latest commit

History

Repository files navigation

TweBot

Results

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages