Content-based classification of trolls on PTT, the largest forum in Taiwan.

joeychang0204/PTT-Trolls-Terminator

PTT-Trolls-Terminator

We try to classify trolls on PTT, the most popular forum in Taiwan.
Technical report
data

Data Collection

Flow chart of building ground truth

Since no labeled data existed, we built the ground truth ourselves by labeling both articles and users.
We first crawled all articles and comments posted during the week before the 2018 mayoral election.
From this data, we manually classified the popular articles as political or non-political.
We also manually labeled the users who made many comments as trolls or normal users, based on their comments.
Users who showed an obvious political leaning while attacking others or urging others to vote for a particular candidate were classified as trolls.
As a result, 835 users (about one-third) were classified as trolls and the rest as normal users. Of the 3020 popular articles, 2200 were political.
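
The selection step described above (picking frequent commenters and popular articles for manual labeling) could look roughly like the sketch below. It is only an illustration: the tuple layout, the function name `select_for_labeling`, and both thresholds are assumptions, since the README does not state the cut-offs that were actually used.

```python
from collections import Counter

# Hypothetical thresholds: the README does not say which cut-offs were used.
MIN_COMMENTS_PER_USER = 3
MIN_COMMENTS_PER_ARTICLE = 2

def select_for_labeling(comments, min_user=MIN_COMMENTS_PER_USER,
                        min_article=MIN_COMMENTS_PER_ARTICLE):
    """Return (active_users, popular_articles) worth labeling by hand.

    `comments` is an iterable of (user_id, article_id, text) tuples.
    """
    comments = list(comments)  # allow generators, since we iterate twice
    per_user = Counter(user for user, _, _ in comments)
    per_article = Counter(article for _, article, _ in comments)
    active_users = sorted(u for u, n in per_user.items() if n >= min_user)
    popular_articles = sorted(a for a, n in per_article.items() if n >= min_article)
    return active_users, popular_articles

# Toy crawl result (made-up data, not from the real dataset).
sample_comments = [
    ("alice", "a1", "comment 1"),
    ("alice", "a1", "comment 2"),
    ("alice", "a2", "comment 3"),
    ("bob", "a1", "comment 4"),
    ("carol", "a3", "comment 5"),
]
active_users, popular_articles = select_for_labeling(sample_comments)
```

The shortlisted users and articles would then be labeled manually, as described above.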

Methods

  • Comment Length Analysis
  • Used IP Number Analysis
  • Content-based SVM classifier
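
As a rough illustration of the three methods listed above, the sketch below computes a mean comment length and a distinct-IP count per user, then trains a small linear SVM on bag-of-words features. Everything here is an assumption made for the example: the README does not describe the kernel, features, or tokenizer actually used, so a Pegasos-style linear SVM and a whitespace tokenizer are swapped in (real PTT text is Chinese and would need a word segmenter), and all function names and the toy data are hypothetical.

```python
from collections import defaultdict

def user_features(comments):
    """Aggregate per-user signals from (user_id, ip, text) tuples.

    Returns {user_id: (mean_comment_length, distinct_ip_count, joined_text)}.
    """
    texts, ips = defaultdict(list), defaultdict(set)
    for user, ip, text in comments:
        texts[user].append(text)
        ips[user].add(ip)
    return {
        user: (sum(len(t) for t in ts) / len(ts), len(ips[user]), " ".join(ts))
        for user, ts in texts.items()
    }

def bag_of_words(text):
    """Whitespace tokenizer (placeholder for a Chinese word segmenter)."""
    counts = defaultdict(float)
    for token in text.lower().split():
        counts[token] += 1.0
    return dict(counts)

def train_linear_svm(samples, labels, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    `samples` are sparse dicts; `labels` are +1 (troll) or -1 (normal).
    """
    weights, step = defaultdict(float), 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            margin = y * sum(weights[k] * v for k, v in x.items())
            for k in list(weights):
                weights[k] *= 1.0 - eta * lam  # regularization shrink
            if margin < 1.0:
                for k, v in x.items():
                    weights[k] += eta * y * v  # hinge-loss correction
    return dict(weights)

def predict(weights, x):
    """Return +1 (troll) or -1 (normal) for a sparse feature dict."""
    score = sum(weights.get(k, 0.0) * v for k, v in x.items())
    return 1 if score >= 0 else -1

# Toy data (made up): two users with hand-assigned labels.
comments = [
    ("troll1", "1.1.1.1", "vote for candidate x now"),
    ("troll1", "2.2.2.2", "candidate y is garbage vote x"),
    ("user1", "3.3.3.3", "nice weather today"),
    ("user1", "3.3.3.3", "any good ramen near taipei"),
]
labels_by_user = {"troll1": 1, "user1": -1}

features = user_features(comments)
users = sorted(features)
weights = train_linear_svm(
    [bag_of_words(features[u][2]) for u in users],
    [labels_by_user[u] for u in users],
)
```

Calling `predict(weights, bag_of_words("vote x"))` on the toy model labels the text as troll-like, while `features["troll1"][1]` gives the number of distinct IPs that user posted from.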

The architecture of PTT Trolls Terminator

Result

Final performance
