Understanding validation and vandalism detection work on Wikipedia #13
Comments
Awesome research into vandalism detection on Wikipedia, @bkowshik. The wiki community has a mature bot policy and encourages focused, effective mechanical editing. This has built a community-curated ecosystem of AI workers that is highly effective at quickly fixing the most common problems. A large academic community is interested in the mechanics of this, and the associated research has further helped to strengthen the defenses. By comparison, the OSM Automated Edits Policy has not evolved much. Validation is a good angle for having some bots running to catch simple issues, such as invalid capitalization in a tag.
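As a rough illustration of the kind of simple check a validation bot could run, here is a minimal sketch that flags tag keys containing uppercase letters. The function name and the sample tags are hypothetical, and the rule itself is an assumption: some keys in use may contain capitals deliberately, so a check like this would be better suited to flagging edits for review than to fixing them automatically.

```python
def keys_with_uppercase(tags):
    """Return the tag keys that contain at least one uppercase letter.

    `tags` is a plain dict of key -> value, as found on an OSM feature.
    This is a naive heuristic for illustration, not an established rule.
    """
    return [key for key in tags if key != key.lower()]


# Hypothetical feature tags, made up for this example.
tags = {"Highway": "residential", "name": "Main Street", "maxspeed": "30"}
print(keys_with_uppercase(tags))  # ['Highway']
```

Note that only keys are checked; values such as street names legitimately contain capitals.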
{
"params": {
"balanced_sample": false,
"balanced_sample_weight": true,
"center": true,
"init": null,
"learning_rate": 0.01,
"loss": "deviance",
"max_depth": 7,
"max_features": "log2",
"max_leaf_nodes": null,
"min_samples_leaf": 1,
"min_samples_split": 2,
"min_weight_fraction_leaf": 0.0,
"n_estimators": 700,
"presort": "auto",
"random_state": null,
"scale": true,
"subsample": 1.0,
"verbose": 0,
"warm_start": false
},
"table": {
"false": {
"false": 15563,
"true": 2551
},
"true": {
"false": 457,
"true": 962
}
},
"precision": {
"false": 0.971,
"true": 0.274
},
"trained": 1491356274.077835,
"type": "GradientBoosting",
"version": "0.3.0"
}
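The precision figures in the report can be reproduced from its confusion table. The sketch below assumes the outer key of `table` is the actual label and the inner key is the predicted label; that reading is an assumption, but it is the one that yields the reported 0.971 and 0.274. The recall calculation is an addition that the report itself does not include.

```python
# Confusion table copied from the report above.
# Assumption: outer key = actual label, inner key = predicted label.
table = {
    "false": {"false": 15563, "true": 2551},
    "true": {"false": 457, "true": 962},
}


def precision(table, label):
    """Share of items predicted as `label` that actually are `label`."""
    predicted = sum(row[label] for row in table.values())
    return table[label][label] / predicted


def recall(table, label):
    """Share of items that actually are `label` and were predicted as such."""
    return table[label][label] / sum(table[label].values())


print(round(precision(table, "false"), 3))  # 0.971
print(round(precision(table, "true"), 3))   # 0.274
print(round(recall(table, "true"), 3))      # 0.678
```

Under this reading, roughly one in four edits flagged as positive is a true positive, while about two thirds of the actual positives are caught.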
Vandalism detection on OpenStreetMap is similar to vandalism detection on Wikidata, since both are structured datasets. Wikipedia is different because its text is more free-form. I was curious to see how ORES, a machine-learning-as-a-service platform that Wikimedia projects use for vandalism detection and removal, works for Wikidata. The following is what I found. There are 3 main models for Wikidata:
Datasets
It looks like there are 5,000 samples that are manually labelled and 20,000 samples that are auto-labelled.
Attributes
It looks like all 3 kinds of models (reverted, damaging and goodfaith) make use of the same set of features. The list of attributes can be found at the link below:
A bigger list of attributes can be found at the link below:
Models
Model tuning reports:
Models for both Wikipedia and Wikidata get prepared together with a
Properties about the model deployed can be viewed at the link below:
This has been super-helpful. No next actions here. Closing.
NOTE: This is a work in progress. Posting here to start discussion around the topic
Wikimedia uses Artificial Intelligence for the following broad categories:
On Wikipedia there are 160k edits, 50k new articles and 1,400 new editors every day. The goal is to split the 160k edits into:
Themes for validation
Welcoming newcomers
Attracting more newcomers is a major Wikimedia goal, and new spaces have been developed to support them. Quality control on Wikipedia is being designed with newcomer socialization in mind, so that newcomers (especially those who don't conform) are not marginalized and good-faith newcomers are retained. Although anonymous edits on Wikipedia are twice as likely to be vandalism, 90% of anonymous edits are good.
From this Slate article:
Popular validation tools
There are around 20 volunteer-developed tools and 3 major Wikimedia product initiatives. Some popular ones are:
Basic web interface for ORES: https://ores.wikimedia.org/ui
Some of the features used to help classify a revision as problematic or not are: whether the user is anonymous; the number of characters and words added, modified and removed; the number of repeated characters; and the number of bad words added. Prediction scores for a problematic revision look like the following:
https://ores.wmflabs.org/scores/enwiki/damaging/642215410
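To make the features listed above more concrete, here is a minimal sketch of how a few of them could be computed from the text before and after an edit. This is not how ORES implements its features. The function names, the tokenizer, and the `BAD_WORDS` list are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical word list, for illustration only.
BAD_WORDS = {"stupid", "idiot"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def longest_char_run(word):
    """Length of the longest run of one repeated character in `word`."""
    runs = (m.group(0) for m in re.finditer(r"(.)\1*", word))
    return max((len(run) for run in runs), default=0)


def edit_features(old_text, new_text, user_is_anonymous):
    """Compute a few naive features describing a single edit."""
    old_words = Counter(tokenize(old_text))
    new_words = Counter(tokenize(new_text))
    added = new_words - old_words      # words present only after the edit
    removed = old_words - new_words    # words present only before the edit
    return {
        "is_anonymous": user_is_anonymous,
        "chars_delta": len(new_text) - len(old_text),
        "words_added": sum(added.values()),
        "words_removed": sum(removed.values()),
        "longest_repeated_run": max(
            (longest_char_run(word) for word in added), default=0
        ),
        "bad_words_added": sum(
            count for word, count in added.items() if word in BAD_WORDS
        ),
    }


features = edit_features(
    "The city has a park.",
    "The city is stupid!!!!! aaaaaaa",
    user_is_anonymous=True,
)
print(features)
```

A feature dictionary like this would then be fed to a classifier. The bag-of-words difference used here ignores word order, which keeps the sketch short but would miss edits that only rearrange text.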
There has been quite a lot of research in this field, as is evident from the number of results on Google Scholar for Wikipedia vandalism detection.
Hyperlinks
Reading
Videos
cc: OpenStreetMap Community