Pool-based active learning for crowdsourcing word-sense disambiguation tasks

Word-sense disambiguation is the task of resolving ambiguity: determining which of its possible meanings a phrase has in a particular context. An example of a disambiguation task:

Its use should be postponed in patients with Sardinella siccus affecting the stomach or gut.

Does Sardinella siccus in this text mean a type of disorder or a living being?

There are 190 000 cases of ambiguous terms produced by an automated text annotation tool. The goal is to resolve all of them. Training a classifier to perform such tasks requires labeled data. A project at the Computational Linguistics Lab of UZH uses crowdsourcing to collect it: Amazon Mechanical Turk workers are asked to solve such tasks.

As of now, tasks are picked at random from the pool of 190 000 ambiguous cases, and each task is solved by at least 3 different workers. The goal of this project is to implement active learning:

Have a classifier predict phrase meaning from context (solve disambiguation tasks)
Request MTurk workers to solve the tasks that are most informative for training the classifier
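
The text above does not say how "most informative" is measured. One common choice in pool-based active learning is uncertainty sampling: request labels for the tasks on which the current classifier is least sure. The following is a minimal sketch under that assumption. The function names, task IDs, and probabilities are hypothetical and are not taken from this repository; in practice the probabilities would come from the trained classifier.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_most_uncertain(predictions, batch_size):
    """Return the IDs of the `batch_size` tasks with the highest predictive entropy.

    `predictions` maps a task ID to the classifier's probability for each
    candidate meaning. Hypothetical helper, not part of the repository.
    """
    ranked = sorted(
        predictions,
        key=lambda task_id: entropy(predictions[task_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Illustrative (made-up) probabilities for three unlabeled tasks,
# each with two candidate meanings.
pool_predictions = {
    "task_a": [0.95, 0.05],
    "task_b": [0.55, 0.45],
    "task_c": [0.80, 0.20],
}

to_request = select_most_uncertain(pool_predictions, batch_size=2)
print(to_request)  # ['task_b', 'task_c']
```

Entropy is only one possible score. Margin or least-confidence scores would slot into the same `key` function, and batch-mode criteria such as those in the papers listed under Resources would replace the simple top-k cut.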

Data

Unlabeled data: ~195 000 disambiguation tasks

Labeled data:

821 answers to 255 tasks (taken out of these 195 000) by MTurk workers. More answers can be easily retrieved if needed.
Up to 16 million non-ambiguous annotations, which can be viewed as tasks with known answers to train the initial classifier
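
Since each task receives several worker answers, they have to be reduced to one label per task before they can be used for training. The source does not state how this is done. Majority vote is one simple option, sketched below as an assumption. The function name, the task IDs, and the `(task_id, label)` input format are hypothetical; the labels are borrowed from the example task above.

```python
from collections import Counter, defaultdict


def aggregate_by_majority(answers):
    """Collapse (task_id, label) worker answers into one label per task.

    Returns a dict mapping each task ID to its most frequent label.
    Tasks whose top two labels tie are mapped to None so that they can
    be sent out for additional answers.
    """
    votes = defaultdict(Counter)
    for task_id, label in answers:
        votes[task_id][label] += 1

    result = {}
    for task_id, counter in votes.items():
        ranked = counter.most_common(2)
        tied = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        result[task_id] = None if tied else ranked[0][0]
    return result


# Made-up answers: three workers on task 1, two disagreeing workers on task 2.
worker_answers = [
    ("task_1", "disorder"),
    ("task_1", "disorder"),
    ("task_1", "living being"),
    ("task_2", "disorder"),
    ("task_2", "living being"),
]

labels = aggregate_by_majority(worker_answers)
print(labels)  # {'task_1': 'disorder', 'task_2': None}
```

Returning None for ties keeps ambiguous tasks out of the training set until more answers arrive, which fits the note that more answers can be retrieved if needed.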

Resources

Applying active learning to supervised word sense disambiguation in MEDLINE. Chen et al., 2012
Active Learning with Amazon Mechanical Turk. Laws et al., EMNLP, 2011
Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization. Golovin and Krause, 2011
Near-optimal Batch Mode Active Learning and Adaptive Submodular Optimization. Chen and Krause, 2013

Results

See final report.

Log of the results can be viewed here.


martinthenext/eth_ml

Folders and files

Latest commit

History

Repository files navigation

Pool-based active learning for crowdsourcing word-sense disambiguation tasks

Data

Resources

Results

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages