bratutils

A collection of utilities for manipulating data and calculating inter-annotator agreement in brat annotation files.

Installation

Install as a normal package from the source directory.

$ pip install bratutils

Agreement Definition

Agreement in multi-token annotations is commonly evaluated using the F-score, because the traditional Krippendorff's alpha and Cohen's kappa are problematic to compute for this kind of data. Hripcsak and Rothschild show that the metric is valid for very large populations, i.e. for unrestricted text annotations.

This library roughly follows the definitions of precision and recall calculation from the MUC-7 test scoring. The basic definitions along with some additional restrictions are laid out below:

CORRECT - when annotation tags and indices match completely
INCORRECT - when annotation tags do not match, but the indices coincide
PARTIAL - when the annotation tags match and the two annotations share the same end index but have different start indices
MISSING - annotations existing only in the gold standard annotation set
SPURIOUS - annotations existing only in the candidate annotation set

Note: the gold standard is the collection or document from which the comparison is invoked, while the supplied parallel annotation is treated as the candidate set.

Disclaimer: the current definition of the PARTIAL category accommodates working with syntactic chunks. A different arrangement (e.g. picking the largest contained tag as the partial match instead of the rightmost one) might be more suitable for other tasks, such as some types of semantic annotation.
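The pairwise categories above can be illustrated with a small standalone sketch. It does not use bratutils itself: the `Span` type and the `classify_pair` function are hypothetical names introduced only for this illustration, and the library's actual matching logic may differ. MISSING and SPURIOUS are not covered, since they describe annotations left without a counterpart rather than a property of a pair.

```python
from typing import NamedTuple, Optional


class Span(NamedTuple):
    """Hypothetical stand-in for one annotation: a tag plus character offsets."""
    tag: str
    start: int
    end: int


def classify_pair(gold: Span, candidate: Span) -> Optional[str]:
    """Return the category for a gold/candidate pair, or None if they do not pair up."""
    same_tag = gold.tag == candidate.tag
    same_span = (gold.start, gold.end) == (candidate.start, candidate.end)
    if same_span:
        # Indices coincide: the tag decides between CORRECT and INCORRECT.
        return "CORRECT" if same_tag else "INCORRECT"
    if same_tag and gold.end == candidate.end:
        # Same tag and same end index, but a different start index.
        return "PARTIAL"
    return None


gold = Span("NP", 10, 25)
print(classify_pair(gold, Span("NP", 10, 25)))  # CORRECT
print(classify_pair(gold, Span("VP", 10, 25)))  # INCORRECT
print(classify_pair(gold, Span("NP", 14, 25)))  # PARTIAL
print(classify_pair(gold, Span("NP", 10, 20)))  # None
```

A gold annotation for which no candidate yields a category would then count as MISSING, and a candidate annotation with no gold counterpart as SPURIOUS.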

Examples

Simple example:

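from bratutils import agreement as a

# Load the two parallel annotation files
doc = a.Document('res/samples/A/data-sample-1.ann')
doc2 = a.Document('res/samples/B/data-sample-1.ann')

# Mark the first document as the gold standard, then compare the second against it
doc.make_gold()
statistics = doc2.compare_to_gold(doc)

print(statistics)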

Output:

-------------------MUC-Table--------------------
------------------------------------------------
pos:135
act:134
cor:115
par:5
inc:4
mis:11
spu:10
------------------------------------------------
pre:0.858208955224
rec:0.851851851852
fsc:0.855018587361
------------------------------------------------
und:0.0814814814815
ovg:0.0746268656716
sub:0.0725806451613
------------------------------------------------
bor:119
ibo:15
------------------------------------------------
------------------------------------------------
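The ratios in the table can be reproduced from the raw counts. The formulas below are a reconstruction inferred from the numbers shown above, not taken from the library's source; the variable names are illustrative. Under this reading, partial matches receive no credit in precision or recall. The `bor` and `ibo` rows are left out because their definition is not given here.

```python
# Raw counts copied from the example table above.
cor, par, inc, mis, spu = 115, 5, 4, 11, 10

# Totals: everything in the gold set vs. everything in the candidate set.
possible = cor + inc + par + mis   # pos: 135
actual = cor + inc + par + spu     # act: 134

# Assumed formulas, chosen because they match the printed values.
precision = cor / actual
recall = cor / possible
f_score = 2 * precision * recall / (precision + recall)

undergeneration = mis / possible
overgeneration = spu / actual
substitution = (inc + par) / (cor + inc + par)

for name, value in [
    ("pre", precision),
    ("rec", recall),
    ("fsc", f_score),
    ("und", undergeneration),
    ("ovg", overgeneration),
    ("sub", substitution),
]:
    print(f"{name}:{value:.4f}")
```

Rounded to four decimals, this gives pre 0.8582, rec 0.8519, fsc 0.8550, und 0.0815, ovg 0.0746 and sub 0.0726, in line with the table.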
