How to generate "scorer's gold standard format" ? #10

Closed
magician-david opened this issue Sep 25, 2018 · 1 comment
@magician-david

I can generate the format with the script in scripts/edit_creator.py if there is only one annotator. But what should I do if there are two or more annotated texts?

For example:
Src:
The cat sat at mat .
The dog .

Gold1:
The cat sat on the mat .
The dogs .

Gold2:
The cat sat on a mat .
The dog .

How can I get a file like example/source_gold? Do I have to write a script?

@shamilcm
Collaborator

shamilcm commented Sep 26, 2018

You need to generate one M2 file per annotator (two in your example) using the edit_creator.py script, and then combine them into a single file by marking the annotator id (e.g., 0 or 1) at the end of each annotation line (the lines starting with "A"), as in example/source_gold. However, if an annotator has no annotation for a sentence, make sure you add a NOOP (no operation) annotation line for that annotator.

Example:
If annotator 0 has no annotation for a sentence, add this line to that sentence's annotation lines:
A -1 -1|||noop||||||-NONE-|||-NONE-|||0

Unfortunately, there are no released scripts that do this. Also, note that edits generated with edit_creator.py are suboptimal compared to edits annotated manually by human annotators.
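Since no script for the merge step is released, here is a minimal Python sketch of the procedure described above. It assumes each input is a single-annotator M2 file (for example, produced by edit_creator.py) where sentence blocks are separated by blank lines, each block starts with an `S` line, and edits are on `A` lines whose last `|||`-separated field is the annotator id (if that field seems to be missing, the id is appended). The file name `merge_m2.py` and all function names are hypothetical and not part of this repository; the NOOP line is the one given in the comment above.

```python
import sys
from typing import List

# NOOP line from the comment above; {aid} is filled with the annotator id.
NOOP_TEMPLATE = "A -1 -1|||noop||||||-NONE-|||-NONE-|||{aid}"


def read_m2_blocks(text: str) -> List[List[str]]:
    """Split M2 text into blocks: each block is [S line, A line, A line, ...]."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:  # a blank line closes the current sentence block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def set_annotator_id(a_line: str, aid: int) -> str:
    """Overwrite the last |||-separated field with aid (append it if the line has < 6 fields)."""
    fields = a_line.split("|||")
    if len(fields) >= 6:
        fields[-1] = str(aid)
    else:
        fields.append(str(aid))
    return "|||".join(fields)


def merge_m2(texts: List[str]) -> str:
    """Merge single-annotator M2 texts; texts[i] is treated as annotator i."""
    if not texts:
        raise ValueError("need at least one M2 text")
    per_annotator = [read_m2_blocks(t) for t in texts]
    n = len(per_annotator[0])
    if any(len(blocks) != n for blocks in per_annotator):
        raise ValueError("all M2 files must contain the same number of sentences")
    out_blocks = []
    for j in range(n):
        source = per_annotator[0][j][0]
        merged = [source]
        for aid, blocks in enumerate(per_annotator):
            block = blocks[j]
            if block[0] != source:
                raise ValueError(f"source sentence {j} differs between files")
            edits = [line for line in block[1:] if line.startswith("A ")]
            if not edits:
                # This annotator made no edit here, so add an explicit NOOP line.
                merged.append(NOOP_TEMPLATE.format(aid=aid))
            else:
                merged.extend(set_annotator_id(e, aid) for e in edits)
        out_blocks.append("\n".join(merged))
    return "\n\n".join(out_blocks) + "\n"


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.stderr.write("usage: python merge_m2.py annotator0.m2 [annotator1.m2 ...]\n")
    else:
        contents = []
        for path in sys.argv[1:]:
            with open(path, encoding="utf-8") as f:
                contents.append(f.read())
        sys.stdout.write(merge_m2(contents))
```

For the example in the question, you would run edit_creator.py once for Gold1 and once for Gold2, then call something like `python merge_m2.py gold1.m2 gold2.m2 > source_gold`. The order of the arguments determines the annotator ids (first file is 0, second is 1). The script stops with an error rather than guessing if the files disagree on the source sentences.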
