Duplicate IDs in standoff output #9
Comments
Currently, entity IDs are managed separately for each semantic type (just a C++ map container :-). The output above shows that you used the "standoff" output option, not the "brat" option.
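To illustrate why per-type ID management leads to repeated IDs, here is a minimal sketch in Python (not NERsuite's actual C++ code). The function name `assign_ids_per_type`, the `T<n>` ID format, and the entity type labels are assumptions made for illustration only.

```python
from collections import defaultdict


def assign_ids_per_type(entity_types):
    """Number entities with one counter per semantic type.

    Hypothetical scheme: each type keeps its own counter, so the first
    entity of every type receives the same number.
    """
    counters = defaultdict(int)  # semantic type -> last number handed out
    ids = []
    for etype in entity_types:
        counters[etype] += 1
        ids.append(f"T{counters[etype]}")  # assumed ID format
    return ids


# Two different types each start counting at 1, so "T1" appears twice.
print(assign_ids_per_type(["Cell", "Organ", "Cell"]))
```

With a separate counter per type, the first "Cell" and the first "Organ" entity end up with the same ID, which matches the kind of duplication described in this issue.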
I think unique IDs would be a benefit for all output options. Miwa-san is currently planning to use NERsuite in an extraction pipeline using the "standoff" output format and would hope to be able to avoid duplicate IDs without running a separate script, if possible.
While trying to add this functionality today, I found that Sampo had already added it.
Oh, sorry about my mistake. I am now working on this.
The brat output option (-o brat) now generates unique IDs for all entities regardless of their semantic types. It also numbers IDs at the document level, whereas the other options (-o conll, -o standoff) still number IDs at the sentence level.
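The difference between sentence-level and document-level numbering can be sketched as follows. This is an illustrative Python sketch, not NERsuite's implementation; the function `assign_ids`, its `per_document` flag, the `T<n>` ID format, and the type labels are all hypothetical.

```python
from itertools import count


def assign_ids(sentences, per_document=True):
    """Assign one ID per entity, given a list of sentences.

    Each sentence is a list of entity type labels. With per_document=True
    a single counter runs across the whole document; otherwise the counter
    restarts for every sentence (hypothetical model of the two behaviours).
    """
    counter = count(1)
    result = []
    for sentence in sentences:
        if not per_document:
            counter = count(1)  # restart numbering for each sentence
        result.append([f"T{next(counter)}" for _ in sentence])
    return result


doc = [["Cell", "Organ"], ["Tissue"]]
print(assign_ids(doc, per_document=True))   # IDs unique within the document
print(assign_ids(doc, per_document=False))  # IDs repeat across sentences
```

A single counter shared by all types and all sentences is enough to make IDs unique within one input document.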
@priancho: thanks, but I think this issue actually applies to the
Hi, sorry for my mistake.
Great, thanks! S On Mon, Jun 25, 2012 at 7:29 PM, Han-Cheol Cho <
When run with the -o standoff option, NERsuite output contains duplicate IDs (within a single input document). For example (for an AnEM model):

Entity IDs should preferably be unique for each input document.