allow separate annotation space for TokenRegex #323
…om the statistical NER annotations
I have added the legal disclaimer. Could someone please let me know whether this change will make it in and, if not, what I need to do to get it added? I currently depend on it in another open source project I'm working on. Thank you very much.
I don't see the need to do this. You can create your own custom keys and put them into a CoreMap. Here is some example code:
The corresponding rules file:
Sorry, I beg to differ. I first tried the method you specified. However, the mentions were being clobbered by the statistical NER tagger if it ran later in the pipeline, or else the custom annotations were clobbering the statistical NER tagger's annotations. This is why not only the class annotations need to be in their own namespace, but the mention annotations that get created must be in their own namespace as well. This change is necessary, and others will eventually request the same change, as you're missing functionality otherwise. Thank you.
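To make the clobbering problem described above concrete, here is a minimal, stand-alone sketch. It does not use any CoreNLP classes: the map-based tokens, the key names `ner` and `custom.ner`, and both tagger methods are hypothetical stand-ins, assumed only for illustration. It shows that when two annotators write to the same key, whichever runs later wins, whereas a separate key keeps both results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from CoreNLP.
class AnnotationNamespaces {

    // Hypothetical key shared by both annotators.
    static final String NER_KEY = "ner";
    // Hypothetical separate key used only by the rule-based annotator.
    static final String CUSTOM_NER_KEY = "custom.ner";

    /** Splits on single spaces and stores each word in its own token map. */
    static List<Map<String, String>> tokenize(String text) {
        List<Map<String, String>> tokens = new ArrayList<>();
        for (String word : text.split(" ")) {
            Map<String, String> token = new HashMap<>();
            token.put("word", word);
            tokens.add(token);
        }
        return tokens;
    }

    /** Stand-in for a statistical tagger: always writes a label under the shared key. */
    static void statisticalTagger(List<Map<String, String>> tokens) {
        for (Map<String, String> token : tokens) {
            String label = token.get("word").equals("Stanford") ? "ORGANIZATION" : "O";
            token.put(NER_KEY, label);
        }
    }

    /** Stand-in for a rule-based annotator: writes its label under the given key. */
    static void ruleBasedTagger(List<Map<String, String>> tokens, String outputKey) {
        for (Map<String, String> token : tokens) {
            if (token.get("word").equals("red")) {
                token.put(outputKey, "COLOR");
            }
        }
    }

    public static void main(String[] args) {
        // Shared key: the later statistical tagger overwrites "COLOR" with "O".
        List<Map<String, String>> shared = tokenize("Stanford red");
        ruleBasedTagger(shared, NER_KEY);
        statisticalTagger(shared);
        System.out.println(shared.get(1).get(NER_KEY)); // prints "O"

        // Separate key: both annotations survive side by side.
        List<Map<String, String>> separate = tokenize("Stanford red");
        ruleBasedTagger(separate, CUSTOM_NER_KEY);
        statisticalTagger(separate);
        System.out.println(separate.get(1).get(NER_KEY) + " "
                + separate.get(1).get(CUSTOM_NER_KEY)); // prints "O COLOR"
    }
}
```

The same reasoning applies one level up: if the step that groups tagged tokens into mentions always reads and writes fixed keys, giving only the token-level labels their own key is not enough, which is why the mentions need a separate key as well.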
OK, I am sorry, I misunderstood. You are saying you want a separate namespace for entity mentions as well. I will review this and fix the issue. Thanks!
@J38 Yes, correct, and I'm sorry I didn't make this clearer initially. This is the project I need this for, and it provides an example (if needed):
OK, I added this to master, but I made some changes. Our convention is that Annotators are customized through the properties they are passed, so the way this should be done is:
If you want to use this in addition to the standard
There is also an issue that pops up with this, which is that the TokensRegexAnnotator doesn't declare the custom keys it uses for Annotations in the requirementsSatisfied() method, so there is no way for the customized EntityMentionsAnnotator to know what it requires. So I just removed the
If you have any further feedback on this please let me know! Thanks!
Regarding configuring with properties: yes, this makes sense, and thanks for changing that. As for the annotation requirements, yes, I agree. I'm not sure what to do in this situation either. Regardless, this change does provide additional functionality that is needed (at least by me). Thanks again for your help.
This change allows for a separate annotation space for TokenRegex to avoid clobbering from the statistical NER annotations. For example:
https://github.com/stanfordnlp/CoreNLP/blob/master/doc/tokensregex/examples/color.rules.txt#L8-L10
lists the default annotations used by the TokenRegex system. However, I'd like to use these annotations instead:
https://github.com/plandes/clj-nlp-parse/blob/master/test-resources/token-regex.txt#L35-L37
Let me know if there are any changes you'd like me to make.
This contribution is public domain.