allow separate annotation space for TokenRegex #323

Closed
wants to merge 1 commit into from

plandes
Contributor

@plandes plandes commented Dec 19, 2016

This change gives TokensRegex a separate annotation space so that its annotations are not clobbered by the statistical NER annotations. For example:

https://github.com/stanfordnlp/CoreNLP/blob/master/doc/tokensregex/examples/color.rules.txt#L8-L10

lists the default annotations used by the TokensRegex system. However, I'd like to use these annotations instead:

https://github.com/plandes/clj-nlp-parse/blob/master/test-resources/token-regex.txt#L35-L37

Let me know if there are any changes you'd like me to make.

This contribution is public domain.

@plandes
Contributor Author

plandes commented Feb 2, 2017

I have added the legal disclaimer. Could someone please let me know whether this change will make it in and, if not, what I need to do to get it added? I currently depend on it in another open source project I'm working on.

Thank you very much.

@J38
Contributor

J38 commented Mar 11, 2017

I don't see the need to do this. You can create your own custom keys and put them into a CoreMap.

Here is some example code:

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.CoreAnnotation;

public class MyCoreAnnotations {

  // Custom annotation key holding a String NER tag, kept separate from the built-in NER key.
  public static class MyNamedEntityTagAnnotation implements CoreAnnotation<String> {
    @Override
    public Class<String> getType() {
      return String.class;
    }
  }

}

The corresponding rules file:

ner = { type: "CLASS", value: "edu.stanford.nlp.examples.MyCoreAnnotations$MyNamedEntityTagAnnotation" }

@J38 J38 closed this Mar 11, 2017
@plandes
Contributor Author

plandes commented Mar 15, 2017

Sorry, I beg to differ. I first tried the method you specified. However, either the mentions were clobbered by the statistical NER tagger when it ran later in the pipeline, or the custom annotations clobbered the statistical NER tagger's annotations. This is why it is not enough for the class annotations to be in their own namespace; the annotations created for mentions must be in their own namespace as well.

This change is necessary, and others will eventually request the same change, since the functionality is missing otherwise.

Thank you.
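
For readers following along, the clobbering described above can be illustrated with a small self-contained sketch. It does not use CoreNLP; `TokenMap`, `Key`, `NerTag` and `MyNerTag` are hypothetical stand-ins for a class-keyed annotation map and its keys, and the tag values are made up. It only shows that two writers sharing one key overwrite each other, while a separate key survives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a class-keyed annotation map; NOT the CoreNLP API.
final class KeyedMapSketch {

  // Marker interface: a key class fixes the type of the value stored under it.
  interface Key<V> {}

  // Stand-in for the key written by the statistical NER tagger.
  static final class NerTag implements Key<String> {}

  // Stand-in for a custom key used only by the rule-based annotator.
  static final class MyNerTag implements Key<String> {}

  static final class TokenMap {
    private final Map<Class<?>, Object> values = new HashMap<>();

    <V> void set(Class<? extends Key<V>> key, V value) {
      values.put(key, value);
    }

    @SuppressWarnings("unchecked")
    <V> V get(Class<? extends Key<V>> key) {
      return (V) values.get(key);
    }
  }

  // Both annotators write to the same key: the later write wins.
  static String sharedKeyResult() {
    TokenMap token = new TokenMap();
    token.set(NerTag.class, "COLOR"); // rule-based tag
    token.set(NerTag.class, "O");     // statistical tagger runs later
    return token.get(NerTag.class);
  }

  // The rule-based annotator writes to its own key: its tag is preserved.
  static String separateKeyResult() {
    TokenMap token = new TokenMap();
    token.set(MyNerTag.class, "COLOR"); // rule-based tag under the custom key
    token.set(NerTag.class, "O");       // statistical tagger uses its own key
    return token.get(MyNerTag.class);
  }

  public static void main(String[] args) {
    System.out.println("shared key:   " + sharedKeyResult());
    System.out.println("separate key: " + separateKeyResult());
  }
}
```

The same reasoning applies to the mentions annotation: if mentions are always written under one shared key, a custom token-level key alone does not prevent the overwrite.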

@J38
Contributor

J38 commented Mar 15, 2017

OK, I am sorry, I misunderstood. You are saying you want a separate namespace for entity mentions as well. I will review this and fix the issue. Thanks!

@J38 J38 reopened this Mar 15, 2017
@plandes
Contributor Author

plandes commented Mar 15, 2017

@J38 Yes, correct, and I'm sorry I didn't make this clearer initially. This is the project I need this for; it also provides an example (if needed):

https://github.com/plandes/clj-nlp-parse

@J38
Contributor

J38 commented Mar 30, 2017

OK, I added this to master, but I made some changes. Our convention is that Annotators are customized through the properties they are passed, so the way this should be done is:

entitymentions.nerCoreAnnotation = edu.stanford.nlp.MyNERCoreAnnotation
entitymentions.nerNormalizedCoreAnnotation = edu.stanford.nlp.MyNormalizedNERCoreAnnotation
entitymentions.mentionsCoreAnnotation = edu.stanford.nlp.MyMentionsCoreAnnotation

If you want to use this in addition to the standard entitymentions annotator, you will have to make a custom my.entitymentions annotator and set it to use your custom keys. If you are only building entities in your custom namespace, you can just set the custom keys.
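
A minimal sketch of setting the three properties above in code rather than in a properties file, using only `java.util.Properties`. The annotation class names are the placeholder names from the lines above, not classes known to exist, and the class and method names in the sketch (`EntityMentionsProps`, `build`) are made up for illustration. Handing the result to a pipeline is assumed and not shown.

```java
import java.util.Properties;

final class EntityMentionsProps {

  // Builds the three entitymentions properties listed above.
  // The values are the placeholder class names from the comment.
  static Properties build() {
    Properties props = new Properties();
    props.setProperty("entitymentions.nerCoreAnnotation",
        "edu.stanford.nlp.MyNERCoreAnnotation");
    props.setProperty("entitymentions.nerNormalizedCoreAnnotation",
        "edu.stanford.nlp.MyNormalizedNERCoreAnnotation");
    props.setProperty("entitymentions.mentionsCoreAnnotation",
        "edu.stanford.nlp.MyMentionsCoreAnnotation");
    return props;
  }

  public static void main(String[] args) {
    // Assumption: these properties would be merged with the rest of the
    // pipeline configuration before the pipeline is constructed.
    build().forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```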

There is also an issue that pops up with this: the TokensRegexAnnotator doesn't declare the custom keys it uses for Annotations in the requirementsSatisfied() method, so there is no way for the customized EntityMentionsAnnotator to know what it requires. So I just removed the ner requirement from requires() in the custom key case. I'll look into upgrading TokensRegexNERAnnotator to say what keys it creates.
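
The requirement problem described above can be sketched without CoreNLP. Everything in this example is a hypothetical stand-in: `Stage` models an annotator only by the keys it says it requires and satisfies, keys are plain strings rather than annotation classes, and `firstUnsatisfied` is an assumed, simplified check, not the pipeline's actual implementation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a pipeline requirement check; NOT CoreNLP code.
final class RequirementCheckSketch {

  static final class Stage {
    final String name;
    final Set<String> requires;   // keys the stage says it needs
    final Set<String> satisfies;  // keys the stage says it produces

    Stage(String name, Set<String> requires, Set<String> satisfies) {
      this.name = name;
      this.requires = requires;
      this.satisfies = satisfies;
    }
  }

  // Returns the name of the first stage whose declared requirements are not
  // covered by what earlier stages declared, or null if all are covered.
  static String firstUnsatisfied(List<Stage> pipeline) {
    Set<String> available = new HashSet<>();
    for (Stage stage : pipeline) {
      if (!available.containsAll(stage.requires)) {
        return stage.name;
      }
      available.addAll(stage.satisfies);
    }
    return null;
  }

  // The rules stage writes a custom key but does not declare it, so a
  // mentions stage that requires that key fails the check.
  static String undeclaredCustomKey() {
    Stage rules = new Stage("tokensregex", Set.of(), Set.of());
    Stage mentions = new Stage("entitymentions", Set.of("MyNER"), Set.of("MyMentions"));
    return firstUnsatisfied(List.of(rules, mentions));
  }

  // Dropping the requirement in the custom-key case lets the check pass.
  static String requirementDropped() {
    Stage rules = new Stage("tokensregex", Set.of(), Set.of());
    Stage mentions = new Stage("entitymentions", Set.of(), Set.of("MyMentions"));
    return firstUnsatisfied(List.of(rules, mentions));
  }

  public static void main(String[] args) {
    System.out.println("undeclared key, requirement kept:    " + undeclaredCustomKey());
    System.out.println("undeclared key, requirement dropped: " + requirementDropped());
  }
}
```

The trade-off in the second case is that nothing verifies the custom key was actually produced upstream; declaring the produced keys in the rules stage would restore that check.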

If you have any further feedback on this please let me know! Thanks!

@J38 J38 closed this Mar 30, 2017
@plandes
Contributor Author

plandes commented Mar 31, 2017

Regarding configuring with properties: yes, this makes sense, and thanks for changing that.

As for annotation requirements, yes, I agree. I'm not sure what to do in this situation either. Regardless, this change does provide additional functionality that is needed (at least by me).

Thanks again for your help.
