Android: NER NullPointerException on some models #961

Open
erksch opened this issue Oct 27, 2019 · 8 comments

Comments

erksch commented Oct 27, 2019

I (somewhat) successfully integrated CoreNLP (3.9.2) in an Android app.
The following annotator configuration works just fine:

props.setProperty("annotators", "tokenize,ssplit,pos,lemma")

But as soon as I add the NER annotator I start to get the following error:

Caused by: java.lang.NullPointerException: Attempt to invoke interface method 'int java.util.List.size()' on a null object reference
        at edu.stanford.nlp.util.HashIndex.size(HashIndex.java:94)
        at edu.stanford.nlp.ie.crf.CRFClassifier.getCliqueTree(CRFClassifier.java:1499)
        at edu.stanford.nlp.ie.crf.CRFClassifier.getSequenceModel(CRFClassifier.java:1190)
        at edu.stanford.nlp.ie.crf.CRFClassifier.getSequenceModel(CRFClassifier.java:1186)
        at edu.stanford.nlp.ie.crf.CRFClassifier.classifyMaxEnt(CRFClassifier.java:1218)
        at edu.stanford.nlp.ie.crf.CRFClassifier.classify(CRFClassifier.java:1128)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifySentence(AbstractSequenceClassifier.java:299)
        at edu.stanford.nlp.ie.ClassifierCombiner.classify(ClassifierCombiner.java:476)
        at edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(NERClassifierCombiner.java:269)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifySentenceWithGlobalInformation(AbstractSequenceClassifier.java:343)
        at edu.stanford.nlp.pipeline.NERCombinerAnnotator.doOneSentence(NERCombinerAnnotator.java:368)
        at edu.stanford.nlp.pipeline.SentenceAnnotator.annotate(SentenceAnnotator.java:102)
        at edu.stanford.nlp.pipeline.NERCombinerAnnotator.annotate(NERCombinerAnnotator.java:310)
        at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:76)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:637)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:629)

The code I use (Kotlin):

val props = Properties()
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner")
pipeline = StanfordCoreNLP(props)
val document = CoreDocument("Joe Smith is from Seattle.")
pipeline.annotate(document)

The error is very similar to the one described in this issue where the author tried to use the parser annotator.

Debugging

I debugged the stack trace and found that the error is caused by this line (on classIndex.size()) in CRFClassifier:1480:

return CRFCliqueTree.getCalibratedCliqueTree(data, labelIndices, classIndex.size(), 
  classIndex, flags.backgroundSymbol, getCliquePotentialFunctionForTest(), featureVal);

This means classIndex is null, i.e. it was never initialized properly.

The classIndex property of CRFClassifier is initialized in the loadClassifier(ObjectInputStream ois, Properties props) method:

public void loadClassifier(ObjectInputStream ois, Properties props) {
    Object o = ois.readObject();
    [...]
    classIndex = (Index<String>) ois.readObject();

I found out that the ObjectInputStream passed in is effectively a stream over the model file, whose path is determined in the NERCombinerAnnotator constructor:

public NERCombinerAnnotator(Properties properties) throws IOException {
    List<String> models = new ArrayList<>();
    String modelNames = properties.getProperty("ner.model");
    if (modelNames == null) {
      modelNames = DefaultPaths.DEFAULT_NER_THREECLASS_MODEL + ',' + DefaultPaths.DEFAULT_NER_MUC_MODEL + ',' + DefaultPaths.DEFAULT_NER_CONLL_MODEL;
    }
    [...]
    String[] loadPaths = models.toArray(new String[models.size()]);

Those loadPaths are iterated in the loadClassifiers method in ClassifierCombiner:

private void loadClassifiers(Properties props, List<String> paths) throws IOException {
    baseClassifiers = new ArrayList<>();
    [...]
    for(String path: paths) {
      AbstractSequenceClassifier<IN> cls = loadClassifierFromPath(props, path);
      baseClassifiers.add(cls);
      [...]
    }

By adding a breakpoint to this method I found out that the first model path (DefaultPaths.DEFAULT_NER_THREECLASS_MODEL) in the first iteration of the for-loop is loaded without problems and the classIndex property is set correctly:

[Screenshot from 2019-10-27 23-39-33]

But in the second iteration, when loading from DefaultPaths.DEFAULT_NER_MUC_MODEL, it fails:

[Screenshot from 2019-10-27 23-43-18]

Workaround

My current workaround is to restrict ner.model to just the threeclass and conll models:

props.setProperty("ner.model", DefaultPaths.DEFAULT_NER_THREECLASS_MODEL + "," + DefaultPaths.DEFAULT_NER_CONLL_MODEL)

But I actually don't know what the consequences are if the MUC model is missing.
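
For anyone who wants to switch the MUC model on and off while testing, here is a minimal Java sketch of the same workaround. It uses only java.util.Properties, so the three path strings are assumptions: they are what the DefaultPaths constants are believed to resolve to in CoreNLP 3.9.2 and should be checked against your models jar. The class name NerModelConfig and the method buildProps are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class NerModelConfig {
    // Assumed values of DefaultPaths.DEFAULT_NER_*_MODEL in CoreNLP 3.9.2; verify against your jar.
    static final String THREECLASS = "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz";
    static final String MUC = "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz";
    static final String CONLL = "edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz";

    // Builds pipeline properties, optionally leaving out the MUC model.
    static Properties buildProps(boolean includeMuc) {
        List<String> models = new ArrayList<>();
        models.add(THREECLASS);
        if (includeMuc) {
            models.add(MUC);
        }
        models.add(CONLL);

        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // ner.model takes a comma-separated list of model paths.
        props.setProperty("ner.model", String.join(",", models));
        return props;
    }
}
```

The resulting Properties object can be passed to the StanfordCoreNLP constructor exactly as in the Kotlin snippet above; calling buildProps(true) restores the default three-model list for a retry.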

Explanation

My theory is that the MUC model is especially large and therefore cannot be loaded into memory on a mobile device. Is that true? How big is the model exactly?
However, when I monitor the app's memory consumption I don't see anything critical: the app stays under 512 MB until it crashes.
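
One rough way to probe the size question is to measure each serialized model as it sits on the classpath. The sketch below is plain Java with no CoreNLP calls. The resource paths are assumptions (the values the DefaultPaths constants are believed to have in 3.9.2), and ModelSizeCheck and resourceSize are hypothetical names. It reports the compressed size of the .ser.gz resource, not the heap needed after deserialization, so it gives only a lower bound for comparing models.

```java
import java.io.IOException;
import java.io.InputStream;

final class ModelSizeCheck {
    // Counts the bytes of a classpath resource; returns -1 if the resource is not found.
    static long resourceSize(String resourcePath) throws IOException {
        try (InputStream in = ModelSizeCheck.class.getClassLoader().getResourceAsStream(resourcePath)) {
            if (in == null) {
                return -1L;
            }
            byte[] buffer = new byte[8192];
            long total = 0L;
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed model paths; check them against the models jar you actually ship.
        String[] paths = {
            "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz",
            "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz",
            "edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz"
        };
        for (String path : paths) {
            System.out.println(path + " -> " + resourceSize(path) + " bytes");
        }
    }
}
```

A result of -1 for one path would suggest the model was not packaged at all, which is worth ruling out before blaming memory.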

Contributor

AngledLuffa commented Oct 28, 2019 via email

Contributor

AngledLuffa commented Oct 28, 2019 via email

Author

erksch commented Oct 28, 2019

When I use only the MUC model, I get the same error as above.
Inspecting the baseClassifiers list in the debugger shows this:
[Screenshot from 2019-10-28 17-41-52]
As you can see, it shows the same errors as above.

If the model is smaller then maybe it's not due to memory...
Is there something fundamentally different between MUC and THREECLASS, CONLL?
By the way, I use the latest models from the maven repository.

PS:
I now use my custom NER models anyway and it works like a charm without any problems!
Thank you very much for this wonderful library and for enabling me to accomplish offline NLP & NER for Android.

Contributor

AngledLuffa commented Oct 28, 2019 via email

Author

erksch commented Oct 29, 2019

@AngledLuffa
Thanks for the hint with the license. I am not a licensee (yet) but will get in contact once everything works as expected.

But keep in mind that, because of the required minSDKVersion of 26 (8% of devices), using CoreNLP for B2C Android apps is not really an option. Maybe if this worked on more devices you would license more software, hint hint ;)

Contributor

AngledLuffa commented Oct 31, 2019 via email

Contributor

J38 commented Oct 31, 2019

@erksch congratulations on being the first person I've seen in 4+ years to report running NER locally on an Android phone!

@AngledLuffa
Contributor

As an update, CoreNLP 4.0.0 uses less memory for NER than previous versions, and there have been even more optimizations in the master branch since the 4.0.0 release. Do you have any interest in retrying the MUC model?
