org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(407) #6

Open
feider opened this issue Apr 17, 2021 · 2 comments

feider commented Apr 17, 2021

Hi,

My issue may be related to issue #4.

I call the tiling script with:
sh topictiling.sh -tmd ../topicmodel -s -tmn model-final -fp "Wigalois.txt" -fd ../../data/pdf/ascii/ -out results -d

The LDA model (generated with JGibbLDA) and the input file seem to be read correctly, and the file is printed when I pass the -d option. I use -s for simple segmentation, but I also tried adding a "." at the end of every line instead. The text is in Middle High German, converted to ASCII characters. Here is an example:

wer hat mich guoter uf getan
si ez iemen der mich kan
beidiu lesen und versten
der sol genade an mir begen
ob iht wandels an mir si
daz er mich doch laze vri
valscher rede daz eret in
ich weiz wol daz ich niene bin
geliutert und gerihtet
noch so wol getihtet

The full error output is here:

Apr 17, 2021 3:17:44 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(407)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.  
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:391)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:296)
	at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
	at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:223)
	at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:143)
	at de.tudarmstadt.langtech.semantics.segmentation.segmenter.RunTopicTilingOnFile.<init>(RunTopicTilingOnFile.java:133)
	at de.tudarmstadt.langtech.semantics.segmentation.segmenter.RunTopicTilingOnFile.main(RunTopicTilingOnFile.java:94)
Caused by: java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 0
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
	at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
	at java.base/java.util.Objects.checkIndex(Objects.java:359)
	at java.base/java.util.ArrayList.get(ArrayList.java:427)
	at de.tudarmstadt.langtech.semantics.segmentation.segmenter.annotator.TopicTilingSegmenterAnnotator.annotateSegments(TopicTilingSegmenterAnnotator.java:231)
	at de.tudarmstadt.langtech.semantics.segmentation.segmenter.annotator.TopicTilingSegmenterAnnotator.process(TopicTilingSegmenterAnnotator.java:142)
	at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:375)
	... 6 more

Exception in thread "main" org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:391)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:296)
	at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
	at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:223)
	at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:143)
	at de.tudarmstadt.langtech.semantics.segmentation.segmenter.RunTopicTilingOnFile.<init>(RunTopicTilingOnFile.java:133)
	at de.tudarmstadt.langtech.semantics.segmentation.segmenter.RunTopicTilingOnFile.main(RunTopicTilingOnFile.java:94)
Caused by: java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 0
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
	at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
	at java.base/java.util.Objects.checkIndex(Objects.java:359)
	at java.base/java.util.ArrayList.get(ArrayList.java:427)
	at de.tudarmstadt.langtech.semantics.segmentation.segmenter.annotator.TopicTilingSegmenterAnnotator.annotateSegments(TopicTilingSegmenterAnnotator.java:231)
	at de.tudarmstadt.langtech.semantics.segmentation.segmenter.annotator.TopicTilingSegmenterAnnotator.process(TopicTilingSegmenterAnnotator.java:142)
	at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:375)
	... 6 more

I tried openjdk-15, openjdk-7, and the current Oracle JRE.
Is there anything I'm doing wrong or anything different that I can try?

Kind regards

patsab commented Apr 19, 2021

I think each line is interpreted as a new document. Even with a "." added at the end of each line, there are still not multiple sentences per document.
Can you try joining all the lines into a single line, with a "." between them?
The line would then look like:
wer hat mich guoter uf getan . si ez iemen der mich kan . beidiu lesen und versten . ...
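
A minimal sketch of that preprocessing step, in case it helps to script it. It is not part of TopicTiling; the function names `join_verses` and `convert_file` and the output path are assumptions made for illustration. It drops empty lines and puts a " ." after each verse so the whole text ends up on one line:

```python
def join_verses(text: str) -> str:
    """Join all non-empty lines into one line, each followed by ' .'."""
    # Strip surrounding whitespace and skip blank lines.
    verses = [line.strip() for line in text.splitlines() if line.strip()]
    # Terminate every verse with a period so each one can count as a sentence.
    return " ".join(verse + " ." for verse in verses)


def convert_file(src: str, dst: str) -> None:
    """Read src, join its lines, and write the single-line result to dst.

    Both paths are supplied by the caller; nothing here is specific to TopicTiling.
    """
    with open(src, encoding="utf-8") as infile:
        joined = join_verses(infile.read())
    with open(dst, "w", encoding="utf-8") as outfile:
        outfile.write(joined + "\n")
```

For example, `convert_file("Wigalois.txt", "Wigalois_joined.txt")` would write the joined text to a new file (the second file name is hypothetical), which can then be passed to the script with -fp.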

feider commented Apr 19, 2021

Thank you for the suggestion, but that did not help. With the input joined this way, the tool just reordered the sentences/lines, similar to what happens with each verse on its own line and the -s option.
