
'TokenStream contract violation: close() call missing' error for .jar files #2030

Closed
ahornace opened this issue Mar 5, 2018 · 26 comments

@ahornace
Contributor

ahornace commented Mar 5, 2018

Test output:

Mar 05, 2018 12:28:35 PM org.opensolaris.opengrok.index.Indexer prepareIndexer
INFO: Scanning for repositories...
Mar 05, 2018 12:28:35 PM org.opensolaris.opengrok.index.Indexer prepareIndexer
INFO: Done scanning for repositories (0s)
Mar 05, 2018 12:28:35 PM org.opensolaris.opengrok.index.Indexer doIndexerExecution
INFO: Starting indexing
Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.analysis.Ctags$1 run
WARNING: Error from ctags: ctags: Warning: Language "pascal" already defined
Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.analysis.Ctags$1 run
WARNING: Error from ctags: ctags: Warning: Language "pascal" already defined
Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.analysis.Ctags$1 run
WARNING: Error from ctags: ctags: Warning: Language "pascal" already defined
Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.analysis.Ctags$1 run
WARNING: Error from ctags: ctags: Warning: Language "pascal" already defined
Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.analysis.Ctags$1 run
WARNING: Error from ctags: ctags: Warning: Language "pascal" already defined
Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.index.IndexDatabase lambda$indexParallel$1
WARNING: ERROR addFile(): /private/var/folders/b7/t9s65dpx1blcklnlfvy1__340000gn/T/source14347768915021023503opengrok/java/Main.class
java.lang.IllegalStateException: TokenStream contract violation: close() call missing
	at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
	at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:412)
	at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:202)
	at org.apache.lucene.document.Field.tokenStream(Field.java:505)
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:730)
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:240)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:496)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1729)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1464)
	at org.opensolaris.opengrok.index.IndexDatabase.addFile(IndexDatabase.java:719)
	at org.opensolaris.opengrok.index.IndexDatabase.lambda$indexParallel$1(IndexDatabase.java:1018)
	at java.base/java.util.stream.Collectors.lambda$groupingByConcurrent$55(Collectors.java:1245)
	at java.base/java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:508)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1494)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:747)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:283)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1603)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.index.IndexDatabase addFile
INFO: Skipped file '/bug3430/truncated' because the analyzer didn't understand it.
Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.analysis.Ctags$1 run
WARNING: Error from ctags: ctags: Warning: Language "pascal" already defined
Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.analysis.Ctags$1 run
WARNING: Error from ctags: ctags: Warning: Language "pascal" already defined
Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.analysis.Ctags$1 run
WARNING: Error from ctags: ctags: Warning: Language "pascal" already defined
Mar 05, 2018 12:28:36 PM org.opensolaris.opengrok.index.IndexDatabase indexParallel
WARNING: 1 failures (1.6%) while parallel-indexing
Mar 05, 2018 12:28:37 PM org.opensolaris.opengrok.index.IndexDatabase optimize
INFO: Optimizing the index ... 
Mar 05, 2018 12:28:37 PM org.opensolaris.opengrok.index.IndexDatabase optimize
INFO: done
Mar 05, 2018 12:28:37 PM org.opensolaris.opengrok.util.Statistics report
INFO: Done indexing data of all repositories (took 1.804 seconds)

java.lang.AssertionError: Search for main~ in testdata sources 
Expected :9
Actual   :8


	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.opensolaris.opengrok.search.SearchTest.testSearch(SearchTest.java:155)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

Note: the test sometimes passes and sometimes does not, so this is probably a race condition.

Setup:

  • macOS High Sierra
  • CPU: Intel Core i7-7700HQ 2.80GHz
@tarzanek
Contributor

tarzanek commented Mar 6, 2018

I can try to repro on the same setup ...

@ahornace
Contributor Author

I think I should have mentioned that I'm using Java 9.

I was looking into it but was not able to come up with a solution :/

This answer seems to be helpful: https://stackoverflow.com/questions/39920564/tokenstream-contract-violation-close-call-missing-when-calling-adddocument

The problem seems to be that we are not using the proper workflow described here:
https://lucene.apache.org/core/7_2_1/core/org/apache/lucene/analysis/TokenStream.html (This can also be deduced from the exception message which is thrown very occasionally: "TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.")

The problem is related to the FileAnalyzer#createComponents method, and only in the case when it takes the field REFS as a parameter. This method is called twice in the test and it returns new TokenStreamComponents(symbolTokenizer), on which the Analyzer class calls the setReader method. If setReader is called before the previous reader was closed, the exception is thrown. I tried creating a copy of symbolTokenizer in FileAnalyzer#createComponents but it did not work (a different exception was thrown).
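The contract failure can be illustrated with a simplified stand-in for Lucene's Tokenizer. This is only a sketch of the reader life cycle, not the real Lucene code (the actual check lives in org.apache.lucene.analysis.Tokenizer#setReader):

```java
import java.io.Reader;
import java.io.StringReader;

// Simplified model of the Tokenizer reader life cycle -- NOT the real Lucene
// implementation, just an illustration of why the exception fires.
class ModelTokenizer {
    private static final Reader NONE = new StringReader("");  // sentinel: no reader pending

    private Reader input = NONE;

    // Refuses to install a new Reader until close() has released the previous one.
    void setReader(Reader reader) {
        if (input != NONE) {
            // Exactly the situation from the log above: a second file's
            // Reader arrives before the first one was closed.
            throw new IllegalStateException(
                    "TokenStream contract violation: close() call missing");
        }
        input = reader;
    }

    void close() {
        input = NONE;  // release the slot for the next setReader()
    }
}
```

With a single consumer, the sequence setReader()/close()/setReader() succeeds; a second setReader() without an intervening close(), which is what happens when two callers end up sharing one tokenizer, throws the exception above.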

Hope this helps at least a little.

@tarzanek
Contributor

this is good info
it means that the test should be fixed ...
honestly a lot of tests might be a bit dated (and the lucene compatibility tests are not even turned on)

@vladak
Member

vladak commented Apr 13, 2018

I have just hit this on a Travis Linux build too: https://travis-ci.org/vladak/OpenGrok/jobs/366215912 (Oracle JDK 8), however in a different place:

[ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.483 s <<< FAILURE! - in org.opensolaris.opengrok.search.SearchEngineTest
[ERROR] testSearch(org.opensolaris.opengrok.search.SearchEngineTest)  Time elapsed: 0.371 s  <<< FAILURE!
java.lang.AssertionError: expected:<8> but was:<7>
	at org.opensolaris.opengrok.search.SearchEngineTest.testSearch(SearchEngineTest.java:276)

@vladak
Member

vladak commented Apr 16, 2018

Given how often this now fails in Travis I think this is a stopper candidate.

@tarzanek
Contributor

tarzanek commented Apr 16, 2018

@vladak that test is flawed in the sense that when one of the files in the repo fails to be added to the index, the count will simply be lower ...
The test itself should be split to check analyzer after analyzer rather than asserting a "summary" count.
Then you would be able to see immediately which analyzer is broken (and from the above it looks like bcel, so the jar/class analyzer again)
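The suggested split could look roughly like the sketch below. File names and expected counts are invented for illustration; a real test would query the index per file instead of comparing maps:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch of the idea above: assert a hit count per test file (and thus
// per analyzer) instead of one summary count, so a failure names the culprit.
class PerAnalyzerCheck {
    // Expected hits of "main" per test file -- illustrative numbers only.
    static Map<String, Integer> expected() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("Main.java", 3);
        m.put("Main.class", 1);
        m.put("main.jar", 1);
        return m;
    }

    // Returns a message naming the first file whose count is off, or null if
    // everything matches -- unlike "expected 9 but was 8" with no culprit.
    static String firstMismatch(Map<String, Integer> actual) {
        for (Map.Entry<String, Integer> e : expected().entrySet()) {
            Integer got = actual.get(e.getKey());
            if (!e.getValue().equals(got)) {
                return e.getKey() + ": expected " + e.getValue() + ", got " + got;
            }
        }
        return null;
    }
}
```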

@vladak
Member

vladak commented Apr 19, 2018

So maybe we can disable/remove the test now and replace it (or the code behind it) with something better later.

@vladak
Member

vladak commented Jun 27, 2018

Commented out the failing test piece in 496029d, 2f0d40e and ca20911.

@vladak
Member

vladak commented Jul 12, 2018

For some reason I've started getting these exceptions (with WARNING level) for some jar files in one project out of many. Currently running 1.1-rc33.

@tulinkry
Contributor

tulinkry commented Aug 8, 2018

I hit these exceptions only for jar files when I tried to implement parallel indexing (which was later introduced by someone else), so they are connected to the parallel execution. Unfortunately I was not able to solve it at the time.

@vladak
Member

vladak commented Aug 8, 2018

Okay, this means this is not too serious. Waiving for now. That said, this should be addressed rather sooner than later since it creates test noise.

@vladak vladak removed the stopper label Aug 8, 2018
@vladak vladak added this to the 1.2 milestone Aug 8, 2018
@vladak vladak changed the title Failing testSearch test TokenStream contract violation: close() call missing May 21, 2019
@vladak vladak changed the title TokenStream contract violation: close() call missing 'TokenStream contract violation: close() call missing' error for .jar files May 22, 2019
@vladak
Member

vladak commented May 22, 2019

Just to confirm some assumptions about this bug, I built 1.2.8, took all the .jar files from there and repeatedly indexed them from scratch on my i5-5300U based Lenovo laptop with an SSD:

$ cd /var/tmp
$ mkdir OG-jartest
$ cd OG-jartest/
$ mkdir src data
$ cd src
$ find ~/opengrok-git-vladak/ -type f -name '*.jar' -exec cp {} . \; # some 86 files
$ cd -
$ rm -rf /var/tmp/OG-jartest/data/
$ java -cp 'distribution/target/dist/*' org.opengrok.indexer.index.Indexer -s /var/tmp/OG-jartest/src -d /var/tmp/OG-jartest/data -c /usr/local/bin/ctags -W /var/tmp/OG-jartest/config.xml 2>&1 | grep ERROR
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/validation-api-1.1.0.Final.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/xmlpull-1.1.3.1.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/chronicle-threads-2.17.0.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/commons-io-2.6.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/jcl-over-slf4j-1.6.6.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/lucene-join-7.7.0.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/lucene-memory-7.7.0.jar
$ rm -rf /var/tmp/OG-jartest/data/
$ java -cp 'distribution/target/dist/*' org.opengrok.indexer.index.Indexer -s /var/tmp/OG-jartest/src -d /var/tmp/OG-jartest/data -c /usr/local/bin/ctags -W /var/tmp/OG-jartest/config.xml 2>&1 | grep ERROR
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/classmate-1.0.0.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/javassist-3.22.0-CR2.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/jaxb-impl-2.3.0.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/jaxb-api-2.3.0.jar
$ rm -rf /var/tmp/OG-jartest/data/
$ java -cp 'distribution/target/dist/*' org.opengrok.indexer.index.Indexer -s /var/tmp/OG-jartest/src -d /var/tmp/OG-jartest/data -c /usr/local/bin/ctags -W /var/tmp/OG-jartest/config.xml 2>&1 | grep ERROR
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/chronicle-map-3.17.0.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/opengrok-1.2.8.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/pax-url-aether-2.4.5.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/plugins-1.2.7.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/chronicle-algorithms-1.16.0.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/main.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/lucene-queries-7.7.0.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/jersey-common-2.27.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/jersey-container-servlet-2.27.jar
WARNING: ERROR addFile(): /var/tmp/OG-jartest/src/jersey-container-servlet-core-2.27.jar

So, a couple of data points:

  • happens non-deterministically
  • specific to .jar files

@vladak
Member

vladak commented May 22, 2019

There has to be something special about JarAnalyzer and/or JavaClassAnalyzer in the way they work with Lucene Document Fields (the exact nature of the relationship between Lucene TokenStream and Document w.r.t. object reuse and TokenStream workflow still escapes me). Some thoughts for consideration:

  • JarAnalyzer is nested - it invokes JavaClassAnalyzer to get more info about contained classes
    • however, it lets JavaClassAnalyzer do the job and then extracts the data out of it; the Document is not manipulated by JavaClassAnalyzer directly in this mode, since JarAnalyzer passes a JFieldBuilder instance to JavaClassAnalyzer.analyze()
  • JavaClassAnalyzer does not seem to be thread safe
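If JavaClassAnalyzer is indeed not thread safe, one common mitigation under parallel indexing is thread confinement, sketched below with a made-up stand-in class (SketchAnalyzer is hypothetical; this is the general pattern, not the OpenGrok fix):

```java
import java.util.concurrent.atomic.AtomicInteger;

// "SketchAnalyzer" is a made-up stand-in for a non-thread-safe analyzer such
// as JavaClassAnalyzer; it only counts how many instances get created.
class SketchAnalyzer {
    private static final AtomicInteger CREATED = new AtomicInteger();

    SketchAnalyzer() {
        CREATED.incrementAndGet();
    }

    static int instancesCreated() {
        return CREATED.get();
    }

    String analyze(String file) {
        return "refs of " + file;  // placeholder for the real analysis
    }
}

// Thread confinement: each indexing worker lazily gets its own analyzer
// instance, so no mutable analyzer state is ever shared between threads.
class PerThreadAnalyzers {
    private static final ThreadLocal<SketchAnalyzer> ANALYZER =
            ThreadLocal.withInitial(SketchAnalyzer::new);

    static String analyze(String file) {
        return ANALYZER.get().analyze(file);
    }
}
```

The trade-off is one analyzer instance per worker thread instead of one shared instance, which costs memory but removes the cross-thread data sharing that the mixed-up Reader contents above suggest.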

@vladak
Member

vladak commented May 22, 2019

To dig further, I stopped the indexer when Tokenizer#setReader() was about to throw the exception:

this = {JFlexTokenizer@2767} "JFlexTokenizer@16af6010 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1"
input = {ReusableStringReader@3014} 
 pos = 0
 size = 12197
 s = "BootstrapMainStarter.java\norg.apache.maven.wrapper\norg.apache.maven.wrapper.BootstrapMainStarter\njava.lang.Object\n<init>\nstart\njava.lang.Exception\nargs\nmavenHome\nmavenJar\ncontextClassLoader\nmainClass\nmainMethod\nfindLauncherJar\nfile\narr$\nlen$\ni$\nmavenHome\nAbstractCommandLineConverter.java\norg.apache.maven.wrapper.cli\norg.apache.maven.wrapper.cli.AbstractCommandLineConverter\njava.lang.Object\norg.apache.maven.wrapper.cli.CommandLineConverter\n<init>\nconvert\norg.apache.maven.wrapper.cli.CommandLineArgumentException\nargs\nparser\nconvert\norg.apache.maven.wrapper.cli.CommandLineArgumentException\nargs\nconvert\norg.apache.maven.wrapper.cli.CommandLineArgumentException\nargs\ntarget\nparser\nnewInstance\nAbstractPropertiesCommandLineConverter.java\norg.apache.maven.wrapper.cli\norg.apache.maven.wrapper.cli.AbstractPropertiesCommandLineConverter\norg.apache.maven.wrapper.cli.AbstractCommandLineConverter\n<init>\ngetPropertyOption\ngetPropertyOptionDetailed\ngetPropertyOptionDescription\nconfigure\nparser\noption\nn"
 lock = {ReusableStringReader@3014} 
 skipBuffer = null
this.input = {ReusableStringReader@2770} 
 pos = 65525
 size = 247751
 s = "Application.java\ncom.sun.research.ws.wadl\ncom.sun.research.ws.wadl.Application\njava.lang.Object\ndoc\ngrammars\nresources\nresourceTypeOrMethodOrRepresentation\nany\n<init>\ngetDoc\ngetGrammars\nsetGrammars\nvalue\ngetResources\ngetResourceTypeOrMethodOrRepresentation\ngetAny\nDoc.java\ncom.sun.research.ws.wadl\ncom.sun.research.ws.wadl.Doc\njava.lang.Object\ncontent\ntitle\nlang\notherAttributes\n<init>\ngetContent\ngetTitle\nsetTitle\nvalue\ngetLang\nsetLang\nvalue\ngetOtherAttributes\nGrammars.java\ncom.sun.research.ws.wadl\ncom.sun.research.ws.wadl.Grammars\njava.lang.Object\ndoc\ninclude\nany\n<init>\ngetDoc\ngetInclude\ngetAny\nHTTPMethods.java\ncom.sun.research.ws.wadl\ncom.sun.research.ws.wadl.HTTPMethods\njava.lang.Enum\nGET\nPOST\nPUT\nHEAD\nDELETE\n$VALUES\nvalues\nvalueOf\nname\n<init>\nvalue\nfromValue\nv\n<clinit>\nInclude.java\ncom.sun.research.ws.wadl\ncom.sun.research.ws.wadl.Include\njava.lang.Object\ndoc\nhref\notherAttributes\n<init>\ngetDoc\ngetHref\nsetHref\nvalue\ngetOtherAttributes\nLink.java\ncom.sun.research.ws.wadl\ncom.sun.research"
 lock = {ReusableStringReader@2770} 
 skipBuffer = null
this.inputPending = {Tokenizer$1@2771} 

The object (an instance of JFlexTokenizer) already has a Reader instance set while setReader() tries to override it with another one, which breaches the TokenStream workflow as specified at http://lucene.apache.org/core/8_0_0/core/org/apache/lucene/analysis/TokenStream.html.

The two Reader objects contain refs values and they clearly belong to distinct jar files, in this case ./maven-wrapper.jar and ./jersey-server-2.27.jar, so this is indeed some data-sharing problem between threads.

For the record the thread stacks at this point were:

"ForkJoinPool-1-worker-1@1926" daemon prio=5 tid=0x11 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:222)
	  at org.apache.lucene.index.FreqProxTermsWriterPerField.writeOffsets(FreqProxTermsWriterPerField.java:101)
	  at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:186)
	  at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
	  at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:843)
	  at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
	  at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
	  at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
	  at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
	  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1616)
	  at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1235)
	  at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:780)
	  at org.opengrok.indexer.index.IndexDatabase.lambda$null$1(IndexDatabase.java:1193)
	  at org.opengrok.indexer.index.IndexDatabase$$Lambda$88.360647414.apply(Unknown Source:-1)
	  at java.util.stream.Collectors.lambda$groupingByConcurrent$51(Collectors.java:1070)
	  at java.util.stream.Collectors$$Lambda$94.396810238.accept(Unknown Source:-1)
	  at java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:496)
	  at java.util.stream.ReferencePipeline$$Lambda$95.1043543144.accept(Unknown Source:-1)
	  at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	  at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	  at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
	  at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
	  at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	  at java.util.concurrent.ForkJoinPool.helpComplete(ForkJoinPool.java:1870)
	  at java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:2045)
	  at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:404)
	  at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
	  at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
	  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
	  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	  at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	  at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
	  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:496)
	  at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$2(IndexDatabase.java:1182)
	  at org.opengrok.indexer.index.IndexDatabase$$Lambda$87.1427630844.call(Unknown Source:-1)
	  at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1424)
	  at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	  at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	  at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	  at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool-1-worker-0@1957" daemon prio=5 tid=0x14 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:222)
	  at org.apache.lucene.index.FreqProxTermsWriterPerField.writeOffsets(FreqProxTermsWriterPerField.java:101)
	  at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:186)
	  at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
	  at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:843)
	  at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
	  at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
	  at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
	  at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
	  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1616)
	  at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1235)
	  at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:780)
	  at org.opengrok.indexer.index.IndexDatabase.lambda$null$1(IndexDatabase.java:1193)
	  at org.opengrok.indexer.index.IndexDatabase$$Lambda$88.360647414.apply(Unknown Source:-1)
	  at java.util.stream.Collectors.lambda$groupingByConcurrent$51(Collectors.java:1070)
	  at java.util.stream.Collectors$$Lambda$94.396810238.accept(Unknown Source:-1)
	  at java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:496)
	  at java.util.stream.ReferencePipeline$$Lambda$95.1043543144.accept(Unknown Source:-1)
	  at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	  at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	  at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
	  at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
	  at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	  at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1040)
	  at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
	  at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	  at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool-1-worker-3@1948" daemon prio=5 tid=0x13 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
	  at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:412)
	  at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:202)
	  at org.apache.lucene.document.Field.tokenStream(Field.java:513)
	  at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:787)
	  at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
	  at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
	  at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
	  at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
	  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1616)
	  at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1235)
	  at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:780)
	  at org.opengrok.indexer.index.IndexDatabase.lambda$null$1(IndexDatabase.java:1193)
	  at org.opengrok.indexer.index.IndexDatabase$$Lambda$88.360647414.apply(Unknown Source:-1)
	  at java.util.stream.Collectors.lambda$groupingByConcurrent$51(Collectors.java:1070)
	  at java.util.stream.Collectors$$Lambda$94.396810238.accept(Unknown Source:-1)
	  at java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:496)
	  at java.util.stream.ReferencePipeline$$Lambda$95.1043543144.accept(Unknown Source:-1)
	  at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	  at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	  at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
	  at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
	  at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	  at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1040)
	  at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
	  at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	  at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool-1-worker-2@1947" daemon prio=5 tid=0x12 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at org.opengrok.indexer.analysis.plain.PlainSymbolTokenizer.yylex(PlainSymbolTokenizer.java:619)
	  at org.opengrok.indexer.analysis.JFlexTokenizer.incrementToken(JFlexTokenizer.java:98)
	  at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:793)
	  at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
	  at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
	  at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
	  at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
	  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1616)
	  at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1235)
	  at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:780)
	  at org.opengrok.indexer.index.IndexDatabase.lambda$null$1(IndexDatabase.java:1193)
	  at org.opengrok.indexer.index.IndexDatabase$$Lambda$88.360647414.apply(Unknown Source:-1)
	  at java.util.stream.Collectors.lambda$groupingByConcurrent$51(Collectors.java:1070)
	  at java.util.stream.Collectors$$Lambda$94.396810238.accept(Unknown Source:-1)
	  at java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:496)
	  at java.util.stream.ReferencePipeline$$Lambda$95.1043543144.accept(Unknown Source:-1)
	  at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	  at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	  at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
	  at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
	  at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	  at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	  at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	  at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"pool-2-thread-1@1584" prio=5 tid=0x10 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.util.concurrent.ForkJoinTask.externalInterruptibleAwaitDone(ForkJoinTask.java:367)
	  at java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1001)
	  at org.opengrok.indexer.index.IndexDatabase.indexParallel(IndexDatabase.java:1229)
	  at org.opengrok.indexer.index.IndexDatabase.update(IndexDatabase.java:496)
	  at org.opengrok.indexer.index.IndexDatabase$1.run(IndexDatabase.java:225)
	  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	  at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	  at java.util.concurrent.FutureTask.run(FutureTask.java:-1)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	  at java.lang.Thread.run(Thread.java:748)

"pool-3-thread-8@3164" prio=5 tid=0x1c nid=NA waiting
  java.lang.Thread.State: WAITING
	  at sun.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
	  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
	  at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	  at java.lang.Thread.run(Thread.java:748)

"pool-3-thread-7@3161" prio=5 tid=0x1b nid=NA waiting
  java.lang.Thread.State: WAITING
	  at sun.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
	  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
	  at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	  at java.lang.Thread.run(Thread.java:748)

"main@1" prio=5 tid=0x1 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at sun.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
	  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
	  at org.opengrok.indexer.index.Indexer.doIndexerExecution(Indexer.java:1096)
	  at org.opengrok.indexer.index.Indexer.main(Indexer.java:328)

"Finalizer@2320" daemon prio=8 tid=0x3 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
	  at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
	  at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)

"Reference Handler@2321" daemon prio=10 tid=0x2 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.Object.wait(Object.java:502)
	  at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
	  at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"Signal Dispatcher@2319" daemon prio=9 tid=0x4 nid=NA runnable
  java.lang.Thread.State: RUNNABLE

@cross
Contributor

cross commented May 22, 2019

As I noted in #2778 I was seeing this in my tree with 1.2.8. I am running there repeatedly, but like @vladak am not seeing any consistency. I'll report any more traces here, but it sounds like you're far enough along that my traces may not help.
I just had a run of the same source trees with no failures. I'm running across 10s of gigabytes of sources though, so my runs are taking many hours. ;-)

@cross
Contributor

cross commented May 30, 2019

Okay. I've done multiple full reindexes, and am not seeing this. Sorry. I saw it a few times, but have had many runs without it happening.

@vladak
Member

vladak commented May 31, 2019 via email

@tarzanek
Contributor

https://www.apache.org/dist/commons/bcel/RELEASE-NOTES.txt
BCEL 6.3.1 should be used (they released it recently). BCEL isn't entirely thread safe; AFAIK they have wanted to rewrite it for a long time, and the changelog suggests they just fix bugs... (FWIW I checked the way we use it, and it should be fairly safe, but not 100%, purely by BCEL's design.)

@ahornace ahornace modified the milestones: 1.2, 1.4 Aug 5, 2019
@Me4791

Me4791 commented Feb 7, 2020

Hello Contributors,
We have deployed OpenGrok 1.3.3 in a test environment and we are getting the error below:
[screenshot of the error message]
Environment:
Git - 2.17.1
OS - Ubuntu 18.04
Can you please help? Is this a known issue? Does the indexing fail partway through, or does it complete?
Does the analyzer run in the workflow after indexing?
Any help would be appreciated, as we are close to our production deployment date.

  • Is there any way I can skip this analyzer?

Thanks!

@tarzanek
Contributor

tarzanek commented Feb 7, 2020

ok, I guess we can update bcel from 6.2 to 6.4.1
but it still looks like the ideal fix is to make this Analyzer sequential somehow
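For reference, the BCEL bump discussed above would be a one-line dependency change. A hypothetical Maven fragment is sketched below (the `org.apache.bcel:bcel` coordinates are from Maven Central; the version number is taken from the comment above):

```xml
<!-- Bump BCEL from 6.2 to 6.4.1 as discussed above -->
<dependency>
    <groupId>org.apache.bcel</groupId>
    <artifactId>bcel</artifactId>
    <version>6.4.1</version>
</dependency>
```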

@vladak
Member

vladak commented Feb 7, 2020

Can you please help? Is this a known issue? Does the indexing fail partway through, or does it complete?
Does the analyzer run in the workflow after indexing?
Any help would be appreciated, as we are close to our production deployment date.

You found the correct issue so yes, it is known. The consequence of the failure during the indexing is that contents of some .jar files will not be indexed. The analysis is one of the core steps during indexing.

* Is there any way I can skip this analyzer?

You can experiment with the -A and -i options of the Indexer, i.e. assign a different analyzer to .jar files or ignore the .jar files, respectively.

@vladak
Member

vladak commented Feb 7, 2020

ok, I guess we can update bcel from 6.2 to 6.4.1
but it still looks like the ideal fix is to make this Analyzer sequential somehow

My feeling is that this is something deeply baked into how the analyzers work in OpenGrok (in combination with Lucene) so very likely upgrading bcel will not help.

@vladak
Member

vladak commented Feb 7, 2020

Actually, using -A .jar:- should do the trick as a workaround.

@Me4791

Me4791 commented Feb 7, 2020

Thank you so much @tarzanek and @vladak for your quick response. I will definitely try the workaround and will post the results here soon.
Thanks!

@idodeclare
Contributor

Actually, using -A .jar:- should do the trick as a workaround.

Because JarAnalyzerFactory is enlisted either by file extension (.jar, .war, .ear) or by format magic string (PK\{3}\{4} with 0xCAFE Extra Field ID), unmapping with -A is insufficient. The magic string check would still be active.

Only the -i, --ignore switch would keep files with those extensions from being evaluated by the analyzer-search:

-i '*.jar' -i '*.war' -i '*.ear'

@idodeclare
Contributor

@vladak, I think your debugging from 12-May-19 has the clue: the ReusableStringReader managed by Lucene is likely the gremlin for OpenGrok. Only JarAnalyzer, JavaClassAnalyzer, and ELFAnalyzer pass a String to new OGKTextField instances, and Lucene seems to use a cached ReusableStringReader in that case.

I'm guessing that creating StringReader wrappers in those analyzers would circumvent that.
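To illustrate the suggestion, here is a minimal, hypothetical Java sketch. It does not use Lucene or OGKTextField; `asFreshReader` is a made-up stand-in for the proposed analyzer-side change, showing why wrapping each String value in its own StringReader avoids sharing a single cached reader between consumers:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch: instead of handing the indexing pipeline a raw String
// (which Lucene may wrap in a single cached ReusableStringReader), wrap each
// value in a fresh StringReader so every token stream gets its own
// independently closeable source.
public class FreshReaderSketch {

    // Stand-in for the change suggested above: always hand out a new Reader.
    static Reader asFreshReader(String value) {
        return new StringReader(value);
    }

    public static void main(String[] args) throws IOException {
        Reader first = asFreshReader("class file text");
        Reader second = asFreshReader("class file text");

        // Each call yields a distinct Reader, so closing one cannot
        // invalidate state shared with another consumer.
        System.out.println(first != second); // prints "true"

        char[] buf = new char[5];
        int n = first.read(buf);
        System.out.println(new String(buf, 0, n)); // prints "class"

        first.close();
        second.close();
    }
}
```

The point of the sketch is the ownership model: a per-field StringReader has a close() lifecycle tied to exactly one TokenStream, which is what the "close() call missing" contract check expects.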
