testComplexSuggestions2, testGetPopularityDataAll sometimes fails #2704

vladak · 2019-02-27T15:53:32Z

[ERROR]   SuggesterControllerTest.testComplexSuggestions2:407 
Expected: iterable over ["main", "mutablelistof"] in any order
     but: No item matches: "main", "mutablelistof" in []

The text was updated successfully, but these errors were encountered:

vladak · 2019-02-27T15:53:54Z

like in https://travis-ci.org/oracle/opengrok/jobs/499311368 but I have also seen the failures in AppVeyor builds.

ahornace · 2019-02-27T16:01:04Z

~~From my experience it fails because of this race condition which is bugging us for a long time: #2030~~

The job you provided is still running and I see no logs :/

vladak · 2019-02-27T16:14:44Z

Sorry, just restarted the build because this was a release build. Dne st 27. 2. 2019 17:01 uživatel Adam Hornáček <notifications@github.com> napsal:

…

From my experience it fails because of this race condition which is bugging us for a long time: #2030 <#2030> The link you provided is still running and I see no logs :/ — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#2704 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ACzGDGslvbm4dxSosDInUKTR1Lt2TtaTks5vRqvCgaJpZM4bU2fm> .

vladak · 2020-01-20T11:38:01Z

Getting more failure in SuggesterControllerTest on Windows:

testGetPopularityDataDifferentField: https://travis-ci.org/oracle/opengrok/jobs/637835530
testGetPopularityDataAll: https://travis-ci.org/oracle/opengrok/jobs/639408758

vladak · 2020-01-20T11:44:07Z

The before method of the test has timeout set to 15 seconds:

          await().atMost(15, TimeUnit.SECONDS).until(() ->
                  getSuggesterProjectDataSize() == env.getProjectList().size());

which makes me wonder whether this is enough and how/if Awaitility provides any feedback/log if the timeout expired and the condition is not met.

vladak · 2020-01-20T11:55:15Z

Also, for both the failed cases listed above the setup of the test reported:

Jan 16, 2020 9:09:10 AM org.opengrok.suggest.SuggesterProjectData initSearchCountMap
SEVERE: Could not create ChronicleMap, most popular completion disabled, if you are using JDK9+ make sure to specify: --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/sun.nio.ch=ALL-UNNAMED
java.lang.IllegalArgumentException: Average key size must be a positive, finite number
	at net.openhft.chronicle.map.ChronicleMapBuilder.checkAverageSize(ChronicleMapBuilder.java:314)
	at net.openhft.chronicle.map.ChronicleMapBuilder.averageKeySize(ChronicleMapBuilder.java:531)
	at org.opengrok.suggest.popular.impl.chronicle.ChronicleMapAdapter.<init>(ChronicleMapAdapter.java:52)
	at org.opengrok.suggest.SuggesterProjectData.initSearchCountMap(SuggesterProjectData.java:300)
	at org.opengrok.suggest.SuggesterProjectData.init(SuggesterProjectData.java:158)
	at org.opengrok.suggest.Suggester.lambda$getInitRunnable$1(Suggester.java:185)
	at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

For completeness, this was on Windows with OpenJDK 8.

vladak · 2020-01-20T11:59:57Z

w.r.t. suggester timeouts: both failed cases listed above had:

INFO: Suggester successfully initialized (took 4 seconds)

so it seems the 15 seconds is enough.

vladak · 2020-01-20T12:07:20Z

The question is whether the failure to initialize Chronicle map can lead to the observed test case failures, i.e. if there is a implicit dependency. Given that the failed test cases do test popularity, this seems plausible.

vladak · 2020-01-23T10:16:32Z

added a catch for IllegalArgumentException in initSearchCountMap() and for the failed case it reported the value of conf.getAverageKeySize() like so:

SEVERE: Could not create ChronicleMap due to invalid key size: NaN

vladak · 2020-03-09T10:04:52Z

Added a "debug" print to save() and it verifies @ahornace's hypothesis that the data in the map.cfg file already contains the NaN:

Mar 06, 2020 4:02:37 PM org.opengrok.suggest.popular.impl.chronicle.ChronicleMapConfiguration save
INFO: Saving chronicle map configuration in directory C:\Users\travis\AppData\Local\Temp\data9188628379176297630\suggester\json for field 'refs': entries: 0 , average key size: NaN
Mar 06, 2020 4:02:37 PM org.opengrok.suggest.SuggesterProjectData initSearchCountMap
SEVERE: Could not create ChronicleMap from directory C:\Users\travis\AppData\Local\Temp\data9188628379176297630\suggester\json for field refs due to invalid key size: entries: 0 , average key size: NaN
java.lang.IllegalArgumentException: Average key size must be a positive, finite number
	at net.openhft.chronicle.map.ChronicleMapBuilder.checkAverageSize(ChronicleMapBuilder.java:314)
	at net.openhft.chronicle.map.ChronicleMapBuilder.averageKeySize(ChronicleMapBuilder.java:531)
	at org.opengrok.suggest.popular.impl.chronicle.ChronicleMapAdapter.<init>(ChronicleMapAdapter.java:52)
	at org.opengrok.suggest.SuggesterProjectData.initSearchCountMap(SuggesterProjectData.java:300)
	at org.opengrok.suggest.SuggesterProjectData.init(SuggesterProjectData.java:158)
	at org.opengrok.suggest.Suggester.lambda$getInitRunnable$1(Suggester.java:185)
	at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

vladak · 2020-03-19T12:44:26Z

Both numbers on the right side of the expression here

opengrok/suggester/src/main/java/org/opengrok/suggest/SuggesterProjectData.java

Line 264 in 1a0db5e

    
           double averageLength = (double) iterator.termLengthAccumulator / lookup.getCount();

are 0 for the failed case, thus giving NaN for the double value (that will be represented as Double in the averageLengths map).

vladak · 2020-03-19T13:04:42Z

The workaround could be not to put the entry to the averageLengths map if the numeric value is NaN. The getAverageLength() used in initSearchCountMap() will return default (non zero) value in such case.

vladak · 2020-03-23T08:21:42Z

Interestingly the setup/tests seems to fail only when failing to build average lengths map for field refs for the testdata/sources/json directory.

vladak · 2020-03-23T09:18:14Z

Actually, the ChronicleMapAdapter() init actually fails on Linux builds too, it just does not lead to test failure. Either the init failure is really just a red herring or is interpreted differently on Windows builds. Anyhow, this should be fixed as suggester above - if there is no index data for particular field it should be ignored w.r.t. averge key lengths.

vladak · 2020-03-23T15:14:32Z

The popularity tests fail because the term count cannot be updated using SuggesterServiceImpl.getInstance().increaseSearchCount() due to rebuild being in progress.

vladak · 2020-03-23T22:11:22Z

It seems that the trouble is that SuggesterControllerTest#before() waits merely for suggester to be initialized (Suggester#projectData to be filled) while lots of the test cases are sensitive to suggester rebuild (rather than init).

In one of the failed cases the log goes like this:

Mar 23, 2020 4:11:25 PM org.opengrok.suggest.Suggester shutdownAndAwaitTermination
INFO: Suggester successfully initialized (took 5 seconds)
Mar 23, 2020 4:11:25 PM org.opengrok.web.api.v1.suggester.provider.service.impl.SuggesterServiceImpl scheduleRebuild
INFO: Suggester rebuild not scheduled
Mar 23, 2020 4:11:26 PM org.opengrok.suggest.Suggester rebuild
INFO: Rebuilding the following suggesters: [bug16465, archives, data, scala, clojure, document, pascal, bug19195, issue10, sql, rust, csharp, java, fortran, json, perl, tcl, swift, haskell, python, c, kotlin, lisp, rfe2575, bug16070, javascript, script, bug3658, php, scopes, bug14924, bug3430]
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:60342]
Mar 23, 2020 4:11:26 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:62271/
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:62271]
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-1] Started.
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:62271]
Mar 23, 2020 4:11:27 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:57989/
Mar 23, 2020 4:11:27 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:57989]
Mar 23, 2020 4:11:27 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-2] Started.
Mar 23, 2020 4:11:27 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:57989]
Mar 23, 2020 4:11:27 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:56937/
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:56937]
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-3] Started.
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:56937]
Mar 23, 2020 4:11:28 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:54678/
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:54678]
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-4] Started.
[ERROR] Tests run: 37, Failures: 1, Errors: 0, Skipped: 1, Time elapsed: 77.04 s <<< FAILURE! - in org.opengrok.web.api.v1.controller.SuggesterControllerTest
[ERROR] testGetSuggestionsSimplePath  Time elapsed: 0.297 s  <<< FAILURE!
java.lang.AssertionError: 
Expected: iterable containing ["header"]
     but: No item matched: "header"
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.junit.Assert.assertThat(Assert.java:923)
	at org.opengrok.web.api.v1.controller.SuggesterControllerTest.testGetSuggestionsSimplePath(SuggesterControllerTest.java:220)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.ja
        ...

... lots of messages about Grizzly HTTP server start and shutdown


Mar 23, 2020 4:11:35 PM org.opengrok.suggest.Suggester shutdownAndAwaitTermination
INFO: Suggesters for [bug16465, archives, data, scala, clojure, document, pascal, bug19195, issue10, sql, rust, csharp, java, fortran, json, perl, tcl, swift, haskell, python, c, kotlin, lisp, rfe2575, bug16070, javascript, script, bug3658, php, scopes, bug14924, bug3430] were successfully rebuilt (took 9 seconds)

We can see this sequence of events:

suggester was initialized
suggester rebuild started
test cases running (judging by the Grizzly messages about starting server)
suggester rebuild finished

vladak · 2020-03-24T07:45:36Z

When TimeUnit.SECONDS.sleep(15); was added to @BeforeClass the failures can no longer be seen.

vladak · 2020-03-24T15:10:51Z

I found out that SuggesterControllerTest enforces the order of test case execution by using @FixMethodOrder(MethodSorters.NAME_ASCENDING). There are 2 test cases that trigger scheduler rebuild - ZtestRebuild (rebuilds all projects, takes bunch of seconds), ZtestRebuildProject (rebuilds just the c project, nearly instantaneous).

When the SuggesterControllerTest is run in Travis or in IDEA locally on my Linux workstation (with en_US.UTF-8 locale) the Z tests actually get run first. I commented out the Z tests and the tests still pass. So, not sure why this ordering is in place, @ahornace ?

Anyhow, either ordering is not good - if the Z tests get run first they might interfere with the tests in SuggesterControllerTest (as demonstrated in this issue), if they get to run last they might interfere with tests in another test class since the suggester rebuild thread is asynchronous.

vladak added bug suggester labels Feb 27, 2019

vladak changed the title ~~testComplexSuggestions2 sometimes fails~~ testComplexSuggestions2, testGetPopularityDataAll sometimes fails Feb 12, 2020

vladak mentioned this issue Feb 12, 2020

Fix #2030 : avoid ReusableStringReader for some analyzers #3038

Closed

vladak mentioned this issue Mar 23, 2020

make suggester popularity tests wait on rebuild #3086

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testComplexSuggestions2, testGetPopularityDataAll sometimes fails #2704

testComplexSuggestions2, testGetPopularityDataAll sometimes fails #2704

vladak commented Feb 27, 2019

vladak commented Feb 27, 2019

ahornace commented Feb 27, 2019 •

edited

Loading

vladak commented Feb 27, 2019 via email

vladak commented Jan 20, 2020

vladak commented Jan 20, 2020

vladak commented Jan 20, 2020 •

edited

Loading

vladak commented Jan 20, 2020

vladak commented Jan 20, 2020

vladak commented Jan 23, 2020

vladak commented Mar 9, 2020

vladak commented Mar 19, 2020

vladak commented Mar 19, 2020

vladak commented Mar 23, 2020

vladak commented Mar 23, 2020

vladak commented Mar 23, 2020

vladak commented Mar 23, 2020

vladak commented Mar 24, 2020

vladak commented Mar 24, 2020

testComplexSuggestions2, testGetPopularityDataAll sometimes fails #2704

testComplexSuggestions2, testGetPopularityDataAll sometimes fails #2704

Comments

vladak commented Feb 27, 2019

vladak commented Feb 27, 2019

ahornace commented Feb 27, 2019 • edited Loading

vladak commented Feb 27, 2019 via email

vladak commented Jan 20, 2020

vladak commented Jan 20, 2020

vladak commented Jan 20, 2020 • edited Loading

vladak commented Jan 20, 2020

vladak commented Jan 20, 2020

vladak commented Jan 23, 2020

vladak commented Mar 9, 2020

vladak commented Mar 19, 2020

vladak commented Mar 19, 2020

vladak commented Mar 23, 2020

vladak commented Mar 23, 2020

vladak commented Mar 23, 2020

vladak commented Mar 23, 2020

vladak commented Mar 24, 2020

vladak commented Mar 24, 2020

ahornace commented Feb 27, 2019 •

edited

Loading

vladak commented Jan 20, 2020 •

edited

Loading