Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testComplexSuggestions2, testGetPopularityDataAll sometimes fails #2704

Open
vladak opened this issue Feb 27, 2019 · 18 comments
Open

testComplexSuggestions2, testGetPopularityDataAll sometimes fails #2704

vladak opened this issue Feb 27, 2019 · 18 comments

Comments

@vladak
Copy link
Member

vladak commented Feb 27, 2019

[ERROR]   SuggesterControllerTest.testComplexSuggestions2:407 
Expected: iterable over ["main", "mutablelistof"] in any order
     but: No item matches: "main", "mutablelistof" in []
@vladak
Copy link
Member Author

vladak commented Feb 27, 2019

like in https://travis-ci.org/oracle/opengrok/jobs/499311368 but I have also seen the failures in AppVeyor builds.

@ahornace
Copy link
Contributor

ahornace commented Feb 27, 2019

From my experience it fails because of this race condition which is bugging us for a long time: #2030

The job you provided is still running and I see no logs :/

@vladak
Copy link
Member Author

vladak commented Feb 27, 2019 via email

@vladak
Copy link
Member Author

vladak commented Jan 20, 2020

Getting more failure in SuggesterControllerTest on Windows:

@vladak
Copy link
Member Author

vladak commented Jan 20, 2020

The before method of the test has timeout set to 15 seconds:

          await().atMost(15, TimeUnit.SECONDS).until(() ->
                  getSuggesterProjectDataSize() == env.getProjectList().size());

which makes me wonder whether this is enough and how/if Awaitility provides any feedback/log if the timeout expired and the condition is not met.

@vladak
Copy link
Member Author

vladak commented Jan 20, 2020

Also, for both the failed cases listed above the setup of the test reported:

Jan 16, 2020 9:09:10 AM org.opengrok.suggest.SuggesterProjectData initSearchCountMap
SEVERE: Could not create ChronicleMap, most popular completion disabled, if you are using JDK9+ make sure to specify: --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/sun.nio.ch=ALL-UNNAMED
java.lang.IllegalArgumentException: Average key size must be a positive, finite number
	at net.openhft.chronicle.map.ChronicleMapBuilder.checkAverageSize(ChronicleMapBuilder.java:314)
	at net.openhft.chronicle.map.ChronicleMapBuilder.averageKeySize(ChronicleMapBuilder.java:531)
	at org.opengrok.suggest.popular.impl.chronicle.ChronicleMapAdapter.<init>(ChronicleMapAdapter.java:52)
	at org.opengrok.suggest.SuggesterProjectData.initSearchCountMap(SuggesterProjectData.java:300)
	at org.opengrok.suggest.SuggesterProjectData.init(SuggesterProjectData.java:158)
	at org.opengrok.suggest.Suggester.lambda$getInitRunnable$1(Suggester.java:185)
	at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

For completeness, this was on Windows with OpenJDK 8.

@vladak
Copy link
Member Author

vladak commented Jan 20, 2020

w.r.t. suggester timeouts: both failed cases listed above had:

INFO: Suggester successfully initialized (took 4 seconds)

so it seems the 15 seconds is enough.

@vladak
Copy link
Member Author

vladak commented Jan 20, 2020

The question is whether the failure to initialize Chronicle map can lead to the observed test case failures, i.e. if there is a implicit dependency. Given that the failed test cases do test popularity, this seems plausible.

@vladak
Copy link
Member Author

vladak commented Jan 23, 2020

added a catch for IllegalArgumentException in initSearchCountMap() and for the failed case it reported the value of conf.getAverageKeySize() like so:

SEVERE: Could not create ChronicleMap due to invalid key size: NaN

@vladak vladak changed the title testComplexSuggestions2 sometimes fails testComplexSuggestions2, testGetPopularityDataAll sometimes fails Feb 12, 2020
@vladak
Copy link
Member Author

vladak commented Mar 9, 2020

Added a "debug" print to save() and it verifies @ahornace's hypothesis that the data in the map.cfg file already contains the NaN:

Mar 06, 2020 4:02:37 PM org.opengrok.suggest.popular.impl.chronicle.ChronicleMapConfiguration save
INFO: Saving chronicle map configuration in directory C:\Users\travis\AppData\Local\Temp\data9188628379176297630\suggester\json for field 'refs': entries: 0 , average key size: NaN
Mar 06, 2020 4:02:37 PM org.opengrok.suggest.SuggesterProjectData initSearchCountMap
SEVERE: Could not create ChronicleMap from directory C:\Users\travis\AppData\Local\Temp\data9188628379176297630\suggester\json for field refs due to invalid key size: entries: 0 , average key size: NaN
java.lang.IllegalArgumentException: Average key size must be a positive, finite number
	at net.openhft.chronicle.map.ChronicleMapBuilder.checkAverageSize(ChronicleMapBuilder.java:314)
	at net.openhft.chronicle.map.ChronicleMapBuilder.averageKeySize(ChronicleMapBuilder.java:531)
	at org.opengrok.suggest.popular.impl.chronicle.ChronicleMapAdapter.<init>(ChronicleMapAdapter.java:52)
	at org.opengrok.suggest.SuggesterProjectData.initSearchCountMap(SuggesterProjectData.java:300)
	at org.opengrok.suggest.SuggesterProjectData.init(SuggesterProjectData.java:158)
	at org.opengrok.suggest.Suggester.lambda$getInitRunnable$1(Suggester.java:185)
	at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

@vladak
Copy link
Member Author

vladak commented Mar 19, 2020

Both numbers on the right side of the expression here

double averageLength = (double) iterator.termLengthAccumulator / lookup.getCount();
are 0 for the failed case, thus giving NaN for the double value (that will be represented as Double in the averageLengths map).

@vladak
Copy link
Member Author

vladak commented Mar 19, 2020

The workaround could be not to put the entry to the averageLengths map if the numeric value is NaN. The getAverageLength() used in initSearchCountMap() will return default (non zero) value in such case.

@vladak
Copy link
Member Author

vladak commented Mar 23, 2020

Interestingly the setup/tests seems to fail only when failing to build average lengths map for field refs for the testdata/sources/json directory.

@vladak
Copy link
Member Author

vladak commented Mar 23, 2020

Actually, the ChronicleMapAdapter() init actually fails on Linux builds too, it just does not lead to test failure. Either the init failure is really just a red herring or is interpreted differently on Windows builds. Anyhow, this should be fixed as suggester above - if there is no index data for particular field it should be ignored w.r.t. averge key lengths.

@vladak
Copy link
Member Author

vladak commented Mar 23, 2020

The popularity tests fail because the term count cannot be updated using SuggesterServiceImpl.getInstance().increaseSearchCount() due to rebuild being in progress.

@vladak
Copy link
Member Author

vladak commented Mar 23, 2020

It seems that the trouble is that SuggesterControllerTest#before() waits merely for suggester to be initialized (Suggester#projectData to be filled) while lots of the test cases are sensitive to suggester rebuild (rather than init).

In one of the failed cases the log goes like this:

Mar 23, 2020 4:11:25 PM org.opengrok.suggest.Suggester shutdownAndAwaitTermination
INFO: Suggester successfully initialized (took 5 seconds)
Mar 23, 2020 4:11:25 PM org.opengrok.web.api.v1.suggester.provider.service.impl.SuggesterServiceImpl scheduleRebuild
INFO: Suggester rebuild not scheduled
Mar 23, 2020 4:11:26 PM org.opengrok.suggest.Suggester rebuild
INFO: Rebuilding the following suggesters: [bug16465, archives, data, scala, clojure, document, pascal, bug19195, issue10, sql, rust, csharp, java, fortran, json, perl, tcl, swift, haskell, python, c, kotlin, lisp, rfe2575, bug16070, javascript, script, bug3658, php, scopes, bug14924, bug3430]
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:60342]
Mar 23, 2020 4:11:26 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:62271/
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:62271]
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-1] Started.
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:62271]
Mar 23, 2020 4:11:27 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:57989/
Mar 23, 2020 4:11:27 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:57989]
Mar 23, 2020 4:11:27 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-2] Started.
Mar 23, 2020 4:11:27 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:57989]
Mar 23, 2020 4:11:27 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:56937/
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:56937]
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-3] Started.
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:56937]
Mar 23, 2020 4:11:28 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:54678/
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:54678]
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-4] Started.
[ERROR] Tests run: 37, Failures: 1, Errors: 0, Skipped: 1, Time elapsed: 77.04 s <<< FAILURE! - in org.opengrok.web.api.v1.controller.SuggesterControllerTest
[ERROR] testGetSuggestionsSimplePath  Time elapsed: 0.297 s  <<< FAILURE!
java.lang.AssertionError: 
Expected: iterable containing ["header"]
     but: No item matched: "header"
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.junit.Assert.assertThat(Assert.java:923)
	at org.opengrok.web.api.v1.controller.SuggesterControllerTest.testGetSuggestionsSimplePath(SuggesterControllerTest.java:220)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.ja
        ...

... lots of messages about Grizzly HTTP server start and shutdown


Mar 23, 2020 4:11:35 PM org.opengrok.suggest.Suggester shutdownAndAwaitTermination
INFO: Suggesters for [bug16465, archives, data, scala, clojure, document, pascal, bug19195, issue10, sql, rust, csharp, java, fortran, json, perl, tcl, swift, haskell, python, c, kotlin, lisp, rfe2575, bug16070, javascript, script, bug3658, php, scopes, bug14924, bug3430] were successfully rebuilt (took 9 seconds)

We can see this sequence of events:

  • suggester was initialized
  • suggester rebuild started
  • test cases running (judging by the Grizzly messages about starting server)
  • suggester rebuild finished

@vladak
Copy link
Member Author

vladak commented Mar 24, 2020

When TimeUnit.SECONDS.sleep(15); was added to @BeforeClass the failures can no longer be seen.

@vladak
Copy link
Member Author

vladak commented Mar 24, 2020

I found out that SuggesterControllerTest enforces the order of test case execution by using @FixMethodOrder(MethodSorters.NAME_ASCENDING). There are 2 test cases that trigger scheduler rebuild - ZtestRebuild (rebuilds all projects, takes bunch of seconds), ZtestRebuildProject (rebuilds just the c project, nearly instantaneous).

When the SuggesterControllerTest is run in Travis or in IDEA locally on my Linux workstation (with en_US.UTF-8 locale) the Z tests actually get run first. I commented out the Z tests and the tests still pass. So, not sure why this ordering is in place, @ahornace ?

Anyhow, either ordering is not good - if the Z tests get run first they might interfere with the tests in SuggesterControllerTest (as demonstrated in this issue), if they get to run last they might interfere with tests in another test class since the suggester rebuild thread is asynchronous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants