[test] IndexingMemoryControllerIT.testIndexBufferSizeUpdateInactiveShard #13487

Closed
brwe opened this Issue Sep 10, 2015 · 6 comments

@brwe
Contributor

brwe commented Sep 10, 2015

The shard is not marked as inactive even though we set shard_inactive_time to 100ms and then wait 10s for it to be marked as inactive.
This seems to be the only failure so far:
http://build-us-00.elastic.co/job/es_core_2x_centos/89/consoleText
It does not reproduce for me.
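
For readers unfamiliar with this kind of test, the wait described above is a bounded poll: check a condition repeatedly until it holds or a deadline passes. The following is only a rough sketch of that pattern, not the actual test code; `AwaitSketch`, `awaitTrue`, and its parameters are hypothetical names chosen for illustration.

```java
import java.util.function.BooleanSupplier;

final class AwaitSketch {

    // Polls `condition` every `pollMillis` until it returns true or
    // `timeoutMillis` has elapsed. Returns whether the condition held in time.
    // Hypothetical helper, not part of any Elasticsearch test framework.
    static boolean awaitTrue(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out, e.g. the shard never went inactive
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Condition becomes true roughly 50 ms after start; we allow up to 1 s.
        boolean ok = awaitTrue(() -> System.currentTimeMillis() - start >= 50, 1000, 10);
        System.out.println("condition met: " + ok);
    }
}
```

With a 100ms inactivity threshold and a 10s budget, a timeout in a loop like this points to the state never changing at all rather than to a slow machine.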

@brwe brwe added the >test label Sep 10, 2015

@mikemccand

Contributor

mikemccand commented Sep 11, 2015

The test indexes one doc, confirms the memory controller up'd the indexing buffer, then flushes, then sits idle waiting for indexing buffer to go inactive (500 KB)...

In this failure, it looks like after indexing the one doc but before flushing, the memory controller had already kicked in and set the shard to inactive, but then with the flush it became active again ... still not sure why it didn't then become inactive again. But maybe the timing in this run (becoming inactive before the test could flush) is a clue ...

@jasontedor

Member

jasontedor commented Sep 21, 2015

This failed again, but does not immediately reproduce.

@mikemccand

Contributor

mikemccand commented Sep 21, 2015

Thanks for the pointer @jasontedor ... I'll dig.

@markharwood

Contributor

markharwood commented Dec 22, 2015

@mikemccand

Contributor

mikemccand commented Dec 23, 2015

OK I can explain the frequent 1.7 failures, where the test times out waiting for the index to become active again ... it's a case we already know about from the code:

                // since we sync flush once a shard becomes inactive, the translog id can change, however that
                // doesn't mean the an indexing operation has happened. Note that if we're really unlucky and a flush happens
                // immediately after an indexing operation we may not become active immediately. The following
                // indexing operation will mark the shard as active, so it's OK. If that one doesn't come, we might as well stay
                // inactive

I.e., a sync'd flush snuck in after the test indexed one document but before IMC noticed the shard was active and so IMC never marks the shard active ... maybe I can fix the check to detect a flush occurred and index another document ... but this (IMC) is all greatly simplified in 2.x (#15252), maybe we should just remove this test from 1.7.x?
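
To make the race concrete, here is a toy model of it. This is a sketch under stated assumptions, not the real IndexingMemoryController logic: `ToyTranslog`, `ToyController`, and the activity rule (a translog id change alone does not count as indexing; only a non-zero operation count does) are hypothetical and only mirror what the quoted code comment describes.

```java
final class InactiveShardRaceSketch {

    // Hypothetical stand-in for a shard's translog: an id plus an operation count.
    static final class ToyTranslog {
        long id = 1;
        int ops = 0;

        void index() { ops++; }

        // Assumption: a flush rolls over to a new translog id and resets the count.
        void flush() { id++; ops = 0; }
    }

    // Hypothetical stand-in for the periodic memory-controller check.
    static final class ToyController {
        private long lastId;
        private int lastOps;
        boolean active = false; // the shard starts out inactive

        ToyController(ToyTranslog t) { lastId = t.id; lastOps = t.ops; }

        void check(ToyTranslog t) {
            boolean changed = t.id != lastId || t.ops != lastOps;
            // Assumed rule: an id change alone is not evidence of indexing,
            // so activity is only recorded when operations are visible.
            if (changed && t.ops > 0) {
                active = true;
            }
            lastId = t.id;
            lastOps = t.ops;
        }
    }

    // Index one doc, optionally flush before the controller looks,
    // optionally index a second doc afterwards. Returns the final active flag.
    static boolean simulate(boolean flushBeforeCheck, boolean indexAgain) {
        ToyTranslog translog = new ToyTranslog();
        ToyController controller = new ToyController(translog);
        translog.index();
        if (flushBeforeCheck) {
            translog.flush(); // the flush that "snuck in"
        }
        controller.check(translog);
        if (indexAgain) {
            translog.index();
            controller.check(translog);
        }
        return controller.active;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false, false)); // true: indexing seen before any flush
        System.out.println(simulate(true, false));  // false: flush hid the indexing operation
        System.out.println(simulate(true, true));   // true: a following operation marks it active
    }
}
```

The third case is the test-side workaround mentioned above: if a flush got in first, indexing one more document gives the controller something to notice.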

The 2.x failures are different (failing to become inactive) ... not sure about those yet.

@mikemccand

Contributor

mikemccand commented Jan 28, 2016

It looks like my fix worked for 1.7, and 2.x stopped failing (not sure why, but there have been plenty of changes) ... so I'm optimistically closing this ... feel free to re-open if we see new test failures.

@mikemccand mikemccand closed this Jan 28, 2016
