Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax testStressMaybeFlushOrRollTranslogGeneration #38918

Merged
merged 3 commits into from Feb 15, 2019

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Feb 14, 2019

The predicate InternalEngine#shouldPeriodicallyFlush depends on the uncommitted translog size and the local checkpoint. The uncommitted translog size also depends on the local checkpoint. Rarely, this condition becomes true twice in testStressMaybeFlushOrRollTranslogGeneration in the following scenario:

  1. Index doc-0 and advances the local checkpoint to 0, the condition shouldPeriodicallyFlush remains false.

  2. Index doc-1 and add it to translog, but the local checkpoint is not advanced yet (still 0). The condition shouldPeriodicallyFlush becomes true because the uncommitted translog size is 216bytes (2 operations + generation-1 + generation-2) > 180bytes and the translog generation of the new index commit would advance from 1 to 2.

[2019-02-13T23:33:58,257][TRACE][o.e.i.e.Engine ] [node_s_0] [test][0] committing writer with commit data [{local_checkpoint=0, max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g, min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q, retention_leases=primary_term:1;version:0;, translog_generation=2, max_seq_no=1}]

  1. The condition shouldPeriodicallyFlush becomes true again after the local checkpoint is advanced to 1 because the uncommitted translog size is 216bytes (2 operations + generation-2 + generation-3) > 180bytes and the translog generation of the new index commit would advance from 2 to 4.

[2019-02-13T23:33:58,264][TRACE][o.e.i.e.Engine ] [node_s_0] [test][0] committing writer with commit data [{local_checkpoint=1, max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g, min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q, retention_leases=primary_term:1;version:0;, translog_generation=4, max_seq_no=1}]

This scenario is producible by injecting some unlucky timing. We need to relax the assertion in this test to cover this situation.

Closes #31629

@dnhatn dnhatn added >test Issues or PRs that are addressing/adding tests :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. v6.7.0 v8.0.0 v7.2.0 v7.0.0-beta1 v6.6.2 labels Feb 14, 2019
@dnhatn dnhatn requested a review from ywelsch February 14, 2019 18:41
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@dnhatn dnhatn merged commit fadad3c into elastic:master Feb 15, 2019
@dnhatn dnhatn deleted the relax-flush-test branch February 15, 2019 17:49
@dnhatn
Copy link
Member Author

dnhatn commented Feb 15, 2019

Thanks @ywelsch.

dnhatn added a commit that referenced this pull request Feb 15, 2019
The predicate shouldPeriodicallyFlush is determined by the uncommitted
translog size and the local checkpoint. The uncommitted translog size
depends on the local checkpoint. The condition shouldPeriodicallyFlush
can be true twice in in the test in the following scenario:

1. Index doc-0 and advances the local checkpoint to 0, the condition
shouldPeriodicallyFlush remains false.

2. Index doc-1 and add it to translog, but the local checkpoint is not
advanced yet (still 0). The condition shouldPeriodicallyFlush becomes
true because the uncommitted translog size is 216bytes (2ops + gen-1 +
gen-2) > 180bytes and the translog generation of the new index commit
would advance from 1 to 2.

> [2019-02-13T23:33:58,257][TRACE][o.e.i.e.Engine           ] [node_s_0]
> [test][0] committing writer with commit data [{local_checkpoint=0,
> max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g,
> min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q,
> retention_leases=primary_term:1;version:0;, translog_generation=2,
> max_seq_no=1}]

1. The shouldPeriodicallyFlush becomes true again after the local
checkpoint is advanced to 1 because the uncommitted translog size is
216bytes (2ops + gen-2 + gen-3) > 180bytes and the translog generation
of the new index commit would advance from 2 to 4.

> [2019-02-13T23:33:58,264][TRACE][o.e.i.e.Engine           ] [node_s_0]
> [test][0] committing writer with commit data [{local_checkpoint=1,
> max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g,
> min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q,
> retention_leases=primary_term:1;version:0;, translog_generation=4,
> max_seq_no=1}]

We need to relax the assertion in this test to cover this situation.

Closes #31629
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 15, 2019
* master:
  Address some CCR REST test case flakiness (elastic#38975)
  Edits to text in Completion Suggester doc (elastic#38980)
  SQL: doc polishing
  [DOCS] Fixes broken formatting
  SQL: Polish the rest chapter (elastic#38971)
  Remove `nGram` and  `edgeNGram` token filter names (elastic#38911)
  Add an exception throw if waiting on transport port file fails (elastic#37574)
  Improve testcluster distribution artifact handling (elastic#38933)
  Advance max_seq_no before add operation to Lucene (elastic#38879)
  Reduce global checkpoint sync interval in disruption tests (elastic#38931)
  [test] disable packaging tests for suse boxes
  Relax testStressMaybeFlushOrRollTranslogGeneration (elastic#38918)
  [DOCS] Edits warning in put watch API (elastic#38582)
  Fix serialization bug in ShardFollowTask after cutting this class over to extend from ImmutableFollowParameters.
  [DOCS] Updates methods for upgrading machine learning (elastic#38876)
dnhatn added a commit that referenced this pull request Feb 15, 2019
The predicate shouldPeriodicallyFlush is determined by the uncommitted
translog size and the local checkpoint. The uncommitted translog size
depends on the local checkpoint. The condition shouldPeriodicallyFlush
can be true twice in in the test in the following scenario:

1. Index doc-0 and advances the local checkpoint to 0, the condition
shouldPeriodicallyFlush remains false.

2. Index doc-1 and add it to translog, but the local checkpoint is not
advanced yet (still 0). The condition shouldPeriodicallyFlush becomes
true because the uncommitted translog size is 216bytes (2ops + gen-1 +
gen-2) > 180bytes and the translog generation of the new index commit
would advance from 1 to 2.

> [2019-02-13T23:33:58,257][TRACE][o.e.i.e.Engine           ] [node_s_0]
> [test][0] committing writer with commit data [{local_checkpoint=0,
> max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g,
> min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q,
> retention_leases=primary_term:1;version:0;, translog_generation=2,
> max_seq_no=1}]

1. The shouldPeriodicallyFlush becomes true again after the local
checkpoint is advanced to 1 because the uncommitted translog size is
216bytes (2ops + gen-2 + gen-3) > 180bytes and the translog generation
of the new index commit would advance from 2 to 4.

> [2019-02-13T23:33:58,264][TRACE][o.e.i.e.Engine           ] [node_s_0]
> [test][0] committing writer with commit data [{local_checkpoint=1,
> max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g,
> min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q,
> retention_leases=primary_term:1;version:0;, translog_generation=4,
> max_seq_no=1}]

We need to relax the assertion in this test to cover this situation.

Closes #31629
dnhatn added a commit that referenced this pull request Feb 16, 2019
The predicate shouldPeriodicallyFlush is determined by the uncommitted
translog size and the local checkpoint. The uncommitted translog size
depends on the local checkpoint. The condition shouldPeriodicallyFlush
can be true twice in in the test in the following scenario:

1. Index doc-0 and advances the local checkpoint to 0, the condition
shouldPeriodicallyFlush remains false.

2. Index doc-1 and add it to translog, but the local checkpoint is not
advanced yet (still 0). The condition shouldPeriodicallyFlush becomes
true because the uncommitted translog size is 216bytes (2ops + gen-1 +
gen-2) > 180bytes and the translog generation of the new index commit
would advance from 1 to 2.

> [2019-02-13T23:33:58,257][TRACE][o.e.i.e.Engine           ] [node_s_0]
> [test][0] committing writer with commit data [{local_checkpoint=0,
> max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g,
> min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q,
> retention_leases=primary_term:1;version:0;, translog_generation=2,
> max_seq_no=1}]

1. The shouldPeriodicallyFlush becomes true again after the local
checkpoint is advanced to 1 because the uncommitted translog size is
216bytes (2ops + gen-2 + gen-3) > 180bytes and the translog generation
of the new index commit would advance from 2 to 4.

> [2019-02-13T23:33:58,264][TRACE][o.e.i.e.Engine           ] [node_s_0]
> [test][0] committing writer with commit data [{local_checkpoint=1,
> max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g,
> min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q,
> retention_leases=primary_term:1;version:0;, translog_generation=4,
> max_seq_no=1}]

We need to relax the assertion in this test to cover this situation.

Closes #31629
dnhatn added a commit that referenced this pull request Feb 16, 2019
The predicate shouldPeriodicallyFlush is determined by the uncommitted
translog size and the local checkpoint. The uncommitted translog size
depends on the local checkpoint. The condition shouldPeriodicallyFlush
can be true twice in in the test in the following scenario:

1. Index doc-0 and advances the local checkpoint to 0, the condition
shouldPeriodicallyFlush remains false.

2. Index doc-1 and add it to translog, but the local checkpoint is not
advanced yet (still 0). The condition shouldPeriodicallyFlush becomes
true because the uncommitted translog size is 216bytes (2ops + gen-1 +
gen-2) > 180bytes and the translog generation of the new index commit
would advance from 1 to 2.

> [2019-02-13T23:33:58,257][TRACE][o.e.i.e.Engine           ] [node_s_0]
> [test][0] committing writer with commit data [{local_checkpoint=0,
> max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g,
> min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q,
> retention_leases=primary_term:1;version:0;, translog_generation=2,
> max_seq_no=1}]

1. The shouldPeriodicallyFlush becomes true again after the local
checkpoint is advanced to 1 because the uncommitted translog size is
216bytes (2ops + gen-2 + gen-3) > 180bytes and the translog generation
of the new index commit would advance from 2 to 4.

> [2019-02-13T23:33:58,264][TRACE][o.e.i.e.Engine           ] [node_s_0]
> [test][0] committing writer with commit data [{local_checkpoint=1,
> max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g,
> min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q,
> retention_leases=primary_term:1;version:0;, translog_generation=4,
> max_seq_no=1}]

We need to relax the assertion in this test to cover this situation.

Closes #31629
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >test Issues or PRs that are addressing/adding tests v6.6.2 v6.7.0 v7.0.0-beta1 v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IndexShardIT#testStressMaybeFlushOrRollTranslogGeneration occasionally fails with an assertion error.
4 participants