@bowenlan-amzn bowenlan-amzn commented Nov 17, 2025

Description

Example failure: https://github.com/opensearch-project/index-management/actions/runs/19419192233/job/55553253589

RestStopRollupActionIT > test stop rollup when multiple shards configured for IM config index FAILED
    org.opensearch.client.ResponseException: method [POST], host [http://127.0.0.1:46097/], URI [/_plugins/_rollup/jobs/multi_shard_stop/_stop], status line [HTTP/1.1 409 Conflict]
    {"error":{"root_cause":[{"type":"version_conflict_engine_exception","reason":"[E9xFkJoByZU6YAxWm9qS]: version conflict, required seqNo [7], primary term [1]. current document has seqNo [8] and primary term [1]","index":".opendistro-ism-config","shard":"3","index_uuid":"Fw8e4wccTXqI5NgJVSI2aQ"}],"type":"version_conflict_engine_exception","reason":"[E9xFkJoByZU6YAxWm9qS]: version conflict, required seqNo [7], primary term [1]. current document has seqNo [8] and primary term [1]","index":".opendistro-ism-config","shard":"3","index_uuid":"Fw8e4wccTXqI5NgJVSI2aQ"},"status":409}
        at __randomizedtesting.SeedInfo.seed([D8A2E3BF2B905973:9737F460ECE42543]:0)
        at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:501)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:384)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:359)
        at app//org.opensearch.indexmanagement.TestHelpersKt.makeRequest(TestHelpers.kt:117)
        at app//org.opensearch.indexmanagement.TestHelpersKt.makeRequest$default(TestHelpers.kt:102)
        at app//org.opensearch.indexmanagement.rollup.resthandler.RestStopRollupActionIT.test stop rollup when multiple shards configured for IM config index(RestStopRollupActionIT.kt:309)

The multi-shard rollup start/stop tests were flaky because the test raced with the active rollup runner, which surfaced as version conflicts. The failing sequence was:

  1. The test reads the rollup document (seqNo = N)
  2. The active rollup runner updates the document (seqNo = N+1)
  3. The test tries to update with seqNo = N → 409 Version Conflict
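
The three steps above can be reproduced with a small in-memory model of a seqNo-guarded write. This is only a sketch: `Store`, `Doc`, and `VersionConflictException` are hypothetical names made up for illustration and are not part of OpenSearch or the index-management plugin.

```kotlin
// Hypothetical in-memory stand-in for a document guarded by a sequence number.
class VersionConflictException(message: String) : RuntimeException(message)

class Doc(var body: String, var seqNo: Long)

class Store(initial: String) {
    private val doc = Doc(initial, 0)

    // Returns the current body together with the seqNo the caller observed.
    fun read(): Pair<String, Long> = doc.body to doc.seqNo

    // The write is accepted only if the caller observed the latest seqNo.
    fun update(body: String, ifSeqNo: Long) {
        if (ifSeqNo != doc.seqNo) {
            throw VersionConflictException(
                "version conflict, required seqNo [$ifSeqNo]. " +
                    "current document has seqNo [${doc.seqNo}]"
            )
        }
        doc.body = body
        doc.seqNo += 1
    }
}

fun main() {
    val store = Store("enabled=true")
    val (_, seen) = store.read()              // 1. test reads (seqNo = N)
    store.update("metadata=updated", seen)    // 2. runner writes (seqNo = N+1)
    try {
        store.update("enabled=false", seen)   // 3. stale write with seqNo = N
    } catch (e: VersionConflictException) {
        println("409: ${e.message}")
    }
}
```

The second write in `main` plays the role of the rollup runner; the third fails because it still carries the seqNo from step 1.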

The stop/start actions perform two sequential updates:

  • First: Update rollup metadata status
  • Second: Enable/disable the rollup job

The fix moves the _stop and _start API calls inside the waitFor block, so a call that hits a version conflict is retried automatically instead of failing the test. This is consistent with the pattern already used in other rollup and transform tests.
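
The retry idea can be sketched as follows. This is not the project's actual `waitFor` helper: `retryOnConflict` and `ConflictException` are hypothetical names, and the real helper may retry on different conditions and with a time-based limit rather than an attempt count.

```kotlin
// Hypothetical exception standing in for a 409 response from the cluster.
class ConflictException(message: String) : RuntimeException(message)

// Hypothetical helper: re-run the block until it succeeds or attempts run out.
fun <T> retryOnConflict(
    maxAttempts: Int = 10,
    pauseMillis: Long = 50,
    block: () -> T,
): T {
    var last: ConflictException? = null
    for (attempt in 1..maxAttempts) {
        try {
            return block()
        } catch (e: ConflictException) {
            // Another writer won the race; wait briefly and try again.
            last = e
            Thread.sleep(pauseMillis)
        }
    }
    throw last ?: IllegalArgumentException("maxAttempts must be positive")
}

fun main() {
    var attempts = 0
    val result = retryOnConflict {
        attempts++
        // Simulate two conflicts before the stop call goes through.
        if (attempts < 3) throw ConflictException("409 Conflict")
        "stopped"
    }
    println("$result after $attempts attempts")
}
```

Placing the request inside the retried block matters because each attempt then re-reads the current document state instead of reusing a stale seqNo.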

Fixed tests:

  • RestStopRollupActionIT: test stop rollup when multiple shards configured for IM config index
  • RestStartRollupActionIT: test start rollup when multiple shards configured for IM config index

Related Issues

Resolves #90

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented Nov 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.19%. Comparing base (39b856d) to head (6acfc38).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1529      +/-   ##
==========================================
+ Coverage   76.16%   76.19%   +0.02%     
==========================================
  Files         375      375              
  Lines       17567    17567              
  Branches     2410     2410              
==========================================
+ Hits        13380    13385       +5     
+ Misses       2946     2942       -4     
+ Partials     1241     1240       -1     


Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@bowenlan-amzn bowenlan-amzn marked this pull request as ready for review November 17, 2025 06:27
@shiv0408 shiv0408 left a comment

Thanks for the fix @bowenlan-amzn 💯

@shiv0408 shiv0408 merged commit c701c5b into opensearch-project:main Nov 18, 2025
23 checks passed
@bowenlan-amzn bowenlan-amzn deleted the fix-rollup-flaky branch November 18, 2025 20:46
Development

Successfully merging this pull request may close these issues.

Flaky tests
