Fix shard failure on flush during upload failures for remote indexes #10513

Merged (4 commits) on Oct 12, 2023

ashking94 (Member) commented Oct 9, 2023

Description

If a flush happened while remote uploads of the translog (or of the ckp or metadata file) were failing for any reason, even a transient one, the shard was failed. With this PR, the engine is no longer failed when the exception thrown during flush is a TranslogUploadFailedException.
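
The behavior described above can be illustrated with a minimal Java sketch. It is not the code from this PR: SketchEngine, UploadStep, and FlushSketch are hypothetical names invented for illustration, and the TranslogUploadFailedException declared here is a local stand-in that is assumed to extend IOException. The sketch only shows the general pattern of rethrowing the upload exception without failing the engine, while other I/O errors still fail it.

```java
import java.io.IOException;

/** Local stand-in for the exception named in the description (assumed to extend IOException). */
class TranslogUploadFailedException extends IOException {
    TranslogUploadFailedException(String message) {
        super(message);
    }
}

/** Hypothetical, simplified engine; not the real OpenSearch engine class. */
class SketchEngine {
    private boolean failed = false;

    void failEngine(String reason, Exception cause) {
        // In this sketch, failing the engine just flips a flag.
        failed = true;
    }

    boolean isFailed() {
        return failed;
    }
}

/** Hypothetical hook for the flush step that may upload files to the remote store. */
interface UploadStep {
    void run() throws IOException;
}

/** Hypothetical flush wrapper showing the exception handling pattern. */
class FlushSketch {
    static void flush(SketchEngine engine, UploadStep step) throws IOException {
        try {
            step.run();
        } catch (TranslogUploadFailedException e) {
            // Remote upload failures may be transient: surface the error to the
            // caller, but do not fail the engine (and therefore the shard).
            throw e;
        } catch (IOException e) {
            // Any other I/O failure during flush still fails the engine.
            engine.failEngine("flush failed", e);
            throw e;
        }
    }
}
```

In this sketch the more specific catch clause must come first, because Java rejects a catch for a subclass that follows a catch for its superclass. The caller still sees the upload exception and can retry the flush later.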

Related Issues

Resolves #10512

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


github-actions bot commented Oct 9, 2023

Compatibility status:

Checks if related components are compatible with change 1ebe969

Incompatible components

Incompatible components: [https://github.com/opensearch-project/neural-search.git]

Skipped components

Compatible components

  • https://github.com/opensearch-project/security.git
  • https://github.com/opensearch-project/alerting.git
  • https://github.com/opensearch-project/index-management.git
  • https://github.com/opensearch-project/anomaly-detection.git
  • https://github.com/opensearch-project/sql.git
  • https://github.com/opensearch-project/job-scheduler.git
  • https://github.com/opensearch-project/asynchronous-search.git
  • https://github.com/opensearch-project/observability.git
  • https://github.com/opensearch-project/common-utils.git
  • https://github.com/opensearch-project/k-nn.git
  • https://github.com/opensearch-project/reporting.git
  • https://github.com/opensearch-project/cross-cluster-replication.git
  • https://github.com/opensearch-project/security-analytics.git
  • https://github.com/opensearch-project/custom-codecs.git
  • https://github.com/opensearch-project/performance-analyzer.git
  • https://github.com/opensearch-project/ml-commons.git
  • https://github.com/opensearch-project/performance-analyzer-rca.git
  • https://github.com/opensearch-project/opensearch-oci-object-storage.git
  • https://github.com/opensearch-project/geospatial.git
  • https://github.com/opensearch-project/notifications.git

github-actions bot added the labels bug (Something isn't working) and Storage:Durability (Issues and PRs related to the durability framework) Oct 10, 2023

Signed-off-by: Ashish Singh <ssashish@amazon.com>
@ashking94
Member Author

Gradle Check (Jenkins) Run Completed with:

Flaky tests: #10542, #10558, #10193

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT.test {yaml=pit/10_basic/Delete all}
      1 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled

@codecov

codecov bot commented Oct 11, 2023

Codecov Report

Merging #10513 (1ebe969) into main (ed1b624) will decrease coverage by 0.18%.
Report is 3 commits behind head on main.
The diff coverage is 37.50%.

@@             Coverage Diff              @@
##               main   #10513      +/-   ##
============================================
- Coverage     71.30%   71.12%   -0.18%     
+ Complexity    58435    58317     -118     
============================================
  Files          4844     4845       +1     
  Lines        275284   275290       +6     
  Branches      40083    40083              
============================================
- Hits         196290   195806     -484     
- Misses        62605    63047     +442     
- Partials      16389    16437      +48     
Files Coverage Δ
...in/java/org/opensearch/index/shard/IndexShard.java 69.68% <ø> (+0.15%) ⬆️
...dex/translog/transfer/TranslogTransferManager.java 80.15% <50.00%> (+0.77%) ⬆️
...search/index/translog/InternalTranslogManager.java 66.43% <0.00%> (-5.79%) ⬇️
...anslog/transfer/TranslogUploadFailedException.java 50.00% <50.00%> (ø)

... and 476 files with indirect coverage changes

Signed-off-by: Ashish Singh <ssashish@amazon.com>
@ashking94
Member Author

Gradle Check (Jenkins) Run Completed with:

Flaky test: #10542

gbbafna merged commit 90c4297 into opensearch-project:main Oct 12, 2023
17 of 18 checks passed
gbbafna added the backport 2.x (Backport to 2.x branch) label Oct 12, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 12, 2023
…10513)

Signed-off-by: Ashish Singh <ssashish@amazon.com>
(cherry picked from commit 90c4297)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ashking94 added a commit to ashking94/OpenSearch that referenced this pull request Oct 12, 2023
ashking94 added a commit to ashking94/OpenSearch that referenced this pull request Oct 12, 2023
gbbafna pushed a commit that referenced this pull request Oct 12, 2023
…remote indexes (#10585)

* Fix shard failure on flush during upload failures for remote indexes (#10513)

Signed-off-by: Ashish Singh <ssashish@amazon.com>

* Fix compilation failure

Signed-off-by: Ashish Singh <ssashish@amazon.com>

---------

Signed-off-by: Ashish Singh <ssashish@amazon.com>
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Oct 19, 2023
…pensearch-project#10513)

Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Oct 23, 2023
ashking94 added a commit to ashking94/OpenSearch that referenced this pull request Jan 15, 2024
Signed-off-by: Ashish Singh <ssashish@amazon.com>
ashking94 added a commit to ashking94/OpenSearch that referenced this pull request Jan 15, 2024
Signed-off-by: Ashish Singh <ssashish@amazon.com>
gbbafna pushed a commit that referenced this pull request Jan 16, 2024
Signed-off-by: Ashish Singh <ssashish@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…pensearch-project#10513)

Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels
  • backport 2.x: Backport to 2.x branch
  • bug: Something isn't working
  • skip-changelog
  • Storage:Durability: Issues and PRs related to the durability framework
Development

Successfully merging this pull request may close these issues.

[BUG] Shard fails on flush due to exception during prepareAndUpload in RemoteFsTranslog for remote indexes
3 participants