
onShardResult and onShardFailure are executed on one shard causes opensearch jvm crashed #12158

Merged: 5 commits merged into opensearch-project:main on Mar 11, 2024

Conversation

kkewwei
Contributor

@kkewwei kkewwei commented Feb 4, 2024

Signed-off-by: kkewwei <kkewwei@163.com>

Description

When both onShardResult and onShardFailure are executed for the same shard, that shard is counted twice, and the resulting AssertionError crashes the OpenSearch JVM (similar to #4143):

[2024-01-05T17:06:29,624+0800][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [node0] fatal error in thread [Thread-17309], exiting
java.lang.AssertionError: unexpected higher total ops [21] compared to expected [20]
	at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:480) ~[opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:295) ~[opensearch-2.9.0.jar:2.9.0]
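The assertion trips because the "total ops" bookkeeping is incremented once per terminal callback, so a shard that reports both a result and a failure pushes the count one past the expected total. The following minimal Java sketch reproduces that arithmetic in isolation. `ShardOpsCounter`, `ShardListener` and `routeExceptionsToFailure` are hypothetical stand-ins, not OpenSearch classes, and the sketch *assumes* (as a guess at the mechanism, not something stated in this PR) that an exception thrown while handling a shard result is forwarded to the failure callback.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DoubleCountDemo {

    /** Hypothetical stand-in for the per-request "total ops" bookkeeping. */
    static final class ShardOpsCounter {
        private final int expectedTotalOps;
        private final AtomicInteger totalOps = new AtomicInteger();

        ShardOpsCounter(int expectedTotalOps) {
            this.expectedTotalOps = expectedTotalOps;
        }

        /** Records one finished shard operation; returns false once the count exceeds the expected total. */
        boolean recordFinished() {
            return totalOps.incrementAndGet() <= expectedTotalOps;
        }

        int total() {
            return totalOps.get();
        }

        int expected() {
            return expectedTotalOps;
        }
    }

    /** Hypothetical per-shard callback pair, loosely mirroring onShardResult / onShardFailure. */
    interface ShardListener {
        void onResponse(String result);

        void onFailure(Exception e);
    }

    /** Assumed wrapper behaviour: an exception thrown by onResponse is forwarded to onFailure. */
    static ShardListener routeExceptionsToFailure(ShardListener delegate) {
        return new ShardListener() {
            @Override
            public void onResponse(String result) {
                try {
                    delegate.onResponse(result);
                } catch (Exception e) {
                    delegate.onFailure(e);
                }
            }

            @Override
            public void onFailure(Exception e) {
                delegate.onFailure(e);
            }
        };
    }

    public static void main(String[] args) {
        ShardOpsCounter counter = new ShardOpsCounter(1); // a single shard, so one op is expected
        ShardListener listener = routeExceptionsToFailure(new ShardListener() {
            @Override
            public void onResponse(String result) {
                counter.recordFinished(); // the success path counts the shard...
                throw new IllegalStateException("error after the result was counted");
            }

            @Override
            public void onFailure(Exception e) {
                // ...and the failure path counts the same shard a second time
                if (!counter.recordFinished()) {
                    System.out.println("unexpected higher total ops [" + counter.total()
                        + "] compared to expected [" + counter.expected() + "]");
                }
            }
        });
        listener.onResponse("shard-0 result");
    }
}
```

Running `main` prints `unexpected higher total ops [2] compared to expected [1]`, the same shape as the 21-versus-20 message in the log above. The sketch prints instead of asserting because Java assertions are disabled unless the JVM runs with `-ea`.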

Related Issues

Resolves #11881

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

The existing unit test AbstractSearchAsyncActionTests.testExecutePhaseOnShardFailure() can be reused to cover this change.
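A regression check in the spirit of that test would verify that each shard contributes exactly one finished op even if both callbacks fire. The following minimal Java sketch does that against a hypothetical `OnceListener` wrapper that lets only the first terminal callback through. It illustrates one possible guard and is *not* the change made in this PR. `OnceListener` and `OnceListenerDemo` are illustrative names, not OpenSearch APIs.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class OnceListenerDemo {

    /** Hypothetical wrapper that forwards at most one terminal callback per shard. */
    static final class OnceListener<T> {
        private final AtomicBoolean completed = new AtomicBoolean(false);
        private final Consumer<T> responseHandler;
        private final Consumer<Exception> failureHandler;

        OnceListener(Consumer<T> responseHandler, Consumer<Exception> failureHandler) {
            this.responseHandler = responseHandler;
            this.failureHandler = failureHandler;
        }

        void onResponse(T result) {
            // compareAndSet makes the "first callback wins" decision atomic across threads
            if (completed.compareAndSet(false, true)) {
                responseHandler.accept(result);
            }
        }

        void onFailure(Exception e) {
            if (completed.compareAndSet(false, true)) {
                failureHandler.accept(e);
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger totalOps = new AtomicInteger();
        OnceListener<String> shard0 = new OnceListener<>(
            result -> totalOps.incrementAndGet(),
            failure -> totalOps.incrementAndGet()
        );
        shard0.onResponse("shard-0 result");
        shard0.onFailure(new IllegalStateException("late failure")); // ignored: shard already completed
        System.out.println("total ops = " + totalOps.get()); // prints "total ops = 1"
    }
}
```

The trade-off of this design is that a failure reported after the shard has already completed is silently dropped. A production version would probably want to at least log it rather than discard it.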

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

github-actions bot commented Feb 4, 2024

Compatibility status:

Checks if related components are compatible with change 4252242

Incompatible components

Incompatible components: [https://github.com/opensearch-project/security-analytics.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/sql.git]

@reta
Collaborator

reta commented Feb 29, 2024

> @reta @kkmr Sorry for not following up for so long; I was busy with other matters.

Not a problem @kkewwei, thank you. I am off until the end of next week, but I will get back to it as soon as possible. Sorry about that.

Contributor

github-actions bot commented Mar 1, 2024

✅ Gradle check result for 814c4de: SUCCESS

@reta
Collaborator

reta commented Mar 8, 2024

@kkewwei thank you, LGTM. Could you please add a changelog entry to CHANGELOG.md (under the Unreleased 2.x section) for this change? Thank you.

github-actions bot added the v2.13.0 label (Issues and PRs related to version 2.13.0) on Mar 10, 2024
…nsearch jvm crashed

Signed-off-by: kkewwei <kkewwei@163.com>
Contributor

❌ Gradle check result for 02e1a7c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 4252242: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei
Contributor Author

kkewwei commented Mar 10, 2024

Flaky tests:
  • testConcurrentDecommissionAction: #12197
  • testDownloadStatsCorrectnessSinglePrimaryMultipleReplicaShards: #10152
  • testShardRoutingWithNetworkDisruption_FailOpenEnabled: #10673
  • org.opensearch.search.query.SimpleQueryStringIT.testDocWithAllTypes: #12575

Contributor

❌ Gradle check result for 4252242: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Collaborator

reta commented Mar 11, 2024

❌ Gradle check result for 4252242: FAILURE

#12593

Contributor

❕ Gradle check result for 4252242: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled
      1 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteLargeBlob

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

reta merged commit 127f497 into opensearch-project:main on Mar 11, 2024
46 of 51 checks passed
reta added the backport 2.x label (Backport to 2.x branch) on Mar 11, 2024
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

```bash
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-12158-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 127f497a5fb00f748e9577d76e272b9694ca0338
# Push it to GitHub
git push --set-upstream origin backport/backport-12158-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
```

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-12158-to-2.x.

@reta
Collaborator

reta commented Mar 11, 2024

@kkewwei could you please backport this manually to 2.x? Thank you.

@kkewwei
Contributor Author

kkewwei commented Mar 12, 2024

@reta I'd be happy to do it.

rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
…nsearch jvm crashed (opensearch-project#12158)

* onShardResult and onShardFailure are executed on one shard causes opensearch jvm crashed

Signed-off-by: kkewwei <kkewwei@163.com>

* unit test

Signed-off-by: kkewwei <kkewwei@163.com>

* spotlessJavaCheck

Signed-off-by: kkewwei <kkewwei@163.com>

* rename variable names

Signed-off-by: kkewwei <kkewwei@163.com>

* add changelog

Signed-off-by: kkewwei <kkewwei@163.com>

---------

Signed-off-by: kkewwei <kkewwei@163.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…nsearch jvm crashed (opensearch-project#12158)

* onShardResult and onShardFailure are executed on one shard causes opensearch jvm crashed

Signed-off-by: kkewwei <kkewwei@163.com>

* unit test

Signed-off-by: kkewwei <kkewwei@163.com>

* spotlessJavaCheck

Signed-off-by: kkewwei <kkewwei@163.com>

* rename variable names

Signed-off-by: kkewwei <kkewwei@163.com>

* add changelog

Signed-off-by: kkewwei <kkewwei@163.com>

---------

Signed-off-by: kkewwei <kkewwei@163.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels
backport 2.x (Backport to 2.x branch), backport-failed, bug (Something isn't working), Search (Search query, autocomplete ...etc), Severity-Major, v2.13.0 (Issues and PRs related to version 2.13.0)
Development

Successfully merging this pull request may close these issues.

[BUG] onShardResult and onShardFailure are executed on one shard causes opensearch jvm crashed
3 participants