Write shard level metadata blob when snapshotting searchable snapshot indexes #13190

Merged
andrross merged 3 commits into opensearch-project:main from the snapshot_status branch
Jun 21, 2024

Conversation

bugmakerrrrrr
Contributor

@bugmakerrrrrr bugmakerrrrrr commented Apr 15, 2024

Description

Since #7247, when the target index is a searchable snapshot index, we snapshot only the index metadata, so no shard-level snapshot metadata blob (i.e. the snap-{uuid}.dat blob) is created under the shard blob container. However, this metadata blob must be loaded when getting the snapshot status or cloning a snapshot. Currently, the following exception is thrown if we call the snapshot status API and the target snapshot contains a searchable snapshot index.

SnapshotMissingException[[test-repo:test-repeated-snap-0] is missing]
	at org.opensearch.repositories.blobstore.BlobStoreRepository.loadShardSnapshot(BlobStoreRepository.java:3418)
	at org.opensearch.repositories.blobstore.BlobStoreRepository.getShardSnapshotStatus(BlobStoreRepository.java:3218)
	at org.opensearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.snapshotShards(TransportSnapshotsStatusAction.java:450)
	at org.opensearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.lambda$loadRepositoryData$3(TransportSnapshotsStatusAction.java:348)
	at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82)
	at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341)
	at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120)
	at org.opensearch.common.util.concurrent.ListenableFuture.addListener(ListenableFuture.java:82)
	at org.opensearch.action.StepListener.whenComplete(StepListener.java:95)
	at org.opensearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.loadRepositoryData(TransportSnapshotsStatusAction.java:320)
	at org.opensearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.buildResponse(TransportSnapshotsStatusAction.java:303)
	at org.opensearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.clusterManagerOperation(TransportSnapshotsStatusAction.java:149)
	at org.opensearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.clusterManagerOperation(TransportSnapshotsStatusAction.java:90)
	at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction.masterOperation(TransportClusterManagerNodeAction.java:177)
	at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction.clusterManagerOperation(TransportClusterManagerNodeAction.java:186)
	at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction.lambda$doStart$1(TransportClusterManagerNodeAction.java:292)
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

We can resolve this issue by also writing the shard-level snapshot metadata blob for searchable snapshot indexes.
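
To make the failure and the fix concrete, here is a minimal, self-contained sketch. It is not the code changed in this PR: `ShardContainerSketch`, its in-memory map, the method names, the blob contents, and the `IllegalStateException` are all hypothetical stand-ins for the shard blob container and the repository logic. Only the `snap-{uuid}.dat` blob name is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a shard blob container; not OpenSearch code. */
class ShardContainerSketch {
    // Blob name -> blob contents (contents are placeholder strings).
    private final Map<String, String> blobs = new HashMap<>();

    /** Blob name pattern from the PR description: snap-{uuid}.dat */
    static String shardMetadataBlobName(String snapshotUuid) {
        return "snap-" + snapshotUuid + ".dat";
    }

    /**
     * Snapshots one shard. For a searchable snapshot index no files are copied.
     * With writeMetadata=false (the old behavior) nothing is written at all;
     * with writeMetadata=true (the proposed behavior) an empty shard-level
     * metadata blob is still written.
     */
    void snapshotShard(String snapshotUuid, boolean searchableSnapshotIndex, boolean writeMetadata) {
        if (searchableSnapshotIndex) {
            if (writeMetadata) {
                blobs.put(shardMetadataBlobName(snapshotUuid), "files=[]");
            }
            return;
        }
        blobs.put(shardMetadataBlobName(snapshotUuid), "files=[segments_1]");
    }

    /** Mirrors the failing lookup: the status call needs the shard-level blob. */
    String loadShardSnapshot(String snapshotUuid) {
        String blob = blobs.get(shardMetadataBlobName(snapshotUuid));
        if (blob == null) {
            // Stand-in for SnapshotMissingException.
            throw new IllegalStateException("[" + snapshotUuid + "] is missing");
        }
        return blob;
    }
}
```

With `writeMetadata` set to false, `loadShardSnapshot` throws for a searchable snapshot index, which corresponds to the stack trace above. With it set to true, the lookup succeeds and reports no files.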

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@bugmakerrrrrr bugmakerrrrrr changed the title fix snapshot status Write shard level metadata blob when snapshotting searchable snapshot indexes Apr 15, 2024
@bugmakerrrrrr
Contributor Author

Btw, I don't have a good idea for how to handle reading old snapshots. Maybe, when getting the shard snapshot, we can check the snapshot's index settings after a SnapshotMissingException is thrown; if the target index is a remote snapshot index, we return an empty snapshot status. Any thoughts?
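
A rough sketch of that fallback, under stated assumptions: every class and method name here is hypothetical, `MissingShardSnapshotException` stands in for SnapshotMissingException, and the `index.store.type` / `remote_snapshot` setting is assumed to be what marks a remote snapshot index. It shows only the control flow, not a real implementation.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of the fallback for old snapshots; not OpenSearch code. */
class OldSnapshotFallbackSketch {
    /** Stand-in for SnapshotMissingException. */
    static class MissingShardSnapshotException extends RuntimeException {
        MissingShardSnapshotException(String message) {
            super(message);
        }
    }

    /** Stand-in for a shard snapshot status; only the file count is modelled. */
    static final class ShardStatus {
        final int fileCount;

        ShardStatus(int fileCount) {
            this.fileCount = fileCount;
        }

        static ShardStatus empty() {
            return new ShardStatus(0);
        }
    }

    // Assumed setting key and value that mark a remote snapshot index.
    static final String STORE_TYPE_KEY = "index.store.type";
    static final String REMOTE_SNAPSHOT = "remote_snapshot";

    /**
     * Tries the normal load first. If the shard-level blob is missing and the
     * snapshot's index settings say the index is a remote snapshot index,
     * returns an empty status; otherwise rethrows the original exception.
     */
    static ShardStatus statusWithFallback(Supplier<ShardStatus> loader, Map<String, String> snapshotIndexSettings) {
        try {
            return loader.get();
        } catch (MissingShardSnapshotException e) {
            if (REMOTE_SNAPSHOT.equals(snapshotIndexSettings.get(STORE_TYPE_KEY))) {
                return ShardStatus.empty();
            }
            throw e;
        }
    }
}
```

Rethrowing for every other index type keeps a genuinely missing blob visible instead of silently reporting an empty shard.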

@bugmakerrrrrr
Contributor Author

@andrross you may be interested in this :)

Contributor

❌ Gradle check result for 6045092: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

Compatibility status:

Checks if related components are compatible with change 6045092

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/performance-analyzer.git]

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label May 15, 2024
@kotwanikunal
Member

@bugmakerrrrrr This looks good to me.

Btw, I don't have a good idea for how to handle reading old snapshots. Maybe, when getting the shard snapshot, we can check the snapshot's index settings after a SnapshotMissingException is thrown; if the target index is a remote snapshot index, we return an empty snapshot status. Any thoughts?

I think that's a good idea. We could add further checks to confirm that the SnapshotMissingException is actually caused by this scenario.

Also, do you mind adding a changelog entry and rebasing your code?

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label May 22, 2024
@bugmakerrrrrr bugmakerrrrrr force-pushed the snapshot_status branch 2 times, most recently from 9899f6a to 6763bb8 Compare May 22, 2024 15:38
Contributor

❌ Gradle check result for 9899f6a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 6763bb8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bugmakerrrrrr
Contributor Author

@andrross @kotwanikunal would you please take a look when you have a chance, and merge this if possible?

@andrross andrross added the backport 2.x Backport to 2.x branch label Jun 20, 2024
Contributor

❌ Gradle check result for 55475a6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

bugmakerrrrrr and others added 3 commits June 20, 2024 15:24
Signed-off-by: panguixin <panguixin@bytedance.com>
Signed-off-by: panguixin <panguixin@bytedance.com>
Signed-off-by: Andrew Ross <andrross@amazon.com>
Contributor

❌ Gradle check result for d1b94bc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

✅ Gradle check result for d1b94bc: SUCCESS

@andrross andrross merged commit 568c193 into opensearch-project:main Jun 21, 2024
29 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-13190-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 568c193dd817b76b2f0ca4647bc4e01908db709d
# Push it to GitHub
git push --set-upstream origin backport/backport-13190-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-13190-to-2.x.

andrross pushed a commit to andrross/OpenSearch that referenced this pull request Jun 21, 2024
… indexes (opensearch-project#13190)

* fix snapshot status

Signed-off-by: panguixin <panguixin@bytedance.com>

* add change log

Signed-off-by: panguixin <panguixin@bytedance.com>

* Fix spotless violations

Signed-off-by: Andrew Ross <andrross@amazon.com>

---------

Signed-off-by: panguixin <panguixin@bytedance.com>
Signed-off-by: Andrew Ross <andrross@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 568c193)
andrross added a commit that referenced this pull request Jun 21, 2024
… indexes (#13190) (#14492)

* fix snapshot status

Signed-off-by: panguixin <panguixin@bytedance.com>

* add change log

Signed-off-by: panguixin <panguixin@bytedance.com>

* Fix spotless violations

Signed-off-by: Andrew Ross <andrross@amazon.com>

---------

Signed-off-by: panguixin <panguixin@bytedance.com>
Signed-off-by: Andrew Ross <andrross@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 568c193)

Co-authored-by: panguixin <panguixin@bytedance.com>
Labels
backport 2.x Backport to 2.x branch backport-failed
3 participants