Add batch async shard fetch transport action for replica #8218 #8356

Merged

@sudarshan-baliga sudarshan-baliga commented Jun 29, 2023

Description

This pull request is part of the improvement tracked in #5098.
This PR adds a transport action that asynchronously fetches shard store information for a batch of replica shards, using a single request per node. The code is modeled on the existing transport action TransportNodesListShardStoreMetadata.java, which fetches information for only one shard per request.
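
To illustrate the batching idea only, here is a minimal sketch of grouping replica shard copies by node so that each node receives one request that lists all of its shards. This is not the code from this PR: `BatchFetchSketch`, `ShardCopy`, and `groupByNode` are hypothetical names made up for the example, and the real transport action works with OpenSearch's own request and response types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from the OpenSearch code base.
final class BatchFetchSketch {

    /** A replica shard copy and the id of the node that may hold it. */
    static final class ShardCopy {
        final String index;
        final int shardNumber;
        final String nodeId;

        ShardCopy(String index, int shardNumber, String nodeId) {
            this.index = index;
            this.shardNumber = shardNumber;
            this.nodeId = nodeId;
        }

        /** Formats the shard id the way it appears in the logs, e.g. [my-index-4][0]. */
        String shardId() {
            return "[" + index + "][" + shardNumber + "]";
        }
    }

    /** Groups shard ids by node, giving one batched request payload per node. */
    static Map<String, List<String>> groupByNode(List<ShardCopy> copies) {
        Map<String, List<String>> requests = new LinkedHashMap<>();
        for (ShardCopy copy : copies) {
            requests.computeIfAbsent(copy.nodeId, id -> new ArrayList<>()).add(copy.shardId());
        }
        return requests;
    }

    public static void main(String[] args) {
        List<ShardCopy> copies = List.of(
            new ShardCopy("my-index-4", 0, "data1"),
            new ShardCopy("my-index-4", 0, "data2"),
            new ShardCopy("my-index-4", 1, "data2")
        );
        // Two per-node requests are built here instead of three per-shard requests.
        groupByNode(copies).forEach((node, shards) -> System.out.println(node + " <- " + shards));
    }
}
```

With per-shard fetching the number of requests grows with the number of shard copies; with grouping like this it grows with the number of nodes.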

Created a new index:

curl -X PUT "localhost:9200/my-index-4?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  }
}
'

Removed the data2 node and re-added it.
Log from the data node that was dropped and re-added. It shows that store metadata was loaded for a batch of shards rather than for a single shard.

 ] [data2] started
[2023-06-29T20:29:28,613][INFO ][o.o.i.s.TransportNodesBatchListShardStoreMetadata] [data2] Loaded store meta data for {[my-index-4][0]=[[StoreFilesMetadata{, shardId=[my-index-4][0], metadataSnapshot{size=1, syncId=null}}]], [my-index-4][1]=[[StoreFilesMetadata{, shardId=[my-index-4][1], metadataSnapshot{size=1, syncId=null}}]]} shards

Master node log from when the data node was re-added. It shows that information was fetched for a batch of shards rather than for a single shard.

[2023-06-29T20:29:29,074][INFO ][o.o.g.G.TestInternalReplicaShardAllocator] [master1] sdarbStore replica data {{data1}{PSqG-WkGSD2WiCi3dvXccw}{J84l7dayS2mx-8ocMXMn_Q}{127.0.0.1}{127.0.0.1:9301}{dir}{shard_indexing_pressure_enabled=true}=[[{data1}{PSqG-WkGSD2WiCi3dvXccw}{J84l7dayS2mx-8ocMXMn_Q}{127.0.0.1}{127.0.0.1:9301}{dir}{shard_indexing_pressure_enabled=true}][StoreFilesMetadata{, shardId=[my-index-4][0], metadataSnapshot{size=1, syncId=null}}]], {data2}{EMGkWoq8Q2mK3zVXFyYK0g}{fW3xmnx7S16uJCiI9RjM9A}{127.0.0.1}{127.0.0.1:9302}{dir}{shard_indexing_pressure_enabled=true}=[[{data2}{EMGkWoq8Q2mK3zVXFyYK0g}{fW3xmnx7S16uJCiI9RjM9A}{127.0.0.1}{127.0.0.1:9302}{dir}{shard_indexing_pressure_enabled=true}][StoreFilesMetadata{, shardId=[my-index-4][0], metadataSnapshot{size=1, syncId=null}}]]}
[2023-06-29T20:29:29,438][INFO ][o.o.c.r.a.AllocationService] [master1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[my-index-4][1], [my-index-4][0]]]).
[2023-06-29T20:29:29,734][INFO ][o.o.g.GatewayAllocator   ] [master1] sdarbStore Collecting of total shards ={}, over transport

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

github-actions bot commented Jul 5, 2023

Gradle Check (Jenkins) Run Completed with:

❌ Gradle check result for f05dfc2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 62c11d1: FAILURE

Signed-off-by: Shivansh Arora <hishiv@amazon.com>

❌ Gradle check result for 29871bf: FAILURE

Signed-off-by: Shivansh Arora <hishiv@amazon.com>

❌ Gradle check result for 85faccc: FAILURE

@shiv0408
Member

Failing tests:

❕ Gradle check result for 85faccc: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteLargeBlob
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testIndexReopenClose

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@shiv0408
Member

org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteLargeBlob
Created a new issue, #12651, to better track this flaky test; it was already identified as flaky in #6090.

org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testIndexReopenClose
#10987

Aman Khare added 2 commits March 14, 2024 13:44
Signed-off-by: Aman Khare <amkhar@amazon.com>
Signed-off-by: Aman Khare <amkhar@amazon.com>

❌ Gradle check result for 776f625: FAILURE

❌ Gradle check result for 2cc687f: FAILURE

@amkhar
Contributor

amkhar commented Mar 14, 2024

❌ Gradle check result for 2cc687f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Flaky test: testIsolateClusterManagerAndVerifyClusterStateConsensus #12095

@shwetathareja shwetathareja merged commit 12115d1 into opensearch-project:main Mar 14, 2024
31 of 36 checks passed
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
…roject#8218 (opensearch-project#8356)

* Add batch async shard fetch transport action for replica

Signed-off-by: sudarshan baliga <baliga108@gmail.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Signed-off-by: Aman Khare <amkhar@amazon.com>
@shiv0408 shiv0408 added the backport 2.x Backport to 2.x branch label Mar 19, 2024
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-8356-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 12115d1ad1feca522373220f5bef367f0c8008aa
# Push it to GitHub
git push --set-upstream origin backport/backport-8356-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-8356-to-2.x.

shiv0408 added a commit to shiv0408/OpenSearch that referenced this pull request Mar 19, 2024
…roject#8218 (opensearch-project#8356)

* Add batch async shard fetch transport action for replica

Signed-off-by: Shivansh Arora <hishiv@amazon.com>
(cherry picked from commit 12115d1)
shwetathareja pushed a commit that referenced this pull request Mar 19, 2024
#12770)

* Add batch async shard fetch transport action for replica

Signed-off-by: Shivansh Arora <hishiv@amazon.com>
(cherry picked from commit 12115d1)
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…roject#8218 (opensearch-project#8356)

* Add batch async shard fetch transport action for replica

Signed-off-by: sudarshan baliga <baliga108@gmail.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Signed-off-by: Aman Khare <amkhar@amazon.com>
@shiv0408 shiv0408 self-assigned this Aug 5, 2024
7 participants