Add safeguard limits for file cache during node level allocation #8208

Merged

kotwanikunal
Member

@kotwanikunal kotwanikunal commented Jun 21, 2023

Description

  • Related to #7713 (Add safeguards to prevent file cache over-subscription).
  • Adds safeguards that prevent file cache over-subscription when making allocation decisions for an individual node.
  • Fetches the file cache stats to get the node's cache size, computes the total size of the remote shards already on the node, and verifies that the shard can be safely allocated to that node.
  • The check is: size of shard + sum(remote shards on the node) < 5 * (node cache size)
  • The constant 5 will be replaced by a setting in a follow-up PR.
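
For readers who want to see the inequality above in code form, here is a minimal, self-contained sketch. It is not the code from this PR: the class name, method name, constant name, and the use of plain byte counts are all assumptions made for illustration, and the real check lives in the allocation decider.

```java
import java.util.List;

// Illustrative sketch only; all names here are hypothetical, not OpenSearch APIs.
public class FileCacheAllocationSketch {

    // Assumed stand-in for the hard-coded factor of 5 mentioned in the description.
    static final long CACHE_OVERSUBSCRIPTION_FACTOR = 5L;

    // Returns true when the new shard plus the remote shards already on the node
    // stay strictly below 5x the node's file cache size.
    static boolean canAllocate(long shardSizeBytes, List<Long> remoteShardSizesOnNode, long nodeCacheSizeBytes) {
        long existingRemoteBytes = 0L;
        for (long size : remoteShardSizesOnNode) {
            existingRemoteBytes += size;
        }
        return shardSizeBytes + existingRemoteBytes < CACHE_OVERSUBSCRIPTION_FACTOR * nodeCacheSizeBytes;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // 10 GB + (20 GB + 15 GB) = 45 GB, which is below 5 * 10 GB = 50 GB.
        System.out.println(canAllocate(10 * gb, List.of(20 * gb, 15 * gb), 10 * gb)); // prints "true"
        // 10 GB + (20 GB + 20 GB) = 50 GB, which is not strictly below 50 GB.
        System.out.println(canAllocate(10 * gb, List.of(20 * gb, 20 * gb), 10 * gb)); // prints "false"
    }
}
```

Because the comparison is strict, a node whose remote shards would exactly reach five times its cache size is rejected in this sketch.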

Related Issues

Resolves #7033

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kotwanikunal
Member Author

Tagging previous reviewers: @andrross / @reta / @Bukhtawar

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov

codecov bot commented Jun 22, 2023

Codecov Report

Merging #8208 (d94f6a9) into main (418ab51) will increase coverage by 0.10%.
The diff coverage is 91.66%.

❗ Current head d94f6a9 differs from pull request most recent head f5ddcdc. Consider uploading reports for the commit f5ddcdc to get more accurate results

@@             Coverage Diff              @@
##               main    #8208      +/-   ##
============================================
+ Coverage     70.85%   70.96%   +0.10%     
+ Complexity    56948    56942       -6     
============================================
  Files          4758     4759       +1     
  Lines        269382   269224     -158     
  Branches      39414    39405       -9     
============================================
+ Hits         190877   191055     +178     
+ Misses        62429    62002     -427     
- Partials      16076    16167      +91     
Impacted Files Coverage Δ
...search/index/store/remote/filecache/FileCache.java 72.05% <ø> (ø)
.../main/java/org/opensearch/cluster/ClusterInfo.java 59.42% <62.50%> (-0.13%) ⬇️
...opensearch/cluster/InternalClusterInfoService.java 75.53% <100.00%> (-1.02%) ⬇️
...uting/allocation/decider/DiskThresholdDecider.java 73.89% <100.00%> (+1.27%) ⬆️

... and 533 files with indirect coverage changes

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@kotwanikunal
Member Author

@reta Resolved your comments. 3 green gradle checks in a row might be a sign :)

@reta
Collaborator

reta commented Jun 27, 2023

@kotwanikunal my apologies, missed that somehow, will look first thing tomorrow morning

@github-actions
Contributor

github-actions bot commented Jul 5, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication
      1 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testDropPrimaryDuringReplication

@kotwanikunal kotwanikunal added the backport 2.x Backport to 2.x branch label Jul 10, 2023
@kotwanikunal kotwanikunal merged commit 91bfa01 into opensearch-project:main Jul 10, 2023
10 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-8208-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 91bfa01606974b947455fbc289e21a1aad096fa8
# Push it to GitHub
git push --set-upstream origin backport/backport-8208-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-8208-to-2.x.

kotwanikunal added a commit to kotwanikunal/OpenSearch that referenced this pull request Jul 10, 2023
…nsearch-project#8208)

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
(cherry picked from commit 91bfa01)
@kotwanikunal
Member Author

Will look into the backport after the followup PR.

vikasvb90 pushed a commit to raghuvanshraj/OpenSearch that referenced this pull request Jul 12, 2023
raghuvanshraj pushed a commit to raghuvanshraj/OpenSearch that referenced this pull request Jul 12, 2023
dzane17 pushed a commit to dzane17/OpenSearch that referenced this pull request Jul 12, 2023
buddharajusahil pushed a commit to buddharajusahil/OpenSearch that referenced this pull request Jul 18, 2023
…nsearch-project#8208)

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
Signed-off-by: sahil buddharaju <sahilbud@amazon.com>
baba-devv pushed a commit to baba-devv/OpenSearch that referenced this pull request Jul 29, 2023
kotwanikunal added a commit to kotwanikunal/OpenSearch that referenced this pull request Jul 29, 2023
…nsearch-project#8208)

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
(cherry picked from commit 91bfa01)
kotwanikunal added a commit that referenced this pull request Jul 31, 2023
* Add safeguard limits for file cache during node level allocation (#8208)

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
(cherry picked from commit 91bfa01)

* Add restore level safeguards to prevent file cache oversubscription (#8606)

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
(cherry picked from commit a3aab67)
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…nsearch-project#8208)

Signed-off-by: Kunal Kotwani <kkotwani@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Searchable Remote Index] Add safeguards to ensure a cluster cannot be over-subscribed
3 participants