osd: Prepare job needs significant more memory for provisioning #11103

travisn · 2022-10-05T14:30:47Z

Description of your changes:
OSD creation may need a memory burst during provisioning, depending on the size of the OSD and similar factors. If the OSD prepare job is OOM-killed, OSD provisioning fails and leaves behind side effects that are difficult to troubleshoot before the OSD can be created successfully. So we significantly increase the recommended memory limit for the prepare job (to 1200Mi) to avoid the OOM kill.

Which issue is resolved by this Pull Request:
Resolves #10219

Checklist:

Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
Reviewed the developer guide on Submitting a Pull Request
Pending release notes updated with breaking and/or notable changes for the next minor release.
Documentation has been updated, if necessary.
Unit tests have been added, if necessary.
Integration tests have been added, if necessary.

Signed-off-by: Travis Nielsen <tnielsen@redhat.com>


rajha-korithrien · 2022-10-05T20:32:06Z

All,

Apologies for arriving late, after this has already been merged. I just tested the value of 1200Mi (the default memory limit this PR sets for the OSD prepare job), and it is not sufficient for the BlueStore formatting step to succeed on a 22Ti volume. It fails with this error:

RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid ac7ed11d-f2bd-4020-aaad-c3e9f58e52e2 --setuser ceph --setgroup ceph

And the Job is killed by the OOMKiller.

Further testing suggests that 1200Mi allows a 15Ti OSD to be prepared correctly, but not much larger: I tested 18Ti and it fails. Given that 20Ti drives are just starting to become commonplace, is 1200Mi enough? I think it is certainly a "sane default".

The note/hint about the OSD prepare job potentially being killed is useful. Perhaps a sentence could be added saying that users may need to increase this value if they have "large" volumes to prepare and OSD pods don't appear as expected.

travisn · 2022-10-05T20:46:22Z

@rajha-korithrien Thanks for your observations! We could keep raising the limit to something like 2Gi. But I'm wondering if we should just not specify memory limits for the OSD prepare job at all. It's a one-time action, and we don't want a memory limit to prevent OSD creation. Is there really any reason to apply limits to it? @satoru-takeuchi @kfox1111 thoughts?

satoru-takeuchi · 2022-10-05T21:09:07Z

@travisn Now we know it's difficult to estimate the proper memory limit. So not setting a memory limit by default, and describing this behavior in ceph-common-issues.md, is a reasonable solution for now.


kfox1111 · 2022-10-06T00:18:17Z

Yeah, maybe no limit is better... though I think someone said on Slack that they had a prepare job push over a running OSD... so maybe a limit does help. It's kind of unclear. It's messy to clean up after a failed one, so reserving (rather than limiting) more than enough memory may be a good default, and then if it's too much, they can always tweak it down?

travisn · 2022-10-06T19:39:12Z

Like you said, it's messy to clean up if it gets into a failed state, so allowing it to run unconstrained to completion will be best. If it causes other pods to fall over, there would be a hiccup, but they should recover once they come back up.

It's still possible to set limits or different requests if desired; for the defaults, we just need to leave it unconstrained.
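For readers who do want to set these values explicitly, here is a minimal sketch of what that might look like, assuming the `spec.resources.prepareosd` key of a Rook `CephCluster` resource. The request values are illustrative placeholders, not tested recommendations, and the commented-out 2Gi limit only echoes the figure floated earlier in this thread.

```yaml
# Hypothetical CephCluster excerpt: only the prepare-job resources are shown;
# other required spec fields are omitted.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  resources:
    prepareosd:
      # Requests are what the scheduler reserves for the pod.
      # These values are illustrative placeholders.
      requests:
        cpu: "500m"
        memory: "50Mi"
      # With no limits block, the prepare job's memory is unconstrained,
      # matching the default proposed above. Uncomment to cap it instead;
      # 1200Mi was reported to handle a 15Ti OSD but not 18Ti or 22Ti.
      # limits:
      #   memory: "2Gi"
```

Leaving `limits` out trades the risk of the prepare job crowding out neighbouring pods for the lower risk of a half-provisioned OSD that is messy to clean up.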


travisn requested a review from satoru-takeuchi October 5, 2022 14:30

travisn mentioned this pull request Oct 5, 2022

OSD Prepare fails due to "unparsable uuid" #10219

Closed

BlaineEXE approved these changes Oct 5, 2022


satoru-takeuchi approved these changes Oct 5, 2022


satoru-takeuchi merged commit 259cf39 into rook:master Oct 5, 2022

satoru-takeuchi added the backport-release-1.10 label Oct 5, 2022

mergify bot mentioned this pull request Oct 5, 2022

osd: Prepare job needs significant more memory for provisioning (backport #11103) #11105

Merged

mergify bot added a commit that referenced this pull request Oct 5, 2022

Merge pull request #11105 from rook/mergify/bp/release-1.10/pr-11103

6d8ff55

osd: Prepare job needs significant more memory for provisioning (backport #11103)

travisn deleted the osdprepare-resources branch October 5, 2022 19:46

travisn added the backport-release-1.9 label Oct 5, 2022

mergify bot mentioned this pull request Oct 5, 2022

osd: Prepare job needs significant more memory for provisioning (backport #11103) #11106

Merged

travisn mentioned this pull request Oct 5, 2022

osd: Recommend removing memory limits from osd prepare job #11109

Merged


travisn added a commit that referenced this pull request Oct 5, 2022

Merge pull request #11106 from rook/mergify/bp/release-1.9/pr-11103

54d40fd

osd: Prepare job needs significant more memory for provisioning (backport #11103)

HoKim98 added a commit to SmartX-Team/OpenARK that referenced this pull request Oct 8, 2022

Resolve osdprepare memory limit issue

da73a4a

* See note: rook/rook#11103 * See note: rook/rook#11109

travisn mentioned this pull request Jan 5, 2023

docs: add limitRange comment to deploy/charts/rook-ceph-cluster/values.yaml #11512

Merged

