set crash kernel memory based on host memory #3650

Closed

Redent0r commented Aug 31, 2022

Merge Checklist

All boxes should be checked before merging the PR (just tick any boxes which don't apply to this PR)

  • The toolchain has been rebuilt successfully (or no changes were made to it)
  • The toolchain/worker package manifests are up-to-date
  • Any updated packages successfully build (or no packages were changed)
  • Packages depending on static components modified in this PR (Golang, *-static subpackages, etc.) have had their Release tag incremented.
  • Package tests (%check section) have been verified with RUN_CHECK=y for existing SPEC files, or added to new SPEC files
  • All package sources are available
  • cgmanifest files are up-to-date and sorted (./cgmanifest.json, ./toolkit/tools/cgmanifest.json, ./toolkit/scripts/toolchain/cgmanifest.json, .github/workflows/cgmanifest.json)
  • LICENSE-MAP files are up-to-date (./SPECS/LICENSES-AND-NOTICES/data/licenses.json, ./SPECS/LICENSES-AND-NOTICES/LICENSES-MAP.md, ./SPECS/LICENSES-AND-NOTICES/LICENSE-EXCEPTIONS.PHOTON)
  • All source files have up-to-date hashes in the *.signatures.json files
  • sudo make go-tidy-all and sudo make go-test-coverage pass
  • Documentation has been updated to match any changes to the build system
  • Ready to merge

Summary

What does the PR accomplish, why was it needed?
Set the crashkernel parameter in kernel.spec based on host memory. This makes kernel crash recovery work consistently. Reserving less memory than needed causes kdump to take longer to create the vmcore file (memory dump).

Based on a few sources (1, 2), the memory reserved for the crash kernel ought to scale with the total memory available to the host. There's a crashkernel=auto option that tries to do this, and ideally we would use it. However, on a Hyper-V VM with 2 GB of RAM, the auto option still assigns only 128 MB to the crash kernel, which causes kdump issues. So we want something similar to the auto option, but with more memory reserved for the crash kernel.

The ranges I use are based on a recommended config for RHEL 6.0 and RHEL 6.1, with the lowest range slightly modified (2G-6G -> 1G-6G) to get consistent kernel recoveries on my Hyper-V VM instance (128 MB wasn't enough with 1.8 GB of RAM available). With these ranges I was also able to recover consistently with 6 GB and 9 GB of RAM, which suggests they work for larger memory sizes as well.
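The range form of the parameter is `crashkernel=start-[end]:size[,start-[end]:size,...]`, where each start is inclusive and each end exclusive. The minimal Python sketch below resolves which size a given amount of RAM selects; the helper names are hypothetical, the optional `@offset` suffix is ignored, and the tier strings are illustrative assumptions rather than the exact value committed in kernel.spec (only the 2G-6G vs 1G-6G lower bound comes from this PR). Assuming the kernel matches against the RAM it actually sees, a nominal 2 GB VM reporting about 1.8 GB lands in the 128M tier of a 2G-based layout, which is presumably why lowering the bound changed the result.

```python
import re

# Size suffixes understood by the kernel's memparse() are case-insensitive;
# this sketch handles only K, M, G and T.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text: str) -> int:
    """Convert a size such as '256M' or '1G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def crashkernel_reservation(spec: str, ram_bytes: int) -> int:
    """Bytes reserved by a range-style crashkernel= value for a given RAM size.

    spec has the form 'start-[end]:size[,start-[end]:size,...]'. Each start
    is inclusive, each end exclusive, and an empty end means no upper bound.
    Returns 0 when no range matches (nothing would be reserved).
    """
    for entry in spec.split(","):
        bounds, size = entry.split(":")
        start_text, end_text = bounds.split("-")
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None
        if ram_bytes >= start and (end is None or ram_bytes < end):
            return parse_size(size)
    return 0


if __name__ == "__main__":
    GiB = 1 << 30
    # Illustrative tiers only; the 6G-8G and 8G- tiers are placeholders.
    original = "0M-2G:128M,2G-6G:256M,6G-8G:512M,8G-:768M"
    lowered = "0M-1G:128M,1G-6G:256M,6G-8G:512M,8G-:768M"
    seen_ram = int(1.8 * GiB)  # roughly what the 2 GB Hyper-V VM reported
    print(crashkernel_reservation(original, seen_ram) >> 20)  # 128
    print(crashkernel_reservation(lowered, seen_ram) >> 20)   # 256
```

A flat value such as crashkernel=256M skips the range lookup entirely; the sketch only models the tiered form.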

Follow-up:
Why does the Mariner crash kernel require more than 128 MB (the default recommended by Canonical and assigned by the auto option) for kernel crashes to recover consistently?

Do larger RAM sizes really require a bigger crash kernel reservation, or can we use crashkernel=256M (or lower) for them as well?
A: Further testing on Hyper-V suggests we could stick to 256M for everything. I was able to crash and recover logs quickly, several times in a row, using 256M on a VM with 10 GB of RAM.

Change Log
  • Updated the crashkernel parameter in kernel.spec
Does this affect the toolchain?

YES

Associated issues
Test Methodology
  • Tested on a Hyper-V VM
  • Used echo c > /proc/sysrq-trigger to trigger a kernel crash and observed inconsistent results (sometimes the system recovered after a few minutes, sometimes it hung and never recovered)
  • Modified the crashkernel parameter to reserve more memory
  • Verified that kernel crash recovery is now fast and consistent (about 15 seconds to get back to the login prompt) and that the vmcore file is created every time
  • Full local image build succeeded
  • AMD image build: https://dev.azure.com/mariner-org/mariner/_build/results?buildId=231743&view=results
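Before running `echo c > /proc/sysrq-trigger`, it can help to confirm that the reservation actually took effect. This is a minimal Python sketch, assuming a Linux kernel that exposes /sys/kernel/kexec_crash_size (bytes reserved for the crash kernel) and /sys/kernel/kexec_crash_loaded (1 once a crash kernel is loaded); the helper names are hypothetical. A reserved size of 0 usually means the parameter was missing or no crashkernel= range matched the machine's RAM.

```python
from pathlib import Path
from typing import Optional

SYS_KERNEL = Path("/sys/kernel")


def crashkernel_arg(cmdline: str) -> Optional[str]:
    """Return the value of the last crashkernel= token in a command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            value = token.split("=", 1)[1]
    return value


def reserved_crash_bytes(sys_kernel: Path = SYS_KERNEL) -> int:
    """Bytes currently reserved for the crash kernel (0 means nothing was reserved)."""
    return int((sys_kernel / "kexec_crash_size").read_text().strip())


def crash_kernel_loaded(sys_kernel: Path = SYS_KERNEL) -> bool:
    """True once kdump has loaded a crash kernel into the reserved region."""
    return (sys_kernel / "kexec_crash_loaded").read_text().strip() == "1"


def main() -> None:
    try:
        arg = crashkernel_arg(Path("/proc/cmdline").read_text())
        reserved = reserved_crash_bytes()
        loaded = crash_kernel_loaded()
    except OSError as err:
        # Not Linux, or the kernel was built without kexec/crash support.
        print(f"cannot read kdump state: {err}")
        return
    print(f"crashkernel={arg}")
    print(f"reserved: {reserved >> 20} MiB")
    print(f"crash kernel loaded: {loaded}")
    if reserved == 0 or not loaded:
        print("kdump is not ready; a sysrq crash will not produce a vmcore")


if __name__ == "__main__":
    main()
```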

Redent0r commented Sep 7, 2022

moved to #3705

Redent0r closed this Sep 7, 2022
Redent0r deleted the redent0r/adjust_crashkernel_param branch November 27, 2023 18:59