set crash kernel memory based on host memory #3650

Redent0r · 2022-08-31T21:05:56Z

Merge Checklist

All boxes should be checked before merging the PR (just tick any boxes which don't apply to this PR)

Summary

What does the PR accomplish, why was it needed?
Set the crashkernel param in kernel.spec based on host memory. This makes kernel crash recovery work consistently. Reserving less memory than needed causes kdump to take longer to create the vmcore file (memory dump).

Based on a few sources (1, 2), the memory reserved for the crash kernel ought to scale with the total memory available to the host. There is a crashkernel=auto option that tries to do this, and ideally we would use it. However, on a Hyper-V VM with 2 GB of RAM, the auto option still assigns only 128 MB to the crash kernel, which causes kdump issues. So we want something similar to the auto option, but with more memory reserved for the crash kernel.

The range I use is based on a recommended config for RHEL 6.0 and RHEL 6.1, with the lower range slightly modified (2 GB-6 GB -> 1 GB-6 GB) to get consistent kernel recoveries on my Hyper-V instance (128 MB wasn't enough with 1.8 GB of RAM available). Using this range, I was also able to recover consistently with 6 GB and 9 GB of RAM, which suggests the range works for larger memory sizes as well.
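To make the range-based selection concrete, here is a minimal Python sketch of how a crashkernel=<range>:<size>,... string maps a host's total RAM to a reservation. It assumes the documented kernel range syntax (start inclusive, end exclusive, an empty end meaning "no upper bound") and ignores the optional @offset. The range string in HYPOTHETICAL_SPEC and the function names are illustrative assumptions, not the value or code from this PR.

```python
import re

KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30
_UNITS = {"": 1, "K": KIB, "M": MIB, "G": GIB}

# Hypothetical range string for illustration only; NOT the value set in kernel.spec.
HYPOTHETICAL_SPEC = "0M-1G:128M,1G-6G:256M,6G-8G:512M,8G-:768M"


def parse_size(text):
    """Convert a size such as '256M' or '1G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def crashkernel_reservation(spec, total_ram_bytes):
    """Return the bytes reserved for the crash kernel, or 0 if no range matches."""
    spec = spec.split("@", 1)[0]  # this sketch ignores an optional @offset
    if ":" not in spec:
        # Plain form such as crashkernel=256M: same reservation for any host.
        return parse_size(spec)
    for entry in spec.split(","):
        range_part, size_part = entry.split(":")
        start_text, end_text = range_part.split("-")
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None  # empty end = unbounded
        if total_ram_bytes >= start and (end is None or total_ram_bytes < end):
            return parse_size(size_part)
    return 0


if __name__ == "__main__":
    for ram_gib in (1.8, 6, 9):
        reserved = crashkernel_reservation(HYPOTHETICAL_SPEC, int(ram_gib * GIB))
        print(f"{ram_gib} GiB RAM -> {reserved // MIB} MiB reserved")
```

A fixed value such as 256M goes through the same function and returns the same reservation regardless of host memory, which is the simpler alternative raised in the follow-up.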

Follow up:
Why does the Mariner crash kernel require more than 128 MB (the default recommended by Canonical and assigned by the auto option) for kernel crashes to recover consistently?

Do higher RAM values really require a bigger crashkernel? Maybe we can use crashkernel=256M (or lower) for higher values as well.
A: Further testing on Hyper-V suggests we could stick to 256M for everything. I was able to crash and recover logs quickly and repeatedly using 256M on a VM with 10 GB of RAM.

Change Log

Updated crashkernel param

Does this affect the toolchain?

YES

Associated issues

https://microsoft.visualstudio.com/OS/_workitems/edit/40870902?src=WorkItemMention&src-action=artifact_link

Test Methodology

Tested on a Hyper-V VM.
Used echo c > /proc/sysrq-trigger to trigger a kernel crash and saw inconsistent results (sometimes the machine recovered after a few minutes, sometimes it hung and never recovered).
Modified the crashkernel param to allocate more memory.
Verified that kernel crash recovery is now fast and consistent (about 15 seconds to get back to the login prompt) and that the vmcore file gets created every time.
Full local image build succeeded.
AMD image build: https://dev.azure.com/mariner-org/mariner/_build/results?buildId=231743&view=results
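Before triggering a crash, it can help to confirm that the modified crashkernel param actually reached the running kernel. The sketch below is one possible check, not part of the PR: it extracts the crashkernel= value from a kernel command line such as the contents of /proc/cmdline. The function name is hypothetical, and returning the last occurrence is an assumption that later parameters take precedence.

```python
from typing import Optional

PREFIX = "crashkernel="


def crashkernel_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of the last crashkernel= token, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith(PREFIX):
            value = token[len(PREFIX):]
    return value


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    with open("/proc/cmdline") as handle:
        print(crashkernel_from_cmdline(handle.read()))
```

If this prints None, no crash kernel memory was requested on the command line, so a crash test would not be expected to produce a vmcore.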

Redent0r · 2022-09-07T17:52:15Z

moved to #3705

Redent0r added 2 commits August 31, 2022 13:52

set crash kernel memory based on host memory

3d85f46

bumped release and added changelog entries

f340bd1

Redent0r closed this Sep 7, 2022

Redent0r deleted the redent0r/adjust_crashkernel_param branch November 27, 2023 18:59
