Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove default debug option #12

Merged
merged 1 commit into from
Jul 4, 2024

Conversation

prudo1
Copy link
Collaborator

@prudo1 prudo1 commented Jun 14, 2024

This series removes the -d option set per default to kexec followed by some small cleanup patches. Please see the patch description for details

@prudo1 prudo1 force-pushed the fixes/remove_default_debug_option branch from 4288ec7 to 5924a11 Compare June 14, 2024 13:49
Copy link
Collaborator

@baoquan-he baoquan-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a typo in log:s/kdump_load/kexec_load/,

Copy link
Contributor

@daveyoung daveyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Philipp, If people use "-c" I think a warning about not supported should be enough instead of overwriting with "-s" so that we have a chance to use "-c" for testing purpose in case there are some upstream requests. If we overwriting with "-s" then it is hard to enable it anymore. It can be up to user to remove the "-c". See what do you and Baoquan think

Copy link
Contributor

@daveyoung daveyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to drop this patch as I commented in another one so we have the infrastructure there for testing purpose. Otherwise it maybe not correct to add "-s" "--dt-no-old-root" together, I vaguely remember that is ppc64 specific and kexec_load only option.

@baoquan-he
Copy link
Collaborator

Hi Philipp, If people use "-c" I think a warning about not supported should be enough instead of overwriting with "-s" so that we have a chance to use "-c" for testing purpose in case there are some upstream requests. If we overwriting with "-s" then it is hard to enable it anymore. It can be up to user to remove the "-c". See what do you and Baoquan think

I agree with Dave. And I am not sure if it's OK to hardcode the '-s' in kdumpctl. With this, user may not have chance to try '-c', unless they need and know how to edit the code. Imagine one day, we encounter a bug on kexec_file_load and have no clue at all. We may want to try kexec_load to compare.

@prudo1
Copy link
Collaborator Author

prudo1 commented Jun 18, 2024

There's a typo in log:s/kdump_load/kexec_load/,

Thanks... fixed it locally

@prudo1
Copy link
Collaborator Author

prudo1 commented Jun 18, 2024

Hi Philipp, If people use "-c" I think a warning about not supported should be enough instead of overwriting with "-s" so that we have a chance to use "-c" for testing purpose in case there are some upstream requests. If we overwriting with "-s" then it is hard to enable it anymore. It can be up to user to remove the "-c". See what do you and Baoquan think

I agree with Dave. And I am not sure if it's OK to hardcode the '-s' in kdumpctl. With this, user may not have chance to try '-c', unless they need and know how to edit the code. Imagine one day, we encounter a bug on kexec_file_load and have no clue at all. We may want to try kexec_load to compare.

Hi Dave,
Hi Baoquan,

thanks for taking a look. Please note that I don't intend to backport this commit to RHEL9 where -c is still an option. But for Fedora and RHEL10 I believe this commit makes sense. There the kernel is/will be compiled without CONFIG_KEXEC. So option -c won't work and kexec-tools will return with an error. In that case printing a warning plus automatically falling back to -s is the better option in my opinion.

I'll take a look on how the two commits concerning kexec_file_load can be split up better. So backporting the warning can be done easier without setting -s per default.

@daveyoung
Copy link
Contributor

Hi Philipp, If people use "-c" I think a warning about not supported should be enough instead of overwriting with "-s" so that we have a chance to use "-c" for testing purpose in case there are some upstream requests. If we overwriting with "-s" then it is hard to enable it anymore. It can be up to user to remove the "-c". See what do you and Baoquan think

I agree with Dave. And I am not sure if it's OK to hardcode the '-s' in kdumpctl. With this, user may not have chance to try '-c', unless they need and know how to edit the code. Imagine one day, we encounter a bug on kexec_file_load and have no clue at all. We may want to try kexec_load to compare.

Hi Dave, Hi Baoquan,

thanks for taking a look. Please note that I don't intend to backport this commit to RHEL9 where -c is still an option. But for Fedora and RHEL10 I believe this commit makes sense. There the kernel is/will be compiled without CONFIG_KEXEC. So option -c won't work and kexec-tools will return with an error. In that case printing a warning plus automatically falling back to -s is the better option in my opinion.

I'll take a look on how the two commits concerning kexec_file_load can be split up better. So backporting the warning can be done easier without setting -s per default.

Hi Philipp, hmm. I think the removing of "-d" is an independent issue, can you post it separately?

@daveyoung
Copy link
Contributor

Hi Philipp, If people use "-c" I think a warning about not supported should be enough instead of overwriting with "-s" so that we have a chance to use "-c" for testing purpose in case there are some upstream requests. If we overwriting with "-s" then it is hard to enable it anymore. It can be up to user to remove the "-c". See what do you and Baoquan think

I agree with Dave. And I am not sure if it's OK to hardcode the '-s' in kdumpctl. With this, user may not have chance to try '-c', unless they need and know how to edit the code. Imagine one day, we encounter a bug on kexec_file_load and have no clue at all. We may want to try kexec_load to compare.

Hi Dave, Hi Baoquan,
thanks for taking a look. Please note that I don't intend to backport this commit to RHEL9 where -c is still an option. But for Fedora and RHEL10 I believe this commit makes sense. There the kernel is/will be compiled without CONFIG_KEXEC. So option -c won't work and kexec-tools will return with an error. In that case printing a warning plus automatically falling back to -s is the better option in my opinion.
I'll take a look on how the two commits concerning kexec_file_load can be split up better. So backporting the warning can be done easier without setting -s per default.

Hi Philipp, hmm. I think the removing of "-d" is an independent issue, can you post it separately?

Hi Philipp, maybe I still do not understand why the "-c/-s" auto fallback is needed. If people use "-c" kexec load will finally fail and we can print error about this is not supported, it should be enough, I can not convince myself to auto fallback to -s as if so we will be hard to test kexec -c with upstream customized kernel, so the code is not harmful at least.. Frankly ditto about the i386 part, even if it is not used, but not harmful, but maybe drop ix86 is not so bad. Anyway it can be another pr if needed.

@prudo1
Copy link
Collaborator Author

prudo1 commented Jun 25, 2024

Hi Dave,
Baoquan and I discussed this in our last team event. My rational was that Baoquan unintentionally disabled kexec_load for Fedora as well when he disabled it for RHEL. So using the -c option on a current Rawhide kernel leads to the following error message

# kdumpctl start
kexec_load failed: Function not implemented
entry   	= 0xb2f54760 flags = 0x3e0001
nr_segments = 7
segment[0].buf   = 0x55d8cef2e9c0
segment[0].bufsz = 0x30
segment[0].mem   = 0xa3000000
segment[0].memsz = 0x1000
segment[1].buf   = 0x7f400f600010
segment[1].bufsz = 0x1c98600
segment[1].mem   = 0xac367000
segment[1].memsz = 0x1c99000
segment[2].buf   = 0x7f4011405010
segment[2].bufsz = 0xf3c968
segment[2].mem   = 0xae000000
segment[2].memsz = 0x44b9000
segment[3].buf   = 0x55d8cef4e4b0
segment[3].bufsz = 0x51a9
segment[3].mem   = 0xb2f4e000
segment[3].memsz = 0x6000
segment[4].buf   = 0x55d8cef473c0
segment[4].bufsz = 0x70e0
segment[4].mem   = 0xb2f54000
segment[4].memsz = 0x9000
segment[5].buf   = 0x55d8cef45ab0
segment[5].bufsz = 0x400
segment[5].mem   = 0xb2f5d000
segment[5].memsz = 0x4000
segment[6].buf   = 0x7f401244d010
segment[6].bufsz = 0x9e800
segment[6].mem   = 0xb2f61000
segment[6].memsz = 0x9f000
kdump: kexec: failed to load kdump kernel
kdump: Starting kdump: [FAILED]

which I find rather nasty. That's why I thought it would be better to fallback to -s. So if a user has option -c set on KEXEC_ARGS and upgrades to a new kernel it only prints an error message but still works.

However, Baoquan and I decided it's better to re-enable kexec_load for Fedora so we have a fallback mechanism in case there is a bug in kexec_file_load upstream without the need to recompile the kernel. I'll prepare a MR for kernel-ark for that and will rework this series.

And I'll also split out the commits that are not related to removing the default -d option to my other series that already contains various fixes for bugs found in the last test cycle.

@prudo1
Copy link
Collaborator Author

prudo1 commented Jun 25, 2024

BTW, I've created an MR to re-enable kexec_load on Fedora

Kernel commits cbc2fe9d9cb2 ("kexec_file: add kexec_file flag to control
debug printing") and a85ee18c7900 ("kexec_file: print out debugging
message if required") added debug messages to the kexec_file_load system
call when option -d is provided to the kexec user space tool. As
kexec_file_load is the default and option -d is set by default these
messages are always printed when a crash kernel is loaded. This not only
clutters the kernel log but also potentially leaks confidential kernel
information to users. As the messages are printed to the kernel log, not
stderr, the redirection to /var/log/kdump.log won't catch them. This
will become even more problematic as for RHEL10 the kernel will be built
without support for the kexec_load system call. So kexec_file_load will
be the only choice in the future.

The redirection also caused confusion in a recent bug report. There a
user moved a working /etc/sysconfig/kdump from ppc to s390 with
KEXEC_ARGS containing the --dt-no-old-root option. This option is arch
specific and does not exist on s390. Thus the kexec-tools failed with an
'unrecognized option' error followed by the usage(). The problem was
that the 'unrecognized option' error is printed to stderr, which got
redirected to /var/log/kdump.log, while the usage() is printed to
stdout, which ended up in the systemd journal. This caused confusion as
the user only checked the journal and found the usage() without any
error message.

Thus remove the default -d option and the redirection of stderr to
/var/log/kdump.log for the kexec-tools user space tool.

This commit ultimately reverts 88a8b94 ("kdumpctl: add the '-d' option to
enable the kexec loading debugging messages").

Signed-off-by: Philipp Rudo <prudo@redhat.com>
@prudo1 prudo1 force-pushed the fixes/remove_default_debug_option branch from 5924a11 to 0296834 Compare June 26, 2024 12:29
@prudo1 prudo1 mentioned this pull request Jun 26, 2024
@prudo1
Copy link
Collaborator Author

prudo1 commented Jun 26, 2024

Dropped all but the first commit and moved the commit that rewrites load_kdump to PR #18

@baoquan-he
Copy link
Collaborator

This MR looks good to me, ACK.

@daveyoung
Copy link
Contributor

daveyoung commented Jul 1, 2024 via email

@baoquan-he
Copy link
Collaborator

Hi Baoquan, ACK if not visible in the pr interface, there is an "approve" action so it can be easier visible to the maintainer. But anyway a reminder, @coiby Xu @.***> which way do you prefer?

I tried to approve, while I couldn't find button to do.

@coiby
Copy link
Member

coiby commented Jul 4, 2024

Hi @daveyoung,

Yes, an approve action makes it easier for me to tell which PR has been reviewed. So yes, I prefer an approve action. In the future, I may also experiment with merging PR automatically when for example two people have approved the PR.

@coiby coiby merged commit 3028529 into rhkdump:main Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants