core: enabling logCollector by default for coredump collection #11163

gauravsitlani · 2022-10-18T11:03:30Z

Signed-off-by: gauravsitlani gauravsitlani@riseup.net

Description of your changes:

Enable the log collector by default so that coredumps can be collected for troubleshooting when a daemon segfaults.

Which issue is resolved by this Pull Request:
Resolves #10788 #11151

Checklist:

Commit Message Formatting: Commit titles and messages follow the guidelines in the developer guide.
Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
Reviewed the developer guide on Submitting a Pull Request
Pending release notes updated with breaking and/or notable changes for the next minor release.
Documentation has been updated, if necessary.
Unit tests have been added, if necessary.
Integration tests have been added, if necessary.

subhamkrai

Do we require a similar change in cluster-on-pvc.yaml?

Also, a thought: we were discussing adding something similar to the krew plugin. Can we add something in krew and let users decide whether they need this advanced debugging?

Also, can we make the periodicity hourly, since this is upstream?

The above are just my thoughts; feel free to add your comments and skip the changes.

deploy/charts/rook-ceph-cluster/values.yaml

parth-gr · 2022-10-18T11:44:16Z

Please add a commit message to the commit explaining why we need it by default.

Documentation/CRDs/Cluster/ceph-cluster-crd.md

gauravsitlani · 2022-10-18T13:19:42Z

@parth-gr updated

deploy/examples/cluster.yaml

subhamkrai

LGTM

galexrt · 2022-10-19T14:26:26Z

Documentation/CRDs/Cluster/ceph-cluster-crd.md

@@ -65,8 +65,8 @@ For more details on the mons and when to choose a number other than `3`, see the
  * `disable`: is set to `true`, the crash collector will not run on any node where a Ceph daemon runs
  * `daysToRetain`: specifies the number of days to keep crash entries in the Ceph cluster. By default the entries are kept indefinitely.
 * `logCollector`: The settings for log collector daemon.
-  * `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. (default: false)
-  * `periodicity`: how often to rotate daemon's log. (default: 24h). Specified with a time suffix which may be 'h' for hours or 'd' for days. **Rotating too often will slightly impact the daemon's performance since the signal briefly interrupts the program.**
+  * `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. The coredump files will be generated in `/var/lib/systemd/coredump` directory on the host where the pod is running in case a daemon terminates with a segfault. (default: `true`)


Are we sure this is the location for all underlying OSes/kernels? What happens when kernel.core_pattern does not point to coredumpctl and is just a path/directory?
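To make the question concrete, here is a minimal sketch (not part of this PR) of how one might check where a host sends its coredumps. It relies only on the documented behavior of `kernel.core_pattern`: a leading `|` pipes the dump to a helper program, and anything else is treated as a file path template. The function name `classify_core_pattern` and the category labels are hypothetical, chosen for illustration.

```python
from pathlib import Path

# Location of the kernel setting on a Linux host.
CORE_PATTERN_FILE = Path("/proc/sys/kernel/core_pattern")


def classify_core_pattern(pattern: str) -> str:
    """Return a rough category for a kernel.core_pattern value.

    The labels are illustrative only:
      "systemd-coredump": piped to systemd-coredump, which commonly stores
                          dumps under /var/lib/systemd/coredump
      "pipe-helper":      piped to some other helper program
      "absolute-path":    written to an absolute path template
      "relative-path":    written relative to the crashing process's cwd
      "empty":            no pattern set
    """
    pattern = pattern.strip()
    if not pattern:
        return "empty"
    if pattern.startswith("|"):
        # Everything after the pipe is the helper's command line.
        parts = pattern[1:].split()
        helper = parts[0] if parts else ""
        if Path(helper).name == "systemd-coredump":
            return "systemd-coredump"
        return "pipe-helper"
    if pattern.startswith("/"):
        return "absolute-path"
    return "relative-path"


if __name__ == "__main__":
    if CORE_PATTERN_FILE.exists():
        value = CORE_PATTERN_FILE.read_text()
        print(f"core_pattern = {value.strip()!r}")
        print(f"category     = {classify_core_pattern(value)}")
    else:
        print("core_pattern is not available on this system")
```

Only the `systemd-coredump` category would be expected to land in `/var/lib/systemd/coredump`. In the other cases the dump location depends on the helper or the path template, which is why the documented path may not hold on every OS.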

@galexrt @travisn please let me know if the new wording sounds good based on our discussion, or whether we can make it better.

travisn

one more place to update

deploy/examples/cluster-on-pvc.yaml

travisn · 2022-10-19T17:23:45Z

Documentation/CRDs/Cluster/ceph-cluster-crd.md

@@ -65,8 +65,8 @@ For more details on the mons and when to choose a number other than `3`, see the
  * `disable`: is set to `true`, the crash collector will not run on any node where a Ceph daemon runs
  * `daysToRetain`: specifies the number of days to keep crash entries in the Ceph cluster. By default the entries are kept indefinitely.
 * `logCollector`: The settings for log collector daemon.
-  * `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. (default: false)
-  * `periodicity`: how often to rotate daemon's log. (default: 24h). Specified with a time suffix which may be 'h' for hours or 'd' for days. **Rotating too often will slightly impact the daemon's performance since the signal briefly interrupts the program.**
+  * `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. The coredump files will be generated in `/var/lib/systemd/coredump` directory on the host depending on the underlying OS location for coredumps where the pod is running in case a daemon terminates with a segfault. (default: `true`)


Looks good, just a small suggestion

Suggested change

* `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. The coredump files will be generated in `/var/lib/systemd/coredump` directory on the host depending on the underlying OS location for coredumps where the pod is running in case a daemon terminates with a segfault. (default: `true`)

* `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. In case a daemon terminates with a segfault, the coredump files will commonly be generated in `/var/lib/systemd/coredump` directory on the host, depending on the underlying OS location. (default: `true`)

sure, updated.

travisn

Actually, would be great if you could also enable the log collector in the tests, should just need to add it here.

enabling logCollector by default will enable the coredump generation in case a process terminates with a segmentation fault. Signed-off-by: gauravsitlani <gauravsitlani@riseup.net>

gauravsitlani · 2022-10-19T17:36:18Z

@travisn sure, just added it there. Let me know if it looks good

travisn · 2022-10-19T17:36:51Z

Looks good, thanks. I'll just wait to approve until after the CI finishes.

core: enabling logCollector by default for coredump collection (backport #11163)

gauravsitlani added the troubleshooting label Oct 18, 2022

gauravsitlani changed the title ~~core: enabling log collector by default~~ core: enabling logCollector by default Oct 18, 2022

gauravsitlani changed the title ~~core: enabling logCollector by default~~ core: enabling logCollector by default for coredump collection Oct 18, 2022

gauravsitlani requested review from subhamkrai and travisn October 18, 2022 11:07

subhamkrai requested changes Oct 18, 2022

deploy/charts/rook-ceph-cluster/values.yaml (resolved)

galexrt requested changes Oct 18, 2022

Documentation/CRDs/Cluster/ceph-cluster-crd.md (outdated, resolved)

Documentation/CRDs/Cluster/ceph-cluster-crd.md (outdated, resolved)

gauravsitlani requested a review from subhamkrai October 18, 2022 13:52

travisn requested changes Oct 18, 2022

deploy/examples/cluster.yaml (outdated, resolved)

deploy/examples/cluster.yaml (outdated, resolved)

gauravsitlani requested a review from travisn October 19, 2022 14:19

subhamkrai approved these changes Oct 19, 2022

galexrt reviewed Oct 19, 2022

travisn requested changes Oct 19, 2022

deploy/examples/cluster-on-pvc.yaml (outdated, resolved)

travisn reviewed Oct 19, 2022

travisn approved these changes Oct 19, 2022

travisn requested changes Oct 19, 2022

core: enabling logCollector by default for coredump

cc02399

enabling logCollector by default will enable the coredump generation in case a process terminates with a segmentation fault. Signed-off-by: gauravsitlani <gauravsitlani@riseup.net>

galexrt approved these changes Oct 19, 2022

travisn approved these changes Oct 19, 2022

travisn added the backport-release-1.10 label Oct 19, 2022

travisn merged commit d695a0f into rook:master Oct 19, 2022

mergify bot mentioned this pull request Oct 19, 2022

core: enabling logCollector by default for coredump collection (backport #11163) #11181

Merged

travisn added a commit that referenced this pull request Oct 19, 2022

Merge pull request #11181 from rook/mergify/bp/release-1.10/pr-11163

2241e61

core: enabling logCollector by default for coredump collection (backport #11163)

travisn mentioned this pull request Oct 26, 2022

docs: added steps to collect coredump and perf troubleshooting info #11213

Merged

7 tasks

gauravsitlani deleted the gdb-coredump-tests branch October 28, 2022 15:08
