Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: enabling logCollector by default for coredump collection #11163

Merged
merged 1 commit into from
Oct 19, 2022
Merged

core: enabling logCollector by default for coredump collection #11163

merged 1 commit into from
Oct 19, 2022

Conversation

gauravsitlani
Copy link
Member

@gauravsitlani gauravsitlani commented Oct 18, 2022

Signed-off-by: gauravsitlani gauravsitlani@riseup.net

Description of your changes:

To enable log collector by default for getting coredump for troubleshooting in case of segfaults.

Which issue is resolved by this Pull Request:
Resolves #10788 #11151

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide).
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@gauravsitlani gauravsitlani changed the title core: enabling log collector by default core: enabling logCollector by default Oct 18, 2022
@gauravsitlani gauravsitlani changed the title core: enabling logCollector by default core: enabling logCollector by default for coredump collection Oct 18, 2022
Copy link
Contributor

@subhamkrai subhamkrai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we require a similar change in cluster-on-pvc.yaml ?

also a thought,
we were discussing adding something similar in krew plugin so can we add something in krew and let users decide if they require this advance debugging?

also, can we make periodicity hourly since this is upstream?

above are just my thought, feel free to add your comment and skip the changes

deploy/charts/rook-ceph-cluster/values.yaml Show resolved Hide resolved
@parth-gr
Copy link
Member

PLease add a commit message in the commit explaining why we need it by default.

Documentation/CRDs/Cluster/ceph-cluster-crd.md Outdated Show resolved Hide resolved
Documentation/CRDs/Cluster/ceph-cluster-crd.md Outdated Show resolved Hide resolved
@gauravsitlani
Copy link
Member Author

@parth-gr updated

deploy/examples/cluster.yaml Outdated Show resolved Hide resolved
deploy/examples/cluster.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@subhamkrai subhamkrai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -65,8 +65,8 @@ For more details on the mons and when to choose a number other than `3`, see the
* `disable`: is set to `true`, the crash collector will not run on any node where a Ceph daemon runs
* `daysToRetain`: specifies the number of days to keep crash entries in the Ceph cluster. By default the entries are kept indefinitely.
* `logCollector`: The settings for log collector daemon.
* `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. (default: false)
* `periodicity`: how often to rotate daemon's log. (default: 24h). Specified with a time suffix which may be 'h' for hours or 'd' for days. **Rotating too often will slightly impact the daemon's performance since the signal briefly interrupts the program.**
* `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. The coredump files will be generated in `/var/lib/systemd/coredump` directory on the host where the pod is running in case a daemon terminates with a segfault. (default: `true`)
Copy link
Member

@galexrt galexrt Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this is the location for all underlying OSes/kernels? What happens when the kernel.core_pattern is not pointing to the coredumpctl/ just a path/directory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galexrt @travisn please let me know if the new wording sounds good based on our discussion or we can make it better ?

Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more place to update

deploy/examples/cluster-on-pvc.yaml Outdated Show resolved Hide resolved
@@ -65,8 +65,8 @@ For more details on the mons and when to choose a number other than `3`, see the
* `disable`: is set to `true`, the crash collector will not run on any node where a Ceph daemon runs
* `daysToRetain`: specifies the number of days to keep crash entries in the Ceph cluster. By default the entries are kept indefinitely.
* `logCollector`: The settings for log collector daemon.
* `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. (default: false)
* `periodicity`: how often to rotate daemon's log. (default: 24h). Specified with a time suffix which may be 'h' for hours or 'd' for days. **Rotating too often will slightly impact the daemon's performance since the signal briefly interrupts the program.**
* `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. The coredump files will be generated in `/var/lib/systemd/coredump` directory on the host depending on the underlying OS location for coredumps where the pod is running in case a daemon terminates with a segfault. (default: `true`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a small suggestion

Suggested change
* `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. The coredump files will be generated in `/var/lib/systemd/coredump` directory on the host depending on the underlying OS location for coredumps where the pod is running in case a daemon terminates with a segfault. (default: `true`)
* `enabled`: if set to `true`, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option `log_to_file` will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. In case a daemon terminates with a segfault, the coredump files will be commonly be generated in `/var/lib/systemd/coredump` directory on the host, depending on the underlying OS location. (default: `true`)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, updated.

Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, would be great if you could also enable the log collector in the tests, should just need to add it here.

enabling logCollector by default will enable the coredump
generation in case a process terminates with a segmentation fault.

Signed-off-by: gauravsitlani <gauravsitlani@riseup.net>
@gauravsitlani
Copy link
Member Author

@travisn sure, just added it there. Let me know if it looks good

@travisn
Copy link
Member

travisn commented Oct 19, 2022

@travisn sure, just added it there. Let me know if it looks good

Looks good thanks, i'll just wait to approve after the CI finishes

@travisn travisn merged commit d695a0f into rook:master Oct 19, 2022
travisn added a commit that referenced this pull request Oct 19, 2022
core: enabling logCollector by default for coredump collection (backport #11163)
@gauravsitlani gauravsitlani deleted the gdb-coredump-tests branch October 28, 2022 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How To Retrieve Ceph Pod Coredump?
5 participants