Support log rotation of CephCSI pods #12809

Madhu-1 · 2023-08-29T07:11:08Z

Is this a bug report or feature request?

Feature Request

Provide a way to preserve CSI logs or support log rotation for the cephcsi pods.

What should the feature do:

Should preserve the logs of cephcsi pods for better debugging

What is use case behind this feature:

In most long-running clusters the CSI logs get flushed by Kubernetes. We need a way to preserve the old CSI logs for a certain period/size so that we can check what happened in the cluster.

github-actions · 2023-10-28T20:01:51Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

Madhu-1 · 2024-03-21T08:43:08Z

@parth-gr any update on this one?

parth-gr · 2024-03-21T08:57:31Z

@Madhu-1 this got skipped, but it is important to have.

> csi logs will get flushed by kubernetes

Is there a limit that triggers this, or why does it flush? Even if we apply log rotation, we need to understand the root cause of the flush.

parth-gr · 2024-03-21T12:03:30Z

Taking a closer look, I saw that the kubelet sets default limits on container log files: containerLogMaxSize (default 10Mi) and containerLogMaxFiles (default 5).
These can be adjusted through the kubelet config file: https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
But that configuration is node-specific, and we don't want to change it because every other pod on the node would pick up the same settings.
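For reference, a minimal sketch of the kubelet config file fragment that controls these two limits. The field names are the kubelet's own; the values are illustrative assumptions, not recommendations.

```yaml
# Illustrative values only. This applies to every container on the node,
# which is the reason it is not the preferred fix here.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 50Mi   # kubelet default is 10Mi
containerLogMaxFiles: 10    # kubelet default is 5
```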

So the solution is to use a sidecar container with a logging agent, similar to the approach we use for the ceph pods.

The ceph configuration is set by the cephcluster CR.
So for CSI, do we need environment variables in the rook-ceph-operator-config ConfigMap, where we set most of the CSI-related configuration values?

data:
  LogCollectorEnable: "true"
  Periodicity: string
  MaxLogSize: *resource.Quantity

parth-gr · 2024-03-21T15:41:43Z

In an offline discussion with Santosh,
we thought we could reuse the values we are adding to the cephcluster CR for the csi pods too.
That way we don't need separate env variables, and there is no need for separate validation of the values.
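As a sketch of what reusing those values could mean, assuming the settings in question are the existing logCollector block of the CephCluster CR (the metadata names and values below are illustrative):

```yaml
# Assumption: the csi pods would read the same logCollector settings
# that the ceph daemons already use. Values are examples only.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  logCollector:
    enabled: true
    periodicity: daily   # how often the logs are rotated
    maxLogSize: 500M     # size at which a log file is rotated
```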

parth-gr · 2024-03-28T09:14:28Z

Per the recent offline discussion, we have to wait until we finalize whether this should be owned by rook or separately by csi,
similar to #13963 (comment)

travisn · 2024-05-03T18:17:23Z

The ceph daemons enable log rotation with the following approach:

- The ceph daemon writes both to a file and to stdout/stderr.
- K8s captures the pod logs from the output to stdout/stderr.
- A log rotate sidecar on each ceph daemon rotates the files that are written to disk on the host path.
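The three pieces above could be laid out roughly as in the following pod sketch. Every name, image, and path here is a hypothetical placeholder; this is not the manifest Rook actually generates.

```yaml
# Hypothetical sketch: names, images, and paths are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-ceph-daemon
spec:
  containers:
    - name: daemon          # writes logs to a file and to stdout/stderr
      image: example/ceph-daemon:latest
      volumeMounts:
        - name: log-dir
          mountPath: /var/log/ceph
    - name: log-rotate      # sidecar that rotates files in the shared directory
      image: example/logrotate:latest
      volumeMounts:
        - name: log-dir
          mountPath: /var/log/ceph
  volumes:
    - name: log-dir
      hostPath:
        path: /var/lib/example/log   # placeholder host path
        type: DirectoryOrCreate
```

Both containers mount the same volume, so the sidecar can rotate files without the daemon's stdout/stderr stream being affected.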

Does CSI have an option to write the logs to a file, similar to the ceph pods? If not, the csi pods would need to be reworked to write logs to a file, which would then allow for log rotation. I assume reworking the logging in csi would be a very large work item.

The only alternative to reworking the logging is to write less to the logs so they don't rotate as often. Is this issue only occurring when the log level is turned up higher to level 5? I imagine this does not happen at the default upstream value of level 0. @Madhu-1 Has it been considered to change more critical logging to a lower level, so the log won't be filled up with non-critical info? Or perhaps even the most verbose log messages can be shortened, while preserving the important troubleshooting info.

Madhu-1 · 2024-05-06T10:15:19Z

@travisn yes, each csi sidecar and the cephcsi driver has the options below:

  -log_backtrace_at value
    	when logging hits line file:N, emit a stack trace
  -log_dir string
    	If non-empty, write log files in this directory (no effect when -logtostderr=true)
  -log_file string
    	If non-empty, use this log file (no effect when -logtostderr=true)
  -log_file_max_size uint
    	Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
  -logtostderr
    	log to standard error instead of files (default true)

These options are not set by default; they need to be set as required.

> Is this issue only occurring when the log level is turned up higher to level 5? I imagine this does not happen at the default upstream value of level 0. @Madhu-1 Has it been considered to change more critical logging to a lower level, so the log won't be filled up with non-critical info? Or perhaps even the most verbose log messages can be shortened, while preserving the important troubleshooting info.

CSI logs are critical for analyzing any case. In most cases customers/users perform PVC and snapshot operations very frequently, and some of these are automated. We need to rotate the logs and retain them to a certain extent based on user-configured values.

travisn · 2024-05-06T16:26:28Z

Is it possible to log both to stderr and to the file? Only then can we rotate files, as well as see the pod logs. But the descriptions indicate they are mutually exclusive?

  -log_file string
    	If non-empty, use this log file (no effect when -logtostderr=true)

Madhu-1 · 2024-05-06T16:30:46Z

> Is it possible to log both to stderr and to the file? Only then can we rotate files, as well as see the pod logs. But the descriptions indicate they are mutually exclusive?
>
>   -log_file string
>     	If non-empty, use this log file (no effect when -logtostderr=true)

There is one more flag for it

-alsologtostderr=false
		Logs are written to standard error as well as to files.

https://pkg.go.dev/k8s.io/klog/v2#section-documentation has all the required details.
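Putting the flags from this thread together, a container could be configured roughly as follows to log to a file while still showing output in the pod logs. This is a sketch under assumptions: the container name, image, log path, and size value are placeholders, and whether each csi sidecar accepts the flags in exactly this form would need to be confirmed.

```yaml
# Sketch only: container name, image, log path, and size are assumptions.
containers:
  - name: csi-plugin
    image: example/cephcsi:latest
    args:
      - "--logtostderr=false"       # otherwise the file flags have no effect
      - "--alsologtostderr=true"    # also write to stderr so pod logs still work
      - "--log_file=/var/log/csi/plugin.log"
      - "--log_file_max_size=100"   # megabytes; 0 means unlimited
```

With the logs on disk, a rotation sidecar similar to the one used for the ceph daemons could then manage the file.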

Madhu-1 added the feature label Aug 29, 2023

parth-gr self-assigned this Aug 29, 2023

github-actions bot added the wontfix label Oct 28, 2023

Madhu-1 added keepalive and removed wontfix labels Oct 30, 2023

travisn removed the keepalive label Mar 27, 2024

parth-gr mentioned this issue May 20, 2024

doc: Update the roadmap for 1.15 #14229

Open

6 tasks
