Support log rotation of CephCSI pods #12809

Open
Madhu-1 opened this issue Aug 29, 2023 · 10 comments

@Madhu-1
Member

Madhu-1 commented Aug 29, 2023

Is this a bug report or feature request?

  • Feature Request

Provide a way to preserve CSI logs or support log rotation for the cephcsi pods.

What should the feature do:

Should preserve the logs of cephcsi pods for better debugging

What is use case behind this feature:

In most long-running clusters, the CSI logs get flushed by Kubernetes. We need a way to preserve old CSI logs up to a certain period or size so that we can check what happened in the cluster.

@parth-gr parth-gr self-assigned this Aug 29, 2023
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@Madhu-1
Member Author

Madhu-1 commented Mar 21, 2024

@parth-gr any update on this one?

@parth-gr
Member

@Madhu-1 this got skipped, but it is something important to have.

csi logs will get flushed by kubernetes

Is there a limit that triggers this, or why does it flush? Even if we apply log rotation, we need to understand the root cause of the flushing.

@parth-gr
Member

Taking a closer look, I saw that the kubelet sets default limits on container log files: containerLogMaxSize (default 10Mi) and containerLogMaxFiles (default 5).
These can be adjusted through the kubelet config file: https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
But that configuration is node-specific, and we don't want to change it because every other pod on the node would then use the same settings.

So the solution is to use a sidecar container with a logging agent, similar to the approach we already use for the Ceph pods.

The Ceph configurations are set by the CephCluster CR.
So for this, do we need environment variables in the rook-ceph-operator-config ConfigMap, where we set most of the CSI configuration values?

data:
  LogCollectorEnable: "true"
  Periodicity: string
  MaxLogSize: *resource.Quantity

@parth-gr
Member

In an offline discussion with Santosh,
we thought we could reuse the same values we are adding to the CephCluster CR for the CSI pods too.
That way we don't need separate env variables or separate validation of the values.

@travisn travisn removed the keepalive label Mar 27, 2024
@parth-gr
Member

After the recent offline discussion, we have to wait until we finalize whether this should be owned by Rook or separately by CSI,
similar to #13963 (comment)

@travisn
Member

travisn commented May 3, 2024

The ceph daemons enable the log rotation with the following approach:

  1. The ceph daemon writes both to a file, as well as stdout/stderr
  2. K8s captures the pod logs from the output to stdout/stderr
  3. There is a log rotate sidecar on each ceph daemon that will rotate the files that are written to disk on the host path.

Does CSI have an option to write the logs to a file, similar to the ceph pods? If not, the csi pods would need to be reworked to write their logs to a file, which would then allow for log rotation. I assume reworking the logging in csi would be a very large work item.

The only alternative to reworking the logging is to write less to the logs so they don't rotate as often. Is this issue only occurring when the log level is turned up higher to level 5? I imagine this does not happen at the default upstream value of level 0. @Madhu-1 Has it been considered to change more critical logging to a lower level, so the log won't be filled up with non-critical info? Or perhaps even the most verbose log messages can be shortened, while preserving the important troubleshooting info.

@Madhu-1
Member Author

Madhu-1 commented May 6, 2024

@travisn yes, each CSI sidecar and the cephcsi driver has the options below:

  -log_backtrace_at value
    	when logging hits line file:N, emit a stack trace
  -log_dir string
    	If non-empty, write log files in this directory (no effect when -logtostderr=true)
  -log_file string
    	If non-empty, use this log file (no effect when -logtostderr=true)
  -log_file_max_size uint
    	Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
  -logtostderr
    	log to standard error instead of files (default true)

These options are not set by default; they need to be set as required.

Is this issue only occurring when the log level is turned up higher to level 5? I imagine this does not happen at the default upstream value of level 0. @Madhu-1 Has it been considered to change more critical logging to a lower level, so the log won't be filled up with non-critical info? Or perhaps even the most verbose log messages can be shortened, while preserving the important troubleshooting info.

CSI logs are critical for analyzing any case. In most cases customers/users perform PVC and snapshot operations very frequently, and some of them are even automated. We need to rotate the logs and retain them to a certain extent, based on user-configured values.

@travisn
Member

travisn commented May 6, 2024

Is it possible to log both to stderr and to the file? Only then can we rotate files, as well as see the pod logs. But the descriptions indicate they are mutually exclusive?

  -log_file string
    	If non-empty, use this log file (no effect when -logtostderr=true)

@Madhu-1
Member Author

Madhu-1 commented May 6, 2024

Is it possible to log both to stderr and to the file? Only then can we rotate files, as well as see the pod logs. But the descriptions indicate they are mutually exclusive?

  -log_file string
    	If non-empty, use this log file (no effect when -logtostderr=true)

There is one more flag for it

-alsologtostderr=false
		Logs are written to standard error as well as to files.

https://pkg.go.dev/k8s.io/klog/v2#section-documentation has all the required details.
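
Putting the flags from this thread together, the container arguments might look like the output of the Go sketch below. The flag names are the klog ones quoted above. The `fileLoggingArgs` helper, the log path and the 100 MB limit are hypothetical values for illustration, not anything Rook or cephcsi sets today.

```go
package main

import (
	"fmt"
	"strings"
)

// fileLoggingArgs builds klog-style arguments that send logs to a file
// while still copying them to stderr so the pod logs keep working.
func fileLoggingArgs(logFile string, maxSizeMB uint) []string {
	return []string{
		"--logtostderr=false",    // otherwise log_file has no effect
		"--alsologtostderr=true", // keep a copy on stderr for pod logs
		fmt.Sprintf("--log_file=%s", logFile),
		fmt.Sprintf("--log_file_max_size=%d", maxSizeMB), // megabytes
	}
}

func main() {
	// Hypothetical path and size limit, chosen only for this example.
	args := fileLoggingArgs("/var/log/csi/driver.log", 100)
	fmt.Println(strings.Join(args, " "))
}
```

With the file written this way, a log rotate sidecar could handle it the same way it does for the Ceph daemons, assuming the log directory is shared between the two containers.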


3 participants