Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use docker log rotation mechanism instead of logrotate #40634

Merged

Conversation

crassirostris
Copy link

@crassirostris crassirostris commented Jan 27, 2017

This is a solution for #38495.

Instead of rotating logs using logrotate tool, which is configured quite rigidly, this PR makes docker responsible for the rotation and makes it possible to configure docker logging parameters. It solves the following problems:

It's still far from ideal, for example setting logging options per pod, as suggested in #15478 would be much more flexible, but latter approach requires deep changes, including changes in API, which may be in vain because of CRI and long-term vision for logging.

Changes include:

  • Change in salt. It's possible to configure docker log parameters, using variables in pillar. They're exported from env variables on gce, but for different cloud provider they have to be exported first.
  • Change in configure-helper.sh scripts for those os on gce that don't use salt + default values exposed via env variables

This change may be problematic for kubelet logs functionality with CRI enabled, that will be tackled in the follow-up PR, if confirmed.

CC @piosz @Random-Liu @yujuhong @dashpole @dchen1107 @vishh @kubernetes/sig-node-pr-reviews

On GCI by default logrotate is disabled for application containers in favor of rotation mechanism provided by docker logging driver.

@crassirostris crassirostris added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jan 27, 2017
@crassirostris crassirostris added this to the v1.6 milestone Jan 27, 2017
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 27, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jan 27, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@vishh vishh assigned dashpole and unassigned roberthbailey Jan 28, 2017
@vishh
Copy link
Contributor

vishh commented Jan 28, 2017

@dashpole can you do an initial review?

@dashpole
Copy link
Contributor

Sure

@dashpole
Copy link
Contributor

Do we need to do anything for providers other than GCE?

@crassirostris
Copy link
Author

@dashpole If provider uses salt, it will pick up this change automatically

Otherwise this change doesn't influence provider, but its owners can make mirroring changes. As for ability to configure parameters using environment of kube-up.sh for different providers, as I've mentioned in the PR text, it requires copying several variables to pillar, which can also be done by provider's owners.

@dashpole
Copy link
Contributor

Just to confirm, this only disables logrotate for docker, right?
One thing we may want to discuss are the defaults. The default in this PR is 5 rotations of 10mb each.
@crassirostris were you able to re-run your experiment to see if the missing log lines problem was fixed?
Code changes lgtm

@yujuhong
Copy link
Contributor

This PR changes all OS images in GCE clusters to use docker's native log rotation. Could we instead change only the configuration relevant to GKE (i.e., GCI)?

Also ping @dchen1107, who wanted to take a look.

@crassirostris
Copy link
Author

@mikedanese It needs your approval

@piosz
Copy link
Member

piosz commented Feb 22, 2017

@mikedanese @roberthbailey we need this for 1.6

@roberthbailey
Copy link
Contributor

roberthbailey commented Feb 23, 2017

i approve conditional on @dchen1107 giving this an lgtm

@piosz
Copy link
Member

piosz commented Feb 23, 2017

@roberthbailey thanks!

@dchen1107
Copy link
Member

Sorry for the late response. Thought we already agreed upon the scope (limited to GCI image only) and the solution through an offline discussions.

/lgtm and
/approve

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: Crassirostris, dchen1107

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 23, 2017
@crassirostris
Copy link
Author

Applying lgtm from Dawn's comment

@crassirostris crassirostris added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 24, 2017
@crassirostris
Copy link
Author

@k8s-bot kubemark e2e test this

@crassirostris
Copy link
Author

@k8s-bot kops aws e2e test this

@crassirostris
Copy link
Author

crassirostris commented Feb 24, 2017

@k8s-bot kubemark e2e test this kubernetes/test-infra#2012

@crassirostris
Copy link
Author

@k8s-bot cvm gce e2e test this
@k8s-bot kubemark e2e test this

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit b18bad1 into kubernetes:master Feb 27, 2017
@kfox1111
Copy link

I'm guessing this is probably a rare edge case, but may possibly be hit if the log server being shipped to is down for a while. What happens if the log shipper was restarted before all the old data was shipped and after the symlinks were updated? The shipper would no longer know about the old file and loose data?

@crassirostris
Copy link
Author

@kfox1111 Yes, that situation is possible, you have to be ready.

Moreover, if log shipper for some reason doesn't keep a track of a log file for some time and rotation happens twice, some potion of the logs is lost too. That doesn't depend on the way log files are rotated, or the way logs are written, it still may happen with journald or logrotate.

@krmayankk
Copy link

@crassirostris is this fix only for GCE based k8s deployments or for bare metal as well ?

@dashpole
Copy link
Contributor

dashpole commented May 9, 2017

Its only for GCE based deployments

@piosz
Copy link
Member

piosz commented May 11, 2017

Due to some objections that I don't remember (@crassirostris can explain) we introduced this change only for GCE using GCI/COS. You can you similar approach in your deployment.

@crassirostris
Copy link
Author

B/c it would be too disturbing otherwise, with possible long-lasting implications in the setups we don't control and don't test

@alexbrand
Copy link
Contributor

Hey @crassirostris,

The logging documentation here says:

An important consideration in node-level logging is implementing log rotation, so that logs don’t consume all available storage on the node. Kubernetes uses the logrotate tool to implement log rotation.
Kubernetes performs log rotation daily, or if the log file grows beyond 10MB in size. Each rotation belongs to a single container; if the container repeatedly fails or the pod is evicted, all previous rotations for the container are lost. By default, Kubernetes keeps up to five logging rotations per container.

Are these docs correct? It seems like rotation is not happening on a cluster I setup outside of GCE. I am wondering if those docs are just out of date, or if I am misunderstanding something.

@crassirostris
Copy link
Author

@alexbrand The documentation is little bit obsolete for the COS image on GCP, otherwise (e.g. debian on GCP or ubuntu on AWS) it's actually true. BUT it only applies to clusters brought up by kube-up.sh script, I think that's your problem. Generally, kubernetes per se doesn't handle log rotation (as for now, that's yet to be discussed), you have to set it up independently in your installation, e.g. by configuring logrotate script in crone, as kube-up.sh does it by default or by configuring the parameters of Docker the way it's done in cos on GCP now. I hope that answers your question.

Sorry for that misunderstanding, I'll patch the documentation. Thanks a lot for pointing that out!

@alexbrand
Copy link
Contributor

@crassirostris Got it! That is what I kinda assumed, but wanted to make sure. Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet