add log file tailing and logrotate support #1264

Merged
merged 1 commit into from
May 18, 2016

sjenning
Contributor

@sjenning sjenning commented May 4, 2016

Fixes #1248

Add some "tail -F" support (via polling and reopening) so that cadvisor can continue to track kernel log messages across log rotations.

I looked at doing this with fsnotify, but (1) fsnotify depends on golang.org/x/sys/unix, which is about 10k LOC, and (2) I couldn't find a tail implementation that actually worked with fsnotify.

@k8s-bot
Collaborator

k8s-bot commented May 4, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

}, nil
}

// initializes an OomParser object. Returns and OomParser object and an error.
Collaborator

s/and/an/

@timstclair
Contributor

Can we just use inotify?

		glog.Errorf("Open failed on %s", t.filename)
		return nil, err
	}
	t.file.Seek(0, os.SEEK_END)
Contributor

This is a change in behavior, as we won't get old events.

Contributor Author

that is true. i figured it was a bug before, sending oom events for things that may have happened days ago in the logs.

Contributor

Things from days ago are probably not relevant, but events from seconds ago might be. Either way, I think it's out of scope for this PR (feel free to leave a TODO).

@sjenning
Contributor Author

sjenning commented May 4, 2016

@timstclair i'm looking into using inotify. might be less of a hassle than fsnotify was.

@timstclair timstclair self-assigned this May 4, 2016
@sjenning
Contributor Author

sjenning commented May 5, 2016

@timstclair @ncdc updated with inotify instead of polling

			return
		}
		if reopen {
			break
Collaborator

I'm pretty sure this will break you out of the outer loop. I'd recommend adding some unit tests :)

Contributor Author

it breaks me out of the for loop starting at line 90, which is what i want. i've done tests so i know it works. i'll have to think about how to unit test it.

Collaborator

My bad, had it backwards
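
For reference, an unlabeled `break` in Go leaves only the innermost `for`, `switch`, or `select` statement. The sketch below illustrates the three cases that tend to cause this confusion. It is not taken from the PR; all function names are made up for the example.

```go
package main

// innerBreak returns how many times the outer and inner loop bodies ran.
// The unlabeled break leaves only the inner loop, so the outer one continues.
func innerBreak() (int, int) {
	outer, inner := 0, 0
	for i := 0; i < 3; i++ {
		for {
			inner++
			reopen := true
			if reopen {
				break // leaves the innermost for loop only
			}
		}
		outer++
	}
	return outer, inner
}

// breakInSelect shows that break inside a select case ends the select,
// not the surrounding for loop.
func breakInSelect() int {
	ch := make(chan struct{}, 1)
	n := 0
	for n < 3 {
		ch <- struct{}{}
		select {
		case <-ch:
			n++
			break // ends the select; the for loop keeps going
		}
	}
	return n
}

// labeledBreak shows how to leave an outer loop on purpose.
func labeledBreak() int {
	n := 0
outer:
	for {
		for {
			n++
			if n == 2 {
				break outer // leaves both loops
			}
			break // leaves the inner loop only
		}
	}
	return n
}
```

A unit test for this kind of control flow can simply assert on the counters, as the loops here are deterministic.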

@sjenning
Contributor Author

@timstclair new update that i think captures all the previous comments. PTAL when you can.

	t.watcher, err = inotify.NewWatcher()
	if err != nil {
		glog.Errorf("Inotify init failed on %s: %v", t.filename, err)
		return
Contributor

Contributor Author

how do i avoid these?

Contributor

Just change this line to return t, err or return nil, err. Also change the return type to (*Tail, error) since the variable names don't add anything in that case.

IMHO there are only 2 reasons to give return types names:

  1. If the return type is something like (int, int, int, int), names distinguish the different ints (though I would usually prefer a struct for this)
  2. If you need to modify a return value in a defer statement (again, this should be avoided if possible)
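
A short sketch of the two styles, under the assumption of a simplified `Tail` type (the real struct in this PR has more fields). `newTail` and `safeDiv` are hypothetical names: the first uses unnamed results with explicit returns, the second is a case where a named result is needed because a deferred function modifies it.

```go
package main

import (
	"fmt"
	"os"
)

// Tail is a stand-in for the type under review, reduced to two fields.
type Tail struct {
	filename string
	file     *os.File
}

// newTail uses unnamed results, so every return states its values explicitly.
func newTail(filename string) (*Tail, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	return &Tail{filename: filename, file: f}, nil
}

// safeDiv needs a named result: the deferred function assigns to err after
// the division has panicked, which an unnamed result would not allow.
func safeDiv(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}
```

The explicit form keeps each exit path readable on its own, since the reader does not have to track what the named variables hold at the point of a bare `return`.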

@timstclair
Contributor

Which parts are you having trouble with? The race condition can be avoided by protecting the reader with a mutex, and you can punt on the unit test for now.

@timstclair
Contributor

FYI, we're hoping to cut a v0.23.2 release tomorrow (2016-05-18) for kubernetes 1.3, and I'd like to get this change in. Please ping me if there's anything I can help with to move this along.

@sjenning
Contributor Author

I'll fix it up asap. Update within the hour.
On May 17, 2016 7:58 PM, "Tim St. Clair" notifications@github.com wrote:

FYI, we're hoping to cut a v0.23.2 release tomorrow (2016-05-18) for
kubernetes 1.3, and I'd like to get this change in. Please ping me if
there's anything I can help with to move this along.



@timstclair
Contributor

Great, thanks!

@sjenning
Contributor Author

@timstclair I'm close to having this done. Another half hour.

@sjenning
Contributor Author

@timstclair i've pushed an update. i reworked watchFile, but now the mutex handling is nasty in there. recommendations welcome.

@k8s-bot
Collaborator

k8s-bot commented May 18, 2016

Jenkins GCE e2e

Build/test passed for commit a058132.

@k8s-bot
Collaborator

k8s-bot commented May 18, 2016

Jenkins GCE e2e

Build/test passed for commit 559dce2.

@k8s-bot
Collaborator

k8s-bot commented May 18, 2016

Jenkins GCE e2e

Build/test passed for commit 8d80aac.


type Tail struct {
	reader *bufio.Reader
	sync.Mutex // protects reader
Contributor

Call this readerLock (if it's embedded it's less clear what it's protecting)
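
A sketch of the difference, using two made-up types. Embedding `sync.Mutex` promotes `Lock` and `Unlock` onto the struct itself, so any code holding the value can lock it and the name says nothing about what is guarded. A named field keeps the lock private to the type and documents its purpose.

```go
package main

import (
	"bufio"
	"sync"
)

// embeddedTail embeds the mutex, which promotes Lock and Unlock to the type.
type embeddedTail struct {
	reader *bufio.Reader
	sync.Mutex
}

// namedTail keeps the mutex as a named field that states what it protects.
type namedTail struct {
	reader     *bufio.Reader
	readerLock sync.Mutex // protects reader
}

// isLocker reports whether v has Lock and Unlock in its method set.
func isLocker(v interface{}) bool {
	_, ok := v.(sync.Locker)
	return ok
}
```

Here `isLocker(&embeddedTail{})` is true and `isLocker(&namedTail{})` is false, which is one way to see that embedding changes the type's public surface.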

@sjenning
Contributor Author

@timstclair ok, i incorporated your comments and introduced a readerState so that the locking can be cleaner. the real purpose is to avoid the race between a reader and the log file opening.

@k8s-bot
Collaborator

k8s-bot commented May 18, 2016

Jenkins GCE e2e

Build/test passed for commit 4790ea2.

glog.V(4).Infof("Log file %s moved/deleted", t.filename)
t.readerLock.Lock()
defer t.readerLock.Unlock()
t.readerState = readerStateOpening
Contributor

Did you mean to set reader to nil here as well? I don't think setting the state does anything otherwise.
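
A hedged sketch of what this comment is getting at: a state flag only helps if readers consult it, and clearing the reader at the same time guarantees a stale handle cannot be used. Only `readerStateOpening`, `readerLock`, and `readerState` appear in the diff; `readerStateOpened`, `errNotReady`, and the method names are assumptions for this example.

```go
package main

import (
	"bufio"
	"errors"
	"io"
	"strings"
	"sync"
)

type readerState int

const (
	readerStateOpening readerState = iota // name taken from the diff
	readerStateOpened                     // hypothetical counterpart
)

var errNotReady = errors.New("log file is being reopened")

type tail struct {
	readerLock  sync.Mutex // protects readerState and reader
	readerState readerState
	reader      *bufio.Reader
}

// markMoved records that the file went away and drops the stale reader.
func (t *tail) markMoved() {
	t.readerLock.Lock()
	defer t.readerLock.Unlock()
	t.readerState = readerStateOpening
	t.reader = nil
}

// attach installs a reader for the reopened file and marks it usable.
func (t *tail) attach(r io.Reader) {
	t.readerLock.Lock()
	defer t.readerLock.Unlock()
	t.reader = bufio.NewReader(r)
	t.readerState = readerStateOpened
}

// readLine refuses to read while the file is being reopened.
func (t *tail) readLine() (string, error) {
	t.readerLock.Lock()
	defer t.readerLock.Unlock()
	if t.readerState != readerStateOpened || t.reader == nil {
		return "", errNotReady
	}
	return t.reader.ReadString('\n')
}

// demo walks through one rotation and returns what a caller would observe:
// the line read before, the line read after, and the error seen in between.
func demo() (string, string, error) {
	t := &tail{}
	t.attach(strings.NewReader("before rotation\n"))
	before, _ := t.readLine()
	t.markMoved()
	_, during := t.readLine()
	t.attach(strings.NewReader("after rotation\n"))
	after, _ := t.readLine()
	return before, after, during
}
```

A caller that receives `errNotReady` can retry after a short delay; a blocking variant could wait on a `sync.Cond` instead, at the cost of more locking code.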

@timstclair
Contributor

Thanks, this looks much better. Just a couple small things, then LGTM.

@timstclair
Contributor

Taking this as-is, will follow up with a couple fixes. Thanks!

@timstclair timstclair merged commit 381f24b into google:master May 18, 2016
timstclair pushed a commit to timstclair/cadvisor that referenced this pull request May 18, 2016
timstclair pushed a commit that referenced this pull request May 18, 2016
@sjenning sjenning deleted the log-rotate-support branch June 13, 2016 23:43
5 participants