Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-open the log file by filesystem notify #43359

Merged
merged 17 commits into from
Jul 18, 2024
Merged

Re-open the log file by filesystem notify #43359

merged 17 commits into from
Jul 18, 2024

Conversation

marcoandredinis
Copy link
Contributor

@marcoandredinis marcoandredinis commented Jun 21, 2024

This PR implements a handler for logfile re-open by subscription on the filesystem notifications, once file renamed (which usually happen with logrotate default configuration), log file must be reopened. Added configuration flag to enable or disable the watcher, might be useful if graceful restart should be used instead.

changelog: Re-open log file by filesystem notify rename/remove (useful for logrotate).

Related:
#6589
#9468
https://github.com/gravitational/customer-sensitive-requests/issues/155

Co-authored-by: @vapopov

This PR implements an handler for the CONT signal.
The handler will close and re-open the log file.
If the log destination is not a file, this is a no-op.

Useful for logrotate postrotate hook.
Copy link

🤖 Vercel preview here: https://docs-rjalvj2mc-goteleport.vercel.app/docs/ver/preview

Copy link

🤖 Vercel preview here: https://docs-fvl6lzscr-goteleport.vercel.app/docs/ver/preview

@vapopov
Copy link
Contributor

vapopov commented Jun 24, 2024

@marcoandredinis I'm going to change the logic a bit, to test it within SIGHUP signal instead, it might be useful in case we have some active sessions that produce the logs and in the case with graceful restart, we should guarantee that the logs from still active sessions not going to split into two log files, so process keep writing in the same file until we can restart

@vapopov
Copy link
Contributor

vapopov commented Jun 24, 2024

@marcoandredinis I've checked the implementation with SIGHUP, SIGCONT you cover both cases so everything looks good

@espadolini
Copy link
Contributor

Does that imply that any logrotate hook has to send SIGCONT to all the teleport processes that might've spawned by HUP?

Is that true of all the exec and port forward processes too?

Copy link
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vapopov @marcoandredinis @espadolini I just had a thought regarding this - instead of hogging another signal here (which are a scarce resource for us), what do you think about adding a new "post logrotate" command to the new debug service that @gabrielcorado recently implemented?

It is enabled by default and already has the logic for changing log level for a running process so having ability to reopen log files seems like a natural addition. We could add another API handler to it and a corresponding CLI command, say teleport debug post-logrotate that would be set as a post-rotate hook for logrotate.

WDYGT?

@espadolini
Copy link
Contributor

The debug service doesn't have a way to make all running Teleport processes reopen their log file, which is possible instead with a killall teleport (which isn't particularly great either, admittedly, but anyone relying on logrotate in this day and age is probably fine with the racy behavior).

@r0mant
Copy link
Collaborator

r0mant commented Jun 26, 2024

The debug service doesn't have a way to make all running Teleport processes reopen their log file, which is possible instead with a killall teleport (which isn't particularly great either, admittedly, but anyone relying on logrotate in this day and age is probably fine with the racy behavior).

That's fair. I'm slightly concerned about us just picking random signals for purposes they're not originally intended for meaning e.g. CONT being supposed to resume a suspended process (not saying we have or will have a use-case for it in teleport but still feels a bit off) but I don't see any other signals that would be more appropriate. Interestingly enough, it appears you can actually send payload with the signal using -q flag (which sends it using sigqueue rather than kill according to man page) but Go doesn't support it.

@marcoandredinis marcoandredinis marked this pull request as draft June 26, 2024 10:40
Copy link

🤖 Vercel preview here: https://docs-kkdhvly56-goteleport.vercel.app/docs/ver/preview

lib/utils/log/file_writer.go Outdated Show resolved Hide resolved
lib/utils/log/file_writer.go Show resolved Hide resolved
lib/utils/log/file_writer.go Outdated Show resolved Hide resolved
Comment on lines +78 to +79
s.lock.Lock()
defer s.lock.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a separate lock for the write synchronization and for the rest of the FileSharedWriter machinery?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we really need it, because we can't write when in goroutine reopen is triggered, same for closing the file. Only RunWatcherReopen kind of independent of the writing process, but we call it before file shared writer is assigned to the logger (it might be even moved to constructor)

Copy link
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm once you remove the extra knob and with a couple minor suggestions

docs/pages/reference/helm-reference/teleport-cluster.mdx Outdated Show resolved Hide resolved
lib/utils/log/file_writer.go Outdated Show resolved Hide resolved
lib/utils/log/file_writer.go Outdated Show resolved Hide resolved
if !ok {
return
}
slog.ErrorContext(context.Background(), "Error received on logger watcher", "error", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we pass the context to all slog functions from the caller?

Copy link
Contributor

@vapopov vapopov Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately in configuration process (where we init FileSharedWriter) we don't have any context, I wanted to use slog.Error, but it blacklisted in linter test. One way is add it in general configuration function

func Configure(clf *CommandLineFlags, cfg *servicecfg.Config, legacyAppFlags bool) error
func ApplyFileConfig(fc *FileConfig, cfg *servicecfg.Config) error
func applyLogConfig(loggerConfig Log, cfg *servicecfg.Config) error

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just meant you can pass it here from RunWatcherReopen, and RunWatcherReopen can in turn accept the context itself from its caller in applyLogConfig - there you can pass context.Background(), but then if we need to plumb a proper context here later if would be easier to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got you, it was initially like that, then I removed within watcher closing by context cancelation

@r0mant I've made watcher always enabled, and removed this config flag

Copy link

🤖 Vercel preview here: https://docs-55x5lzgjq-goteleport.vercel.app/docs/ver/preview

Copy link
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @vapopov I would only backport to v16, just to be safe.

@vapopov vapopov added this pull request to the merge queue Jul 18, 2024
Merged via the queue into master with commit 56587ff Jul 18, 2024
44 checks passed
@vapopov vapopov deleted the marco/logrotate branch July 18, 2024 21:57
@public-teleport-github-review-bot

@marcoandredinis See the table below for backport results.

Branch Result
branch/v16 Failed

vapopov added a commit that referenced this pull request Jul 18, 2024
* Re-open the log file on OS Signal trigger

This PR implements an handler for the CONT signal.
The handler will close and re-open the log file.
If the log destination is not a file, this is a no-op.

Useful for logrotate postrotate hook.

* signal replaced with the fsnotify

* code review changes:
- refactored to launch watcher with process context
- make file shared writer as singleton
- add test for logrotate case

* add locking for write, reopen

* - configuration for log section update
- test simplification

* code review changes:
- add closing for file shared writer
- make runWatcherFunc not exported

* linter fixes

* change default file mode for log file
simplify setting global watcher

* terminate watcher after override global one
added test for validating global override

* make log file name static for shared writer

* replace logic to use finalizer instead

* fix possible locking write while reopen is triggered

* wrap internal error, fix incorrect unlock

* drop configuration flag, make watcher always enabled

* tune context passing

---------

Co-authored-by: Vadym Popov <vadym.popov@goteleport.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants