-
Notifications
You must be signed in to change notification settings - Fork 14
[log] [metric] Log Setting API, More TRACE Logs, and Gatherer Support #201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…patibility Signed-off-by: hfuss <hayden.fuss@kaleido.io>
…for logging Signed-off-by: hfuss <hayden.fuss@kaleido.io>
…lementation Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
…amically via monitoring server; all dbsql logs to trace Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Signed-off-by: hfuss <hayden.fuss@kaleido.io>
EnriqueL8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @onelapahead - I agree with moving those logs to Trace, I've felt the same level of pain debugging logs and being polluted by that.
I have a few questions on the way the log level is set and some confusions there
| func (as *apiServer[T]) createMuxRouter(ctx context.Context) (*mux.Router, error) { | ||
| r := mux.NewRouter().UseEncodedPath() | ||
| hf := as.handlerFactory() | ||
| hf := as.handlerFactory(logrus.InfoLevel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So by default it's info level? Not sure I understand this one, I thought it would use the log level in the apiServer as object?
| if logLevel != "" { | ||
| ctx := log.WithLogFields(req.Context(), "new_level", logLevel) | ||
| log.L(ctx).Warn("changing log level", logLevel) | ||
| log.SetLevel(logLevel) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validate the value is one of the levels supported?
|
|
||
| // TODO allow for toggling formatting (json, text), sampling, etc. | ||
|
|
||
| return http.StatusAccepted, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no log level provided, shouldn't we return malformed?
| r := mux.NewRouter().UseEncodedPath() | ||
| hf := as.handlerFactory() // TODO separate factory for monitoring ?? | ||
| // This ensures logs aren't polluted with monitoring API requests such as metrics or probes | ||
| hf := as.handlerFactory(logrus.TraceLevel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to set info level? instead of trace?
| LogLevel *logrus.Level | ||
| DefaultRequestTimeout time.Duration | ||
| MaxTimeout time.Duration | ||
| DefaultFilterLimit uint64 | ||
| MaxFilterSkip uint64 | ||
| MaxFilterLimit uint64 | ||
| HandleYAML bool | ||
| PassthroughHeaders []string | ||
| AlwaysPaginate bool | ||
| SupportFieldRedaction bool | ||
| BasePath string | ||
| BasePathParams []*PathParam | ||
|
|
||
| logLevel logrus.Level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have two?
This is a slightly philosophical PR - but working on codebases using ff-common, I've found I'm often failing to find the log message I need, and often fighting to filter out all the noise of the logs I rarely need in production (but do find helpful in development). Adapted from #200 originally.
In short - I think we log too much at the
debuglevel, especially in these libraries which should mostly logtrace, and not enough at theinfolevel. I worry we depend too much ondebugto "see everything", and we can't guarantee users will actually havedebugon when they hit a new problem. We need to make sureinfohas "enough" (which is very hard to known) to debug know issues / bugs / edge cases, and compensate additionally with metrics and tracing. For unknown issues / networking / database-related issues - thats wheretracecan be enabled temporarily to help triage.Between human SREs, log aggregation systems, and agentic AI - there is a cost to every byte of waste in terms of storage, processing, and context. A "less is more" mindset is necessary. Especially as we consider adopting more modern, performant logging frameworks like https://github.com/uber-go/zap which encourage structured logging and sampling.
All that said - this PR proposes two significant changes:
ffapi,ffresty,fftls, anddbsqltotraceto avoid logging 10-100s of lines per API request (especially when TLS and databases are in play)PUTAPI on the monitoring server:tracelogs referred to above, w/o continually over logging.Additionally, exposes the
prometheus.Gatherwithin the metrics managers' Prometheus registry to allow for custom metrics exporting and filtering.