Contributor

@onelapahead onelapahead commented Nov 30, 2025

This is a slightly philosophical PR, but while working on codebases that use ff-common I often fail to find the log message I need, and I often fight to filter out the noise of logs I rarely need in production (though I do find them helpful in development). Originally adapted from #200.

In short - I think we log too much at the debug level, especially in these libraries, which should mostly log at trace, and not enough at the info level. I worry we depend too much on debug to "see everything", and we can't guarantee users will actually have debug on when they hit a new problem. We need to make sure info has "enough" (which is very hard to know) to debug known issues / bugs / edge cases, and compensate additionally with metrics and tracing. For unknown issues and networking or database-related issues, that's where trace can be enabled temporarily to help triage.

Between human SREs, log aggregation systems, and agentic AI, every wasted byte has a cost in storage, processing, and context. A "less is more" mindset is necessary, especially as we consider adopting more modern, performant logging frameworks like https://github.com/uber-go/zap, which encourage structured logging and sampling.

All that said - this PR proposes two significant changes:

  1. It demotes many internal logs in ffapi, ffresty, fftls, and dbsql to trace, to avoid logging tens to hundreds of lines per API request (especially when TLS and databases are in play).
  2. It allows the log level of a process to be configured dynamically using a new PUT API on the monitoring server:
    curl -X PUT "http://localhost:6000/logging?level=<info|trace|debug|...>"
    A user can then change the log level of a running process to see more detail when needed, especially the logs now demoted to trace as described above, without continually over-logging.

Additionally, it exposes the prometheus.Gather within the metrics managers' Prometheus registry to allow for custom metrics exporting and filtering.

…patibility

Signed-off-by: hfuss <hayden.fuss@kaleido.io>
…for logging

Signed-off-by: hfuss <hayden.fuss@kaleido.io>
…lementation

Signed-off-by: hfuss <hayden.fuss@kaleido.io>
…amically via monitoring server; all dbsql logs to trace

Signed-off-by: hfuss <hayden.fuss@kaleido.io>
@onelapahead onelapahead requested a review from a team as a code owner November 30, 2025 23:55
Contributor

@EnriqueL8 EnriqueL8 left a comment

Thanks @onelapahead - I agree with moving those logs to trace. I've felt the same pain when debugging with logs polluted by them.

I have a few questions about the way the log level is set, and a few points of confusion there.

 func (as *apiServer[T]) createMuxRouter(ctx context.Context) (*mux.Router, error) {
 	r := mux.NewRouter().UseEncodedPath()
-	hf := as.handlerFactory()
+	hf := as.handlerFactory(logrus.InfoLevel)

So by default it's info level? I'm not sure I understand this one; I thought it would use the log level on the apiServer object?

Comment on lines +404 to +408
if logLevel != "" {
ctx := log.WithLogFields(req.Context(), "new_level", logLevel)
log.L(ctx).Warn("changing log level", logLevel)
log.SetLevel(logLevel)
}

Should we validate that the value is one of the supported levels?
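
One possible shape for that check, as a sketch only: normalize the input and reject anything outside an allowlist before touching the logger. The `supported` set and the `validateLevel` helper are hypothetical and use only the standard library. Since the PR already uses logrus, its `logrus.ParseLevel`, which returns an error for unknown names, may be the more natural fit.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// supported is an illustrative allowlist; the real set depends on the
// logging library in use.
var supported = map[string]bool{
	"trace": true, "debug": true, "info": true, "warn": true, "error": true,
}

// validateLevel normalizes the requested level and rejects anything
// outside the allowlist, naming the accepted values in the error.
func validateLevel(raw string) (string, error) {
	level := strings.ToLower(strings.TrimSpace(raw))
	if !supported[level] {
		names := make([]string, 0, len(supported))
		for name := range supported {
			names = append(names, name)
		}
		sort.Strings(names) // stable error text regardless of map order
		return "", fmt.Errorf("unsupported log level %q (expected one of: %s)",
			raw, strings.Join(names, ", "))
	}
	return level, nil
}

func main() {
	for _, in := range []string{"DEBUG", "verbose"} {
		level, err := validateLevel(in)
		fmt.Printf("%q -> level=%q err=%v\n", in, level, err)
	}
}
```

Validating before calling the setter means a typo in the query string produces a clear error instead of a silently ignored or misapplied level.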


// TODO allow for toggling formatting (json, text), sampling, etc.

return http.StatusAccepted, nil

If no log level is provided, shouldn't we return a malformed-request error?

 	r := mux.NewRouter().UseEncodedPath()
-	hf := as.handlerFactory() // TODO separate factory for monitoring ??
+	// This ensures logs aren't polluted with monitoring API requests such as metrics or probes
+	hf := as.handlerFactory(logrus.TraceLevel)

Did you mean to set info level instead of trace?

Comment on lines 69 to 82
LogLevel *logrus.Level
DefaultRequestTimeout time.Duration
MaxTimeout time.Duration
DefaultFilterLimit uint64
MaxFilterSkip uint64
MaxFilterLimit uint64
HandleYAML bool
PassthroughHeaders []string
AlwaysPaginate bool
SupportFieldRedaction bool
BasePath string
BasePathParams []*PathParam

logLevel logrus.Level

Why do we have two (LogLevel and logLevel)?
