Self-hosted: Allow collector to receive logs via OpenTelemetry #544
Conversation
This is mainly intended to support Kubernetes, by using fluentbit with the opentelemetry output plugin to send pod logs to the collector. The logs received can optionally be filtered by pod name, or via label selectors. Note this requires the log output to be jsonlog (Postgres 15+), with optional K8s context added, or plain logs without any additional context.

An example fluentbit output configuration:

```
[OUTPUT]
    name  opentelemetry
    match kube.*postgres*
    host  127.0.0.1
    port  4318
```

With a corresponding collector configuration:

```
db_log_otel_server = 127.0.0.1:4318
```

When utilizing CloudNativePG, a filter like the following can be used to restrict collection to the primary's logs for a given cluster:

```
db_log_otel_k8s_labels = cnpg.io/cluster=cluster-example,cnpg.io/instanceRole=primary
```

I also created a draft of the docs: https://pganalyze-we-add-selfho-wbo8ie.herokuapp.com/docs/log-insights/setup/opentelemetry
```
pganalyze-collector-setup
/pganalyze-collector
/pganalyze-collector-helper
/pganalyze-collector-setup
```
This was causing the `contrib/helm/pganalyze-collector` dir to be ignored. Adding `/` ensures that only the root-level one is ignored.
Some comments, but looks good (with the caveat that I am not familiar with otel logs). Does this work with `--test` and `--test-logs`? If not, what's our plan there?
```go
			return true
		}
	} else {
		prefixedLogger.PrintWarning("Pod specification for OTel server not valid (need zero or one / separator): \"%s\", skipping log record\n", server.Config.LogOtelK8SPod)
```
Maybe we should put this in `config/read.go`? It seems silly to warn about this for every log line. I think we could also consider erroring out on this at startup, no?
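A minimal sketch of what that startup-time parsing might look like (the function name `parsePodSpec` and its placement are hypothetical, not the collector's actual API):

```go
package main

import (
	"fmt"
	"strings"
)

// parsePodSpec splits an optional "namespace/pod" specification into its
// parts once, at config-read time, so log handling never sees the raw
// string. A bare "pod" (no separator) leaves the namespace empty; more
// than one "/" is rejected up front instead of warning on every log line.
func parsePodSpec(spec string) (namespace string, pod string, err error) {
	if spec == "" {
		return "", "", nil
	}
	parts := strings.SplitN(spec, "/", 3)
	switch len(parts) {
	case 1:
		return "", parts[0], nil
	case 2:
		return parts[0], parts[1], nil
	default:
		return "", "", fmt.Errorf("pod specification not valid (need zero or one / separator): %q", spec)
	}
}

func main() {
	ns, pod, _ := parsePodSpec("default/cluster-example-1")
	fmt.Println(ns, pod) // default cluster-example-1
}
```

With this shape, an invalid specification becomes a startup error rather than a per-line warning.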
Maybe we should also `SplitN` this while parsing, and set values on Config that don't correspond to config settings directly (one for name, one for namespace)? I doubt it's a bottleneck here, but that might be easier to follow. Lines 102-114 could become:

```go
if server.Config.LogOtelK8SPod != "" {
	if server.Config.LogOtelK8SPodNamespace != "" && server.Config.LogOtelK8SPodNamespace != k8sNamespaceName {
		return true
	}
	if server.Config.LogOtelK8SPod != k8sPodName {
		return true
	}
}
```
Thanks for the idea! I refactored this, and also added the test for the label/selector matching part, as it was actually in the TODO of the original PR. Ideally we'd want to use the k8s code for this, but that would involve pulling in a lot of deps, so I decided not to do that, at least in this PR.
Did you push up those changes? I don't see them in this branch.
Wow, I swear I did update this (well, I did everything to prepare it, which is already pushed), but I don't see it either 🙃 Maybe I did in fact forget to update this part 🤦 Thanks for noticing! Update pushed 👍
```go
			return true
		}
	}
}
```
Similar to the above, I think it'd be better to parse this into a structured form when reading the config, and make it easier to work with for consumers.
```go
	if detailLine != nil {
		parsedLogStream <- state.ParsedLogStreamItem{Identifier: server.Config.Identifier, LogLine: *detailLine}
	}
} else if logger == "" && hasErrorSeverity {
```
Why are we requiring `error_severity`? It might be good to add a comment here or above.
Good question. This part was written by Lukas and I actually don't have good context. This is about the following part:

> Note this requires the log output to be jsonlog (Postgres 15+), with optional K8s context added, or plain logs without any additional context.

This covers the "plain logs without any additional context" case. I actually haven't tested that case. I think `error_severity` is a key that must always be present in jsonlog-format logs, so if that key exists, we can assume it's a Postgres log. @lfittl does that sound right?

https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-JSONLOG
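As an illustration of that assumption (a sketch only, not the collector's actual code): every Postgres jsonlog record carries an `error_severity` field, so its presence in a parsed JSON line is a reasonable heuristic that the record came from Postgres:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// looksLikePostgresJSONLog reports whether a raw log line parses as JSON
// and carries the error_severity key that Postgres jsonlog output always
// emits (see the jsonlog key table in the Postgres docs).
func looksLikePostgresJSONLog(line []byte) bool {
	var record map[string]json.RawMessage
	if err := json.Unmarshal(line, &record); err != nil {
		return false
	}
	_, ok := record["error_severity"]
	return ok
}

func main() {
	pg := []byte(`{"timestamp":"2024-01-01 00:00:00 UTC","error_severity":"LOG","message":"checkpoint starting"}`)
	other := []byte(`{"level":"info","msg":"not postgres"}`)
	fmt.Println(looksLikePostgresJSONLog(pg), looksLikePostgresJSONLog(other)) // true false
}
```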
(I updated the comment assuming that it's correct, but happy to update)
Thanks for the review, @msakrejda !
No, it does not work yet; the plan is to support it with a branch that I have on my local machine 😬 (aka planning to do that as a follow-up)
```go
var K8sSelectorRegexp = regexp.MustCompile(`\s*([^!=\s]+)\s*([!=]+)\s*([^\s]+)\s*`)

// CheckLabelSelectorMismatch checks if selectors do not match the given labels
func CheckLabelSelectorMismatch(labels map[string]string, selectors []string) bool {
```
Is this actually called anywhere (other than the tests)? I didn't see any call sites.

It might also be clearer to invert the condition here (i.e., `CheckLabelSelectorMatch` or `CheckLabelSelectorMatches` or maybe just `SelectorsMatchLabels`) and then negate it in the caller.

Separately, since we're dealing with multiple selectors and multiple labels, it might be good to include the full expected behavior (i.e., if I'm reading this correctly, that every selector must match its corresponding label, if any) in the comment.
> Is this actually called anywhere (other than the tests)? I didn't see any call sites.

No, `CheckLabelSelectorMismatch` is not called (though `K8sSelectorRegexp` is used, and that's the reason behind making this util). I'd like to use the k8s code for this and refactor, so that I can retire this whole util. As this was a good chunk, I decided to keep them together (it also makes it easier to write tests focusing on this).

> It might also be clearer to invert the condition here (i.e., `CheckLabelSelectorMatch` or `CheckLabelSelectorMatches` or maybe just `SelectorsMatchLabels`) and then negate it in the caller.

I thought about it, but the nice part of the current logic is that it can return early when something doesn't match (and the logic is a bit simpler). That said, given that neither the labels nor the selectors will be that long, maybe I don't need to worry too much about performance?

> Separately, since we're dealing with multiple selectors and multiple labels, it might be good to include the full expected behavior (i.e., if I'm reading this correctly, that every selector must match its corresponding label, if any) in the comment.

I'll point to the k8s label selector docs (that I linked in the other comment) 👍 FYI, the short answer to this is the following:

> In the case of multiple requirements, all must be satisfied so the comma separator acts as a logical AND (&&) operator.
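Under that AND semantics, equality-based selector matching can be sketched as follows (a simplification using the `K8sSelectorRegexp` pattern above; the real Kubernetes library also handles set-based selectors, which this does not):

```go
package main

import (
	"fmt"
	"regexp"
)

// K8sSelectorRegexp splits a selector like "app=web" or "tier!=db" into
// key, operator, and value.
var K8sSelectorRegexp = regexp.MustCompile(`\s*([^!=\s]+)\s*([!=]+)\s*([^\s]+)\s*`)

// selectorsMatchLabels returns true only if every selector is satisfied by
// the labels map: a logical AND across selectors, matching Kubernetes
// equality-based label selector semantics. An unparsable selector or an
// unknown operator never matches.
func selectorsMatchLabels(labels map[string]string, selectors []string) bool {
	for _, selector := range selectors {
		m := K8sSelectorRegexp.FindStringSubmatch(selector)
		if m == nil {
			return false
		}
		key, op, value := m[1], m[2], m[3]
		switch op {
		case "=", "==":
			if labels[key] != value {
				return false
			}
		case "!=":
			if labels[key] == value {
				return false
			}
		default:
			return false
		}
	}
	return true
}

func main() {
	labels := map[string]string{"cnpg.io/cluster": "cluster-example", "cnpg.io/instanceRole": "primary"}
	fmt.Println(selectorsMatchLabels(labels, []string{"cnpg.io/cluster=cluster-example", "cnpg.io/instanceRole=primary"})) // true
}
```

The inverted name makes the AND-across-selectors contract read naturally at the call site, at the cost of negating it where a mismatch check is wanted.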
Successor of #503
When the collector is deployed to Kubernetes using the Helm chart, set `create: true` under `service` in the values.yaml so that the service for the OTel server will be created.