Add PodMonitor #2566
8b782ad to 0fc55a4
Could you take a look, @brancz, @metalmatze?
First review. Thanks a ton for working on this!
Documentation/api.md
| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| jobLabel | The label to use to retrieve the job name from. | string | false |
| targetLabels | TargetLabels transfers labels on the Kubernetes Service onto the target. | []string | false |
This should just be what podTargetLabels is, no?
Yep. Will change
pkg/prometheus/promcfg_test.go
map[string]*monitoringv1.ServiceMonitor{
"testservicemonitor1": &monitoringv1.ServiceMonitor{
nil,
map[string]*monitoringv1.PodMonitor{
We still want the test that was here previously. Could you extract this into a separate test?
Sure
Documentation/api.md
| jobLabel | The label to use to retrieve the job name from. | string | false |
| targetLabels | TargetLabels transfers labels on the Kubernetes Service onto the target. | []string | false |
| podTargetLabels | PodTargetLabels transfers labels on the Kubernetes Pod onto the target. | []string | false |
| endpoints | A list of endpoints allowed as part of this PodMonitor. | [][Endpoint](#endpoint) | true |
This is a little difficult. At this point I'm not comfortable introducing everything the Endpoint struct offers in terms of configuration, for the same reasons #2524 is happening. I'd prefer a duplicate struct for this that leaves out the problematic fields the overall effort of #2524 aims to deprecate. I'd suggest that we just copy the current Endpoint struct and, for now, remove the TLSConfig and basic auth options.
Yes, agreed. We may also want to be careful with naming here, since endpoints is an existing term in the K8s world and is closely related to the concept of Services.
Maybe PodMetricsEndpoint?
PodMetricsEndpoint sounds good.
pkg/apis/monitoring/v1/types.go
// https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#metadata
// +k8s:openapi-gen=false
metav1.ObjectMeta `json:"metadata,omitempty"`
// Specification of desired Pod selection for target discrovery by Prometheus.
s/discrovery/discovery/
done
pkg/apis/monitoring/v1/types.go
type PodMonitorSpec struct {
// The label to use to retrieve the job name from.
JobLabel string `json:"jobLabel,omitempty"`
// TargetLabels transfers labels on the Kubernetes Service onto the target.
This should be agnostic of a Kubernetes Service. Do we mean Kubernetes Pod here instead?
yep. fixed.
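To make the Pod-versus-Service distinction concrete, here is a minimal sketch of how a list like podTargetLabels could be turned into relabeling rules. It assumes Prometheus Kubernetes service discovery with the pod role, which exposes pod labels as `__meta_kubernetes_pod_label_<name>` with invalid characters replaced by underscores. The `relabelRule` type and the function names are hypothetical and are not the operator's actual code.

```go
package main

import "regexp"

// relabelRule is a hypothetical, simplified model of one Prometheus
// relabel_config entry (action "replace" is implied).
type relabelRule struct {
	SourceLabels []string
	Regex        string
	TargetLabel  string
	Replacement  string
}

// invalidLabelChars matches every character that is not allowed in a
// Prometheus label name.
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// sanitizeLabelName replaces disallowed characters with underscores, which is
// how Kubernetes service discovery derives its __meta_* label names.
func sanitizeLabelName(name string) string {
	return invalidLabelChars.ReplaceAllString(name, "_")
}

// podTargetLabelRules returns one rule per listed pod label. Each rule copies
// __meta_kubernetes_pod_label_<name> onto the target as <name>; the "(.+)"
// regex means it only fires when the pod carries a non-empty value.
func podTargetLabelRules(podTargetLabels []string) []relabelRule {
	rules := make([]relabelRule, 0, len(podTargetLabels))
	for _, l := range podTargetLabels {
		name := sanitizeLabelName(l)
		rules = append(rules, relabelRule{
			SourceLabels: []string{"__meta_kubernetes_pod_label_" + name},
			Regex:        "(.+)",
			TargetLabel:  name,
			Replacement:  "${1}",
		})
	}
	return rules
}
```

For example, `podTargetLabelRules([]string{"app.kubernetes.io/name"})` would copy the pod label onto the target as `app_kubernetes_io_name`. Nothing here refers to a Service, which is the point of the review comment.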
pkg/prometheus/operator.go
}

podMonitors := []string{}
for k := range res {
Instead of looping again here, we can add the keys to the slice on line 1533, when we are already looping.
✅
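The suggestion is to record each key while the selection loop is already running, rather than ranging over `res` a second time. The original loop is not shown here, so the sketch below is a hypothetical stand-in: `podMonitor` and `selectPodMonitors` are invented names, and the final sort is an added assumption, since Go map iteration order is random and callers may want stable output.

```go
package main

import "sort"

// podMonitor is a hypothetical stand-in for *monitoringv1.PodMonitor; only
// the fields needed for this illustration are modelled.
type podMonitor struct {
	Namespace string
	Name      string
}

// selectPodMonitors filters candidates and, in the same pass, records the
// keys of the selected monitors, so no second loop over the result is needed.
func selectPodMonitors(candidates map[string]podMonitor, keep func(podMonitor) bool) (map[string]podMonitor, []string) {
	res := make(map[string]podMonitor, len(candidates))
	keys := make([]string, 0, len(candidates))
	for k, pm := range candidates {
		if !keep(pm) {
			continue
		}
		res[k] = pm
		// Append the key while we are already iterating.
		keys = append(keys, k)
	}
	// Map iteration order is random in Go; sort for deterministic output.
	sort.Strings(keys)
	return res, keys
}
```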
pkg/prometheus/promcfg.go
if version.Major == 1 && version.Minor < 7 {
// Filter targets based on the namespace selection configuration.
// By default we only discover services within the namespace of the
// ServiceMonitor.
s/ServiceMonitor/PodMonitor/g
✅
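The comment under review describes a default: unless configured otherwise, only the monitor's own namespace is searched. A minimal sketch of that defaulting rule follows. It assumes a selector shaped like an "any namespace" switch plus an explicit name list; `namespaceSelector` and `discoveryNamespaces` are hypothetical local names, and the sketch does not reproduce the version-specific filtering in the quoted code.

```go
package main

// namespaceSelector is a hypothetical, simplified selector: either match
// every namespace, or match an explicit list of names.
type namespaceSelector struct {
	Any        bool
	MatchNames []string
}

// discoveryNamespaces returns the namespaces to discover pods in for a
// PodMonitor that lives in ownNamespace. A nil result means "all namespaces".
func discoveryNamespaces(ownNamespace string, sel namespaceSelector) []string {
	if sel.Any {
		return nil
	}
	if len(sel.MatchNames) == 0 {
		// Default: only discover pods within the PodMonitor's own namespace.
		return []string{ownNamespace}
	}
	return sel.MatchNames
}
```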
pkg/prometheus/promcfg.go

// By default, generate a safe job name from the pod name. We also keep
// this around if a jobLabel is set in case the targets don't actually have a
// value for it. A single service may potentially have multiple metrics
s/service/pod/
✅
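The quoted comment describes a fallback: a default job name is always set, and a jobLabel only overrides it when the target actually has a value for that label. One way to express this with ordered relabeling rules is sketched below. The `relabelRule` type and `jobLabelRules` function are hypothetical, and the default job name is passed in as a parameter rather than guessing how the real code derives it.

```go
package main

import "regexp"

// relabelRule is a hypothetical, simplified model of one Prometheus
// relabel_config entry (action "replace" is implied).
type relabelRule struct {
	SourceLabels []string
	Regex        string
	TargetLabel  string
	Replacement  string
}

// invalidLabelChars matches characters not allowed in Prometheus label names.
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// jobLabelRules sets the job label in two steps. The first rule always writes
// defaultJob. The second, emitted only when jobLabel is set, overwrites it
// with the pod's jobLabel value; because it uses the regex "(.+)", targets
// without a value for that label keep the default job name.
func jobLabelRules(defaultJob, jobLabel string) []relabelRule {
	rules := []relabelRule{{
		// No source labels: the empty input matches "(.*)", so the
		// replacement is written unconditionally.
		Regex:       "(.*)",
		TargetLabel: "job",
		Replacement: defaultJob,
	}}
	if jobLabel != "" {
		name := invalidLabelChars.ReplaceAllString(jobLabel, "_")
		rules = append(rules, relabelRule{
			SourceLabels: []string{"__meta_kubernetes_pod_label_" + name},
			Regex:        "(.+)",
			TargetLabel:  "job",
			Replacement:  "${1}",
		})
	}
	return rules
}
```

Rule order matters: Prometheus applies relabeling rules in sequence, so the conditional override has to come after the unconditional default.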
@@ -583,6 +581,41 @@ type PodMonitorSpec struct {
SampleLimit uint64 `json:"sampleLimit,omitempty"`
}

// PodMetricsEndpoint defines a scrapeable endpoint of a Kubernetes Pod serving Prometheus metrics.
// +k8s:openapi-gen=true
type PodMetricsEndpoint struct {
The PodMetricsEndpoint for a PodMonitor copies the Endpoint struct used in a ServiceMonitor. This disentangles the two. OK with this, @brancz?
This is looking pretty good already!
pkg/apis/monitoring/v1/types.go
// Timeout after which the scrape is ended
ScrapeTimeout string `json:"scrapeTimeout,omitempty"`
// TLS configuration to use when scraping the endpoint
TLSConfig *TLSConfig `json:"tlsConfig,omitempty"`
I'd prefer to leave this out for now, due to the previously mentioned reasons.
right. removed
pkg/apis/monitoring/v1/types.go
// TLS configuration to use when scraping the endpoint
TLSConfig *TLSConfig `json:"tlsConfig,omitempty"`
// File to read bearer token for scraping targets.
BearerTokenFile string `json:"bearerTokenFile,omitempty"`
Same as TLSConfig
removed
pkg/apis/monitoring/v1/types.go
HonorLabels bool `json:"honorLabels,omitempty"`
// BasicAuth allow an endpoint to authenticate over basic authentication
// More info: https://prometheus.io/docs/operating/configuration/#endpoints
BasicAuth *BasicAuth `json:"basicAuth,omitempty"`
Same as TLSConfig and BearerTokenFile
removed
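With TLSConfig, BearerTokenFile and BasicAuth removed, the endpoint type for PodMonitors is a strict subset of the ServiceMonitor one. The sketch below illustrates that trimmed shape. Its field list is an approximation for illustration (only ScrapeTimeout and HonorLabels appear in the hunks above), and `podMetricsEndpoint` and `decodeStrict` are hypothetical names. Strict JSON decoding is used only to show that a removed field such as tlsConfig no longer has anywhere to go; it is not a claim about how the operator or the API server validates objects.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// podMetricsEndpoint is an illustrative, trimmed-down copy of a
// ServiceMonitor-style endpoint. The field set is an approximation, and
// TLSConfig, BearerTokenFile and BasicAuth are deliberately absent.
type podMetricsEndpoint struct {
	Port          string `json:"port,omitempty"`
	Path          string `json:"path,omitempty"`
	Scheme        string `json:"scheme,omitempty"`
	Interval      string `json:"interval,omitempty"`
	ScrapeTimeout string `json:"scrapeTimeout,omitempty"`
	HonorLabels   bool   `json:"honorLabels,omitempty"`
}

// decodeStrict rejects any field not declared on podMetricsEndpoint, so an
// endpoint that still sets e.g. tlsConfig fails instead of being ignored.
func decodeStrict(data []byte) (podMetricsEndpoint, error) {
	var ep podMetricsEndpoint
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&ep)
	return ep, err
}
```

Keeping a separate type, rather than embedding the ServiceMonitor endpoint, means the deprecation work in #2524 can change one without silently changing the other.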
Mh. Not sure why CI runs into a timeout: https://travis-ci.org/coreos/prometheus-operator/jobs/526384616.
Interesting. First time I'm seeing this, and it doesn't look like master has this problem. I think it might have to do with this PR. Have you tried running the latest state on a new cluster?
Strange: it times out for all commits, but at different tests.
I don't think it matters at which test it times out. Given that it takes 50 min and we have a limit of 55 min for those tests, I suspect that something deadlocks and never finishes.
I'm convinced. Must be related. I'll look into it.
I cannot see what's causing the timeout in the e2e test. Would appreciate any hint @brancz @metalmatze @squat 🙏
This needs to be regenerated. Please run
Thanks @paulfantom.
Done @paulfantom. All tests are green 🎉
Rebased and fixed the e2e test.
Sorry about the conflict. Once rebased and green, this lgtm 👍 Awesome work! Thank you so much!
{Key: "target_label", Value: "namespace"},
},
{
{Key: "source_labels", Value: []string{"__meta_kubernetes_pod_container_name"}},
side note: we should add this in the ServiceMonitor as well 👍
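The hunk builds relabel rules as ordered key/value pairs (`{Key: ..., Value: ...}`), which look like ordered YAML map items such as `yaml.MapItem` from gopkg.in/yaml.v2. The self-contained sketch below defines a local `mapItem` instead of importing that package, builds a rule that copies the discovered container name onto the target, and renders it to show the key order is preserved. The target label name `container` is an assumption, since the hunk is cut off before it; `containerRelabel` and `render` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// mapItem imitates one ordered key/value pair of a YAML mapping. It is
// defined locally so the sketch has no external dependencies.
type mapItem struct {
	Key   string
	Value interface{}
}

// containerRelabel copies the discovered container name onto the target.
// The target label name "container" is an assumption for illustration.
func containerRelabel() []mapItem {
	return []mapItem{
		{Key: "source_labels", Value: []string{"__meta_kubernetes_pod_container_name"}},
		{Key: "target_label", Value: "container"},
	}
}

// render writes the items in order as a flat YAML-like mapping.
func render(items []mapItem) string {
	var b strings.Builder
	for _, it := range items {
		switch v := it.Value.(type) {
		case []string:
			fmt.Fprintf(&b, "%s: [%s]\n", it.Key, strings.Join(v, ", "))
		default:
			fmt.Fprintf(&b, "%s: %v\n", it.Key, v)
		}
	}
	return b.String()
}
```

Because the same meta label is available for the endpoints role of Kubernetes service discovery, an equivalent rule could be reused for ServiceMonitors, as the side note suggests.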
Signed-off-by: Arno Uhlig <arno.uhlig@sap.com>
c120c31 to beda9d9
Rebased and resolved conflicts. LGTY @brancz?
LGTM 👍 Probably worth one more review, as it's such a large one.
Looked over the non-generated code.
LGTM from my side.
I'd suggest to just move forward with this and then do follow up PRs!
This is super exciting! 🎉
lgtm, thanks a lot! 🎉
Thanks @brancz @metalmatze @s-urbaniak @paulfantom @squat 🎉 Let me know if anything related needs adjustment. Happy to fix.
As described in #38.
Still a work in progress. To be continued after the Easter holidays.