
Add PodMonitor #2566

Merged
merged 10 commits into from Jun 11, 2019

Conversation

@auhlig (Contributor) commented Apr 20, 2019

As described in #38.
Still work in progress. To be continued after the Easter holidays.

@auhlig force-pushed the podmonitor branch 8 times, most recently from 8b782ad to 0fc55a4 on April 29, 2019
@auhlig changed the title **WIP** Add PodMonitor → Add PodMonitor on Apr 29, 2019
@auhlig commented Apr 29, 2019

Could you take a look @brancz, @metalmatze?

@brancz left a comment:

First review. Thanks a ton for working on this!

cc @mxinden @metalmatze

| Field | Description | Scheme | Required |
| ----- | ----------- | ------ | -------- |
| jobLabel | The label to use to retrieve the job name from. | string | false |
| targetLabels | TargetLabels transfers labels on the Kubernetes Service onto the target. | []string | false |
Contributor:

this should just be what podTargetLabels is no?

Contributor Author:

Yep. Will change

map[string]*monitoringv1.ServiceMonitor{
"testservicemonitor1": &monitoringv1.ServiceMonitor{
nil,
map[string]*monitoringv1.PodMonitor{
Contributor:

We still want the test that was here previously. Could you extract this into a separate test?

Contributor Author:

Sure

| jobLabel | The label to use to retrieve the job name from. | string | false |
| targetLabels | TargetLabels transfers labels on the Kubernetes Service onto the target. | []string | false |
| podTargetLabels | PodTargetLabels transfers labels on the Kubernetes Pod onto the target. | []string | false |
| endpoints | A list of endpoints allowed as part of this PodMonitor. | [][Endpoint](#endpoint) | true |
Contributor:

This is a little difficult, at this point I'm not comfortable introducing everything the Endpoint struct offers in terms of configuration, for the same reasons why #2524 is happening. I'd prefer to have a duplicate struct for this, that does not have the problematic fields that are the goal to be deprecated as part of the overall effort of #2524. I'd suggest that we just copy the current Endpoint struct and for now remove the TLSConfig and basic auth options.

Contributor:

Yes agreed. We may also want to be cautious here in future naming as endpoints is an existing term in the K8s world and is closely related to the concept of services.

Contributor:

Maybe PodMetricsEndpoint ?

Contributor Author:

PodMetricsEndpoint sounds good

// https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#metadata
// +k8s:openapi-gen=false
metav1.ObjectMeta `json:"metadata,omitempty"`
// Specification of desired Pod selection for target discrovery by Prometheus.
Contributor:

s/discrovery/discovery/

Contributor Author:

done

type PodMonitorSpec struct {
// The label to use to retrieve the job name from.
JobLabel string `json:"jobLabel,omitempty"`
// TargetLabels transfers labels on the Kubernetes Service onto the target.
Contributor:

This should be agnostic of a Kubernetes Service. Do we mean Kubernetes Pod here instead?

Contributor Author:

yep. fixed.

}

podMonitors := []string{}
for k := range res {
Contributor:

Instead of looping again here, we can add the keys to the slice on line 1533, when we are already looping.
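The single-pass collection the reviewer suggests can be sketched as follows. The `PodMonitor` type and `buildConfigs` function here are hypothetical stand-ins for the operator's config-generation code, used only to illustrate collecting map keys in the loop that already exists rather than iterating twice:

```go
package main

import "fmt"

// PodMonitor is a minimal stand-in for monitoringv1.PodMonitor,
// for illustration only.
type PodMonitor struct{ Name string }

// buildConfigs collects the map keys in the same loop that already
// processes each entry, instead of looping over the map a second time.
func buildConfigs(res map[string]*PodMonitor) []string {
	podMonitors := make([]string, 0, len(res))
	for k := range res {
		// ... generate the scrape config for res[k] here ...
		podMonitors = append(podMonitors, k) // collected in the existing loop
	}
	return podMonitors
}

func main() {
	res := map[string]*PodMonitor{"pm1": {Name: "pm1"}, "pm2": {Name: "pm2"}}
	fmt.Println(len(buildConfigs(res))) // prints 2
}
```

Pre-sizing the slice with `make([]string, 0, len(res))` also avoids reallocations, since the final length is known up front.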

Contributor Author:

if version.Major == 1 && version.Minor < 7 {
// Filter targets based on the namespace selection configuration.
// By default we only discover services within the namespace of the
// ServiceMonitor.
Contributor:

s/ServiceMonitor/PodMonitor/g

Contributor Author:


// By default, generate a safe job name from the pod name. We also keep
// this around if a jobLabel is set in case the targets don't actually have a
// value for it. A single service may potentially have multiple metrics
Contributor:

s/service/pod/

Contributor Author:

@@ -583,6 +581,41 @@ type PodMonitorSpec struct {
SampleLimit uint64 `json:"sampleLimit,omitempty"`
}

// PodMetricsEndpoint defines a scrapeable endpoint of a Kubernetes Pod serving Prometheus metrics.
// +k8s:openapi-gen=true
type PodMetricsEndpoint struct {
@auhlig (Contributor Author), Apr 30, 2019:

The PodMetricsEndpoint for a PodMonitor copies the Endpoint struct used in a ServiceMonitor. This is used to disentangle both. Ok with this @brancz?

@brancz left a comment:

This is looking pretty good already!

// Timeout after which the scrape is ended
ScrapeTimeout string `json:"scrapeTimeout,omitempty"`
// TLS configuration to use when scraping the endpoint
TLSConfig *TLSConfig `json:"tlsConfig,omitempty"`
Contributor:

I'd prefer to leave this out for now, due to the previously mentioned reasons.

Contributor Author:

right. removed

// TLS configuration to use when scraping the endpoint
TLSConfig *TLSConfig `json:"tlsConfig,omitempty"`
// File to read bearer token for scraping targets.
BearerTokenFile string `json:"bearerTokenFile,omitempty"`
Contributor:

Same as TLSConfig

Contributor Author:

removed

HonorLabels bool `json:"honorLabels,omitempty"`
// BasicAuth allow an endpoint to authenticate over basic authentication
// More info: https://prometheus.io/docs/operating/configuration/#endpoints
BasicAuth *BasicAuth `json:"basicAuth,omitempty"`
Contributor:

Same as TLSConfig and BearerTokenFile

Contributor Author:

removed

@auhlig commented Apr 30, 2019

Mh. Not sure why the CI runs into a timeout: https://travis-ci.org/coreos/prometheus-operator/jobs/526384616.
Unrelated?

@brancz commented Apr 30, 2019

Interesting. First time I'm seeing this, and it doesn't look like master is having this problem. I think it might have to do with this PR. Have you tried running the latest state on a new cluster?

@auhlig commented Apr 30, 2019

Strange: it times out for all commits, but at different tests.
I tested the initial state in a cluster. Let me do that for the latest one as well.

@metalmatze (Member) commented:

I don't think it matters which tests time out. Given that the run needs 50 min and we have a 55 min limit for those tests, I suppose something deadlocks and never finishes.
I would guess it's a test in this PR, so it's worth trying those tests in isolation.

@auhlig commented Apr 30, 2019

I'm convinced. It must be related. I'll look into it.
Thanks for the feedback so far.

@auhlig commented May 10, 2019

I cannot see what's causing the timeout in the e2e test. Would appreciate any hint @brancz @metalmatze @squat 🙏

@paulfantom (Member) commented:

This needs to be regenerated. Please run make generate-in-docker and commit generated files from example/ directory.

@auhlig commented May 21, 2019

Thanks @paulfantom.
I fixed the e2e test and rebased. Noticed that I have to re-generate but was interrupted by a meeting.
Will do in the evening, then this PR should be ready for a final review.

@auhlig commented May 22, 2019

Done @paulfantom. All tests are green 🎉
Could you take a look again?
/cc @brancz @metalmatze @squat

@auhlig commented Jun 6, 2019

rebased and fixed e2e test

@auhlig commented Jun 6, 2019

If you're okay with 788a6b0, I'd rebase & resolve the conflicts again @brancz

@brancz left a comment:

Sorry about the conflict. Once rebased and green, this lgtm 👍 Awesome work! Thank you so much!

{Key: "target_label", Value: "namespace"},
},
{
{Key: "source_labels", Value: []string{"__meta_kubernetes_pod_container_name"}},
Contributor:

side note: we should add this in the ServiceMonitor as well 👍
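The relabeling asserted in the test fragment above can be sketched as follows. `MapItem` is a minimal stand-in for the key/value pairs the operator's config generation uses, and the `container` target label is an assumption for illustration, not taken from the diff:

```go
package main

import "fmt"

// MapItem is a minimal stand-in for the ordered key/value pairs
// (yaml.MapSlice-style) used in the operator's generated config tests.
type MapItem struct {
	Key   string
	Value interface{}
}

// containerRelabel sketches a relabeling rule that copies the pod container
// name from Kubernetes service-discovery metadata onto a target label.
// The "container" target label here is an illustrative assumption.
func containerRelabel() []MapItem {
	return []MapItem{
		{Key: "source_labels", Value: []string{"__meta_kubernetes_pod_container_name"}},
		{Key: "target_label", Value: "container"},
	}
}

func main() {
	for _, item := range containerRelabel() {
		fmt.Println(item.Key, item.Value)
	}
}
```

Ordered pairs matter here because Prometheus relabel configs are emitted as YAML mappings whose field order the operator's tests assert literally.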

Signed-off-by: Arno Uhlig <arno.uhlig@sap.com>
@auhlig commented Jun 11, 2019

Rebased and resolved conflicts. LGTY @brancz?

@brancz commented Jun 11, 2019

lgtm 👍 probably worth one more review as it's such a large one

@s-urbaniak @metalmatze @squat @paulfantom ?

@metalmatze (Member) left a comment:

Looked over the non-generated code.
LGTM from my side.

I'd suggest to just move forward with this and then do follow up PRs!

This is super exciting! 🎉

@s-urbaniak left a comment:

lgtm, thanks a lot! 🎉

@s-urbaniak s-urbaniak merged commit ca400fd into prometheus-operator:master Jun 11, 2019
@auhlig commented Jun 11, 2019

Thanks @brancz @metalmatze @s-urbaniak @paulfantom @squat 🎉

Let me know if anything related needs adjustment. Happy to fix.
