Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi list watch #969

Merged
merged 1 commit into from
Dec 4, 2019
Merged

Conversation

brancz
Copy link
Member

@brancz brancz commented Nov 13, 2019

What this PR does / why we need it:

Previously kube-state-metrics created multiple ListerWatchers, with
multiple reflectors for a single store. This caused that on "replace"
events, the entire store was replaced with the content of a single
reflector instead of the union of all reflectors acting on a single
store.

This patch changes the architecture to have a single reflector that is
fed by a MultiListerWatcher acting on a single store, which as a result
fixes the replace actions.

This needs a little bit more testing on my side, but I wanted to open this already to give others the chance to validate as well.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #942

@lilic @tariq1890

@paxbit as the original reporter, could you try this out on your setup?

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 13, 2019
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2019
@brancz
Copy link
Member Author

brancz commented Nov 13, 2019

My apologies, somehow I recall this working yesterday, but now I'm getting mod errors that look the same as in #949.

@paxbit
Copy link

paxbit commented Nov 14, 2019

@brancz Ok, thanks! I'll try as soon as you're happy regarding #949.
Also atm. I seem unable to build ksm from source using make container as it fails with:
go: k8s.io/autoscaler/vertical-pod-autoscaler@v0.0.0-20191114093134-0e82aa045fa1 requires k8s.io/api@v0.0.0: reading k8s.io/api/go.mod at revision v0.0.0: unknown revision v0.0.0
I'm not much of a go guy and cannot spend time digging into whats causing this. Do you have any suggestions?

@tariq1890
Copy link
Contributor

@brancz Can you rebase this?

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 15, 2019
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 15, 2019
@brancz
Copy link
Member Author

brancz commented Nov 15, 2019

Rebased, and now everything should build fine @paxbit @tariq1890 .

@brancz
Copy link
Member Author

brancz commented Nov 15, 2019

Putting on hold until we verify that it works as expected.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 15, 2019
go.mod Outdated
@@ -3,7 +3,9 @@ module k8s.io/kube-state-metrics
require (
github.com/brancz/gojsontoyaml v0.0.0-20190425155809-e8bd32d46b3d
github.com/campoy/embedmd v1.0.0
github.com/coreos/prometheus-operator v0.34.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rather heavy dependency. I'd actually prefer either of the two approaches below:

i) Move the MultiListWatcher to its own project/module.
OR
ii) We could copy the implementation code over to this project. If we choose this approach, we can tweak it a bit to use klog instead of a NoOp logger implementation of a foreign (in this project's context) log project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preference for approach 2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i) I don't think we would move this into its own project from the prometheus-operator as we in prometheus-operator would find it hard to maintain that and bump deps there etc.

ii) so someone, I think istio did this :D with this exact pkg, they copied it over to their project a while ago, but never kept up with our bug fixes and they ended up with a huge bug we fixed. What I mean is, we would need to maintain both pkgs then.

While I agree being dependent on prom-operator is a huge thing, I would like to see how much code we would have to maintain in that pkg in here? @brancz how much work do you think this would be to move it to ksm?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s already a well encapsulated package there so copying it over is trivial.

go.mod Outdated
github.com/dgryski/go-jump v0.0.0-20170409065014-e1f439676b57
github.com/go-kit/kit v0.9.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend not importing this.

go.mod Outdated
k8s.io/klog v1.0.0
)

replace (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we don't need these anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that we can remove these. Once you add the revision hashes I'd mentioned earlier, run go mod tidy and go mod vendor, it should work.

go reflector.Run(b.ctx.Done())
}
lwf := func(ns string) cache.ListerWatcher { return listWatchFunc(b.kubeClient, ns) }
lw := listwatch.MultiNamespaceListerWatcher(log.NewNopLogger(), b.namespaces, nil, lwf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too sure about injecting a Noop logger. There definitely is value in logging the ops here. This goes back to my first comment.

@paxbit
Copy link

paxbit commented Nov 19, 2019

@brancz @lilic Built it today and it is running for 20m without showing the old behavior. Will observe it over time today till some time after lunch. If it's still ok at 4pm CET today I'll post a LGTM later.

@lilic
Copy link
Member

lilic commented Nov 19, 2019

Thanks @paxbit for testing!

@paxbit
Copy link

paxbit commented Nov 19, 2019

@lilic @brancz Still getting coherent results after ~3.5h runtime. LGTM. Thanks :) Appreciate the support!

@brancz brancz force-pushed the multi-list-watch branch 2 times, most recently from 8c9a396 to c617c36 Compare November 19, 2019 23:23
@@ -0,0 +1,269 @@
// This was originally copied from the Prometheus Operator project. Due to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a license breach if we just add this as 2019 Copyright of Kube-state-metrics? Considering my earlier review comments, we would be modifying the signature of ListWatch method to avoid the go-kit dependency and other changes should trickle down.

Thoughts? cc @lilic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep this, or like suggested above just vendor this in. We want this in ASAP to fix the bug, so I think just merging it ASAP and fixing things in a future release would be best, wdyt @brancz @tariq1890

@brancz brancz force-pushed the multi-list-watch branch 4 times, most recently from 0fd9275 to 9b34f39 Compare December 4, 2019 09:52
@brancz
Copy link
Member Author

brancz commented Dec 4, 2019

Fixed all comments I believe. @lilic @tariq1890 can you give this a re-review?

@tariq1890
Copy link
Contributor

@brancz I pulled down your branch and confirmed that you don't need the replace directives. Providing my last set of review comments.

go.mod Outdated
@@ -15,8 +16,17 @@ require (
k8s.io/api v0.0.0-20191112020540-7f9008e52f64
k8s.io/apimachinery v0.0.0-20191111054156-6eb29fdf75dc
k8s.io/autoscaler/vertical-pod-autoscaler v0.0.0-20191115143342-4cf961056038
k8s.io/client-go v0.0.0-20191109102209-3c0d1af94be5
k8s.io/client-go v12.0.0+incompatible
Copy link
Contributor

@tariq1890 tariq1890 Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
k8s.io/client-go v12.0.0+incompatible
k8s.io/client-go v0.0.0-20191114101535-6c5935290e33

go.mod Outdated
@@ -15,8 +16,17 @@ require (
k8s.io/api v0.0.0-20191112020540-7f9008e52f64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
k8s.io/api v0.0.0-20191112020540-7f9008e52f64
k8s.io/api v0.0.0-20191114100352-16d7abae0d2a

Previously kube-state-metrics created multiple ListerWatchers, with
multiple reflectors for a single store. This caused that on "replace"
events, the entire store was replaced with the content of a single
reflector instead of the union of all reflectors acting on a single
store.

This patch changes the architecture to have a single reflector that is
fed by a MultiListerWatcher acting on a single store, which as a result
fixes the replace actions.
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 4, 2019
@brancz
Copy link
Member Author

brancz commented Dec 4, 2019

Reverted the mod updates. Please take a look.

Copy link
Contributor

@tariq1890 tariq1890 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 4, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brancz, tariq1890

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@brancz
Copy link
Member Author

brancz commented Dec 4, 2019

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 4, 2019
@k8s-ci-robot k8s-ci-robot merged commit a39d9ee into kubernetes:master Dec 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Observing More Than One Ns Leads to Inconsistent Metrics Being Exported
5 participants