k8s Discoverer never stops sending updates even when there are no changes. #4518

Closed
krasi-georgiev opened this issue Aug 20, 2018 · 2 comments

Comments

@krasi-georgiev
Contributor

As part of troubleshooting #4124 I figured out that the k8s provider never stops sending updates.

If you put some printf's in, you will see that it keeps sending updates even when there are no changes in the k8s cluster:

for {
    select {
    case <-ctx.Done():
        return
    case tgs, ok := <-updates:
        // Handle the case that a target provider exits and closes the channel
        // before the context is done.
        if !ok {
            return
        }
        m.updateGroup(poolKey, tgs)
        m.recentlyUpdatedMtx.Lock()
        m.recentlyUpdated = true
        m.recentlyUpdatedMtx.Unlock()
    }
}
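For example, a debug print along these lines (placement purely illustrative) keeps firing with the k8s provider even when the cluster is completely idle:

    m.updateGroup(poolKey, tgs)
    // Illustrative debug print (requires importing "fmt"): with the k8s
    // provider this keeps logging even though nothing changed in the cluster.
    fmt.Printf("discovery update from %v: %d target groups\n", poolKey, len(tgs))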

If you add the same printf's to the corresponding block in v2.0.0, it stops sending updates when there are no changes:

go func(name string, prov TargetProvider) {
    select {
    case <-ctx.Done():
    case initial, ok := <-updates:
        // Handle the case that a target provider exits and closes the channel
        // before the context is done.
        if !ok {
            break
        }
        // First set of all targets the provider knows.
        for _, tgroup := range initial {
            ts.setTargetGroup(name, tgroup)
        }
    case <-time.After(5 * time.Second):
        // Initial set didn't arrive. Act as if it was empty
        // and wait for updates later on.
    }
    wg.Done()
    // Start listening for further updates.
    for {
        select {
        case <-ctx.Done():
            return
        case tgs, ok := <-updates:
            // Handle the case that a target provider exits and closes the channel
            // before the context is done.
            if !ok {
                return
            }
            for _, tg := range tgs {
                ts.update(name, tg)
            }
        }
    }
}(name, prov)

I suspect something changed at some point while refactoring the k8s code, updating the k8s client, or fixing the namespaces bug.

ping @cofyc, @simonpasquier

@krasi-georgiev
Contributor Author

Note: to replicate this you need to remove the locking below, as there is another bug that blocks there:

m.recentlyUpdatedMtx.Lock() 
m.recentlyUpdated = true 
m.recentlyUpdatedMtx.Unlock() 

@krasi-georgiev
Contributor Author

The change happened in https://github.com/prometheus/prometheus/pull/3660/files so that we get updates even when a deployment is scaled to 0, in other words so that we send an update when all pods have been deleted.

The k8s client keeps sending updates for the kube-controller-manager endpoint, and the only differences between them are the ResourceVersion and the renewTime in the leader-election annotation.

old

&Endpoints{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:kube-controller-manager,GenerateName:,Namespace:kube-system,SelfLink:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,UID:f76b6132-a268-11e8-b8b3-08002709c498,ResourceVersion:363220,Generation:0,CreationTimestamp:2018-08-18 01:00:34 +0300 EEST,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{control-plane.alpha.kubernetes.io/leader: {"holderIdentity":"minikube_cc9a294c-a268-11e8-9149-08002709c498","leaseDurationSeconds":15,"acquireTime":"2018-08-17T22:00:34Z","renewTime":"2018-08-26T12:45:07Z","leaderTransitions":0},},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Subsets:[],}

new

&Endpoints{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:kube-controller-manager,GenerateName:,Namespace:kube-system,SelfLink:/api/v1/namespaces/kube-system/endpoints/kube-controller-manager,UID:f76b6132-a268-11e8-b8b3-08002709c498,ResourceVersion:363222,Generation:0,CreationTimestamp:2018-08-18 01:00:34 +0300 EEST,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{control-plane.alpha.kubernetes.io/leader: {"holderIdentity":"minikube_cc9a294c-a268-11e8-9149-08002709c498","leaseDurationSeconds":15,"acquireTime":"2018-08-17T22:00:34Z","renewTime":"2018-08-26T12:45:09Z","leaderTransitions":0},},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Subsets:[],}
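
To make the point concrete, here is a hypothetical check (not code from the tree): blank out the ResourceVersion and the leader-election annotation and the two objects above compare as equal, i.e. nothing relevant for discovery has changed.

package scratch

import (
    "reflect"

    v1 "k8s.io/api/core/v1"
)

// onlyBookkeepingChanged is a hypothetical helper: it reports whether two
// Endpoints objects differ only in ResourceVersion and the leader-election
// annotation, i.e. only in fields that are irrelevant for service discovery.
func onlyBookkeepingChanged(a, b *v1.Endpoints) bool {
    x, y := a.DeepCopy(), b.DeepCopy()
    x.ResourceVersion, y.ResourceVersion = "", ""
    delete(x.Annotations, "control-plane.alpha.kubernetes.io/leader")
    delete(y.Annotations, "control-plane.alpha.kubernetes.io/leader")
    return reflect.DeepEqual(x, y)
}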

There are two options to fix this:

  1. Keep the last update in the SD manager and compare: if lastUpdate == newUpdate, stop there and don't send anything to the scrape manager.
  2. Find out whether we can set some option in the k8s client to ignore these updates.

Option 1 would solve the problem generally, even if other providers have the same issue; a sketch follows below.
Option 2 would improve overall performance, as we would stop a lot of unnecessary processing at an earlier stage.
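
A rough sketch of option 1 (the type, field, and method names below are made up for illustration and are not the actual discovery Manager internals):

package scratch

import (
    "reflect"
    "sync"

    "github.com/prometheus/prometheus/discovery/targetgroup"
)

// dedupCache illustrates option 1: remember the last target groups forwarded
// per provider so the manager can drop updates that are identical to what was
// already sent to the scrape manager.
type dedupCache struct {
    mtx      sync.Mutex
    lastSent map[string][]*targetgroup.Group // keyed by provider / pool key
}

// shouldSend reports whether tgs differs from the last forwarded update for
// this key and, if it does, records the new state.
func (c *dedupCache) shouldSend(key string, tgs []*targetgroup.Group) bool {
    c.mtx.Lock()
    defer c.mtx.Unlock()
    if reflect.DeepEqual(c.lastSent[key], tgs) {
        return false // nothing changed, don't wake up the scrape manager
    }
    if c.lastSent == nil {
        c.lastSent = map[string][]*targetgroup.Group{}
    }
    c.lastSent[key] = tgs
    return true
}

In the loop quoted in the first comment, such a check would guard the m.updateGroup call, so recentlyUpdated is only set when a provider actually reported something new.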
