Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zookeeper connection leak in Discovery #5093

Open
ioanvapi opened this Issue Jan 14, 2019 · 3 comments

Comments

Projects
None yet
3 participants
@ioanvapi
Copy link

ioanvapi commented Jan 14, 2019

What did you do?
Used prometheus with zookeeper for service discovery then reloaded prometheus (SIGHUP) several times.

What did you expect to see?
From pprof perspective, prometheus keeps the same amount of goroutines and a similar number of process file descriptors related to the zookeeepr connections.

What did you see instead? Under which circumstances?
After several prometheus service reload ("killall -SIGHUP prometheus") we observed the number of files descriptors for zookeeper connections are increased. Also, the number of goroutines related to the Discovery.Run() increased in /debug/pprof/goroutine.
We created a cron that reloads the prometheus service at every 20 mins and we observed the above behaviour.

Environment

  • System information:

    Linux 3.10.0-862.14.4.el7.x86_64 x86_64

  • Prometheus version:

    prometheus, version 2.6.0 (branch: HEAD, revision: dbd1d58)
    build user: root@bf5760470f13
    build date: 20181217-15:14:46
    go version: go1.11.3

  • Code in prometheus/discovery/zookeeper/zookeeper.go:

func (d *Discovery) Run(ctx context.Context, ch chan<- []*targetgroup.Group) {
	defer func() {
		for _, tc := range d.treeCaches {
			tc.Stop()
		}
		// Drain event channel in case the treecache leaks goroutines otherwise.
		for range d.updates {
		}
		d.conn.Close()
	}()
  • Pprof/goroutines
7 @ 0x42dfab 0x43df5d 0x14e1213 0x45bcc1
#	0x14e1212	github.com/prometheus/prometheus/discovery/zookeeper.(*Discovery).Run+0x122	/app/discovery/zookeeper/zookeeper.go:175
  • Description
    In the above code we observe the d.conn.Close() is invoked if the above for range over the channel d.updates is terminated. Since the d.updates channel is never closed in the code then the following happens:
    • d.conn.Close() is never invoked and the result is a leak of zookeeper connections
    • defer func() is never exited and the result is a leak of goroutine that invoked the Run() method
@ioanvapi

This comment has been minimized.

Copy link
Author

ioanvapi commented Jan 14, 2019

I have replaced the below code in func (d *Discovery) Run():

        for range d.updates {
         }
         d.conn.Close()

with the code:

         L:
		for {
			select {
			case <-d.updates:
			default:
				break L
			}
		}
		d.conn.Close()

and we did not experience the leaks described in the issue. The purpose of this code is to consume any event from the d.updates channel and in case there is no event then it will default and break out from the for loop. After that, d.conn.Close() will be invoked and the defer function will exit.

@gouthamve

This comment has been minimized.

Copy link
Member

gouthamve commented Jan 14, 2019

@ioanvapi Do you want open a PR for this with the changes you described above?

@ioanvapi

This comment has been minimized.

Copy link
Author

ioanvapi commented Jan 14, 2019

I have a fork and I can ask a merge request if my solution seams to be ok.

ioanvapi added a commit to ioanvapi/prometheus that referenced this issue Jan 14, 2019

fix for zookeeper connection leak
issue: prometheus#5093 (comment)
Signed-off-by: Vapirovschi <vapirovschi@optymyze.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.