Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client gets stuck #322

Closed
GiedriusS opened this issue Jul 31, 2023 · 5 comments · Fixed by thanos-io/thanos#6589
Closed

Client gets stuck #322

GiedriusS opened this issue Jul 31, 2023 · 5 comments · Fixed by thanos-io/thanos#6589

Comments

@GiedriusS
Copy link

GiedriusS commented Jul 31, 2023

Somehow during timeouts, the Rueidis client gets stuck.

From /debug/pprof:

2 @ 0x43bb36 0x406f9d 0x406a98 0x19f3908 0x19eb1fa 0x19eac33 0x19d7e10 0x46f981
#	0x19f3907	github.com/redis/rueidis.(*pipe).Do+0x567			/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:808
#	0x19eb1f9	github.com/redis/rueidis.(*mux).pipeline+0xd9			/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/mux.go:228
#	0x19eac32	github.com/redis/rueidis.(*mux).Do+0xd2				/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/mux.go:185
#	0x19d7e0f	github.com/redis/rueidis.(*clusterClient)._refresh.func1+0x6f	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/cluster.go:160

5 @ 0x43bb36 0x406f9d 0x406a98 0x19f41ad 0x19eb451 0x19ead37 0x19dac48 0x19da025 0x1a0e6a5 0x1a1853d 0x1a18e32 0x1a1f434 0xa89724 0x46f981
#	0x19f41ac	github.com/redis/rueidis.(*pipe).DoMulti+0x54c							/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:917
#	0x19eb450	github.com/redis/rueidis.(*mux).pipelineMulti+0xd0						/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/mux.go:237
#	0x19ead36	github.com/redis/rueidis.(*mux).DoMulti+0x76							/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/mux.go:196
#	0x19dac47	github.com/redis/rueidis.(*clusterClient).doMulti+0xe7						/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/cluster.go:533
#	0x19da024	github.com/redis/rueidis.(*clusterClient).DoMulti+0x624						/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/cluster.go:446
#	0x1a0e6a4	github.com/thanos-io/thanos/pkg/cacheutil.(*RedisClient).SetMulti+0x324				/home/giedrius/dev/thanos/pkg/cacheutil/redis_client.go:258
#	0x1a1853c	github.com/thanos-io/thanos/pkg/cache.(*RedisCache).Store+0x1c					/home/giedrius/dev/thanos/pkg/cache/redis.go:55
#	0x1a18e31	github.com/thanos-io/thanos/pkg/cache.TracingCache.Store+0x31					/home/giedrius/dev/thanos/pkg/cache/tracing_cache.go:25
#	0x1a1f433	github.com/thanos-io/thanos/pkg/store/cache.(*CachingBucket).fetchMissingSubranges.func1+0x8f3	/home/giedrius/dev/thanos/pkg/store/cache/caching_bucket.go:475
#	0xa89723	golang.org/x/sync/errgroup.(*Group).Go.func1+0x63						/home/giedrius/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75


26 @ 0x43bb36 0x406f9d 0x406a98 0x19f3908 0x19eb1fa 0x19eac33 0x19d8f4b 0x19d8d9b 0x1a0e01c 0x1a2668d 0x1ae2e7c 0x1adf475 0x1ad06e5 0x1ad6679 0xa89724 0x46f981
#	0x19f3907	github.com/redis/rueidis.(*pipe).Do+0x567							/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:808
#	0x19eb1f9	github.com/redis/rueidis.(*mux).pipeline+0xd9							/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/mux.go:228
#	0x19eac32	github.com/redis/rueidis.(*mux).Do+0xd2								/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/mux.go:185
#	0x19d8f4a	github.com/redis/rueidis.(*clusterClient).do+0xea						/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/cluster.go:357
#	0x19d8d9a	github.com/redis/rueidis.(*clusterClient).Do+0x7a						/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/cluster.go:345
#	0x1a0e01b	github.com/thanos-io/thanos/pkg/cacheutil.(*RedisClient).SetAsync+0x21b				/home/giedrius/dev/thanos/pkg/cacheutil/redis_client.go:239
#	0x1a2668c	github.com/thanos-io/thanos/pkg/store/cache.(*RemoteIndexCache).StoreExpandedPostings+0x1ec	/home/giedrius/dev/thanos/pkg/store/cache/memcached.go:132
#	0x1ae2e7b	github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).storeExpandedPostingsToCache+0xdb	/home/giedrius/dev/thanos/pkg/store/bucket.go:2635
#	0x1adf474	github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).ExpandedPostings+0xd54		/home/giedrius/dev/thanos/pkg/store/bucket.go:2302
#	0x1ad06e4	github.com/thanos-io/thanos/pkg/store.(*blockSeriesClient).ExpandPostings+0x84			/home/giedrius/dev/thanos/pkg/store/bucket.go:969
#	0x1ad6678	github.com/thanos-io/thanos/pkg/store.(*BucketStore).Series.func2+0x378				/home/giedrius/dev/thanos/pkg/store/bucket.go:1329
#	0xa89723	golang.org/x/sync/errgroup.(*Group).Go.func1+0x63						/home/giedrius/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75

46 @ 0x43bb36 0x434497 0x469ae9 0x4e7bd2 0x4e8fb9 0x4e8fa7 0x58e429 0x5a1ea5 0x568ebf 0x56968c 0x19fbfaf 0x19f033f 0x19ef79e 0x46f981
#	0x469ae8	internal/poll.runtime_pollWait+0x88			/usr/lib/go-1.20/src/runtime/netpoll.go:306
#	0x4e7bd1	internal/poll.(*pollDesc).wait+0x31			/usr/lib/go-1.20/src/internal/poll/fd_poll_runtime.go:84
#	0x4e8fb8	internal/poll.(*pollDesc).waitRead+0x298		/usr/lib/go-1.20/src/internal/poll/fd_poll_runtime.go:89
#	0x4e8fa6	internal/poll.(*FD).Read+0x286				/usr/lib/go-1.20/src/internal/poll/fd_unix.go:167
#	0x58e428	net.(*netFD).Read+0x28					/usr/lib/go-1.20/src/net/fd_posix.go:55
#	0x5a1ea4	net.(*conn).Read+0x44					/usr/lib/go-1.20/src/net/net.go:183
#	0x568ebe	bufio.(*Reader).fill+0xfe				/usr/lib/go-1.20/src/bufio/bufio.go:106
#	0x56968b	bufio.(*Reader).ReadByte+0x2b				/usr/lib/go-1.20/src/bufio/bufio.go:265
#	0x19fbfae	github.com/redis/rueidis.readNextMessage+0x8e		/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/resp.go:255
#	0x19f033e	github.com/redis/rueidis.(*pipe)._backgroundRead+0x1be	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:428
#	0x19ef79d	github.com/redis/rueidis.(*pipe)._background+0x7d	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:298



60 @ 0x43bb36 0x40618e 0x405d3d 0x19f0cfb 0x19ef79e 0x46f981
#	0x19f0cfa	github.com/redis/rueidis.(*pipe)._backgroundRead+0xb7a	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:526
#	0x19ef79d	github.com/redis/rueidis.(*pipe)._background+0x7d	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:298

60 @ 0x43bb36 0x406f9d 0x406a98 0x19f3908 0x19f1a1f 0x46f981
#	0x19f3907	github.com/redis/rueidis.(*pipe).Do+0x567			/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:808
#	0x19f1a1e	github.com/redis/rueidis.(*pipe).backgroundPing.func1+0x5e	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:549

56 @ 0x43bb36 0x406f9d 0x406a98 0x19f3908 0x19eb1fa 0x19eac33 0x19d8f4b 0x19d8d9b 0x1a0e01c 0x1a26cec 0x1ae615f 0x1ae5b54 0xa89724 0x46f981
#	0x19f3907	github.com/redis/rueidis.(*pipe).Do+0x567						/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:808
#	0x19eb1f9	github.com/redis/rueidis.(*mux).pipeline+0xd9						/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/mux.go:228
#	0x19eac32	github.com/redis/rueidis.(*mux).Do+0xd2							/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/mux.go:185
#	0x19d8f4a	github.com/redis/rueidis.(*clusterClient).do+0xea					/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/cluster.go:357
#	0x19d8d9a	github.com/redis/rueidis.(*clusterClient).Do+0x7a					/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/cluster.go:345
#	0x1a0e01b	github.com/thanos-io/thanos/pkg/cacheutil.(*RedisClient).SetAsync+0x21b			/home/giedrius/dev/thanos/pkg/cacheutil/redis_client.go:239
#	0x1a26ceb	github.com/thanos-io/thanos/pkg/store/cache.(*RemoteIndexCache).StoreSeries+0x18b	/home/giedrius/dev/thanos/pkg/store/cache/memcached.go:163
#	0x1ae615e	github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).loadSeries+0x51e		/home/giedrius/dev/thanos/pkg/store/bucket.go:2963
#	0x1ae5b53	github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).PreloadSeries.func2+0x93	/home/giedrius/dev/thanos/pkg/store/bucket.go:2913
#	0xa89723	golang.org/x/sync/errgroup.(*Group).Go.func1+0x63					/home/giedrius/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75

82 @ 0x43bb36 0x44be1e 0x19f16e5 0x46f981
#	0x19f16e4	github.com/redis/rueidis.(*pipe).backgroundPing+0x104	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:541

81 @ 0x43bb36 0x406f9d 0x406ad8 0x1312d56 0x46f981
#	0x1312d55	github.com/minio/minio-go/v7.(*Client).GetObject.func1+0x1d5	/home/giedrius/go/pkg/mod/github.com/minio/minio-go/v7@v7.0.45/api-get-object.go:78

76 @ 0x43bb36 0x406f9d 0x406a98 0x19f3908 0x19eb1fa 0x19eac33 0x19d8f4b 0x19d8d9b 0x1a0e01c 0x1a25c70 0x1ae48a3 0xa89724 0x46f981
#	0x19f3907	github.com/redis/rueidis.(*pipe).Do+0x567						/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:808
#	0x19eb1f9	github.com/redis/rueidis.(*mux).pipeline+0xd9						/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/mux.go:228
#	0x19eac32	github.com/redis/rueidis.(*mux).Do+0xd2							/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/mux.go:185
#	0x19d8f4a	github.com/redis/rueidis.(*clusterClient).do+0xea					/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/cluster.go:357
#	0x19d8d9a	github.com/redis/rueidis.(*clusterClient).Do+0x7a					/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/cluster.go:345
#	0x1a0e01b	github.com/thanos-io/thanos/pkg/cacheutil.(*RedisClient).SetAsync+0x21b			/home/giedrius/dev/thanos/pkg/cacheutil/redis_client.go:239
#	0x1a25c6f	github.com/thanos-io/thanos/pkg/store/cache.(*RemoteIndexCache).StorePostings+0x1ef	/home/giedrius/dev/thanos/pkg/store/cache/memcached.go:82
#	0x1ae48a2	github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).fetchPostings.func3+0x642	/home/giedrius/dev/thanos/pkg/store/bucket.go:2776
#	0xa89723	golang.org/x/sync/errgroup.(*Group).Go.func1+0x63					/home/giedrius/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75

187 @ 0x43bb36 0x406f9d 0x406a98 0x19f4a25 0x46f981
#	0x19f4a24	github.com/redis/rueidis.(*pipe).DoMulti.func2+0x44	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:934

106 @ 0x43bb36 0x46b92c 0x46b90c 0x4786ac 0x19fcbb7 0x19efeaa 0x19efc05 0x46f981
#	0x46b90b	sync.runtime_notifyListWait+0x12b			/usr/lib/go-1.20/src/runtime/sema.go:527
#	0x4786ab	sync.(*Cond).Wait+0x8b					/usr/lib/go-1.20/src/sync/cond.go:70
#	0x19fcbb6	github.com/redis/rueidis.(*ring).WaitForWrite+0x96	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/ring.go:111
#	0x19efea9	github.com/redis/rueidis.(*pipe)._backgroundWrite+0x269	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:376
#	0x19efc04	github.com/redis/rueidis.(*pipe)._background.func1+0x24	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:294

Most notably in our setup we are passing context.Background() as the context to the Set commands:

https://github.com/thanos-io/thanos/blob/2606855c922285649d46ef472b02c6ea20e31fe2/pkg/cacheutil/redis_client.go#L252

But, we set ConnWriteTimeout to 3 seconds. Inside of the rueidis client I can find SetDeadline so that should set both write/read timeouts?

All in all, it looks like the client depends on the context having a deadline? 🤔 is this correct?

@rueian
Copy link
Collaborator

rueian commented Jul 31, 2023

Hi @GiedriusS,

When rueidis operates in pipelining mode, it actually doesn't use SetDeatline but uses periodic PINGs to detect timeouts:

rueidis/pipe.go

Lines 556 to 560 in d6ca19e

tm := time.NewTimer(p.timeout)
go func() { ch <- p.Do(context.Background(), cmds.PingCmd).NonRedisError() }()
select {
case <-tm.C:
err = context.DeadlineExceeded

where the p.timeout is the ConnWriteTimeout. Once a PING fails to return in time, the connection will be closed.

Though I am not sure where you were getting stuck now, the line:

#	0x19f0cfa	github.com/redis/rueidis.(*pipe)._backgroundRead+0xb7a	/home/giedrius/go/pkg/mod/github.com/redis/rueidis@v1.0.10-go1.18/pipe.go:526

looks the most suspicious to me. I will do more investigation on it.

@rueian
Copy link
Collaborator

rueian commented Jul 31, 2023

Hi @GiedriusS,

Just noticed that the suspicious line is outdated. Could you try a newer version?

@GiedriusS
Copy link
Author

Let me try updating tomorrow and update you if this reoccurs.

@rueian
Copy link
Collaborator

rueian commented Aug 5, 2023

Hi @GiedriusS,

I hope you're doing well. Did the issue reoccur again?

After revisiting the old code and the suspicious line you provided, I believe the issue was already fixed by the PR #291, where I changed the previous timeout handling of DoMultit from

	resp := resultsp.Get(len(multi), len(multi)) // get a resp container
	...
abort:
	go func(i int) {
		for ; i < len(resp.s); i++ { // the resp container might already be recycled, its length has changed
			<-ch
		}
	}(i)
	err := newErrResult(ctx.Err())
	for ; i < len(resp.s); i++ {
		resp.s[i] = err
	}
	return resp // return the resp container to the upper layer, and the upper layer will recycle it

to this

	resp := resultsp.Get(len(multi), len(multi)) // get a resp container
	...
abort:
	go func(i int, resp *redisresults) {
		<-ch
		resultsp.Put(resp) // recycle the old resp container
	}(i, resp)
	resp = resultsp.Get(len(multi), len(multi)) // get a new resp container
	err := newErrResult(ctx.Err())
	for ; i < len(resp.s); i++ {
		resp.s[i] = err
	}
	return resp

With this change, the cleanup goroutine doesn't depends on len(resp.s) anymore.

GiedriusS added a commit to thanos-io/thanos that referenced this issue Aug 7, 2023
Update client to fix redis/rueidis#322.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
GiedriusS added a commit to thanos-io/thanos that referenced this issue Aug 7, 2023
* go.mod: update Redis client

Update client to fix redis/rueidis#322.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* *: run go mod tidy

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@GiedriusS
Copy link
Author

No, it helped. I also updated the client upstream in thanos-io/thanos#6589. 🙇 thank you for your help

GiedriusS added a commit to vinted/thanos that referenced this issue Aug 10, 2023
* go.mod: update Redis client

Update client to fix redis/rueidis#322.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* *: run go mod tidy

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants