The cache is hit, but reading the key takes more than 10ms #89

Closed
xiluoxi opened this issue Aug 29, 2022 · 17 comments

Comments

@xiluoxi

xiluoxi commented Aug 29, 2022

Usually, this occurs after the service has been running for a long time. I guess it might happen after the cache is full.

@xiluoxi
Author

xiluoxi commented Aug 29, 2022

It gets better after I restart the service.

@rueian
Collaborator

rueian commented Aug 29, 2022

@xiluoxi, thank you for reporting this. I will look into that as soon as possible.

@rueian
Collaborator

rueian commented Aug 29, 2022

Hi @xiluoxi, please try the new v0.0.74. The memory leak in the LRU cache should be fixed.

@xiluoxi
Author

xiluoxi commented Sep 1, 2022

@rueian After the service runs for a period of time, the memory usage will still rise.

@xiluoxi
Author

xiluoxi commented Sep 1, 2022

In the test, the memory increased abnormally, from 4.1% to 21%. On the other hand, the performance of rueidis improved.

@rueian
Collaborator

rueian commented Sep 1, 2022

In the test, the memory increased abnormally, from 4.1% to 21%. On the other hand, the performance of rueidis improved.

Hi @xiluoxi, just to clarify: do you expect it to keep using 4.1% of memory? How long did it take to reach 21% of memory?

On the other hand, the performance of rueidis improved.

Do you mean that although there is still a memory leak issue, the latency issue is solved?

@xiluoxi
Author

xiluoxi commented Sep 2, 2022

The latency issue still exists.

@xiluoxi
Author

xiluoxi commented Sep 2, 2022

At the same time, a new problem has emerged. In a highly concurrent write scenario, the memory usage will increase rapidly when redis fails or processes slowly.

@xiluoxi
Author

xiluoxi commented Sep 2, 2022

The latency issue still exists.

In a highly concurrent read scenario, the latency exceeds 100ms.

@xiluoxi
Author

xiluoxi commented Sep 2, 2022

There are two rueidis clients in my service connecting to different redis servers, one for reading and one for writing. I'm not sure whether they can influence each other.

@rueian
Collaborator

rueian commented Sep 2, 2022

They should not affect each other. What is the relationship between these two Redis servers? Are they Redis Cluster?

BTW, Are you using DoCache or DoMultiCache to send commands?

@xiluoxi
Author

xiluoxi commented Sep 2, 2022

They should not affect each other. What is the relationship between these two Redis servers? Are they Redis Cluster?

BTW, Are you using DoCache or DoMultiCache to send commands?

No Redis Cluster, and they are on two different servers. I'm using DoCache.
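
For reference, a minimal sketch of what DoCache and DoMultiCache calls look like (assuming the current rueidis API; the address, keys, and TTLs here are placeholders, not from the thread):

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/rueian/rueidis"
)

// Minimal sketch of the two client-side-caching entry points discussed above;
// the address, keys, and TTLs are placeholders.
func main() {
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress: []string{"127.0.0.1:6379"}, // placeholder address
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	ctx := context.Background()

	// DoCache: a single cached read, served from the client-side cache while
	// still valid, otherwise fetched from redis and cached for up to one minute.
	v, err := client.DoCache(ctx, client.B().Get().Key("k1").Cache(), time.Minute).ToString()
	fmt.Println(v, err) // a missing key is reported as an error (rueidis.Nil)

	// DoMultiCache: batched cached reads; rueidis.CT pairs each cacheable
	// command with its TTL.
	for _, resp := range client.DoMultiCache(ctx,
		rueidis.CT(client.B().Get().Key("k1").Cache(), time.Minute),
		rueidis.CT(client.B().Get().Key("k2").Cache(), time.Minute),
	) {
		fmt.Println(resp.ToString())
	}
}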

@rueian
Collaborator

rueian commented Sep 4, 2022

Hi @xiluoxi,

At the same time, a new problem has emerged. In a highly concurrent write scenario, the memory usage will increase rapidly when redis fails or processes slowly.

This may be caused by the fact that, currently, the command builder does not reuse the command buffer of previously failed commands due to some racing problems. This may take some time to improve.

In a highly concurrent read scenario, the latency exceeds 100ms.

I have done some tests on Google Cloud but I am still not able to simulate your situation.

I created two instances in the same zone of Google Cloud with the following specs:

  1. n2d-highcpu-4 (4core, 4G ram, AMD Rome, ip: 10.140.0.52)
  2. n2-highcpu-8 (8core, 8G ram, Intel Cascade Lake, ip: 10.140.0.51)

The first machine was running Redis 7.0.4 + Prometheus + Grafana.
The second machine was running the following program, compiled with Go 1.19 against rueidis v0.0.75:

package main

import (
	"context"
	"fmt"
	"math/rand"
	"net/http"
	"strconv"
	"time"

	"github.com/go-redis/redis/v9"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/rueian/rueidis"
)

// prepData returns n distinct numeric strings in random order, used as keys and values.
func prepData(n int) []string {
	data := make([]string, n)
	for i := range data {
		data[i] = strconv.Itoa(i)
	}
	rand.Shuffle(len(data), func(i, j int) { data[i], data[j] = data[j], data[i] })
	return data
}

const (
	keyCount   = 1000000 
	readers    = 8
	writers    = 2
	useGoRedis = false // please change it
	cacheSize  = 512 * (1 << 20) // 512 MB
	addr       = "10.140.0.52:6379" // please change it
)

func main() {
	rand.Seed(time.Now().UnixNano())
	bucket := []float64{250, 500, 750, 1000, 2500, 5000, 7500, 10000, 25000, 50000, 75000, 100000, 250000, 500000, 750000, 1000000}

	wl := promauto.NewHistogram(prometheus.HistogramOpts{Name: "micro_write_latency", Buckets: bucket})
	rl := promauto.NewHistogram(prometheus.HistogramOpts{Name: "micro_read_latency", Buckets: bucket})

	go func() {
		http.Handle("/metrics", promhttp.Handler())
		http.ListenAndServe(":2112", nil)
	}()

	rc, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress:       []string{addr},
		CacheSizeEachConn: cacheSize,
	})
	if err != nil {
		panic(err)
	}

	gc := redis.NewUniversalClient(&redis.UniversalOptions{
		Addrs: []string{addr},
	})

	ctx := context.Background()

	goredisWrite := func(key, data string) error {
		return gc.Set(ctx, key, data, 0).Err()
	}
	goredisRead := func(key string) error {
		return gc.Get(ctx, key).Err()
	}
	rueidisWrite := func(key, data string) error {
		return rc.Do(ctx, rc.B().Set().Key(key).Value(data).Build()).Error()
	}
	rueidisCache := func(key string) error {
		return rc.DoCache(ctx, rc.B().Get().Key(key).Cache(), time.Hour).Error()
	}

	var wfn func(key, data string) error
	var rfn func(key string) error

	if useGoRedis {
		wfn = goredisWrite
		rfn = goredisRead
	} else {
		wfn = rueidisWrite
		rfn = rueidisCache
	}

	writeFn := func(keys, data []string) {
		for i, k := range keys {
			ts := time.Now()
			err := wfn(k, data[i])
			wl.Observe(float64(time.Since(ts).Microseconds()))
			if err != nil {
				panic(err)
			}
		}
	}
	readFn := func(keys []string) {
		for _, k := range keys {
			ts := time.Now()
			err := rfn(k)
			rl.Observe(float64(time.Since(ts).Microseconds()))
			if err != nil {
				panic(err)
			}
		}
	}

	// Preload all keys into redis before starting the readers and writers.
	{
		keys := prepData(keyCount)
		data := prepData(keyCount)
		commands := make(rueidis.Commands, len(keys))
		for i := range commands {
			commands[i] = rc.B().Set().Key(keys[i]).Value(data[i]).Build()
		}
		ts := time.Now()
		for _, resp := range rc.DoMulti(ctx, commands...) {
			if err := resp.Error(); err != nil {
				panic(err)
			}
		}
		fmt.Println("ready", time.Since(ts))
	}

	// Close the client that is not under test.
	if useGoRedis {
		rc.Close()
	} else {
		gc.Close()
	}

	for i := 0; i < writers; i++ {
		go func() {
			keys := prepData(keyCount)
			data := prepData(keyCount)
			for {
				writeFn(keys, data)
			}
		}()
	}
	for i := 0; i < readers; i++ {
		go func() {
			keys := prepData(keyCount)
			for {
				readFn(keys)
			}
		}()
	}
	time.Sleep(time.Hour)
}

This program records metrics of 8 concurrent readers and 2 concurrent writers that keep reading and writing 1000000 keys.

The result of goredis v9:
[screenshot: goredis-8-2]

The result of rueidis v0.0.75 with an additional 512MB client-side cache:
[screenshot: rueidis-8-2]

While rueidis indeed used more memory for client-side caching, in this case it achieved 14x the read throughput of goredis (887874 vs 61978), with P99 latencies < 0.5ms and no memory leak.

Would you mind sharing more details about your machine/network spec and traffic patterns, such as concurrency, read/write ratio, cache-hit ratio, and average key/value size, so that I can help you find the causes of your problem?

@xiluoxi
Author

xiluoxi commented Sep 6, 2022

You can try this case: read and write the same key with high concurrency, mostly reads.
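
For illustration, a hypothetical sketch of that scenario, adapted from the benchmark above: a few goroutines keep overwriting one hot key while many others read it through DoCache. The address and goroutine counts are placeholders, not values from the thread.

package main

import (
	"context"
	"strconv"
	"time"

	"github.com/rueian/rueidis"
)

// Hypothetical sketch of the reported scenario: one hot key, written by a few
// goroutines and read by many, mostly reads.
func main() {
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress: []string{"127.0.0.1:6379"}, // placeholder address
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	ctx := context.Background()
	const hotKey = "hot"

	for i := 0; i < 2; i++ { // a few writers keep overwriting the same key
		go func() {
			for v := 0; ; v++ {
				client.Do(ctx, client.B().Set().Key(hotKey).Value(strconv.Itoa(v)).Build())
			}
		}()
	}
	for i := 0; i < 32; i++ { // many readers hit the same key through the client-side cache
		go func() {
			for {
				client.DoCache(ctx, client.B().Get().Key(hotKey).Cache(), time.Hour)
			}
		}()
	}
	time.Sleep(time.Hour)
}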

@rueian
Collaborator

rueian commented Sep 7, 2022

You can try this case: read and write the same key with high concurrency, mostly reads.

Hi @xiluoxi, the previous simulation I posted is already reading and writing the same set of keys with high concurrency, mostly reads.
Would you mind sharing more details about your machine spec, for example how many CPUs you have on each machine? It would also be helpful to know your key and value sizes.

@rueian
Collaborator

rueian commented Sep 10, 2022

Hi @xiluoxi,

v0.0.76 introduces three new fields on rueidis.ClientOption that can affect performance: ReadBufferEachConn, WriteBufferEachConn, and PipelineMultiplex.

Increasing ReadBufferEachConn and WriteBufferEachConn will require more memory but save TCP system calls.
Increasing PipelineMultiplex will use more TCP connections to pipeline commands to one redis node. This will use more CPU but could lower latencies and cache contention.

You can try to increase or decrease them to see how they will affect performance and find better values for your case.
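
For example, a minimal sketch of setting these fields; the values below are placeholders to experiment with, not recommendations, and the address is assumed.

package main

import "github.com/rueian/rueidis"

func main() {
	// Placeholder tuning values; adjust and measure for your own workload.
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress:         []string{"127.0.0.1:6379"}, // placeholder address
		ReadBufferEachConn:  1 << 20,                    // bytes of read buffer per connection
		WriteBufferEachConn: 1 << 20,                    // bytes of write buffer per connection
		PipelineMultiplex:   2,                          // use more connections to pipeline commands to one node
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()
}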

This is the result of the same code on the same machines as the previous simulation, but with v0.0.76:
[screenshot: v0 0 76]
Now it achieves 28x read throughput, and latencies are also improved, though more goroutines are used.

@xiluoxi
Author

xiluoxi commented Sep 13, 2022

Thanks, I will try.

rueian closed this as completed Oct 14, 2022