The cache is hit, but reading the key takes more than 10ms #89

Closed
xiluoxi opened this issue Aug 29, 2022 · 17 comments

Comments

@xiluoxi

xiluoxi commented Aug 29, 2022

Usually, this occurs after the service has been running for a long time. I guess it might happen after the cache is full.

@xiluoxi
Author

xiluoxi commented Aug 29, 2022

It gets better after I restart the service.

@rueian
Collaborator

rueian commented Aug 29, 2022

@xiluoxi, thank you for reporting this. I will look into that as soon as possible.

@rueian
Collaborator

rueian commented Aug 29, 2022

Hi @xiluoxi, please try the new v0.0.74. The memory leak in the LRU cache should be fixed.

@xiluoxi
Author

xiluoxi commented Sep 1, 2022

@rueian After the service runs for a period of time, the memory usage will still rise.

@xiluoxi
Author

xiluoxi commented Sep 1, 2022

In the test, the memory increased abnormally, from 4.1% to 21%. On the other hand, the performance of rueidis improved.

@rueian
Collaborator

rueian commented Sep 1, 2022

In the test, the memory increased abnormally, from 4.1% to 21%. On the other hand, the performance of rueidis improved.

Hi @xiluoxi, just to clarify: do you expect it to keep using 4.1% of memory? How long did it take to reach 21% of memory?

On the other hand, the performance of rueidis improved.

Do you mean that although there is still a memory leak issue, the latency issue is solved?

@xiluoxi
Author

xiluoxi commented Sep 2, 2022

The latency issue still exists.

@xiluoxi
Author

xiluoxi commented Sep 2, 2022

At the same time, a new problem has emerged. In a highly concurrent write scenario, the memory usage will increase rapidly when redis fails or processes slowly.

@xiluoxi
Author

xiluoxi commented Sep 2, 2022

The latency issue still exists.

In a highly concurrent read scenario, the latency exceeds 100ms.

@xiluoxi
Author

xiluoxi commented Sep 2, 2022

There are two rueidis clients in my service connecting to different redis servers, one for reading and one for writing. I'm not sure whether they can influence each other.

@rueian
Collaborator

rueian commented Sep 2, 2022

They should not affect each other. What is the relationship between these two Redis servers? Are they Redis Cluster?

BTW, Are you using DoCache or DoMultiCache to send commands?

@xiluoxi
Author

xiluoxi commented Sep 2, 2022

They should not affect each other. What is the relationship between these two Redis servers? Are they Redis Cluster?

BTW, Are you using DoCache or DoMultiCache to send commands?

No Redis Cluster, and they are on two different servers. I'm using DoCache.
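
For reference, a minimal sketch of what DoCache and DoMultiCache calls look like (assuming the current rueidis API; the address, keys, and TTLs here are placeholders, not from the thread):

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/rueian/rueidis"
)

// Minimal sketch of the two client-side-caching entry points discussed above;
// the address, keys, and TTLs are placeholders.
func main() {
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress: []string{"127.0.0.1:6379"}, // placeholder address
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	ctx := context.Background()

	// DoCache: a single cached read, served from the client-side cache while
	// still valid, otherwise fetched from redis and cached for up to one minute.
	v, err := client.DoCache(ctx, client.B().Get().Key("k1").Cache(), time.Minute).ToString()
	fmt.Println(v, err) // a missing key is reported as an error (rueidis.Nil)

	// DoMultiCache: batched cached reads; rueidis.CT pairs each cacheable
	// command with its TTL.
	for _, resp := range client.DoMultiCache(ctx,
		rueidis.CT(client.B().Get().Key("k1").Cache(), time.Minute),
		rueidis.CT(client.B().Get().Key("k2").Cache(), time.Minute),
	) {
		fmt.Println(resp.ToString())
	}
}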

@rueian
Collaborator

rueian commented Sep 4, 2022

Hi @xiluoxi,

At the same time, a new problem has emerged. In a highly concurrent write scenario, the memory usage will increase rapidly when redis fails or processes slowly.

This may be caused by the fact that, currently, the command builder does not reuse the command buffer of previously failed commands due to some racing problems. This may take some time to improve.

In a highly concurrent read scenario, the latency exceeds 100ms.

I have done some tests on Google Cloud but I am still not able to simulate your situation.

I created two instances in the same zone of Google Cloud with the following specs:

  1. n2d-highcpu-4 (4core, 4G ram, AMD Rome, ip: 10.140.0.52)
  2. n2-highcpu-8 (8core, 8G ram, Intel Cascade Lake, ip: 10.140.0.51)

The first machine was running Redis 7.0.4 + Prometheus + Grafana.
The second machine was running the following program, compiled with Go 1.19 against rueidis v0.0.75:

package main

import (
	"context"
	"fmt"
	"math/rand"
	"net/http"
	"strconv"
	"time"

	"github.com/go-redis/redis/v9"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/rueian/rueidis"
)

// prepData returns n distinct numeric strings in random order, used as keys and values.
func prepData(n int) []string {
	data := make([]string, n)
	for i := range data {
		data[i] = strconv.Itoa(i)
	}
	rand.Shuffle(len(data), func(i, j int) { data[i], data[j] = data[j], data[i] })
	return data
}

const (
	keyCount   = 1000000 
	readers    = 8
	writers    = 2
	useGoRedis = false // please change it
	cacheSize  = 512 * (1 << 20) // 512 MB
	addr       = "10.140.0.52:6379" // please change it
)

func main() {
	rand.Seed(time.Now().UnixNano())
	bucket := []float64{250, 500, 750, 1000, 2500, 5000, 7500, 10000, 25000, 50000, 75000, 100000, 250000, 500000, 750000, 1000000}

	wl := promauto.NewHistogram(prometheus.HistogramOpts{Name: "micro_write_latency", Buckets: bucket})
	rl := promauto.NewHistogram(prometheus.HistogramOpts{Name: "micro_read_latency", Buckets: bucket})

	go func() {
		http.Handle("/metrics", promhttp.Handler())
		http.ListenAndServe(":2112", nil)
	}()

	rc, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress:       []string{addr},
		CacheSizeEachConn: cacheSize,
	})
	if err != nil {
		panic(err)
	}

	gc := redis.NewUniversalClient(&redis.UniversalOptions{
		Addrs: []string{addr},
	})

	ctx := context.Background()

	goredisWrite := func(key, data string) error {
		return gc.Set(ctx, key, data, 0).Err()
	}
	goredisRead := func(key string) error {
		return gc.Get(ctx, key).Err()
	}
	rueidisWrite := func(key, data string) error {
		return rc.Do(ctx, rc.B().Set().Key(key).Value(data).Build()).Error()
	}
	rueidisCache := func(key string) error {
		return rc.DoCache(ctx, rc.B().Get().Key(key).Cache(), time.Hour).Error()
	}

	var wfn func(key, data string) error
	var rfn func(key string) error

	if useGoRedis {
		wfn = goredisWrite
		rfn = goredisRead
	} else {
		wfn = rueidisWrite
		rfn = rueidisCache
	}

	writeFn := func(keys, data []string) {
		for i, k := range keys {
			ts := time.Now()
			err := wfn(k, data[i])
			wl.Observe(float64(time.Since(ts).Microseconds()))
			if err != nil {
				panic(err)
			}
		}
	}
	readFn := func(keys []string) {
		for _, k := range keys {
			ts := time.Now()
			err := rfn(k)
			rl.Observe(float64(time.Since(ts).Microseconds()))
			if err != nil {
				panic(err)
			}
		}
	}

	// Preload all keys into redis before starting the readers and writers.
	{
		keys := prepData(keyCount)
		data := prepData(keyCount)
		commands := make(rueidis.Commands, len(keys))
		for i := range commands {
			commands[i] = rc.B().Set().Key(keys[i]).Value(data[i]).Build()
		}
		ts := time.Now()
		for _, resp := range rc.DoMulti(ctx, commands...) {
			if err := resp.Error(); err != nil {
				panic(err)
			}
		}
		fmt.Println("ready", time.Since(ts))
	}

	// Close the client that is not under test.
	if useGoRedis {
		rc.Close()
	} else {
		gc.Close()
	}

	for i := 0; i < writers; i++ {
		go func() {
			keys := prepData(keyCount)
			data := prepData(keyCount)
			for {
				writeFn(keys, data)
			}
		}()
	}
	for i := 0; i < readers; i++ {
		go func() {
			keys := prepData(keyCount)
			for {
				readFn(keys)
			}
		}()
	}
	time.Sleep(time.Hour)
}

This program records metrics of 8 concurrent readers and 2 concurrent writers that keep reading and writing 1000000 keys.

The result of goredis v9:
[screenshot: goredis-8-2]

The result of rueidis v0.0.75 with an additional 512MB client-side cache:
[screenshot: rueidis-8-2]

While rueidis indeed used more memory for client-side caching, in this case it achieved 14x the read throughput of goredis (887874 vs 61978), with P99 latencies < 0.5ms and no memory leak.

Would you mind sharing more details about your machine/network spec and traffic patterns, such as concurrency, read/write ratio, cache-hit ratio, and average key/value size, so that I can help you find the causes of your problem?

@xiluoxi
Author

xiluoxi commented Sep 6, 2022

You can try this case: read and write the same key with high concurrency, mostly reads.
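
For illustration, a hypothetical sketch of that scenario, adapted from the benchmark above: a few goroutines keep overwriting one hot key while many others read it through DoCache. The address and goroutine counts are placeholders, not values from the thread.

package main

import (
	"context"
	"strconv"
	"time"

	"github.com/rueian/rueidis"
)

// Hypothetical sketch of the reported scenario: one hot key, written by a few
// goroutines and read by many, mostly reads.
func main() {
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress: []string{"127.0.0.1:6379"}, // placeholder address
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	ctx := context.Background()
	const hotKey = "hot"

	for i := 0; i < 2; i++ { // a few writers keep overwriting the same key
		go func() {
			for v := 0; ; v++ {
				client.Do(ctx, client.B().Set().Key(hotKey).Value(strconv.Itoa(v)).Build())
			}
		}()
	}
	for i := 0; i < 32; i++ { // many readers hit the same key through the client-side cache
		go func() {
			for {
				client.DoCache(ctx, client.B().Get().Key(hotKey).Cache(), time.Hour)
			}
		}()
	}
	time.Sleep(time.Hour)
}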

@rueian
Collaborator

rueian commented Sep 7, 2022

You can try this case: read and write the same key with high concurrency, mostly reads.

Hi @xiluoxi, the previous simulation I posted is already reading and writing the same set of keys with high concurrency, mostly reads.
Would you mind sharing more details about your machine spec, for example how many CPUs you have on each machine? It would also be helpful to know your key and value sizes.

@rueian
Collaborator

rueian commented Sep 10, 2022

Hi @xiluoxi,

v0.0.76 introduces three new fields on rueidis.ClientOption that can affect performance: ReadBufferEachConn, WriteBufferEachConn, and PipelineMultiplex.

Increasing ReadBufferEachConn and WriteBufferEachConn will require more memory but save TCP system calls.
Increasing PipelineMultiplex will use more TCP connections to pipeline commands to one redis node. This will use more CPU but could lower latencies and cache contention.

You can try to increase or decrease them to see how they will affect performance and find better values for your case.
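
For example, a minimal sketch of setting these fields; the values below are placeholders to experiment with, not recommendations, and the address is assumed.

package main

import "github.com/rueian/rueidis"

func main() {
	// Placeholder tuning values; adjust and measure for your own workload.
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress:         []string{"127.0.0.1:6379"}, // placeholder address
		ReadBufferEachConn:  1 << 20,                    // bytes of read buffer per connection
		WriteBufferEachConn: 1 << 20,                    // bytes of write buffer per connection
		PipelineMultiplex:   2,                          // use more connections to pipeline commands to one node
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()
}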

This is the result of the same code on the same machines as the previous simulation, but with v0.0.76:
[screenshot: v0 0 76]
Now it achieves 28x read throughput, and latencies are also improved, though more goroutines are used.

@xiluoxi
Author

xiluoxi commented Sep 13, 2022

Thanks, I will try.

rueian closed this as completed Oct 14, 2022