Creates a custom memcached client to reuse memory #285

Closed
cyriltovena opened this issue Sep 29, 2021 · 24 comments
@cyriltovena
Contributor

The current memcached client always throws away the buffer, which creates a lot of allocations.

It would be great to have a reusable buffer, although this might be difficult since there are a lot of cache interfaces used along the way.
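For context, the allocation happens roughly like this in the upstream read path (paraphrased from memory, not verbatim):

it := new(Item)
size, err := scanGetResponseLine(line, it)
if err != nil {
	return err
}
it.Value = make([]byte, size+2) // fresh allocation for every item
if _, err := io.ReadFull(r, it.Value); err != nil {
	return err
}
it.Value = it.Value[:size] // trim the trailing \r\n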

@ortuman
Contributor

ortuman commented Oct 6, 2021

Just to clarify: by "current memcached client", do you mean the github.com/bradfitz/gomemcache module or some internal interface?

@cyriltovena
Contributor Author

Yes, I mean that one, but it's buried deep inside Mimir.

@ortuman
Contributor

ortuman commented Oct 6, 2021

Just took a look, and it seems like the memcached client connection already has a read/write buffer that is reused on every operation. Maybe I'm overlooking something, or the buffer drop happens elsewhere?

@cyriltovena
Contributor Author

It's more about the item data sent back to users of the library. Take a profile of the store-gateway; it should be one of the biggest allocators. The author left a TODO about it too, if I remember correctly.

@ortuman
Contributor

ortuman commented Oct 6, 2021

gotcha! 👍

@cyriltovena
Contributor Author

This one: https://github.com/bradfitz/gomemcache/blob/a41fca850d0b6f392931a78cbae438803ea0b886/memcache/memcache.go#L509

Would be awesome to have a way to reuse them when the item is not needed anymore.

@ortuman
Contributor

ortuman commented Oct 6, 2021

Would be awesome to have a way to reuse them when the item is not needed anymore.

Definitely agree. I think this is a point of improvement that could have a quite positive impact.

Maybe we can simply add a Close() error method to the Item type, to be explicitly called by the invoker. When called, we put the buffer back into a shared pool. This has the downside of having to manually invoke this method after every Get or GetMulti operation, but in the worst case the GC will eventually reclaim the buffer instead of it going back into the pool. Not too bad.
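Something like this minimal sketch (valuePool and the exact Close semantics are just for illustration, not final):

// valuePool would also back the value allocations in the read path.
var valuePool = sync.Pool{
	New: func() interface{} { b := make([]byte, 0, 1024); return &b },
}

// Close returns the item's value buffer to the shared pool. The caller
// must not touch it.Value afterwards.
func (it *Item) Close() error {
	if it.Value != nil {
		buf := it.Value[:0]
		valuePool.Put(&buf)
		it.Value = nil
	}
	return nil
}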

WDYT?

@cyriltovena
Contributor Author

That's the spirit. I think the hardest part is the usage.

If you go for it, the Prometheus codebase has a pool that is bucketed and should work nicely. Maybe the buckets need to be configurable.
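For reference, a rough sketch of a size-bucketed pool in that spirit (sizes and API here are illustrative, not the actual Prometheus package; needs "sync" imported):

type BucketedPool struct {
	buckets []sync.Pool
	sizes   []int
}

// NewBucketedPool creates buckets with capacities minSize,
// minSize*factor, ... up to maxSize.
func NewBucketedPool(minSize, maxSize, factor int) *BucketedPool {
	p := &BucketedPool{}
	for s := minSize; s <= maxSize; s *= factor {
		p.sizes = append(p.sizes, s)
	}
	p.buckets = make([]sync.Pool, len(p.sizes))
	return p
}

// Get returns a zero-length buffer with capacity >= n, falling back to
// a plain allocation when n exceeds the largest bucket.
func (p *BucketedPool) Get(n int) *[]byte {
	for i, size := range p.sizes {
		if n <= size {
			if b, ok := p.buckets[i].Get().(*[]byte); ok {
				return b
			}
			b := make([]byte, 0, size)
			return &b
		}
	}
	b := make([]byte, 0, n)
	return &b
}

// Put hands the buffer back to the largest bucket it can serve, so Get
// never returns a buffer with less capacity than the bucket promises.
func (p *BucketedPool) Put(b *[]byte) {
	for i := len(p.sizes) - 1; i >= 0; i-- {
		if cap(*b) >= p.sizes[i] {
			*b = (*b)[:0]
			p.buckets[i].Put(b)
			return
		}
	}
	// Smaller than the smallest bucket: drop it and let the GC reclaim it.
}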

@ortuman
Contributor

ortuman commented Oct 7, 2021

When you get a chance, take a look at these changes:

https://github.com/ortuman/gomemcache/commit/8bb94a59999fc52836308966af8ec058d3d706f0

Also, I've done some benchmarking and seen a slightly better result when reusing buffers. I guess the true value of this solution will be more noticeable in the long term, by minimizing the number of GC pauses.

➜  memcache git:(master) ✗ go test -bench=BenchmarkItemValuePool -benchtime=20s

goos: darwin
goarch: arm64
pkg: github.com/bradfitz/gomemcache/memcache
BenchmarkItemValuePool/benchmark-no-buffer-reuse-8         	  492558	     45478 ns/op
BenchmarkItemValuePool/benchmark-buffer-reuse-8            	  549673	     44162 ns/op
PASS
ok  	github.com/bradfitz/gomemcache/memcache	54.019s

@cyriltovena
Contributor Author

The benchmark is wrong: they both make the slice.

I think you should:

@ortuman
Contributor

ortuman commented Oct 11, 2021

The benchmark is wrong: they both make the slice.

The first approach does not invoke the Close method, whereas the second one does; that's why no buffer object is reused in the first benchmark case.

Anyway, I like the idea of using a bucketed buffer pool instead. This would help reduce fragmentation, and thus GC interventions.

Take a look at ortuman/gomemcache@71beaed139c365c02400465e32bf80614da6247a and tell me how you feel about it.

Also, here's a table of preliminary benchmark results:

goos: darwin
goarch: arm64
pkg: github.com/bradfitz/gomemcache/memcache
BenchmarkItemValuePool
BenchmarkItemValuePool/benchmark-no-buffer-reuse
BenchmarkItemValuePool/benchmark-no-buffer-reuse-8         	   22761	     45378 ns/op
BenchmarkItemValuePool/benchmark-buffer-reuse
BenchmarkItemValuePool/benchmark-buffer-reuse-8            	   27661	     39797 ns/op
PASS

@cyriltovena
Contributor Author

It's better, but I don't think you need a bytes.Buffer; you can use the slice and re-slice it to the required length.
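Roughly like this (readItemValue is a hypothetical helper, just to illustrate, reusing the bucketed pool sketched earlier):

func readItemValue(r *bufio.Reader, pool *BucketedPool, size int) ([]byte, error) {
	buf := pool.Get(size + 2) // +2 for the trailing \r\n
	*buf = (*buf)[:size+2]    // re-slice to the exact length needed
	if _, err := io.ReadFull(r, *buf); err != nil {
		pool.Put(buf)
		return nil, err
	}
	return (*buf)[:size], nil // drop the \r\n without copying
}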

@ortuman
Contributor

ortuman commented Oct 11, 2021

Oh! You're 100% right. Just changed it @ https://github.com/ortuman/gomemcache/commit/ab4be5bd29b01f1f7687db3b4333890a514ac6a2

@pracucci
Collaborator

Thanks @ortuman for working on this! Working on your personal repo is fine for experimenting, but if we end up vendoring this in Mimir then please fork gomemcache under the grafana organization and open a PR with your changes so we can do a proper review. Thanks!

@ortuman
Contributor

ortuman commented Oct 13, 2021

Sure thing! As you mentioned, I just wanted to experiment in my own repo before doing a formal fork. If this goes ahead, I'll follow your suggestion. Thanks!

@bboreham
Contributor

Note that Mimir currently uses the https://github.com/themihai/gomemcache fork, in order to add a circuit breaker (only in the query-frontend right now).

@colega
Contributor

colega commented Oct 28, 2021

👋 Hey, just passing by: while investigating a different issue on dev, I noticed on a profile that a lot of time is spent in the scanGetResponseLine function, and it looks unoptimised in several different ways.

The most obvious one is that we could use

n, err := fmt.Fscanf(bytes.NewReader(line), pattern, dest...)

instead of

n, err := fmt.Sscanf(string(line), pattern, dest...)

to avoid copying every single line just to scan it.

Apart from that, I'd consider implementing a custom parser for that line format instead of relying on fmt.Sscanf, which seems to underperform here. (I'm not familiar with the memcached response format, but do we really need to decode Unicode here? It seems that a lot of time is spent just checking whether the next rune is a space.)

@colega
Contributor

colega commented Jan 11, 2022

Just for reference, it would be great to have the GATS command implemented too: #718

@colega
Contributor

colega commented Feb 3, 2022

Some more data: I was checking continuous profiling on dev and realised that the store-gateway spends 12% of its time just parsing the memcached response strings:

[flamegraph image: flamegraph_visual_03_01_2022]

@bboreham
Contributor

To avoid copying every single line just to scan it.

I wanted to check the size of this problem, so I grabbed another profile.

Total: 6.89TB
ROUTINE ======================== github.com/bradfitz/gomemcache/memcache.scanGetResponseLine in bradfitz/gomemcache/memcache/memcache.go
   52.24GB   104.95GB (flat, cum)  1.49% of Total
         .          .    500:   }
         .          .    501:}
         .          .    502:
         .          .    503:// scanGetResponseLine populates it and returns the declared size of the item.
         .          .    504:// It does not read the bytes of the item.
    6.08GB     6.08GB    505:func scanGetResponseLine(line []byte, it *Item) (size int, err error) {
         .          .    506:   pattern := "VALUE %s %d %d %d\r\n"
         .          .    507:   dest := []interface{}{&it.Key, &it.Flags, &size, &it.casid}
         .          .    508:   if bytes.Count(line, space) == 3 {
         .          .    509:           pattern = "VALUE %s %d %d\r\n"
         .          .    510:           dest = dest[:3]
         .          .    511:   }
   46.16GB    98.87GB    512:   n, err := fmt.Sscanf(string(line), pattern, dest...)
         .          .    513:   if err != nil || n != len(dest) {
  512.01kB   512.01kB    514:           return -1, fmt.Errorf("memcache: unexpected line in get response: %q", line)
         .          .    515:   }
         .          .    516:   return size, nil
         .          .    517:}

So copying the line is ~0.7% of all allocations; probably not the big win.

Agreed that a custom parser can perform better.

@bboreham
Contributor

bboreham commented Feb 16, 2022

I coded scanGetResponseLine() using IndexByte() instead of Sscanf().

bradfitz/gomemcache@master...bboreham:faster-scanline

name                   old time/op    new time/op    delta
ScanGetResponseLine-4    4.06µs ± 1%    0.13µs ± 4%  -96.89%  (p=0.008 n=5+5)

name                   old alloc/op   new alloc/op   delta
ScanGetResponseLine-4      128B ± 0%       24B ± 0%  -81.25%  (p=0.008 n=5+5)

name                   old allocs/op  new allocs/op  delta
ScanGetResponseLine-4      7.00 ± 0%      1.00 ± 0%  -85.71%  (p=0.008 n=5+5)

(Note I still copy the line to a string, because we need a string to put in Item.Key. I'm copying more than is strictly needed, but it simplifies the code.) The shape of the change is roughly the sketch below.
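A sketch of the approach, not the exact code in the branch: the response line looks like "VALUE <key> <flags> <bytes> [<casid>]\r\n", so we can split it on single spaces and parse the numbers with strconv instead of going through fmt.Sscanf.

func scanGetResponseLine(line []byte, it *Item) (size int, err error) {
	errf := func() (int, error) {
		return -1, fmt.Errorf("memcache: unexpected line in get response: %q", line)
	}
	const prefix = "VALUE "
	s := string(line) // the one copy; Item.Key needs a string anyway
	if !strings.HasPrefix(s, prefix) || !strings.HasSuffix(s, "\r\n") {
		return errf()
	}
	s = s[len(prefix) : len(s)-2]
	next := func() string { // consume the field up to the next space
		if i := strings.IndexByte(s, ' '); i >= 0 {
			f := s[:i]
			s = s[i+1:]
			return f
		}
		f := s
		s = ""
		return f
	}
	it.Key = next()
	flags, err := strconv.ParseUint(next(), 10, 32)
	if err != nil {
		return errf()
	}
	it.Flags = uint32(flags)
	if size, err = strconv.Atoi(next()); err != nil {
		return errf()
	}
	if s != "" { // optional CAS id
		if it.casid, err = strconv.ParseUint(next(), 10, 64); err != nil || s != "" {
			return errf()
		}
	}
	return size, nil
}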

@colega
Contributor

colega commented Feb 21, 2022

I coded scanGetResponseLine() using IndexByte() instead of Sscanf().

Awesome!

Since we're using a fork, can you send a PR to the fork? Then again, the fork hasn't been updated in 4 years, so maybe it's just easier for you to bring in the required changes.

@bboreham
Contributor

<fx: time passes...>

I have created https://github.com/grafana/gomemcache/ with the above change.

@56quarters
Contributor

We're now pooling memory in Mimir when reading from the chunks cache, passing a request-scoped pool from the CachingBucket to underlying cache clients: c6a4d93
