Creates a custom memcached client to reuse memory #285

Closed
cyriltovena opened this issue Sep 29, 2021 · 24 comments
@cyriltovena
Contributor

The current memcached client always throws away the buffer, which creates a lot of allocations.

It would be great to have a reusable buffer, although this might be difficult since there are a lot of cache interfaces used along the way.
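For context, the allocation happens roughly like this in the upstream read path (paraphrased from memory, not verbatim):

it := new(Item)
size, err := scanGetResponseLine(line, it)
if err != nil {
	return err
}
it.Value = make([]byte, size+2) // fresh allocation for every item
if _, err := io.ReadFull(r, it.Value); err != nil {
	return err
}
it.Value = it.Value[:size] // trim the trailing \r\n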

@ortuman
Contributor

ortuman commented Oct 6, 2021

Just to clarify: by "current memcached client", do you mean the github.com/bradfitz/gomemcache module or some internal interface?

@cyriltovena
Contributor Author

Yes, I mean that one, but it's buried deep inside Mimir.

@ortuman
Contributor

ortuman commented Oct 6, 2021

Just took a look, and it seems like the memcached client connection already has a read/write buffer that is reused on every operation. Maybe I'm overlooking something, or the buffer drop happens elsewhere?

@cyriltovena
Contributor Author

It's more about the item data sent back to users of the library. Take a profile of the store-gateway; it should be one of the biggest allocators. The author left a TODO about it too, if I remember correctly.

@ortuman
Contributor

ortuman commented Oct 6, 2021

gotcha! 👍

@cyriltovena
Contributor Author

This one: https://github.com/bradfitz/gomemcache/blob/a41fca850d0b6f392931a78cbae438803ea0b886/memcache/memcache.go#L509

Would be awesome to have a way to reuse them when the item is not needed anymore.

@ortuman
Contributor

ortuman commented Oct 6, 2021

Would be awesome to have a way to reuse them when the item is not needed anymore.

Definitely agree. I think this is a point of improvement that could have a quite positive impact.

Maybe we can simply add a Close() error method to the Item type, to be explicitly called by the invoker. When called, we put the buffer back into a shared pool. This has the downside of having to manually invoke this method after every Get or GetMulti operation, but in the worst case the GC will eventually reclaim the buffer instead of it going back into the pool. Not too bad.
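Something like this minimal sketch (valuePool and the exact Close semantics are just for illustration, not final):

// valuePool would also back the value allocations in the read path.
var valuePool = sync.Pool{
	New: func() interface{} { b := make([]byte, 0, 1024); return &b },
}

// Close returns the item's value buffer to the shared pool. The caller
// must not touch it.Value afterwards.
func (it *Item) Close() error {
	if it.Value != nil {
		buf := it.Value[:0]
		valuePool.Put(&buf)
		it.Value = nil
	}
	return nil
}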

WDYT?

@cyriltovena
Contributor Author

That's the spirit. I think the hardest part is the usage.

If you go for it, the Prometheus codebase has a pool that is bucketed and should work nicely. Maybe the buckets need to be configurable.
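For reference, a rough sketch of a size-bucketed pool in that spirit (sizes and API here are illustrative, not the actual Prometheus package; needs "sync" imported):

type BucketedPool struct {
	buckets []sync.Pool
	sizes   []int
}

// NewBucketedPool creates buckets with capacities minSize,
// minSize*factor, ... up to maxSize.
func NewBucketedPool(minSize, maxSize, factor int) *BucketedPool {
	p := &BucketedPool{}
	for s := minSize; s <= maxSize; s *= factor {
		p.sizes = append(p.sizes, s)
	}
	p.buckets = make([]sync.Pool, len(p.sizes))
	return p
}

// Get returns a zero-length buffer with capacity >= n, falling back to
// a plain allocation when n exceeds the largest bucket.
func (p *BucketedPool) Get(n int) *[]byte {
	for i, size := range p.sizes {
		if n <= size {
			if b, ok := p.buckets[i].Get().(*[]byte); ok {
				return b
			}
			b := make([]byte, 0, size)
			return &b
		}
	}
	b := make([]byte, 0, n)
	return &b
}

// Put hands the buffer back to the largest bucket it can serve, so Get
// never returns a buffer with less capacity than the bucket promises.
func (p *BucketedPool) Put(b *[]byte) {
	for i := len(p.sizes) - 1; i >= 0; i-- {
		if cap(*b) >= p.sizes[i] {
			*b = (*b)[:0]
			p.buckets[i].Put(b)
			return
		}
	}
	// Smaller than the smallest bucket: drop it and let the GC reclaim it.
}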

@ortuman
Contributor

ortuman commented Oct 7, 2021

When you get a chance, take a look at these changes:

https://github.com/ortuman/gomemcache/commit/8bb94a59999fc52836308966af8ec058d3d706f0

Also, I've done some benchmarking and seen a slightly better result when reusing buffers. I guess the true value of this solution will be more noticeable in the long term, by minimizing the number of GC pauses.

➜  memcache git:(master) ✗ go test -bench=BenchmarkItemValuePool -benchtime=20s

goos: darwin
goarch: arm64
pkg: github.com/bradfitz/gomemcache/memcache
BenchmarkItemValuePool/benchmark-no-buffer-reuse-8         	  492558	     45478 ns/op
BenchmarkItemValuePool/benchmark-buffer-reuse-8            	  549673	     44162 ns/op
PASS
ok  	github.com/bradfitz/gomemcache/memcache	54.019s

@cyriltovena
Contributor Author

The benchmark is wrong: they both make the slice.

I think you should:

@ortuman
Contributor

ortuman commented Oct 11, 2021

The benchmark is wrong: they both make the slice.

The first approach does not invoke the Close method, whereas the second one does; that's why no buffer object is reused in the first benchmark case.

Anyway, I like the idea of using a bucketed buffer pool instead. This would help reduce fragmentation, and thus GC interventions.

Take a look at ortuman/gomemcache@71beaed139c365c02400465e32bf80614da6247a and tell me how you feel about it.

Also, here's a table of preliminary benchmark results:

goos: darwin
goarch: arm64
pkg: github.com/bradfitz/gomemcache/memcache
BenchmarkItemValuePool
BenchmarkItemValuePool/benchmark-no-buffer-reuse
BenchmarkItemValuePool/benchmark-no-buffer-reuse-8         	   22761	     45378 ns/op
BenchmarkItemValuePool/benchmark-buffer-reuse
BenchmarkItemValuePool/benchmark-buffer-reuse-8            	   27661	     39797 ns/op
PASS

@cyriltovena
Contributor Author

It's better, but I don't think you need a bytes.Buffer; you can use the slice and re-slice it to the required length.
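Roughly like this (readItemValue is a hypothetical helper, just to illustrate, reusing the bucketed pool sketched earlier):

func readItemValue(r *bufio.Reader, pool *BucketedPool, size int) ([]byte, error) {
	buf := pool.Get(size + 2) // +2 for the trailing \r\n
	*buf = (*buf)[:size+2]    // re-slice to the exact length needed
	if _, err := io.ReadFull(r, *buf); err != nil {
		pool.Put(buf)
		return nil, err
	}
	return (*buf)[:size], nil // drop the \r\n without copying
}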

@ortuman
Contributor

ortuman commented Oct 11, 2021

Oh! You're 100% right. Just changed it @ https://github.com/ortuman/gomemcache/commit/ab4be5bd29b01f1f7687db3b4333890a514ac6a2

@pracucci
Collaborator

Thanks @ortuman for working on this! Working on your personal repo is fine for experimenting, but if we end up vendoring this in Mimir then please fork gomemcache under the grafana organization and open a PR with your changes so we can do a proper review. Thanks!

@ortuman
Contributor

ortuman commented Oct 13, 2021

Sure thing! As you mentioned, I just wanted to experiment in my own repo before doing a formal fork. If this goes ahead, I'll follow your suggestion. Thanks!

@bboreham
Contributor

Note that Mimir currently uses the https://github.com/themihai/gomemcache fork, in order to add a circuit breaker (only in the query-frontend right now).

@colega
Contributor

colega commented Oct 28, 2021

👋 Hey, just passing by: while investigating a different issue on dev, I noticed on a profile that a lot of time is spent in the scanGetResponseLine function, and it looks unoptimised in several different ways.

The most obvious one is that we could use

n, err := fmt.Fscanf(bytes.NewReader(line), pattern, dest...)

instead of

n, err := fmt.Sscanf(string(line), pattern, dest...)

to avoid copying every single line just to scan it.

Apart from that, I'd consider implementing a custom parser for that line format instead of relying on fmt.Sscanf, which seems to underperform here. (I'm not familiar with the memcached response format, but do we really need to decode Unicode here? It seems that a lot of time is spent just checking whether the next rune is a space.)

@colega
Contributor

colega commented Jan 11, 2022

Just for reference, it would be great to have the GATS command implemented too: #718

@colega
Contributor

colega commented Feb 3, 2022

Some more data: I was checking continuous profiling on dev and realised that the store-gateway spends 12% of its time just parsing the memcached response strings:

[flamegraph image: flamegraph_visual_03_01_2022]

@bboreham
Contributor

To avoid copying every single line just to scan it.

I wanted to check the size of this problem, so I grabbed another profile.

Total: 6.89TB
ROUTINE ======================== github.com/bradfitz/gomemcache/memcache.scanGetResponseLine in bradfitz/gomemcache/memcache/memcache.go
   52.24GB   104.95GB (flat, cum)  1.49% of Total
         .          .    500:   }
         .          .    501:}
         .          .    502:
         .          .    503:// scanGetResponseLine populates it and returns the declared size of the item.
         .          .    504:// It does not read the bytes of the item.
    6.08GB     6.08GB    505:func scanGetResponseLine(line []byte, it *Item) (size int, err error) {
         .          .    506:   pattern := "VALUE %s %d %d %d\r\n"
         .          .    507:   dest := []interface{}{&it.Key, &it.Flags, &size, &it.casid}
         .          .    508:   if bytes.Count(line, space) == 3 {
         .          .    509:           pattern = "VALUE %s %d %d\r\n"
         .          .    510:           dest = dest[:3]
         .          .    511:   }
   46.16GB    98.87GB    512:   n, err := fmt.Sscanf(string(line), pattern, dest...)
         .          .    513:   if err != nil || n != len(dest) {
  512.01kB   512.01kB    514:           return -1, fmt.Errorf("memcache: unexpected line in get response: %q", line)
         .          .    515:   }
         .          .    516:   return size, nil
         .          .    517:}

So copying the line is ~0.7% of all allocations; probably not the big win.

Agreed that a custom parser can perform better.

@bboreham
Contributor

bboreham commented Feb 16, 2022

I coded scanGetResponseLine() using IndexByte() instead of Sscanf().

bradfitz/gomemcache@master...bboreham:faster-scanline

name                   old time/op    new time/op    delta
ScanGetResponseLine-4    4.06µs ± 1%    0.13µs ± 4%  -96.89%  (p=0.008 n=5+5)

name                   old alloc/op   new alloc/op   delta
ScanGetResponseLine-4      128B ± 0%       24B ± 0%  -81.25%  (p=0.008 n=5+5)

name                   old allocs/op  new allocs/op  delta
ScanGetResponseLine-4      7.00 ± 0%      1.00 ± 0%  -85.71%  (p=0.008 n=5+5)

(Note I still copy the line to a string, because we need a string to put in Item.Key. I'm copying more than is strictly needed, but it simplifies the code.) The shape of the change is roughly the sketch below.
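A sketch of the approach, not the exact code in the branch: the response line looks like "VALUE <key> <flags> <bytes> [<casid>]\r\n", so we can split it on single spaces and parse the numbers with strconv instead of going through fmt.Sscanf.

func scanGetResponseLine(line []byte, it *Item) (size int, err error) {
	errf := func() (int, error) {
		return -1, fmt.Errorf("memcache: unexpected line in get response: %q", line)
	}
	const prefix = "VALUE "
	s := string(line) // the one copy; Item.Key needs a string anyway
	if !strings.HasPrefix(s, prefix) || !strings.HasSuffix(s, "\r\n") {
		return errf()
	}
	s = s[len(prefix) : len(s)-2]
	next := func() string { // consume the field up to the next space
		if i := strings.IndexByte(s, ' '); i >= 0 {
			f := s[:i]
			s = s[i+1:]
			return f
		}
		f := s
		s = ""
		return f
	}
	it.Key = next()
	flags, err := strconv.ParseUint(next(), 10, 32)
	if err != nil {
		return errf()
	}
	it.Flags = uint32(flags)
	if size, err = strconv.Atoi(next()); err != nil {
		return errf()
	}
	if s != "" { // optional CAS id
		if it.casid, err = strconv.ParseUint(next(), 10, 64); err != nil || s != "" {
			return errf()
		}
	}
	return size, nil
}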

@colega
Contributor

colega commented Feb 21, 2022

I coded scanGetResponseLine() using IndexByte() instead of Sscanf().

Awesome!

Since we're using a fork, can you send a PR to the fork? Then again, the fork hasn't been updated in 4 years, so maybe it's just easier for you to bring in the required changes.

@bboreham
Contributor

<fx: time passes...>

I have created https://github.com/grafana/gomemcache/ with the above change.

@56quarters
Contributor

We're now pooling memory in Mimir when reading from the chunks cache, passing a request-scoped pool from the CachingBucket to underlying cache clients: c6a4d93
