
Page offset buffers not repooled #372

Closed
mdisibio opened this issue Oct 13, 2022 · 6 comments · Fixed by #398

@mdisibio (Contributor)

Related to the memory profiling done in #370: it looks like page offset buffers are meant to be pooled, but the pooling is not working as expected. I suspect this logic, which swaps the slices.

Testing methodology
Added simple instrumentation to bufferPool to track outstanding allocations as sum(gets) - sum(puts), and added a print statement here to log the before and after capacities when the offset buffer is swapped.
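
The instrumentation was roughly as follows (a sketch only; the counter and function names here are hypothetical, not parquet-go's):

var bufferGets, bufferPuts atomic.Int64 // import "sync/atomic" (Go 1.19+)

// trackGet is called from bufferPool's get and trackPut from its put, so the
// outstanding bytes per pool can be read as sum(gets) - sum(puts).
func trackGet(b *buffer) { bufferGets.Add(int64(cap(b.data))) }
func trackPut(b *buffer) { bufferPuts.Add(int64(cap(b.data))) }

func outstandingBytes() int64 { return bufferGets.Load() - bufferPuts.Load() }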

Results
The pageOffsetsBufferPool allocates much more than expected, several GB. The other pools show reasonable outstanding allocations (1 KB to ~30 MB).

When obuf.data is swapped for a slice of a different size, the new slice has zero length. Unsure whether this is specific to our schema or otherwise relevant, but wanted to mention it.

// Log the before/after capacities whenever the offset buffer is swapped.
prevCap := cap(obuf.data)
obuf.data = unsafecast.Uint32ToBytes(pageOffsets)
nowCap := cap(obuf.data)
if prevCap != nowCap {
    fmt.Println("offset buffer swapped", prevCap, nowCap)
}

prints:

offset buffer swapped 4194304 0
offset buffer swapped 2097152 0
offset buffer swapped 2097152 0
offset buffer swapped 2097152 0

Solution
Tried this simple fix, but it doesn't quite work: our tests surface some data inconsistencies afterwards.

if obuf != nil {
    obuf = &buffer{
        pool: obuf.pool,
        data: unsafecast.Uint32ToBytes(pageOffsets),
    }
}

Hacking a little more does work, but I'm not familiar enough with this area to propose a proper solution. This change results in 50%+ fewer allocations.

if obuf != nil {
    // Only recreate the buffer when the backing capacity has changed.
    if cap(obuf.data) != cap(pageOffsets)*4 {
        obuf = &buffer{pool: obuf.pool}
    }
    obuf.data = unsafecast.Uint32ToBytes(pageOffsets)
}
@achille-roussel achille-roussel self-assigned this Oct 18, 2022
@achille-roussel achille-roussel added the question Further information is requested label Oct 18, 2022
@achille-roussel (Contributor)

Hello @mdisibio

I believe this issue might be isolated to Go 1.17 and the use of unsafe.Slice in unsafecast.BytesToUint32, which loses the capacity information.
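
Here is a standalone illustration of that capacity loss (my own sketch of a length-based cast, assuming this is roughly what the Go 1.17 code path did):

package main

import (
	"fmt"
	"unsafe"
)

// bytesToUint32 converts by length only: unsafe.Slice returns a slice whose
// capacity equals its length, so any spare capacity of the backing array is dropped.
func bytesToUint32(b []byte) []uint32 {
	return unsafe.Slice((*uint32)(unsafe.Pointer(&b[0])), len(b)/4)
}

func main() {
	b := make([]byte, 16, 4096) // len 16, cap 4096
	u := bytesToUint32(b)
	fmt.Println(len(u), cap(u)) // 4 4: the other 4080 bytes of capacity are lost
}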

A different fix could be this:

diff --git a/column.go b/column.go
index d511623..520ee77 100644
--- a/column.go
+++ b/column.go
@@ -636,7 +636,7 @@ func (c *Column) decodeDataPage(header DataPageHeader, numValues int, repetition
        if pageKind == ByteArray {
                obuf = buffers.get(4 * (numValues + 1))
                defer obuf.unref()
-               pageOffsets = unsafecast.BytesToUint32(obuf.data)
+               pageOffsets = unsafecast.BytesToUint32(obuf.data[:cap(obuf.data)])
        }

        values := pageType.NewValues(pageValues, pageOffsets)
@@ -647,7 +647,8 @@ func (c *Column) decodeDataPage(header DataPageHeader, numValues int, repetition
                vbuf.data = pageValues
        }
        if obuf != nil {
-               obuf.data = unsafecast.Uint32ToBytes(pageOffsets)
+               obuf.data = unsafecast.Uint32ToBytes(pageOffsets[:cap(pageOffsets)])
+               obuf.data = obuf.data[:4*len(pageOffsets)]
        }
        if err != nil {
                return nil, err

I would recommend upgrading to 1.18+ though; there the unsafecast package is written with generics and does not suffer from the same issue.
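
For illustration, a generics-based cast can carry the capacity through by converting over the full backing array and then restoring the length (my own sketch, not the actual parquet-go implementation):

// Slice reinterprets []From as []To, rescaling both len and cap so that a
// round trip like Uint32ToBytes(BytesToUint32(b)) preserves the buffer's
// capacity. Requires import "unsafe".
func Slice[To, From any](data []From) []To {
	var to To
	var from From
	n := len(data) * int(unsafe.Sizeof(from)) / int(unsafe.Sizeof(to))
	c := cap(data) * int(unsafe.Sizeof(from)) / int(unsafe.Sizeof(to))
	if c == 0 {
		return nil
	}
	// data[:1] re-slices up to the capacity so the base pointer is reachable
	// even when len(data) == 0 (c > 0 guarantees cap(data) >= 1).
	return unsafe.Slice((*To)(unsafe.Pointer(&data[:1][0])), c)[:n]
}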

And when Go 1.20 gets released we will need to have more serious discussions about dropping support for 1.17.

@mdisibio (Contributor, Author)

Hi @achille-roussel, as far as I know Tempo has been updated to Go 1.19, and I am running 1.19.1 locally. I tested the proposed lines above and didn't see an improvement. Happy to provide more info or try things out.

@achille-roussel (Contributor)

Thanks for confirming, sorry for making the wrong assumption here.

I'll dig further to understand why the capacity information gets lost.

Super happy to hear that you were able to do the upgrade to 1.19 as well!

@achille-roussel (Contributor)

I would be curious to know whether using the Release/Retain functions introduced in #380 changes the behavior you observed here!

@mdisibio (Contributor, Author)

Hi @achille-roussel, as discussed offline, the new Release/Retain functionality did not change the observed behavior. I spent some more effort in this area and was able to create a more formal code change. I think it also shows more clearly what I believe is happening (the page offset buffer is being swapped and the original lost), though I don't know why.

	if obuf != nil {
		if unsafecast.PointerOf(obuf.data) != unsafecast.PointerOf(pageOffsets) {
			// The page offset buffer is being swapped. Recreate obuf, which
			// allows the original to be unref'd and returned to the pool.
			obuf = &buffer{pool: obuf.pool, data: unsafecast.Uint32ToBytes(pageOffsets)}
		}
	}

This change yields significant memory allocation improvements on top of the benchmarks in Tempo #1818:

name                                      old alloc/op   new alloc/op   delta
BackendBlockSearchTraces/noMatch-12          333MB ±11%      50MB ±17%  -84.97%  (p=0.000 n=10+10)
BackendBlockSearchTraces/partialMatch-12     605MB ± 2%     311MB ± 5%  -48.49%  (p=0.000 n=9+9)
BackendBlockSearchTraces/service.name-12    3.67MB ± 2%    1.51MB ± 3%  -58.73%  (p=0.000 n=10+10)

@mdisibio (Contributor, Author) commented Nov 3, 2022

As discussed offline, the real root cause is that page offset buffers were being unnecessarily allocated for dictionary-encoded byteArray columns. Submitted #398.
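
The shape of that fix is roughly the following (a hedged sketch of the idea only, not the actual #398 patch; isDictionaryEncoding is a hypothetical helper):

// Dictionary-encoded byte array pages store dictionary indexes, so they never
// fill in the offsets buffer; allocating one for them only churns the pool.
// Skip the allocation unless the page actually needs offsets.
if pageKind == ByteArray && !isDictionaryEncoding(header.Encoding()) {
	obuf = buffers.get(4 * (numValues + 1))
	defer obuf.unref()
	pageOffsets = unsafecast.BytesToUint32(obuf.data)
}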

Want to mention that #386 still yielded a minor improvement in CPU and may be worth merging (although it was neutral on memory usage). Here is a benchmark comparison:

name                                      old time/op    new time/op    delta
BackendBlockSearchTraces/noMatch-12          204ms ± 6%     198ms ±31%    ~     (p=0.400 n=9+10)
BackendBlockSearchTraces/partialMatch-12     2.57s ± 2%     2.51s ± 2%  -2.38%  (p=0.002 n=9+8)
BackendBlockSearchTraces/service.name-12    1.57ms ±12%    1.46ms ±12%  -6.94%  (p=0.028 n=9+10)

name                                      old alloc/op   new alloc/op   delta
BackendBlockSearchTraces/noMatch-12          342MB ± 4%     332MB ± 4%  -2.96%  (p=0.011 n=9+8)
BackendBlockSearchTraces/partialMatch-12     601MB ± 1%     607MB ± 2%  +1.11%  (p=0.031 n=9+9)
BackendBlockSearchTraces/service.name-12    3.67MB ± 1%    3.69MB ± 2%    ~     (p=0.460 n=8+10)

name                                      old allocs/op  new allocs/op  delta
BackendBlockSearchTraces/noMatch-12           119k ± 1%      117k ± 4%  -1.73%  (p=0.021 n=8+8)
BackendBlockSearchTraces/partialMatch-12     18.0M ± 0%     18.0M ± 0%    ~     (p=0.438 n=8+9)
BackendBlockSearchTraces/service.name-12     39.2k ± 0%     39.2k ± 0%    ~     (p=0.403 n=10+10)
