
Use arrowutils.Take to filter records #697

Closed
gernest wants to merge 1 commit.

Conversation

@gernest (Contributor) commented Jan 21, 2024

Closes #672

Comment on lines 1770 to +1771
func Test_DB_Limiter(t *testing.T) {
t.Skip()
@gernest (Contributor, Author) Jan 21, 2024

This test is confusing and very flaky. I have confirmed that many iteration counts trigger the memory-limit-exceeded panic and yet pass (recover() to oblivion?), while other cases do the same and fail.

I would have rewritten it, but I couldn't understand what it actually does or tries to accomplish.

@thorfour (Contributor)

Its goal is to ensure that regardless of where the limiter panics, the test itself does not panic, because the recover step catches it.
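For context, a minimal sketch in Go of the recover-based pattern described above; exceedLimit and the test name are hypothetical stand-ins, not the actual FrostDB test or limiter code:

```go
package example

import "testing"

// exceedLimit is a hypothetical stand-in for query work that panics when an
// allocation pushes past the configured memory limit.
func exceedLimit() {
	panic("memory limit exceeded")
}

// The deferred recover catches the limiter panic inside the test goroutine,
// so the test itself does not panic no matter where the limiter trips.
func TestLimiterPanicIsRecovered(t *testing.T) {
	defer func() {
		if r := recover(); r != nil {
			t.Logf("recovered expected limiter panic: %v", r)
		}
	}()
	exceedLimit()
}
```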

@thorfour (Contributor)

I'd prefer not to skip a test without a reason that links to a GitHub issue.

@gernest (Contributor, Author)

> I'd prefer not to skip a test without a reason that links to a GitHub issue.

@thorfour Makes sense. I really don't have a substantial reason apart from the fact that it starts to fail at 192 iterations, and I couldn't find a correlation between the iteration count and the test failures.

As I mentioned earlier, I confirmed iteration counts below 192 triggering the panic and yet the test passes (this is observed even on the main branch, where all iterations trigger it and the test still passes).

Here is the stack trace.

panic: memory limit exceeded

goroutine 2433 [running]:
github.com/polarsignals/frostdb/query.(*LimitAllocator).Allocate(0xc0000c0e80?, 0x3767ab0?)
	/Volumes/code/frostdb/query/memory.go:58 +0x65
github.com/apache/arrow/go/v14/arrow/memory.(*Buffer).Reserve(0xc0001256d0, 0x10116c5?)
	/Users/g.ernest/go/pkg/mod/github.com/asubiotto/arrow/go/v14@v14.0.0-20231129090719-b321865d34e9/arrow/memory/buffer.go:122 +0x44
github.com/apache/arrow/go/v14/arrow/memory.(*Buffer).resize(0xc0001256d0, 0x80, 0xc0?)
	/Users/g.ernest/go/pkg/mod/github.com/asubiotto/arrow/go/v14@v14.0.0-20231129090719-b321865d34e9/arrow/memory/buffer.go:142 +0xe7
github.com/apache/arrow/go/v14/arrow/memory.(*Buffer).Resize(...)
	/Users/g.ernest/go/pkg/mod/github.com/asubiotto/arrow/go/v14@v14.0.0-20231129090719-b321865d34e9/arrow/memory/buffer.go:131
github.com/apache/arrow/go/v14/arrow/array.(*Uint32Builder).init(0xc00037fb00, 0x20)
	/Users/g.ernest/go/pkg/mod/github.com/asubiotto/arrow/go/v14@v14.0.0-20231129090719-b321865d34e9/arrow/array/numericbuilder.gen.go:1087 +0xac
github.com/apache/arrow/go/v14/arrow/array.(*Uint32Builder).Resize(0x2100420?, 0xc0005b0e01?)
	/Users/g.ernest/go/pkg/mod/github.com/asubiotto/arrow/go/v14@v14.0.0-20231129090719-b321865d34e9/arrow/array/numericbuilder.gen.go:1106 +0x31
github.com/apache/arrow/go/v14/arrow/array.(*builder).reserve(0xc00037fb00?, 0x2?, 0xc0004f3e48?)
	/Users/g.ernest/go/pkg/mod/github.com/asubiotto/arrow/go/v14@v14.0.0-20231129090719-b321865d34e9/arrow/array/builder.go:184 +0xe6
github.com/apache/arrow/go/v14/arrow/array.(*Uint32Builder).Reserve(0x230bb90?, 0xc0003e0a50?)
	/Users/g.ernest/go/pkg/mod/github.com/asubiotto/arrow/go/v14@v14.0.0-20231129090719-b321865d34e9/arrow/array/numericbuilder.gen.go:1094 +0x31
github.com/apache/arrow/go/v14/arrow/array.(*dictionaryBuilder).Reserve(...)
	/Users/g.ernest/go/pkg/mod/github.com/asubiotto/arrow/go/v14@v14.0.0-20231129090719-b321865d34e9/arrow/array/dictionary.go:716
github.com/polarsignals/frostdb/pqarrow/arrowutils.takeDictColumn({0x2312020?, 0xc0003e14d0?}, 0xc000324af0, 0x0, {0xc00030b840, 0x2, 0xc0005b0f40?}, 0xc000052140)
	/Volumes/code/frostdb/pqarrow/arrowutils/sort.go:145 +0x1fc
github.com/polarsignals/frostdb/pqarrow/arrowutils.Take.func2()
	/Volumes/code/frostdb/pqarrow/arrowutils/sort.go:105 +0x33
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/Users/g.ernest/go/pkg/mod/golang.org/x/sync@v0.4.0/errgroup/errgroup.go:75 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 2427
	/Users/g.ernest/go/pkg/mod/golang.org/x/sync@v0.4.0/errgroup/errgroup.go:72 +0x96
FAIL	github.com/polarsignals/frostdb	2.935s
FAIL

I think there may be a fundamental flaw in my implementation. Since I can't resolve this, I think we should close this PR.
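The stack trace above shows the panic originating in a goroutine started by errgroup.(*Group).Go. One possible reading, offered only as an assumption rather than a diagnosis of the FrostDB code, is the general Go rule that recover only catches panics raised in the same goroutine, so a deferred recover in the test goroutine cannot intercept a panic from a worker goroutine:

```go
package main

import (
	"fmt"

	"golang.org/x/sync/errgroup"
)

func main() {
	defer func() {
		// This recover never fires for the panic below: recover only works
		// within the goroutine that panicked, so the process crashes instead.
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()

	var g errgroup.Group
	g.Go(func() error {
		panic("memory limit exceeded") // raised in the worker goroutine
	})
	_ = g.Wait()
}
```

If that is what is happening here, each worker goroutine would need its own recover (or would have to turn the panic into an error returned to the errgroup) for the test's deferred recover to matter.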

@gernest (Contributor, Author)

> Its goal is to ensure that regardless of where the limiter panics, the test itself does not panic, because the recover step catches it.

For some reason, I triggered the panic in the test!

@thorfour Maybe you can help take a look? I didn't want to ask the reviewer to go the extra mile, but this case is way beyond me.

Just remove the t.Skip() and run the test.

@thorfour (Contributor)

Yea, I'd love to take a look. So it's exceeding the allocation limit (that's fine; the panic is expected), but we must be doing this outside of the recovery function, which is... unexpected.

I'll take a look today.
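For reference, a rough sketch of the limiting-allocator pattern implied by the stack trace (query.(*LimitAllocator).Allocate panicking with "memory limit exceeded"); this illustrates the general idea and is not FrostDB's implementation:

```go
package example

import (
	"sync/atomic"

	"github.com/apache/arrow/go/v14/arrow/memory"
)

// limitAllocator wraps an arrow memory.Allocator and panics once the total
// outstanding allocation exceeds a fixed byte budget.
type limitAllocator struct {
	mem   memory.Allocator // e.g. memory.NewGoAllocator()
	limit int64            // bytes the query is allowed to hold
	used  int64            // bytes currently handed out
}

func (l *limitAllocator) Allocate(size int) []byte {
	if atomic.AddInt64(&l.used, int64(size)) > l.limit {
		panic("memory limit exceeded")
	}
	return l.mem.Allocate(size)
}

func (l *limitAllocator) Reallocate(size int, b []byte) []byte {
	if atomic.AddInt64(&l.used, int64(size-len(b))) > l.limit {
		panic("memory limit exceeded")
	}
	return l.mem.Reallocate(size, b)
}

func (l *limitAllocator) Free(b []byte) {
	atomic.AddInt64(&l.used, -int64(len(b)))
	l.mem.Free(b)
}
```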

@gernest (Contributor, Author) commented Jan 22, 2024

I will remove buildIndexRanges after this is merged, since we no longer use it. My eyeball test suggests a significant improvement in throughput and memory.

I will create a simple benchmark, but I was hoping someone who can run the bench_test.go benchmarks could help here to get a better picture. I can't run the benchmarks myself because they need a corpus from parca that I don't have access to.
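A skeleton of the kind of micro-benchmark meant here, with hypothetical buildRecord and filterRecord placeholders; a real benchmark would exercise the arrowutils.Take path against the old bitmask filtering:

```go
package example

import (
	"testing"

	"github.com/apache/arrow/go/v14/arrow"
	"github.com/apache/arrow/go/v14/arrow/array"
	"github.com/apache/arrow/go/v14/arrow/memory"
)

// buildRecord constructs a small representative record; a real benchmark
// would use data closer to the parca corpus mentioned above.
func buildRecord(mem memory.Allocator) arrow.Record {
	schema := arrow.NewSchema([]arrow.Field{{Name: "value", Type: arrow.PrimitiveTypes.Int64}}, nil)
	bld := array.NewRecordBuilder(mem, schema)
	defer bld.Release()
	bld.Field(0).(*array.Int64Builder).AppendValues([]int64{1, 2, 3, 4, 5, 6, 7, 8}, nil)
	return bld.NewRecord()
}

// filterRecord stands in for the code path under test; here it just slices
// the record so the skeleton compiles and runs.
func filterRecord(r arrow.Record, indices []int64) arrow.Record {
	return r.NewSlice(0, int64(len(indices)))
}

func BenchmarkFilterRecord(b *testing.B) {
	rec := buildRecord(memory.NewGoAllocator())
	defer rec.Release()
	indices := []int64{0, 2, 4, 6}

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		out := filterRecord(rec, indices)
		out.Release()
	}
}
```

Results from the old and new code paths could then be compared with go test -bench and benchstat, similar to the numbers posted below.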

@thorfour (Contributor) left a comment

Awesome work! Can we get a benchmark here to make sure it's comparable?


@thorfour (Contributor)

name            old time/op    new time/op    delta
Query/Merge-10     128µs ± 4%     123µs ± 2%  -3.99%  (p=0.001 n=10+8)
Query/Range-10     128µs ± 3%     131µs ± 7%    ~     (p=0.400 n=9+10)

name            old alloc/op   new alloc/op   delta
Query/Merge-10     186kB ± 0%     186kB ± 0%    ~     (p=0.090 n=9+10)
Query/Range-10     186kB ± 0%     186kB ± 0%    ~     (p=0.898 n=10+10)

name            old allocs/op  new allocs/op  delta
Query/Merge-10     1.40k ± 0%     1.40k ± 0%    ~     (all equal)
Query/Range-10     1.41k ± 0%     1.41k ± 0%    ~     (all equal)

Ran some of the benchmarks that I could get working.

@gernest (Contributor, Author) commented Jan 22, 2024

> name            old time/op    new time/op    delta
> Query/Merge-10     128µs ± 4%     123µs ± 2%  -3.99%  (p=0.001 n=10+8)
> Query/Range-10     128µs ± 3%     131µs ± 7%    ~     (p=0.400 n=9+10)
>
> name            old alloc/op   new alloc/op   delta
> Query/Merge-10     186kB ± 0%     186kB ± 0%    ~     (p=0.090 n=9+10)
> Query/Range-10     186kB ± 0%     186kB ± 0%    ~     (p=0.898 n=10+10)
>
> name            old allocs/op  new allocs/op  delta
> Query/Merge-10     1.40k ± 0%     1.40k ± 0%    ~     (all equal)
> Query/Range-10     1.41k ± 0%     1.41k ± 0%    ~     (all equal)
>
> Ran some of the benchmarks that I could get working.

Interesting, it seems the numbers are similar. Considering my comment here #697 (comment), it was a good theory, but maybe this approach didn't pan out. I don't mind if we put this to rest (by closing the PR).

I'm open to any suggestions.

@thorfour (Contributor)

> Interesting, it seems the numbers are similar. Considering my comment here #697 (comment), it was a good theory, but maybe this approach didn't pan out. I don't mind if we put this to rest (by closing the PR).
>
> I'm open to any suggestions.

I think I'd still like to merge this. It does show some improvement, and the code is simpler, so if we can fix that test panic I'm fine with this.

@gernest closed this Jan 29, 2024
Successfully merging this pull request may close these issues.

Use compute.FilterRecord instead of bitmask and manually filtering