
*: update parquet-go to include goroutine leak fix #178

Merged
asubiotto merged 2 commits into main from alfonso-parquet on Sep 15, 2022

Conversation

@asubiotto (Member) commented on Aug 30, 2022

This should alleviate our CI flakes where too many goroutines are alive at one time.

This also pulls in a fix that buffers bloom filter reads.

@asubiotto (Member, Author) commented:

Interesting failure mode on first CI run. Looks like an unrelated flake:

=== RUN   Test_Table_Concurrency/8192
race: limit on 8128 simultaneously alive goroutines is exceeded, dying
FAIL	github.com/polarsignals/frostdb	12.551s

The test only seems to spawn 8 goroutines. Trying to reproduce the failure locally.

@asubiotto (Member, Author) commented:

Looks like the cause is a bunch of leaked goroutines in asyncPages.init:

         Goroutine 105824 in state select, with github.com/segmentio/parquet-go.readPages on top of the stack:
        goroutine 105824 [select]:
        github.com/segmentio/parquet-go.readPages({0x105d992a0, 0xc06f880198}, 0xc06f23e180, 0xc06daf9ab0, 0xc06f23e0c0)
                /Users/asubiotto/go/pkg/mod/github.com/segmentio/parquet-go@v0.0.0-20220830163417-b03c0471ebb0/page.go:222 +0x168
        created by github.com/segmentio/parquet-go.(*asyncPages).init
                /Users/asubiotto/go/pkg/mod/github.com/segmentio/parquet-go@v0.0.0-20220830163417-b03c0471ebb0/page.go:161 +0x270
        ]

Looks like we're spawning 100k goroutines in a single test run. I think the likeliest thing is that this is a latent bug in our page lifecycle management that has only surfaced due to segmentio/parquet-go#297.
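
For context, here is a minimal sketch (not code from this repository) of the pattern behind this kind of leak: depending on the configured read mode, calling Rows() on a row group can start a background readPages goroutine, and that goroutine only exits once the returned parquet.Rows is closed. The file name and helper below are made up for illustration.

package main

import (
	"io"
	"log"
	"os"

	"github.com/segmentio/parquet-go"
)

// scanRowGroup reads every row in a row group. The deferred Close is the
// important part: it releases the underlying page readers so the background
// readPages goroutine (spawned in asyncPages.init) can exit. Omitting it
// leaves that goroutine parked in a select for the remainder of the test
// binary's lifetime.
func scanRowGroup(rg parquet.RowGroup) error {
	rows := rg.Rows()
	defer rows.Close()

	buf := make([]parquet.Row, 64)
	for {
		n, err := rows.ReadRows(buf)
		for _, row := range buf[:n] {
			_ = row // process the row here
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	f, err := os.Open("example.parquet") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	stat, err := f.Stat()
	if err != nil {
		log.Fatal(err)
	}

	pf, err := parquet.OpenFile(f, stat.Size())
	if err != nil {
		log.Fatal(err)
	}

	for _, rg := range pf.RowGroups() {
		if err := scanRowGroup(rg); err != nil {
			log.Fatal(err)
		}
	}
}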

@asubiotto (Member, Author) commented:

The last commit fixes the asyncPages.init leak but CI is still failing. We are still leaking some goroutines (see #180), but this is an issue on main as well. I'm guessing that we're still spawning a ton of goroutines simultaneously when writing rows. I'm wondering if this is expected.

@asubiotto (Member, Author) commented:

Apologies for the excess notifications. I'm going to try peppering the code with some missing Close calls to see if we can at least reduce the number of simultaneously alive goroutines.

@asubiotto (Member, Author) commented:

I found a potential goroutine leak in parquet-go (addressed in segmentio/parquet-go#337) which I think resolves our problem (at least the leak detector no longer screams at me when running without finalizers). I'll merge the upstream fix, update the version on this PR, and then retry.
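
For future readers, this is a sketch of the kind of check the "leak detector" comment refers to, using go.uber.org/goleak; frostdb's actual test wiring may differ, so treat it as an illustration rather than the project's real setup.

package frostdb_test

import (
	"testing"

	"go.uber.org/goleak"
)

// TestMain fails the package's test binary if unexpected goroutines are
// still alive after all tests have finished. A leaked
// parquet-go.readPages goroutine would be reported here with the same
// stack as in the CI failure quoted earlier in this thread.
func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}

goleak.VerifyNone(t) can be used inside an individual test for a per-test check instead.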

The recent parquet version upgrade stressed a latent bug in our code. Lots of
places were missing a rows.Close call, resulting in leaked goroutines.
asubiotto changed the title from "*: update parquet-go to include indexedType fix" to "*: update parquet-go to include goroutine leak fix" on Sep 15, 2022
asubiotto marked this pull request as ready for review on September 15, 2022, 07:12
@asubiotto (Member, Author) commented:

RFAL @thorfour: I pulled in the upstream goroutine leak fix as well as the bloom filter read buffering.

asubiotto merged commit 52b4e60 into main on Sep 15, 2022
asubiotto deleted the alfonso-parquet branch on September 15, 2022, 08:09