New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TestFullParquetReader>AbstractTestParquetReader.testDecimalBackedByINT32 #22130
Comments
I encountered this again today (https://github.com/prestodb/presto/actions/runs/8485176805/job/23249502331?pr=22064) Error: Tests run: 2956, Failures: 1, Errors: 0, Skipped: 90, Time elapsed: 2,788.605 s <<< FAILURE! - in TestSuite |
slightly longer stack trace:
|
Wait, this is not quite the same flake. This is in TestFullParquetReader.testDecimalBackedByFixedLenByteArray whereas the original was in testDecimalBackedByINT32 |
initialSlabSize and maxCapacityHint come from parquetProperties in DefaultV1ValuesWriterFactory:
initialSlabSize == parquetProperties.getInitialSlabSize() maxCapacityHint == parquetProperties.getPageSizeThreshold() These properties are set in ParquetOutputFormat.getRecordWriter with a ParquetProperties.builder pageSizeThreshold is set by
(Yes, it's pageSize in the builder but pageSizeThreshold later) initialSlabSize is set in the constrcutor by
|
I still don't see how this becomes flaky, though perhaps there's a bug in initialSlabSizeHeuristic? |
Is there any chance the configuration is shared between tests? |
I don't immediately see how but I think ParquetTester might be sharing mutable state between test methods. Making the test single threaded and using a new ParquetTester for each method might fix this. |
similar flake: TestFullParquetReader>AbstractTestParquetReader.testArrayOfMapOfStruct:383 » IllegalArgument maxCapacityHint can't be less than initialSlabSize 1024 100 https://github.com/prestodb/presto/actions/runs/8485176805/job/23249502331?pr=22064 |
#22390 tried to fix this by avoiding shared state but failed. That doesn't seem to be the issue here. More work is needed. |
Encountered in #22436 |
Meet again in
|
Meet a failure in another method (https://github.com/prestodb/presto/actions/runs/9006379854/job/24743797907?pr=22080#step:7:11522) |
Yes, I think once we understand why this one is flaking the fix will be obvious. So far no one's found the root cause. Given the broad range of places where this occurs I'm a little concerned that this is a real bug in the non test code somewhere. |
Yes, agree. It seems very likely a real bug in parquet module. |
@elharo, After check the code, I believe the root cause is (although I could not reproduce the test failure in my local, I wrote some code to simulate the situation, and finally caught the inconsistency), if we do not explicitly set a That's not thread safe, especially when setting different page sizes alternately, as when we invoke So, to fix the flaky test, I think set the max page size to 1024 rather than 100 when creating Any misunderstanding please let me know, thanks! cc: @tdcmeehan |
@hantangwangd this seems pretty plausible. I also see that Trino has made their Parquet tests single threaded for this exact issue (see: trinodb/trino#17022). |
@tdcmeehan I think this problem is not just a flaky test, that is, it's possible to happen in actual usage, although the probability is extremely low. Imagine two users, one of them has set the session property After studying the code of |
Is this a bug in parquet then? I don't immediately see where we could fix this in presto code, though we can certainly try to fix it in parquet. |
I believe it's a problem caused by improperly usage of |
Haven't fully traced through the code yet, but this one might be caused by an unfortunate choice of random values in the test.
The text was updated successfully, but these errors were encountered: