After enabling filter pushdown (prestodb/presto#17161) on Presto, the Parquet reader code path is activated, but it fails to run any query. Even a simple query with a filter crashes with an "EXC_BAD_ACCESS (code=EXC_I386_GPFLT)" error.
Query:
select sum(orderkey) from lineitem where partkey =1;
I have root-caused the problem. ParquetReader holds a DuckDB allocator as a plain member that is not managed by any smart pointer, so the allocator is destroyed together with the reader. Then:
RowVectorPtr HiveDataSource::next(uint64_t size) {
...
reader_.reset(); // Destroys the ParquetReader object, and its allocator member along with it.
rowReader_.reset(); // ParquetRowReader's destructor frees its buffers through the allocator that was just destroyed.
}
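The ordering problem can be reproduced in isolation. The following is a minimal sketch, not the actual Velox or DuckDB code: MockAllocator, MockReader, MockRowReader and resetInOrder are hypothetical stand-ins, and the row reader only logs where the real one would free buffers, so the sketch itself never touches the dangling pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the real Velox/DuckDB types.
using EventLog = std::vector<std::string>;

struct MockAllocator {
  explicit MockAllocator(EventLog& log) : log_(log) {}
  ~MockAllocator() { log_.push_back("allocator destroyed"); }
  EventLog& log_;
};

struct MockReader {
  explicit MockReader(EventLog& log) : allocator_(log) {}
  MockAllocator allocator_;  // plain member: dies together with the reader
};

struct MockRowReader {
  MockRowReader(MockReader& reader, EventLog& log)
      : allocator_(&reader.allocator_), log_(log) {}
  ~MockRowReader() {
    // The real row reader would free its buffers through allocator_ here.
    // The mock only records the event and never dereferences the pointer.
    log_.push_back("row reader frees buffers");
  }
  MockAllocator* allocator_;  // non-owning; dangles once the reader is gone
  EventLog& log_;
};

// Resets the two objects in the requested order and returns the event log.
EventLog resetInOrder(bool readerFirst) {
  EventLog log;
  auto reader = std::make_unique<MockReader>(log);
  auto rowReader = std::make_unique<MockRowReader>(*reader, log);
  if (readerFirst) {
    reader.reset();     // same order as in HiveDataSource::next()
    rowReader.reset();
  } else {
    rowReader.reset();
    reader.reset();
  }
  assert(log.size() == 2);
  return log;
}
```

With readerFirst set to true, the log shows the allocator being destroyed before the row reader frees its buffers, which is the sequence that crashes in the real code.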
A fix will be sent soon. The idea is to stop keeping the allocator as a plain member of ParquetReader, since that class only uses it to get the types. Alternative fixes would be:
- Reverse the order of the reset() calls. This works, but it's not the right way.
- Use a smart pointer to manage the allocator object. This requires a lot of changes in the DuckDB code.
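For comparison, the smart-pointer alternative could look roughly like the sketch below. It uses hypothetical mock types (MockAllocator, MockReader, MockRowReader, sharedResetReaderFirst) and assumes both readers can hold a std::shared_ptr to the allocator, which is exactly the part that would need changes on the DuckDB side.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the real Velox/DuckDB types.
using EventLog = std::vector<std::string>;

struct MockAllocator {
  explicit MockAllocator(EventLog& log) : log_(log) {}
  ~MockAllocator() { log_.push_back("allocator destroyed"); }
  EventLog& log_;
};

struct MockReader {
  explicit MockReader(EventLog& log)
      : allocator_(std::make_shared<MockAllocator>(log)) {}
  std::shared_ptr<MockAllocator> allocator_;  // shared ownership
};

struct MockRowReader {
  MockRowReader(const MockReader& reader, EventLog& log)
      : allocator_(reader.allocator_), log_(log) {}
  ~MockRowReader() {
    // The allocator is still alive here because this object co-owns it.
    assert(allocator_ != nullptr);
    log_.push_back("row reader frees buffers");
  }
  std::shared_ptr<MockAllocator> allocator_;
  EventLog& log_;
};

// Resets the reader first, as HiveDataSource::next() does, and returns the log.
EventLog sharedResetReaderFirst() {
  EventLog log;
  auto reader = std::make_unique<MockReader>(log);
  auto rowReader = std::make_unique<MockRowReader>(*reader, log);
  reader.reset();     // allocator survives: the row reader still holds a reference
  rowReader.reset();  // buffers are freed, then the last reference is dropped
  return log;
}
```

Here the reset order no longer matters, because the allocator is only destroyed once the last owner releases it.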
Looking at the DuckDB implementation, it seems the allocator was intended to stay alive for the whole query lifetime and to be managed by the config, which is a static singleton. So I will try making ParquetRowReader the owner of the allocator instead of ParquetReader.
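A minimal sketch of that ownership change, again with hypothetical mock types rather than the real classes (MockAllocator, MockBuffer, MockReader, MockRowReader and resetReaderThenRowReader are illustrative names). It relies on the C++ rule that members are destroyed in reverse declaration order, so declaring the allocator before the buffers guarantees the buffers are freed first.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the real Velox/DuckDB types.
using EventLog = std::vector<std::string>;

struct MockAllocator {
  explicit MockAllocator(EventLog& log) : log_(log) {}
  ~MockAllocator() {
    alive = false;
    log_.push_back("allocator destroyed");
  }
  bool alive = true;
  EventLog& log_;
};

struct MockBuffer {
  MockBuffer(MockAllocator& allocator, EventLog& log)
      : allocator_(allocator), log_(log) {}
  ~MockBuffer() {
    // Safe in this design: the owning row reader destroys buffers first.
    log_.push_back(allocator_.alive ? "buffer freed while allocator alive"
                                    : "buffer freed after allocator died");
  }
  MockAllocator& allocator_;
  EventLog& log_;
};

// The reader no longer holds an allocator at all.
struct MockReader {};

struct MockRowReader {
  explicit MockRowReader(EventLog& log)
      : allocator_(log), buffer_(allocator_, log) {}
  // Destroyed in reverse declaration order: buffer_ first, then allocator_.
  MockAllocator allocator_;
  MockBuffer buffer_;
};

// Same reset order as HiveDataSource::next(); returns the event log.
EventLog resetReaderThenRowReader() {
  EventLog log;
  auto reader = std::make_unique<MockReader>();
  auto rowReader = std::make_unique<MockRowReader>(log);
  reader.reset();
  rowReader.reset();
  assert(log.size() == 2);
  return log;
}
```

Because the row reader owns both the allocator and the buffers, resetting the reader first has no effect on the allocator's lifetime.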
However, I wonder why these readers are reset for every batch (1024 rows). Constructing and destructing these reader objects is costly, and they should be reused for the lifetime of the split. @mbasmanova Is there any special reason it's done this way?