Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parquet reader throws EXC_BAD_ACCESS on simple query with filter #846

Closed
yingsu00 opened this issue Jan 6, 2022 · 4 comments
Closed

Parquet reader throws EXC_BAD_ACCESS on simple query with filter #846

yingsu00 opened this issue Jan 6, 2022 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@yingsu00
Copy link
Collaborator

yingsu00 commented Jan 6, 2022

After enabling filter pushdown prestodb/presto#17161 on Presto, the Parquet reader code path is activated. However it fails to run any query. A simple query with filter yields the " EXC_BAD_ACCESS (code=EXC_I386_GPFLT)" error.

Query:
select sum(orderkey) from lineitem where partkey =1;

The stack trace is

facebook::velox::filesystems::(anonymous namespace)::registeredFileSystems() FileSystems.cpp:33
duckdb::AllocatedData::Reset() duckdb.cpp:4392
duckdb::AllocatedData::~AllocatedData() duckdb.cpp:4385
duckdb::AllocatedData::~AllocatedData() duckdb.cpp:4384
std::__1::default_delete<duckdb::AllocatedData>::operator()(duckdb::AllocatedData*) const memory:2084
std::__1::unique_ptr<duckdb::AllocatedData, std::__1::default_delete<duckdb::AllocatedData> >::reset(duckdb::AllocatedData*) memory:2345
std::__1::unique_ptr<duckdb::AllocatedData, std::__1::default_delete<duckdb::AllocatedData> >::~unique_ptr() memory:2299
std::__1::unique_ptr<duckdb::AllocatedData, std::__1::default_delete<duckdb::AllocatedData> >::~unique_ptr() memory:2299
duckdb::ResizeableBuffer::~ResizeableBuffer() parquet-amalgamation.hpp:7031
duckdb::ResizeableBuffer::~ResizeableBuffer() parquet-amalgamation.hpp:7031
duckdb::ParquetReaderScanState::~ParquetReaderScanState() parquet-amalgamation.hpp:7452
duckdb::ParquetReaderScanState::~ParquetReaderScanState() parquet-amalgamation.hpp:7452
facebook::velox::parquet::ParquetRowReader::~ParquetRowReader() ParquetReader.h:33
facebook::velox::parquet::ParquetRowReader::~ParquetRowReader() ParquetReader.h:33
facebook::velox::parquet::ParquetRowReader::~ParquetRowReader() ParquetReader.h:33
std::__1::default_delete<facebook::velox::dwio::common::RowReader>::operator()(facebook::velox::dwio::common::RowReader*) const memory:2084
std::__1::unique_ptr<facebook::velox::dwio::common::RowReader, std::__1::default_delete<facebook::velox::dwio::common::RowReader> >::reset(facebook::velox::dwio::common::RowReader*) memory:2345
facebook::velox::connector::hive::HiveDataSource::next(unsigned long long) HiveConnector.cpp:542
facebook::velox::exec::TableScan::getOutput() TableScan.cpp:95
facebook::velox::exec::Driver::runInternal(std::__1::shared_ptr<facebook::velox::exec::Driver>&, std::__1::shared_ptr<facebook::velox::exec::BlockingState>*) Driver.cpp:409
facebook::velox::exec::Driver::run(std::__1::shared_ptr<facebook::velox::exec::Driver>) Driver.cpp:493
facebook::velox::exec::Driver::enqueue(std::__1::shared_ptr<facebook::velox::exec::Driver>)::$_0::operator()() const Driver.cpp:229
void folly::detail::function::FunctionTraits<void ()>::callSmall<facebook::velox::exec::Driver::enqueue(std::__1::shared_ptr<facebook::velox::exec::Driver>)::$_0>(folly::detail::function::Data&) Function.h:371
folly::detail::function::FunctionTraits<void ()>::operator()() Function.h:400
folly::ThreadPoolExecutor::runTask(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) Exception.h:279
folly::ThreadPoolExecutor::runTask(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) Executor.h:235
folly::ThreadPoolExecutor::runTask(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) ThreadPoolExecutor.cpp:116
folly::CPUThreadPoolExecutor::threadRun(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>) CPUThreadPoolExecutor.cpp:302
decltype(*(std::__1::forward<folly::ThreadPoolExecutor*&>(fp0)).*fp(std::__1::forward<std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>&>(fp1))) std::__1::__invoke<void (folly::ThreadPoolExecutor::*&)(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>&, void>(void (folly::ThreadPoolExecutor::*&)(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>&) type_traits:3688
std::__1::__bind_return<void (folly::ThreadPoolExecutor::*)(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>), std::__1::tuple<folly::ThreadPoolExecutor*, std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread> >, std::__1::tuple<>, __is_valid_bind_return<void (folly::ThreadPoolExecutor::*)(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>), std::__1::tuple<folly::ThreadPoolExecutor*, std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread> >, std::__1::tuple<> >::value>::type std::__1::__apply_functor<void (folly::ThreadPoolExecutor::*)(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>), std::__1::tuple<folly::ThreadPoolExecutor*, std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread> >, 0ul, 1ul, std::__1::tuple<> >(void (folly::ThreadPoolExecutor::*&)(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>), std::__1::tuple<folly::ThreadPoolExecutor*, std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread> >&, std::__1::__tuple_indices<0ul, 1ul>, std::__1::tuple<>&&) functional:2852
std::__1::__bind_return<void (folly::ThreadPoolExecutor::*)(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>), std::__1::tuple<folly::ThreadPoolExecutor*, std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread> >, std::__1::tuple<>, __is_valid_bind_return<void (folly::ThreadPoolExecutor::*)(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>), std::__1::tuple<folly::ThreadPoolExecutor*, std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread> >, std::__1::tuple<> >::value>::type std::__1::__bind<void (folly::ThreadPoolExecutor::*)(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*, std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>&>::operator()<>() functional:2885
void folly::detail::function::FunctionTraits<void ()>::callSmall<std::__1::__bind<void (folly::ThreadPoolExecutor::*)(std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*, std::__1::shared_ptr<folly::ThreadPoolExecutor::Thread>&> >(folly::detail::function::Data&) Function.h:371
folly::detail::function::FunctionTraits<void ()>::operator()() Function.h:400
folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::'lambda'()::operator()() NamedThreadFactory.h:40
decltype(std::__1::forward<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::'lambda'()>(fp)()) std::__1::__invoke<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::'lambda'()>(folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::'lambda'()&&) type_traits:3747
void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::'lambda'()>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::'lambda'()>&, std::__1::__tuple_indices<>) thread:280
void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::'lambda'()> >(void*) thread:291
_pthread_start 0x00007fff203b78fc
thread_start 0x00007fff203b3443

I have root caused the problem. It was because ParquetReader has a duckDB allocator as its member, and it's not managed by any smart pointers. Then:

RowVectorPtr HiveDataSource::next(uint64_t size) {
      ...
    reader_.reset();    // This caused the the ParquetReader object to be destructed and its member allocator also destructed.
    rowReader_.reset();    // ParquetRowReader is trying to destruct the buffers and it makes use of the just deleted allocator.
}

A fix will be sent soon. The idea is not to keep the allocator as a plain member in ParquetReader since it's not used in that class other than getting the types. Other alternative fixes maybe

  1. revert the order of the reset() function. This works but it's not the right way.
  2. Use smart pointer to manage the allocator object. This requires a lot of change in DuckDB code.
    By looking at DuckDB implementation, it seems the allocator was intented to be alive throughout the query lifetime, and managed in the config that is static singleton. So I will try the approach to make the owner of the allocator ParquetRowReader instead of ParquetReader.

However I wonder why these readers are to be reset for every batch(1024 rows). Constructing and destructing these reader objects are costly operations and they should be reused for the split lifetime. @mbasmanova Is there any special reason it's done this way?

@yingsu00 yingsu00 self-assigned this Jan 6, 2022
@mbasmanova
Copy link
Contributor

CC: @ienkovich @majetideepak

@mbasmanova
Copy link
Contributor

However I wonder why these readers are to be reset for every batch(1024 rows).

These are reset only at the end of the split, not at every batch.

@mbasmanova mbasmanova added the bug Something isn't working label Jan 6, 2022
@mbasmanova
Copy link
Contributor

CC: @frankobe

@mbasmanova
Copy link
Contributor

Fixed in #869

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants