
Core dump happens in StringColumnReader::processFilter during parquet read #9757

Closed · yma11 opened this issue May 9, 2024 · 3 comments
Labels: bug (Something isn't working), parquet, triage (Newly created issue that needs attention.)

yma11 (Contributor) commented May 9, 2024

Bug description

A core dump occurs when reading a Parquet file. The key part of the stack trace follows:

Stack: [0x00007f4567200000,0x00007f4567300000],  sp=0x00007f45672fc8a0,  free space=1010k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libvelox.so+0x52e6a48]  void facebook::velox::parquet::PageReader::readWithVisitor<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> >(facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true>&)+0xa68
C  [libvelox.so+0x52ff99f]  void facebook::velox::parquet::StringColumnReader::processFilter<true, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader> >(facebook::velox::common::Filter*, folly::Range<int const*>, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>)+0x10f

System information

Velox System Info v0.0.2
Commit: 61ef376
CMake Version: 3.22.1
System: Linux-5.15.0-102-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 12.3.0
C Compiler: /usr/bin/cc
C Compiler Version: 12.3.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

yma11 added the bug, parquet, and triage labels on May 9, 2024
majetideepak self-assigned this on May 17, 2024
majetideepak (Collaborator) commented:

I will take a look at this. @yma11, can you share the table DDL you used?

yma11 (Contributor, Author) commented May 21, 2024

@majetideepak Thanks for looking at this issue. The file was generated with the parquet-mr DataGenerator. Inspected with Spark, it has the following schema:

+-------------+---------+-------+
|col_name     |data_type|comment|
+-------------+---------+-------+
|binary_field |binary   |null   |
|int32_field  |int      |null   |
|int64_field  |bigint   |null   |
|boolean_field|boolean  |null   |
|float_field  |float    |null   |
|double_field |double   |null   |
|flba_field   |binary   |null   |
|int96_field  |timestamp|null   |
+-------------+---------+-------+

The flba_field column has the physical type FIXED_LEN_BYTE_ARRAY; I'm not sure whether this is the problem. Here is the relevant output from inspecting the file:

############ Column(binary_field) ############
name: binary_field
path: binary_field
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: None
converted_type (legacy): NONE

############ Column(int32_field) ############
name: int32_field
path: int32_field
max_definition_level: 0
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE

############ Column(int64_field) ############
name: int64_field
path: int64_field
max_definition_level: 0
max_repetition_level: 0
physical_type: INT64
logical_type: None
converted_type (legacy): NONE

############ Column(boolean_field) ############
name: boolean_field
path: boolean_field
max_definition_level: 0
max_repetition_level: 0
physical_type: BOOLEAN
logical_type: None
converted_type (legacy): NONE

############ Column(float_field) ############
name: float_field
path: float_field
max_definition_level: 0
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE

############ Column(double_field) ############
name: double_field
path: double_field
max_definition_level: 0
max_repetition_level: 0
physical_type: DOUBLE
logical_type: None
converted_type (legacy): NONE

############ Column(flba_field) ############
name: flba_field
path: flba_field
max_definition_level: 0
max_repetition_level: 0
physical_type: FIXED_LEN_BYTE_ARRAY
logical_type: None
converted_type (legacy): NONE

############ Column(int96_field) ############
name: int96_field
path: int96_field
max_definition_level: 0
max_repetition_level: 0
physical_type: INT96
logical_type: None
converted_type (legacy): NONE

majetideepak (Collaborator) commented:

Quoting @yma11: "The flba_field has a physical type FIXED_LEN_BYTE_ARRAY, not sure whether this is the problem."

This is the problem. We don't support reading a Parquet FIXED_LEN_BYTE_ARRAY (FLBA) column into a Velox VARBINARY column. I added this support in #9887.
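A hedged illustration of why the physical type matters: in Parquet's PLAIN encoding, each BYTE_ARRAY value is prefixed by a 4-byte little-endian length, whereas FIXED_LEN_BYTE_ARRAY values carry no prefix and their width comes from the schema's type_length. The C++ sketch below is not Velox code; `decodePlainByteArray` and `decodePlainFixedLenByteArray` are hypothetical helpers, and it assumes PLAIN-encoded, non-null values (the pages in this particular file may use a different encoding). It shows one plausible way a reader that treats FLBA bytes as BYTE_ARRAY goes wrong: the first four data bytes are interpreted as a length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical helper: decode PLAIN-encoded BYTE_ARRAY values.
// Each value is a 4-byte little-endian length followed by that many bytes.
std::vector<std::string_view> decodePlainByteArray(
    const uint8_t* data, size_t size, size_t numValues) {
  std::vector<std::string_view> out;
  size_t pos = 0;
  for (size_t i = 0; i < numValues; ++i) {
    if (size - pos < 4) {
      throw std::out_of_range("truncated length prefix");
    }
    uint32_t len = uint32_t(data[pos]) | (uint32_t(data[pos + 1]) << 8) |
        (uint32_t(data[pos + 2]) << 16) | (uint32_t(data[pos + 3]) << 24);
    pos += 4;
    // Without this check, a bogus length would read past the page buffer.
    if (len > size - pos) {
      throw std::out_of_range("length prefix exceeds page");
    }
    out.emplace_back(reinterpret_cast<const char*>(data + pos), len);
    pos += len;
  }
  return out;
}

// Hypothetical helper: decode PLAIN-encoded FIXED_LEN_BYTE_ARRAY values.
// There is no prefix; every value is exactly typeLength bytes, where
// typeLength comes from the column's schema element.
std::vector<std::string_view> decodePlainFixedLenByteArray(
    const uint8_t* data, size_t size, size_t numValues, size_t typeLength) {
  assert(typeLength > 0);
  if (numValues * typeLength > size) {
    throw std::out_of_range("page too short for fixed-length values");
  }
  std::vector<std::string_view> out;
  out.reserve(numValues);
  for (size_t i = 0; i < numValues; ++i) {
    out.emplace_back(
        reinterpret_cast<const char*>(data + i * typeLength), typeLength);
  }
  return out;
}
```

For example, two 4-byte FLBA values "abcd" and "efgh" stored back to back decode correctly with `decodePlainFixedLenByteArray(page, 8, 2, 4)`. Passing the same bytes to `decodePlainByteArray` reads 'a', 'b', 'c', 'd' as the length 0x64636261 and throws here; a decoder without that bounds check would run past the page, which is consistent with, though not proof of, the crash in the stack trace above.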

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this issue on Jun 7, 2024
…rray (facebookincubator#9887)

Summary:
Resolves: facebookincubator#9757

Pull Request resolved: facebookincubator#9887

Reviewed By: Yuhta, kgpai

Differential Revision: D57776408

Pulled By: mbasmanova

fbshipit-source-id: 9a282b68be810b1b99391105157b0777db7e568f