Suspected bug related to read with Mutation #8973

Open
hucome opened this issue Mar 6, 2024 · 1 comment
Labels
bug (Something isn't working), parquet, triage (Newly created issue that needs attention)

Comments


hucome commented Mar 6, 2024

Bug description

[Expected behavior] and [actual behavior].

  1. Scenario example

The data of a table is stored in one large Parquet file with many rows, for example 20000 rows.
The TableScan of a query scans the data file and filters out the deleted rows referred to by the corresponding mutation information. The deleted rows in the mutation are random, with row numbers ranging from 0 to 20000.
The readBatchSize_ of the TableScan operator is less than 20000, for example 10000, so the rows in the file are fetched in multiple reads.
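
To make the expected behavior concrete, here is a minimal sketch of batch-wise delete filtering. It is not Velox code and does not claim to identify the root cause; the helper names liveRowsInBatch and countLiveRows are hypothetical, and the sketch assumes the mutation carries file-level row numbers sorted in ascending order. The point is only that every batch, not just the first, has to compare deleted row numbers against the batch's starting offset in the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Velox API): returns the file-level row numbers
// that survive in the batch [batchStart, batchStart + batchSize).
// Assumption: 'deletedRows' holds file-level row numbers, sorted ascending.
std::vector<int64_t> liveRowsInBatch(
    const std::vector<int64_t>& deletedRows,
    int64_t batchStart,
    int64_t batchSize) {
  const int64_t batchEnd = batchStart + batchSize;
  // Skip deletes that belong to earlier batches.
  auto it =
      std::lower_bound(deletedRows.begin(), deletedRows.end(), batchStart);
  std::vector<int64_t> live;
  for (int64_t row = batchStart; row < batchEnd; ++row) {
    // Advance past deletes below the current row (also skips duplicates).
    while (it != deletedRows.end() && *it < row) {
      ++it;
    }
    if (it != deletedRows.end() && *it == row) {
      continue; // Row is deleted; drop it.
    }
    live.push_back(row);
  }
  return live;
}

// Hypothetical driver: reads 'numRows' rows in batches of at most
// 'batchSize' rows and counts the rows that survive the deletes.
int64_t countLiveRows(
    const std::vector<int64_t>& deletedRows,
    int64_t numRows,
    int64_t batchSize) {
  int64_t total = 0;
  for (int64_t start = 0; start < numRows; start += batchSize) {
    const int64_t size = std::min(batchSize, numRows - start);
    total += static_cast<int64_t>(
        liveRowsInBatch(deletedRows, start, size).size());
  }
  return total;
}
```

With this sketch, reading 20000 rows in two batches of 10000 should yield the same surviving rows as reading them in a single batch of 20000, which is the property the report says is violated after the first read.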

  2. Behavior

Only the result of the first read filters out the deleted rows correctly; the results of subsequent reads are mostly incorrect.

  3. Test example

Duplicate the data in the TPC-H nation table, delete some of its rows, then run the following query:

select n_nationkey, count(1) from nation group by n_nationkey;

Correct result:

[Image: WeCom screenshot 6850faa7-5261-4c66-bc4f-5b28d889b6b4]

Incorrect result from default Velox:

[Image: WeCom screenshot 839ca187-8804-40fc-bb58-5a67d5a91aea]

System information

Velox System Info v0.0.2
Commit: de2f016
CMake Version: 3.22.2
System: Linux-5.4.119-1-tlinux4-0008
Arch: x86_64
C++ Compiler: /usr/lib64/ccache/c++
C++ Compiler Version: 10.3.0
C Compiler: /usr/lib64/ccache/cc
C Compiler Version: 10.3.0
CMake Prefix Path: /usr/local;/usr;/;/data/code/impala/Impala42/toolchain/toolchain-packages-gcc10.4.0/cmake-3.22.2;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

hucome added the bug and triage labels on Mar 6, 2024
hucome changed the title from "Suspected bug related to Mutation" to "Suspected bug related to read with Mutation" on Mar 6, 2024
hucome closed this as completed on Mar 11, 2024
hucome reopened this on Mar 11, 2024
@yingsu00
Collaborator

@hucome Is this an Iceberg read problem? Does it happen with DWRF or ORC files as well? If it's Iceberg, can you please try the fix in #10505?
