New Parquet Reader: Error while reading struct with primitive type and complex type #8133

Open
dexchannel opened this Issue May 26, 2017 · 5 comments

@dexchannel

dexchannel commented May 26, 2017

We faced a problem with the new Parquet reader. When dealing with complex data in nested structures, we got exceptions:

com.facebook.presto.spi.PrestoException: length of sub blocks differ: block 0: 2, block 1: 1
    at com.facebook.presto.hive.parquet.ParquetPageSource.getNextPage(ParquetPageSource.java:225)
    at com.facebook.presto.hive.HivePageSource.getNextPage(HivePageSource.java:204)
    at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:262)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:303)
    at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:234)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:537)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:229)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:463)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: length of sub blocks differ: block 0: 2, block 1: 1
    at com.facebook.presto.spi.block.InterleavedBlock.<init>(InterleavedBlock.java:48)
    at com.facebook.presto.hive.parquet.reader.ParquetReader.readStruct(ParquetReader.java:234)
    at com.facebook.presto.hive.parquet.reader.ParquetReader.readBlock(ParquetReader.java:308)
    at com.facebook.presto.hive.parquet.reader.ParquetReader.readArray(ParquetReader.java:163)
    at com.facebook.presto.hive.parquet.reader.ParquetReader.readArray(ParquetReader.java:153)
    at com.facebook.presto.hive.parquet.ParquetPageSource.getNextPage(ParquetPageSource.java:204)
    ... 12 more

According to our tests, the exception arises when a struct containing both a primitive type and a complex type is an array element.
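For reference, the failing schema can be set up with a Hive table like the following. This is a hypothetical DDL sketch matching the schemas shown in the transcripts below (the exact table properties and data-loading steps in our environment may differ):

```sql
-- Hypothetical Hive DDL for the failing case (test2/test4 below):
-- an array whose elements are structs mixing a primitive with a map.
CREATE TABLE presto_test.test2 (
  c1 array<struct<p1: int, m1: map<string, string>>>
)
STORED AS PARQUET;
```

Reading this table through Presto's Hive connector with the optimized Parquet reader enabled triggers the exception above.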

Several tests with Presto version 0.177:

  1. Map with primitive types - works well:

presto> describe hive.presto_test.test1;
Column | Type | Extra | Comment
--------+-----------------------+-------+---------
c1 | map(varchar, varchar) | |

presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test1;
c1
----------------
{k1=v1, k2=v2}

presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test1;
c1
----------------
{k1=v1, k2=v2}

  2. Array of structs with a primitive type and a map, when the array has multiple entries - throws an exception:

presto> describe hive.presto_test.test2;
Column | Type | Extra | Comment
--------+--------------------------------------------------+-------+---------
c1 | array(row(p1 integer, m1 map(varchar, varchar))) | |

presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test2;
c1
------------------------------------------
[{p1=1, m1={k1=v1}}, {p1=2, m1={k2=v2}}]

presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test2;
Query 20170526_075802_00029_ybb6u failed: length of sub blocks differ: block 0: 2, block 1: 1

  3. Struct with a primitive type and a map - works well:

presto> describe hive.presto_test.test3;
Column | Type | Extra | Comment
--------+-------------------------------------------+-------+---------
c1 | row(p1 integer, m1 map(varchar, varchar)) | |

presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test3;
c1
---------------------------
{p1=1, m1={k1=v1, k2=v2}}

presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test3;
c1
---------------------------
{p1=1, m1={k1=v1, k2=v2}}

  4. Array of structs with a primitive type and a map, when the array has a single entry - throws a different exception:

presto> describe hive.presto_test.test4;
Column | Type | Extra | Comment
--------+--------------------------------------------------+-------+---------
c1 | array(row(p1 integer, m1 map(varchar, varchar))) | |

presto> set session hive.parquet_optimized_reader_enabled=false;
presto> select * from hive.presto_test.test4;
c1
----------------------
[{p1=1, m1={k1=v1}}]

presto> set session hive.parquet_optimized_reader_enabled=true;
presto> select * from hive.presto_test.test4;
Query 20170526_081253_00050_ybb6u failed: Invalid position 0 in block with 1 positions

Full Exception:
java.lang.IndexOutOfBoundsException: Invalid position 0 in block with 1 positions
    at com.facebook.presto.spi.block.AbstractArrayBlock.getRegionSizeInBytes(AbstractArrayBlock.java:97)
    at com.facebook.presto.spi.block.ArrayBlock.calculateSize(ArrayBlock.java:91)
    at com.facebook.presto.spi.block.ArrayBlock.getSizeInBytes(ArrayBlock.java:82)
    at com.facebook.presto.spi.Page.getSizeInBytes(Page.java:66)
    at com.facebook.presto.operator.OperatorContext.recordGetOutput(OperatorContext.java:180)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:304)
    at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:234)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:537)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:229)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:463)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
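Both stack traces point at the same invariant: when the reader assembles a struct column, every field's sub-block must expose the same number of positions, and InterleavedBlock's constructor rejects mismatched lengths. A minimal Python sketch of that check (not Presto code; `interleave` and its list-based "blocks" are hypothetical stand-ins for illustration):

```python
# Hypothetical model of the InterleavedBlock length check: each
# element of `blocks` stands for one struct field's column of values.
def interleave(blocks):
    counts = [len(b) for b in blocks]
    if len(set(counts)) > 1:
        # Mirrors the error message seen in the Presto stack trace.
        raise ValueError(
            "length of sub blocks differ: "
            + ", ".join(f"block {i}: {n}" for i, n in enumerate(counts)))
    # Interleave values position by position: row 0's fields, row 1's, ...
    return [b[i] for i in range(counts[0]) for b in blocks]
```

In the failing case, the map field's sub-block ends up shorter than the primitive field's sub-block (block 0: 2, block 1: 1), so the constructor-style check fires.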

@nezihyigitbasi

nezihyigitbasi (Contributor) commented May 26, 2017

/cc @zhenxiao

@zhenxiao

zhenxiao (Contributor) commented May 26, 2017

Yep, this is a bug in the new Parquet reader. I will take a look.

@usmanm

usmanm commented Aug 20, 2017

Hey guys, any update on this? We're facing the same issue as described here: https://medium.com/hadoop-noob/presto-parquet-reader-fc7c333fc0a4

@dotcomputercraft

dotcomputercraft commented Oct 5, 2017

@zhenxiao - We are seeing this problem in our organization. Is there an ETA for a Parquet reader fix? cc: @nezihyigitbasi

@zhenxiao

zhenxiao (Contributor) commented May 26, 2018

The fix is merged in #9156. Try the latest code; this should be fixed.
