Unable to insert empty map data type into Parquet format #5934
This is a known issue, and the root cause boils down to the Hive Parquet writers, which Presto uses to write Parquet data (see HIVE-11625).
But when I used Hive/Hadoop to perform the same operation, it completed without error.
Which Hive version are you using?
I was using Hive 1.0 on EMR. The issue I am having is a bit different from the ticket you linked: Presto seems to fail if the map being inserted is empty (like {}).
Here is a concrete example:
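Roughly, a minimal reproduction of the failing case looks like this; the table and column names here are hypothetical, not from the original report:

CREATE TABLE hive.default.example_map_table (
  id BIGINT,
  attrs MAP(VARCHAR, VARCHAR)
)
WITH (format = 'PARQUET');

-- Inserting a row whose map is empty fails with
-- "Parquet record is malformed: empty fields are illegal ..."
INSERT INTO hive.default.example_map_table
VALUES (1, CAST(MAP(ARRAY[], ARRAY[]) AS MAP(VARCHAR, VARCHAR)));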
If you re-run your query and check the complete stack trace, you will see the following:

Query 20160825_165119_00008_3zd6n failed: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
at com.facebook.presto.hive.HivePageSink$HiveRecordWriter.addRow(HivePageSink.java:747)
at com.facebook.presto.hive.HivePageSink.doAppend(HivePageSink.java:411)
at com.facebook.presto.hive.HivePageSink.lambda$appendPage$2(HivePageSink.java:390)
at com.facebook.presto.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:76)
at com.facebook.presto.hive.HivePageSink.appendPage(HivePageSink.java:390)
at com.facebook.presto.spi.classloader.ClassLoaderSafeConnectorPageSink.appendPage(ClassLoaderSafeConnectorPageSink.java:42)
at com.facebook.presto.operator.TableWriterOperator.addInput(TableWriterOperator.java:207)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:384)
at com.facebook.presto.operator.Driver.processFor(Driver.java:301)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:529)
at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:665)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:244)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:241)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60)
... 23 more
Got it. I will upvote there.
I am having the same issue. How did you overcome this? Did you change your schema or replace null values?
I am also facing this issue. Is there any workaround, please?
@nezihyigitbasi You are right that the Parquet writer does not allow empty maps, but I may have found something specific to Presto: if you insert an empty array whose elements are structs, you'll get the same error.
If I'm not mistaken, empty arrays should be allowed in Parquet (tested in Spark).
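For illustration, a sketch of that array-of-struct case (again with hypothetical table and column names):

CREATE TABLE hive.default.example_array_table (
  id BIGINT,
  items ARRAY(ROW(name VARCHAR, qty BIGINT))
)
WITH (format = 'PARQUET');

-- An empty array of rows hits the same "empty fields are illegal" error
INSERT INTO hive.default.example_array_table
VALUES (1, CAST(ARRAY[] AS ARRAY(ROW(name VARCHAR, qty BIGINT))));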
I stumbled upon the same problem with Athena: when I run a CTAS query there whose result set contains empty arrays, I get this error.
Not really a solution to the actual issue here, but re: @PowerToThePeople111's problem, you can work around the write-to-Parquet error by writing the Athena CTAS output to JSON instead. There are performance/compression tradeoffs, but they may be preferable to the hours I spent fruitlessly trying to coalesce my nulls into compliance.
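If I read that right, the workaround is roughly the following Athena CTAS, assuming the format = 'JSON' table property and hypothetical table/column names:

CREATE TABLE my_results_json
WITH (format = 'JSON') AS
SELECT
  id,
  tags   -- may be an empty array; the JSON writer does not reject it
FROM source_table;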
Has anyone come across ways to create an Athena CTAS query by checking the cardinality of a nested structure? In my case, I have a …
In our case, this happens with an Athena CTAS trying to write some rows with empty arrays. Our fix is to use an INSERT INTO that converts the empty arrays to NULL:

INSERT INTO ...
SELECT
  nullif(my_col, ARRAY[]) AS my_col
FROM ...

Not sure if this is an option in your case, @alecbw, but if it is, it may be better than writing to JSON. @abhishekupadhyaya perhaps you can try checking for an empty struct? Maybe something like the sketch below.
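One possible shape for that check, with hypothetical column and field names; since a ROW value is never truly empty, this nulls out the column when all of its fields are NULL:

SELECT
  IF(my_struct.a IS NULL AND my_struct.b IS NULL, NULL, my_struct) AS my_struct
FROM ...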
Hi,
I am trying to insert data that may contain empty values for a map column into a Parquet table, and I keep getting:
Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
Looks similar to this Avro issue: https://github.com/Parquet/parquet-mr/pull/169
Edward