Add schema to parquet writer options #6074
WriterOptions has the Velox schema. We should be able to convert that to an Arrow schema in the Parquet writer.
@majetideepak It appears that the |
@JkSelf Please add a Velox schema to the Parquet writer options.
@majetideepak Updated. Could you please review again?
@majetideepak Thanks for your review. I have addressed your comments. Could you please review again? Thanks.
@majetideepak Could you please review again?
@majetideepak Please review again. Thanks.
@majetideepak @mbasmanova Please review again. Thanks.
@JkSelf Sorry, I missed your comments. I think the PR description should clarify that this is an improvement from the current approach of inferring the schema from the input data.
@majetideepak I have addressed your comments. Could you please review again?
@mbasmanova @majetideepak The issue seems to be caused by not using dictionary encoding in the vector in |
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@JkSelf Thank you for iterating. Looks good % couple of nits.
@mbasmanova I have resolved all your comments. Could you please review again? Thanks.
Thanks.
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@mbasmanova merged this pull request in 2ada2ec.
ParquetReaderBenchmark was broken due to facebookincubator#6074. This commit fixes the issue.
Summary: ParquetReaderBenchmark was broken due to facebookincubator#6074. This commit fixes the issue.
Pull Request resolved: facebookincubator#8274
Reviewed By: xiaoxmeng
Differential Revision: D52703288
Pulled By: kewang1024
fbshipit-source-id: fc8aede66ff2d2dda7b8d2777b5e18316296aaf6
Here is the schema information for the TPCH `supplier` table. When we write the `supplier` table into a Parquet file using the Velox Parquet writer in Gluten, we encounter an incorrect schema issue. The reason for this is that the schema is inferred from the record batch. To address this, this PR introduces a schema into the `WriterOption`. This allows us to pass the appropriate schema to the Parquet writer.
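The difference between inferring the schema and passing it explicitly can be sketched roughly as below. This is only an illustration under stated assumptions: `Field`, `Schema`, `RecordBatch`, `WriterOptions`, and `resolveSchema` are hypothetical stand-ins written for this example, not the actual Velox or Arrow types or the code in this PR, and the type names are plain strings.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the Velox or Arrow types.
struct Field {
  std::string name;
  std::string type;
};
using Schema = std::vector<Field>;

// A batch carries whatever schema it was produced with.
struct RecordBatch {
  Schema schema;
};

// Writer options with an optional, caller-supplied schema (assumed shape).
struct WriterOptions {
  std::optional<Schema> schema;
};

// Chooses the schema to write with: the explicit one from the options if it
// is set, otherwise the schema inferred from the incoming batch.
Schema resolveSchema(const WriterOptions& options, const RecordBatch& batch) {
  if (options.schema.has_value()) {
    return *options.schema;
  }
  return batch.schema;
}
```

With this shape, a caller that knows the intended column names can set `options.schema` once, and the writer no longer depends on what the incoming batch happens to carry; leaving it unset keeps the inference behavior.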