Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use separate ValuesWriterFactory instances for non-default page options when creating ParquetWriter #22719

Merged
merged 1 commit into from May 14, 2024

Conversation

hantangwangd
Copy link
Member

@hantangwangd hantangwangd commented May 11, 2024

Fix #22130

Description

The sharing of a single ValuesWriterFactory among multiple ParquetProperties instances with different write options is not thread-safe and may lead to inconsistency issues.

This PR set a new separate ValuesWriterFactory instance into ParquetProperties on its creation when using a non-default writer options, so that it could work well in multi-thread environment.

Test Plan

  • Make sure this change do not affect existing test cases

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@hantangwangd
Copy link
Member Author

@elharo Thanks for your suggestion, they have been fixed! Please take a look when available.

@hantangwangd hantangwangd requested a review from elharo May 13, 2024 05:56
elharo
elharo previously approved these changes May 13, 2024
Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this! Just one nit.

@tdcmeehan
Copy link
Contributor

Looks good, but commit message is too long. Perhaps reword as follows:

Fix ValuesWriterFactory in ParquetReader thread safety

The sharing of a single `ValuesWriterFactory` among multiple
`ParquetProperties` instances with different write options is not
thread-safe and may lead to inconsistency issues.

The sharing of a single `ValuesWriterFactory` among multiple
`ParquetProperties` instances with different write options is not
thread-safe and may lead to inconsistency issues.
@hantangwangd
Copy link
Member Author

Looks good, but commit message is too long. Perhaps reword as follows:

......

Thanks for the suggestion, fixed!

@hantangwangd hantangwangd merged commit e484062 into prestodb:master May 14, 2024
56 checks passed
@hantangwangd hantangwangd deleted the fix_parquet_writer branch May 14, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestFullParquetReader>AbstractTestParquetReader.testDecimalBackedByINT32
3 participants