-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rebase on upstream (2022-11-15) #1
Conversation
Serialize time.Time values as Parquet timestamps. The default unit is NANOS and can be changed using the timestamp() struct tag. type timeColumn struct { t1 time.Time t2 time.Time `parquet:",timestamp(millisecond)"` }
Reading a parquet file with a decimal column isn't loaded with logical type information. This behavior was not implemented. `decimalType` is more complex from the other types because a parquet decimal can be backed by multiple different physical types. This PR loads logical type information for `DECIMAL` fields. Closes segmentio#365
parquet-cli prints the string format of decimals as `DECIMAL(precision,scale)`. Update parquet-go's string format to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After you merge, we'll need to create another release tag, for us to use in our go.mod
I'll send some instructions shortly.
tagName = "segmentio" | ||
keyTagName = "segmentio-key" | ||
valueTagName = "segmentio-value" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: something looks fishy here; the neeva
branch should already have these changes.
Did you pull the latest neeva branch, to create your PR?
@@ -381,8 +381,27 @@ func schemaElementTypeOf(s *format.SchemaElement) Type { | |||
case lt.Enum != nil: | |||
return (*enumType)(lt.Enum) | |||
case lt.Decimal != nil: | |||
// TODO: | |||
// return (*decimalType)(lt.Decimal) | |||
// A parquet decimal can be one of several different physical types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a better title and description for the PR?
Ideally we'd upstream any changes we make, and it'll be helpful to have a bit more context.
E.g. I see a bunch of changes in this PR to handle decimal and timestamps, but the current PR description is about handling empty files.
The motivation for this is to pull in a fix that I added to upstream to make it possible to read empty files.