Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebase on upstream (2022-11-15) #1

Merged
merged 5 commits into from
Nov 15, 2022
Merged

Conversation

bprosnitz
Copy link

The motivation for this is to pull in a fix that I added to upstream to make it possible to read empty files.

Larry Marburger and others added 5 commits November 14, 2022 13:14
Serialize time.Time values as Parquet timestamps. The default unit is NANOS and can be changed using the timestamp() struct tag.

type timeColumn struct {
	t1 time.Time
	t2 time.Time `parquet:",timestamp(millisecond)"`
}
Reading a parquet file with a decimal column isn't loaded with logical
type information. This behavior was not implemented. `decimalType` is
more complex from the other types because a parquet decimal can be
backed by multiple different physical types.

This PR loads logical type information for `DECIMAL` fields.

Closes segmentio#365
parquet-cli prints the string format of decimals as
`DECIMAL(precision,scale)`. Update parquet-go's string format to match.
@bprosnitz bprosnitz marked this pull request as ready for review November 15, 2022 18:35
@bprosnitz bprosnitz merged commit ddd7cd1 into neeva Nov 15, 2022
Copy link

@tatatodd tatatodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After you merge, we'll need to create another release tag, for us to use in our go.mod
I'll send some instructions shortly.

tagName = "segmentio"
keyTagName = "segmentio-key"
valueTagName = "segmentio-value"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: something looks fishy here; the neeva branch should already have these changes.
Did you pull the latest neeva branch, to create your PR?

@@ -381,8 +381,27 @@ func schemaElementTypeOf(s *format.SchemaElement) Type {
case lt.Enum != nil:
return (*enumType)(lt.Enum)
case lt.Decimal != nil:
// TODO:
// return (*decimalType)(lt.Decimal)
// A parquet decimal can be one of several different physical types.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a better title and description for the PR?
Ideally we'd upstream any changes we make, and it'll be helpful to have a bit more context.

E.g. I see a bunch of changes in this PR to handle decimal and timestamps, but the current PR description is about handling empty files.

@tatatodd tatatodd mentioned this pull request Nov 15, 2022
@tatatodd tatatodd deleted the benjamin/rebaseupstream20221115 branch November 15, 2022 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants