-
Notifications
You must be signed in to change notification settings - Fork 102
Add the ability to omit fields in schema. #196
Conversation
@@ -277,10 +284,10 @@ value 10: R:0 D:0 V:10.0 | |||
dump: `row group 0 | |||
-------------------------------------------------------------------------------- | |||
owner: BINARY ZSTD DO:0 FPO:4 SZ:66/57/0.86 VC:2 ENC:DELTA_LENGTH_BYTE_ARRAY ST:[no stats for this column] | |||
ownerPhoneNumbers: BINARY GZIP DO:0 FPO:70 SZ:162/112/0.69 VC:3 ENC:DELTA_LENGTH_BYTE_ARRAY,RLE ST:[no stats for this column] | |||
ownerPhoneNumbers: BINARY GZIP DO:0 FPO:70 SZ:162/112/0.69 VC:3 ENC:RLE,DELTA_LENGTH_BYTE_ARRAY ST:[no stats for this column] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reordering is likely due to different parquet-tools
versions. We might want to keep it as-is or the tests won't pass in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting. I don't see where we are installing parquet-tools for the GH runners. Which version are we supposed to be using ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parquet-tools 1.12.2 with openjdk 18
schema.go
Outdated
if head == "-" && s[len(s)-1] == ',' { | ||
head = "-," | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a weird condition to have in a function that splits strings, what do we need it for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was meant to handle the condition where we would use -
as actual name for the field. Not sure if that is something one would want but it is supported by other packages like json so handled the cases here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine if we don't allow column names to be "-"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think it's worth supporting, maybe move that to the appendSttuctField function? It's a bit unexpected to have it be a special case in split
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, this is what the last commit changed :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great!
Fixes #185
Also fixing some other failing tests.