This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Add the ability to omit fields in schema. #196

Merged
merged 4 commits into from
Jun 1, 2022

Conversation

Pryz
Contributor

@Pryz commented May 24, 2022

Fixes #185

Also fixing some other failing tests.

schema.go Outdated
@Pryz requested a review from a team May 25, 2022 16:18
@@ -277,10 +284,10 @@ value 10: R:0 D:0 V:10.0
dump: `row group 0
--------------------------------------------------------------------------------
owner: BINARY ZSTD DO:0 FPO:4 SZ:66/57/0.86 VC:2 ENC:DELTA_LENGTH_BYTE_ARRAY ST:[no stats for this column]
- ownerPhoneNumbers: BINARY GZIP DO:0 FPO:70 SZ:162/112/0.69 VC:3 ENC:DELTA_LENGTH_BYTE_ARRAY,RLE ST:[no stats for this column]
+ ownerPhoneNumbers: BINARY GZIP DO:0 FPO:70 SZ:162/112/0.69 VC:3 ENC:RLE,DELTA_LENGTH_BYTE_ARRAY ST:[no stats for this column]
Contributor

The reordering is likely due to different parquet-tools versions. We might want to keep it as-is, or the tests won't pass in CI.

Contributor Author

Interesting. I don't see where we install parquet-tools on the GitHub runners. Which version are we supposed to be using?

Contributor

parquet-tools 1.12.2 with OpenJDK 18

schema.go Outdated
schema.go Outdated
Comment on lines 690 to 692
if head == "-" && s[len(s)-1] == ',' {
head = "-,"
}
Contributor

This looks like a weird condition to have in a function that splits strings, what do we need it for?

Contributor Author

This was meant to handle the case where "-" is used as the actual name of the field. I'm not sure that's something anyone would want, but other packages like encoding/json support it, so I handled that case here as well.
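For reference, this is the encoding/json convention being mirrored: a tag of exactly "-" always omits the field, while "-," keeps the field and uses "-" as its key. A small check of that behavior (the type `record` and the helper `encode` are illustrative only, not part of this PR):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record is a hypothetical type used only to show encoding/json's tag rules.
type record struct {
	Skipped int    `json:"-"`    // tag is exactly "-": always omitted
	Dash    int    `json:"-,"`   // kept, encoded under the key "-"
	Name    string `json:"name"` // ordinary named field
}

// encode marshals a sample record and returns the JSON text.
func encode() string {
	b, err := json.Marshal(record{Skipped: 1, Dash: 2, Name: "x"})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode()) // {"-":2,"name":"x"}
}
```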

Contributor

I'm fine with not allowing column names to be "-".

Contributor

If you think it's worth supporting, maybe move that to the appendStructFields function? It's a bit unexpected to have it as a special case in split.

Contributor Author

Yep, that's what the last commit changed :)

Contributor

@achille-roussel left a comment

Looking great!

@Pryz merged commit 862d584 into main Jun 1, 2022
@Pryz deleted the issue-185 branch June 1, 2022 17:54
@achille-roussel added the feature (New feature or request) label Jun 21, 2022
Labels: feature (New feature or request)
Development: successfully merging this pull request may close these issues:
How to ignore a field when write parquet to file
2 participants