chore: reduce parquet file size datalake #3035
Codecov Report
Patch coverage:

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3035      +/-   ##
==========================================
+ Coverage   53.09%   53.39%   +0.30%
==========================================
  Files         332      342      +10
  Lines       52350    52812     +462
==========================================
+ Hits        27794    28199     +405
- Misses      22956    23000      +44
- Partials     1600     1613      +13
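The percentages in the coverage diff can be checked against its line counts. Codecov computes coverage as hits divided by all tracked lines (hits + misses + partials), and here those three rows sum exactly to the Lines row. The match with the reported figures assumes Codecov's default of two decimal places with rounding down. Without that assumption, 53.3951% would round to 53.40%.

```latex
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} \\
\text{master:}\quad \frac{27794}{27794 + 22956 + 1600} &= \frac{27794}{52350} \approx 0.530926 \;\rightarrow\; 53.09\% \\
\text{\#3035:}\quad \frac{28199}{28199 + 23000 + 1613} &= \frac{28199}{52812} \approx 0.533951 \;\rightarrow\; 53.39\% \\
\Delta &= 53.39\% - 53.09\% = +0.30 \text{ percentage points}
\end{aligned}
```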
…re.reduceParquetSize
…r-server into chore.reduceParquetSize
sql/migrations/warehouse/000021_drop_merged_schema_from-wh_uploads.sql
Outdated
You should not drop the column just yet.
Apart from the database column, we also need to consider whether the code change can be safely reverted.
If we introduce this change and then have to revert it, the following code will not work correctly, because MergedSchema
will no longer be populated:
schema := &job.Upload.UploadSchema
if job.Upload.LoadFileType == warehouseutils.LOAD_FILE_TYPE_PARQUET {
	schema = &job.Upload.MergedSchema
}
We need to consider whether we can mitigate this, for example by manually populating MergedSchema from UploadSchema in case of a revert.
Alternatively, we can proactively populate both columns with UploadSchema, so that if we revert to a previous version, a valid schema is still available.
Description
mergedschema column
useParquetLoadFilesRS config variable
Notion Ticket
Notion Link
Security