We explored making the Parquet file size configurable by making the upload interval configurable through the environment variable P_STORAGE_UPLOAD_INTERVAL, but this approach has several problems.
The data in the staging area now covers a variable time span, which can be hours or days. This makes the query code very complicated.
It does little to control the Parquet file size, because the volume of logs is difficult to predict.
A better approach would be to add a separate compaction engine that compacts historical data into larger, better-compressed Parquet files. We'll take that up in a separate exercise. For now, we need to revert the changes in #616 and remove the P_STORAGE_UPLOAD_INTERVAL option completely.
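
To make the compaction idea concrete, here is a minimal sketch of the planning step only: grouping small, already-uploaded Parquet files into batches that a compactor could later rewrite as single files. It is not the project's design. The names `ParquetFile` and `plan_compaction`, the size threshold, and the assumption that files are listed in time order are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ParquetFile:
    path: str        # hypothetical object-store key
    size_bytes: int  # size of the uploaded file


def plan_compaction(files: Iterable[ParquetFile], target_bytes: int) -> List[List[ParquetFile]]:
    """Group consecutive files into batches whose total size stays within target_bytes.

    Files are assumed to arrive in time order, so each batch covers a
    contiguous time range. Batches holding a single file are dropped,
    because rewriting one file on its own gains nothing.
    """
    batches: List[List[ParquetFile]] = []
    current: List[ParquetFile] = []
    current_size = 0
    for f in files:
        # Close the current batch if adding this file would exceed the target.
        if current and current_size + f.size_bytes > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        batches.append(current)
    return [b for b in batches if len(b) > 1]
```

Because compaction would run on historical data after upload, a scheme like this would leave the staging area on a fixed upload interval, so the query path would not need to handle variable time spans.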