Store which splits are partial and which are complete #2809

Open
severo opened this issue May 15, 2024 · 0 comments
Labels
improvement / optimization P1 Not as needed as P0, but still important/wanted

Comments

severo (Collaborator) commented May 15, 2024

In each step, we should store whether we truncated the data or not.

Currently, config-parquet-and-info only stores the fact that some of the splits have been partially converted to parquet, but not which ones.

We want to have the info for each split.

The same goes for the conversion to duckdb and for the computation of the statistics. Every truncation should be recorded, so that we can show trustworthy information in the viewer.

[...]tches. That led me and others to wrongly believe it's 3.1M instead of 31M
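To make the proposal concrete, here is a minimal sketch of what per-split tracking could look like. Everything in it is hypothetical and only for illustration: the `SplitTruncation` class, the step names `"parquet"`, `"duckdb"` and `"statistics"`, and the helper functions are assumptions, not the existing data model. The idea is that each step records its own truncation flag per split, and the config-level "some splits are partial" flag becomes a derived value.

```python
from dataclasses import dataclass, field


@dataclass
class SplitTruncation:
    """Hypothetical per-split record of which steps truncated the data."""

    split: str
    # Maps a step name (assumed labels) to True if that step truncated this split.
    truncated_by_step: dict[str, bool] = field(default_factory=dict)

    @property
    def partial(self) -> bool:
        # A split is partial as soon as any step truncated it.
        return any(self.truncated_by_step.values())


def partial_splits(splits: list[SplitTruncation]) -> list[str]:
    # The list of partial splits, which is the information the issue asks for.
    return [s.split for s in splits if s.partial]


def config_is_partial(splits: list[SplitTruncation]) -> bool:
    # The config-level flag, derived from the per-split records.
    return len(partial_splits(splits)) > 0


splits = [
    SplitTruncation("train", {"parquet": True, "duckdb": False, "statistics": False}),
    SplitTruncation("test", {"parquet": False, "duckdb": False, "statistics": False}),
]
print(partial_splits(splits))
print(config_is_partial(splits))
```

With records like these, the viewer could tell for each split whether a row count or a statistic was computed on the full data or on a truncated part of it.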

Related issues and discussions:
