Behaviour of append method #23
Comments
I think it's closely related to dask/fastparquet#114 ... happy to move the discussion to the Dask repository if more appropriate!
@marchinidavide I had the same issue and think I resolved it. See my code and comments in issue #17; it would be great if you could confirm and verify performance.
Thanks for the link to your solution; it seems I completely missed it!
I've pushed a new version to the dev branch. It should result in faster and more consistent behavior when using append. By default, PyStore will aim for partitions of ~99MB each (as per Dask's recommendation). LMK.
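A minimal sketch of what that size-based partitioning looks like in Dask terms; the paths and variable names below are illustrative, not PyStore's internals:

```python
import dask.dataframe as dd

# Illustrative only: read the stored item, add the new rows, and let Dask
# rebalance the partitions to roughly 99MB before writing the result back.
existing = dd.read_parquet("pystore_data/collection/item")   # illustrative path
new_part = dd.from_pandas(new_rows, npartitions=1)           # new_rows: a pandas DataFrame
combined = dd.concat([existing, new_part])
combined = combined.repartition(partition_size="99MB")
combined.to_parquet("pystore_data/collection/item.tmp")      # write to a temp location, then swap
```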
Thanks! Will definitely have a look soon and give feedback!
Closing this issue and moving all related discussions to issue #21. Please see my comments here: #21 (comment), and here: #21 (comment)
Hi everyone :)
I would like to confirm my understanding of the method for appending data to an item using `collection.append(item, data)`. To my understanding, this operation creates a new parquet file and modifies the metadata.
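For context, here is roughly how that call appears in a typical PyStore session (the store and collection names are just placeholders):

```python
import pandas as pd
import pystore

pystore.set_path("./pystore_data")            # placeholder path
store = pystore.store("mydatastore")          # placeholder store name
collection = store.collection("NASDAQ.EOD")   # placeholder collection name

first_df = pd.DataFrame({"close": [1.0, 2.0]},
                        index=pd.date_range("2019-01-01", periods=2))
new_rows = pd.DataFrame({"close": [3.0]},
                        index=pd.date_range("2019-01-03", periods=1))

collection.write("AAPL", first_df)            # initial write
collection.append("AAPL", new_rows)           # each append adds data (currently as a new parquet file)
```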
I would like to avoid having thousands of very small files, and instead include new data in the last .parquet file, creating a new one only after the last .parquet file reaches the predefined length (I'm fine with the current value of 1 million rows).
I see from the code that this ends up calling Dask's `dd.to_parquet()`, and I tried to dig deeper into it, but I find the code very convoluted and difficult to read :( Ideally, my workflow would be something like the sketch below.
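A rough illustration of that intent (assuming Dask-style `part.N.parquet` file names inside the item directory; the `_metadata` file, discussed further down, is left untouched):

```python
import glob
import os
import pandas as pd

MAX_ROWS = 1_000_000  # the ~1 million row partition length mentioned above

def append_consolidated(item_path: str, new_data: pd.DataFrame) -> None:
    """Fold new_data into the newest parquet part under item_path,
    starting a fresh part only once MAX_ROWS is reached."""
    parts = sorted(glob.glob(os.path.join(item_path, "part.*.parquet")),
                   key=lambda p: int(os.path.basename(p).split(".")[1]))
    if parts:
        last = pd.read_parquet(parts[-1])
        if len(last) + len(new_data) <= MAX_ROWS:
            # the last part is still small: rewrite it with the new rows included
            pd.concat([last, new_data]).to_parquet(parts[-1])
            return
    # otherwise start a new part file
    new_data.to_parquet(os.path.join(item_path, f"part.{len(parts)}.parquet"))
```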
I didn't find a way to modify the `_metadata` file; any hint on this would be really appreciated :) Also, any opinion on why I shouldn't be doing this is very welcome!
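On the `_metadata` side, one possible route (just a guess, since PyStore doesn't expose this directly) is to have fastparquet rebuild the summary file from the part files after they have been rewritten:

```python
import glob
import os
from fastparquet import writer

item_path = "pystore_data/collection/item"   # illustrative path
parts = sorted(glob.glob(os.path.join(item_path, "part.*.parquet")))
# writer.merge scans the given part files and writes a fresh _metadata
# summary file in their common parent directory.
writer.merge(parts)
```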