fix: protect ak.to_parquet against memory explosion when args are swapped. #2523
Improvements: this function isn't going through the standard Also, only strings for |
Just poking around and saw this, so feel free to disregard - although it's just a nice quality of life thing, it would be nice to have support for Path. Every time I use |
That one's easy. Allowing file-like objects to pass through to the |
This looks good! I made a change to open up support for any path-likes, are you happy with these modifications? I'll just test them locally to ensure that there are no corner-cases.
If an Awkward Array is passed as `destination` to `fsspec.core.url_to_fs(destination)`, the memory use explodes before Python crashes. The array doesn't even need to be very large to send fsspec into dozens of GB; I think it might be converting the array into a string and interpreting `[`, `]`, `{`, and `}` as wildcards, producing a combinatorially large file list.

Anyway, it's very common for me to forget the argument order in I/O functions (`ak.to_parquet`, `json.dump`, etc.) and I just try one until it works. It needs to be the case that trying the wrong one produces an immediate error message.
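
For illustration, the kind of early check described here could look roughly like the sketch below. `normalize_destination` is a hypothetical name, not the function this PR adds, and the sketch assumes that only strings and path-like objects are accepted; it rejects anything else before the value could ever reach fsspec.

```python
import os
from pathlib import Path


def normalize_destination(destination):
    """Return the filesystem path for `destination`, or raise TypeError.

    Hypothetical helper for illustration; not the actual Awkward Array code.
    """
    if isinstance(destination, (str, os.PathLike)):
        # os.fspath returns a str unchanged and calls __fspath__ on
        # path-like objects such as pathlib.Path.
        return os.fspath(destination)
    # Fail immediately instead of letting an array be turned into a string
    # and handed to a URL/glob parser.
    raise TypeError(
        "'destination' must be a string or path-like object, "
        f"not {type(destination).__name__}"
    )
```

With a check like this, `normalize_destination(Path("out.parquet"))` returns a plain string, while passing an array (or any other non-path object) in the `destination` position raises a `TypeError` right away.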