Way to enforce "fortran" layout #15796
Comments
I think you can hack around this by converting to numpy and then back. The resulting DataFrame will be contiguous in memory.
I don't think we should want that @stinodego. That requires unsafe allocations and would be super hard to enforce throughout the engine. Besides, it is much more costly than simply paying for the copy at the end. Either way a copy needs to be made; it doesn't matter whether we do it internally or when moving out to numpy. I will close this as it would have no benefit.
Right, I was thinking that if you do many such conversions, it could be worth avoiding the repeated copies. However, any operations you do on the DataFrame in between those calls will not guarantee that the Fortran layout is preserved. So better to just not give any guarantees about it.
Thank you so much for the quick answer. I did assume that the layout is always fortran internally (except maybe for some weird numpy arrays), but yeah, I can see how this could create complications at other places. Makes sense not to implement it then. By the way, why does the conversion require a copy at all?
In such a case, I think people should cache their numpy array. I don't think our methods should be concerned with caching.
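The user-side caching suggested here can be sketched as a small wrapper. The `CachedFrame` class below is hypothetical (not a Polars API); `np.column_stack` stands in for the expensive DataFrame-to-ndarray copy:

```python
import numpy as np


class CachedFrame:
    """Hypothetical wrapper: cache the NumPy export instead of re-converting.

    The column_stack call stands in for an expensive DataFrame -> ndarray copy.
    """

    def __init__(self, columns):
        self._columns = columns
        self._np_cache = None

    def to_numpy(self):
        if self._np_cache is None:
            # Pay the copy once; later calls reuse the same array.
            self._np_cache = np.column_stack(self._columns)
        return self._np_cache


cf = CachedFrame([np.arange(3), np.arange(3, 6)])
a = cf.to_numpy()
b = cf.to_numpy()
assert a is b  # second call hits the cache, no new copy
```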
If we could, we would. A numpy array is backed by a single contiguous allocation, while Polars DataFrames are backed by multiple buffers. Don't worry about that copy too much; copies like that happen implicitly all the time.
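The single-allocation vs. multiple-buffers distinction can be shown with NumPy alone (the per-column list below merely mimics columnar storage, it is not Polars' actual internals):

```python
import numpy as np

# A NumPy 2-D array is one contiguous allocation:
arr = np.zeros((3, 2))
assert arr.base is None  # owns its single buffer

# A columnar frame (sketch) holds one buffer per column; producing a single
# ndarray therefore requires copying them into fresh memory:
columns = [np.zeros(3), np.zeros(3)]
merged = np.column_stack(columns)
assert not np.shares_memory(merged, columns[0])  # the export copied
```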
Description
It would be great if there were a way to convert a `DataFrame` (and maybe even a `LazyFrame`) so that all columns have `fortran` memory layout and `df.to_numpy(allow_copy=False)` will always pass. If you need to call `to_numpy` many times, the copy costs can add up, and it would be better to convert the arrays upfront, once.

I tried to use `df.rechunk()`, but it fails with an error. That might also be a bug, not sure.