Stack Overflow (?) crash when using all_horizontal #16270

Closed
2 tasks done
Elvynzs opened this issue May 16, 2024 · 4 comments · Fixed by #16287
Labels: accepted (Ready for implementation), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@Elvynzs

Elvynzs commented May 16, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import numpy as np
import polars as pl
df = pl.DataFrame(np.random.randn(1000,1000))
df.select(pl.all_horizontal(pl.all()))  # crashes here

Log output

Nothing

Issue description

Running the above code on my computer crashes the Python environment silently, with no message, whether I run it through the Python interpreter or JupyterLab.
When I run similar code under pytest, I get a "stack overflow" message and my tests stop midway.

However, this may be a Windows-only issue, as the same tests do not crash when run on our CI/CD runners (which use Linux).

The maximum shape I can get working is about (700, 700); however, the operation is extremely slow compared to pandas (12 ms vs 300 ms).

Expected behavior

I expected a pl.Series of length 1000 as output (filled with True), but instead Python crashes.

This code worked fine on 0.20.21. After checking, I found that it broke in 0.20.22.

Installed versions

--------Version info---------
Polars:               0.20.26
Index type:           UInt32
Platform:             Windows-10-10.0.22000-SP0
Python:               3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:40:50) [MSC v.1937 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.4
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              15.0.2
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
Elvynzs added the bug, needs triage, and python labels May 16, 2024
@PierreAttard
Contributor

I have the same issue.

@ritchie46
Member

This is a stack overflow. We can remove that. However, if you are doing such wide horizontal operations, I would consider transposing, as this will never be performant (and will also not be on pandas when they transition to pyarrow).

@Elvynzs
Author

Elvynzs commented May 17, 2024

On my two computers, the code starts crashing at about 750 columns.

I would argue that this number of columns, while quite large, should not crash Polars.
In my use case, I use Polars to handle time series, and unfortunately we can have, in the worst case, ~1k columns for 10k-100k rows, so transposing the dataframe would not help. We would have to write custom code to split the dataframes and then aggregate the results.

(In my initial code, I used all_horizontal to drop all rows containing at least one NaN.)


As a temporary workaround, sum_horizontal does not crash, so maybe I will see if I can rely on this method instead.

@ritchie46
Member

Yes, we will fix that. ;)

4 participants