ColumnNotFoundError appears in lazy mode only in version 0.20.28 #16435
Comments
It would be much appreciated if you could work out a minimal reproducible example. These types of bugs have very high priority for us, but this report does not give us enough to go on.
@stinodego I understand, but the query is quite complex and uses a lot of features, so it is hard for me to zero in on the offending part. Obviously it is even harder for you without anything to go on, but I hoped the sparse details might ring a bell, given that the problem is at least particular to the latest version. @owenprough-sift I tried the optimization flags, but no luck; the issue still arose with
Cannot get rid of
@stinodego okay, tried digging a bit, and I have a cursed
I get
it works again.
it works fine, so still not easily reproducible.
This might be a regression caused by the
checked, and
this looked like a very likely culprit, but found out that there is a toggle for it #16446
I believe I'm also seeing the same bug, a ColumnNotFoundError when collecting a LazyFrame that has been constructed in a fairly complicated way (I'm also having trouble reducing it to a simple example). Collecting with cluster_with_columns=False prevents the problem.
Here is the smallest example I can come up with:
Result with cluster_with_columns=True:
Result with cluster_with_columns=False (or using DataFrame instead of LazyFrame):
Checks
Reproducible example
I am not sure how to reproduce this.
Log output
Issue description
This is not a very strong bug report, I'm sorry, just wanted to give a heads-up on a potential issue in the newest version 0.20.28.
We have several complex queries that we can run in either lazy or eager mode, but after upgrading, our tests started failing with polars.exceptions.ColumnNotFoundError. Intending to debug, I ran the same tests in eager mode, and then all tests passed.
I tried a manual bisect search for the version where the error was introduced, and it seems to be in 0.20.27 or 0.20.28 (the former was yanked), as the tests pass without issue on 0.20.26.
I will downgrade for now, will alert you if I find more actionable intelligence on the issue.
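The manual bisect described above can be sketched as a small binary search over an ordered list of releases. This is only an illustration: the function name and the `passes` callback are hypothetical stand-ins for "install that version and run the test suite", and of the outcomes below only 0.20.26 (pass) and 0.20.28 (fail) come from this report.

```python
from typing import Callable, Optional, Sequence


def first_failing_version(
    versions: Sequence[str], passes: Callable[[str], bool]
) -> Optional[str]:
    """Return the earliest version whose tests fail.

    Assumes versions are ordered oldest to newest and that once a version
    fails, every later version fails too.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(versions[mid]):
            lo = mid + 1  # regression was introduced after this version
        else:
            hi = mid  # this version or an earlier one introduced it
    return versions[lo] if lo < len(versions) else None


# Illustrative candidates; 0.20.27 is left out because it was yanked.
candidates = ["0.20.24", "0.20.25", "0.20.26", "0.20.28"]
# Assumed outcomes: only the 0.20.26 result is taken from the report.
assumed_passing = {"0.20.24", "0.20.25", "0.20.26"}

culprit = first_failing_version(candidates, lambda v: v in assumed_passing)
print(culprit)
```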
Exact error is
Expected behavior
Running with LazyFrames should yield results identical to running with regular DataFrames.
Installed versions