BUG: LazyFrame neglects head(0) if a sort follows #15779

Closed
2 tasks done
lorentzenchr opened this issue Apr 19, 2024 · 3 comments · Fixed by #15784
Labels
A-optimizer Area: plan optimization accepted Ready for implementation bug Something isn't working P-high Priority: high python Related to Python Polars

Comments

@lorentzenchr
Contributor

lorentzenchr commented Apr 19, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

I'm very sorry that I could not reduce this to a small reproducible example; I really tried.
See https://github.com/lorentzenchr/model-diagnostics/blob/615b131892c6ddf81b0fbdf331daf5beb8601843/src/model_diagnostics/calibration/tests/test_identification.py#L502 for a test that fails with model-diagnostics==1.1.0 and polars>=0.20.20 (also tested with 0.20.21).
The lines causing this are:
https://github.com/lorentzenchr/model-diagnostics/blob/615b131892c6ddf81b0fbdf331daf5beb8601843/src/model_diagnostics/calibration/identification.py#L468-L469

Log output

No response

Issue description

A df.head(0) should always produce a dataframe with zero rows that keeps its columns. As of polars 0.20.20, a LazyFrame ignores the head(0) if it is followed by sort(..), at least in certain (harder to reproduce) circumstances.

The fix for model-diagnostics was lorentzenchr/model-diagnostics#148.

Expected behavior

Return a dataframe with 0 rows.

Installed versions

--------Version info---------
Polars:               0.20.21
Index type:           UInt32
Platform:             macOS-14.4.1-x86_64-i386-64bit
Python:               3.11.7 (main, Dec  4 2023, 18:10:11) [Clang 15.0.0 (clang-1500.1.0.2.5)]

----Optional dependencies----
numpy:                1.26.3
openpyxl:             <not installed>
pandas:               2.1.4
pyarrow:              14.0.2
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.25
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@lorentzenchr lorentzenchr added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Apr 19, 2024
@cmdlineluser
Contributor

Minimal repro:

import polars as pl

# A sort before and after head(0) triggers the bug: head(0) is ignored.
(pl.LazyFrame({"foo": [1, 2]})
   .sort("foo")
   .head(0)
   .sort("foo")
   .collect()
)

# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# │ 2   │
# └─────┘
# Same query with slice pushdown disabled: head(0) is respected.
(pl.LazyFrame({"foo": [1, 2]})
   .sort("foo")
   .head(0)
   .sort("foo")
   .collect(slice_pushdown=False)
)

# shape: (0, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ i64 │
# ╞═════╡
# └─────┘

@lorentzenchr
Contributor Author

lorentzenchr commented Apr 19, 2024

@cmdlineluser Thank you so much. I guess I was missing the first sort.

@orlp
Collaborator

orlp commented Apr 19, 2024

Thanks for the reproduction.

@orlp orlp added P-high Priority: high A-optimizer Area: plan optimization and removed needs triage Awaiting prioritization by a maintainer labels Apr 19, 2024
@c-peters c-peters added the accepted Ready for implementation label Apr 22, 2024

5 participants