Lexically-ordered Categorical columns created in StringCache context lose their name following a sort #6014

Closed
2 tasks done
alexander-beedie opened this issue Jan 3, 2023 · 0 comments · Fixed by #6029
Labels
bug Something isn't working python Related to Python Polars

Collaborator

alexander-beedie commented Jan 3, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

We're investigating a number of Categorical-related gremlins at the moment.
Here's the first, suitably minimised (against latest polars build 0.15.11).

Reproducible example

Setup:

import polars as pl

with pl.StringCache():
    # create basic categorical
    df1 = pl.DataFrame( {'key':['bbb','aaa','ccc'] } ).with_column(
        pl.col('key').cast(pl.Categorical)
    )
    # create lexically-ordered categorical
    df2 = pl.DataFrame( {'key':['bbb','aaa','ccc'] } ).with_column(
        pl.col('key').cast(pl.Categorical).cat.set_ordering('lexical') 
    )

# create lexically-ordered categorical *outside* of StringCache context
df3 = pl.DataFrame( {'key':['bbb','aaa','ccc'] } ).with_column(
    pl.col('key').cast(pl.Categorical).cat.set_ordering('lexical') 
)

Issue:

df1.sort( by=pl.col('key') )
# ┌─────┐
# │ key │
# │ --- │
# │ cat │
# ╞═════╡
# │ bbb │
# │ aaa │
# │ ccc │
# └─────┘

df2.sort( by=pl.col('key') )
# ┌─────┐
# │     │  << column name vanished
# │ --- │
# │ cat │
# ╞═════╡
# │ aaa │
# │ bbb │
# │ ccc │
# └─────┘

df3.sort( by=pl.col('key') )
# ┌─────┐
# │ key │
# │ --- │
# │ cat │
# ╞═════╡
# │ aaa │
# │ bbb │
# │ ccc │
# └─────┘

Expected behavior

The column name 'key' should be preserved after sorting df2, as it is for df1 and df3.

Installed versions

---Version info---
Polars: 0.15.11
Index type: UInt32
Platform: macOS-13.1-arm64-arm-64bit
Python: 3.10.9 (main, Dec 15 2022, 17:11:09) [Clang 14.0.0 (clang-1400.0.29.202)]
---Optional dependencies---
pyarrow: 9.0.0
pandas: 1.4.4
numpy: 1.23.3
fsspec: 2022.8.2
connectorx: <not installed>
xlsx2csv: 0.8
matplotlib: <not installed>