pl.exclude(pl.first()) still isn't working #13517

Closed
2 tasks done
Wainberg opened this issue Jan 8, 2024 · 5 comments
Labels
bug (Something isn't working), invalid (A bug report that is not actually a bug), python (Related to Python Polars)

Comments

@Wainberg
Contributor

Wainberg commented Jan 8, 2024

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

>>> import polars as pl
>>> pl.exclude(pl.first())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../python3.12/site-packages/polars/functions/lazy.py", line 1292, in exclude
    return exclude(columns, *more_columns)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/polars/selectors.py", line 1323, in exclude
    return ~_combine_as_selector(columns, *more_columns)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/polars/selectors.py", line 228, in _combine_as_selector
    raise TypeError(
TypeError: invalid input for `exclude`

Expected one or more `str`, `DataType` or selector; found <Expr ['nth(0)'] at 0x7F2DF6AB5D00> instead.

Log output

No response

Issue description

When I suggested supporting expressions and selectors in pl.exclude (#13247), my motivating example was that pl.exclude(pl.first()) didn't work. Unfortunately, it still doesn't work after #13301, because pl.exclude() accepts selectors but not expressions.

Expected behavior

pl.exclude(pl.first()) should be a valid expression or selector that excludes the first column.

Installed versions

--------Version info---------
Polars:               0.20.3
Index type:           UInt32
Platform:             Linux-3.10.0-1160.102.1.el7.x86_64-x86_64-with-glibc2.17
Python:               3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:03:24) [GCC 12.3.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.2
numpy:                1.26.3
openpyxl:             3.1.2
pandas:               2.1.4
pyarrow:              14.0.2
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             0.8.1
xlsxwriter:           3.1.9
@Wainberg added the bug and python labels Jan 8, 2024
@MarcoGorelli
Collaborator

Isn't it expected that this raises? You can use cs.first(), which is a column selector. pl.first() selects the first element, not the first column.

@Wainberg
Contributor Author

Wainberg commented Jan 8, 2024

df.select(pl.first()) does select the first column: https://docs.pola.rs/py-polars/html/reference/expressions/api/polars.first.html.

@cmdlineluser
Contributor

cmdlineluser commented Jan 8, 2024

This function has different behavior depending on the input type:

  • None -> Expression to take first column of a context.
  • str -> Syntactic sugar for pl.col(column).first()

It seems like perhaps the special-cased column selector behaviour needs to go?

As it stands, it basically cannot be combined with anything; it's not just an exclude issue:

df.select(pl.col(pl.first()))
# TypeError: invalid input for `col`

df.drop(pl.first())
# TypeError: argument 'columns': 'Expr' object cannot be converted to 'PyString'

@alexander-beedie self-assigned this Jan 8, 2024
@stinodego added the needs triage label Jan 13, 2024
@stinodego
Member

As Marco pointed out, this should raise, as exclude expects a column name or selector, and pl.first() is an expression. The error message correctly indicates this.

For further discussion on how selectors should work, please refer to #13757

@stinodego closed this as not planned Jan 21, 2024
@stinodego added the invalid label and removed the needs triage label Jan 21, 2024
@Wainberg
Contributor Author

Worth mentioning that this also doesn't work for Expr.exclude, but for a different reason:

>>> pl.DataFrame({'a': [1], 'b': [2]}).select(pl.all().exclude(pl.first()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/michael/miniforge3/lib/python3.12/site-packages/polars/expr/expr.py", line 947, in exclude
    raise TypeError(msg)
TypeError: invalid input for `exclude`

Expected one or more `str` or `DataType`; found <Expr ['nth(0)'] at 0x7F386B584A70> instead.
