String panic on compare int to str #14462

Closed
2 tasks done
paddymul opened this issue Feb 13, 2024 · 3 comments
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@paddymul
Contributor

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

cleaned  = pl.DataFrame({'a': [  10 , 20]})
original = pl.DataFrame({'a': [ "10", 20]})

expected = pl.DataFrame(
    {'a'     : [  10 ,   20],
     'a_orig': [ "10", None]})

cleaned.select(
    pl.col('a'),
    pl.when((original['a'] == cleaned['a']).eq(0)).then(None).otherwise(cleaned['a']).alias("a_orig"))

Log output

thread '<unnamed>' panicked at crates/polars-core/src/series/arithmetic/borrowed.rs:426:44:
data types don't match: InvalidOperation(ErrString("sub operation not supported for dtypes `str` and `str`"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[119], line 10
      2 original = pl.DataFrame({'a': [ "10", 20]})
      4 expected = pl.DataFrame(
      5     {'a'     : [  10 ,   20],
      6      'a_orig': [ "10", None]})
      8 cleaned.select(
      9     pl.col('a'),
---> 10     pl.when((original['a'] - cleaned['a']).eq(0)).then(None).otherwise(cleaned['a']).alias("a_orig"))

File ~/anaconda3/envs/buckaroo-dev-5/lib/python3.11/site-packages/polars/series/series.py:1068, in Series.__sub__(self, other)
   1066 if isinstance(other, pl.Expr):
   1067     return F.lit(self) - other
-> 1068 return self._arithmetic(other, "sub", "sub_<>")

File ~/anaconda3/envs/buckaroo-dev-5/lib/python3.11/site-packages/polars/series/series.py:1005, in Series._arithmetic(self, other, op_s, op_ffi)
   1002     other = pl.Series("", [None])
   1004 if isinstance(other, Series):
-> 1005     return self._from_pyseries(getattr(self._s, op_s)(other._s))
   1006 elif _check_for_numpy(other) and isinstance(other, np.ndarray):
   1007     return self._from_pyseries(getattr(self._s, op_s)(Series(other)._s))

PanicException: data types don't match: InvalidOperation(ErrString("sub operation not supported for dtypes `str` and `str`"))

Issue description

I'm not sure whether this counts as a bug, but I think a panic is different from an ordinary error. The error message is also confusing:

"sub operation not supported for dtypes `str` and `str`"

I am subtracting an int column from a string column, not `str` from `str`.

Expected behavior

I guess I would expect an error, but I'm not sure if this is the right type of error.  Also the message could be clearer.

I want to be able to get the dataframe in `expected`.
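
A minimal sketch of one way to arrive at the values in `expected`, under the assumption that the type-aware comparison is done on plain Python lists before any DataFrame is built. A Polars column has a single dtype, so the string "10" and the integer 20 cannot be compared element by element within one typed column. The helper name `changed_originals` is hypothetical and is not part of Polars.

```python
def changed_originals(original, cleaned):
    """Return the original value where cleaning changed it, else None."""
    result = []
    for orig, clean in zip(original, cleaned):
        # A value counts as unchanged only if both type and value match,
        # so the string "10" and the integer 10 are treated as different.
        unchanged = type(orig) is type(clean) and orig == clean
        result.append(None if unchanged else orig)
    return result


original_values = ["10", 20]  # mixed Python values, as in the report
cleaned_values = [10, 20]

columns = {
    "a": cleaned_values,
    "a_orig": changed_originals(original_values, cleaned_values),
}
print(columns)
```

The resulting dictionary has the same contents as the one passed to `pl.DataFrame` for `expected`, so it could be handed to the constructor in the same way.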

I also tried with `pl.Object` columns, and that failed too:

```python
cleaned  = pl.DataFrame({'a': pl.Series([  10 , 20], dtype=pl.Object)})
original = pl.DataFrame({'a': pl.Series([ "10", 20], dtype=pl.Object)})

expected = pl.DataFrame(
    {'a'     : [  10 ,   20],
     'a_orig': [ "10", None]})

cleaned.select(
    pl.col('a'),
    pl.when((original['a'] == cleaned['a']).eq(True)).then(None).otherwise(cleaned['a']).alias("a_orig"))
```

Installed versions

Polars:               0.20.7
Index type:           UInt32
Platform:             macOS-13.6.1-arm64-arm-64bit
Python:               3.11.7 (main, Dec 15 2023, 12:09:56) [Clang 14.0.6 ]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.2
numpy:                1.26.3
openpyxl:             <not installed>
pandas:               2.1.4
pyarrow:              11.0.0
pydantic:             2.5.3
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

@paddymul added the bug, needs triage, and python labels on Feb 13, 2024
@mcrumiller
Contributor

On the current main branch this gives a ComputeError and does not panic:

polars.exceptions.ComputeError: cannot compare string with numeric data

@ritchie46
Member

Then I think it is the proper behavior on main. We can add a test and close.

@mcrumiller
Contributor

We already have a test for that here. We can close.
