map_dict - mapping a string to None converts the entire column to f64 of null #7140
Comments
I realized that map_dict takes a default parameter, which is why everything became null. If anything, though, maybe the change in dtype is still unexpected?
I would think it is unexpected. You should probably re-open this so that it gets seen by the team.
I'd say the dtype change is perhaps a bit confusing but not unexpected. Essentially, the values of the mapping dictionary govern the output dtype. Your dictionary's only value is null, which gets inferred as f64. Some details here. Not sure what the best solution is 🤷.
It can be easily solved by assigning a typed series as a value in the mapping dict:
In [39]: df.with_columns(pl.col('name').map_dict({'b': pl.Series("", None, dtype=pl.Utf8)}))
Out[39]:
shape: (4, 1)
┌───────────┐
│ name │
│ --- │
│ list[str] │
╞═══════════╡
│ null │
│ null │
│ null │
│ null │
└───────────┘
I wouldn't say that. I'd say it actively hurts it. Now the dtype is
In [4]: df.with_columns(pl.col('name').map_dict({'b': None}).cast(pl.Utf8))
Out[4]:
shape: (4, 1)
┌──────┐
│ name │
│ --- │
│ str │
╞══════╡
│ null │
│ null │
│ null │
│ null │
└──────┘
Oops, missed the extra
Reread the example properly this time.
>>> df.with_columns(
... pl.col("name").map_dict(
... {"b": None},
... default=pl.col("name")
... )
... )
shape: (4, 1)
┌──────┐
│ name │
│ --- │
│ str │
╞══════╡
│ a │
│ null │
│ c │
│ d │
└──────┘
I want to note that at this point
Following up with some more context/reflections on why this was unexpected behavior for me. First off, like many others I am a pandas -> polars convert, but I have been using polars actively and almost exclusively for several months now. Nevertheless, when I first wanted to "replace X strings with null", the solution wasn't immediately obvious to me, so I googled "polars replace" and my first hit was this Stack Overflow post. The updated solution pointed to the new
On another note, I want to explain why the
The docstring for map_dict is improved and now mentions
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
Hi there, love polars so I wanted to report a potential bug I encountered. I tried converting a specific string to null using map_dict, and this converted the column's dtype from string to f64 and made every value null. My intent was to map a set of strings to null, and in retrospect this is readily doable with pl.when(...).then(...). Nevertheless, it was unexpected to see this behavior.
Reproducible example
Expected behavior
This works using pl.when(...).then(...)
Installed versions