str.replace_many to take a dictionary that defines a replacement mapping. #17220
Comments
just use map = { df.with_columns( |
Good suggestion - we can do the same 'trick' as we do in |
Is there a way that we can make this work with regex? I have tried something like:

import regex
import polars

df = polars.DataFrame({
    "x": ["a", "b", "c", "1", "2", "3"],
})
map = {
    regex.compile(r"[a-z]"): "alpha",
    regex.compile(r"[0-9]"): "digit",
}
df.with_columns(
    polars.col("x").replace(map)
)

For my personal use case, I need to replace a large number of regex patterns, and it's not very ergonomic to use two lists because it can be hard to keep track of what is replacing what. Another possibility is a list of tuples:

map = [
    (r"[a-z]", "alpha"),
    (r"[0-9]", "digit"),
]

This (in my opinion) is nicer to follow than something like:

old = [r"[a-z]", r"[0-9]"]
new = ["alpha", "digit"] |
A dict is just going to be syntactic sugar. You can just define your map and then call |
That gives me:
|
Call |
I must be missing something, as none of the following work for me:

## ============================================================================
import regex
import polars
## ============================================================================
df = polars.DataFrame(
    {
        "x": ["a", "b", "c", "1", "2", "3"],
    }
)
## ============================================================================
map = {
    regex.compile(r"[a-z]"): "alpha",
    regex.compile(r"[0-9]"): "digit",
}
df.with_columns(polars.col("x").replace(map))
df.with_columns(polars.col("x").replace(map.keys(), map.values()))
df.with_columns(polars.col("x").replace(list(map.keys()), list(map.values())))
## ============================================================================
map = {
    r"[a-z]": "alpha",
    r"[0-9]": "digit",
}
df.with_columns(polars.col("x").str.replace_many(map))
df.with_columns(polars.col("x").str.replace_many(map.keys(), map.values()))
df.with_columns(polars.col("x").str.replace_many(list(map.keys()), list(map.values())))
## ============================================================================

All throw an exception except the third (when calling |
There are a few different issues:
Polars uses the Rust crate https://github.com/rust-lang/regex, so you must pass patterns as plain strings (not compiled pattern objects) when using the regex functions.
str.replace_many uses https://github.com/BurntSushi/aho-corasick, which works with literal strings only. It sounds like you may really be asking for: |
Description
Currently the str.replace_many method takes two lists for the original and replacement strings. It would be handy to include the ability to just pass a dictionary which defines the replacement mapping: