col.str.contains does not respect case insensitive regex object #82

Closed
kymckay opened this issue Jul 11, 2023 · 2 comments
Labels
bug Something isn't working

kymckay commented Jul 11, 2023

Have you tried the latest version of polars?

  • yes

What version of polars are you using?

0.8.0

What operating system are you using polars on?

Debian

What node version are you using?

v18.13.0

Describe your bug.

Calling str.contains on a column with a regex object ignores the object's case-insensitive ("i") flag.

What are the steps to reproduce the behavior?

Use str.contains on a column, providing a regex object created with the "i" flag for case insensitivity. Test it on a sample with different casing than the original regex pattern.

Example

import pl from "nodejs-polars"

let df = pl.DataFrame({
    "text": ["foo", "FOO", "FoO"],
})

const regex = new RegExp("foo", "i")

df = df.withColumn(pl.col("text").str.contains(regex).alias("result"))

console.log(df.toString())

What is the actual behavior?

The contains function does not match all case variations.

shape: (3, 2)
┌──────┬────────┐
│ text ┆ result │
│ ---  ┆ ---    │
│ str  ┆ bool   │
╞══════╪════════╡
│ foo  ┆ true   │
│ FOO  ┆ false  │
│ FoO  ┆ false  │
└──────┴────────┘

What is the expected behavior?

The contains function should match all case variations, probably by translating the regex object's flags into the inline (?i) and (?-i) flags that polars interprets.

shape: (3, 2)
┌──────┬────────┐
│ text ┆ result │
│ ---  ┆ ---    │
│ str  ┆ bool   │
╞══════╪════════╡
│ foo  ┆ true   │
│ FOO  ┆ true   │
│ FoO  ┆ true   │
└──────┴────────┘

kymckay added the bug label Jul 11, 2023

kymckay commented Jul 11, 2023

For what it's worth, this is easily worked around by passing a string pattern with the (?i) and (?-i) flags as needed. I'm just raising it for posterity.
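
The workaround above, and the flag injection suggested in the issue, can be sketched as a small conversion step. This is only an illustration, not how nodejs-polars handles it: toInlineFlagPattern is a hypothetical helper name, and it assumes the pattern itself is already valid for the regex engine polars uses, since JavaScript-only syntax such as lookarounds or backreferences may not carry over.

```javascript
// Hypothetical helper (not part of nodejs-polars): turn a JavaScript RegExp
// into a pattern string that carries its flags as an inline (?...) group.
function toInlineFlagPattern(regex) {
  // Only flags with a direct inline equivalent are carried over:
  // i (case-insensitive), m (multi-line), s (dot matches newline).
  // Other JavaScript flags (g, u, y, d) are dropped in this sketch.
  const supported = ["i", "m", "s"];
  const inline = supported.filter((flag) => regex.flags.includes(flag)).join("");
  return inline ? `(?${inline})${regex.source}` : regex.source;
}

const pattern = toInlineFlagPattern(new RegExp("foo", "i"));
console.log(pattern); // (?i)foo
```

The resulting string would then be passed in place of the regex object, for example pl.col("text").str.contains(pattern), which is the string form of the workaround described above.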

universalmind303 (Collaborator) commented

closed via #92
