Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: String.capitalize() is broken/inconsistent #8271

Closed
1 task done
NickCrews opened this issue Feb 7, 2024 · 0 comments · Fixed by #8270
Closed
1 task done

bug: String.capitalize() is broken/inconsistent #8271

NickCrews opened this issue Feb 7, 2024 · 0 comments · Fixed by #8270
Labels
bug Incorrect behavior inside of ibis

Comments

@NickCrews
Copy link
Contributor

NickCrews commented Feb 7, 2024

What happened?

See failing tests in #8270

Several issues:

  1. There is an "easy" problem where Capitalize(None) -> "". for sqlalchemy backends. That's just a simple bug.

Trickier are deciding on semantics:

  1. should we capitalize every "word", or just the first letter of the string? postgres and polars does the first, the current sqlalchemy implementation does the second. Not sure about other backends.
  2. what is our definition of a word boundary? postgres says "words are defined as sequences of alphanumeric characters separated by non-alphanumeric characters." polars splits on whitespace source Oh boy this won't be fun.

I would say we should punt on the hard token thing, and use the same semantics python's str.capitalize(), with first letter upper, the rest lower. Then we just need to bring all the backends in line with this.

Later, we can add a .title() or similar to do the the per-token capitalization.

What version of ibis are you using?

main

What backend(s) are you using, if any?

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@NickCrews NickCrews added the bug Incorrect behavior inside of ibis label Feb 7, 2024
cpcloud pushed a commit that referenced this issue Feb 15, 2024
…nds (#8270)

Fixes #8271.

BREAKING CHANGE: Backends that previously used initcap (analogous to str.title) to implement StringValue.capitalize() will produce different results when the input string contains multiple words (a word's definition being backend-specific).
ncclementi pushed a commit to ncclementi/ibis that referenced this issue Feb 21, 2024
…nds (ibis-project#8270)

Fixes ibis-project#8271.

BREAKING CHANGE: Backends that previously used initcap (analogous to str.title) to implement StringValue.capitalize() will produce different results when the input string contains multiple words (a word's definition being backend-specific).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant