zishan044

Problem

PyArrow string arrays incorrectly raised NotImplementedError for numeric group references (\1, \2) in str.replace, even though PyArrow supports them. This caused failures when using regex patterns with group references on PyArrow-backed string arrays.

Solution

Modified the condition in pandas/core/arrays/_arrow_string_mixins.py to only block named group references (\g<name>) while allowing numeric group references.

Changes Made

  • Removed the blanket prohibition on all backslash+digit patterns in replacement strings
  • Only block named group references (detected by r"\g<" in repl)
  • All other unsupported features (callable repl, case=False, flags != 0, re.Pattern objects) still properly raise NotImplementedError
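The narrowed condition described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in pandas/core/arrays/_arrow_string_mixins.py: the helper name `needs_fallback` and its signature are hypothetical, and only the listed conditions are taken from the description.

```python
import re


def needs_fallback(pat, repl, case=True, flags=0):
    """Hypothetical helper: True when the arguments are treated as unsupported."""
    # Unsupported features that still raise NotImplementedError in the PR.
    if isinstance(pat, re.Pattern) or callable(repl) or not case or flags:
        return True
    # Only named group references are blocked; numeric ones (\1, \2) pass.
    # Note: a plain substring check like this also matches \g<1>.
    return r"\g<" in repl


print(needs_fallback(r"\[(\d+)\]", r"(\1)"))  # False
print(needs_fallback(r"\[(?P<digit>\d+)\]", r"(\g<digit>)"))  # True
```

Checking `callable(repl)` before the substring test keeps the `in` operator from being applied to a function object.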

Testing

Manually verified the fix works with:

  • Basic numeric group replacement: s.str.replace(r'\[(\d+)\]', r'(\1)', regex=True)
  • Multiple numeric groups: s.str.replace(r'(\w+)\[(\d+)\](\w+)', r'\1-\2-\3', regex=True)
  • Confirmed named groups still properly raise NotImplementedError
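For reference, the expected results of the two numeric-group replacements above can be checked against Python's own `re` module, which uses the same `\1` syntax in replacement strings. This sketch only shows the standard-library behavior and does not exercise the PyArrow code path; the input string `"abc[1]def"` is made up for illustration.

```python
import re

values = ["var.one[0]", "var.two[1]"]

# Single numeric group: wrap the bracketed index in parentheses instead.
single = [re.sub(r"\[(\d+)\]", r"(\1)", v) for v in values]
print(single)  # ['var.one(0)', 'var.two(1)']

# Multiple numeric groups: join the three captured parts with hyphens.
multi = re.sub(r"(\w+)\[(\d+)\](\w+)", r"\1-\2-\3", "abc[1]def")
print(multi)  # abc-1-def
```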

Example

import pandas as pd

# This now works (was broken before):
s = pd.Series(["var.one[0]", "var.two[1]"]).convert_dtypes(dtype_backend="pyarrow")
result = s.str.replace(r'\[(\d+)\]', r'(\1)', regex=True)
# result: ["var.one(0)", "var.two(1)"]

# This still properly fails (named groups not supported):
s.str.replace(r'\[(?P<digit>\d+)\]', r'(\g<digit>)', regex=True)  # Raises NotImplementedError

Note: I haven't added formal tests yet due to local environment issues with the test framework, but I've manually verified the fix works. I'm happy to add tests if maintainers can help me resolve the test environment setup.

@Alvaro-Kothe
Member

> I haven't added formal tests yet due to local environment issues with the test framework

Most of the information needed to set up the development environment is available here: https://pandas.pydata.org/docs/dev/development/contributing_environment.html

What are you having difficulty with?

Alvaro-Kothe added the Bug and Arrow (pyarrow functionality) labels on Oct 14, 2025
Development

Successfully merging this pull request may close these issues.

BUG: Series.str.replace stopped working with regex groups

2 participants