hamdanal

Similar to str.fullmatch and other methods that accept regular expressions

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

----------
pat : str
- Character sequence.
+ Character sequence or regular expression.
Member

By regular expression, do you mean a string that is interpreted as a regular expression, or a compiled regular expression object?

To avoid confusion: if the former, then probably no doc change is needed; if the latter, the type hints in the signature would also need to be updated, and some code changes would be required?

Member

I think he meant a compiled regular expression; this is how we are trying to type it in the stubs.
I believe we should align all the docs: since it uses the re functions under the hood, the functions below support re.Pattern, so a compiled regular expression is also accepted at runtime.
If we look at the docs, it is a bit unclear what "regular expression" means; I would assume it is just a regular string of the form r"...".
So the question is: should we allow compiled regular expressions, given that they are supported at runtime?
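
For readers following the distinction, here is a minimal sketch using only the standard-library re module (it does not call pandas, and the variable names are illustrative). It shows that a raw string is still an ordinary str, that re functions accept either form, and that re.compile hands back an already compiled pattern unchanged when no flags are given, which is presumably why compiled patterns can work at runtime in code that compiles its input.

```python
import re

# A pattern given as a plain string. The r"..." prefix only changes how
# Python parses backslashes; the value is still an ordinary str.
pat_str = r"BAD_*"

# The same pattern as a compiled regular expression object.
pat_compiled = re.compile(pat_str)

print(type(pat_str).__name__)       # str
print(type(pat_compiled).__name__)  # Pattern

# Module-level re functions accept either form.
assert re.sub(pat_str, "", "fooBAD") == "foo"
assert re.sub(pat_compiled, "", "fooBAD") == "foo"

# Compiling an already compiled pattern without flags returns it unchanged.
assert re.compile(pat_compiled) is pat_compiled

# Passing flags together with a compiled pattern is rejected by re.
try:
    re.compile(pat_compiled, flags=re.IGNORECASE)
    flags_rejected = False
except ValueError:
    flags_rejected = True
print(flags_rejected)  # True
```

The last case is worth keeping in mind when documenting compiled patterns: any code path that forwards extra flags to re.compile would fail for a compiled pattern even if the plain-string form works.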

Member

The documentation is the official API. If the stubs have been updated to reflect the types that are accepted, then is this the tail wagging the dog? If we update the documentation, then we also need to update the type annotations in the code as well as ensure that the behavior is tested?

Member

I see your point and you are correct; I think the confusion originally came from "regular expression" != "compiled regex".
But then I went into the stubs, and it seems like we are testing for it:

def test_replace_compiled_regex_mixed_object():
    pat = re.compile(r"BAD_*")
    ser = Series(
        ["aBAD", np.nan, "bBAD", True, datetime.today(), "fooBAD", None, 1, 2.0]
    )
    result = Series(ser).str.replace(pat, "", regex=True)
    expected = Series(
        ["a", np.nan, "b", np.nan, np.nan, "foo", None, np.nan, np.nan], dtype=object
    )
    tm.assert_series_equal(result, expected)

So the question is what we mean by regular expression, compiled or not, so that we can:

  • clarify the docs
  • update the stubs accordingly, to allow re.Pattern[str] or not

Please let us know @simonjayhawkins.
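
To make the typing side of the question concrete, the following is a hedged sketch of what an annotation accepting both forms could look like. The alias PatternLike and the helper match_each are hypothetical names invented for illustration; they are not taken from pandas or pandas-stubs, and the helper only mimics re.match semantics on a plain list.

```python
import re
from typing import List, Union

# Hypothetical alias (an assumption for illustration, not a pandas name):
# either a pattern string or a compiled pattern over str.
PatternLike = Union[str, re.Pattern[str]]


def match_each(values: List[str], pat: PatternLike) -> List[bool]:
    """Return, for each value, whether pat matches at its start."""
    # re.compile returns a compiled pattern unchanged when no flags are given.
    regex = re.compile(pat)
    return [regex.match(value) is not None for value in values]


from_string = match_each(["aBAD", "BAD_x"], r"BAD_*")
from_compiled = match_each(["aBAD", "BAD_x"], re.compile(r"BAD_*"))
print(from_string)    # [False, True]
print(from_compiled)  # [False, True]
```

Narrowing the annotation to str alone would make a type checker reject the second call even though it runs, which is the mismatch between documentation, annotations, and runtime behavior being discussed here.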

Author

It is true that I understood regular expression to mean both string and compiled patterns, but this PR is meant to bring the match method in line with the other str methods.

I can open another PR to clarify the docs and update the inline types if there is consensus.

@simonjayhawkins simonjayhawkins added Docs API Design API - Consistency Internal Consistency of API/Behavior labels Jul 17, 2025
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Aug 26, 2025
@simonjayhawkins
Member

> If we update the documentation, then we also need to update the type annotations in the code as well as ensure that the behavior is tested?

It appears that the change proposed in this PR has been made in #61964, along with updates to the type annotations and added tests, as suggested.

@hamdanal hamdanal deleted the str-match-re-doc branch September 1, 2025 11:48