BUG: Fix for .str.replace with repl function
.str.replace now accepts a callable (function) as the replacement.
It now raises a TypeError when repl is neither string-like nor callable.
The docstring is updated accordingly.

closes #15055
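
To illustrate the behaviour described in this message without requiring pandas, here is a minimal stdlib-only sketch. The helper `replace_each` is hypothetical (it is not the pandas implementation); it uses `isinstance(repl, str)` as a stand-in for the string-like check, treats `None` as the missing value, and assumes that a negative `n` means "replace all".

```python
import re


def replace_each(values, pat, repl, n=-1):
    """Apply re.sub to every string in `values`, leaving None untouched.

    Hypothetical stand-in for Series.str.replace; not the pandas code.
    """
    # Mirror the check described in the commit message: repl must be a
    # string or a callable, otherwise a TypeError is raised.
    if not (isinstance(repl, str) or callable(repl)):
        raise TypeError("repl must be a string or callable")
    # Assumption: a negative n means "replace all", which re.sub spells 0.
    count = n if n >= 0 else 0
    regex = re.compile(pat)
    return [v if v is None else regex.sub(repl, v, count=count)
            for v in values]


# The callable receives a match object and returns the replacement text.
reversed_words = replace_each(['foo 123', 'bar baz', None],
                              r'[a-z]+', lambda m: m.group(0)[::-1])
print(reversed_words)  # ['oof 123', 'rab zab', None]
```

Passing something that is neither a string nor a callable, for example `replace_each(['a'], 'a', 1)`, raises `TypeError` in this sketch.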

Author: Joost Kranendonk <Joost.Kranendonk@hzpc.nl>
Author: Joost Kranendonk <joost.kranendonk@hzpc.nl>

Closes #15056 from hzpc-joostk/pandas-GH15055-patch-1 and squashes the following commits:

826730c [Joost Kranendonk] simplify .str.replace TypeError reraising and test
90779ce [Joost Kranendonk] fix linting issues
c2cc13a [Joost Kranendonk] Update v0.20.0.txt
e15dcdf [Joost Kranendonk] fix bug catch TypeError with wrong number of args
40c0d97 [Joost Kranendonk] improve .str.replace with callable
f15ee2a [Joost Kranendonk] improve test .str.replace with callable
14beb21 [Joost Kranendonk] Add test for .str.replace with regex named groups
27065a2 [Joost Kranendonk] Reraise TypeError only with wrong number of args
ae04a3e [Joost Kranendonk] Add whatsnew for .str.replace with callable repl
067a7a8 [Joost Kranendonk] Fix testing bug for .str.replace
30d4727 [Joost Kranendonk] Bug fix in .str.replace type checking done wrong
4baa0a7 [Joost Kranendonk] add tests for .str.replace with callable repl
91c883d [Joost Kranendonk] Update .str.replace docstring
6ecc43d [Joost Kranendonk] BUG: Fix for .str.replace with repl function
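
Several of the squashed commits deal with re-raising a TypeError only when the callable received the wrong number of arguments. The sketch below shows how that distinction can be made with the Python 3 pattern that appears in the diff. The helper names `is_arity_error` and `call_and_classify` are hypothetical, and the approach depends on CPython's wording of these error messages, so treat it as an illustration under that assumption.

```python
import re

# Python 3 pattern copied from the diff; it targets CPython's wording for
# "wrong number of arguments" errors.
P_ERR = (r'((takes)|(missing)) (?(2)from \d+ to )?\d+ '
         r'(?(3)required )positional arguments?')


def is_arity_error(exc):
    """Return True if `exc` looks like a wrong-argument-count TypeError."""
    return (isinstance(exc, TypeError) and len(exc.args) >= 1
            and isinstance(exc.args[0], str)
            and re.search(P_ERR, exc.args[0]) is not None)


def call_and_classify(func, *args):
    """Call func(*args); report whether a TypeError was an arity error."""
    try:
        func(*args)
    except TypeError as e:
        return is_arity_error(e)
    return None  # no TypeError was raised


print(call_and_classify(lambda: None, 'm'))       # called with too many args
print(call_and_classify(lambda m, x: None, 'm'))  # called with too few args
print(call_and_classify(lambda m: m + 1, 'm'))    # unrelated TypeError
```

On CPython 3 the first two calls are classified as arity errors and the third is not, which is the distinction the patch relies on to decide between re-raising and falling back to NaN.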
Joost Kranendonk authored and jreback committed Jan 23, 2017
1 parent a1b6587 commit be3f2ae
Showing 4 changed files with 121 additions and 8 deletions.
21 changes: 20 additions & 1 deletion doc/source/text.rst
@@ -146,6 +146,25 @@ following code will cause trouble because of the regular expression meaning of
    # We need to escape the special character (for >1 len patterns)
    dollars.str.replace(r'-\$', '-')

+The ``replace`` method can also take a callable as replacement. It is called
+on every match of ``pat`` using :func:`re.sub`. The callable should expect one
+positional argument (a regex match object) and return a string.
+
+.. versionadded:: 0.20.0
+
+.. ipython:: python
+
+   # Reverse every lowercase alphabetic word
+   pat = r'[a-z]+'
+   repl = lambda m: m.group(0)[::-1]
+   pd.Series(['foo 123', 'bar baz', np.nan]).str.replace(pat, repl)
+
+   # Using regex groups
+   pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
+   repl = lambda m: m.group('two').swapcase()
+   pd.Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl)
+
 Indexing with ``.str``
 ----------------------

@@ -406,7 +425,7 @@ Method Summary
     :meth:`~Series.str.join`;Join strings in each element of the Series with passed separator
     :meth:`~Series.str.get_dummies`;Split strings on the delimiter returning DataFrame of dummy variables
     :meth:`~Series.str.contains`;Return boolean array if each string contains pattern/regex
-    :meth:`~Series.str.replace`;Replace occurrences of pattern/regex with some other string
+    :meth:`~Series.str.replace`;Replace occurrences of pattern/regex with some other string or the return value of a callable given the occurrence
     :meth:`~Series.str.repeat`;Duplicate values (``s.str.repeat(3)`` equivalent to ``x * 3``)
     :meth:`~Series.str.pad`;"Add whitespace to left, right, or both sides of strings"
     :meth:`~Series.str.center`;Equivalent to ``str.center``
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -24,6 +24,7 @@ New features
 ~~~~~~~~~~~~

 - Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`.
+- ``.str.replace`` now accepts a callable as replacement, which is passed to ``re.sub`` (:issue:`15055`)



70 changes: 63 additions & 7 deletions pandas/core/strings.py
@@ -167,7 +167,17 @@ def _map(f, arr, na_mask=False, na_value=np.nan, dtype=object):
         try:
             convert = not all(mask)
             result = lib.map_infer_mask(arr, f, mask.view(np.uint8), convert)
-        except (TypeError, AttributeError):
+        except (TypeError, AttributeError) as e:
+            # Reraise the exception if callable `f` got wrong number of args.
+            # The user may want to be warned by this, instead of getting NaN
+            if compat.PY2:
+                p_err = r'takes (no|(exactly|at (least|most)) ?\d+) arguments?'
+            else:
+                p_err = (r'((takes)|(missing)) (?(2)from \d+ to )?\d+ '
+                         r'(?(3)required )positional arguments?')
+
+            if len(e.args) >= 1 and re.search(p_err, e.args[0]):
+                raise e

             def g(x):
                 try:
@@ -303,8 +313,13 @@ def str_replace(arr, pat, repl, n=-1, case=True, flags=0):
     ----------
     pat : string
         Character sequence or regular expression
-    repl : string
-        Replacement sequence
+    repl : string or callable
+        Replacement string or a callable. The callable is passed the regex
+        match object and must return a replacement string to be used.
+        See :func:`re.sub`.
+
+        .. versionadded:: 0.20.0
+
     n : int, default -1 (all)
         Number of replacements to make from start
     case : boolean, default True
@@ -315,12 +330,53 @@ def str_replace(arr, pat, repl, n=-1, case=True, flags=0):
     Returns
     -------
     replaced : Series/Index of objects
+
+    Examples
+    --------
+    When ``repl`` is a string, every ``pat`` is replaced as with
+    :meth:`str.replace`. NaN value(s) in the Series are left as is.
+
+    >>> Series(['foo', 'fuz', np.nan]).str.replace('f', 'b')
+    0    boo
+    1    buz
+    2    NaN
+    dtype: object
+
+    When ``repl`` is a callable, it is called on every match of ``pat`` using
+    :func:`re.sub`. The callable should expect one positional argument
+    (a regex match object) and return a string.
+
+    To get the idea:
+
+    >>> Series(['foo', 'fuz', np.nan]).str.replace('f', repr)
+    0    <_sre.SRE_Match object; span=(0, 1), match='f'>oo
+    1    <_sre.SRE_Match object; span=(0, 1), match='f'>uz
+    2                                                  NaN
+    dtype: object
+
+    Reverse every lowercase alphabetic word:
+
+    >>> repl = lambda m: m.group(0)[::-1]
+    >>> Series(['foo 123', 'bar baz', np.nan]).str.replace(r'[a-z]+', repl)
+    0    oof 123
+    1    rab zab
+    2        NaN
+    dtype: object
+
+    Using regex groups:
+
+    >>> pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
+    >>> repl = lambda m: m.group('two').swapcase()
+    >>> Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl)
+    0    bAR
+    1    NaN
+    dtype: object
     """

-    # Check whether repl is valid (GH 13438)
-    if not is_string_like(repl):
-        raise TypeError("repl must be a string")
-    use_re = not case or len(pat) > 1 or flags
+    # Check whether repl is valid (GH 13438, GH 15055)
+    if not (is_string_like(repl) or callable(repl)):
+        raise TypeError("repl must be a string or callable")
+    use_re = not case or len(pat) > 1 or flags or callable(repl)

     if use_re:
         if not case:
37 changes: 37 additions & 0 deletions pandas/tests/test_strings.py
@@ -436,6 +436,43 @@ def test_replace(self):
                     values = klass(data)
                     self.assertRaises(TypeError, values.str.replace, 'a', repl)

+    def test_replace_callable(self):
+        # GH 15055
+        values = Series(['fooBAD__barBAD', NA])
+
+        # test with callable
+        repl = lambda m: m.group(0).swapcase()
+        result = values.str.replace('[a-z][A-Z]{2}', repl, n=2)
+        exp = Series(['foObaD__baRbaD', NA])
+        tm.assert_series_equal(result, exp)
+
+        # test with wrong number of arguments, raising an error
+        if compat.PY2:
+            p_err = r'takes (no|(exactly|at (least|most)) ?\d+) arguments?'
+        else:
+            p_err = (r'((takes)|(missing)) (?(2)from \d+ to )?\d+ '
+                     r'(?(3)required )positional arguments?')
+
+        repl = lambda: None
+        with tm.assertRaisesRegexp(TypeError, p_err):
+            values.str.replace('a', repl)
+
+        repl = lambda m, x: None
+        with tm.assertRaisesRegexp(TypeError, p_err):
+            values.str.replace('a', repl)
+
+        repl = lambda m, x, y=None: None
+        with tm.assertRaisesRegexp(TypeError, p_err):
+            values.str.replace('a', repl)
+
+        # test regex named groups
+        values = Series(['Foo Bar Baz', NA])
+        pat = r"(?P<first>\w+) (?P<middle>\w+) (?P<last>\w+)"
+        repl = lambda m: m.group('middle').swapcase()
+        result = values.str.replace(pat, repl)
+        exp = Series(['bAR', NA])
+        tm.assert_series_equal(result, exp)
+
     def test_repeat(self):
         values = Series(['a', 'b', NA, 'c', NA, 'd'])
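
The expected values in `test_replace_callable` can be cross-checked with plain `re.sub`, which the new code path delegates to. This is a small sketch, assuming that `n=2` in `.str.replace` corresponds to `count=2` in `re.sub`; the function names `swap_match` and `swap_middle` are illustrative only.

```python
import re


def swap_match(m):
    """Swap the case of the whole matched text."""
    return m.group(0).swapcase()


def swap_middle(m):
    """Replace the whole match with its case-swapped 'middle' group."""
    return m.group('middle').swapcase()


# At most two replacements, mirroring n=2 in the test.
limited = re.sub(r'[a-z][A-Z]{2}', swap_match, 'fooBAD__barBAD', count=2)

# Named groups are available on the match object passed to the callable.
pat = r"(?P<first>\w+) (?P<middle>\w+) (?P<last>\w+)"
middle = re.sub(pat, swap_middle, 'Foo Bar Baz')

print(limited)  # foObaD__baRbaD
print(middle)   # bAR
```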
