For a Series S, I find the S.str.extract method very useful. It is great how you implemented naming the resulting DataFrame columns according to the names specified in the capturing groups of the regular expression.
However, there seems to be a bug when one of the capture groups is named "name", for example
@jreback should we prevent a user from using 'name' as one of the regex capture group names?
(I may be wrong here)
The problem seems to occur because, in _wrap_result in pandas/core/strings.py, getattr(result, 'name', None) returns the 'name' column/Series instead of the name attribute.
str_extract does not set a name attribute on the result it returns, so getattr would normally fall back to None. The exceptions are @tdhock's use case, where the result has a column called 'name', and any other method that calls _wrap_result after explicitly setting name on result.
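To make the clash concrete, here is a minimal sketch of the mechanism. TinyFrame is a hypothetical stand-in written for this comment, not pandas code; it only assumes that unknown attributes fall back to column lookup, which is how attribute-style column access behaves on a DataFrame.

```python
class TinyFrame:
    """Hypothetical stand-in for a DataFrame: unknown attributes fall back to columns."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal attribute lookup fails.
        columns = self.__dict__.get("_columns", {})
        if attr in columns:
            return columns[attr]
        raise AttributeError(attr)


# One result with a column called 'name', one without.
clash = TinyFrame({"name": ["Alice", "Bob"], "age": ["30", "41"]})
no_clash = TinyFrame({"first": ["Alice", "Bob"], "age": ["30", "41"]})

# The same call that _wrap_result makes, per the description above.
name_from_clash = getattr(clash, "name", None)        # the 'name' column, not a label
name_from_no_clash = getattr(no_clash, "name", None)  # falls through to the default
print(name_from_clash, name_from_no_clash)
```

With a 'name' column present, getattr hands back the column data, so code that expects either a label or None receives a whole column instead.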
One solution, I suppose, would be to check in f inside str_extract whether one of the named groups in the pattern is called 'name', but I don't know if this is a good approach to solving this.
Something like:
    import warnings

    def str_extract(arr, pat, flags=0):
        # omitting extra stuff (regex and empty_row are defined here)
        def f(x):
            if not isinstance(x, compat.string_types):
                return empty_row
            m = regex.search(x)
            if m:
                if "name" in m.groupdict():
                    # do something to warn the user; a plain warning is one option
                    warnings.warn("capture group 'name' clashes with the name attribute")
                return [np.nan if item is None else item for item in m.groups()]
            else:
                return empty_row
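The check in the sketch above runs once per string, but the group names belong to the pattern, so it could also run once, right after compiling. A minimal sketch of that variant follows. compile_checked is a hypothetical helper name, not an existing pandas function; it relies only on the standard library's groupindex attribute of compiled patterns.

```python
import re
import warnings


def compile_checked(pat, flags=0):
    """Compile pat and warn once if it defines a capture group called 'name'.

    Hypothetical helper for illustration; not part of pandas.
    """
    regex = re.compile(pat, flags=flags)
    # groupindex maps each named group to its group number.
    if "name" in regex.groupindex:
        warnings.warn(
            "capture group 'name' may clash with the name attribute of the result",
            UserWarning,
        )
    return regex


clashing = compile_checked(r"(?P<first>\w+) (?P<age>\d+)")  # no warning expected
print(sorted(clashing.groupindex))
```

Whether to warn, raise, or rename the group is a separate decision; this only shows where a single up-front check could live.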
The result I expected was