Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fixed .str.contains(..., na=False) for categorical series #22170

Merged
merged 36 commits into from
Nov 20, 2018
Merged
Show file tree
Hide file tree
Changes from 31 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
077d136
BUG: fixed .str.contains(..., na=False) for categorical series
pulkitmaloo Aug 2, 2018
2ae44d1
na argument for _wrap_results
pulkitmaloo Aug 2, 2018
a1b3d7b
fixed str.contains for missing values
pulkitmaloo Aug 17, 2018
dbd990b
PEP8 Issue: added whitespace after ','
pulkitmaloo Aug 17, 2018
9d5d2c2
Merge branch 'master' of https://github.com/pulkitmaloo/pandas into c…
pulkitmaloo Aug 24, 2018
90aef7b
na argument for wrap_results in match
pulkitmaloo Aug 24, 2018
69f16af
BUG: fixed .str.contains(..., na=False) for categorical series
pulkitmaloo Aug 2, 2018
78cf8c7
na argument for _wrap_results
pulkitmaloo Aug 2, 2018
93bb24a
fixed str.contains for missing values
pulkitmaloo Aug 17, 2018
6c2700f
PEP8 Issue: added whitespace after ','
pulkitmaloo Aug 17, 2018
6649129
na argument for wrap_results in match
pulkitmaloo Aug 24, 2018
5c87e81
Merge branch 'categorical_bug' of https://github.com/pulkitmaloo/pand…
pulkitmaloo Aug 24, 2018
a09dcc5
Merge branch 'master' into categorical_bug
pulkitmaloo Aug 25, 2018
3abdea5
Update circle-27-compat.yaml
pulkitmaloo Sep 22, 2018
d136599
Update travis-27-locale.yaml
pulkitmaloo Sep 22, 2018
b942e16
Merge branch 'categorical_bug' of https://github.com/pulkitmaloo/pand…
pulkitmaloo Sep 22, 2018
82f9b9e
added tests for na arg for categorical and objects
pulkitmaloo Sep 22, 2018
53e9253
updated _wrap_results with arg fill_value and removed na
pulkitmaloo Sep 23, 2018
9f0286f
merging
pulkitmaloo Sep 24, 2018
ffa9969
BUG: fixed .str.contains(..., na=False) for categorical series
pulkitmaloo Aug 2, 2018
07c1d73
na argument for _wrap_results
pulkitmaloo Aug 2, 2018
7f1f2e2
fixed str.contains for missing values
pulkitmaloo Aug 17, 2018
f6cb04f
PEP8 Issue: added whitespace after ','
pulkitmaloo Aug 17, 2018
1f0256a
na argument for wrap_results in match
pulkitmaloo Aug 24, 2018
7025f34
Update travis-27-locale.yaml
pulkitmaloo Sep 22, 2018
7542448
added tests for na arg for categorical and objects
pulkitmaloo Sep 22, 2018
f1b4274
updated _wrap_results with arg fill_value and removed na
pulkitmaloo Sep 23, 2018
386ab98
Merge branch 'categorical_bug' of https://github.com/pulkitmaloo/pand…
pulkitmaloo Sep 24, 2018
7a09c44
Update travis-27-locale.yaml
pulkitmaloo Sep 24, 2018
3408920
fixed line too long
pulkitmaloo Sep 24, 2018
3288d11
whatsnew note
pulkitmaloo Sep 25, 2018
6c87770
Update v0.24.0.txt
pulkitmaloo Sep 25, 2018
d242647
Update doc/source/whatsnew/v0.24.0.txt
TomAugspurger Oct 19, 2018
def1b4e
Merge branch 'master' into PR_TOOL_MERGE_PR_22170
jreback Nov 18, 2018
fd99431
whatsnew
jreback Nov 18, 2018
44b36a4
cleanup
jreback Nov 18, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -704,7 +704,7 @@ Numeric
Strings
^^^^^^^

-
- Bug in :class:`StringMethods` where `str_contains()` was not filling missing values with given argument for na for a categorical Series (:issue:`22158`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use double backticks on Series, can you use the actual reference here, e.g. :func:`Series.str.contains`

Copy link
Contributor Author

@pulkitmaloo pulkitmaloo Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Thanks, I've updated it. Please let me know if there's anything else you'd like to change.

Copy link
Contributor Author

@pulkitmaloo pulkitmaloo Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Can you please review and merge this PR.

-
-

Expand Down
9 changes: 5 additions & 4 deletions pandas/core/strings.py
Original file line number Diff line number Diff line change
Expand Up @@ -1862,7 +1862,7 @@ def __iter__(self):
g = self.get(i)

def _wrap_result(self, result, use_codes=True,
name=None, expand=None):
name=None, expand=None, fill_value=np.nan):

from pandas.core.index import Index, MultiIndex

Expand All @@ -1872,7 +1872,8 @@ def _wrap_result(self, result, use_codes=True,
# so make it possible to skip this step as the method already did this
# before the transformation...
if use_codes and self._is_categorical:
result = take_1d(result, self._orig.cat.codes)
result = take_1d(result, self._orig.cat.codes,
fill_value=fill_value)

if not hasattr(result, 'ndim') or not hasattr(result, 'dtype'):
return result
Expand Down Expand Up @@ -2499,12 +2500,12 @@ def join(self, sep):
def contains(self, pat, case=True, flags=0, na=np.nan, regex=True):
result = str_contains(self._parent, pat, case=case, flags=flags, na=na,
regex=regex)
return self._wrap_result(result)
return self._wrap_result(result, fill_value=na)

@copy(str_match)
def match(self, pat, case=True, flags=0, na=np.nan):
result = str_match(self._parent, pat, case=case, flags=flags, na=na)
return self._wrap_result(result)
return self._wrap_result(result, fill_value=na)

@copy(str_replace)
def replace(self, pat, repl, n=-1, case=None, flags=0, regex=True):
Expand Down
24 changes: 20 additions & 4 deletions pandas/tests/test_strings.py
Original file line number Diff line number Diff line change
Expand Up @@ -543,10 +543,26 @@ def test_contains(self):
assert result.dtype == np.bool_
tm.assert_numpy_array_equal(result, expected)

# na
values = Series(['om', 'foo', np.nan])
res = values.str.contains('foo', na="foo")
assert res.loc[2] == "foo"
# na for category
jreback marked this conversation as resolved.
Show resolved Hide resolved
values = Series(["a", "b", "c", "a", np.nan], dtype="category")
result = values.str.contains('a', na=True).astype(object)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the .astype(object). What's the dtype when you don't do that? Categorical? If so, I'd rather we document that as part of the API and test it.

expected = Series([True, False, False, True, True], dtype=np.object_)
assert isinstance(result, Series)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be removed. The assert_series_equal checks that.

tm.assert_series_equal(result, expected)

result = values.str.contains('a', na=False).astype(object)
expected = Series([True, False, False, True, False], dtype=np.object_)
tm.assert_series_equal(result, expected)

# na for objects
values = Series(["a", "b", "c", "a", np.nan])
result = values.str.contains('a', na=True).astype(object)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for the astype object.

expected = Series([True, False, False, True, True], dtype=np.object_)
tm.assert_series_equal(result, expected)

result = values.str.contains('a', na=False).astype(object)
expected = Series([True, False, False, True, False], dtype=np.object_)
tm.assert_series_equal(result, expected)

def test_startswith(self):
values = Series(['om', NA, 'foo_nom', 'nom', 'bar_foo', NA, 'foo'])
Expand Down