ENH: Add Series.str.casefold #25419

Merged
merged 17 commits into from
Feb 28, 2019
1 change: 1 addition & 0 deletions doc/source/reference/series.rst
@@ -409,6 +409,7 @@ strings and apply several methods to it. These can be accessed like
:template: autosummary/accessor_method.rst

Series.str.capitalize
Series.str.casefold
Series.str.cat
Series.str.center
Series.str.contains
1 change: 1 addition & 0 deletions doc/source/user_guide/text.rst
@@ -600,6 +600,7 @@ Method Summary
:meth:`~Series.str.partition`;Equivalent to ``str.partition``
:meth:`~Series.str.rpartition`;Equivalent to ``str.rpartition``
:meth:`~Series.str.lower`;Equivalent to ``str.lower``
:meth:`~Series.str.casefold`;Equivalent to ``str.casefold``
:meth:`~Series.str.upper`;Equivalent to ``str.upper``
:meth:`~Series.str.find`;Equivalent to ``str.find``
:meth:`~Series.str.rfind`;Equivalent to ``str.rfind``
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -22,6 +22,7 @@ Other Enhancements
- Indexing of ``DataFrame`` and ``Series`` now accepts zerodim ``np.ndarray`` (:issue:`24919`)
- :meth:`Timestamp.replace` now supports the ``fold`` argument to disambiguate DST transition times (:issue:`25017`)
- :meth:`DataFrame.at_time` and :meth:`Series.at_time` now support :meth:`datetime.time` objects with timezones (:issue:`24043`)
- ``Series.str`` has gained a :meth:`Series.str.casefold` method to remove all case distinctions present in a string (:issue:`25405`)
- :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
- :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behaviour of the sort parameter matches that of :meth:`Index.union` (:issue:`24994`)
-
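For context on the whatsnew entry: ``Series.str.casefold`` is documented as equivalent to Python's built-in ``str.casefold``. The sketch below uses only the standard library (no pandas) to show how casefolding differs from plain lowercasing; the sample words are illustrative assumptions, not taken from the PR.

```python
# Python's built-in str.casefold (which Series.str.casefold applies elementwise)
# is a more aggressive form of lowercasing, intended for caseless matching.
words = ["Straße", "STRASSE", "strasse"]

lowered = [w.lower() for w in words]
folded = [w.casefold() for w in words]

print(lowered)  # ['straße', 'strasse', 'strasse']
print(folded)   # ['strasse', 'strasse', 'strasse']

# lower() leaves 'ß' intact, so the three spellings do not all compare equal;
# casefold() maps 'ß' to 'ss', so they collapse to a single value.
print(len(set(lowered)))  # 2
print(len(set(folded)))   # 1
```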
6 changes: 6 additions & 0 deletions pandas/core/strings.py
@@ -2943,6 +2943,8 @@ def rindex(self, sub, start=0, end=None):
remaining to lowercase.
Series.str.swapcase : Converts uppercase to lowercase and lowercase to
uppercase.
Series.str.casefold: Removes all case distinctions in the string.
.. versionadded:: 0.25.0

Examples
--------
@@ -2995,6 +2997,7 @@ def rindex(self, sub, start=0, end=None):
_shared_docs['capitalize'] = dict(type='be capitalized',
method='capitalize')
_shared_docs['swapcase'] = dict(type='be swapcased', method='swapcase')
_shared_docs['casefold'] = dict(type='be casefolded', method='casefold')
Contributor

Can you add the versionadded 0.25.0 here? (You may need to add it to the dict so it gets formatted.)

Member Author

Thanks @jreback, added!

Contributor

Can you verify this renders OK in the terminal?

Member Author

Yes, I printed the docstring and it looks fine to me. @jreback

Convert strings in the Series/Index to be casefolded.

Equivalent to :meth:`str.casefold`.

Returns
-------
Series/Index of objects

See Also
--------
Series.str.lower : Converts all characters to lowercase.
Series.str.upper : Converts all characters to uppercase.
Series.str.title : Converts first character of each word to uppercase and
remaining to lowercase.
Series.str.capitalize : Converts first character to uppercase and
remaining to lowercase.
Series.str.swapcase : Converts uppercase to lowercase and lowercase to
uppercase.
Series.str.casefold: Removes all case distinctions in the string.
   .. versionadded:: 0.25.0

Member

Does this render well on the web? I'm not aware of any other instance where the versionadded note is in the See Also section. It may be worth experimenting with the string substitution to put it in the summary instead.

Member Author

The rendering looks fine to me in the terminal. The shared summary template only takes two parameters (type and method), and among these methods only Series.str.casefold is new in this version, so I put the note in the See Also section under Series.str.casefold. Otherwise, I assume the template would need to be changed.

Member

You should be able to generate one HTML file rather easily:

https://python-sprints.github.io/pandas/guide/pandas_pr.html#visual-validation-of-the-docstring

If you can double-check there, that would be preferable, as the majority of users interact with the docs via HTML.

Member Author

@charlesdong1991 Feb 27, 2019

Sorry, do you have any idea why it's complaining `AttributeError: type object 'StringMethods' has no attribute 'casefold'`? Is this referencing master? @WillAyd

Member Author

@charlesdong1991 Feb 27, 2019

BTW, I tried something like the following to put the version added in the summary, but then I had to manually insert a blank line for the other methods, since they don't have a version added. I'd like to hear your advice on how to improve this. @WillAyd

    _shared_docs['casemethods'] = ("""
    Convert strings in the Series/Index to %(type)s.

    Equivalent to :meth:`str.%(method)s`.
    %(version)s

    ........

    _shared_docs['casefold'] = dict(type='be casefolded', method='casefold',
                                    version='.. versionadded:: 0.25.0')

Member

What are you running to get the AttributeError? As for your second comment, it's probably easiest to put the newline(s) into the version argument.
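A minimal sketch of the newline suggestion, assuming a simplified template: every method's parameter dict carries a version key that is empty for existing methods and holds its own leading newlines for casefold, so no blank line has to be inserted by hand. The template text and the names CASEMETHODS_TEMPLATE, lower_params and casefold_params are hypothetical simplifications, not the merged pandas code.

```python
# Hypothetical, simplified stand-in for the shared 'casemethods' docstring template.
# %(version)s sits directly after the summary sentence, so an empty string adds
# nothing, while a non-empty value supplies its own blank line.
CASEMETHODS_TEMPLATE = """
Convert strings in the Series/Index to %(type)s.%(version)s

Equivalent to :meth:`str.%(method)s`.
"""

# Existing methods pass an empty version string.
lower_params = dict(type='lowercase', method='lower', version='')
# The newlines live inside the argument rather than in the template.
casefold_params = dict(type='be casefolded', method='casefold',
                       version='\n\n.. versionadded:: 0.25.0')

lower_doc = CASEMETHODS_TEMPLATE % lower_params
casefold_doc = CASEMETHODS_TEMPLATE % casefold_params

print(lower_doc)
print(casefold_doc)
```

With this layout the lower docstring has the same shape as before, and only casefold gains the versionadded line between its summary and the "Equivalent to" sentence.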

lower = _noarg_wrapper(lambda x: x.lower(),
docstring=_shared_docs['casemethods'] %
_shared_docs['lower'])
@@ -3010,6 +3013,9 @@ def rindex(self, sub, start=0, end=None):
swapcase = _noarg_wrapper(lambda x: x.swapcase(),
docstring=_shared_docs['casemethods'] %
_shared_docs['swapcase'])
casefold = _noarg_wrapper(lambda x: x.casefold(),
docstring=_shared_docs['casemethods'] %
_shared_docs['casefold'])

_shared_docs['ismethods'] = ("""
Check whether all characters in each string are %(type)s.
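The casefold accessor in this diff is built with pandas' private _noarg_wrapper helper. The stand-in below is a hypothetical, pure-Python simplification (the name noarg_wrapper and the list-based input are assumptions, and missing values are modelled as None or float NaN) that shows the behaviour the new test expects: the string method is applied to each element while missing values pass through unchanged.

```python
import math


def noarg_wrapper(f):
    """Hypothetical stand-in for pandas' _noarg_wrapper: return a function that
    applies `f` to every element, passing missing values (None/NaN) through."""
    def wrapper(values):
        out = []
        for v in values:
            if v is None or (isinstance(v, float) and math.isnan(v)):
                out.append(v)  # keep missing values as-is
            else:
                out.append(f(v))
        return out
    return wrapper


casefold = noarg_wrapper(lambda x: x.casefold())

print(casefold(['ß', None, 'case', 'ßd']))  # ['ss', None, 'case', 'ssd']
```

The output mirrors the expected values in test_casefold, with None standing in for the test's NA.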
11 changes: 10 additions & 1 deletion pandas/tests/test_strings.py
@@ -76,7 +76,7 @@ def assert_series_or_index_equal(left, right):
'len', 'lower', 'lstrip', 'partition',
'rpartition', 'rsplit', 'rstrip',
'slice', 'slice_replace', 'split',
'strip', 'swapcase', 'title', 'upper'
'strip', 'swapcase', 'title', 'upper', 'casefold'
], [()] * 100, [{}] * 100))
ids, _, _ = zip(*_any_string_method) # use method name as fixture-id

@@ -3424,3 +3424,12 @@ def test_method_on_bytes(self):
expected = Series(np.array(
['ad', 'be', 'cf'], 'S2').astype(object))
tm.assert_series_equal(result, expected)

@pytest.mark.skipif(compat.PY2, reason='not in python2')
def test_casefold(self):
# GH25405
casefolded = Series(['ss', NA, 'case', 'ssd'])
Member

Just call this expected

Member Author

Thanks @WillAyd, I just changed it!

s = Series(['ß', NA, 'case', 'ßd'])
result = s.str.casefold()

tm.assert_series_equal(result, casefolded)