DEPR: list of lists in Series.str.cat #21950

Closed
h-vetinari opened this issue Jul 17, 2018 · 1 comment

@h-vetinari
Contributor

commented Jul 17, 2018

The .str.cat method is the only one in the .str accessor that takes another Series as an argument, which makes it a bit of a special case (e.g. it had no index alignment until v0.23).

It makes sense to support lists of objects which get concatenated sequentially, and lists of lists have been supported since at least v0.17, see https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.str.cat.html

When I wrote #20347, I tried very hard to keep the signature backwards-compatible and to keep the example from the v0.17-22 docs working:

>>> Series(['a', 'b']).str.cat([['x', 'y'], ['1', '2']], sep=',')
0    a,x,1
1    b,y,2
dtype: object

However, this added lots of complexity, and I think that this should be simplified, especially in light of @TomAugspurger's comment in #21894:

"As a reminder, the plan is to have no new deprecations in 0.25.x and 1.0.0. So this [v0.24] is the last round of deprecations before 1.0."

My suggestion is to modify the allowed combinations (as of v0.23) as follows:

Type of "others"                        |  action  |  comment
---------------------------------------------------------------------
list-like of strings                    |   keep   |  as before; mimics behavior elsewhere,
                                                      cf.: pd.Series(range(3)) + [2,4,6]
Series                                  |   keep   |
np.ndarray (1-dim)                      |   keep   |
DataFrame                               |   keep   |  sequential concatenation
np.ndarray (2-dim)                      |   keep   |  sequential concatenation
list-like of
    Series/Index/np.ndarray (1-dim)     |   keep   |  sequential concatenation
list-like containing list-likes (1-dim)
    other than Series/Index/np.ndarray  |   DEPR   |  sequential concatenation

In other words, if the user wants sequential concatenation, there are many possibilities available, and list-of-lists does not have to be one of them, IMO. Once the deprecation has been enforced, this would substantially simplify the code for str.cat._get_series_list, which is currently a bit complicated, see https://github.com/pandas-dev/pandas/blob/v0.23.3/pandas/core/strings.py#L2089
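To illustrate the "keep" rows of the table, here is a minimal sketch (not taken from the pandas docs) that feeds several of the remaining forms to str.cat. The variable names and the column labels 'letters' and 'digits' are made up for the example. Each form should reproduce the result of the v0.17-22 example above.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b'])

# Each entry is one of the "keep" forms from the table above.
# Note that the 2-dim array has one row per element of `s`; its columns
# are what gets concatenated sequentially.
kept_forms = {
    'DataFrame': pd.DataFrame({'letters': ['x', 'y'], 'digits': ['1', '2']}),
    'np.ndarray (2-dim)': np.array([['x', '1'], ['y', '2']]),
    'list of Series': [pd.Series(['x', 'y']), pd.Series(['1', '2'])],
    'list of np.ndarray (1-dim)': [np.array(['x', 'y']), np.array(['1', '2'])],
}

results = {name: s.str.cat(others, sep=',').tolist()
           for name, others in kept_forms.items()}

for name, values in results.items():
    print(name, values)  # each line should show ['a,x,1', 'b,y,2']
```

A list of lists can be moved to one of these forms by wrapping each inner list in np.array or pd.Series.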

Finally, for completeness, the example from the v0.17-22 docs has been removed for v0.23, but there are two examples in https://pandas.pydata.org/pandas-docs/stable/text.html#concatenating-a-series-and-many-objects-into-a-series that would fall under the deprecation I'm suggesting.

h-vetinari referenced this issue on Jul 17, 2018: DEPR: let's deprecate #18262 (Open, 22 of 35 tasks complete)
@h-vetinari

Contributor Author

commented Jul 18, 2018

To check for myself, and as a little "advert" for this proposal, I wanted to see what _get_series_list would look like with the above deprecation (and removal of the FutureWarning, once str.cat aligns by default). This is to be compared with https://github.com/pandas-dev/pandas/blob/v0.23.3/pandas/core/strings.py#L2089

def _get_series_list(self, others):
    """
    Auxiliary function for :meth:`str.cat`. Turn potentially mixed input
    into a list of Series (elements without an index must match the length
    of the calling Series/Index).

    Parameters
    ----------
    others : Series, DataFrame, np.ndarray, list-like or list-like of
        objects that are either Series, Index or np.ndarray (1-dim)

    Returns
    -------
    list : others transformed into list of Series
    """
    from pandas import Index, Series, DataFrame

    # self._orig is either Series or Index
    idx = self._orig if isinstance(self._orig, Index) else self._orig.index

    # Generally speaking, all objects without an index inherit the index
    # `idx` of the calling Series/Index - i.e. must have matching length.
    # Objects with an index (i.e. Series/Index/DataFrame) keep their own.
    if isinstance(others, Series):
        return [others]
    elif isinstance(others, Index):
        return [Series(others.values, index=others)]
    elif isinstance(others, DataFrame):
        return [others[x] for x in others]
    elif isinstance(others, np.ndarray) and others.ndim == 2:
        others = DataFrame(others, index=idx)
        return [others[x] for x in others]
    elif is_list_like(others):
        others = list(others)  # ensure iterators do not get read twice etc

        # in case of list-like `others`, all elements must be
        # either Series/Index/np.ndarray (1-dim)...
        if all(isinstance(x, (Series, Index))
               or (isinstance(x, np.ndarray) and x.ndim == 1) for x in others):
            los = []
            while others:  # iterate through list and append each element
                los = los + self._get_series_list(others.pop(0))
            return los
        # ... or just strings
        elif all(not is_list_like(x) for x in others):
            return [Series(others, index=idx)]
    raise TypeError('others must be Series, Index, DataFrame, np.ndarray or '
                    'list-like (either containing only strings or containing '
                    'only objects of type Series/Index/np.ndarray[1-dim])')
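The method above relies on pandas internals (self._orig and module-level imports), so it cannot be run on its own. As a rough sketch for experimenting with the proposed dispatch, the following free function takes the index explicitly. The name series_list_sketch and the idx parameter are hypothetical and not part of pandas; is_list_like is imported from pandas.api.types.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def series_list_sketch(idx, others):
    """Hypothetical stand-alone variant of the proposed _get_series_list."""
    # Objects with an index keep their own index.
    if isinstance(others, pd.Series):
        return [others]
    if isinstance(others, pd.Index):
        return [pd.Series(others.values, index=others)]
    if isinstance(others, pd.DataFrame):
        return [others[col] for col in others]
    # Objects without an index inherit `idx` (lengths must match).
    if isinstance(others, np.ndarray) and others.ndim == 2:
        frame = pd.DataFrame(others, index=idx)
        return [frame[col] for col in frame]
    if is_list_like(others):
        others = list(others)  # read iterators only once
        # Either every element is a Series/Index/np.ndarray (1-dim) ...
        if all(isinstance(x, (pd.Series, pd.Index))
               or (isinstance(x, np.ndarray) and x.ndim == 1)
               for x in others):
            result = []
            for x in others:
                result.extend(series_list_sketch(idx, x))
            return result
        # ... or every element is a scalar (e.g. a string).
        if all(not is_list_like(x) for x in others):
            return [pd.Series(others, index=idx)]
    raise TypeError('others must be Series, Index, DataFrame, np.ndarray '
                    'or list-like (strings only, or only '
                    'Series/Index/np.ndarray[1-dim])')


idx = pd.RangeIndex(2)
mixed = series_list_sketch(idx, [pd.Series(['x', 'y']), np.array(['1', '2'])])
flat = series_list_sketch(idx, ['x', 'y'])

try:
    series_list_sketch(idx, [['x', 'y'], ['1', '2']])
    list_of_lists_rejected = False
except TypeError:
    list_of_lists_rejected = True
```

In this sketch a list of lists matches neither branch of the list-like case and ends up at the TypeError, which is the post-deprecation behaviour the proposal aims for.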

h-vetinari referenced this issue on Aug 9, 2018: DEPR: list-likes of list-likes in str.cat #22264 (Merged, 4 of 4 tasks complete)

jreback added this to the 0.24.0 milestone on Aug 9, 2018
