Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: #9847, adding a "same" and "expand" param to StringMethods.split() #9870

Closed
wants to merge 7 commits into from

Conversation

sreejata
Copy link

closes #9847

sreejata and others added 3 commits April 13, 2015 12:34
Fix: pandas-dev#9847, adding a "same" and "expand" param to the StringMethods.spit
@jreback jreback added API Design Strings String extension data type and string data labels Apr 13, 2015
@jreback jreback added this to the 0.16.1 milestone Apr 13, 2015
@TomAugspurger
Copy link
Contributor

I haven't read through all the issues, but has expanding an Index to a MultiIndex (if needed) been proposed? Analogous to Series.str.* -> DataFrame

@jreback
Copy link
Contributor

jreback commented Apr 13, 2015

@TomAugspurger ohh, that would be nice, but let's make another issue about that.

sreejata and others added 4 commits April 13, 2015 13:49
Adding a future deprecation warning to split params
Adding doc string to say deprecated params
@@ -624,6 +624,7 @@ def str_pad(arr, width, side='left', fillchar=' '):

def str_split(arr, pat=None, n=None, return_type='series'):
"""
Deprecated: return_types 'series', 'index', 'frame' are now deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this comment to the return_type explanation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explain: the first sentence is used in summary tables on the API pages, so this is not what we want to show there.

@shoyer
Copy link
Member

shoyer commented Apr 13, 2015

Did the 'index' option ever make it into a release? I'm pretty sure it didn't, in which case we don't need to preserve it.

@jorisvandenbossche
Copy link
Member

@TomAugspurger yes, I also thought about that! Looks like a very good idea to me, but indeed lets discuss in a separate issue (eg I think it should be in the main namespace, not in str)

@jorisvandenbossche
Copy link
Member

Indeed, 'index' should not be mentioned.

@jorisvandenbossche
Copy link
Member

I am still slightly in favor of having a expand=True/False instead of return_type='same'/'expand'.

Keeping return_type='series'/'frame' as backcompat has to happen in both cases, and it is only there for one release.

@shoyer
Copy link
Member

shoyer commented Apr 13, 2015

Yes, expand=True/False seems like a slightly cleaner API than string options.

@@ -649,11 +650,14 @@ def str_split(arr, pat=None, n=None, return_type='series'):
from pandas.core.frame import DataFrame
from pandas.core.index import Index

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a minor point but it might be good idea to use a dict to organize the mapping from old to new arg names, at the beginning of the function, something like:

arg_map = {'series': 'same', 'index': 'same', 'frame': 'expand'}
if return_type in arg_map:
    warnings.warn("return_type='%s' is deprecated, please use '%s' instead" % (return_type, arg_map[return_type]), FutureWarning)
    return_type = arg_map[return_type]

this way the rest of the code doesn't have to contain references to the old names, making it easier to remove the deprecation stuff later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be updated to use expand=True/False or keep as same/expand?

@jreback
Copy link
Contributor

jreback commented Apr 28, 2015

@sreejata can you update for the comments above?

@mortada
Copy link
Contributor

mortada commented Apr 29, 2015

I'm also slightly in favor of expand=True/False. It seems cleaner and more intuitive especially since we'd like to do a similar expansion for Index.str -> MultiIndex in v0.17.0

@jreback
Copy link
Contributor

jreback commented Apr 29, 2015

ok, let's go with expand=True/False then. This should be complmetely back-compat (use the deprecate decorator)

@sreejata can you update

@sinhrks
Copy link
Member

sinhrks commented May 3, 2015

I've implemented the expand wrapper in #9773. #9773 work as below

Invoked by expand=False expand=True (default)
Series Series DataFrame
Index Index (of tuples) MultiIndex

I think the points are

split (this PR):

  • Can follow the above behavior simply replacing return_type to expand.
  • Whether to change the defualt to True on this occasion.

extract (#9985)

  • Expected Series and Index to work as the same.
  • Adding extract option breaks current Series behavior which implicitly decide returning type.

get_dummies (#9985)

  • Expected Series and Index to work as the same.
  • Adding extract option breaks current Series behavior which always return DataFrame.

@jreback
Copy link
Contributor

jreback commented May 4, 2015

@sinhrks

this needs to handle the backward compat issue (by deprecating the existing return_type=series/frame). The index is new in 0.16.1, so no need to handle that.

@jreback
Copy link
Contributor

jreback commented May 4, 2015

@sinhrks can you split out the return_type changes (and incorporate the deprecation warning), from #9773 (as that is for partition, yes?). Then can close this PR.

@sinhrks
Copy link
Member

sinhrks commented May 5, 2015

@jreback #9773 the logic is implemented in _wrap_result_expand, and it can be used in str.split as an alternative of _wrap_result. I think changes required in this PR are:

@jreback
Copy link
Contributor

jreback commented May 7, 2015

@sinhrks are you able to put in the deprecation of the existing return_type logic? (in favor or expand)?

@sinhrks
Copy link
Member

sinhrks commented May 8, 2015

@sreejata Can you update above?

@jreback If this needs in v0.16.1, I'll work on it.

@jreback
Copy link
Contributor

jreback commented May 8, 2015

ideally this could be fixed in next day or 2

@jreback jreback modified the milestones: 0.17.0, 0.16.1 May 8, 2015
@jreback
Copy link
Contributor

jreback commented May 9, 2015

replaced by #10085

@jreback jreback closed this May 9, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: return_type argument in StringMethods.split()
7 participants