New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add expand kw to str.get_dummies #10103

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
4 participants
@sinhrks
Member

sinhrks commented May 11, 2015

Ref: #10008.

Though this still under work (needs #10089 to simplify get_dummies flow), would like to discuss followings.

#### .str.extract note: overlaps with #11386

Currently it returns Series for a single group and DataFrame for multiples. To support expand kw, we have to choose :

1. Add expand option keeping existing behavior with warning for future change to extract=True (current impl).
2. Add expand option keeping existing behavior. Standardize extract=None (or other option) to select returning dimensionality automatically.
3. Add expand option with default True (or False). This breaks the API.
4. Make Index.str.extract return MultiIndex in multiple group case without adding expand option.

.str.get_dummies

  1. Add expand kw with default True. Currently this always returns DataFrame (and raises TypeError in Index). This doesn't break an API (current impl).
  2. Make Index.str.get_dummies return MultiIndex without adding expand option.

CC @mortada

Show outdated Hide outdated pandas/core/strings.py Outdated
Show outdated Hide outdated pandas/core/strings.py Outdated
@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche May 12, 2015

Member

Question: it there actually a need to have the option of expand here for those two functions?

If we add them, my opinion about the discussion points:

  • extract:
    • I would keep the current behaviour (seems also as a good behaviour (series for single group, dataframe for multiple groups), I see no reason to change?)
    • Have expand=None to let you able to override the default behaviour
  • get_dummies:
    • OK. How would this look for expand=False? A series/index with lists as elements? (question is if we want to encourage this? -> coming back to my initial question)
Member

jorisvandenbossche commented May 12, 2015

Question: it there actually a need to have the option of expand here for those two functions?

If we add them, my opinion about the discussion points:

  • extract:
    • I would keep the current behaviour (seems also as a good behaviour (series for single group, dataframe for multiple groups), I see no reason to change?)
    • Have expand=None to let you able to override the default behaviour
  • get_dummies:
    • OK. How would this look for expand=False? A series/index with lists as elements? (question is if we want to encourage this? -> coming back to my initial question)
@sinhrks

This comment has been minimized.

Show comment
Hide comment
@sinhrks

sinhrks May 12, 2015

Member

@jorisvandenbossche Correct. There is an option to make Index.str work as the same as Series.str without adding expand kw. Added as alternatives. Though I prefer to make them to have unified kw/behavior.

get_dummies: How would this look for expand=False? A series/index with lists as elements?

Result will be a Series/Index of tuples, as the same as str.split(expand=False).

Member

sinhrks commented May 12, 2015

@jorisvandenbossche Correct. There is an option to make Index.str work as the same as Series.str without adding expand kw. Added as alternatives. Though I prefer to make them to have unified kw/behavior.

get_dummies: How would this look for expand=False? A series/index with lists as elements?

Result will be a Series/Index of tuples, as the same as str.split(expand=False).

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche May 12, 2015

Member

Not that important for this discussion, but str.split gives a list, not a tuple for me?

Member

jorisvandenbossche commented May 12, 2015

Not that important for this discussion, but str.split gives a list, not a tuple for me?

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche May 12, 2015

Member

Though I prefer to make them to have unified kw/behavior.

Well, I would also like that very much, but the default values of the keyword would in any case not be unified. So therefore, as it is not really possible to unify it that way, I was considering the option of not adding the keyword at all (which is also no unified behaviour)

Member

jorisvandenbossche commented May 12, 2015

Though I prefer to make them to have unified kw/behavior.

Well, I would also like that very much, but the default values of the keyword would in any case not be unified. So therefore, as it is not really possible to unify it that way, I was considering the option of not adding the keyword at all (which is also no unified behaviour)

@sinhrks sinhrks changed the title from (WIP) ENH: add expand kw to str.extract and str.get_dummies to ENH: add expand kw to str.extract and str.get_dummies May 29, 2015

@sinhrks sinhrks changed the title from ENH: add expand kw to str.extract and str.get_dummies to (WIP)ENH: add expand kw to str.extract and str.get_dummies May 29, 2015

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Jul 28, 2015

Contributor

status?

Contributor

jreback commented Jul 28, 2015

status?

@sinhrks

This comment has been minimized.

Show comment
Hide comment
@sinhrks

sinhrks Jul 29, 2015

Member

I hope to work on this, but it requires Index.fillna to make flow simple (#10089).

Member

sinhrks commented Jul 29, 2015

I hope to work on this, but it requires Index.fillna to make flow simple (#10089).

@jreback jreback modified the milestones: Next Major Release, 0.17.0 Aug 20, 2015

@jreback jreback added the Prio-medium label Aug 20, 2015

@sinhrks sinhrks changed the title from (WIP)ENH: add expand kw to str.extract and str.get_dummies to ENH: add expand kw to str.extract and str.get_dummies Nov 14, 2015

@jreback jreback changed the title from ENH: add expand kw to str.extract and str.get_dummies to ENH: add expand kw to str.get_dummies Feb 13, 2016

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 12, 2016

Contributor

@sinhrks what are we doing with this one?

Contributor

jreback commented Mar 12, 2016

@sinhrks what are we doing with this one?

@jreback jreback removed this from the Next Major Release milestone Mar 13, 2016

@sinhrks

This comment has been minimized.

Show comment
Hide comment
@sinhrks

sinhrks Mar 14, 2016

Member

There are 2 points, and I think 1st point (add expand=False to return Series) is less useful (only for consistency). How about adding Index.str.get_dummies and close the issue?

  1. Add expand kw with default True. Currently this always returns DataFrame (and raises TypeError in Index). This doesn't break an API (current impl).
  2. Make Index.str.get_dummies return MultiIndex without adding expand option.
Member

sinhrks commented Mar 14, 2016

There are 2 points, and I think 1st point (add expand=False to return Series) is less useful (only for consistency). How about adding Index.str.get_dummies and close the issue?

  1. Add expand kw with default True. Currently this always returns DataFrame (and raises TypeError in Index). This doesn't break an API (current impl).
  2. Make Index.str.get_dummies return MultiIndex without adding expand option.
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 14, 2016

Contributor

@sinhrks I suppose you could add Index.str.get_dummies not really how useful this is, but it makes things consistent.

Contributor

jreback commented Mar 14, 2016

@sinhrks I suppose you could add Index.str.get_dummies not really how useful this is, but it makes things consistent.

@sinhrks sinhrks referenced this pull request Apr 10, 2016

Closed

ENH: Add Index.str.get_dummies #12842

4 of 4 tasks complete

@jreback jreback added this to the 0.18.1 milestone Apr 11, 2016

@jreback jreback closed this in e1aa2d9 Apr 11, 2016

@sinhrks sinhrks deleted the sinhrks:str_expand branch Apr 12, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment