ENH: union_categorical enhancements #13410

Open
jreback opened this Issue Jun 9, 2016 · 10 comments

Contributor

jreback commented Jun 9, 2016 edited

xref #13361

  • support union w Series/CategoricalIndex as well as Categorical #14199
  • add ignore_order to ignore the raising on an ordered Categorical (and just have it work) #15219
  • do we want to put this in the pd namespace (or change its name). Consider Categorical.from_union(...)
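As a rough illustration of the first item, here is a minimal sketch of unioning a Categorical, a categorical Series, and a CategoricalIndex in one call. It assumes a later pandas release in which the function is importable from `pandas.api.types` (the thread itself refers to `pandas.types.concat`); the variable names are made up for the example.

```python
import pandas as pd
from pandas.api.types import union_categoricals  # assumed import path for later pandas versions

# Three different containers, all holding categorical data
cat = pd.Categorical(["a", "b"])
ser = pd.Series(["b", "c"], dtype="category")
idx = pd.CategoricalIndex(["c", "a"])

# The inputs are unwrapped to their underlying Categoricals and combined;
# the result is a plain Categorical whose categories are the union of the inputs'
combined = union_categoricals([cat, ser, idx])

print(list(combined))
print(list(combined.categories))
```

The result is always a Categorical, so a caller who needs a Series or CategoricalIndex back would have to rewrap it.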

jreback added this to the 0.18.2 milestone Jun 9, 2016

Contributor

jreback commented Jun 9, 2016

I think the location is fine. This is mostly part of a developer/extender API, e.g. used internally by other parts of pandas and by other packages (e.g. dask), rather than being in and of itself useful to a regular user.

Contributor

janschulz commented Jun 9, 2016

+1 for adding a Categorical.from_union(*cats, ignore_order=False) instead of pd.xxx() -> IMO it shouldn't be exposed as top level API and from_union() is a nice equivalent to from_codes().

jreback added a commit that referenced this issue Jul 29, 2016

sinhrks + jreback ENH: union_categorical supports identical categories with ordered
xref #13410, #13524

Author: sinhrks <sinhrks@gmail.com>

Closes #13763 from sinhrks/union_categoricals_ordered and squashes the following commits:

9cadc4e [sinhrks] ENH: union_categorical supports identical categories with ordered
59f2557
Contributor

jreback commented Sep 28, 2016

@chris-b1 this was partially closed by #14191 ?

jreback modified the milestone: 0.19.0, Next Major Release Sep 28, 2016

Contributor

chris-b1 commented Sep 28, 2016

It was #14199, but yes - I edited the top comment.

js3711 commented Jan 19, 2017 edited

@jreback @janschulz
I am interested in starting to contribute to pandas and see this as a good first PR opportunity. Do you guys agree?

  • If so, what do you see as the desired behavior for "add ignore_order to ignore the raising on an ordered Categorical (and just have it work)"?
  • I do like the idea of Categorical.from_union(...). Should pandas.types.concat.union_categoricals still be supported (with the implementation living in from_union)?
Contributor

chris-b1 commented Jan 19, 2017

Setup

In [13]: import pandas as pd

In [14]: from pandas.types.concat import union_categoricals

In [15]: c1 = pd.Categorical(['a', 'a', 'b'], categories=['b', 'a', 'c'], ordered=True)

In [16]: c2 = pd.Categorical(['b', 'b', 'a'])

In [17]: union_categoricals([c1, c2])
TypeError: Categorical.ordered must be the same

For your first question - the idea would be to allow this

In [18]: union_categoricals([c1, c2], ignore_order=True)
[a, a, b, b, b, a]
Categories (3, object): [b, a, c]
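The session above can also be written as a self-contained script. This is a sketch that assumes a pandas version where `ignore_order` has been merged and `union_categoricals` is importable from `pandas.api.types`; the `raised` flag is only there for illustration.

```python
import pandas as pd
from pandas.api.types import union_categoricals  # assumed import path for later pandas versions

c1 = pd.Categorical(["a", "a", "b"], categories=["b", "a", "c"], ordered=True)
c2 = pd.Categorical(["b", "b", "a"])

# Without ignore_order, mixing an ordered and an unordered Categorical raises
try:
    union_categoricals([c1, c2])
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# With ignore_order=True the ordered flag is dropped and the categories are unioned
result = union_categoricals([c1, c2], ignore_order=True)
print(list(result))
print(list(result.categories))
print(result.ordered)
```

Note that the result is unordered: the ordering information from `c1` is discarded rather than reconciled.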

On your second question - not sure if there's complete agreement on the API, but assuming there is a Categorical.from_union, I would suggest leaving the implementation where it is and calling the union_categoricals function inside Categorical.from_union.
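To make that suggestion concrete, here is a hypothetical sketch of such a thin wrapper. `from_union` is not an existing pandas API; it is written as a free function with the signature proposed earlier in the thread, and it simply delegates to `union_categoricals` (imported here from `pandas.api.types`, which assumes a later pandas release).

```python
import pandas as pd
from pandas.api.types import union_categoricals  # assumed import path for later pandas versions


def from_union(*cats, ignore_order=False):
    """Hypothetical stand-in for the proposed Categorical.from_union.

    The implementation stays in union_categoricals; this only forwards to it.
    """
    return union_categoricals(list(cats), ignore_order=ignore_order)


left = pd.Categorical(["x", "y"])
right = pd.Categorical(["y", "z"])

merged = from_union(left, right)
print(list(merged))
print(list(merged.categories))
```

If the shortcut were added to pandas it would presumably be a classmethod on Categorical, but that detail is not settled in this thread.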

Owner

jorisvandenbossche commented Jan 20, 2017 edited

the union_categoricals function is itself mentioned in the docs (http://pandas.pydata.org/pandas-docs/stable/categorical.html#unioning), so to start I think it is good to just improve this function (with e.g. what @chris-b1 showed above).

js3711 commented Jan 25, 2017

Thank you all for the comments. I have made an attempt at a pull request to support the ignore_order argument. #15219

I will hold off on from_union until there is agreement on the API change.

jreback added a commit that referenced this issue Feb 22, 2017

Justin Solinsky + jreback ENH union_categoricals supports ignore_order GH13410
xref #13410 (ignore_order portion)

Author: Justin Solinsky <justinsolinsky@Justins-MacBook-Pro.local>

Closes #15219 from js3711/GH13410-ENHunion_categoricals and squashes the following commits:

e9d00de [Justin Solinsky] GH15219 Documentation fixes based on feedback
d278d62 [Justin Solinsky] ENH union_categoricals supports ignore_order GH13410
9b827ef [Justin Solinsky] ENH union_categoricals supports ignore_order GH13410
14fee4f
Contributor

jreback commented Feb 22, 2017 edited

so to close this issue, I think we need to add Categorical.from_union as a short-cut (last item on the list).

AnkurDedania added a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017

Justin Solinsky + AnkurDedania ENH union_categoricals supports ignore_order GH13410
1a0599d