### dsm054 commented Nov 18, 2015

Sometimes it's handy to have access to a distinct integer for each group. For example, using the (internal) grouper:

```
>>> df = pd.DataFrame({"a": list("xyyzxy"), "b": list("ab"*3), "c": range(6)})
>>> df["group_id"] = df.groupby(["a","b"]).grouper.group_info[0]
>>> df
   a  b  c  group_id
0  x  a  0         0
1  y  b  1         2
2  y  a  2         1
3  z  b  3         3
4  x  a  4         0
5  y  b  5         2
```

This can be achieved in a number of ways, but none of them is particularly elegant, especially if we're grouping on multiple keys and/or Series. Accordingly, after a brief discussion on gitter, I propose a new method `transform("enumerate")` which returns a Series of integers from 0 to ngroups-1 matching the order in which the groups will be iterated. In other words, we'll simply be applying the following map:

```
>>> m = {k: i for i, (k,g) in enumerate(df.groupby(["a","b"]))}
>>> m
{('x', 'a'): 0, ('y', 'b'): 2, ('y', 'a'): 1, ('z', 'b'): 3}
```

(Note this is only to show the desired behaviour, and isn't how it would be implemented!)
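For reference, the map can be applied row by row without touching the internal grouper. This is only a sketch of the desired result using public API, not a proposed implementation; the names `key_to_id` and `row_keys` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": list("xyyzxy"), "b": list("ab" * 3), "c": range(6)})

# Build the key -> integer map in the order groupby iterates over the groups
# (sorted by key, since sort=True is the groupby default).
key_to_id = {key: i for i, (key, _) in enumerate(df.groupby(["a", "b"]))}

# Look up each row's (a, b) key in the map.
row_keys = list(zip(df["a"], df["b"]))
df["group_id"] = [key_to_id[key] for key in row_keys]

print(df)
```

This reproduces the `group_id` column shown earlier, but it iterates over every group in Python, which is part of why a dedicated method seems preferable.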

### jreback commented Nov 18, 2015

 can you show an example of its utility? also note that this is really only useful as a `.transform` method (a reduction is kind of silly, as it's just `range(len(df.groupby(...)))`)
### shoyer commented Nov 19, 2015

Note that this is essentially the same information provided by `pandas.factorize`:

```
In [1]: import pandas as pd

In [2]: pd.factorize(['a', 'a', 'b', 'c'])
Out[2]: (array([0, 0, 1, 2]), array(['a', 'b', 'c'], dtype=object))
```
### dsm054 commented Nov 19, 2015

 I couldn't think of a clean way to get factorize to handle the same inputs as groupby, though (both the multiple-series case and the mixed column-name/list input.) Might have missed something obvious, of course, as is my wont. But if I needed to write a few lines to get it to work, then those lines would more naturally fit as a groupby method, or so it seemed to me.
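To make the multiple-key case concrete, here is one workaround: collapse the key columns into a single Series of tuples and factorize that. It is only a sketch under that assumption (the names `keys` and `codes` are illustrative), and it shows the kind of extra step a groupby method would hide.

```python
import pandas as pd

df = pd.DataFrame({"a": list("xyyzxy"), "b": list("ab" * 3), "c": range(6)})

# Collapse the two key columns into one Series of (a, b) tuples.
keys = df[["a", "b"]].apply(tuple, axis=1)

# factorize numbers the distinct tuples in order of first appearance.
codes, _ = pd.factorize(keys)
df["factorize_id"] = codes

print(df)
```

Note that the codes follow the order of first appearance rather than the sorted order in which groupby iterates, so they label the same groups but with different integers.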
### dsm054 commented Nov 21, 2015

As I went to implement this, I started to wonder if it doesn't make more sense to use `df.groupby("a").enumerate()` instead of `df.groupby("a").transform("enumerate")`, to be parallel with `df.groupby("a").cumcount()`, instead of `df.groupby("a").transform("cumcount")` (which doesn't work.) This would give us something like

```
>>> df = pd.DataFrame({"A": [1,2,2,2,1]})
>>> df["group_id"] = df.groupby("A").enumerate()
>>> df["group_index"] = df.groupby("A").cumcount()
>>> df
   A  group_id  group_index
0  1         0            0
1  2         1            0
2  2         1            1
3  2         1            2
4  1         0            1
```
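The `enumerate()` name above is only the proposal made in this thread. In released pandas the method that numbers the groups is `GroupBy.ngroup()`, so a runnable version of the same example, assuming a pandas version that ships `ngroup()`, might look like this:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2, 2, 1]})
grouped = df.groupby("A")

# ngroup(): one integer per group, in the order the groups are iterated.
df["group_id"] = grouped.ngroup()

# cumcount(): position of each row within its own group.
df["group_index"] = grouped.cumcount()

print(df)
```

Both calls return a Series aligned with the original index, which is why they can be assigned straight back as columns.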
### jreback commented Nov 21, 2015

 that looks reasonable

Merged

