ENH: Implement groupby.sample #34069

dsaxton · 2020-05-08T14:23:50Z

closes Feature Request: Sample method for Groupby objects #31775
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
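
For context, here is a minimal usage sketch of the method this PR adds, assuming pandas >= 1.1 (the milestone this PR was merged into). The frame and the column names `a` and `b` are illustrative and do not come from the PR's tests.

```python
import pandas as pd

# Illustrative data: two groups, "red" (2 rows) and "blue" (3 rows).
df = pd.DataFrame(
    {"a": ["red", "red", "blue", "blue", "blue"], "b": [0, 1, 2, 3, 4]}
)

# Draw one random row from each group; an integer seed makes it repeatable.
one_per_group = df.groupby("a").sample(n=1, random_state=1)
again = df.groupby("a").sample(n=1, random_state=1)

# Weights are aligned with the rows of the original frame and normalized
# within each group, so a single non-zero weight per group pins the choice.
weighted = df.groupby("a").sample(n=1, weights=[1, 0, 0, 0, 1], random_state=1)

print(one_per_group)
print(weighted)
```

With the weights above, the only rows that can be drawn are the first "red" row and the last "blue" row.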

mroeschke

Implementation looks good. Just some doc comments.

jreback

cc @TomAugspurger @jorisvandenbossche if you'd have a look

jreback · 2020-06-04T11:40:02Z

pandas/core/groupby/groupby.py

+        else:
+            ws = [None] * self.ngroups
+
+        if random_state:


I don't think this is enough. You always need a random_state here that is consistent across the entire groupby.

TomAugspurger · 2020-06-04T11:56:59Z

I think either is fine. Either we get a random state from NumPy's global random state initially and reuse it, or we have each group draw from the global random state pool. It's similar to the difference between these two calls:

    .sample(random_state=0)  # each call uses the seed 0

    .sample(random_state=np.random.RandomState(0))  # each call makes an independent draw

dsaxton

I actually meant to write this condition as "random_state is not None" (I didn't consider other falsy values).
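
The two points in this thread can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code from the PR: the array `values` and the helpers `seeded_truthy` and `seeded_not_none` are hypothetical names introduced here.

```python
import numpy as np

values = np.arange(20)  # stand-in for the rows of one group

# An integer seed builds a fresh RandomState on every call, so two groups
# with the same contents receive exactly the same draw.
draw_a = np.random.RandomState(0).permutation(values)
draw_b = np.random.RandomState(0).permutation(values)

# A shared RandomState advances between calls, so each group gets its own
# draw while the overall sequence stays reproducible.
shared = np.random.RandomState(0)
first = shared.permutation(values)
second = shared.permutation(values)


def seeded_truthy(random_state):
    # The check from the diff: treats the valid seed 0 as "no seed given".
    return bool(random_state)


def seeded_not_none(random_state):
    # The intended check: only None means "no seed given".
    return random_state is not None


print(np.array_equal(draw_a, draw_b))          # same seed, same draw
print(np.array_equal(first, second))           # shared state, different draws
print(seeded_truthy(0), seeded_not_none(0))
```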

TomAugspurger · 2020-06-04T11:50:18Z

doc/source/whatsnew/v1.1.0.rst

@@ -275,6 +275,7 @@ Other enhancements
  such as ``dict`` and ``list``, mirroring the behavior of :meth:`DataFrame.update` (:issue:`33215`)
 - :meth:`~pandas.core.groupby.GroupBy.transform` and :meth:`~pandas.core.groupby.GroupBy.aggregate` has gained ``engine`` and ``engine_kwargs`` arguments that supports executing functions with ``Numba`` (:issue:`32854`, :issue:`33388`)
 - :meth:`~pandas.core.resample.Resampler.interpolate` now supports SciPy interpolation method :class:`scipy.interpolate.CubicSpline` as method ``cubicspline`` (:issue:`33670`)
+- :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` now implement the ``sample`` method for doing random sampling within groups (:issue:`31775`)


Need the full path to these classes in the docs.

bashtage · 2020-06-05T15:18:20Z

pandas/core/groupby/groupby.py

+            the underlying object and will be used as sampling probabilities
+            after normalization within each group.
+        random_state : int, array-like, BitGenerator, np.random.RandomState, optional
+            If int, array-like, or BitGenerator (NumPy>=1.17), seed for


If it is a BitGenerator, do you use a Generator or a RandomState to produce the random samples? Best practice is to use a Generator, since RandomState is effectively frozen in time. If it is an int, is it used as a seed for np.random.default_rng(), or for RandomState, when NumPy >= 1.17?

dsaxton

This follows a pattern similar to the one used in pandas.core.generic.sample, which processes random_state through pandas.core.common.random_state:

pandas/pandas/core/common.py, line 394 in c71bfc3:

    def random_state(state=None):
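
A rough sketch of that kind of normalization step follows, assuming NumPy >= 1.17. The helper `to_random_state` is a hypothetical name written for this illustration and is not the body of pandas.core.common.random_state. It returns a RandomState in every seeded case, and the last lines show the Generator alternative raised in the question above.

```python
import numpy as np


def to_random_state(state=None):
    """Hypothetical helper: turn a user-supplied value into a sampler."""
    if isinstance(state, np.random.RandomState):
        # Already a RandomState: reuse it so successive draws stay independent.
        return state
    if state is None:
        # No seed given: fall back to NumPy's global random state.
        return np.random
    if isinstance(state, (int, np.integer, list, tuple, np.ndarray)):
        # int or array-like: use it as a seed for a new RandomState.
        return np.random.RandomState(state)
    if isinstance(state, np.random.BitGenerator):
        # RandomState also accepts a BitGenerator instance (NumPy >= 1.17).
        return np.random.RandomState(state)
    raise ValueError("random_state must be an int, array-like, BitGenerator, "
                     "RandomState, or None")


# Legacy interface, as in the sketch above.
legacy = to_random_state(np.random.PCG64(0))
print(legacy.choice(5, size=2, replace=False))

# Newer interface: wrap the same kind of BitGenerator in a Generator instead.
modern = np.random.Generator(np.random.PCG64(0))
print(modern.choice(5, size=2, replace=False))
```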

jreback

looks fine, can you add a reference in doc/source/reference/groupby.rst

also a mention / small example in user_guide/groupby.rst if appropriate

jreback · 2020-06-14T15:22:21Z

thanks @dsaxton very nice!

dsaxton added 3 commits May 8, 2020 09:15

ENH: Implement groupby.sample

b91b767

Add test

d0cf785

Add tag

0656332

dsaxton requested a review from mroeschke May 8, 2020 14:49

dsaxton added 4 commits May 8, 2020 09:55

Troubleshoot CI

cb5f105

doc nit

40966bf

Merge remote-tracking branch 'upstream/master' into groupby-sample

4f2e8da

Move tag

904bdcd

dsaxton added Enhancement Groupby labels May 8, 2020

mroeschke reviewed May 8, 2020

dsaxton added 3 commits May 8, 2020 13:49

Dispatch and allow weights

0db1ed7

Merge remote-tracking branch 'upstream/master' into groupby-sample

cbaf4a5

black

07dacf2

mroeschke reviewed May 8, 2020

mroeschke reviewed May 8, 2020

mroeschke reviewed May 8, 2020

dsaxton added 6 commits May 8, 2020 15:55

Add doc examples

2935645

Fixup

3e159a8

Another fixup

2397c3a

Edit tests

8c3dfd8

Merge remote-tracking branch 'upstream/master' into groupby-sample

21923a7

Merge remote-tracking branch 'upstream/master' into groupby-sample

11f3d77

jreback requested changes May 10, 2020

bashtage reviewed May 11, 2020

dsaxton added 4 commits May 11, 2020 11:59

Update docstring

e6579d3

Merge remote-tracking branch 'upstream/master' into groupby-sample

cf41a58

Don't use selected_obj.index

37037c2

Merge remote-tracking branch 'upstream/master' into groupby-sample

540af35

jreback requested changes May 14, 2020

dsaxton added 4 commits June 2, 2020 18:06

Delete

372da0e

random_state

b1bf65f

Merge remote-tracking branch 'upstream/master' into groupby-sample

04789a1

Doc

48eea97

jreback requested changes Jun 4, 2020

TomAugspurger reviewed Jun 4, 2020

dsaxton added 3 commits June 4, 2020 08:32

not None

b07b377

Merge remote-tracking branch 'upstream/master' into groupby-sample

b447f85

doc

62f7a15

bashtage reviewed Jun 5, 2020

bashtage reviewed Jun 5, 2020

bashtage reviewed Jun 5, 2020

bashtage reviewed Jun 5, 2020

dsaxton added 3 commits June 5, 2020 11:24

doc

68d8d4a

Merge remote-tracking branch 'upstream/master' into groupby-sample

572cc6c

Add weights example

97034ae

bashtage reviewed Jun 5, 2020

bashtage reviewed Jun 5, 2020

dsaxton and others added 3 commits June 5, 2020 14:47

Fix weights index and adjust test

ad0bd61

Update docstring

05a1ba5

Merge remote-tracking branch 'upstream/master' into groupby-sample

e31a119

jreback requested changes Jun 9, 2020

DANIEL SAXTON added 2 commits June 9, 2020 22:31

Update doc

56a49a0

Merge remote-tracking branch 'upstream/master' into groupby-sample

27cb1ba

jreback added this to the 1.1 milestone Jun 14, 2020

jreback approved these changes Jun 14, 2020

jreback merged commit b3f483f into pandas-dev:master Jun 14, 2020

dsaxton deleted the groupby-sample branch June 14, 2020 19:05
