Doc: Added warning to treat group chunks as immutable when using apply #19114

pdpark · 2018-01-07T06:39:46Z

closes issue DOC: clarify dangers of fast apply in GroupBy apply docs #14180

…utomatic-exclusion-of-nuisance-columns section

…ly" section of groupby.rst Resolves: pandas-dev#14180
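
For context, a minimal sketch of the pattern the warning recommends: the function passed to `apply` copies the chunk it receives instead of modifying it in place. The frame, the column names `key` and `value`, and the helper `demean` are hypothetical and not taken from the PR.

```python
import pandas as pd

# Hypothetical example data; not from the PR.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})


def demean(group):
    # Treat the incoming chunk as immutable: work on a copy.
    group = group.copy()
    group["value"] = group["value"] - group["value"].mean()
    return group


# Select the value column explicitly so the chunk holds only that column.
result = df.groupby("key", group_keys=False)[["value"]].apply(demean)
```

Because `demean` only touches its own copy, `df` keeps its original values after the call.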

jreback · 2018-01-07T15:22:32Z

doc/source/gotchas.rst

@@ -332,3 +332,97 @@ using something similar to the following:
 See `the NumPy documentation on byte order
 <https://docs.scipy.org/doc/numpy/user/basics.byteswapping.html>`__ for more
 details.
+
+
+Alternative to storing lists in Pandas DataFrame Cells


just DataFrame

jreback · 2018-01-07T15:22:53Z

doc/source/gotchas.rst

+
+Alternative to storing lists in Pandas DataFrame Cells
+------------------------------------------------------
+Storing nested lists/arrays inside a pandas object should be avoided for performance and memory use reasons. Instead they should be "exploded" into a flat DataFrame structure.


use double backticks around DataFrame
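
As a rough illustration of the "exploding" described in the diff line above, here is a sketch that uses `DataFrame.explode`, which was added in pandas 0.25 and so postdates this PR. It is not the approach in the PR's example, and the column name `nearest_neighbors` is a hypothetical choice.

```python
import pandas as pd

# One row whose cell holds a list; the column name is hypothetical.
nested = pd.DataFrame({
    "name": ["A.J. Price"],
    "nearest_neighbors": [["Zach LaVine", "Jeremy Lin", "Nate Robinson"]],
})

# explode() emits one row per list element and repeats the other columns.
flat = nested.explode("nearest_neighbors").reset_index(drop=True)
```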

jreback · 2018-01-07T15:23:21Z

doc/source/gotchas.rst

+.. ipython:: python
+
+   from collections import OrderedDict
+   df = (pd.DataFrame(OrderedDict([('name', ['A.J. Price']*3), 


use dict construction directly; if you want column ordering then pass columns
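
One way to read this suggestion, as a sketch only: pass a plain dict to `pd.DataFrame` and fix the column order with the `columns` argument. Only `name` and its values come from the diff; the `opponent` column and its values are hypothetical placeholders.

```python
import pandas as pd

# 'opponent' and its values are hypothetical; 'name' is from the diff above.
df = pd.DataFrame(
    {
        "opponent": ["76ers", "blazers", "bobcats"],
        "name": ["A.J. Price"] * 3,
    },
    # columns= sets the column order regardless of dict ordering.
    columns=["name", "opponent"],
)
```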

jreback · 2018-01-07T15:23:41Z

doc/source/gotchas.rst

+                     ))
+   df
+
+   nn = [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']]*3


call this something more apparent

jreback · 2018-01-07T15:23:58Z

doc/source/gotchas.rst

+   nn = [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']]*3
+   nn
+
+   # Step 1: Create an index with the "parent" columns to be included in the final Dataframe


you can use sphinx to number these

jreback · 2018-01-07T15:25:06Z

doc/source/groupby.rst

@@ -955,6 +959,42 @@ will be (silently) dropped. Thus, this does not pose any problems:

   df.groupby('A').std()

+.. note::


this is for another issue?

PR #18953 ?

Yes, I should have made this a separate branch on my fork and separate pull request.

I will make the updates per your notes above.

I created a clean pull request for this fix: #19215

jreback · 2018-01-11T00:14:49Z

superseded by #19175

pdpark added 4 commits December 27, 2017 00:14

Added note about groupby excluding Decimal columns by default

155e85e

Moved note about exclusion of Decimal columns from agg functions to a…

5bb3321

…utomatic-exclusion-of-nuisance-columns section

Adding example of exploding nested lists

6cf1c2c

docs: Add warning to treat group chunks as immutable to "Flexible app…

e212e78

…ly" section of groupby.rst Resolves: pandas-dev#14180

jreback added the Docs label Jan 7, 2018

jreback requested changes Jan 7, 2018

jreback closed this Jan 11, 2018

pdpark Jan 7, 2018

pdpark Jan 12, 2018
