Doc: Added warning to treat group chunks as immutable when using apply #19114

Closed
pdpark wants to merge 4 commits

Conversation

pdpark commented Jan 7, 2018

jreback added the Docs label Jan 7, 2018
@@ -332,3 +332,97 @@ using something similar to the following:
See `the NumPy documentation on byte order
<https://docs.scipy.org/doc/numpy/user/basics.byteswapping.html>`__ for more
details.


Alternative to storing lists in Pandas DataFrame Cells
Contributor

just DataFrame


Alternative to storing lists in Pandas DataFrame Cells
------------------------------------------------------
Storing nested lists/arrays inside a pandas object should be avoided for performance and memory use reasons. Instead they should be "exploded" into a flat DataFrame structure.
Contributor

use double backticks around DataFrame

.. ipython:: python

   from collections import OrderedDict
   df = (pd.DataFrame(OrderedDict([('name', ['A.J. Price']*3),
Contributor

use dict construction directly; if you want column ordering, then pass columns

   ))
   df

   nn = [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']]*3
Contributor

call this something more apparent

   nn = [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']]*3
   nn

   # Step 1: Create an index with the "parent" columns to be included in the final Dataframe
Contributor

you can use sphinx to number these
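
For readers following the truncated hunk above, here is a minimal sketch of the "explode" idea the proposed section describes. It is not the PR's own code, since the remaining steps are not shown in this diff. The `opponent` column, its values, and the variable names `neighbors`, `nested`, `records`, and `flat` are assumptions made for illustration; only the player names come from the hunk.

```python
import pandas as pd

# Player names taken from the diff; everything else is illustrative.
neighbors = ['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']

# A frame that stores a whole list in each cell (the pattern to avoid).
nested = pd.DataFrame(
    {
        'name': ['A.J. Price'] * 3,
        'opponent': ['76ers', 'blazers', 'bobcats'],  # hypothetical values
        'nearest_neighbors': [list(neighbors) for _ in range(3)],
    },
    columns=['name', 'opponent', 'nearest_neighbors'],
)

# Emit one record per (parent row, list element) pair.
records = [
    (name, opponent, neighbor)
    for name, opponent, row_neighbors in zip(
        nested['name'], nested['opponent'], nested['nearest_neighbors'])
    for neighbor in row_neighbors
]

# The flat frame has scalar cells only: 3 parent rows x 4 neighbors each.
flat = pd.DataFrame(records, columns=['name', 'opponent', 'neighbor'])
print(flat)
```

Newer pandas releases also provide `DataFrame.explode`, which did not exist when this PR was opened; the list comprehension above avoids depending on it.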

@@ -955,6 +959,42 @@ will be (silently) dropped. Thus, this does not pose any problems:

df.groupby('A').std()

.. note::
Contributor

this is for another issue?

Contributor

PR #18953 ?

Author

Yes, I should have made this a separate branch on my fork and a separate pull request.

I will make the updates per your notes above.

Author

I created a clean pull request for this fix: #19215
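
For context on what the PR title asks for, the sketch below shows one way to treat group chunks as immutable inside `apply`: copy the chunk before changing it. The text of the note itself is not shown in this diff, so this is an illustration under assumptions rather than the note's own example; the frame `df` and the helper `add_share` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b'], 'B': [1, 2, 3]})


def add_share(chunk):
    # Work on a copy so the chunk handed in by groupby is never mutated.
    out = chunk.copy()
    out['share'] = out['B'] / out['B'].sum()
    return out


# Selecting [['B']] keeps the grouping column out of each chunk, which
# behaves the same way across pandas versions.
result = df.groupby('A', group_keys=False)[['B']].apply(add_share)
print(result)
```

The original `df` is left without a `share` column, because only the copies were modified.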

Contributor

jreback commented Jan 11, 2018

superseded by #19175

jreback closed this Jan 11, 2018