DOC: Enhancing pivot / reshape docs #21038

VincentLa · 2018-05-14T20:13:41Z

closes DOC: enhance pivot / reshape docs #19089
tests added / passed (NA Just Docs)
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Enhancing pivot / reshape docs

Added more examples and added Q + A section.

pep8speaks · 2018-05-14T20:13:45Z

Hello @VincentLa! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 16, 2018 at 15:30 Hours UTC

VincentLa · 2018-05-14T20:15:31Z

I get an error when trying to run the flake8 check:

(pandas-dev) VincentLa@ch-C02Q6HPBG8WN pandas-vincentla (master) $ git diff upstream/master -u -- "*.py" | flake8 --diff
fatal: bad revision 'upstream/master'

WillAyd · 2018-05-14T20:31:17Z

@VincentLa do you have your upstream pointing to pandas?

(pandas_dev) williams-imac:pandas williamayd$ git remote -v show
origin	git@github.com:WillAyd/pandas.git (fetch)
origin	git@github.com:WillAyd/pandas.git (push)
upstream	https://github.com/pandas-dev/pandas.git (fetch)
upstream	https://github.com/pandas-dev/pandas.git (push)

If not, make sure you do that. It's outlined in the contributing guide:

https://pandas.pydata.org/pandas-docs/stable/contributing.html#forking

VincentLa · 2018-05-14T20:38:24Z

@WillAyd I believe I do:

VincentLa@ch-C02Q6HPBG8WN pandas-vincentla (master) $ git remote -v show
origin	https://github.com/VincentLa/pandas.git (fetch)
origin	https://github.com/VincentLa/pandas.git (push)
upstream	https://github.com/pandas-dev/pandas.git (fetch)
upstream	https://github.com/pandas-dev/pandas.git (push)

WillAyd

Thanks for the PR! Admittedly I haven't looked at this in its rendered form, but here are some comments from a first pass.

WillAyd · 2018-05-14T20:35:00Z

doc/source/reshaping.rst

@@ -93,6 +92,12 @@ You can then select subsets from the pivoted ``DataFrame``:
 Note that this returns a view on the underlying data in the case where the data
 are homogeneously-typed.

+.. note::
+   ``pandas.pivot`` will error with a ``ValueError: Index contains duplicate


Can we convert pandas.pivot into an inline reference, similar to how you have pandas.pivot_table?
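
For readers following the thread, here is a minimal sketch of the behaviour the note under review describes. The toy frame and its column names (`row`, `col`, `val`) are made up for illustration and are not the data used in the PR.

```python
import pandas as pd

# Hypothetical toy data: the ("r0", "c0") index/column pair appears twice.
df = pd.DataFrame({
    "row": ["r0", "r0", "r1"],
    "col": ["c0", "c0", "c1"],
    "val": [1.0, 3.0, 5.0],
})

# pivot cannot decide which value belongs in the ("r0", "c0") cell,
# so it raises a ValueError instead of guessing.
try:
    df.pivot(index="row", columns="col", values="val")
    raised = False
except ValueError as exc:
    raised = True
    print(type(exc).__name__, exc)

# pivot_table aggregates the duplicates instead (mean by default).
table = df.pivot_table(index="row", columns="col", values="val")
print(table)
```

The design point is that `pivot` is a pure reshape, while `pivot_table` is a reshape plus an aggregation, which is why only the latter tolerates repeated index/column pairs.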

WillAyd · 2018-05-14T20:38:16Z

doc/source/reshaping.rst

+Question 1
+~~~~~~~~~~
+
+How do I pivot ``df`` such that the ``col`` values are columns,


Double backticks are for literals, whereas single backticks are for inline code / argument refs. Can you change any argument reference (e.g. df, col, etc.) to single backticks?

@WillAyd while that makes sense, this seems inconsistent with how single and double backticks are used elsewhere in this doc. The double backticks also seem to look better in the rendered docs.

Hmm OK - that's a fair point. @TomAugspurger do you know if there's an official stance on this? Worth updating in a separate issue?

WillAyd · 2018-05-14T20:38:34Z

doc/source/reshaping.rst

+~~~~~~~~~~
+
+How do I pivot ``df`` such that the ``col`` values are columns,
+``row`` values are the index, and mean of ``val0`` are the values? In


", and the mean"

WillAyd · 2018-05-14T20:40:15Z

doc/source/reshaping.rst

+
+.. ipython:: python
+
+   np.random.seed([3,1415])


Any reason to choose a list as a seed here? Not saying there's anything wrong with it per se, just have always seen an int literal like 12345

No real reason. I'm basing these examples off of the StackOverflow post originally linked in the issue: #19089.
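
As a side note on the seed question, a short sketch (not part of the PR) showing that the legacy `np.random.seed` accepts either an int or a sequence of ints, and that both forms give reproducible draws. The draw size of 4 is arbitrary.

```python
import numpy as np

# Seeding with a list of ints, as in the PR example.
np.random.seed([3, 1415])
first = np.random.randint(5, size=4)

# Re-seeding with the same list reproduces the same draw.
np.random.seed([3, 1415])
second = np.random.randint(5, size=4)

# A plain int seed, as the reviewer suggests, is reproducible in the same way.
np.random.seed(12345)
third = np.random.randint(5, size=4)
np.random.seed(12345)
fourth = np.random.randint(5, size=4)

print(first, second)
print(third, fourth)
```

So the choice between the two is stylistic; the int form is simply the more familiar one.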

WillAyd · 2018-05-14T20:46:45Z

doc/source/reshaping.rst

+Question 4
+~~~~~~~~~~
+
+How can I Group By over multiple columns?


Not sure "Group By" is the right terminology to use here. Any reason in particular you went with that?

Wondering if, in general, it wouldn't be more concise and easier to word if we did away with the "Q/A" format and just preceded each example with something like "Multiple values can be used at once", leaving it to the example to highlight the effect of that.
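
To illustrate the "statement plus example" style suggested here, a minimal sketch of passing several value columns at once. The frame is hypothetical toy data that only borrows the `row` / `col` / `val0` / `val1` naming from the PR.

```python
import pandas as pd

# Hypothetical toy data; ("r1", "c0") occurs twice so the mean is visible.
df = pd.DataFrame({
    "row": ["r0", "r0", "r1", "r1"],
    "col": ["c0", "c1", "c0", "c0"],
    "val0": [1.0, 2.0, 3.0, 5.0],
    "val1": [10.0, 20.0, 30.0, 50.0],
})

# Multiple values can be used at once: pass a list to `values`.
table = df.pivot_table(
    values=["val0", "val1"], index="row", columns="col", aggfunc="mean")
print(table)
```

The result has two column levels, with the value name on the outer level and the `col` labels on the inner one.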

WillAyd · 2018-05-14T20:47:24Z

doc/source/whatsnew/v0.23.0.txt

@@ -1375,6 +1375,7 @@ Reshaping
 - Bug in :func:`isna`, which cannot handle ambiguous typed lists (:issue:`20675`)
 - Bug in :func:`concat` which raises an error when concatenating TZ-aware dataframes and all-NaT dataframes (:issue:`12396`)
 - Bug in :func:`concat` which raises an error when concatenating empty TZ-aware series (:issue:`18447`)
+- Updated :func:`~pandas.pivot_table` with more comprehensive examples. Also updated Reshaping and Pivot Tables documentation with a Frequenty Asked Questions example (:issue:`19089`)


We don't typically add a whatsnew note for documentation-only updates, so you can remove this.

WillAyd · 2018-05-14T20:49:37Z

pandas/core/frame.py

+        foo one      4      1
+            two    NaN      6
+
+        We can also fill missing values using the `fill_value` parameter.


Worth calling out in this example that providing the fill_value preserves the int dtype, instead of casting to float as np.nan would.
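
A minimal sketch of the point about `fill_value`, on hypothetical toy data rather than the docstring's frame. It prints the dtypes with and without `fill_value` so the effect can be checked on whatever pandas version is installed; no particular output is asserted here.

```python
import pandas as pd

# Hypothetical toy data: there is no ("bar", "large") combination,
# so that cell is missing after pivoting.
df = pd.DataFrame({
    "A": ["foo", "foo", "bar"],
    "C": ["small", "large", "small"],
    "D": [1, 2, 3],
})

# Without fill_value the missing cell is NaN, which forces a float column.
unfilled = df.pivot_table(values="D", index="A", columns="C", aggfunc="sum")

# With fill_value the missing cell gets 0 and no NaN is introduced.
filled = df.pivot_table(
    values="D", index="A", columns="C", aggfunc="sum", fill_value=0)

print(unfilled.dtypes)
print(filled.dtypes)
```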

WillAyd · 2018-05-14T20:50:12Z

pandas/core/frame.py

+        foo one      4      1
+            two      0      6
+
+        The next example aggregates by taking the mean using values for


"mean across multiple columns" reads a little easier than "mean using values for multiple columns" IMO

WillAyd · 2018-05-14T20:52:18Z

Hmm, have you fetched or pulled anything from upstream yet? Perhaps running git fetch upstream will resolve that issue locally.

codecov · 2018-05-14T22:19:59Z

Codecov Report

Merging #21038 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #21038   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files         161      161           
  Lines       51200    51200           
=======================================
  Hits        47232    47232           
  Misses       3968     3968

Flag	Coverage Δ
#multiple	`90.63% <ø> (ø)`	⬆️
#single	`42.29% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.03% <ø> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cf4c0b6...67112b8.

jreback · 2018-05-15T00:08:19Z

doc/source/reshaping.rst

 .. note::
    If you just want to handle one column as a categorical variable (like R's factor),
    you can use  ``df["cat_col"] = pd.Categorical(df["col"])`` or
    ``df["cat_col"] = df["col"].astype("category")``. For full docs on :class:`~pandas.Categorical`,
    see the :ref:`Categorical introduction <categorical>` and the
    :ref:`API documentation <api.categorical>`.
+
+Frequently Asked Questions (and Examples)
+------------------


The underline needs to be the same length as the title. How about just making this title "Examples"?

jreback · 2018-05-15T00:09:24Z

doc/source/reshaping.rst

+   )
+
+   df
+


You don't need to have these as "Question" headings; rather, just give each one an informative title.

jreback · 2018-05-15T00:14:31Z

doc/source/reshaping.rst

+   np.random.seed([3,1415])
+   n = 20
+
+   cols = np.array(['key', 'row', 'item', 'col'])


you can just do

In [12]: cols + pd.DataFrame((np.random.randint(5, size=(n, 4)) // [2, 1, 2, 1]).astype(str))
Out[12]:
       0     1      2     3
0   key1  row3  item2  col0
1   key0  row2  item1  col4
2   key1  row1  item0  col2
3   key1  row1  item0  col1
4   key0  row3  item1  col2
5   key1  row0  item2  col4
6   key2  row2  item0  col3
7   key2  row0  item2  col2
8   key1  row1  item0  col1
9   key0  row4  item0  col4
10  key0  row0  item1  col2
11  key0  row4  item1  col4
12  key0  row4  item2  col1
13  key1  row1  item1  col1
14  key1  row0  item2  col4
15  key2  row2  item1  col0
16  key2  row2  item2  col0
17  key0  row3  item0  col2
18  key1  row0  item1  col4
19  key0  row3  item1  col2

Thanks! Refactored a bit.

VincentLa · 2018-05-16T14:53:18Z

@jreback @WillAyd made changes based on the feedback! Would love additional review.

WillAyd

Thanks for the updates. A couple more comments; I may have more once this gets pushed to the nightly doc build.

WillAyd · 2018-05-16T21:32:38Z

doc/source/reshaping.rst

@@ -93,6 +92,12 @@ You can then select subsets from the pivoted ``DataFrame``:
 Note that this returns a view on the underlying data in the case where the data
 are homogeneously-typed.

+.. note::
+   :func:`~pandas.pivot` will error with a ``ValueError: Index contains duplicate


Does this render? It might need a space after the directive.

WillAyd · 2018-05-16T21:33:49Z

doc/source/reshaping.rst

+Examples
+--------
+
+In this section, we will review frequently asked questions and examples. The 


Since we got rid of the Q and A format, we don't need this intro.

WillAyd · 2018-05-16T21:36:16Z

doc/source/reshaping.rst

+   n = 20
+
+   cols = np.array(['key', 'row', 'item', 'col'])
+   df = cols + pd.DataFrame((np.random.randint(5, size=(n, 4)) // [2, 1, 2, 1]).astype(str))


Minor nit, but you can pass the columns to the constructor and get rid of the line below.

WillAyd · 2018-05-16T21:38:09Z

doc/source/reshaping.rst

+   cols = np.array(['key', 'row', 'item', 'col'])
+   df = cols + pd.DataFrame((np.random.randint(5, size=(n, 4)) // [2, 1, 2, 1]).astype(str))
+   df.columns = cols
+   df = df.join(pd.DataFrame(np.random.rand(n, 2).round(2)).add_prefix('val'))


Stylistic nit, but I think it would be better to use pd.concat instead of join here.
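
One way the two nits above could be applied together, as a rough sketch rather than the code that was eventually merged. It passes `columns` to the constructor and uses `pd.concat`; the per-column prefixing via `apply` is my own substitution for the `cols + pd.DataFrame(...)` trick, and the names `codes`, `labels` and `values` are hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
n = 20
cols = ["key", "row", "item", "col"]

# Small integer codes per column; key/item are halved to get fewer levels.
codes = np.random.randint(5, size=(n, 4)) // [2, 1, 2, 1]

# Columns go straight into the constructor, so no `df.columns = cols` line.
labels = pd.DataFrame(codes.astype(str), columns=cols)

# Prefix each code with its column name, e.g. "1" -> "key1".
labels = labels.apply(lambda s: s.name + s)

# Two value columns named val0 and val1.
values = pd.DataFrame(np.random.rand(n, 2).round(2)).add_prefix("val")

# pd.concat along the columns instead of DataFrame.join.
df = pd.concat([labels, values], axis=1)
print(df.head())
```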

WillAyd · 2018-05-16T21:38:33Z

doc/source/reshaping.rst

+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Suppose we wanted to pivot ``df`` such that the ``col`` values are columns,
+``row`` values are the index, and the mean of ``val0`` are the values? In


This isn't a question, so replace ? with .

WillAyd · 2018-05-16T21:39:27Z

doc/source/reshaping.rst

+   df.pivot_table(
+       values=['val0', 'val1'], index='row', columns='col', aggfunc=['mean'])
+
+Note to subdivide over multiple columns we can pass in a list to the


Just for readability, we don't need to start each of these with "Note".
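
For context on the sentence being reviewed, a minimal sketch of subdividing over multiple columns by passing a list to `columns`. The frame is hypothetical toy data that reuses the PR's column naming.

```python
import pandas as pd

# Hypothetical toy data; ("r1", "i0", "c1") occurs twice.
df = pd.DataFrame({
    "row": ["r0", "r0", "r1", "r1"],
    "item": ["i0", "i1", "i0", "i0"],
    "col": ["c0", "c0", "c1", "c1"],
    "val0": [1.0, 2.0, 3.0, 5.0],
})

# To subdivide over multiple columns, pass a list to `columns`.
table = df.pivot_table(
    values="val0", index="row", columns=["item", "col"], aggfunc="mean")
print(table)
```

The column labels become a two-level index ordered as given in the list, here `item` on the outer level and `col` on the inner one.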

jreback · 2018-06-19T01:44:40Z

Can you update?

jreback · 2018-10-11T01:59:01Z

Can we rebase this, @VincentLa (or @datapythonista)?

jreback · 2018-11-01T01:31:00Z

@pandas-dev/pandas-core if someone has a chance to rebase this

TomAugspurger · 2018-11-02T16:50:48Z

Updated. Didn't make any changes other than removing whitespace.

datapythonista

Fixed a pep8 issue, lgtm.

TomAugspurger · 2018-11-06T20:48:27Z

Updated again, hopefully CI will pass.

jreback

@VincentLa can you merge master and address the outstanding questions?

jreback · 2018-11-11T23:53:46Z

doc/source/reshaping.rst

+
+.. code-block:: ipython
+
+   col   col0   col1   col2   col3  col4


This should be an ipython block.

datapythonista · 2018-11-12T00:03:48Z

@jreback this PR is discontinued. Tom and I made changes to it so that it can be merged (some improvements are left as later work, but I think the current version is correct and an improvement over what we have). Otherwise we'll have to close it, or keep making the improvements ourselves.

jreback · 2018-11-12T00:22:11Z

thanks @datapythonista

…fixed

* upstream/master:
  DOC: Enhancing pivot / reshape docs (pandas-dev#21038)
  TST: Fix xfailing DataFrame arithmetic tests by transposing (pandas-dev#23620)
  BUILD: Simplifying contributor dependencies (pandas-dev#23522)
  BUG/REF: TimedeltaIndex.__new__ (pandas-dev#23539)
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)

* upstream/master:
  BUG: Don't over-optimize memory with jagged CSV (pandas-dev#23527)
  DEPR: Deprecate usecols as int in read_excel (pandas-dev#23635)
  More helpful Stata string length error. (pandas-dev#23629)
  BUG: astype fill_value for SparseArray.astype (pandas-dev#23547)
  CLN: datetimelike arrays: isort, small reorg (pandas-dev#23587)
  CI: Check in the CI that assert_raises_regex is not being used (pandas-dev#23627)
  CLN:Remove unused **kwargs from user facing methods (pandas-dev#23249)
  DOC: Enhancing pivot / reshape docs (pandas-dev#21038)
  TST: Fix xfailing DataFrame arithmetic tests by transposing (pandas-dev#23620)

VincentLa14 added 6 commits May 14, 2018 13:48

Deleting a duplicate example in pd.DataFrame.pivot_table documentation

783f5f0

Fixing a broken example, the broken example referred to a column E th…

9e79c2f

…at did not exist. Also added more examples

Adding a clarification note on an error with pivot due to non-unique …

f60874a

…index/column pairs

In my opinion, it makes more sense to have the overall image at the t…

ab4584d

…op of the section

Removing unnecessary phrase

263b1d8

Adding frequently asked questions section

db68b01

fixing linter errors

8c9ae27

WillAyd requested changes May 14, 2018

jreback requested changes May 15, 2018

jreback reviewed May 15, 2018

jreback added Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 15, 2018

VincentLa added 3 commits May 15, 2018 10:50

Merge remote-tracking branch 'upstream/master'

eaec575

Removing whatsnew and fixing some typos

e0d9501

Rephrasing reshaping docs instead of q+a just examples

61f9f43

Merge remote-tracking branch 'upstream/master'

56587c7

WillAyd requested changes May 16, 2018

datapythonista self-assigned this Jul 22, 2018

TomAugspurger added 2 commits November 2, 2018 09:49

Merge remote-tracking branch 'upstream/master' into VincentLa-master

a935dd3

whitespace

5283d29

pep8 issue

c146d7c

datapythonista approved these changes Nov 3, 2018

Merge remote-tracking branch 'upstream/master' into VincentLa-master

67112b8

jreback requested changes Nov 11, 2018

jreback merged commit dcb8b6a into pandas-dev:master Nov 12, 2018

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

DOC: Enhancing pivot / reshape docs (pandas-dev#21038)

92b015d

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

DOC: Enhancing pivot / reshape docs (pandas-dev#21038)

c96a2eb

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DOC: Enhancing pivot / reshape docs (pandas-dev#21038)

cf07567

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DOC: Enhancing pivot / reshape docs (pandas-dev#21038)

c0bcf67
