Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC : Updated the comparison with spreadsheets with GroupBy #39965

Closed
wants to merge 17 commits into from

Conversation

0xpranjal
Copy link
Contributor

@0xpranjal 0xpranjal commented Feb 22, 2021

@jreback jreback added the Docs label Feb 24, 2021
@jreback jreback requested a review from afeld February 24, 2021 00:46
@jreback jreback added the IO Excel read_excel, to_excel label Feb 24, 2021
Copy link
Member

@afeld afeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI that I'm mentoring @Bhard27 through HackIllinois (#39349), so am being more nitpicky than I would for most pull request reviews.

Nice work! I believe the build failure is a flaky test unrelated to this change, so can be ignored.

Unfortunately, Power Query doesn't seem to be compatible with Office 365 or Office for Mac, which are the versions I have, so I can't try it.

We do have a section on Pivot Tables below, which is a different way of grouping by row. Let's:

  • Create sub-headings under GroupBy for Power Query and Pivot Tables, moving the content for the latter up
  • Explicitly state that Excel defines "pivot" more broadly than pandas

Thanks!

To aggregate a column, select the column to perform the Aggregate Operation on from the Column drop-down.
A Row Operation does not require a Column, since data is grouped based on table rows.

.. image:: ../../_static/spreadsheets/group-by.png
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, this section duplicates content from the canonical documentation, which doesn't add any value. I suggest we either:

  • Walk them through an example in Excel that matches the pandas example below, so we are comparing apples to apples
  • Cut from "Then using the…" through this image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afeld - are you good here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing this again, I see that the requested change was not made. I also agree that having an excel example that matches the pandas example would be desirable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bhard27 Are you able to update the excel image to match the pandas example below?

# default is axis=0
grouped = df.groupby("class")
grouped = df.groupby("order", axis="columns")
grouped = df.groupby(["class", "order"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning to a variable doesn't output anything, and even if it did, it would be something not useful like

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f8424dc3d90>

Let's do groupby().mean() to show something more interesting.

In Excel, this can be done by using the `Query Editor <https://support.microsoft.com/en-us/office/introduction-to-the-query-editor-power-query-1d6cdb63-bf70-4ae8-a7d5-6ae9547004d9>`_.
You can group the values in various rows into a single value by grouping the rows according to the values in one or more columns
Then using the `Query Editor ribbon <https://support.microsoft.com/en-us/office/combine-data-from-multiple-data-sources-power-query-70cfe661-5a2a-4d9d-a4fe-586cc7878c7d>`_, Right-click the column header that you want to group on,
and click `Group By <https://support.microsoft.com/en-us/office/group-rows-in-a-table-power-query-e1b9e916-6fcc-40bf-a6e8-ef928240adf1>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see Excel changing where to find the Query Editor, so let's try and future-proof this documentation by not getting into those details, directing readers to the canonical Excel documentation instead.

],
index=["falcon", "parrot", "lion", "monkey", "leopard"],
columns=("class", "order", "max_speed"),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an outstanding task in #38990 around standardizing the example datasets used in the page. Totally fine to take care of that as follow-up; just noting.

@afeld
Copy link
Member

afeld commented Feb 25, 2021

Oh, one more thing:

@0xpranjal 0xpranjal requested a review from afeld February 28, 2021 20:55
Copy link
Member

@afeld afeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outstanding comments still hold, but I think it's fine to take care of those as follow-ups. I don't have the ability to merge, but 👍 from me.

@afeld
Copy link
Member

afeld commented Mar 4, 2021

Also, the build failure seems unrelated to this change.

GroupBy
-------

In Excel, this can be done by using the `Query Editor <https://support.microsoft.com/en-us/office/introduction-to-the-query-editor-power-query-1d6cdb63-bf70-4ae8-a7d5-6ae9547004d9>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - can you merge master, I think it will resolve the CI issue.

@rhshadrach
Copy link
Member

@Bhard27 - Are you interested in finishing this up? I think changes look good, can you merge master and will have another look.

@0xpranjal
Copy link
Contributor Author

@Bhard27 - Are you interested in finishing this up? I think changes look good, can you merge master and will have another look.

Hey, I have pushed the changes you suggested. Is there anything else that needs to be done in this? Also, I don't have permission to merge this to master.

@rhshadrach
Copy link
Member

@Bhard27 - yes, can you merge the master branch into your PR branch (happy to help with further instructions if you're unfamiliar with doing this, just let me know). We like to do this to run the CI on the most recent changes with master.

Also, I don't have permission to merge this to master.

Right, that needs to be done by a maintainer. Even maintainers shouldn't merge there own PRs ;)

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label May 26, 2021
@rhshadrach
Copy link
Member

@Bhard27 - friendly ping, are you interested in continuing this?

@0xpranjal
Copy link
Contributor Author

@Bhard27 - friendly ping, are you interested in continuing this?

Sorry about the inactivity. I am a little confused about merging it with master branch because I think I already pushed it in the master branch.
Screenshot 2021-06-06 at 5 05 02 PM

@rhshadrach
Copy link
Member

rhshadrach commented Jun 9, 2021

@Bhard27 I think what you've screenshot is the fact that you are doing a pull request from your branch into master. This branch was last updated on February 28th, and the tests that ran were based on what this patch (which hasn't changed) and master (which has changed) combined together. In general, we'd like to make sure the tests get run on a very recent copy of master, so the request here is to merge the master branch into this PR branch. Doing so will run the tests again on all the updated code.

To do this, run the command git merge upstream/master, then push to origin (your fork of the repo). In general you may have to resolve conflicts, but I don't believe that there will be any.

@0xpranjal
Copy link
Contributor Author

@Bhard27 I think what you've screenshot is the fact that you are doing a pull request from your branch into master. This branch was last updated on February 28th, and the tests that ran were based on what this patch (which hasn't changed) and master (which has changed) combined together. In general, we'd like to make sure the tests get run on a very recent copy of master, so the request here is to merge the master branch into this PR branch. Doing so will run the tests again on all the updated code.

To do this, run the command git merge upstream/master, then push to origin (your fork of the repo). In general you may have to resolve conflicts, but I don't believe that there will be any.

Oh, Thanks a lot for the explanation. I totally understand why we need to merge it again and run the tests. I will do that ASAP

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for merging @Bhard27 - just the one outstanding request by @afeld. I think it would be good if the pandas example matched the excel example, but okay if it doesn't.

To aggregate a column, select the column to perform the Aggregate Operation on from the Column drop-down.
A Row Operation does not require a Column, since data is grouped based on table rows.

.. image:: ../../_static/spreadsheets/group-by.png
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bhard27 Are you able to update the excel image to match the pandas example below?

@rhshadrach
Copy link
Member

@Bhard27 - closing this as stale, but reply here if you are interested in continuing and can reopen.

@rhshadrach rhshadrach closed this Jul 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants