Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix render reorder causing incorrect groupings #218

Merged
merged 2 commits into from
Feb 28, 2024
Merged

Conversation

machow
Copy link
Collaborator

@machow machow commented Feb 27, 2024

Currently, Great Tables keeps a GT._body.body dataframe, ordered to match the final output (e.g. sorted by group). This means that what happens currently looks like this:

  1. formatters run, put results on GT._body.body
  2. GT._build_data() returns a new GT object, with ._body.body reordered to match final output
  3. GT.render_as_html() -> create_body_component_h() : attempts to set any missing values in ._body.body to a string version of the original data. This is to fill in any unformatted values.

However, there are two issues with this:

  • We attempt to grab rows from _body.body when rendering html, as if the dataframe hasn't been sorted. (but it has).
  • With polars, we are accidentally filling in missing values in step (3) with the original data, in its original order.

In order to fix this, we need to decide whether _body.body should be sorted, and make sure it's filled consistently across pandas and polars.

This PR simply removes the sorting of _body.body. Since we index based on the original data (e.g. passing row=1 to a formatter refers to that row position in the original, unsorted data), it seems like we can avoid sorting ._body.body, and just grab the correctly ordered rows when rendering.

(Note that the snapshot test I added when addressing #201 was wrong 馃槶, and I fixed it here)

Fixes: #98

@machow
Copy link
Collaborator Author

machow commented Feb 27, 2024

Shoot, I checked out this branch from the formatter PR one. Once that's merged I'll just get this one caught up, so it's less chaotic!

@rich-iannone
Copy link
Member

Shoot, I checked out this branch from the formatter PR one. Once that's merged I'll just get this one caught up, so it's less chaotic!

The formatter PR has been reviewed (great work!) so it can be merged.

@machow machow changed the base branch from main to chore-rm-unused-code February 28, 2024 16:54
@machow machow changed the base branch from chore-rm-unused-code to main February 28, 2024 16:54
@machow
Copy link
Collaborator Author

machow commented Feb 28, 2024

Alright, it should now just show the diff against main (no fmt method changes)! Note that this PR makes a fairly substantial change of keeping _body.body in the original input order. This means nothing gets reordered, only that the renderer func grabs rows in final display order.

Definitely let me know if this will cause issues in the future, based on any gt for R behaviors!

Copy link
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you!

@rich-iannone rich-iannone marked this pull request as ready for review February 28, 2024 17:46
@rich-iannone rich-iannone merged commit dd22658 into main Feb 28, 2024
7 checks passed
@rich-iannone rich-iannone deleted the fix-render-reorder branch February 28, 2024 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using row ID values in rows is inconsistent when the table has row groups
2 participants