Adds pandas-friendly formatting for multi-column and multi-index tables #986

afriedman412 · 2023-10-26T16:56:30Z

(partially) Fixes #970
• Tables can now handle multi-index formatting similar to how multi-column formatted is currently implemented.
• Code added to format multi-index and multi-header to match Pandas (repeat labels are now omitted but spaced correctly

Checklist:

The GitHub pipeline is OK (green),
meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
A unit test is covering the code added / modified by this PR
This PR is ready to be merged
In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

afriedman412 · 2023-10-26T16:57:04Z

ok what's the best way to include a pandas df for testing (assuming we don't want to require pandas for testing)

andersonhc · 2023-10-27T02:04:41Z

ok what's the best way to include a pandas df for testing (assuming we don't want to require pandas for testing)

We have camelot.py into test\requirements.txt, and camelot depends on pandas, so pandas is already installed with our test requirements.

codecov-commenter · 2023-10-27T02:46:03Z

Codecov Report

Attention: 27 lines in your changes are missing coverage. Please review.

Comparison is base (e23fdcc) 93.70% compared to head (a434fe1) 93.40%.
Report is 1 commits behind head on master.

❗ Current head a434fe1 differs from pull request most recent head 955937b. Consider uploading reports for the commit 955937b to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #986      +/-   ##
==========================================
- Coverage   93.70%   93.40%   -0.30%     
==========================================
  Files          28       28              
  Lines        8276     8309      +33     
  Branches     1514     1522       +8     
==========================================
+ Hits         7755     7761       +6     
- Misses        324      350      +26     
- Partials      197      198       +1

Files	Coverage Δ
fpdf/table.py	`87.89% <32.50%> (-6.63%)`	⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

afriedman412 · 2023-10-27T17:04:02Z

I added the code to create a table from a dataframe directly to the pdf object because it needs to get the number of index rows and columns from the dataframe ... and it seemed redundant to have to pass all the other kwargs to the initial function for building the table.

also added a test!

gmischler

@afriedman412, I appreciate your enthusiasm and can support the basic purpose of this PR. It is certainly desirable to make the use of data generated by other software packages as easily useable as possible. This applies to almost any data producing software, which happens to include pandas.

BUT.

The way you implemented this is not how it should be done.
There is absolutely no reason why the core classes of fpdf2 should have any knowledge about internal data structures of any one of those data generating packages. Sorry if this sounds unduly harsh, but from a software architecture perspective, this approach is just utterly and horribly wrong. If we allow this to happen and then continue on the the same path, fpdf2 would quickly turn into bloatware and become unmaintainable.

I've outlined the basic reasons for my objection and at least one possible solution in #970, just before and again after you joined the discussion. You seemed to have either not read, not understood, or just ignored this input. Is there a particular reason for that?

afriedman412 · 2023-10-30T19:22:58Z

I've outlined the basic reasons for my objection and at least one possible solution in #970, just before and again after you joined the discussion. You seemed to have either not read, not understood, or just ignored this input. Is there a particular reason for that?

Apologies for not seeing that you already went through this above. I kind of misread the conversation, as what stuck out to me immediately was the mis-framing of the issue.

While I agree that it's not ideal that Pandas doesn't do this formatting itself, multi-level dataframes are common enough that their formatting is worth accommodating. Automatically grabbing the number of levels is more intuitive than having to pass them as separate variables. It might not be elegant from a software engineer standpoint, but it's a small UX improvement that will save data people (such as myself) countless small headaches. But it's your package and if you don't want it that way, that's up to you!

To the larger point -- and why I missed the earlier conversation -- my goal here was to build out FPDF's ability to understand what index and column labels are, if for no other reason than to make their independent styling less janky!

gmischler · 2023-10-30T21:49:20Z

Apologies for not seeing that you already went through this above. I kind of misread the conversation, as what stuck out to me immediately was the mis-framing of the issue.

Yes, #970 is approaching the topic from a rather particular angle. No worries about that.

my goal here was to build out FPDF's ability to understand what index and column labels are, if for no other reason than to make their independent styling less janky!

That part is very much appreciated! The dilemma is simply about the optimal answer to "what goes where".

What the table module definitively needs internally are a good way to handle row labels (are those called "indexes" outside the pandas ecosystem as well?) and row spans. Adding those two features would be a good foundation for any of the more ambitious ideas (and the latter also helpful for write_html()).

On top of that foundation, a good concept for what you call "multi-level dataframes" will need some careful thought. The trick here is not focus your tunnel vision on pandas. Consider that there are likely other packages out there that may produce similar data, but will deliver them in a slightly different form, using differently named and organized data structures. Ultimately we will want to support all of them (or at least make that possible), so we need some kind of translation layer in between, separate from the core fpdf2 modules.

Whether this translation layer should become a part of the library (eg. in a new subdirectory "adapters"), would be better hosted as its own project, or some other approach entirely, that is up to discussion. But in any case, there needs to be a cleanly structured system with defined levels of abstraction. Data translation and conversion between different systems is a tricky topic, and doesn't get easier by just stirring everything together in one bowl.

If you're still interested in doing this (assuming I haven't chased you away with my criticism 😉), I suggest to restart from a clean slate. You can probably reuse a good chunk of the code already written, but it needs to be organized quite differently, so trying to salvage this PR might more hinder your progress than helping it.

Lucas-C · 2023-10-30T22:22:10Z

fpdf/fpdf.py

@@ -4882,6 +4882,23 @@ def table(self, *args, **kwargs):
        yield table
        table.render()

+    def create_table_from_dataframe(self, df, *args, **kwargs):


Such function should probably live in its own fpdf/pandas.py module.

@gmischler already provided a detailed explanation on why this should not be there, in fpdf2 core source file 😊

Lucas-C · 2023-10-30T22:26:24Z

my goal here was to build out FPDF's ability to understand what index and column labels are, if for no other reason than to make their independent styling less janky!

That part is very much appreciated! The dilemma is simply about the optimal answer to "what goes where".

I fully second @gmischler there 😊

I totally agree that this PR cannot be merged "as it is" now,
but I hope you will consider making some refactorings @afriedman412,
because this contribution to fpdf2 would be very welcome!

We sure can find room in fpdf2 for a "small UX improvement that will save data people countless small headaches"!

A good start could be to refactor your code in order minimizes the changes to the existing classes,
and initiate a new fpdf/pandas.py module!

afriedman412 · 2023-10-30T22:53:33Z

appreciate the feedback!

do you already have adapters set up or would this be the first?

gmischler · 2023-10-31T02:59:55Z

do you already have adapters set up or would this be the first?

So far they all live in the "Mixing with other libs" section of the documentation...
Until very recently, fpdf2 didn't have complex higher level APIs like now for tables and text regions, so the question didn't come up in the same way.

...and initiate a new fpdf/pandas.py module!

I'd hope to keep it as generic as possible.

fpdf/adapters/table_base.py
fpdf/adapters/table_pandas.py
fpdf/adapters/table_xyzpackage.py

There may well be other types of adapters in the future covering different aspects of working with fpdf2. I can't think of a good example right now, but users keep surprising me with their ideas... 😉

afriedman412 · 2023-10-31T17:02:41Z

ok rad, will work on this

afriedman412 · 2023-11-07T18:41:32Z

What the table module definitively needs internally are a good way to handle row labels (are those called "indexes" outside the pandas ecosystem as well?) and row spans. Adding those two features would be a good foundation for any of the more ambitious ideas (and the latter also helpful for write_html()).

can you clarify what you want the write_html() function to do?

I got down the rabbit hole of heading/index functionality and realized I didn't have a destination!

afriedman412 requested a review from gmischler as a code owner October 26, 2023 16:56

gmischler mentioned this pull request Oct 27, 2023

Adds pandas-friendly formatting for multi-column and multi-index tables #980

Closed

4 tasks

gmischler requested changes Oct 28, 2023

View reviewed changes

Lucas-C reviewed Oct 30, 2023

View reviewed changes

afriedman412 closed this Nov 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds pandas-friendly formatting for multi-column and multi-index tables #986

Adds pandas-friendly formatting for multi-column and multi-index tables #986

afriedman412 commented Oct 26, 2023

afriedman412 commented Oct 26, 2023

andersonhc commented Oct 27, 2023 •

edited

codecov-commenter commented Oct 27, 2023

afriedman412 commented Oct 27, 2023

gmischler left a comment •

edited

afriedman412 commented Oct 30, 2023

gmischler commented Oct 30, 2023 •

edited by Lucas-C

Lucas-C Oct 30, 2023

Lucas-C commented Oct 30, 2023

afriedman412 commented Oct 30, 2023

gmischler commented Oct 31, 2023

afriedman412 commented Oct 31, 2023

afriedman412 commented Nov 7, 2023

Adds pandas-friendly formatting for multi-column and multi-index tables #986

Adds pandas-friendly formatting for multi-column and multi-index tables #986

Conversation

afriedman412 commented Oct 26, 2023

afriedman412 commented Oct 26, 2023

andersonhc commented Oct 27, 2023 • edited

codecov-commenter commented Oct 27, 2023

Codecov Report

afriedman412 commented Oct 27, 2023

gmischler left a comment • edited

Choose a reason for hiding this comment

afriedman412 commented Oct 30, 2023

gmischler commented Oct 30, 2023 • edited by Lucas-C

Lucas-C Oct 30, 2023

Choose a reason for hiding this comment

Lucas-C commented Oct 30, 2023

afriedman412 commented Oct 30, 2023

gmischler commented Oct 31, 2023

afriedman412 commented Oct 31, 2023

afriedman412 commented Nov 7, 2023

andersonhc commented Oct 27, 2023 •

edited

gmischler left a comment •

edited

gmischler commented Oct 30, 2023 •

edited by Lucas-C