New feature: FPDF.table() #701

Lucas-C · 2023-02-20T15:39:07Z

Current situation
fpdf2 currently lets users build tables with the cell() & multi_cell() methods, as demonstrated in part 5 of our tutorial: https://pyfpdf.github.io/fpdf2/Tutorial.html#tuto-5-creating-tables
We also have some recipes regarding building tables in our documentation: https://pyfpdf.github.io/fpdf2/Tables.html

Based on the feedback in several table-related issues & discussions opened on this GitHub project, it seems to me that an FPDF.table() method would be very handy for our users.

Features
Ideally, the final implementation would provide the following set of features:

support cells with content wrapping over several lines
control over column & row sizes, or by default let them be automatically computed
control over text alignment in cells, with rules by column or row
allow setting table headings, styled differently, but make this optional
control table width
honor the current X / Y position when rendering the table, and make it easy to center it on the page
handle splitting a table over page breaks, with headings repeated
allow embedding images in cells
control over borders: color, width & where they are drawn (e.g. allow omitting the surrounding rectangle, or drawing only the horizontal line above the headings, etc.). Also: control the thickness of the border below the headings
control over cell background, through a callback function to allow maximum customization
(bonus) allow several cells to be merged horizontally (aka colspan)
(bonus) replace the table-building logic in fpdf/html.py by a call to this new FPDF.table() method
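
To make the "splitting a table over page breaks, with headings repeated" item more concrete, here is a minimal sketch of the bookkeeping involved. It is pure Python and does not use fpdf2; the function name split_rows_into_pages and its parameters are hypothetical, and row heights are assumed to be known in advance.

```python
from typing import List, Sequence


def split_rows_into_pages(
    row_heights: Sequence[float],
    page_height: float,
    heading_height: float,
) -> List[List[int]]:
    """Group row indices into pages, reserving room for a repeated heading.

    Hypothetical helper for illustration only; it is not part of fpdf2.
    """
    pages: List[List[int]] = []
    current: List[int] = []
    used = heading_height  # every page starts with the heading row
    for index, height in enumerate(row_heights):
        if heading_height + height > page_height:
            raise ValueError(f"row {index} cannot fit on a single page")
        if current and used + height > page_height:
            # The row would overflow: close this page and start a new one,
            # which again begins with the heading row.
            pages.append(current)
            current = []
            used = heading_height
        current.append(index)
        used += height
    if current:
        pages.append(current)
    return pages


pages = split_rows_into_pages([10, 10, 30, 10, 10], page_height=50, heading_height=10)
print(pages)  # [[0, 1], [2, 3], [4]]
```

A real implementation would also have to account for page margins and for rows whose height depends on text wrapping, which this sketch ignores.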

Method design
In issue #680 I pitched the following API for this feature:

from fpdf import FPDF

pdf = FPDF()
with pdf.table() as table:
    table.col_widths = ...  # optional
    with table.row() as row:
        row.cell(...)  # or row.image(...)

Regarding this, feedback and alternative suggestions are very welcome! 😊
Here is what I like about this one:

it defers the actual table building & rendering to the end of the table() context, which means that we'll be able to perform some calculations on the row heights / column widths based on all the table content provided
it gives more flexibility to the user than having a huge data object provided in one go to a table() method, while still making it easy to build a table from such a large data dictionary / sequence
requiring several method calls will allow us to "split" control parameters between those methods, and limit the number of parameters passed to table(). The image() method for example, with its 11 parameters, is becoming a bit difficult to grasp.
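
The deferred-rendering idea in the first two points can be sketched with plain context managers. Everything below (TableSketch, RowSketch, finalize, and measuring widths in characters) is a hypothetical stand-in chosen for illustration, not the fpdf2 implementation:

```python
from contextlib import contextmanager
from typing import Iterator, List


class RowSketch:
    """Hypothetical row object that only records cell texts."""

    def __init__(self) -> None:
        self.cells: List[str] = []

    def cell(self, text: str) -> None:
        self.cells.append(text)


class TableSketch:
    """Hypothetical stand-in for the proposed table object (not fpdf2 code)."""

    def __init__(self) -> None:
        self.rows: List[List[str]] = []
        self.col_widths: List[int] = []  # optional: may be set by the caller

    @contextmanager
    def row(self) -> Iterator[RowSketch]:
        row = RowSketch()
        yield row
        self.rows.append(row.cells)

    def finalize(self) -> None:
        # Deferred step: widths depend on every cell, so compute them last.
        # Widths are counted in characters here, purely for illustration.
        if self.col_widths:
            return  # the caller provided explicit widths
        n_cols = max((len(r) for r in self.rows), default=0)
        self.col_widths = [
            max((len(r[i]) for r in self.rows if i < len(r)), default=0)
            for i in range(n_cols)
        ]


@contextmanager
def table() -> Iterator[TableSketch]:
    sketch = TableSketch()
    yield sketch
    sketch.finalize()  # runs when the "with" block exits normally


# Building the table from an existing data sequence stays a short loop.
data = [("Name", "City"), ("Ada Lovelace", "London"), ("Alan", "Wilmslow")]
with table() as t:
    for record in data:
        with t.row() as r:
            for value in record:
                r.cell(value)

print(t.col_widths)  # [12, 8]
```

Because nothing is computed until the outer block exits, the column widths can take every row into account, which would not be possible if each row were drawn as soon as it was declared.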


Lucas-C · 2023-02-28T12:19:40Z

The PR is almost ready: #703

MartinThoma · 2023-03-24T20:03:40Z

Hey! I'm Martin, the maintainer of pypdf and PyPDF2 👋

Do you think the table-feature could be added in a way that it's possible to read the table structure from the PDF (programmatically, without heuristics)?

MartinThoma · 2023-03-24T21:39:01Z

I was thinking about "14.6 Marked Content", see https://accessible-pdf.info/basics/general/overview-of-the-pdf-tags
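
For readers unfamiliar with marked content, here is a rough sketch of what such a sequence looks like inside a PDF content stream, based on my reading of the PDF specification: operators are wrapped between BDC and EMC, with a tag such as TD and a marked-content identifier (MCID). The helper name marked_content is hypothetical, and the sketch leaves out the structure tree that a tagged PDF also needs in order to reference each MCID.

```python
def marked_content(tag: str, mcid: int, operators: str) -> str:
    """Wrap content-stream operators in a marked-content sequence.

    Illustrative only: a tagged PDF also needs a structure tree whose
    elements reference each MCID, which this sketch does not build.
    """
    return f"/{tag} <</MCID {mcid}>> BDC\n{operators}\nEMC"


# Text-drawing operators for one table cell (font name and position are made up).
cell_text = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
print(marked_content("TD", 0, cell_text))
```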

Lucas-C · 2023-03-26T10:52:01Z

Thank you for reaching out @MartinThoma!

Yes, this is a really good suggestion.
It shouldn't be difficult to add, as we already have the necessary building block: https://github.com/PyFPDF/fpdf2/blob/2.6.1/fpdf/fpdf.py#L3799

However, I am not sure how best to test that we implement this right...
Would you recommend any tool I could use to check that table content can be properly extracted based on marked content?
I only know https://github.com/camelot-dev/camelot, but it is not based on marked content tags.

MartinThoma · 2023-03-26T11:34:26Z

Good question! I want to give those capabilities to pypdf in the long run, but right now we are not there yet.

Looking at some libraries:

Tika / PdfBox has it, but tika-python probably does not; see "Can tika extract "Marked Content" (tagged PDFs)?" (chrismattmann/tika-python#393)
pdfminer.six: They claim they support it, but I couldn't figure out how to use it; see "How can I extract marked content from tagged PDFs?" (pdfminer/pdfminer.six#868)
PyMuPDF seems not to be able to do it

I actually asked this several years ago and haven't received an answer: "How can I extract all PDF Tags related to content with Python?"

Lucas-C · 2023-03-27T08:36:37Z

Thank you for the detailed answer @MartinThoma!
I have also found this screenshot that illustrates tagged table elements:

I have just added a commit to PR #703 related to this: 46bc617 (#703). It contains:

unit tests ensuring tables can be extracted from PDF docs generated with fpdf2, using camelot or tabula
some guidelines in the documentation: https://github.com/PyFPDF/fpdf2/blob/table/docs/Tables.md#parsabilty-of-the-tables-generated

I was not able to find examples of using pdfminer to extract tables from PDF docs.
Regarding PyMuPDF, the GitHub issue you pointed to seems to indicate that it does NOT support table data extraction.
For tika-python, I am going to wait for the answer to the question you asked.

Given that none of the tools dedicated to PDF table extraction use PDF tags / annotations to do their job, I am not sure that adding PDF tags is really worthwhile...
At least not systematically.
An optional tag=True argument could later be added to FPDF.table(), but I don't think it's necessary in the initial version.

What do you think about this @MartinThoma?

MartinThoma · 2023-03-27T11:18:44Z

Wow, you're amazing 😍

> Regarding PyMuPDF, the GitHub issue you pointed to seems to indicate that it does NOT support table data extraction.

Oops, my bad, I mistyped 🙈

> Given that none of the tools dedicated to PDF table extraction use PDF tags / annotations to do their job, I am not sure that adding PDF tags is really worthwhile

Yes, I understand. It's a bit of a chicken-and-egg problem. Please don't forget that screen readers / accessibility solutions might use the tags as well. I think the tags were originally designed for them, but I have no expertise in that area.

> I don't think it's necessary in the initial version

I agree 👍

Lucas-C added enhancement up-for-grabs table multi_cell labels Feb 20, 2023

Lucas-C mentioned this issue Feb 20, 2023

How to put a PNG image into a table, using an IO buffer? #680

Closed

2 tasks

Lucas-C added a commit that referenced this issue Feb 20, 2023

Implement FPDF.table() - close #701

faabefd

Lucas-C added a commit that referenced this issue Feb 20, 2023

Implement FPDF.table() - close #701

9892367

Lucas-C added a commit that referenced this issue Feb 20, 2023

Implement FPDF.table() - close #701

be440c9

Lucas-C added a commit that referenced this issue Feb 20, 2023

Implement FPDF.table() - close #701

fc2c0b3

Lucas-C added a commit that referenced this issue Feb 20, 2023

Implement FPDF.table() - close #701

70d192b

Lucas-C self-assigned this Feb 20, 2023

Lucas-C added a commit that referenced this issue Feb 24, 2023

Implement FPDF.table() - close #701

54ea855

Lucas-C mentioned this issue Feb 24, 2023

Support character based line wrapping (#649) #657

Merged

5 tasks

Lucas-C removed the up-for-grabs label Feb 28, 2023

Lucas-C added a commit that referenced this issue Feb 28, 2023

Implement FPDF.table() - close #701

bd7f4fc

Lucas-C added a commit that referenced this issue Feb 28, 2023

Implement FPDF.table() - close #701

39f24dd

Lucas-C added a commit that referenced this issue Mar 16, 2023

Implement FPDF.table() - close #701

f61770d

Lucas-C added a commit that referenced this issue Mar 17, 2023

Implement FPDF.table() - close #701

563b4b4

Lucas-C added a commit that referenced this issue Mar 27, 2023

Implement FPDF.table() - close #701

de07a56

Lucas-C added a commit that referenced this issue Mar 27, 2023

Implement FPDF.table() - close #701

3a45ed3

Lucas-C added a commit that referenced this issue Mar 27, 2023

Implement FPDF.table() - close #701

0684aee

Lucas-C closed this as completed in 0579097 Mar 27, 2023
