Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FPDFRecorder frequent usage makes document generation slow #1092

Closed
dmail00 opened this issue Jan 14, 2024 · 4 comments
Closed

FPDFRecorder frequent usage makes document generation slow #1092

dmail00 opened this issue Jan 14, 2024 · 4 comments

Comments

@dmail00
Copy link

dmail00 commented Jan 14, 2024

It is not immediately apparent to me why the rewind on FPDFRecorder copies the singleton FPDF state, was it imagined that this would be rewound multiple times?

From looking at the commit history of this file, it was introduced around version 2.3; the initial commit [1] does not contain this deepcopy yet strangely the next commit to this file is modifying an already present deepcopy [2].

The reason for asking this, is that I am generating 30 or so documents at a time with a similar format, each with five heading and this is slow, it is copying the singleton state twice for each header which seems like madness. The speed can be drastically improved by removing the second deepcopy or at least using a boolean flag to copy or not; however it can be even better using some logic that at least fits my needs. This is:

  • cache the current font
  • set the heading font as the current
  • get the width of the text for the heading
  • if this greater than the width minus the margins* then hand off to start_section
  • else get the height of the font line* and add the height of four lines using the previous font**
    ** if this new y would exceed the page break trigger create a page
    ** render the marked sequence
  • reset the old font

* margins includes any possible heading borders.
** Anything less than four lines, to me at least, does not make sense in putting a header on this page

[1] ea834ec
[2] 9fb2c3b

@Lucas-C
Copy link
Member

Lucas-C commented Jan 15, 2024

Hi @dmail00

It is not immediately apparent to me why the rewind on FPDFRecorder copies the singleton FPDF state, was it imagined that this would be rewound multiple times?

Yes indeed, there is even a unit test for that: https://github.com/py-pdf/fpdf2/blob/2.7.7/test/test_recorder.py#L30

The reason for asking this, is that I am generating 30 or so documents at a time with a similar format, each with five heading and this is slow, it is copying the singleton state twice for each header which seems like madness. The speed can be drastically improved by removing the second deepcopy or at least using a boolean flag to copy or not; however it can be even better using some logic that at least fits my needs.

Great if you found some solution that fits your needs 🙂

I think it'd be great to improve fpdf2 performances for as many use cases as possible, as long as we do not lose functionalities.

It is not immediately clear to me what is your use case there: could you provide some minimal script that reproduces the performance issue you mentioned, please? This would help us to understand through which methods calls the FPDFRecorder() is intensively used in your situation (e.g. is it reached through FPDF.offset_rendering() or FPDF.unbreakable()? I'd guess it's the former one from FPDF.start_section()), and also provide a "baseline" use case that we could work on, to improve its speed.

Also, if you already found a solution for your problem, what is your intent in opening this issue?
Would you like to contribute on improving things in fpdf2? 🙂

@Lucas-C
Copy link
Member

Lucas-C commented Jan 15, 2024

If performances are better, I think that instead of calling .offset_rendering() in .start_section() we could copy the approach used in Table._get_row_layout_info(), that is to call .multi_cell() a first time with output=MethodReturnValue.PAGE_BREAK inside a pdf._disable_writing() context-block.

@Lucas-C Lucas-C changed the title FPDFRecorder FPDFRecorder frequent usage makes document generation slow Jan 15, 2024
@dmail00
Copy link
Author

dmail00 commented Jan 15, 2024

If performances are better, I think that instead of calling .offset_rendering() in .start_section() we could copy the approach used in Table._get_row_layout_info(), that is to call .multi_cell() a first time with output=MethodReturnValue.PAGE_BREAK inside a pdf._disable_writing() context-block.

This works really well, thanks and is just as fast as what I was doing. I did change it to return the height output=MethodReturnValue.HEIGHT as I still think having spaces for lines after the heading makes sense.

It is not immediately clear to me what is your use case there: could you provide some minimal script that reproduces the performance issue you mentioned, please?

It would have to be a contrived example, as what is currently written is pulled in from an external source and contains personal information.

Also, if you already found a solution for your problem, what is your intent in opening this issue?

It was solely to make you aware of the problem and provide a potential solution.

@Lucas-C
Copy link
Member

Lucas-C commented Feb 28, 2024

I opened PR #1127 to improve this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants