Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IMP] base/report, web/report: remove pdf merge with single call to w… #17101

Closed
wants to merge 1 commit into from

Conversation

smetl
Copy link
Contributor

@smetl smetl commented May 19, 2017

…khtmltopdf

The old behavior was to call wkhtmltopdf for each report and to put all of them together
using the merge_pdf method. It was a very slow approach because all report needs
5 temporary files, one subprocess and then, a merge.

The new behavior is to perform a single call to wkhtmltopdf by putting all the html
together and so, avoid merge to greatly improve the performances of the reporting.

see task: https://www.odoo.com/web#id=33527&view_type=form&model=project.task&action=333&active_id=248&menu_id=4720

--
I confirm I have signed the CLA and read the PR guidelines at www.odoo.com/submit-pr

@smetl smetl added the RD research & development, internal work label May 19, 2017
@smetl smetl requested a review from sle-odoo May 19, 2017 11:09
@lmignon
Copy link
Contributor

lmignon commented May 19, 2017

@smetl What will happen with the page numbering? Will the page numbering restart for each document? IMO it would be better to have an option on the report to process all the generated document at once or individually. Moreover does the generated invoice be saved in an attachment linked to the related account.invoice instance if you generate a lot of invoices at once?

@smetl
Copy link
Contributor Author

smetl commented May 19, 2017

@lmignon The page numbering is not restart for each document.
Unfortunately, we are not able to generate the attachments for each record individually with this commit.

@smetl smetl force-pushed the master-rem-pdf-merge-las branch 3 times, most recently from 85b93cb to fdeacdb Compare May 22, 2017 08:30
@smetl
Copy link
Contributor Author

smetl commented May 22, 2017

@lmignon The page numbering is now restarted at each document.

@lmignon
Copy link
Contributor

lmignon commented May 22, 2017

@smetl This refactoring is not compatible with the requirements of our customers since we lost important functionalities when reports are generated in batch mode:

  • For each record we must store the generated document as attachment,
  • For each record we must honor the attachment_use parameter

Even if we have cases where these functionalities are not required, we have others where these are required. If you can't find a way to keep these functionalities we must at least have a way to choice between the 2 modes: call wkhtml2pdf all at once or one by one. The one by one mode will be used when we must keep the current behavior (attachment_use and attachment stored by record)

@nhomar
Copy link
Collaborator

nhomar commented May 22, 2017

@lmignon for some reason the attachment_use feature is lost intentionally, I can't find a real reason with real explanation and rationale yet, but what I think is that odoo is betting now to make an special method to record the printed reporst as attachments by use cases (and not as an API as this was before) that's my asumption but @antonylesuisse can explain better the ratioanale, then it is not lost with this change it was lost time ago on master.

One suposition I have is that when you send by email the report is attached and in the past we have the same PDF saved twice (once when sent and once when printed [1]) and the solution was remove the autogeneration part in favor of force the "send by email" feature (which is by code).

I think we need a more elegant solution but this was what odoo's brings until now. :-(

[1] #13838

@nhomar
Copy link
Collaborator

nhomar commented May 22, 2017

@smetl

I am worried about this:

Unfortunately, we are not able to generate the attachments for each record individually with this commit.

Every record is an individual unit, join them all will bring more problems than solutions.

Sales orders, Purchase orders, Invoices, Account moves .... all of them ar completelly separated documents.

At least you generate a powerfull analysisi of the content, you will face probelms with page-brakes in-code also and with Javascripts that power reports to show advanced information which depends of the document itself.

I have a question:

In what case you need huge amount of records printed at once....?

If the reason is performance, I think this is an incorrect approach.

And please I know you do not have the decision, but we all use the autoattachment feature a lot, I yet do not understand the removal of such nice feature.

@nhomar
Copy link
Collaborator

nhomar commented May 22, 2017

@lmignon

I correct myself I confused a WiP PR with master one, ths removal of attahcment_use has not been merged yet into master.

image

I tested with tender reports and it works properly as usual.

image

@smetl smetl force-pushed the master-rem-pdf-merge-las branch from 5253d7d to 9c4b989 Compare May 23, 2017 10:35
@smetl
Copy link
Contributor Author

smetl commented May 24, 2017

@lmignon @nhomar My third commit uses a trick to split the resulting pdf to allow the storing as attachment for each record individually. Then, we keep all the functionalities as previously but the performance are increased.

@smetl smetl force-pushed the master-rem-pdf-merge-las branch from 2375f60 to 2671398 Compare May 24, 2017 12:05
@smetl smetl force-pushed the master-rem-pdf-merge-las branch 2 times, most recently from 8e57f1b to f68f2c1 Compare June 2, 2017 10:54
[(r.id, r) for r in self.env[self.model].browse([res_id for res_id in res_ids if res_id])])
if len(res_ids) == 1 and res_ids[0] in record_map:
# Only one record, so postprocess directly and append the whole pdf.
self.postprocess_pdf_report(record_map[res_ids[0]], pdf_content)
Copy link
Contributor

@lmignon lmignon Jun 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smetl Once again you remove a functionality.... The potential modification of the content by the postprocess method will no more be part of the result...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmignon After discussion with @antonylesuisse , this functionality will not be longer supported for performance reason. This commit avoids generating a temporary file for each sub-report.

Copy link
Contributor

@lmignon lmignon Jun 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smetl but if the pdf_content is a StringIO and the StringIO is added to the writer after the call to subprocess it's possible to support this functionality. Am I wrong?
cc @antonylesuisse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmignon The problem is we call wkhtmltopdf only once. Then, the file contains all the reports, not only the report for each record individually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smetl Yes it's nice to have wkhtmltopdb called only once. By reading your code I've the feeling it's possible to adapt this code to put into the final result a StringIO that could be passed to the postprocess method without additional performance cost.

      # Append content of pdf if exists
        if pdf_content:
            content_streams = []
            # Call the postprocess method for each record.
            if res_ids and self.attachment_use and self.attachment:
                # Build a record_map mapping id -> record
                record_map = dict(
                    [(r.id, r) for r in self.env[self.model].browse([res_id for res_id in res_ids if res_id])])
                if len(res_ids) == 1 and res_ids[0] in record_map:
                    # Only one record, so postprocess directly and append the whole pdf.
                    stream = StringIO(pdf_content)
                    self.postprocess_pdf_report(record_map[res_ids[0]], stream)
                    content_streams.append(streams)
                else:
                    # In case of multiple docs, we need to split the pdf according the records.
                    # To do so, we split the pdf based on outlines computed by wkhtmltopdf.
                    # An outline is a <h?> html tag found on the document. To retrieve this table,
                    # we look on the pdf structure using pypdf to compute the outlines_pages that is
                    # an array like [0, 3, 5] that means a new document start at page 0, 3 and 5.
                    outlines_pages = sorted(
                        [outline.getObject()[0] for outline in reader.trailer['/Root']['/Dests'].values()])
                    assert len(outlines_pages) == len(res_ids)
                    for i, num in enumerate(outlines_pages):
                        if not res_ids[i]:
                            continue
                        to = outlines_pages[i + 1] if i + 1 < len(outlines_pages) else reader.numPages
                        writer = PdfFileWriter()
                        for j in range(num, to):
                            writer.addPage(reader.getPage(j))
                        attachment_content = StringIO()
                        writer.write(attachment_content)
                        self.postprocess_pdf_report(record_map[res_ids[i]], attachment_content)
                        content_streams.append(attachment_content)
            else:
                stream = StringIO(pdf_content)
                content_streams.append(stream)

            # add content to the result
            for stream in content_streams:
                reader = PdfFileReader(stream)
                writer.appendPagesFromReader(reader)
            close_streams(content_streams)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmignon In my last commit, I pass the stringio instead of the attachment_content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smetl Thank you. And what do you thing about my proposal into the code above.
The idea it to put each stream passed to the postprocess method into an array and at the end iterate on the array to append the pages to the writer. With this changen if a stream is modified by the postprocess, it's part of the final result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmignon I understand what you mean. I will adapt my code to take into account your concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmignon It should work now with my last commit.

@smetl smetl force-pushed the master-rem-pdf-merge-las branch from c63181f to 8149b4b Compare June 6, 2017 07:57
@lmignon
Copy link
Contributor

lmignon commented Jun 6, 2017

@smetl 🎉 Thank you for the changes... LGTM

@smetl smetl requested a review from rco-odoo June 8, 2017 09:21
@smetl smetl force-pushed the master-rem-pdf-merge-las branch from 8149b4b to 549fa0d Compare June 8, 2017 09:24
Copy link
Member

@rco-odoo rco-odoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the code looks okay to me.

However, your naming convention is inconsistent. You should stick to the usual Odoo convention:

  • "record" is for a browse record,
  • "records" is for a recordset,
  • "record_id" is for a record ID (integer),
  • "record_ids" is for a collection of record IDs.

Thanks,
Raphael

:param attachment_name: The name of the attachment.
:param record_id: The record that will own the attachment.
:param pdf_content: The optional name content of the file to avoid reading both times.
:return The newly generated attachment if no AccessError, else None.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo, colon missing in :return:

@@ -180,73 +146,50 @@ def unlink_action(self):
# Main report methods
#--------------------------------------------------------------------------
@api.multi
def retrieve_attachment(self, record_id, attachment_name=None):
def retrieve_attachment(self, record_id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name record_id is confusing: it suggests that it should be a record ID. I recommend to use record instead.

@api.model
def postprocess_pdf_report(self, res_id, pdfreport_path, attachment_name):
@api.multi
def postprocess_pdf_report(self, record_id, buffer):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, record is less confusing IMHO...

try:
self.env['ir.attachment'].create(attachment)
attachment_id = self.env['ir.attachment'].create(attachment)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better naming convention maybe:

  • attachmentattachment_vals
  • attachment_idattachment

pdf_content_stream = StringIO(pdf_content)
# Build a record_map mapping id -> record
record_map = dict(
[(r.id, r) for r in self.env[self.model].browse([res_id for res_id in res_ids if res_id])])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a dict comprehension and maybe filter to filter out falsy ids:

record_map = {r.id: r for r in self.env[self.model].browse(filter(None, res_ids))}

…khtmltopdf

The old behavior was to call wkhtmltopdf for each report and to put all of them together
using the merge_pdf method. It was a very slow approach because all report needs
5 temporary files, one subprocess and then, a merge.

The new behavior is to perform a single call to wkhtmltopdf by putting all the footers/headers
html together and so, avoid call to merge_pdf to greatly improve the performances of the reporting.
@ged-odoo
Copy link
Contributor

merged in master

@ged-odoo ged-odoo closed this Jul 20, 2017
@ged-odoo ged-odoo deleted the master-rem-pdf-merge-las branch July 20, 2017 12:53
@smetl
Copy link
Contributor Author

smetl commented Jul 20, 2017

@ged-odoo Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RD research & development, internal work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants