
[FIX] base: cut multipage multi-documents on top heading

When we print a report for multiple records, we call wkhtmltopdf only
once, and to save each document separately, we split the resulting PDF
based on its outline, if one is available.

The outline is generated by wkhtmltopdf from headings
(H1, H2, H3, H4, H5, H6, H7, H8, H9 elements). For example, this document:

```
<h1>hello</h1>
<h2>world</h2>
<h1>!</h1>
```

has this abbreviated outline structure:

```
/Outlines: {
    '/First': {
        '/Title': 'hello',
        '/First': {
            '/Title': 'world'
        },
        '/Next': {
            '/Title': '!'
        }
    }
}
```
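To make the traversal concrete, here is a minimal sketch that models the outline above as nested Python dicts. This is an assumption for illustration: pypdf returns dictionary-like PDF objects, and plain dicts stand in for them here. The hypothetical `top_level_titles` helper follows `/First` once and then only `/Next` links, so it never descends into a child's `/First`:

```python
def top_level_titles(outlines):
    """Return the titles of the top-level outline items.

    `outlines` is a plain dict standing in for the PDF /Outlines object
    (a hypothetical stand-in for the dictionary-like objects pypdf returns).
    """
    titles = []
    node = outlines.get('/First')
    while node is not None:
        titles.append(node['/Title'])
        # '/Next' links siblings on the same level; a node's own '/First'
        # would lead to its children, which we deliberately skip.
        node = node.get('/Next')
    return titles


# The abbreviated structure from the example above.
outlines = {
    '/First': {
        '/Title': 'hello',
        '/First': {'/Title': 'world'},
        '/Next': {'/Title': '!'},
    }
}

print(top_level_titles(outlines))  # ['hello', '!']
```

Only `hello` and `!` are visited; `world` is reachable only through the first item's `/First` and is therefore ignored.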

But the current heuristic did not take the heading level into account,
so if a document contained lower-level headings, multi-record printing
of that report could break.

So, for the example above, the document would be cut at all three
headings, whereas we only want to cut it at the two top-level headings
(`<h1/>` here).

An existing case in Odoo is l10n_in_sale: `<h6/>` elements are added
to the invoice report lines, so instead of a single `<h2/>` heading
containing the invoice name per document, there was an additional
heading per invoice line, which broke the heuristic.
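The l10n_in_sale case can be reproduced with a toy model. The destination names and record ids below are hypothetical, and each `/Dests` value is reduced to a one-element list holding a 0-based page index (real destination arrays also carry a view specification). The sketch compares the previous heuristic, one cut per named destination, with a walk over the top-level outline items only:

```python
def old_cut_pages(dests):
    # Previous heuristic: one entry per named destination,
    # i.e. per heading of any level.
    return sorted(dest[0] for dest in dests.values())


def new_cut_pages(outlines, dests):
    # New heuristic: follow only top-level items ('/First' then '/Next')
    # and keep each page once.
    pages = []
    node = outlines.get('/First')
    while node is not None:
        pages.append(dests[node['/Dest']][0])
        node = node.get('/Next')
    return sorted(set(pages))


# Two invoices (hypothetical): one <h2/> per invoice, one <h6/> per line.
dests = {
    'inv1': [0], 'inv1-line1': [0], 'inv1-line2': [0],
    'inv2': [1], 'inv2-line1': [1],
}
outlines = {
    '/First': {
        '/Dest': 'inv1',
        '/First': {'/Dest': 'inv1-line1', '/Next': {'/Dest': 'inv1-line2'}},
        '/Next': {'/Dest': 'inv2', '/First': {'/Dest': 'inv2-line1'}},
    }
}
res_ids = [1, 2]  # hypothetical record ids

print(len(old_cut_pages(dests)))        # 5 entries for 2 records
print(new_cut_pages(outlines, dests))   # [0, 1]
```

With the previous heuristic, the number of entries (5) no longer matches the number of records (2); the top-level walk yields exactly one start page per invoice.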

note: we also add an assertion to ensure the first heading is on the first page.

note: several top-level headings are allowed on the same page, so that
reports that did not work in 11.0 but worked in 12.0 and later (thanks
to 573e577) still work after this change.
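The two notes above can be illustrated by the page-range computation on its own. This is a sketch that assumes the top-level heading pages are already known as 0-based integers; the `split_ranges` helper is hypothetical and not part of Odoo. It mirrors the deduplication, the two assertions and the `to` computation of the patch:

```python
def split_ranges(top_level_pages, num_pages, num_docs):
    """Turn top-level heading pages into (start, end) page ranges, end exclusive."""
    # Several top-level headings may share a page: keep each page once.
    starts = sorted(set(top_level_pages))
    # There should be exactly one start page per document.
    assert len(starts) == num_docs
    # The first document must start on the first page.
    assert starts[0] == 0
    ranges = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else num_pages
        ranges.append((start, end))
    return ranges


# Two documents in a 4-page PDF; the first has two top-level headings on page 0.
print(split_ranges([0, 0, 2], num_pages=4, num_docs=2))  # [(0, 2), (2, 4)]
```

As in the patch, the checks are plain `assert` statements, which Python skips when run with `-O`; raising an explicit exception would keep them active.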

opw-2188767
closes #48099

closes #48142

X-original-commit: aeeca61
Signed-off-by: Nicolas Lempereur (nle) <nle@odoo.com>
nle-odoo committed Mar 20, 2020
1 parent a864edc commit 345f6eafa44feebc6dc6923f2ab346aa58cab5ba
Showing with 16 additions and 6 deletions.
  1. +16 −6 odoo/addons/base/models/ir_actions_report.py
@@ -529,15 +529,25 @@ def close_streams(streams):
         streams.append(pdf_content_stream)
     else:
         # In case of multiple docs, we need to split the pdf according the records.
-        # To do so, we split the pdf based on outlines computed by wkhtmltopdf.
+        # To do so, we split the pdf based on top outlines computed by wkhtmltopdf.
         # An outline is a <h?> html tag found on the document. To retrieve this table,
-        # we look on the pdf structure using pypdf to compute the outlines_pages that is
-        # an array like [0, 3, 5] that means a new document start at page 0, 3 and 5.
+        # we look on the pdf structure using pypdf to compute the outlines_pages from
+        # the top level heading in /Outlines.
         reader = PdfFileReader(pdf_content_stream)
-        if reader.trailer['/Root'].get('/Dests'):
-            outlines_pages = sorted(
-                [outline.getObject()[0] for outline in reader.trailer['/Root']['/Dests'].values()])
+        root = reader.trailer['/Root']
+        if '/Outlines' in root and '/First' in root['/Outlines']:
+            outlines_pages = []
+            node = root['/Outlines']['/First']
+            while True:
+                outlines_pages.append(root['/Dests'][node['/Dest']][0])
+                if '/Next' not in node:
+                    break
+                node = node['/Next']
+            outlines_pages = sorted(set(outlines_pages))
+            # There should be only one top-level heading by document
             assert len(outlines_pages) == len(res_ids)
+            # There should be a top-level heading on first page
+            assert outlines_pages[0] == 0
             for i, num in enumerate(outlines_pages):
                 to = outlines_pages[i + 1] if i + 1 < len(outlines_pages) else reader.numPages
                 attachment_writer = PdfFileWriter()
