[FIX] base: cut multipage multi-documents on top heading
When we print a report for multiple records, we call wkhtmltopdf only
once. To save each document separately, we then split the PDF based on
its outline, if one is available.

The outline is generated by wkhtmltopdf based on headings
(H1,H2,H3,H4,H5,H6,H7,H8,H9 elements). For example, this document:

```
<h1>hello</h1>
<h2>world</h2>
<h1>!</h1>
```

has this abbreviated outline structure:

```
/Outlines: {
    '/First': {
        '/Title': 'hello',
        '/First': {
            '/Title': 'world'
        },
        '/Next': {
            '/Title': '!'
        }
    }
}
```

But the current heuristic did not take the level of headings into
account, so if the document had lower-level headings, this could break
the multi-printing of these reports.

So for the example above, the document would be cut into 3 parts when in
reality we only want 2, one per top-level heading (`<h1/>` here).
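
As a rough illustration of the difference, the sketch below models the
abbreviated outline above with plain Python dicts (not real PDF objects)
and compares counting every heading with counting top-level headings
only. The helper names `all_titles` and `top_level_titles` are
hypothetical and are not part of the patch; the previous code actually
read `/Dests` rather than walking the outline.

```python
# Plain-dict model of the abbreviated outline shown above (an assumption
# for illustration; a real PDF exposes these as pypdf objects).
outlines = {
    '/First': {
        '/Title': 'hello',
        '/First': {'/Title': 'world'},
        '/Next': {'/Title': '!'},
    }
}


def all_titles(node):
    """Collect every heading: children via '/First', siblings via '/Next'."""
    titles = []
    while node is not None:
        titles.append(node['/Title'])
        titles.extend(all_titles(node.get('/First')))
        node = node.get('/Next')
    return titles


def top_level_titles(outlines):
    """Collect only top-level headings by following '/Next' from '/First'."""
    titles = []
    node = outlines.get('/First')
    while node is not None:
        titles.append(node['/Title'])
        node = node.get('/Next')
    return titles


every_heading = all_titles(outlines['/First'])
top_headings = top_level_titles(outlines)
print(every_heading)  # ['hello', 'world', '!'] -> 3 cut points
print(top_headings)   # ['hello', '!'] -> 2 cut points
```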

An existing instance of this issue in Odoo is in l10n_in_sale: `<h6/>`
elements are added to the invoice lines of the report, so instead of a
single `<h2/>` heading containing the invoice name per document, there
was an additional heading per invoice line, which broke the heuristic.

note: we also add an assertion to ensure the first heading is on the
first page.
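
To see why the first heading has to be on the first page, here is a
minimal sketch of how a list of starting pages can be turned into one
page range per document. The function name `page_ranges` is
hypothetical; only the loop body follows the code touched by this
commit. If the first starting page were not 0, the pages before it
would belong to no document.

```python
def page_ranges(outlines_pages, num_pages):
    """Turn sorted document start pages into (start, end) page ranges.

    `end` is exclusive, so the ranges cover every page exactly once
    as long as the first start page is 0.
    """
    # Same check as the new assertion: a top-level heading on the first page.
    assert outlines_pages and outlines_pages[0] == 0
    ranges = []
    for i, start in enumerate(outlines_pages):
        end = outlines_pages[i + 1] if i + 1 < len(outlines_pages) else num_pages
        ranges.append((start, end))
    return ranges


# Documents starting at pages 0, 3 and 5 of a 7-page PDF.
ranges = page_ranges([0, 3, 5], num_pages=7)
print(ranges)  # [(0, 3), (3, 5), (5, 7)]
```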

note: several top-level headings on the same page are allowed, so
reports that did not work in 11.0 but worked in 12.0 and later (thanks
to 573e577) still work after this change.
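
A small sketch of that behaviour, assuming the page index of each
top-level heading has already been collected: duplicates are collapsed
with `sorted(set(...))` before the count is compared with the number of
records. The function name `document_start_pages` and the sample data
are hypothetical.

```python
def document_start_pages(top_heading_pages, res_ids):
    """Collapse headings that share a page, then check one start page per record."""
    start_pages = sorted(set(top_heading_pages))
    assert len(start_pages) == len(res_ids), "expected one starting page per record"
    return start_pages


# Hypothetical data: two records, the first with two top-level headings on page 0.
start_pages = document_start_pages([0, 0, 2], res_ids=[7, 8])
print(start_pages)  # [0, 2]
```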

opw-2188767
closes #48099

closes #48360

X-original-commit: 63c2478
Signed-off-by: Nicolas Lempereur (nle) <nle@odoo.com>
nle-odoo committed Mar 25, 2020
1 parent ada2fbb commit 79dd226
Showing 1 changed file with 16 additions and 7 deletions.
23 changes: 16 additions & 7 deletions odoo/addons/base/models/ir_actions_report.py
@@ -581,16 +581,25 @@ def close_streams(streams):
                     streams.append(pdf_content_stream)
                 else:
                     # In case of multiple docs, we need to split the pdf according the records.
-                    # To do so, we split the pdf based on outlines computed by wkhtmltopdf.
+                    # To do so, we split the pdf based on top outlines computed by wkhtmltopdf.
                     # An outline is a <h?> html tag found on the document. To retrieve this table,
-                    # we look on the pdf structure using pypdf to compute the outlines_pages that is
-                    # an array like [0, 3, 5] that means a new document start at page 0, 3 and 5.
+                    # we look on the pdf structure using pypdf to compute the outlines_pages from
+                    # the top level heading in /Outlines.
                     reader = PdfFileReader(pdf_content_stream)
-                    if reader.trailer['/Root'].get('/Dests'):
-                        outlines_pages = sorted(
-                            set(outline.getObject()[0] for outline in reader.trailer['/Root']['/Dests'].values())
-                        )
+                    root = reader.trailer['/Root']
+                    if '/Outlines' in root and '/First' in root['/Outlines']:
+                        outlines_pages = []
+                        node = root['/Outlines']['/First']
+                        while True:
+                            outlines_pages.append(root['/Dests'][node['/Dest']][0])
+                            if '/Next' not in node:
+                                break
+                            node = node['/Next']
+                        outlines_pages = sorted(set(outlines_pages))
+                        # There should be only one top-level heading by document
                         assert len(outlines_pages) == len(res_ids)
+                        # There should be a top-level heading on first page
+                        assert outlines_pages[0] == 0
                         for i, num in enumerate(outlines_pages):
                             to = outlines_pages[i + 1] if i + 1 < len(outlines_pages) else reader.numPages
                             attachment_writer = PdfFileWriter()