[FIX] base: cut multipage multi-documents on top heading

When we print a report for multiple records, we call wkhtmltopdf once. To save each document separately, we split the PDF based on its outline, if one is available.

The outline is generated by wkhtmltopdf from the headings (H1,H2,H3,H4,H5,H6,H7,H8,H9 elements). For example, this document:

```
<h1>hello</h1>
<h2>world</h2>
<h1>!</h1>
```

has this abbreviated outline structure:

```
/Outlines: {
    '/First': {
        '/Title': 'hello',
        '/First': {
            '/Title': 'world'
        },
        '/Next': {
            '/Title': '!'
        }
    }
}
```

But the current heuristic did not take the level of the headings into account, so if a document had lower-level headings, this could break the multi-printing of these reports.

For the example above, the document would be split into 3 parts, when we actually want only 2, one per top-level heading (`<h1/>` here).

An existing issue in Odoo is in l10n_in_sale: `<h6/>` elements are added to the report invoice lines, so instead of a single `<h2/>` heading containing the invoice name per document, there was an additional heading per invoice line, which broke the heuristic.

note: we also add an assertion to ensure the first heading is on the first page.

note: several top-level headings on the same page are allowed, so that reports that did not work in 11.0 but worked in 12.0 and later (thanks to 573e577) still work after this change.

opw-2188767
closes #48099
closes #48360

X-original-commit: 63c2478
Signed-off-by: Nicolas Lempereur (nle) <nle@odoo.com>
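The top-level walk described in the message can be sketched as follows. This is an illustration only, not the code from the commit: it uses plain dicts in place of the PDF objects, it assumes `/Dests` maps each destination name directly to a list whose first item is a page number, and the helper names `top_level_pages` and `page_ranges` are hypothetical.

```python
def top_level_pages(root):
    """Return the sorted, de-duplicated start pages of top-level headings.

    `root` is a plain dict standing in for the PDF catalog (assumption of
    this sketch). Only the '/First' -> '/Next' chain is followed, so nested
    headings (reachable through a node's own '/First') are ignored.
    """
    node = root.get('/Outlines', {}).get('/First')
    pages = []
    while node is not None:
        pages.append(root['/Dests'][node['/Dest']][0])
        node = node.get('/Next')
    return sorted(set(pages))


def page_ranges(outlines_pages, num_pages):
    """Turn start pages into half-open (start, end) ranges, one per document."""
    return [
        (start, outlines_pages[i + 1] if i + 1 < len(outlines_pages) else num_pages)
        for i, start in enumerate(outlines_pages)
    ]


# Stand-in for the example document: <h1>hello</h1> on page 0,
# <h2>world</h2> on page 1 and <h1>!</h1> on page 2 (page numbers assumed).
root = {
    '/Dests': {'hello': [0], 'world': [1], 'bang': [2]},
    '/Outlines': {
        '/First': {
            '/Title': 'hello', '/Dest': 'hello',
            '/First': {'/Title': 'world', '/Dest': 'world'},
            '/Next': {'/Title': '!', '/Dest': 'bang'},
        },
    },
}

pages = top_level_pages(root)
ranges = page_ranges(pages, num_pages=3)
```

Under these assumptions the `<h2>` entry never contributes a split point, so the 3-page example yields two documents rather than three.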
odoo · Mar 25, 2020 · commit 79dd226 · 1 parent ada2fbb
Showing 1 changed file with 16 additions and 7 deletions.
diff --git a/odoo/addons/base/models/ir_actions_report.py b/odoo/addons/base/models/ir_actions_report.py
@@ -581,16 +581,25 @@ def close_streams(streams):
                     streams.append(pdf_content_stream)
                 else:
                     # In case of multiple docs, we need to split the pdf according the records.
-                    # To do so, we split the pdf based on outlines computed by wkhtmltopdf.
+                    # To do so, we split the pdf based on top outlines computed by wkhtmltopdf.
                     # An outline is a <h?> html tag found on the document. To retrieve this table,
-                    # we look on the pdf structure using pypdf to compute the outlines_pages that is
-                    # an array like [0, 3, 5] that means a new document start at page 0, 3 and 5.
+                    # we look on the pdf structure using pypdf to compute the outlines_pages from
+                    # the top level heading in /Outlines.
                     reader = PdfFileReader(pdf_content_stream)
-                    if reader.trailer['/Root'].get('/Dests'):
-                        outlines_pages = sorted(
-                            set(outline.getObject()[0] for outline in reader.trailer['/Root']['/Dests'].values())
-                        )
+                    root = reader.trailer['/Root']
+                    if '/Outlines' in root and '/First' in root['/Outlines']:
+                        outlines_pages = []
+                        node = root['/Outlines']['/First']
+                        while True:
+                            outlines_pages.append(root['/Dests'][node['/Dest']][0])
+                            if '/Next' not in node:
+                                break
+                            node = node['/Next']
+                        outlines_pages = sorted(set(outlines_pages))
+                        # There should be only one top-level heading by document
                         assert len(outlines_pages) == len(res_ids)
+                        # There should be a top-level heading on first page
+                        assert outlines_pages[0] == 0
                         for i, num in enumerate(outlines_pages):
                             to = outlines_pages[i + 1] if i + 1 < len(outlines_pages) else reader.numPages
                             attachment_writer = PdfFileWriter()