Missing background color in HTML output

Hi there,

I've noticed that when running text extraction, some elements appear at the bottom of the output HTML rather than in the position they occupy in the input PDF.

Example PDF: https://www.cqc.org.uk/sites/default/files/new_reports/AAAG0004.pdf

![image](https://user-images.githubusercontent.com/49235569/76099743-78e8b700-5fc3-11ea-9e51-98ec72bb8c80.png)

For example, in the image above, the text "This section is primarily information for the provider" appears at the top of the PDF page. In the output, however, it appears in the following location:

![image](https://user-images.githubusercontent.com/49235569/76100829-46d85480-5fc5-11ea-8e2c-321ab5422f4e.png)

I know that elements within a PDF aren't stored in reading order, but PyMuPDF usually handles columns well, so this seemed strange. Does anyone know of a parameter that alters the ordering, or have any other tips for correcting this?
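For reference, the kind of workaround I have in mind is re-sorting the extracted blocks by their coordinates. This is only a rough sketch: it assumes the blocks come back as tuples of the form `(x0, y0, x1, y1, text, block_no, block_type)`, as I understand `page.getText("blocks")` returns them, and the helper name `sort_blocks_top_down` and the `y_tolerance` value are my own inventions. It works on plain text blocks, not on the HTML output.

```python
from typing import List, Tuple

# Assumed block layout: (x0, y0, x1, y1, text, block_no, block_type)
Block = Tuple[float, float, float, float, str, int, int]


def sort_blocks_top_down(blocks: List[Block], y_tolerance: float = 3.0) -> List[Block]:
    """Order blocks top-to-bottom, then left-to-right.

    Blocks whose top edges fall into the same y_tolerance-sized band are
    treated as one row. This is a crude bucketing and will not cope with
    real multi-column layouts.
    """
    def key(block: Block):
        x0, y0 = block[0], block[1]
        return (int(y0 // y_tolerance), x0)

    return sorted(blocks, key=key)


# Hand-made sample data standing in for one page's blocks (not taken from the PDF above).
sample_blocks: List[Block] = [
    (50.0, 700.0, 200.0, 720.0, "footer text", 0, 0),
    (50.0, 40.0, 200.0, 60.0, "header text", 1, 0),
    (300.0, 41.0, 400.0, 60.0, "header right", 2, 0),
]

ordered_texts = [b[4] for b in sort_blocks_top_down(sample_blocks)]

# With PyMuPDF installed, the input would come from something like:
#   import fitz  # PyMuPDF
#   doc = fitz.open("AAAG0004.pdf")
#   blocks = doc[0].getText("blocks")  # spelled get_text in newer releases
#   for b in sort_blocks_top_down(blocks):
#       print(b[4])
```

This would only reorder the plain-text blocks, so I'd still like to understand why the HTML output places the element where it does.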

Hopefully someone is able to determine the cause of the problem.

Configuration

Windows 10 x64
Python 3.6
PyMuPDF version 1.16.11 (have tried older versions too)


Many thanks,

Corry

Missing background color in HTML output #459
