Tables in pdf files are not converted properly #293

@kristofmulier

user_manual.pdf

I converted a PDF file with lots of tables to Markdown. I expected markitdown to handle tables gracefully. For example, the following table:

[Image: the table as it appears in the PDF]

Should be converted into markdown like so:

| Register name | Description                     | Offset Address |
|---------------|---------------------------------|----------------|
| FMC_ACCTRL    | Flash access control register   | 0x00           |
| FMC_KEY       | Flash key register              | 0x04           |
| FMC_OPTKEY    | Flash option key register       | 0x08           |
| FMC_STS       | Flash state register            | 0x0C           |
| FMC_CTRL      | Flash control register          | 0x10           |
| FMC_OPTCTRL   | Flash option control register   | 0x14           |
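
To make the expected output concrete, here is a minimal sketch of how such a table could be rendered once the cells have been recovered as rows. It assumes the rows are already available as lists of strings; `rows_to_markdown` is a hypothetical helper for illustration, not part of markitdown, and its column padding may differ slightly from the hand-written table above.

```python
def rows_to_markdown(header, rows):
    """Render a header and data rows as a padded Markdown pipe table."""
    table = [list(header)] + [list(r) for r in rows]
    # Column width = longest cell in that column (at least 3 for the rule line).
    widths = [max(3, *(len(row[i]) for row in table)) for i in range(len(header))]

    def fmt(cells):
        # Left-justify every cell to its column width.
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    # Separator line between the header and the body.
    rule = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(table[0]), rule] + [fmt(r) for r in table[1:]])


header = ["Register name", "Description", "Offset Address"]
rows = [
    ["FMC_ACCTRL", "Flash access control register", "0x00"],
    ["FMC_KEY", "Flash key register", "0x04"],
    ["FMC_OPTKEY", "Flash option key register", "0x08"],
    ["FMC_STS", "Flash state register", "0x0C"],
    ["FMC_CTRL", "Flash control register", "0x10"],
    ["FMC_OPTCTRL", "Flash option control register", "0x14"],
]
print(rows_to_markdown(header, rows))
```

The hard part is of course recovering the rows from the PDF in the first place; this only shows the target format.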

However, what I get from markitdown is this:

  Register address mapping

Table 14 FMC Register Address Mapping

Register name

Description

Offset Address

FMC_ACCTRL

Flash access control register

FMC_KEY

Flash key register

FMC_OPTKEY

Flash option key register

FMC_STS

FMC_CTRL

Flash state register

Flash control register

FMC_OPTCTRL

Flash option control register

0x00

0x04

0x08

0x0C

0x10

0x14

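The number 3.6 in the title is gone. Worse, the table is broken apart: every cell lands on its own line, the rows are interleaved (FMC_STS and FMC_CTRL appear together, ahead of their descriptions), and all the offset addresses are dumped at the end.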
