Powerpoint data extraction problems with file #227

@jamie-lemon

Description

Running to_markdown on the attached file produces the following issues:

Slides 1-3 show duplicated headings, inconsistent Markdown heading formats, and generic Markdown formatting with no customization.

Slide 4 shows a loss of hierarchical structure: the way text was extracted from the table leaves the order of the information unclear.

Slide 5 shows tables that are not properly structured when converted to Markdown: columns are misaligned and data is scattered, making interpretation difficult.

Slide 6 shows indentation issues that cause bullet points to blend into plain text, losing structural clarity.

Across the whole Markdown file, there are no slide number dividers, which makes it hard to tell which extracted text came from which slide and to pinpoint what information was lost.
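
To illustrate the kind of slide dividers meant above, here is a minimal sketch of a post-processing step. It assumes the extracted Markdown is available as one string per slide, which this issue does not confirm; the function name `join_with_slide_dividers` and the divider format are hypothetical and not part of any library.

```python
from typing import Iterable


def join_with_slide_dividers(slide_texts: Iterable[str]) -> str:
    """Join per-slide Markdown strings, prefixing each with a numbered divider.

    Hypothetical helper: assumes the caller already has one Markdown
    string per slide, in slide order.
    """
    parts = []
    for number, text in enumerate(slide_texts, start=1):
        # The divider is followed by a blank line so it stays a separate
        # paragraph and is not merged with the slide's first line.
        parts.append(f"----- Slide {number} -----\n\n{text.strip()}\n")
    # Joining with a newline leaves a blank line before each divider.
    return "\n".join(parts)


if __name__ == "__main__":
    print(join_with_slide_dividers(["# Title slide", "- first bullet\n- second bullet"]))
```

With dividers like these, each block of extracted text can be traced back to its slide, which would make it easier to compare against the original deck.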

Sample Slides.pptx
