Hello,
I am reaching out regarding my recent experience with pymupdf4llm. I have a PDF file that was created from a PowerPoint presentation, and I am attempting to extract specific text elements from it.
PDF content:
Text 1
- sub text 1.1
- sub text 1.2
Text 2
- sub text 2.1
- sub text 2.2
I am currently using the following code to read the PDF file:
import pymupdf4llm

# filename is the path to the PDF exported from PowerPoint
all_pages_pdf = pymupdf4llm.to_markdown(filename, page_chunks=True)
for page in all_pages_pdf:
    page_number = page['metadata']['page']
    page_content = page['text']
    print(page_number)
    print(page_content)
Actual output with v0.0.10:
Text 1
Text 2
-
sub text 1.1
-
sub text 1.2
-
sub text 2.1
-
sub text 2.2
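As a stopgap, the detached bullet markers can be rejoined with their text after extraction. The sketch below is only a post-processing workaround, not part of pymupdf4llm: `rejoin_bullets` and `BULLET_MARKERS` are hypothetical names, and the set of marker characters is an assumption. It also does not restore the reading order, so "Text 2" still appears before the sub-items of "Text 1".

```python
# Hypothetical helper, not a pymupdf4llm API: merges a line that contains
# only a bullet marker with the next non-empty line.
BULLET_MARKERS = {"-", "*", "•"}  # assumed set of marker characters


def rejoin_bullets(text: str) -> str:
    merged = []
    pending = None  # bullet marker still waiting for its text
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in BULLET_MARKERS:
            if pending is not None:
                # Two markers in a row: keep the first one as-is.
                merged.append(pending)
            pending = stripped
        elif pending is not None:
            if stripped:
                merged.append(f"{pending} {stripped}")
                pending = None
            # Blank lines between a marker and its text are dropped.
        else:
            merged.append(line)
    if pending is not None:
        # A trailing marker with no text after it is kept unchanged.
        merged.append(pending)
    return "\n".join(merged)


sample = "Text 1\nText 2\n-\nsub text 1.1\n-\nsub text 1.2"
print(rejoin_bullets(sample))
```

In the loop from my code above, this would be applied as `rejoin_bullets(page['text'])` for each page.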
However, I am aiming for the following desired output:
Text 1
- sub text 1.1
- sub text 1.2
Text 2
- sub text 2.1
- sub text 2.2
I would appreciate any guidance or assistance in achieving the desired output.
Thank you for your attention to this matter.