Skip to content

Describing Images Inline in PDFs for Better RAG #1346

@minhnghia2k3

Description

@minhnghia2k3

Title: Describing Images Inline in PDFs for Better RAG

I've been using LLMs to describe images like this, but its just support pass image directly:

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("example.jpg")
print(result.text_content)

However, in many cases, my PDF files contain both text and embedded images. I want to extract both and generate inline image descriptions in Markdown to improve retrieval quality in GenAI.

Is there a way to achieve something like this?

Original PDF content:

Lorem ipsum dolor sit amet...  
<-- image 1 - contains an orange cat --!>

Expected Markdown output:

Lorem ipsum dolor sit amet...

The image shows a fat orange cat sleeping on a white background.

How can I process both text and images together like this in a single conversion step?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions