Title: Describing Images Inline in PDFs for Better RAG
I've been using an LLM to describe images like this, but it only supports passing an image file directly:
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("example.jpg")
print(result.text_content)
However, my PDF files often contain both text and embedded images. I want to extract both and generate inline image descriptions in Markdown, to improve retrieval quality for GenAI applications such as RAG.
Is there a way to achieve something like this?
Original PDF content:
Lorem ipsum dolor sit amet...
<!-- image 1: contains an orange cat -->
Expected Markdown output:
Lorem ipsum dolor sit amet...
The image shows a fat orange cat sleeping on a white background.
How can I process both text and images together like this in a single conversion step?
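To make the request concrete, here is a minimal sketch of the behavior I'm after. It assumes the PDF has already been split into text and image blocks in reading order. The `TextBlock`, `ImageBlock`, `to_markdown`, and `describe_image` names are hypothetical and not part of MarkItDown, and the describer is a stub standing in for a real vision-model call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class ImageBlock:
    data: bytes  # raw image bytes as extracted from the PDF (hypothetical)


Block = Union[TextBlock, ImageBlock]


def to_markdown(blocks: Iterable[Block], describe_image: Callable[[bytes], str]) -> str:
    """Join blocks in reading order, replacing each image with its description."""
    parts = []
    for block in blocks:
        if isinstance(block, TextBlock):
            parts.append(block.text.strip())
        else:
            # The description takes the image's place in the text flow.
            parts.append(describe_image(block.data).strip())
    # Drop empty parts and separate the rest as Markdown paragraphs.
    return "\n\n".join(p for p in parts if p)


def fake_describe(data: bytes) -> str:
    # Stub for illustration; a real describer would send the bytes to a vision model.
    return "The image shows a fat orange cat sleeping on a white background."


blocks = [TextBlock("Lorem ipsum dolor sit amet..."), ImageBlock(b"\x89PNG...")]
markdown = to_markdown(blocks, fake_describe)
print(markdown)
```

The part I don't know how to do with MarkItDown is producing the ordered blocks from a PDF and wiring the LLM in as the describer within one `convert` call.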