Describing Images Inline in PDFs for Better RAG

**Title:** Describing Images Inline in PDFs for Better RAG

I've been using LLMs to describe images like this, but its just support pass image directly:

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("example.jpg")
print(result.text_content)
```

However, in many cases, my PDF files contain both text and embedded images. I want to extract both and generate inline image descriptions in Markdown to improve retrieval quality in GenAI.

Is there a way to achieve something like this?

**Original PDF content:**

```
Lorem ipsum dolor sit amet...  
<-- image 1 - contains an orange cat --!>
```

**Expected Markdown output:**

```
Lorem ipsum dolor sit amet...

The image shows a fat orange cat sleeping on a white background.
```

How can I process both text and images together like this in a single conversion step?


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Describing Images Inline in PDFs for Better RAG #1346

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Describing Images Inline in PDFs for Better RAG #1346

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions