In [1]:
# %pip install retab

The `parse` method in Retab’s document processing pipeline **converts any document into cleaned, raw markdown text with page-by-page extraction**. 

This endpoint is well suited to extracting cleaned document content for use as context in downstream processing, such as RAG pipelines, custom ingestion pipelines, embeddings, classification, and content indexing workflows.

**For more information on Retab's `parse` method, see the [documentation](https://docs.retab.com/core-concepts/Parsing#parsing).**
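
Because the extraction is page by page, a common next step for RAG or indexing is to keep each page's markdown together with its page number. The sketch below is not part of the Retab SDK: `PageRecord` and `pages_to_records` are hypothetical helpers, and it assumes you already hold the per-page markdown as a plain list of strings (check the documentation for the exact response field that carries it).

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    """One indexable unit: the markdown of a single page (hypothetical helper type)."""
    source: str       # file name the page came from
    page_number: int  # 1-based page index
    text: str         # cleaned markdown for that page


def pages_to_records(pages: list[str], source: str) -> list[PageRecord]:
    """Turn per-page markdown strings into records, skipping blank pages.

    Assumption: `pages` is a plain list with one markdown string per page.
    """
    records = []
    for page_number, page_text in enumerate(pages, start=1):
        cleaned = page_text.strip()
        if not cleaned:
            continue  # nothing worth embedding or indexing on a blank page
        records.append(PageRecord(source=source, page_number=page_number, text=cleaned))
    return records


# Illustrative input only; real page texts would come from a parse response.
sample_pages = ["# Invoice\n\nTotal: 42 EUR", "   ", "Payment terms: 30 days"]
for record in pages_to_records(sample_pages, source="invoice.jpeg"):
    print(record.page_number, record.text.splitlines()[0])
```

Keeping the original page number on each record (rather than renumbering after blank pages are dropped) lets retrieved chunks be traced back to the right page of the source document.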

In [None]:
# Parse a Document

from dotenv import load_dotenv
from retab import Retab

load_dotenv() # You need to create a .env file containing your RETAB_API_KEY=sk_retab_***

client = Retab()

# Parse the document
response = client.documents.parse(
    document="../assets/docs/invoice.jpeg",
    model="gemini-2.5-flash",
    table_parsing_format="markdown",  # Better for RAG
    image_resolution_dpi=128          # Above the 72-96 DPI range used for standard documents
)

print(response.model_dump_json(indent=2))

{
  "document": {
    "filename": "invoice.jpeg",
    "url": "

## **Best Practices**

### **Model Selection**
* `gemini-2.5-pro`: Most accurate and robust model, recommended for complex or high-stakes document parsing tasks.
* `gemini-2.5-flash`: Best for speed and cost-effectiveness, suitable for most general-purpose documents.
* `gemini-2.5-flash-lite`: Fastest and most cost-efficient, ideal for simple documents or high-volume batch processing where maximum throughput is needed.

### **Image Quality Settings**
* Standard documents: `72-96 DPI`
* Technical documents: `150 DPI`
* Fine print/small text: `300+ DPI`
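
The recommendations above can be captured as a small preset table, so that each call to `parse` picks a model and DPI consistently. This is a minimal sketch: `PARSE_PRESETS` and `parse_kwargs` are hypothetical names, the pairing of a model with each document kind is an assumption made for illustration, and the DPI values are single points chosen from the ranges listed above.

```python
# Hypothetical presets; the model/DPI pairings are illustrative assumptions,
# not official Retab recommendations.
PARSE_PRESETS = {
    "standard": {"model": "gemini-2.5-flash", "image_resolution_dpi": 96},
    "technical": {"model": "gemini-2.5-pro", "image_resolution_dpi": 150},
    "fine_print": {"model": "gemini-2.5-pro", "image_resolution_dpi": 300},
    "bulk_simple": {"model": "gemini-2.5-flash-lite", "image_resolution_dpi": 72},
}


def parse_kwargs(kind: str) -> dict:
    """Return keyword arguments for a parse call for the given document kind."""
    try:
        # Return a copy so callers can tweak values without mutating the presets.
        return dict(PARSE_PRESETS[kind])
    except KeyError:
        known = ", ".join(sorted(PARSE_PRESETS))
        raise ValueError(f"Unknown document kind {kind!r}; expected one of: {known}") from None


print(parse_kwargs("technical"))

# Usage with the client from the example earlier in this notebook:
# response = client.documents.parse(
#     document="../assets/docs/invoice.jpeg",
#     **parse_kwargs("technical"),
# )
```

Raising on an unknown kind, instead of silently falling back to a default, makes a typo in the document kind visible before any request is sent.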