Parse PDFs, images, Word, and PowerPoint files into clean Markdown or JSON using SoMark — the document intelligence API built for AI workflows.
npx skills add https://github.com/SoMarkAI/somark-document-parser

Works with Claude Code, Cursor, Cline, OpenCode, and 40+ other agents.
When you share a document with your AI agent, SoMark parses it into structured Markdown or JSON that the agent can actually reason over — not just OCR'd text, but proper headings, tables, formulas, and layout.
Supported formats:
| Type | Formats |
|---|---|
| Documents | PDF, DOC, DOCX, PPT, PPTX |
| Images | PNG, JPG, JPEG, BMP, TIFF, WEBP, HEIC, HEIF, GIF |
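If you want to check a file before handing it to your agent, the table above can be turned into a small lookup. This is only an illustrative sketch: the `classify` helper and the two set names are hypothetical and are not part of the SoMark skill; only the extensions themselves come from the table.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported-formats table.
# The grouping into two sets is an illustrative choice, not a SoMark API.
DOCUMENT_EXTS = {".pdf", ".doc", ".docx", ".ppt", ".pptx"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".webp", ".heic", ".heif", ".gif"}


def classify(path: str) -> Optional[str]:
    """Return "document", "image", or None if the extension is not listed."""
    ext = Path(path).suffix.lower()  # case-insensitive: "REPORT.PDF" -> ".pdf"
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in IMAGE_EXTS:
        return "image"
    return None
```

For example, `classify("report.PDF")` returns `"document"`, while `classify("notes.txt")` returns `None`. The check looks only at the file name, so it cannot detect a file whose extension does not match its contents.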
Example triggers:
- "Parse this PDF for me"
- "Extract the key clauses from this contract"
- "Summarize the paper I just uploaded"
- "Convert this document to Markdown"
- "What does this image say?"
Get an API key at somark.tech, then set it as an environment variable:
export SOMARK_API_KEY=sk-your-api-key

Or add it to your agent's settings. The skill will guide you through setup on first use.
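If you script around the skill, it can help to fail early when the key is missing. The sketch below assumes only the `SOMARK_API_KEY` variable name shown above; the `load_api_key` helper is hypothetical and makes no call to SoMark.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read SOMARK_API_KEY from the environment, or raise if it is unset or blank."""
    env = os.environ if env is None else env  # allow a plain dict for testing
    key = env.get("SOMARK_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SOMARK_API_KEY is not set; export it or add it to your agent's settings"
        )
    return key
```

Calling `load_api_key()` with no arguments reads the real process environment. It checks only that a value is present, not that the key is valid.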
Free quota: SoMark offers a free tier. Visit the purchase page and follow the instructions there to claim it.
Most agents struggle with documents because plain text extracted from PDFs and images loses the original structure. SoMark preserves:
- Heading hierarchy — agents can understand document sections correctly
- Tables — fully reconstructed instead of flattened into plain text
- Formulas and diagrams — converted to LaTeX or described accurately
- Multi-column layouts — reading order is preserved
The result: your agent gives accurate, context-aware answers instead of hallucinating from garbled text.
Limits:
| Constraint | Limit |
|---|---|
| Max file size | 200 MB |
| Max pages | 300 pages |
| Requests per second (QPS) per account | 1 |
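The limits in the table above can be checked on the client side before a file is submitted. The following is a minimal sketch under stated assumptions: it treats 200 MB as 200,000,000 bytes (the table does not say whether MB is decimal or binary), it expects the caller to supply the page count, and `check_limits` and `Throttle` are hypothetical helpers rather than anything shipped with SoMark.

```python
import time
from typing import Callable, List, Optional

MAX_FILE_BYTES = 200 * 1000 * 1000  # assumption: decimal megabytes
MAX_PAGES = 300
MIN_INTERVAL_S = 1.0  # 1 QPS per account means at most one request per second


def check_limits(size_bytes: int, page_count: Optional[int] = None) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    if page_count is not None and page_count > MAX_PAGES:
        problems.append(f"file has {page_count} pages; limit is {MAX_PAGES}")
    return problems


class Throttle:
    """Space successive calls at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float = MIN_INTERVAL_S,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous call was released

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

A typical use is to call `check_limits(os.path.getsize(path))` once per file and `throttle.wait()` immediately before each request. The throttle works within a single process only, so several processes sharing one account would still need to coordinate.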
License: MIT