Skip to content

SoMarkAI/skills

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

63 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SoMark Skills

中文

Official SoMark skills collection for document parsing, image OCR, and intelligent extraction — built for AI agent workflows.

Install

npx skills add https://github.com/SoMarkAI/skills

Works with Claude Code, Cursor, Cline, OpenCode, and 40+ other agents.


Skills

Skill Description
somark-document-parser Parse PDFs, Word, PowerPoint, and images into structured Markdown, JSON
image-parser Core image OCR capability that returns text with precise coordinates (OCR + location awareness)
document-diff Compare two documents and generate a structured diff report showing changes, additions, and deletions
contract-reviewer Review contracts for risks, unfair clauses, missing provisions, and key obligations with severity ratings
resume-parser Parse resumes and CVs into structured JSON profiles with opinionated candidate assessment
tender-analyzer Extract qualification requirements, scoring criteria, deadlines, and submission checklists from procurement documents
paper-digest Deeply analyze academic papers into structured research cards covering methods, results, and contributions
financial-report-analyzer Extract financial metrics, risk signals, and management commentary from annual reports and earnings releases
pitch-screener Screen startup pitch decks from a VC/angel investor perspective — parses the deck, runs background research via web search, and produces a pre-meeting investment memo

What it does

When you share a document with your AI agent, SoMark parses it into structured Markdown, JSON that the agent can actually reason over — not just OCR'd text, but proper headings, tables, formulas, and layout.

The image-parser skill goes further: it returns every text block with its exact pixel coordinates on the original image, enabling field extraction, region location, and document automation.

Supported formats:

Type Formats
Documents PDF, DOC, DOCX, PPT, PPTX
Images PNG, JPG, JPEG, BMP, TIFF, WEBP, HEIC, HEIF, GIF

Example triggers:

  • "Parse this PDF for me"
  • "Extract the key clauses from this contract"
  • "Review this contract for risks"
  • "Parse this resume and give me a candidate assessment"
  • "Analyze this annual report"
  • "Summarize the paper I just uploaded"
  • "What changed between these two documents?"
  • "Analyze this tender document — what are the qualification requirements?"
  • "Convert this document to Markdown"
  • "What does this image say?"
  • "Extract all text with bounding boxes from this image"
  • "Find the invoice amount and its position on the page"

Setup

Get an API key at somark.tech, then set it as an environment variable:

export SOMARK_API_KEY=sk-your-api-key

Or add it to your agent's settings. The skill will guide you through setup on first use.

Free quota: SoMark offers a free tier. Visit the purchase page and follow the instructions there to claim it.


Why SoMark

Most agents struggle with documents because raw PDF/image data loses structure. SoMark preserves:

  • Heading hierarchy — agents can understand document sections correctly
  • Tables — fully reconstructed instead of flattened into plain text
  • Formulas and diagrams — converted to LaTeX or described accurately
  • Multi-column layouts — reading order is preserved

The result: your agent gives accurate, context-aware answers instead of hallucinating from garbled text.


Limits

Constraint Limit
Max file size 200 MB
Max pages 300 pages
QPS per account 1

License

MIT

About

AI skill for parsing PDF, images, Word & PPT into Markdown/JSON via SoMark API. Works with Claude Code, Cursor, Cline and 40+ agents.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages