A Ruby gem for converting PDF documents to Markdown using a locally-hosted vision LLM (OCR via AI).
Pages are rendered as high-resolution PNG images and then sent to an OpenAI-compatible API endpoint for text extraction.
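To make the second step concrete, here is a minimal sketch of the kind of request body an OpenAI-compatible vision endpoint (conventionally `POST /v1/chat/completions`) accepts for a single page image. The helper name `build_ocr_request` and the prompt text are assumptions made for illustration; the gem's actual request may differ.

```ruby
require "json"

# Build an OpenAI-style chat completion body for one rendered page.
# NOTE: helper name and prompt are illustrative assumptions, not the gem's API.
def build_ocr_request(png_bytes, model:, prompt: "Convert this page to Markdown.")
  # pack("m0") yields strict Base64 with no line breaks
  encoded = [png_bytes].pack("m0")
  {
    model: model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: "data:image/png;base64,#{encoded}" } }
        ]
      }
    ]
  }
end

# Four bytes stand in for a real PNG file here.
body = build_ocr_request("\x89PNG".b, model: "deepseek-ai/DeepSeek-OCR-2")
json = JSON.generate(body)
puts json
```

The image travels inline as a `data:` URL, so the server needs no access to the local file system.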
- Ruby >= 3.1
- poppler-utils (`pdftoppm`, `pdfinfo`)
- OpenAI-compatible vision LLM server (e.g. vLLM, Ollama, llama.cpp)
Install poppler on Debian/Ubuntu:
sudo apt install poppler-utils

You can get the DeepSeek-OCR-2 model (`deepseek-ai/DeepSeek-OCR-2`) on Hugging Face.
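If you want to confirm that the poppler tools are reachable before converting anything, a small check along these lines may help. This is a sketch only; `missing_tools` is a hypothetical helper and not part of the gem.

```ruby
# Return the names from `tools` that are not found as executables in `path`.
# NOTE: hypothetical helper for illustration; the gem may check differently.
def missing_tools(tools, path: ENV.fetch("PATH", ""))
  dirs = path.split(File::PATH_SEPARATOR).reject(&:empty?)
  tools.reject do |tool|
    dirs.any? do |dir|
      candidate = File.join(dir, tool)
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

missing = missing_tools(%w[pdftoppm pdfinfo])
if missing.empty?
  puts "poppler tools found"
else
  puts "missing: #{missing.join(', ')}"
end
```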
Add to your Gemfile:
gem 'pdf2markdownOCR'

Then run:

bundle install

Or install directly:

gem install pdf2markdownOCR

Configuration can be set via a configuration block or via environment variables. The block takes priority. These are the options and their default values:
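The precedence rule (block value first, then environment, then default) can be sketched as follows. The environment variable names below are hypothetical, since the gem's real names are not listed here, and `resolve_setting` is an illustrative helper rather than the gem's code.

```ruby
# Resolve one setting: block value wins, then the environment, then the default.
# NOTE: the PDF2MD_* variable names are made up for this sketch.
def resolve_setting(key, block_values, env = ENV)
  env_names = { llm_api_url: "PDF2MD_LLM_API_URL", llm_model: "PDF2MD_LLM_MODEL" }
  defaults  = { llm_api_url: "http://localhost:8000", llm_model: "deepseek-ai/DeepSeek-OCR-2" }

  block_values.fetch(key) do
    env.fetch(env_names.fetch(key)) { defaults.fetch(key) }
  end
end

env = { "PDF2MD_LLM_API_URL" => "http://localhost:9800" }

puts resolve_setting(:llm_api_url, {}, env)
# → http://localhost:9800 (from the environment)
puts resolve_setting(:llm_api_url, { llm_api_url: "http://localhost:11434" }, env)
# → http://localhost:11434 (the block value takes priority)
puts resolve_setting(:llm_model, {}, env)
# → deepseek-ai/DeepSeek-OCR-2 (the default)
```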
require 'pdf2markdownOCR'
Pdf2MarkdownOCR.configure do |config|
# URL of your OpenAI-compatible LLM server
config.llm_api_url = "http://localhost:8000"
# Model name to request from the server
config.llm_model = "deepseek-ai/DeepSeek-OCR-2"
# PNG resolution used when rasterising PDF pages (higher = better OCR, slower)
config.png_dpi_resolution = 300
# Conversion mode: :single_thread or :multi_thread
# :multi_thread converts all pages to pngs in parallel threads
config.mode = :multi_thread
# The gem logs through Ruby's stdlib `Logger`, writing to `$stdout` by default. You can provide your own instance.
# To silence it completely, pass Logger.new(File::NULL).
config.logger = Logger.new($stdout).tap do |log|
log.progname = "Pdf2MarkdownOCR"
end
end

require 'pdf2markdownOCR'
markdown = Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf")
puts markdown

Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf", output_file: "output.md")
# => nil (content written to output.md)

# Converts pages 1, 2, 5, 6 and 7
Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf", output_file: "output.md", pages: "1,2,5-7")

After installation, the `pdf2markdownocr` executable is available on your PATH. Its options are the same as in the configuration block.
Usage: pdf2markdownocr [options] <pdf_path>
Converts a PDF file to Markdown using OCR.
Options:
-o, --output FILE Output Markdown file
--llm-api-url OpenAI compatible server URL
--llm-model MODEL
--mode Processing mode: single_thread or multi_thread
--png-dpi DPI resolution for PNG conversion
--pages Page range
-h, --help Show help message
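The page range accepted by `--pages` (and by the `pages:` argument) uses specs such as `"1,2,5-7"`. A minimal sketch of how such a spec expands into page numbers is shown below. `expand_pages` is a hypothetical helper, and dropping duplicates and sorting are choices of this sketch, not confirmed behaviour of the gem.

```ruby
# Expand a spec like "1,2,5-7" into a sorted list of page numbers.
# NOTE: hypothetical helper; the gem's own parser may behave differently.
def expand_pages(spec)
  spec.split(",").flat_map do |part|
    bounds = part.strip.split("-", 2).map { |n| Integer(n, 10) }
    first = bounds[0]
    last  = bounds[1] || first
    if first.nil? || first < 1 || last < first
      raise ArgumentError, "invalid page range: #{part.inspect}"
    end
    (first..last).to_a
  end.uniq.sort
end

p expand_pages("1,2,5-7")
# → [1, 2, 5, 6, 7]
```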
# Basic conversion (output saved to output.md)
pdf2markdownocr document.pdf
# Custom output file
pdf2markdownocr document.pdf -o result.md
# Custom LLM server and model
pdf2markdownocr document.pdf -o result.md --llm-api-url http://localhost:9800 --llm-model deepseek-ai/DeepSeek-OCR
# Print version
pdf2markdownocr --version

Ollama is easy to try, but it is not recommended because performance isn't great: it doesn't process the requests in parallel.

Pull and run the model:

ollama pull deepseek-ocr:latest
ollama run deepseek-ocr:latest

Then call the tool with the correct port and model:
pdf2markdownocr document.pdf -o result.md --llm-api-url http://localhost:11434 --llm-model deepseek-ocr:latest
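To illustrate why a server that handles requests one at a time limits throughput, here is a sketch of fanning per-page work out over threads and collecting the results in page order. It is not the gem's implementation: `process_pages_in_threads` is a hypothetical helper, and the block stands in for whatever per-page work (rendering, an HTTP request) is done. If the server serialises requests, the threads simply wait on each other.

```ruby
# Run the block once per page, each in its own thread,
# and return the results in the original page order.
# NOTE: hypothetical helper, shown only to illustrate the idea.
def process_pages_in_threads(pages, &work)
  threads = pages.map do |page|
    Thread.new(page) { |p| work.call(p) }
  end
  # Thread#value joins the thread and re-raises any error it hit
  threads.map(&:value)
end

results = process_pages_in_threads([1, 2, 3]) do |page|
  sleep(0.01 * (4 - page)) # later pages finish first, order is still kept
  "page #{page}"
end
p results
# → ["page 1", "page 2", "page 3"]
```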
- Install uv, torch and vllm

uv venv
source .venv/bin/activate

I've had problems with my GPU when using the default vllm install, and I find that installing torch and torchvision separately helps. Install PyTorch (the index URL depends on the CUDA version installed on your system):

uv run pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu132

Install vllm:

uv pip install -U vllm --torch-backend auto

Then run the model:

uv run vllm serve deepseek-ai/DeepSeek-OCR-2 --logits_processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor --no-enable-prefix-caching --mm-processor-cache-gb 0

License: MIT