pdf2markdownOCR

A Ruby gem for converting PDF documents to Markdown using a locally-hosted vision LLM (OCR via AI).
Pages are rendered as high-resolution PNG images and then sent to an OpenAI-compatible API endpoint for text extraction.
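
As a rough illustration of the second step, the sketch below builds the kind of request body that OpenAI-compatible chat completion endpoints accept for an image. The helper names `build_ocr_request` and `request_markdown`, the prompt text, and the `/v1/chat/completions` path are assumptions made for this example; the gem's internals may differ.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: wraps raw PNG bytes in an OpenAI-style chat payload.
def build_ocr_request(png_bytes, model:, prompt: "Convert this page to Markdown.")
  # pack("m0") produces strict Base64 with no line breaks, suitable for a data URL.
  encoded = [png_bytes].pack("m0")
  {
    model: model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: "data:image/png;base64,#{encoded}" } }
        ]
      }
    ]
  }
end

# Hypothetical helper: posts the payload and returns the text of the first choice.
def request_markdown(api_url, payload)
  uri = URI.join(api_url, "/v1/chat/completions")
  response = Net::HTTP.post(uri, JSON.generate(payload), "Content-Type" => "application/json")
  JSON.parse(response.body).dig("choices", 0, "message", "content")
end

# Build (but do not send) a request for a placeholder image.
payload = build_ocr_request("\x89PNG".b, model: "deepseek-ai/DeepSeek-OCR-2")
puts JSON.pretty_generate(payload)
```

With a server running, `request_markdown("http://localhost:8000", payload)` would return the model's text for that page.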

Requirements

Ruby >= 3.1
poppler-utils (pdftoppm, pdfinfo)
OpenAI-compatible vision LLM server (e.g. vLLM, Ollama, llama.cpp)

Install poppler on Debian/Ubuntu:

sudo apt install poppler-utils

You can get DeepSeek's OCR-2 model on Hugging Face.

Installation

Add to your Gemfile:

gem 'pdf2markdownOCR'

Then run:

bundle install

Or install directly:

gem install pdf2markdownOCR

Configuration

Configuration can be set via a block or via environment variables. The block takes priority.
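
The precedence rule can be pictured with a small stand-alone sketch. This README does not list the environment variable names, so `PDF2MD_LLM_API_URL` and `PDF2MD_LLM_MODEL` below are invented for illustration, as are `SketchConfig` and `sketch_configure`; check the gem's source for the real names.

```ruby
# Hypothetical configuration object; the environment variable names are made up.
class SketchConfig
  attr_accessor :llm_api_url, :llm_model

  def initialize(env = ENV)
    # Environment values are read first, falling back to the documented defaults.
    @llm_api_url = env.fetch("PDF2MD_LLM_API_URL", "http://localhost:8000")
    @llm_model = env.fetch("PDF2MD_LLM_MODEL", "deepseek-ai/DeepSeek-OCR-2")
  end
end

# Values assigned inside the block overwrite whatever came from the environment.
def sketch_configure(env = ENV)
  config = SketchConfig.new(env)
  yield config if block_given?
  config
end

env = { "PDF2MD_LLM_API_URL" => "http://localhost:11434" }
from_env = sketch_configure(env)
from_block = sketch_configure(env) { |c| c.llm_api_url = "http://localhost:9800" }

puts from_env.llm_api_url   # http://localhost:11434
puts from_block.llm_api_url # http://localhost:9800
```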

Via configure block

You can configure the gem using a configuration block. These are the options and their default values:

require 'pdf2markdownOCR'

Pdf2MarkdownOCR.configure do |config|
  # URL of your OpenAI-compatible LLM server
  config.llm_api_url = "http://localhost:8000"

  # Model name to request from the server
  config.llm_model = "deepseek-ai/DeepSeek-OCR-2"

  # PNG resolution used when rasterising PDF pages (higher = better OCR, slower)
  config.png_dpi_resolution = 300

  # Conversion mode: :single_thread or :multi_thread
  # :multi_thread converts all pages to PNGs in parallel threads
  config.mode = :multi_thread

  # The gem uses Ruby's stdlib `Logger` writing to `$stdout`. You can provide
  # your own instance. To silence it completely, pass Logger.new("/dev/null").
  config.logger = Logger.new($stdout).tap do |log|
    log.progname = "Pdf2MarkdownOCR"
  end
end
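
To make the `:multi_thread` option more concrete, here is a minimal sketch of rendering pages in parallel with `pdftoppm`, one thread per page. All helper names are hypothetical and this is not necessarily how the gem schedules its work; it only shows the general technique with the poppler tools listed under Requirements.

```ruby
require "open3"

# Extracts the page count from `pdfinfo` output (a line such as "Pages:   12").
def page_count_from_pdfinfo(output)
  match = output.match(/^Pages:\s+(\d+)/)
  match ? match[1].to_i : nil
end

# Builds a pdftoppm command that renders a single page to "<prefix>.png".
def pdftoppm_command(pdf_path, page, dpi:, prefix:)
  ["pdftoppm", "-png", "-r", dpi.to_s, "-f", page.to_s, "-l", page.to_s,
   "-singlefile", pdf_path, prefix]
end

# Runs the block for each page in its own thread; results come back in page order.
def each_page_in_threads(pages)
  pages.map { |page| Thread.new { yield page } }.map(&:value)
end

# Renders the given pages and returns the paths of the generated PNG files.
def render_pages(pdf_path, pages, dpi: 300, out_dir: ".")
  each_page_in_threads(pages) do |page|
    prefix = File.join(out_dir, format("page-%04d", page))
    command = pdftoppm_command(pdf_path, page, dpi: dpi, prefix: prefix)
    _out, err, status = Open3.capture3(*command)
    raise "pdftoppm failed on page #{page}: #{err}" unless status.success?
    "#{prefix}.png"
  end
end

p page_count_from_pdfinfo("Title: example\nPages:          3\n")
p pdftoppm_command("document.pdf", 2, dpi: 300, prefix: "page-0002")
```

Calling `render_pages("document.pdf", [1, 2, 3])` would spawn three `pdftoppm` processes at once. For large documents a real implementation would probably cap the number of concurrent threads.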

Usage as a library

Convert a PDF and get Markdown as a string

require 'pdf2markdownOCR'

markdown = Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf")
puts markdown

Convert a PDF and write directly to a file

Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf", output_file: "output.md")
# => nil  (content written to output.md)

Convert a specific page range

# Converts pages 1, 2, 5, 6, and 7
Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf", output_file: "output.md", pages: "1,2,5-7")
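
For readers wondering how a string such as "1,2,5-7" maps to page numbers, the following is a minimal sketch of one way to expand it. `parse_pages` is a hypothetical helper written for this README, not the gem's own parser, and details such as duplicate handling or error messages may differ.

```ruby
# Hypothetical helper: expands a page specification into a sorted list of pages.
def parse_pages(spec)
  spec.split(",").flat_map do |part|
    part = part.strip
    case part
    when /\A(\d+)-(\d+)\z/
      # A range such as "5-7" expands to every page between its bounds.
      first, last = $1.to_i, $2.to_i
      raise ArgumentError, "invalid range: #{part}" if first > last
      (first..last).to_a
    when /\A\d+\z/
      [part.to_i]
    else
      raise ArgumentError, "invalid page spec: #{part}"
    end
  end.uniq.sort
end

p parse_pages("1,2,5-7") # => [1, 2, 5, 6, 7]
```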

Usage as a CLI

After installation, the pdf2markdownocr executable is available on your PATH. The options mirror those in the configuration block.

Usage: pdf2markdownocr [options] <pdf_path>

Converts a PDF file to Markdown using OCR.

Options:
  -o, --output FILE    Output Markdown file
      --llm-api-url    OpenAI-compatible server URL
      --llm-model MODEL
      --mode           Processing mode: single_thread or multi_thread
      --png-dpi        DPI resolution for PNG conversion
      --pages          Page range
  -h, --help           Show help message
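
For anyone wrapping the tool or building a similar CLI, the options above map naturally onto Ruby's stdlib `OptionParser`. The sketch below is an assumption about how such a parser could look; `parse_cli`, the argument placeholders (URL, MODE, DPI, PAGES), and the option hash keys are chosen for illustration and are not taken from the gem's source.

```ruby
require "optparse"

# Hypothetical parser mirroring the listed options.
# Returns [options_hash, remaining_positional_arguments].
def parse_cli(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: pdf2markdownocr [options] <pdf_path>"
    opts.on("-o", "--output FILE", "Output Markdown file") { |v| options[:output_file] = v }
    opts.on("--llm-api-url URL", "OpenAI-compatible server URL") { |v| options[:llm_api_url] = v }
    opts.on("--llm-model MODEL", "Model name") { |v| options[:llm_model] = v }
    # Restrict --mode to the two documented values and store it as a symbol.
    opts.on("--mode MODE", %w[single_thread multi_thread], "Processing mode") do |v|
      options[:mode] = v.to_sym
    end
    # Integer makes OptionParser reject non-numeric DPI values.
    opts.on("--png-dpi DPI", Integer, "DPI resolution for PNG conversion") do |v|
      options[:png_dpi_resolution] = v
    end
    opts.on("--pages PAGES", "Page range, e.g. 1,2,5-7") { |v| options[:pages] = v }
  end
  rest = parser.parse(argv)
  [options, rest]
end

options, rest = parse_cli(%w[document.pdf -o result.md --png-dpi 200])
p options
p rest
```

`OptionParser` supplies `-h, --help` on its own, which is why the sketch does not define it.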

Examples

# Basic conversion (output saved to output.md)
pdf2markdownocr document.pdf

# Custom output file
pdf2markdownocr document.pdf -o result.md

# Custom LLM server and model
pdf2markdownocr document.pdf -o result.md --llm-api-url http://localhost:9800 --llm-model deepseek-ai/DeepSeek-OCR

# Print version
pdf2markdownocr --version

Running the models

Ollama setup

Easy to try, but not recommended: performance is poor because Ollama does not process the requests in parallel.

Pull and run the model:

ollama pull deepseek-ocr:latest
ollama run deepseek-ocr:latest

Then call the tool with the correct port and model

pdf2markdownocr document.pdf -o result.md --llm-api-url http://localhost:11434 --llm-model deepseek-ocr:latest

vLLM

Official vLLM Guide

Install uv, then torch and vLLM. Start by creating and activating a virtual environment:

uv venv
source .venv/bin/activate

I've had problems with my GPU when using the default vLLM install, and I find that installing torch and torchvision separately helps. Install PyTorch:

uv run pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu132 # depends on the CUDA version installed on your system

Install vLLM:

uv pip install -U vllm --torch-backend auto

Then run the model

uv run vllm serve deepseek-ai/DeepSeek-OCR-2 --logits_processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor --no-enable-prefix-caching --mm-processor-cache-gb 0

License

MIT
