Add Markdown export pipeline (RailReader.Export) by sjvrensburg · Pull Request #89 · sjvrensburg/railreader2

sjvrensburg · 2026-04-12T09:11:39Z

Summary

- New `RailReader.Export` library with `IMarkdownExportService` interface in Core and `MarkdownExportService` implementation: structured PDF-to-Markdown export using layout analysis, VLM transcription, heading resolution (outline fuzzy-match), and annotation blockquotes
- New `railreader2-cli export` command with graceful degradation: ONNX+VLM → ONNX-only → plain text fallback
- Shared helpers extracted to Core: `VlmService.GetBlockAction`, `VlmEndpointConfig.FromAppConfigWithOverrides`, `LayoutConstants.GetClassName`
- 32 new tests in `RailReader.Export.Tests`, 0 regressions in the existing 193 Core tests
- Documentation updated across CLAUDE.md, README.md, the user guide, and the website
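
The degradation chain in the summary (ONNX+VLM → ONNX-only → plain text) can be pictured as a small tier-selection function. The following is only a sketch in Python, not the C# code from this PR: `ExportMode`, `ExportEnvironment`, and `select_export_mode` are hypothetical names, and the rule that a missing layout model forces the plain-text tier even when a VLM endpoint is configured is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExportMode(Enum):
    """Hypothetical names for the three tiers listed in the summary."""
    ONNX_AND_VLM = "onnx+vlm"
    ONNX_ONLY = "onnx-only"
    PLAIN_TEXT = "plain-text"


@dataclass
class ExportEnvironment:
    """Illustrative inputs; the real service may look at different signals."""
    layout_model_available: bool      # can the ONNX layout model be loaded?
    vlm_endpoint: Optional[str]       # None means no VLM endpoint is configured
    no_vlm: bool = False              # mirrors the --no-vlm flag from the test plan


def select_export_mode(env: ExportEnvironment) -> ExportMode:
    """Pick the richest tier the environment supports."""
    # Assumption: without layout analysis there are no blocks to send to a VLM,
    # so the plain-text fallback applies regardless of the endpoint.
    if not env.layout_model_available:
        return ExportMode.PLAIN_TEXT
    # VLM availability is derived from the endpoint being present,
    # rather than from a separate boolean.
    if env.vlm_endpoint is None or env.no_vlm:
        return ExportMode.ONNX_ONLY
    return ExportMode.ONNX_AND_VLM
```

Deriving VLM availability from whether the endpoint is present keeps a single source of truth, which matches the cleanup described in the commit notes (dropping a separate availability flag).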

Test plan

- `dotnet build RailReader2.slnx -c Release`: 0 warnings, 0 errors
- `dotnet test tests/RailReader.Export.Tests`: 32/32 pass
- `dotnet test tests/RailReader.Core.Tests`: 193/193 pass
- `railreader2-cli export --help` displays correctly
- `railreader2-cli export <pdf> --no-vlm --output plain.md`: verify heading hierarchy, [equation] placeholders, annotation blockquotes
- `railreader2-cli export <pdf> --pages 50-52 --endpoint ... --model ... --output rich.md`: verify LaTeX equations, pipe tables, figure descriptions

🤖 Generated with Claude Code

New library providing structured PDF-to-Markdown export using layout analysis, VLM transcription, and annotation extraction. Includes CLI `export` command with graceful degradation (ONNX+VLM → ONNX-only → plain text fallback).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds the new `export` CLI command and RailReader.Export library to all documentation surfaces: architecture diagrams, feature lists, CLI reference sections, and the guide.html website page.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Extract VlmService.GetBlockAction and VlmEndpointConfig.FromAppConfigWithOverrides to Core, replacing duplicated logic in Export + VlmCommand + ExportCommand
- Add LayoutConstants.GetClassName helper, replacing scattered bounds-check patterns
- Cache ExtractBlockText results per block to avoid O(blocks * chars) repeated scans
- Flatten PDF outline once per document instead of per page
- Unify AppendAnnotations/AppendAnnotationsWithText into a single method with optional PageText, which enables rich highlight extraction in both layout and plain-text paths
- Remove dead annotations parameter from PageMarkdownBuilder.Build
- Remove redundant vlmAvailable bool (derive from vlmEndpoint nullness)
- Remove double VLM endpoint resolution (ExportCommand resolves, service trusts it)
- Strip unnecessary WHAT comments, keep WHY comments
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
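
Two of the refactors above, caching per-block text and flattening the outline once per document, follow a common "compute once, reuse" pattern. The sketch below illustrates that pattern in Python; it is not the C# code from this commit. `BlockTextCache`, `flatten_outline`, and the dictionary-based outline shape are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, Hashable, List, Tuple


class BlockTextCache:
    """Memoizes an expensive per-block text extraction (hypothetical helper)."""

    def __init__(self, extract: Callable[[Hashable], str]) -> None:
        self._extract = extract
        self._cache: Dict[Hashable, str] = {}

    def get(self, block_id: Hashable) -> str:
        # Scan the page characters only on the first request for a block;
        # later requests reuse the stored string.
        if block_id not in self._cache:
            self._cache[block_id] = self._extract(block_id)
        return self._cache[block_id]


def flatten_outline(nodes: List[dict], depth: int = 1) -> List[Tuple[int, str]]:
    """Flatten a nested outline into (depth, title) pairs in document order.

    Assumed node shape: {"title": str, "children": [node, ...]}.
    Intended to be called once per document and shared across pages.
    """
    flat: List[Tuple[int, str]] = []
    for node in nodes:
        flat.append((depth, node["title"]))
        flat.extend(flatten_outline(node.get("children", []), depth + 1))
    return flat
```

With the flattened list built once up front, each page's heading resolution can search the same list instead of re-walking the outline tree.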

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sjvrensburg and others added 4 commits April 12, 2026 10:49

Bump version to 3.7.0.0

fe625a1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sjvrensburg merged commit 3b4141e into main Apr 12, 2026

sjvrensburg deleted the feature/markdown-export branch April 12, 2026 09:11

sjvrensburg merged 4 commits into main from feature/markdown-export
