Releases
v0.2.0
New OCR & Embedding Providers
Added
- DatalabOCR: New OCR processor using the Datalab API with the marker model
  - Supports three processing modes: fast, balanced, and accurate
  - Page range filtering and a maximum page limit
  - Image extraction with optional captions
  - Cost tracking per page processed
- OpenAIEmbedder: New embedder using OpenAI's embedding models
  - Support for text-embedding-3-small (1536 dimensions)
  - Support for text-embedding-3-large (3072 dimensions)
  - Normalized vector embeddings with L2 norm ≈ 1
  - Token usage tracking for embedding operations
- Comprehensive integration tests for both new components
  - Regular functionality tests
  - Behavior tests ensuring embedding quality and OCR accuracy
  - Validation of embedding dimensions, normalization, and similarity properties
- Updated README with examples for DatalabOCR and OpenAIEmbedder
  - Added section on using alternative OCR and embedding providers
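
The embedding properties listed above (dimension, L2 norm ≈ 1, similarity) can be checked with a few lines of plain Python. The following is a minimal sketch, not the project's test suite: the helper names `l2_norm`, `cosine_similarity`, and `check_embedding` and the tolerance value are assumptions made for illustration, and a synthetic vector stands in for a real `OpenAIEmbedder` result so the example runs without an API key.

```python
import math

# Dimensions stated in these release notes for each supported model.
EXPECTED_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def l2_norm(vec):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


def check_embedding(vec, model, tol=1e-3):
    """Hypothetical helper: assert the vector has the expected size and unit length."""
    expected = EXPECTED_DIMS[model]
    assert len(vec) == expected, f"expected {expected} dimensions, got {len(vec)}"
    norm = l2_norm(vec)
    assert abs(norm - 1.0) <= tol, f"L2 norm {norm:.6f} is not within {tol} of 1"
    return True


# Synthetic stand-in for a real embedding: a constant vector scaled to unit length.
raw = [1.0] * 1536
scale = l2_norm(raw)
unit = [x / scale for x in raw]

check_embedding(unit, "text-embedding-3-small")
```

Because the vectors are unit length, cosine similarity reduces to a plain dot product, which is why a norm check is a useful precondition for similarity tests.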