feat: add vision embedding indexing pipeline #516
Merged
philwinder merged 3 commits into main on Apr 6, 2026
Switch from build-tag-gated CGO pdfium to an always-compiled WebAssembly backend using Wazero. This eliminates CGO dependencies, system library requirements, and build tag complexity. The WASM binary is embedded in the go-pdfium module.

- Remove pdfium build tag and stub file
- Use webassembly.Init instead of single_threaded.Init
- Fix deferred close calls to handle return values
- Add test asserting pdfium WASM is available
- Tighten integration test assertions (no longer conditional on build tag)

Assisted by AI.
Co-Authored-By: Helix <noreply@helix.ml>
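The "fix deferred close calls to handle return values" item can be illustrated with a minimal sketch. It does not use the go-pdfium API: `fakeInstance`, `render`, and `errPoolBusy` are hypothetical names introduced here only to show one common way of surfacing an error returned by a deferred `Close` instead of silently dropping it.

```go
package main

import (
	"errors"
	"fmt"
)

// errPoolBusy is a hypothetical close failure used for the demonstration.
var errPoolBusy = errors.New("pool busy")

// fakeInstance stands in for a closable resource such as a PDF renderer
// instance. It is an assumption for illustration, not a go-pdfium type.
type fakeInstance struct {
	closeErr error
	closed   bool
}

// Close marks the instance closed and returns the configured error, if any.
func (f *fakeInstance) Close() error {
	f.closed = true
	return f.closeErr
}

// render runs work against the instance and always closes it afterwards.
// The named return value lets the deferred function report a Close failure.
func render(inst *fakeInstance, work func() error) (err error) {
	defer func() {
		if cerr := inst.Close(); cerr != nil {
			// errors.Join keeps the work error and the close error together.
			err = errors.Join(err, fmt.Errorf("close instance: %w", cerr))
		}
	}()
	return work()
}

func main() {
	inst := &fakeInstance{closeErr: errPoolBusy}
	err := render(inst, func() error { return nil })
	fmt.Println(err) // prints "close instance: pool busy"
}
```

A bare `defer inst.Close()` would compile but discard the returned error; the named-return pattern above is one way to keep it, at the cost of a slightly longer function.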
Add CreatePageImageEmbeddings pipeline step that rasterizes page-image enrichments from PDFs and stores their vision embeddings using the SigLIP2 model in a dedicated vector store.

- New CreatePageImageEmbeddings handler with batch processing
- Add OperationCreatePageImageEmbeddingsForCommit operation
- Add WithVision flag to PrescribedOperations
- Add TaskNameVision for separate vision embedding store
- Wire vision model, embedding store, and handler in kodit.go
- Add visionEmbeddingStore to Enrichment service for cascade deletes

Assisted by AI.
Co-Authored-By: Helix <noreply@helix.ml>
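The batch rasterize, embed, store flow described in this commit might look roughly like the sketch below. Every type and function here (`PageImage`, `Rasterizer`, `Embedder`, `VectorStore`, `embedInBatches`) is a hypothetical stand-in, not the handler's actual code; the real handler presumably calls pdfium and the SigLIP2 model where this sketch uses fakes.

```go
package main

import "fmt"

// PageImage identifies one page-image enrichment (hypothetical shape).
type PageImage struct {
	EnrichmentID string
	Page         int
}

// Rasterizer renders one page to image bytes; Embedder embeds a whole batch.
type Rasterizer func(PageImage) ([]byte, error)
type Embedder func(images [][]byte) ([][]float32, error)

// VectorStore is an in-memory stand-in for a dedicated vision vector store.
type VectorStore struct{ vectors map[string][]float32 }

func NewVectorStore() *VectorStore {
	return &VectorStore{vectors: map[string][]float32{}}
}

func (s *VectorStore) Save(id string, v []float32) { s.vectors[id] = v }

// embedInBatches rasterizes pages in fixed-size batches, embeds each batch
// with a single model call, and stores one vector per enrichment.
func embedInBatches(pages []PageImage, batchSize int, rasterize Rasterizer, embed Embedder, store *VectorStore) error {
	if batchSize <= 0 {
		return fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	for start := 0; start < len(pages); start += batchSize {
		end := start + batchSize
		if end > len(pages) {
			end = len(pages)
		}
		batch := pages[start:end]
		images := make([][]byte, 0, len(batch))
		for _, p := range batch {
			img, err := rasterize(p)
			if err != nil {
				return fmt.Errorf("rasterize %s: %w", p.EnrichmentID, err)
			}
			images = append(images, img)
		}
		vecs, err := embed(images)
		if err != nil {
			return fmt.Errorf("embed batch at %d: %w", start, err)
		}
		if len(vecs) != len(batch) {
			return fmt.Errorf("got %d vectors for %d images", len(vecs), len(batch))
		}
		for i, p := range batch {
			store.Save(p.EnrichmentID, vecs[i])
		}
	}
	return nil
}

func main() {
	pages := []PageImage{
		{EnrichmentID: "enr-1", Page: 1},
		{EnrichmentID: "enr-2", Page: 2},
		{EnrichmentID: "enr-3", Page: 3},
	}
	// Fake rasterizer and embedder so the sketch runs without pdfium or a model.
	rasterize := func(p PageImage) ([]byte, error) { return make([]byte, p.Page), nil }
	embed := func(images [][]byte) ([][]float32, error) {
		out := make([][]float32, len(images))
		for i, img := range images {
			out[i] = []float32{float32(len(img))}
		}
		return out, nil
	}
	store := NewVectorStore()
	if err := embedInBatches(pages, 2, rasterize, embed, store); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(store.vectors)) // prints 3
}
```

Batching keeps the number of model calls proportional to the number of batches rather than the number of pages; whether the real handler aborts on the first failed page or skips it is not stated in the PR, so the fail-fast behavior here is an assumption.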
Go Test Coverage
Total coverage: 32.3%
Summary
- CreatePageImageEmbeddings pipeline step that rasterizes page-image enrichments and stores vision embeddings using SigLIP2
- Separate vision embedding store (TaskNameVision) to avoid mixing with code/text embeddings
- vision flag on PrescribedOperations gates the pipeline step (similar to the examples feature flag)

Changes
- application/handler/indexing/create_page_image_embeddings.go: New handler with batch rasterize → embed → store logic
- domain/task/operation.go: New operation, WithVision flag on PrescribedOperations
- infrastructure/persistence/embedding_store.go: TaskNameVision
- handlers.go: Register vision embedding handler when vision model is available
- kodit.go: Wire vision model init, vision embedding store, prescribedOps
- application/service/enrichment.go: Add visionEmbeddingStore for cascade deletes

Test plan
- make build succeeds
- make test PKG=./application/handler/indexing/... passes
- make test PKG=./application/service/... passes

Depends on: #515
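As a rough illustration of how a vision flag could gate the new pipeline step, here is a minimal sketch. The names `PrescribedOperations`, `WithVision`, and `OperationCreatePageImageEmbeddingsForCommit` come from this PR, but their signatures, the string values, the `ForCommit` method, and the `OperationCreateCodeEmbeddingsForCommit` constant are assumptions made up for the example.

```go
package main

import "fmt"

// Operation names a pipeline step. The string values are hypothetical.
type Operation string

const (
	// Assumed placeholder for a step that always runs.
	OperationCreateCodeEmbeddingsForCommit Operation = "create_code_embeddings_for_commit"
	// Named in the PR; the value is an assumption.
	OperationCreatePageImageEmbeddingsForCommit Operation = "create_page_image_embeddings_for_commit"
)

// PrescribedOperations decides which steps run; this shape is an assumption.
type PrescribedOperations struct {
	vision bool
}

// WithVision returns a copy with the vision step enabled or disabled.
func (p PrescribedOperations) WithVision(enabled bool) PrescribedOperations {
	p.vision = enabled
	return p
}

// ForCommit lists the steps to run, adding the vision step only when enabled.
func (p PrescribedOperations) ForCommit() []Operation {
	ops := []Operation{OperationCreateCodeEmbeddingsForCommit}
	if p.vision {
		ops = append(ops, OperationCreatePageImageEmbeddingsForCommit)
	}
	return ops
}

func main() {
	base := PrescribedOperations{}
	fmt.Println(len(base.ForCommit()))                  // prints 1
	fmt.Println(len(base.WithVision(true).ForCommit())) // prints 2
}
```

Using a value receiver means `WithVision` never mutates the caller's configuration, which keeps a default set of operations safe to share.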
Assisted by AI.
Co-Authored-By: Helix <noreply@helix.ml>