[lore 3/7] Embedder + BGE int8 impl #3
Merged
Adds pkg/lore/embed with the batch-oriented Embedder interface
(Embed/Dimensions/Close), and pkg/lore/embed/bge with a reference
implementation backed by the bundled int8-quantized BGE-small model.
- Embed interface is batch-only by design; single callers wrap with []string{s}
- BGE impl: model.onnx + vocab.txt embedded via go:embed; extracted to
os.TempDir() at first New() call so ONNX Runtime can load from disk
- Library probed from system install (LORE_ONNXRUNTIME_LIB override or
default Homebrew/system paths); ErrUnsupported on missing lib
- OTel span on Embed (lore.embed.encode) with texts.count + dimensions attrs
- slog.Default() if no WithLogger; ErrInvalidArgument for empty inputs
- WordPiece tokenizer ported from guild (pure Go, bit-parity with HF BERT)
- Tests: Dimensions, Determinism, BatchSizes, EmptyInput, EmptyString, Close
- Unsupported platform stub returns ErrUnsupported cleanly
- go vet + gofmt clean; internal path refs removed from README
Adds a code example showing bge.New, Embed, Close, and the ErrUnsupported fallback. Documents the LORE_ONNXRUNTIME_LIB override and the WithLogger/WithTracer options. Removes a stale internal path reference.
Summary
- pkg/lore/embed with the Embedder interface: batch-oriented Embed(ctx, []string), Dimensions() int, Close(ctx) error
- pkg/lore/embed/bge with a reference implementation backed by the bundled int8-quantized BGE-small model
- model.onnx + vocab.txt embedded via go:embed; extracted to os.TempDir() at first New() so ONNX Runtime loads from disk
- Library probed from system install (override via LORE_ONNXRUNTIME_LIB); ErrUnsupported on missing lib
- OTel span on Embed calls (lore.embed.encode) with texts.count + dimensions attributes
- Stub returns ErrUnsupported on non-unix platforms (Windows etc.)
Files
- pkg/lore/embed/embed.go: interface + sentinel errors
- pkg/lore/embed/bge/bge.go: constructor + options
- pkg/lore/embed/bge/bge_unix.go: unix inference path
- pkg/lore/embed/bge/bge_unsupported.go: non-unix stub
- pkg/lore/embed/bge/probe_unix.go: library discovery
- pkg/lore/embed/bge/wordpiece.go: tokenizer
- pkg/lore/embed/bge/assets.go: go:embed directives
- pkg/lore/embed/bge/model/: int8 model.onnx + vocab.txt
- pkg/lore/embed/bge/bge_test.go: sanity tests
Test plan
- go build ./... passes
- go vet ./... passes
- gofmt -l . reports no diffs
- go test -race -count=1 ./... passes (tests skip gracefully when ONNX lib not installed on CI)