feat(embeddings): add GGUF quantized model support #141

Merged
donhardman merged 4 commits into master from feature/gguf
Mar 19, 2026

@donhardman
Member

- Add GGUF format detection and Q4_K_M quantization preference
- Support Gemma and Llama architectures
- Add quantized embedding model implementation
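
For readers unfamiliar with GGUF, here is a minimal Python sketch of what format detection and a Q4_K_M preference could look like. It is not the code from this PR: the helper names (`is_gguf`, `gguf_version`, `pick_gguf_file`) and the fallback rule are assumptions made for illustration. The only format fact relied on is that a GGUF file begins with the four ASCII bytes `GGUF`, followed by a little-endian 32-bit version number.

```python
import struct
from typing import Iterable, Optional

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def is_gguf(header: bytes) -> bool:
    """Return True if the leading bytes start with the GGUF magic."""
    return header[:4] == GGUF_MAGIC


def gguf_version(header: bytes) -> Optional[int]:
    """Read the little-endian uint32 version that follows the magic."""
    if len(header) < 8 or not is_gguf(header):
        return None
    return struct.unpack_from("<I", header, 4)[0]


def pick_gguf_file(filenames: Iterable[str],
                   preferred: str = "Q4_K_M") -> Optional[str]:
    """Hypothetical selection rule: prefer a file whose name contains the
    preferred quantization tag, else fall back to the first .gguf file
    in sorted order, else None."""
    candidates = sorted(n for n in filenames if n.lower().endswith(".gguf"))
    for name in candidates:
        if preferred.lower() in name.lower():
            return name
    return candidates[0] if candidates else None
```

Matching on the file name is only a heuristic; a stricter check would read the quantization type from the GGUF metadata itself.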
@CLAassistant

CLAassistant commented Mar 14, 2026

CLA assistant check
All committers have signed the CLA.

@donhardman donhardman requested a review from sanikolaev March 14, 2026 12:09
@github-actions

github-actions bot commented Mar 14, 2026

Linux debug test results

  8 files    8 suites   13m 3s ⏱️
511 tests 487 ✅ 24 💤 0 ❌
525 runs  501 ✅ 24 💤 0 ❌

Results for commit c1019e9.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Mar 14, 2026

Windows test results

  5 files    5 suites   18m 22s ⏱️
491 tests 473 ✅ 18 💤 0 ❌
499 runs  481 ✅ 18 💤 0 ❌

Results for commit c1019e9.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Mar 14, 2026

Linux release test results

  8 files    8 suites   7m 2s ⏱️
511 tests 493 ✅ 18 💤 0 ❌
525 runs  507 ✅ 18 💤 0 ❌

Results for commit c1019e9.

♻️ This comment has been updated with latest results.

@sanikolaev
Collaborator

The related discussion on Telegram is at https://t.me/manticore_chat/7196/21133

- Support T5 model configurations
- Add comprehensive tests
- Enable access to gated models via optional hf_token parameter
- Update all test calls to include token parameter
- Add tests for token authentication
- Fix tensor indexing to maintain batch dimension in T5 embeddings
- Add integration tests for FRIDA and Google embeddinggemma models
- Update test formatting for better readability
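
As an illustration of the optional `hf_token` parameter mentioned above, here is a minimal Python sketch of how a token might be attached to a model download request. It is an assumption-laden example, not this PR's implementation: `build_headers`, `build_request`, the `User-Agent` value, and the `org/model` repository name are all hypothetical. It relies only on the Hugging Face Hub convention of sending the token as an `Authorization: Bearer ...` header and serving files under `/resolve/<revision>/<filename>`.

```python
import urllib.request
from typing import Dict, Optional


def build_headers(hf_token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers; add a bearer token only when one is given."""
    headers = {"User-Agent": "embeddings-example"}  # placeholder value
    if hf_token:
        headers["Authorization"] = f"Bearer {hf_token}"
    return headers


def build_request(repo: str, filename: str,
                  hf_token: Optional[str] = None,
                  revision: str = "main") -> urllib.request.Request:
    """Prepare (but do not send) a download request for one model file."""
    url = f"https://huggingface.co/{repo}/resolve/{revision}/{filename}"
    return urllib.request.Request(url, headers=build_headers(hf_token))


# Public models need no token; gated models would pass one explicitly.
public_req = build_request("org/model", "config.json")
gated_req = build_request("org/model", "config.json", hf_token="hf_example")
```

Keeping the token optional means existing callers that only use public models can pass nothing and behave as before.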
@donhardman donhardman merged commit 16fb238 into master Mar 19, 2026
53 checks passed
3 participants