v0.3.0 — Multilingual embeddings, Nepali preprocessing, Ollama support

irfanalidv released this 01 Apr 14:02

Improves Nepali retrieval quality and adds fully local LLM support via Ollama.

What's new

  • Multilingual embeddings — switched to intfloat/multilingual-e5-small
    for better Nepali query understanding (was all-MiniLM-L6-v2)
  • Balanced retrieval weights — BM25 0.5 / vector 0.5 (was 0.6 / 0.4)
  • Nepali query preprocessing — NFC normalisation, question suffix
    stripping, whitespace collapse for cleaner BM25 keyword overlap
  • OllamaClient — fully offline answer generation via local Ollama
    (qwen2.5:7b recommended)
  • download_corpus() — fetch the 5-document seed corpus from GitHub
    without cloning the repo (works in Colab)
  • Cache invalidation — embedding cache clears automatically when
    the embedding model changes
  • Live Colab demo — 4 validated use cases, Nepali + English,
    upload your own PDFs
  • 18 tests passing
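
The query preprocessing and the balanced BM25/vector weighting listed above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function names, the suffix list, and the min-max score normalisation are all assumptions made for the example. Only the three preprocessing steps and the 0.5 / 0.5 weights come from these notes.

```python
import re
import unicodedata

# Hypothetical suffix list for illustration only; the package's real list
# is not given in these notes. Longer suffixes come first so they win.
QUESTION_SUFFIXES = ("के हो", "हो")


def preprocess_query(query: str) -> str:
    """NFC-normalise, collapse whitespace, and strip a trailing question suffix."""
    # NFC makes canonically equivalent character sequences compare equal.
    text = unicodedata.normalize("NFC", query)
    # Collapse any run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # Drop trailing question marks and the danda before checking suffixes.
    text = text.rstrip("?？।").strip()
    for suffix in QUESTION_SUFFIXES:
        if text.endswith(" " + suffix):
            text = text[: -len(suffix)].strip()
            break
    return text


def min_max(scores):
    """Rescale scores to [0, 1]; an assumed normalisation, not confirmed by the notes."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(bm25_scores, vector_scores, bm25_weight=0.5, vector_weight=0.5):
    """Weighted sum of normalised BM25 and vector scores, one value per document."""
    b = min_max(bm25_scores)
    v = min_max(vector_scores)
    return [bm25_weight * x + vector_weight * y for x, y in zip(b, v)]
```

With equal weights, neither retriever dominates the ranking once both score lists are on the same scale. Passing `bm25_weight=0.6, vector_weight=0.4` would reproduce the previous weighting under the same assumptions.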

Benchmark (5-document seed corpus via download_corpus())

Recall@1: 0.714 | Recall@3: 1.000 | Recall@5: 1.000
Keyword hit: 1.000 | Doc hit: 1.000 | Nepali recall@3: 1.000
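
For readers unfamiliar with the metric, Recall@k can be computed as in the sketch below. It assumes each query has exactly one relevant document, which these notes do not state. The function names and the toy data are invented for illustration and are not the project's evaluation set. A Recall@1 of 0.714 would be consistent with, for instance, 5 of 7 queries ranking the right document first, though the query count is not given here.

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """Return 1.0 if the relevant document appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def mean_recall_at_k(results, k):
    """Average recall@k over (ranked_ids, relevant_id) pairs."""
    if not results:
        return 0.0
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in results) / len(results)


# Toy data (hypothetical): 7 queries over 5 documents, one relevant doc each.
toy_results = [
    (["d1", "d2", "d3"], "d1"),
    (["d2", "d1", "d3"], "d2"),
    (["d3", "d4", "d1"], "d3"),
    (["d4", "d5", "d2"], "d4"),
    (["d5", "d1", "d2"], "d5"),
    (["d2", "d1", "d4"], "d1"),  # relevant doc at rank 2
    (["d5", "d4", "d3"], "d3"),  # relevant doc at rank 3
]

if __name__ == "__main__":
    for k in (1, 3, 5):
        print(f"Recall@{k}: {mean_recall_at_k(toy_results, k):.3f}")
```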

Install

pip install nepal-gov-agent==0.3.0

Try it in Colab

Open In Colab

Built on

ragnav, ragfallback, agentensemble — all MIT licensed