Embedding model and Latin coverage: notes on paraphrase-multilingual-MiniLM-L12-v2 for NVBSE and VGCL #107

JohnRDOrazio (Maintainer) · 2026-05-07T00:44:06Z

While provisioning the embedding deploy lane (#95) and starting to populate verse embeddings on staging, the question came up: does our chosen model handle Latin? We have two Latin Bible versions in the catalog — NVBSE (Nova Vulgata) and VGCL (Vulgata Clementina) — so this matters operationally.

TL;DR

The model paraphrase-multilingual-MiniLM-L12-v2 does not officially support Latin; it's not in the supported-languages list on the model card.
It will still produce 384-dim vectors for Latin input — they're not garbage, but the semantic signal is weaker than for the ~50 modern languages it was actually trained on.
We're populating embeddings for NVBSE and VGCL anyway as part of the staging-wide compute job, then we can measure quality empirically before deciding what to ship in production.

Why "not supported" doesn't mean "broken"

The model is a distilled sentence-encoder built on top of XLM-R. The base XLM-R was pre-trained on CC-100 (a CommonCrawl-derived corpus), which does include some Latin. The distillation step that produced paraphrase-multilingual-MiniLM-L12-v2 then aligned the embedding space using parallel sentence pairs across ~50 modern languages, with Latin not among them. So:

The encoder has rough lexical/syntactic notions of Latin.
The aligned semantic space was tuned for the modern-language pairs only.
Latin shares heavy lexical overlap with the Romance languages the model knows well (Italian, Spanish, Portuguese, French), so a lot of the encoder's intra-Romance clustering applies to Latin as a side-effect.

Net expectation:

Within-Latin similarity ("amor" ≈ "dilectio") will work tolerably.
Cross-lingual recall (English query → Latin verse) will be the weak spot.

Three concrete options

Option	Description	Cost	Trade-off
A. Embed them anyway, evaluate empirically	Let the current compute job populate NVBSE/VGCL embeddings; query known concepts in English/Italian and check if canonical Latin verses surface in top-N.	Zero extra work — already in flight.	Decision point shifts to: is recall acceptable for our users?
B. Skip embeddings for Latin versions, FTS only	`/search` (keyword via Postgres FTS) already works for these versions; just don't populate `embedding` for them, and have `/search/semantic` and `/search/similar` reject Latin versions or return a clear "embeddings unavailable for this version" message.	Small code change in handlers.	Loses semantic search capability for the two Latin versions, which serious Latin-Bible users may actually want.
C. Switch to a model that includes Latin	E.g. LaBSE (109 languages, ~471M parameters, 768-dim) handles classical languages somewhat better. Drop-in via sentence-transformers.	Significant cutover: re-embed every version, change the embedding column dimension from 384 → 768, retest.	Best long-term coverage, but a real migration cost.
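
As a rough illustration of the "small code change in handlers" that option B implies, here is a minimal sketch. It is not the project's handler code: `EMBEDDINGS_DISABLED`, `semantic_search_gate`, and the payload shape are hypothetical names invented for this example.

```python
from typing import Optional

# Hypothetical setting: versions for which semantic search is switched off.
EMBEDDINGS_DISABLED = {"NVBSE", "VGCL"}


def semantic_search_gate(version: str) -> Optional[dict]:
    """Return an error payload if semantic search is gated for `version`, else None."""
    normalized = version.upper()
    if normalized in EMBEDDINGS_DISABLED:
        return {
            "error": "embeddings unavailable for this version",
            "version": normalized,
            # Keyword search via Postgres FTS keeps working for these versions.
            "fallback": "/search",
        }
    return None
```

A handler for `/search/semantic` or `/search/similar` would call the gate first and return the payload as-is when it is not `None`.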

Recommendation

Go with A for now. The empirical test costs nothing — the embeddings are being computed as part of the bigger run anyway. Once done, run a small evaluation:

# English query → look at top hits in NVBSE / VGCL
curl 'https://query.bibleget.io/dev/search/semantic?query=love+your+neighbor&version=NVBSE'
curl 'https://query.bibleget.io/dev/search/semantic?query=love+your+neighbor&version=VGCL'

# Latin query → similar test
curl 'https://query.bibleget.io/dev/search/semantic?query=diligite+inimicos+vestros&version=NVBSE'

Compare to the same queries against DRB/CEI2008 to gauge the relative quality. If NVBSE/VGCL recall is materially worse, fall back to B for those two versions specifically. C stays on the shelf as a future-improvement track if there's enough user demand for high-quality cross-lingual Latin search.
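
To make the comparison repeatable, the check "does a canonical verse surface in the top N?" can be scripted. The sketch below only builds the same URLs as the curls above and scores a list of verse references; the response format of the endpoint is not described in this thread, so extracting references from the response is left out, and `eval_url` and `hit_at_n` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Same staging endpoint as the curl commands above.
BASE_URL = "https://query.bibleget.io/dev/search/semantic"


def eval_url(query: str, version: str) -> str:
    """Build the evaluation URL for one query/version pair."""
    return f"{BASE_URL}?{urlencode({'query': query, 'version': version})}"


def hit_at_n(ranked_refs, expected, n=3):
    """True if any expected verse reference appears among the first n results."""
    return any(ref in expected for ref in ranked_refs[:n])
```

For example, `hit_at_n(["Sir 27:18", "Mt 22:39", "Mt 19:19"], {"Mt 22:39", "Lv 19:18"})` is a hit, because Mt 22:39 is in the first three results.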

Why this is worth pinning

This question will recur every time someone asks "why don't I get good Latin results?" — and the answer is intentional, not a bug. Pinning the analysis here means the next person can read this, run the same evaluation curls, and decide whether the recall is acceptable for their use case. Also useful when we eventually evaluate replacement models — we'll want to A/B test against the LaBSE option.

Filed as part of the staging buildout (#95) but kept as its own discussion because future model decisions don't belong buried in a deploy thread.

JohnRDOrazio (Maintainer, Author) · 2026-05-07T04:31:25Z

Empirical Latin evaluation results

Now that all 8 versions on staging have full embeddings (259,998 verses total), here's the cross-lingual semantic search quality matrix the original post asked us to gather. Same query — "love your neighbor" — across each version, top-3 hits with similarity scores:

Modern languages (control)

DRB (English):

0.6963  Sir  27:18  Love thy neighbour, and be joined to him with fidelity.
0.6309  Mt   22:39  And the second is like to this: Thou shalt love thy neighbour as thyself.
0.6121  Mt   19:19  Honour thy father and thy mother: and, Thou shalt love thy neighbour as thyself.

NABRE (English):

0.7609  Mt   22:39  The second is like it: You shall love your neighbor as yourself.
0.7224  Lv   19:18  Take no revenge ... You shall love your neighbor as yourself.
0.6761  Rom  15:2   let each of us please our neighbor for the good, for building up.

CEI2008 (Italian):

0.7088  Sir  9:14   Per quanto puoi, mantieni buoni rapporti con i vicini ...
0.6252  Sir  4:7    Fatti amare dalla comunità ...
0.5985  Sir  13:15  Ogni vivente ama il suo simile e ogni uomo il suo vicino.

Excellent canonical recall in English, decent thematic recall via cross-lingual transfer to Italian.

Latin

NVBSE (Nova Vulgata):

(empty results)

Caveat: this is a separate, unrelated bug. NVBSE_idx is unpopulated in the source MariaDB, so the handler's join to that table drops everything before similarity matters. Filed separately as #108. Direct pgvector queries against NVBSE actually work fine: seeded from the NABRE Mt 22:39 embedding, the top 5 NVBSE matches are at similarity 0.57-0.59 (Prov 3:29 "Ne moliaris amico tuo malum" is the top hit, which is thematically reasonable for "love your neighbor"). So NVBSE Latin embeddings have signal, just below the modern-language baseline.

VGCL (Vulgata Clementina):

0.5002  Psal  38:11  amove a me plagas tuas.
0.4730  1 Th  5:16   Semper gaudete.
0.4683  Iac   1:16   Nolite itaque errare, fratres mei dilectissimi.

These are wrong content and at materially lower similarity (0.46-0.50 vs the 0.60-0.76 we see for modern languages). Confirms the prediction in the original post: cross-lingual recall to classical Latin is the weak spot.
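
The size of that gap can be put in one number by averaging the top-3 scores per version. This is only a summary of the figures listed above (one query, three hits each), so it is a sanity check rather than a benchmark; `mean_top3` and `gap_to_controls` are names made up for this sketch.

```python
from statistics import mean

# Top-3 similarity scores copied from the listings above.
TOP3 = {
    "DRB": [0.6963, 0.6309, 0.6121],
    "NABRE": [0.7609, 0.7224, 0.6761],
    "CEI2008": [0.7088, 0.6252, 0.5985],
    "VGCL": [0.5002, 0.4730, 0.4683],
}
CONTROLS = ("DRB", "NABRE", "CEI2008")


def mean_top3(version: str) -> float:
    """Average similarity of the three best hits for one version."""
    return mean(TOP3[version])


def gap_to_controls(version: str) -> float:
    """How far a version's mean top-3 score sits below the modern-language average."""
    control_mean = mean([mean_top3(c) for c in CONTROLS])
    return control_mean - mean_top3(version)
```

With these inputs, `gap_to_controls("VGCL")` comes out at roughly 0.19.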

Confirmation of the predicted pattern

Pattern	Original-post prediction	Empirical
Within-Latin similarity	"tolerable"	~0.57-0.59 with thematic relevance ✓
Cross-lingual EN→Latin	"weak"	0.46-0.50, irrelevant content ✓

So the model behaves exactly as the model card would predict: it's not actively broken on Latin, the embeddings are valid 384-dim vectors with rough thematic clustering, but the aligned cross-lingual semantic space wasn't trained to include Latin so cross-lingual recall is poor.

Recommendation

Falling back to option B from the original post for VGCL, as planned if recall turned out materially worse: keep the embeddings (they were free as part of the bigger run), but consider gating /search/semantic and /search/similar for VGCL behind FTS-only, or surface a clear "semantic search has limited recall for this version" note in the API response. NVBSE's empty results are #108 and unrelated to model quality. Once that's fixed, NVBSE's top-hit similarity should land in the 0.55-0.60 range seen in the direct pgvector test: above VGCL's 0.46-0.50, but still below the modern-language baseline.

Option C (switch to LaBSE for 109-language coverage) stays on the shelf as a future-improvement track. The cost is real — 768-dim embeddings means re-running the whole pipeline and altering the column types — but if user feedback says Latin search matters, it's the right destination.
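
A back-of-the-envelope estimate of what the wider column costs in raw storage, assuming pgvector's documented layout of 4 bytes per dimension plus 8 bytes of overhead per vector. Indexes and row or TOAST overhead are deliberately ignored, so treat the result as a lower bound; the helper names are hypothetical.

```python
VERSES_ON_STAGING = 259_998  # total reported above for the 8 staging versions


def vector_bytes(dim: int) -> int:
    """Raw size of one pgvector value: 4 bytes per dimension plus 8 bytes overhead."""
    return 4 * dim + 8


def column_bytes(rows: int, dim: int) -> int:
    """Raw size of an embedding column, excluding indexes and row overhead."""
    return rows * vector_bytes(dim)


minilm_bytes = column_bytes(VERSES_ON_STAGING, 384)
labse_bytes = column_bytes(VERSES_ON_STAGING, 768)
print(f"384-dim: {minilm_bytes / 2**20:.0f} MiB, 768-dim: {labse_bytes / 2**20:.0f} MiB")
```

The column roughly doubles in size, which is modest next to the cost of re-running the embedding pipeline itself.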


JohnRDOrazio (Maintainer, Author) · 2026-05-12T23:17:40Z

Empirical comparison: paraphrase-multilingual-MiniLM-L12-v2 vs LaBSE

Setup

Three Bible versions, one per language: NABRE (English, 35,703 verses), CEI2008 (Italian, 35,668), NVBSE (Latin, 35,857).
All verses re-embedded under sentence-transformers/LaBSE (768-d) into a new embedding_labse column on each table; existing MiniLM (384-d) embedding column left untouched as baseline.
Eval: 7 well-known biblical concepts × 3 languages = 21 probes. For each, top-5 by pgvector cosine similarity (1 - (column <=> query::vector)) under both columns. Probe set lives at scripts/probe_queries.json on the feat/labse-ab-experiment branch.
Branch with the full scaffolding (migration 012, refactored compute_embeddings.py, evaluator): feat/labse-ab-experiment (commit 68c8063).
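
For readers unfamiliar with the scoring expression: pgvector's `<=>` operator returns cosine distance, so `1 - (column <=> query)` is cosine similarity. The pure-Python sketch below shows the same ranking on in-memory vectors. It is illustrative only and is not the evaluator from the branch; `top_k` and its `(ref, vector)` row shape are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """The quantity pgvector's <=> operator computes."""
    return 1 - cosine_similarity(a, b)


def top_k(query, rows, k=5):
    """Rank (ref, vector) rows by similarity to the query, best first."""
    scored = [(cosine_similarity(query, vec), ref) for ref, vec in rows]
    return sorted(scored, reverse=True)[:k]
```

Because each column is scored against a query embedded by its own model, similarity values are comparable within a column but not across the two columns.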

Headline — top-1 accuracy

Probe	EN (NABRE)	IT (CEI2008)	LA (NVBSE)
Genesis creation	both ✓	both ✓	both ✓
Love your neighbor	both ✓	both miss (return Matt 19:19, the parallel)	both ✓
Beatitudes — poor in spirit	both ✓	both ✓	both ✓
Last Supper	MiniLM fails badly, LaBSE ✓	MiniLM fails badly, LaBSE ✓	both ✓
Resurrection	both ✓	MiniLM partial (Luke 24:6 not Matt 28:6), LaBSE ✓	both ✓
Shepherd Psalm	both miss	both ✓	MiniLM fails, LaBSE ✓
Word made flesh	MiniLM ✓, LaBSE ✗	MiniLM ✓, LaBSE ✗	both miss (LaBSE has it at #2)

Score: LaBSE 16 / 21 vs MiniLM 14 / 21, counting MiniLM's partial Resurrection hit in Italian as a miss. Top-1 counts are close; the shape of the wins and losses is what matters.
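
Since the tally depends on how "partial" cells are scored, it helps to count mechanically. The sketch below transcribes the table cell by cell under one stated convention: anything other than a clear ✓ (including "partial", "fails", and a correct verse at #2) counts as a miss. `TABLE` and `tally` are names made up for this example.

```python
# Per probe: (MiniLM top-1 correct, LaBSE top-1 correct) for EN, IT, LA.
# Transcribed from the table above; "partial" and "fails" count as misses.
TABLE = {
    "Genesis creation": [(True, True), (True, True), (True, True)],
    "Love your neighbor": [(True, True), (False, False), (True, True)],
    "Beatitudes": [(True, True), (True, True), (True, True)],
    "Last Supper": [(False, True), (False, True), (True, True)],
    "Resurrection": [(True, True), (False, True), (True, True)],
    "Shepherd Psalm": [(False, False), (True, True), (False, True)],
    "Word made flesh": [(True, False), (True, False), (False, False)],
}


def tally(table):
    """Return (MiniLM wins, LaBSE wins, number of probes)."""
    cells = [cell for row in table.values() for cell in row]
    minilm_wins = sum(1 for minilm_ok, _ in cells if minilm_ok)
    labse_wins = sum(1 for _, labse_ok in cells if labse_ok)
    return minilm_wins, labse_wins, len(cells)


minilm_wins, labse_wins, total = tally(TABLE)
print(f"MiniLM {minilm_wins} / {total}, LaBSE {labse_wins} / {total}")
```

Changing the convention (for instance, giving half credit for a parallel passage) only means editing the affected cells.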

Concrete examples

Last Supper, English — query "This is my body, which will be given for you; do this in memory of me":

MiniLM #1: Luke 4:7 "All this will be yours, if you worship me" — cosine 0.692. Surface-token match on this / yours / me / will.
LaBSE #1: 1 Cor 11:24 "…This is my body that is for you. Do this…" — cosine 0.655, with Luke 22:19 at #2.

Same shape in Italian: MiniLM returns Psalm 63:9; LaBSE returns 1 Cor 11:24 + Luke 22:19.

Shepherd Psalm, Latin — query "Dominus pascit me, et nihil mihi deerit":

MiniLM #1: Ps 94:17 (token overlap on Dominus … me). The actual Ps 23:1 is at #2.
LaBSE #1: Ps 23:1, correct.

Beatitudes top-K coherence in Latin — this was the original motivating concern from this discussion. Query "Beati pauperes spiritu, quoniam ipsorum est regnum caelorum":

MiniLM top-5 mixes in Gen 41:4 (devoraveruntque septem boves…) and Isa 28:6 (spiritus iudicii…) — unrelated verses pulled in on shallow token overlap.
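LaBSE top-5: Matt 5:3, Matt 5:10, Ps 144:15 (beatus populus, cui Dominus est Deus), Tob 13:15 (Anima mea, benedic Domino, a blessing rather than a beatitude), Matt 5:9. Four of the five are actual Beati / Beatus verses, and the fifth is still a benediction.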

This is the kind of qualitative improvement the discussion was asking about — not just better top-1, but a thematically coherent top-K.

LaBSE regression — Word made flesh:

For "And the Word became flesh and dwelt among us" LaBSE returns 2 John 1:2 (propter veritatem quae permanet in nobis), pattern-matching on dwell among us / permanere in nobis rather than the Incarnational meaning. MiniLM correctly returns John 1:14. Latin: LaBSE has John 1:14 at #2 (cosine 0.575), so it's close but not winning.

Score-scale differences

Model	Strong-match range	Weak-match floor
MiniLM	0.85–0.97	0.55–0.65
LaBSE	0.60–0.95	0.40–0.55

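LaBSE's scores sit lower overall, and its strong matches spread over a wider range. Any existing threshold= default on /v3/search/semantic would need recalibration before a cutover: today's 0.5 default falls below MiniLM's weak-match floor but inside LaBSE's, so the same number would cut off part of LaBSE's tail and sit close to some of its correct top-1 hits (NABRE Mt 22:39 scores only 0.563 under LaBSE).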
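
One possible way to recalibrate is to pick the cutoff from labelled probe results instead of reusing a number tuned for another model. The sketch below chooses the threshold that maximises F1 over (score, relevant) pairs. The sample scores are LaBSE's NABRE top-5 for "Love your neighbor as yourself" from the raw output; the relevance labels are my own assumption (the two verses quoting the commandment count as relevant), and `f1_at` and `best_threshold` are hypothetical helpers.

```python
def f1_at(threshold, scored):
    """F1 of 'return everything scoring >= threshold' on labelled (score, relevant) pairs."""
    tp = sum(1 for s, rel in scored if s >= threshold and rel)
    fp = sum(1 for s, rel in scored if s >= threshold and not rel)
    fn = sum(1 for s, rel in scored if s < threshold and rel)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scored):
    """Lowest observed score that achieves the maximum F1."""
    candidates = sorted({s for s, _ in scored})
    return max(candidates, key=lambda t: f1_at(t, scored))


# LaBSE scores from the raw output; relevance labels are assumed for illustration.
SAMPLE = [(0.563, True), (0.520, True), (0.487, False), (0.474, False), (0.471, False)]
```

A real recalibration would pool all probes and languages before choosing a default; five rows from one query are far too few to set a production value.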

Read

For Latin specifically — the issue this discussion was opened about — LaBSE is better in kind, not just degree. The improvement is most visible in top-K coherence (Beatitudes example) and in resisting surface-token traps (Shepherd Psalm Latin).

For English and Italian, the two models are roughly even on this probe set, with LaBSE winning the harder cases (Last Supper in both languages) and losing one (Word made flesh in both languages).

Suggested next steps

Expand the probe set — 7 probes × 3 languages is enough to suggest LaBSE is better; not enough to be confident the Word-made-flesh regression is isolated. Target ~20 probes, weighted toward theological / metaphorical language.
Recalibrate threshold default for LaBSE's compressed score range.
Resolve #71 ("Pin SentenceTransformer model revision for reproducible builds", i.e. commit-pinning) only once we've settled on which model wins. Pinning to MiniLM now would just have to be redone after a cutover.

Full raw output

All 21 probes × top-5, side by side

Baseline:  paraphrase-multilingual-MiniLM-L12-v2 → column 'embedding'
Candidate: sentence-transformers/LaBSE → column 'embedding_labse'
Probes:    /app/scripts/probe_queries.json (7 concept(s))
Top-K:     5

Loading baseline model…

Loading candidate model…


========================================================================================
Concept: Genesis creation
========================================================================================

── NABRE ─────────────────────────────────────────────────────────────
Query: "In the beginning, God created the heavens and the earth"

  MiniLM  (embedding)
  ───────────────────
  0.969    1.1:1    In the beginning, when God created the heavens and the earth—
  0.819    1.2:4    This is the story of the heavens and the earth at their creation. When the L<sm>ORD</sm> …
  0.617    1.1:17   God set them in the dome of the sky, to illuminate the earth,
  0.616    1.2:1    Thus the heavens and the earth and all their array were completed.
  0.606   21.7:28   I beg you, child, to look at the heavens and the earth and see all that is in them; then …

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.922    1.1:1    In the beginning, when God created the heavens and the earth—
  0.611    1.1:17   God set them in the dome of the sky, to illuminate the earth,
  0.593   48.10:6    But from the beginning of creation, ‘God made them male and female.
  0.590    1.2:4    This is the story of the heavens and the earth at their creation. When the L<sm>ORD</sm> …
  0.553   65.1:10   and: <pof>“At the beginning, O Lord, you established the earth,</pof> <poi>and the heaven…

── CEI2008 ─────────────────────────────────────────────────────────────
Query: "In principio Dio creò il cielo e la terra"

  MiniLM  (embedding)
  ───────────────────
  0.961    1.1:1    In principio Dio creò il cielo e la terra.
  0.877    1.2:4    Queste sono le origini del cielo e della terra, quando vennero creati. Nel giorno in cui …
  0.799   29.45:18   Poiché così dice il Signore, che ha creato i cieli, egli, il Dio che ha plasmato e fatto …
  0.799   23.124:8    Il nostro aiuto è nel nome del Signore: egli ha fatto cielo e terra.
  0.798   23.121:2    Il mio aiuto viene dal Signore: egli ha fatto cielo e terra.

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.966    1.1:1    In principio Dio creò il cielo e la terra.
  0.727   65.1:10   E ancora: In principio tu, Signore, hai fondato la terra e i cieli sono opera delle tue m…
  0.695    1.2:4    Queste sono le origini del cielo e della terra, quando vennero creati. Nel giorno in cui …
  0.687   23.102:26   In principio tu hai fondato la terra, i cieli sono opera delle tue mani.
  0.682   23.121:2    Il mio aiuto viene dal Signore: egli ha fatto cielo e terra.

── NVBSE ─────────────────────────────────────────────────────────────
Query: "In principio creavit Deus caelum et terram"

  MiniLM  (embedding)
  ───────────────────
  0.899    1.1:1    In principio creavit Deus cælum et terram.
  0.776    1.14:19   benedixit ei et ait: "Benedictus Abram a Deo excelso, qui creavit cælum et terram
  0.770    5.10:14   En Domini Dei tui cælum est et cælum cæli, terra et omnia, quæ in ea sunt;
  0.763   22.38:37   Quis recensebit nubes in sapientia, et utres cæli quis declinabit,
  0.756   23.96:11   Lætentur cæli, et exsultet terra, sonet mare et plenitudo eius;

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.948    1.1:1    In principio creavit Deus cælum et terram.
  0.694    1.2:4    Istæ sunt generationes cæli et terræ, quando creata sunt. In die quo fecit Dominus Deus t…
  0.681   23.121:2    Auxilium meum a Domino, qui fecit cælum et terram.
  0.672   23.115:15   Benedicti vos a Domino, qui fecit cælum et terram.
  0.662   28.17:1    Deus creavit de terra hominem et secundum imaginem suam fecit illum;

========================================================================================
Concept: Love your neighbor
========================================================================================

── NABRE ─────────────────────────────────────────────────────────────
Query: "Love your neighbor as yourself"

  MiniLM  (embedding)
  ───────────────────
  0.823   47.22:39   The second is like it: You shall love your neighbor as yourself.
  0.781    3.19:18   Take no revenge and cherish no grudge against your own people. You shall love your neighb…
  0.728   47.19:19   honor your father and your mother’; and ‘you shall love your neighbor as yourself.’”
  0.708   55.5:14   For the whole law is fulfilled in one statement, namely, “You shall love your neighbor as…
  0.637   52.15:2    let each of us please our neighbor for the good, for building up.

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.563   47.22:39   The second is like it: You shall love your neighbor as yourself.
  0.520   47.19:19   honor your father and your mother’; and ‘you shall love your neighbor as yourself.’”
  0.487   28.31:15   <po>Recognize that your neighbor feels as you do,</po> <poi>and keep in mind everything y…
  0.474    5.5:21   You shall not covet your neighbor’s wife. You shall not desire your neighbor’s house or f…
  0.471   28.27:18   <po>For as one might kill another,</po> <poi>you have killed your neighbor’s friendship.<…

── CEI2008 ─────────────────────────────────────────────────────────────
Query: "Ama il tuo prossimo come te stesso"

  MiniLM  (embedding)
  ───────────────────
  0.771   47.19:19   onora il padre e la madre e amerai il prossimo tuo come te stesso ".
  0.718   50.15:17   Questo vi comando: che vi amiate gli uni gli altri.
  0.710   47.22:39   Il secondo poi è simile a quello: Amerai il tuo prossimo come te stesso.
  0.707    3.24:19   Se uno farà una lesione al suo prossimo, si farà a lui come egli ha fatto all'altro:
  0.707   28.4:7    Fatti amare dalla comunità* e davanti a un grande abbassa il capo.

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.662   47.19:19   onora il padre e la madre e amerai il prossimo tuo come te stesso ".
  0.647   47.22:39   Il secondo poi è simile a quello: Amerai il tuo prossimo come te stesso.
  0.621   28.13:15   Ogni vivente ama il suo simile e ogni uomo il suo vicino.
  0.577   49.10:27   Costui rispose: " Amerai il Signore tuo Dio con tutto il tuo cuore, con tutta la tua anim…
  0.569    3.19:18   Non ti vendicherai e non serberai rancore contro i figli del tuo popolo, ma amerai il tuo…

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Diliges proximum tuum sicut te ipsum"

  MiniLM  (embedding)
  ───────────────────
  0.840   47.22:39   Secundum autem simile est huic: Diliges proximum tuum sicut teipsum.
  0.823   28.44:18   propter eum dimissum est reliquum terræ, cum factum est diluvium:
  0.805   47.19:19   honora patrem et matrem et diliges proximum tuum sicut teipsum ".
  0.788   48.12:31   Secundum est illud: "Diliges proximum tuum tamquam teipsum". Maius horum aliud mandatum n…
  0.772   23.7:13   Nonne iterum gladium suum exacuit, arcum suum tetendit et paravit illum?

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.645   47.22:39   Secundum autem simile est huic: Diliges proximum tuum sicut teipsum.
  0.602   28.13:19   Omne animal diligit simile sibi: sic et omnis homo proximum sibi.
  0.582   47.5:43   Audistis quia dictum est: "Diliges proximum tuum et odio habebis inimicum tuum".
  0.550    3.19:18   Non quæres ultionem nec irasceris civibus tuis. Diliges proximum tuum sicut teipsum. Ego …
  0.541   50.15:12   Hoc est præceptum meum, ut diligatis invicem, sicut dilexi vos;

========================================================================================
Concept: Sermon on the mount — poor in spirit
========================================================================================

── NABRE ─────────────────────────────────────────────────────────────
Query: "Blessed are the poor in spirit, for theirs is the kingdom of heaven"

  MiniLM  (embedding)
  ───────────────────
  0.784   47.5:3    <pof>“Blessed are the poor in spirit,</pof> <poi>for theirs is the kingdom of heaven.</po…
  0.722   52.14:17   For the kingdom of God is not a matter of food and drink, but of righteousness, peace, an…
  0.718   66.2:5    Listen, my beloved brothers. Did not God choose those who are poor in the world to be ric…
  0.682   56.2:8    For by grace you have been saved through faith, and this is not from you; it is the gift …
  0.674   47.5:10   <po>Blessed are they who are persecuted for the sake of righteousness,</po> <poil>for the…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.633   47.5:3    <pof>“Blessed are the poor in spirit,</pof> <poi>for theirs is the kingdom of heaven.</po…
  0.543   47.5:10   <po>Blessed are they who are persecuted for the sake of righteousness,</po> <poil>for the…
  0.490   47.5:5    <po> Blessed are the meek,</po> <poi>for they will inherit the land.</poi>
  0.471   47.13:38   the field is the world, the good seed the children of the kingdom. The weeds are the chil…
  0.471   47.3:2    [and] saying, “Repent, for the kingdom of heaven is at hand!”

── CEI2008 ─────────────────────────────────────────────────────────────
Query: "Beati i poveri in spirito, perché di essi è il regno dei cieli"

  MiniLM  (embedding)
  ───────────────────
  0.945   47.5:3    "Beati i poveri in spirito, perché di essi è il regno dei cieli.
  0.775    9.2:8    Solleva dalla polvere il debole, dall'immondizia rialza il povero, per farli sedere con i…
  0.774   30.20:13   Cantate inni al Signore, lodate il Signore, perché ha liberato la vita del povero dalle m…
  0.755   32.2:18   ma l'anima colma di afflizione, chi cammina curvo e spossato, e gli occhi languenti e l'a…
  0.749   23.140:9    Non soddisfare, Signore, i desideri dei malvagi, non favorire le loro trame. Alzano

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.930   47.5:3    "Beati i poveri in spirito, perché di essi è il regno dei cieli.
  0.776   47.5:10   Beati i perseguitati per la giustizia, perché di essi è il regno dei cieli.
  0.594   47.5:9    Beati gli operatori di pace, perché saranno chiamati figli di Dio.
  0.586   47.5:5    Beati i miti, perché avranno in eredità la terra.
  0.584   30.50:38   Spada sulle sue acque: si prosciughino! Perché essa è una terra di idoli; vanno pazzi per…

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Beati pauperes spiritu, quoniam ipsorum est regnum caelorum"

  MiniLM  (embedding)
  ───────────────────
  0.947   47.5:3    " Beati pauperes spiritu, quoniam ipsorum est regnum cælorum.
  0.789   47.5:10   Beati, qui persecutionem patiuntur propter iustitiam, quoniam ipsorum est regnum cælorum.
  0.783   29.28:6    et spiritus iudicii sedenti ad iudicandum et fortitudo vertentibus prœlium usque ad porta…
  0.781   65.3:7    Quapropter, sicut dicit Spiritus Sanctus: " Hodie, si vocem eius audieritis,
  0.778    1.41:4    devoraveruntque septem boves pulchras et crassas. Expergefactus pharao

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.840   47.5:3    " Beati pauperes spiritu, quoniam ipsorum est regnum cælorum.
  0.617   47.5:10   Beati, qui persecutionem patiuntur propter iustitiam, quoniam ipsorum est regnum cælorum.
  0.554   23.144:15   Beatus populus, cui hæc sunt; beatus populus, cui Dominus est Deus.
  0.541   17.13:15   Anima mea, benedic Domino, regi magno,
  0.541   47.5:9    Beati pacifici, quoniam filii Dei vocabuntur.

========================================================================================
Concept: Last Supper — institution of the Eucharist
========================================================================================

── NABRE ─────────────────────────────────────────────────────────────
Query: "This is my body, which will be given for you; do this in memory of me"

  MiniLM  (embedding)
  ───────────────────
  0.692   49.4:7    All this will be yours, if you worship me.”
  0.668   33.36:26   I will give you a new heart, and a new spirit I will put within you. I will remove the he…
  0.660   33.37:14   I will put my spirit in you that you may come to life, and I will settle you in your land…
  0.642   50.15:11   “I have told you this so that my joy may be in you and your joy may be complete.
  0.642   50.16:15   Everything that the Father has is mine; for this reason I told you that he will take from…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.655   53.11:24   and, after he had given thanks, broke it and said, “This is my body that is for you. Do t…
  0.629   49.22:19   Then he took the bread, said the blessing, broke it, and gave it to them, saying, “This i…
  0.540    2.22:26   for this is his only covering; it is the cloak for his body. What will he sleep in? If he…
  0.530   50.16:14   He will glorify me, because he will take from what is mine and declare it to you.
  0.528   50.16:15   Everything that the Father has is mine; for this reason I told you that he will take from…

── CEI2008 ─────────────────────────────────────────────────────────────
Query: "Questo è il mio corpo, che è dato per voi; fate questo in memoria di me"

  MiniLM  (embedding)
  ───────────────────
  0.714   23.63:9    A te si stringe l'anima mia: la tua destra mi sostiene.
  0.698   31.3:20   Ben se ne ricorda la mia anima e si accascia dentro di me.
  0.694   30.26:14   Quanto a me, eccomi in mano vostra, fate di me come vi sembra bene e giusto;
  0.686   47.26:12   Versando questo profumo sul mio corpo, lei lo ha fatto in vista della mia sepoltura.
  0.677   23.119:159  Vedi che io amo i tuoi precetti: Signore, secondo il tuo amore dammi vita.

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.789   53.11:24   e, dopo aver reso grazie, lo spezzò e disse: "Questo è il mio corpo, che è per voi; fate …
  0.748   49.22:19   Poi prese il pane, rese grazie, lo spezzò e lo diede loro dicendo: "Questo è il mio corpo…
  0.642   50.17:10   Tutte le cose mie sono tue, e le tue sono mie, e io sono glorificato in loro.
  0.629   23.16:5    Il Signore è mia parte di eredità e mio calice: nelle tue mani è la mia vita.
  0.615   23.40:9    di fare la tua volontà: mio Dio, questo io desidero; la tua legge è nel mio intimo".

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Hoc est corpus meum, quod pro vobis datur; hoc facite in meam commemorationem"

  MiniLM  (embedding)
  ───────────────────
  0.931   53.11:24   et gratias agens fregit et dixit: " Hoc est corpus meum, quod pro vobis est; hoc facite i…
  0.856   49.22:19   Et accepto pane, gratias egit et fregit et dedit eis dicens: " Hoc est corpus meum, quod …
  0.801    1.17:10   Hoc est pactum meum, quod observabitis, inter me et vos et semen tuum post te. Circumcide…
  0.787   50.15:12   Hoc est præceptum meum, ut diligatis invicem, sicut dilexi vos;
  0.776   28.18:24   Memento iræ in die consummationis et, suo tempore, retributionis in conversione faciei.

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.805   53.11:24   et gratias agens fregit et dixit: " Hoc est corpus meum, quod pro vobis est; hoc facite i…
  0.706   49.22:19   Et accepto pane, gratias egit et fregit et dedit eis dicens: " Hoc est corpus meum, quod …
  0.679   64.1:12   quem remisi tibi: eum, hoc est viscera mea;
  0.621   23.116:7    Convertere, anima mea, in requiem tuam, quia Dominus benefecit tibi;
  0.618    1.23:4    "Advena sum et inquilinus apud vos; date mihi possessionem sepulcri vobiscum, ut sepeliam…

========================================================================================
Concept: Resurrection — empty tomb
========================================================================================

── NABRE ─────────────────────────────────────────────────────────────
Query: "He is not here, for he has been raised just as he said"

  MiniLM  (embedding)
  ───────────────────
  0.881   47.28:6    He is not here, for he has been raised just as he said. Come and see the place where he l…
  0.738   49.24:6    He is not here, but he has been raised. Remember what he said to you while he was still i…
  0.619   51.13:30   But God raised him from the dead,
  0.597   48.16:6    He said to them, “Do not be amazed! You seek Jesus of Nazareth, the crucified. He has bee…
  0.580   49.21:8    He answered, “See that you not be deceived, for many will come in my name, saying, ‘I am …

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.830   47.28:6    He is not here, for he has been raised just as he said. Come and see the place where he l…
  0.615   49.24:6    He is not here, but he has been raised. Remember what he said to you while he was still i…
  0.471   50.11:51   He did not say this on his own, but since he was high priest for that year, he prophesied…
  0.463   50.1:8    He was not the light, but came to testify to the light.
  0.460   54.8:17   for he not only welcomed our appeal but, since he is very concerned, he has gone to you o…

── CEI2008 ─────────────────────────────────────────────────────────────
Query: "Non è qui, è risorto, come aveva detto"

  MiniLM  (embedding)
  ───────────────────
  0.932   49.24:6    Non è qui, è risorto. Ricordatevi come vi parlò quando era ancora in Galilea
  0.833   47.28:6    Non è qui. È risorto, infatti, come aveva detto; venite, guardate il luogo dove era stato…
  0.724   23.37:10   Ancora un poco e il malvagio scompare: cerchi il suo posto, ma lui non c'è più.
  0.713   10.12:23   Ma ora egli è morto: perché digiunare? Potrei forse farlo ritornare? Andrò io da lui, ma …
  0.703   50.7:36   Che discorso è quello che ha fatto: "Voi mi cercherete e non mi troverete", e: "Dove sono…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.783   47.28:6    Non è qui. È risorto, infatti, come aveva detto; venite, guardate il luogo dove era stato…
  0.695   49.24:6    Non è qui, è risorto. Ricordatevi come vi parlò quando era ancora in Galilea
  0.523    1.35:13   Dio disparve da lui, dal luogo dove gli aveva parlato.
  0.456   50.11:30   Gesù non era entrato nel villaggio, ma si trovava ancora là dove Marta gli era andata inc…
  0.453   52.4:23   E non soltanto per lui è stato scritto che gli fu accreditato ,

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Non est hic, surrexit enim sicut dixit"

  MiniLM  (embedding)
  ───────────────────
  0.874   47.28:6    Non est hic: surrexit enim, sicut dixit. Venite, videte locum, ubi positus erat.
  0.831   25.4:10   quia si unus ceciderit, ab altero fulcietur. Væ soli! Cum ceciderit, non habet sublevante…
  0.829   22.34:31   Si enim dixit quispiam Deo: "Ferre debui! Iam non perverse agam.
  0.825   25.2:25   Quis enim comedet et deliciis affluet sine eo?
  0.824   65.13:5    Sint mores sine avaritia; contenti præsentibus. Ipse enim dixit: " Non te deseram neque d…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.766   47.28:6    Non est hic: surrexit enim, sicut dixit. Venite, videte locum, ubi positus erat.
  0.617   49.24:6    Non est hic, sed surrexit. Recordamini qualiter locutus est vobis, cum adhuc in Galilæa e…
  0.578   52.4:23   Non est autem scriptum tantum propter ipsum: reputatum est illi,
  0.484   49.22:57   " Et hic cum illo erat! ". At ille negavit eum dicens:
  0.479    1.26:2    Apparuitque ei Dominus et ait: "Ne descendas in Aegyptum, sed habita in terra, quam dixer…

========================================================================================
Concept: Shepherd Psalm
========================================================================================

── NABRE ─────────────────────────────────────────────────────────────
Query: "The Lord is my shepherd, I shall not want"

  MiniLM  (embedding)
  ───────────────────
  0.718   33.34:10   Thus says the Lord G<sm>OD</sm>: Look! I am coming against these shepherds. I will take m…
  0.668    2.22:30   You shall be a people sacred to me. Flesh torn to pieces in the field you shall not eat; …
  0.660   47.23:39   I tell you, you will not see me again until you say, ‘Blessed is he who comes in the name…
  0.654   45.11:16   For I am raising up a shepherd in the land who will take no note of those that disappear,…
  0.653   50.10:11   I am the good shepherd. A good shepherd lays down his life for the sheep.

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.502   23.62:7    <po>God alone is my rock and my salvation,</po> <poi>my fortress; I shall not fall.</poi>
  0.497   23.62:3    <po>God alone is my rock and salvation,</po> <poi>my fortress; I shall never fall.</poi>
  0.493   56.6:7    willingly serving the Lord and not human beings,
  0.485   30.32:38   They shall be my people, and I will be their God.
  0.468   23.71:12   <po>God, be not far from me;</po> <poi>my God, hasten to help me.</poi>

── CEI2008 ─────────────────────────────────────────────────────────────
Query: "Il Signore è il mio pastore, non manco di nulla"

  MiniLM  (embedding)
  ───────────────────
  0.961   23.23:1    Salmo. Di Davide. Il Signore è il mio pastore: non manco di nulla.
  0.850   23.146:1    Alleluia. Loda il Signore, anima mia:
  0.803   23.25:1    Di Davide. A te, Signore, innalzo l'anima mia,
  0.799   23.103:1    Di Davide. Benedici il Signore, anima mia, quanto è in me benedica il suo santo nome.
  0.791   49.1:47   e il mio spirito esulta in Dio, mio salvatore,

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.808   23.23:1    Salmo. Di Davide. Il Signore è il mio pastore: non manco di nulla.
  0.739   29.43:11   Io, io sono il Signore, fuori di me non c'è salvatore.
  0.671   29.45:5    Io sono il Signore e non c'è alcun altro, fuori di me non c'è dio; ti renderò pronto all'…
  0.666   23.118:7    Il Signore è per me, è il mio aiuto, e io guarderò dall'alto i miei nemici.
  0.659   23.94:22   Ma il Signore è il mio baluardo, roccia del mio rifugio è il mio Dio.

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Dominus pascit me, et nihil mihi deerit"

  MiniLM  (embedding)
  ───────────────────
  0.844   23.94:17   Nisi quia Dominus adiuvit me, paulo minus habitasset in loco silentii anima mea.
  0.823   23.23:1    PSALMUS. David. Dominus pascit me, et nihil mihi deerit:
  0.814   29.8:5    Et adiecit Dominus loqui ad me adhuc dicens:
  0.808   46.3:13   Invaluerunt super me verba vestra, dicit Dominus;
  0.803   23.119:57   HETH. Portio mea Dominus: dixi custodire verba tua.

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.767   23.23:1    PSALMUS. David. Dominus pascit me, et nihil mihi deerit:
  0.677   23.118:18   Castigans castigavit me Dominus et morti non tradidit me.
  0.662   29.45:5    Ego Dominus, et non est amplius: extra me non est Deus. Accinxi te, et non cognovisti me,
  0.655   29.43:11   Ego, ego sum Dominus, et non est absque me salvator.
  0.627   29.50:5    Dominus Deus aperuit mihi aurem; ego autem non rebellavi, retrorsum non abii.

========================================================================================
Concept: Word made flesh
========================================================================================

── NABRE ─────────────────────────────────────────────────────────────
Query: "And the Word became flesh and dwelt among us"

  MiniLM  (embedding)
  ───────────────────
  0.643   50.1:14   <pof>And the Word became flesh</pof> <poi>and made his dwelling among us,</poi> <poi>and …
  0.605   27.14:21   <po>And this became a snare for the world,</po> <poi>that people enslaved to either grief…
  0.577    4.11:33   But while the meat was still between their teeth, before it could be chewed, the L<sm>ORD…
  0.569    1.6:5    When the L<sm>ORD</sm> saw how great the wickedness of human beings was on earth, and how…
  0.555   56.2:3    All of us once lived among them in the desires of our flesh, following the wishes of the …

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.469   70.1:2    because of the truth that dwells in us and will be with us forever.
  0.429   33.2:2    As he spoke to me, the spirit entered into me and set me on my feet, and I heard the one …
  0.422   51.13:49   and the word of the Lord continued to spread through the whole region.
  0.421   33.3:24   but the spirit entered into me, set me on my feet; he spoke to me, and said: Go, shut you…
  0.412    1.2:7    then the L<sm>ORD</sm> God formed the man out of the dust of the ground and blew into his…

── CEI2008 ─────────────────────────────────────────────────────────────
Query: "E il Verbo si fece carne e venne ad abitare in mezzo a noi"

  MiniLM  (embedding)
  ───────────────────
  0.782   50.1:14   E il Verbo si fece carne e venne ad abitare in mezzo a noi; e noi abbiamo contemplato la …
  0.699   51.17:28   In lui infatti viviamo, ci muoviamo ed esistiamo, come hanno detto anche alcuni dei vostr…
  0.690   23.44:12   Ci hai consegnati come pecore da macello, ci hai dispersi in mezzo alle genti.
  0.665   23.66:11   Ci hai fatto cadere in un agguato, hai stretto i nostri fianchi in una morsa.
  0.664   23.124:3    allora ci avrebbero inghiottiti vivi, quando divampò contro di noi la loro collera.

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.505   33.3:24   Allora uno spirito entrò in me e mi fece alzare in piedi. Egli mi disse: "Va' e chiuditi …
  0.504   48.1:12   E subito lo Spirito lo sospinse nel deserto
  0.501   31.3:20   Ben se ne ricorda la mia anima e si accascia dentro di me.
  0.494   22.2:8    Giobbe prese un coccio per grattarsi e stava seduto in mezzo alla cenere.
  0.487   47.4:11   Allora il diavolo lo lasciò, ed ecco, degli angeli gli si avvicinarono e lo servivano.

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Et Verbum caro factum est et habitavit in nobis"

  MiniLM  (embedding)
  ───────────────────
  0.853   23.5:5    Quoniam non Deus volens iniquitatem tu es; neque habitabit iuxta te malignus,
  0.849   32.6:51   Cui ergo non notum est quod non sunt dii?
  0.841   50.11:10   si quis autem ambulaverit in nocte, offendit, quia lux non est in eo ".
  0.838   33.31:1    Et factum est in anno unde cimo, in tertio, una mensis, factum est verbum Domini ad me di…
  0.838   33.26:1    Et factum est in undecimo anno, prima mensis, factus est sermo Domini ad me dicens:

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.578   70.1:2    propter veritatem, quæ permanet in nobis et nobiscum erit in sempiternum.
  0.575   50.1:14   Et Verbum caro factum est et habitavit in nobis; et vidimus gloriam eius, gloriam quasi U…
  0.549   62.1:14   bonum depositum custodi per Spiritum Sanctum, qui habitat in nobis.
  0.547   29.40:8    Exsiccatum est fenum, et cecidit flos; verbum autem Dei nostri manet in æternum.
  0.524   10.23:2    Spiritus Domini locutus est per me, et sermo eius super linguam meam.


JohnRDOrazio · 2026-05-12T23:34:10Z

JohnRDOrazio
May 12, 2026
Maintainer Author

Follow-up: expanded to 20 probes (60 total queries)

The original probe set was 7 concepts × 3 languages = 21 queries — enough to be suggestive, not enough to be decisive. Particularly, the LaBSE regression on "Word made flesh" (both English and Italian) was a single data point. So the probe set was expanded to 20 concepts × 3 languages = 60 queries, weighted toward theological / metaphorical / liturgical language where the first run hinted at LaBSE weakness.

Probe set lives in scripts/probe_queries.json on feat/labse-ab-experiment (still local-only). New concepts added: Bread of life, Way / truth / life, True vine, Light of the world, Lamb of God, Fear of the Lord, Vanity of vanities, Our Father, Magnificat, Faith / hope / love, All things work for good, A time for everything, Damascus road.

Headline — per-language top-1 (across all 20 concepts)

| Language | MiniLM | LaBSE | Δ |
| --- | --- | --- | --- |
| English (NABRE) | 18 / 20 | 17 / 20 | LaBSE −1 |
| Italian (CEI2008) | 14.5 / 20 (Resurrection IT partial) | 17 / 20 | LaBSE +2.5 |
| Latin (NVBSE) | 10 / 20 | 17 / 20 | LaBSE +7 |
| Total | 42.5 / 60 | 51 / 60 | LaBSE +8.5 |

The original 7-probe result (15 / 21 vs 14 / 21) was nearly tied. With 20 probes, LaBSE is clearly ahead, and Latin accounts for most of the gap (+7 of the net +8.5, with Italian contributing +2.5 and English −1).
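
The per-language deltas and totals can be re-derived from the counts in the headline table. A minimal sketch, with the counts copied from that table; the dictionary layout and the `deltas` helper are illustrative and not taken from the probe script:

```python
# Top-1 hits out of 20 probes per language, copied from the headline table.
TOP1 = {
    "English (NABRE)": {"MiniLM": 18.0, "LaBSE": 17.0},
    "Italian (CEI2008)": {"MiniLM": 14.5, "LaBSE": 17.0},  # 14.5: one partial hit
    "Latin (NVBSE)": {"MiniLM": 10.0, "LaBSE": 17.0},
}


def deltas(top1):
    """Return (LaBSE minus MiniLM per language, total hits per model)."""
    per_lang = {lang: s["LaBSE"] - s["MiniLM"] for lang, s in top1.items()}
    totals = {m: sum(s[m] for s in top1.values()) for m in ("MiniLM", "LaBSE")}
    return per_lang, totals


per_lang, totals = deltas(TOP1)
for lang, d in per_lang.items():
    print(f"{lang}: LaBSE {d:+g}")
print(f"Total: MiniLM {totals['MiniLM']:g} / 60, LaBSE {totals['LaBSE']:g} / 60")
```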

The Latin story is now unambiguous

Of the 13 newly-added probes, MiniLM Latin gets 5 / 13 right; LaBSE Latin gets 12 / 13. The MiniLM misses are systematic — surface-token noise drags in unrelated verses:

| Probe (Latin query) | MiniLM `#1` (wrong unless marked ✓) | LaBSE `#1` (correct) |
| --- | --- | --- |
| `Ego sum panis vitae…` | Ps 25:16 "Respice in me et miserere mei" (token: `me`) | John 6:35 |
| `Ego sum via et veritas…` | John 8:15 "Vos secundum carnem iudicatis…" (token: `ego`) | John 14:6 |
| `Ecce agnus Dei…` | 1 Cor 1:29 "ut non glorietur omnis caro" (token: `Dei`) | John 1:29 |
| `Principium sapientiae…` | Prov 3:13 "Beatus homo qui invenit sapientiam" | Prov 9:10 |
| `Pater noster…` | Sir 36:4 (token: `sanctificatus`) | Matt 6:9 |
| `Diligentibus Deum omnia…` | 1 Thess 5:21 "omnia probate" (token: `omnia`) | Rom 8:28 |
| `Omnia tempus habent…` | Deut 34:2 "et omnem terram" (token: `omnia / omnem`) | Eccl 3:1 |
| `Saule, Saule, quid me…` | Acts 22:7 ✓ (parallel account of the same event; the only MiniLM hit in this table) | Acts 9:4 |

This is exactly the failure mode this discussion was opened about. The LaBSE column doesn't just rank the right verse first — it consistently provides a thematically coherent top-5, even when the query is terse.
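
One way to make the "token" annotations reproducible is to list the word forms a query shares with a returned verse. The sketch below is a hypothetical diagnostic, not part of the probe script. It assumes simple lowercase word matching and folds the `æ` / `œ` ligatures, because the NVBSE text prints `cælum` where the queries spell `caelum`:

```python
import re

# NVBSE verses use ligatures (cælum); the probe queries spell them out (caelum).
_LIGATURES = str.maketrans({"æ": "ae", "œ": "oe"})


def tokens(text):
    """Lowercase the text, fold ligatures, and return the set of word forms."""
    normalized = text.lower().translate(_LIGATURES)
    return set(re.findall(r"[^\W\d_]+", normalized))


def shared_tokens(query, verse):
    """Word forms that appear in both the query and the verse."""
    return tokens(query) & tokens(verse)


# Lamb of God probe against the MiniLM #1 result (1 Cor 1:29).
lamb_query = "Ecce agnus Dei, qui tollit peccatum mundi"
minilm_top1 = "ut non glorietur omnis caro in conspectu Dei."
print(sorted(shared_tokens(lamb_query, minilm_top1)))

# Genesis probe against its expected verse: the ligature fold lets cælum match.
genesis_query = "In principio creavit Deus caelum et terram"
genesis_verse = "In principio creavit Deus cælum et terram."
print(len(shared_tokens(genesis_query, genesis_verse)))
```

A wrong `#1` that shares only one or two function-like words with the query is a candidate token-noise hit. Inflected forms (`omnia` / `omnem`) would need lemmatization, which this sketch does not attempt.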

LaBSE regressions on the English/Italian side

Three new regressions (two English, one Italian), plus the original Word made flesh regression from the first run. All on highly recognizable verses:

| Probe (query) | MiniLM `#1` (correct) | LaBSE `#1` (wrong) |
| --- | --- | --- |
| `Behold, the Lamb of God…` (English) | John 1:29 ✓ | Ps 82:8 "Arise, O God, judge the earth" |
| `Our Father in heaven…` (English) | Matt 6:10 ✓ | 2 Sam 7:16 "Your house and your kingdom" (token: `kingdom`) |
| `My soul proclaims the greatness…` (Italian) | Luke 1:47 ✓ | Ps 84:3 "L'anima mia anela…" (close paraphrase but not Magnificat) |
| `And the Word became flesh…` (English, from prior run) | John 1:14 ✓ | 2 John 1:2 "because of the truth that dwells in us…" |

Pattern: the regressions cluster on liturgical / Christological metaphor. LaBSE may have weaker representation of these in its training. For the new regressions the correct verse usually stays in the top-5, at #2 or #3, while the #1 slot goes to a thematically adjacent but technically wrong verse. The first-run Word made flesh probe is the exception: John 1:14 does not appear anywhere in LaBSE's English or Italian top-5.
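
Whether a miss is a near miss or a full miss can be checked by recording the rank of the expected verse rather than only a top-1 yes/no. A minimal sketch, using the `book.chapter:verse` references exactly as they appear in the raw output; `rank_of` is an illustrative helper, not a function from the probe script:

```python
def rank_of(expected_ref, ranked_refs):
    """Return the 1-based rank of expected_ref in ranked_refs, or None if absent."""
    for rank, ref in enumerate(ranked_refs, start=1):
        if ref == expected_ref:
            return rank
    return None


# LaBSE top-5 for the "Word made flesh" probe, copied from the raw output.
labse_latin = ["70.1:2", "50.1:14", "62.1:14", "29.40:8", "10.23:2"]
labse_english = ["70.1:2", "33.2:2", "51.13:49", "33.3:24", "1.2:7"]

print(rank_of("50.1:14", labse_latin))    # 2: near miss
print(rank_of("50.1:14", labse_english))  # None: John 1:14 is outside the top-5
```

Reporting hit@1 and hit@5 side by side from these ranks would separate "right verse, wrong slot" from "right verse not retrieved".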

Score-scale shift, refined

With more probes, the score distribution gap is clearer:

| | MiniLM (`#1` hit) | LaBSE (`#1` hit) |
| --- | --- | --- |
| Min | 0.692 | 0.445 |
| Median | 0.879 | 0.717 |
| Max | 0.995 | 0.984 |

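LaBSE genuinely uses more of [0, 1], and its scores sit lower overall. Under LaBSE, the current threshold=0.5 default on /v3/search/semantic would reject some correct matches (the lowest `#1` hit scores 0.445), whereas every MiniLM `#1` hit clears it comfortably (minimum 0.692). Recalibration before cutover is non-optional.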
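
A quick way to explore candidate thresholds is to count how many correct `#1` hits survive each one. The sketch below uses only a subset of scores: the Latin (NVBSE) probes in this comment's raw output where `#1` is clearly the expected verse. It is not the full 60-query distribution, and the `>=` comparison is an assumption about how the endpoint applies its threshold:

```python
def kept(scores, threshold):
    """Count scores that would pass a `score >= threshold` filter."""
    return sum(1 for s in scores if s >= threshold)


# Scores of correct #1 hits on Latin probes, read off the raw output.
LABSE = [0.948, 0.645, 0.840, 0.805, 0.766, 0.767, 0.592,
         0.821, 0.577, 0.691, 0.540, 0.779, 0.856]
MINILM = [0.899, 0.840, 0.947, 0.931, 0.874, 0.850, 0.865]

for t in (0.35, 0.5, 0.6, 0.7):
    print(f"threshold={t}: "
          f"MiniLM keeps {kept(MINILM, t)}/{len(MINILM)}, "
          f"LaBSE keeps {kept(LABSE, t)}/{len(LABSE)}")
```

On this subset every MiniLM hit survives even at 0.7, while LaBSE starts losing correct hits between 0.5 and 0.6. A real calibration would also need the scores of wrong results to estimate precision at each threshold.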

Decision-strength read

The Latin numbers more than cancel the English regression, and Italian is comfortably up. Most importantly, the original concern from this discussion (Latin token-noise hits) is unambiguously fixed by LaBSE.

Two caveats remain before flipping the switch on production:

The liturgical English/Italian regressions (Lamb of God, Our Father, Magnificat IT, Word made flesh) should get a closer look: is this consistently weaker LaBSE handling of theological metaphor, or noise? If it's the former, a hybrid or fallback approach may matter for users searching the Gospels in English.
Threshold recalibration on /v3/search/semantic is required for the LaBSE score distribution. Because LaBSE scores run lower, the 0.5 default likely needs to drop to something like 0.35 to retain similar recall.

#71 (commit-pinning) gets unblocked by this analysis: once the cutover model is settled, pin its commit hash.
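
For reference, pinning could look roughly like the sketch below. It relies on the `revision` argument of the `SentenceTransformer` constructor (available in recent sentence-transformers releases). `load_pinned` and `is_full_commit_sha` are hypothetical helpers, and the model ID in the usage comment is an assumption, since this thread does not name the exact LaBSE repository:

```python
import re

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_commit_sha(revision):
    """True only for a full 40-character lowercase hex commit hash."""
    return bool(_FULL_SHA.fullmatch(revision))


def load_pinned(model_id, revision):
    """Load a model at an exact commit, refusing branch names such as 'main'."""
    if not is_full_commit_sha(revision):
        raise ValueError(f"revision must be a full commit hash, got {revision!r}")
    # Imported lazily so the validation above works without the dependency.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_id, revision=revision)


# Usage (model ID assumed, hash deliberately left as a placeholder):
# model = load_pinned("sentence-transformers/LaBSE", "<40-character commit hash>")
```

Rejecting branch names up front keeps a moving `main` from silently changing the vectors that stored embeddings are compared against.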

Raw output (Latin probes only — full set is 90KB, exceeds GitHub comment cap)

Latin probes side-by-side

========================================================================================
Concept: Genesis creation
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "In principio creavit Deus caelum et terram"

  MiniLM  (embedding)
  ───────────────────
  0.899    1.1:1    In principio creavit Deus cælum et terram.
  0.776    1.14:19   benedixit ei et ait: "Benedictus Abram a Deo excelso, qui creavit cælum et terram
  0.770    5.10:14   En Domini Dei tui cælum est et cælum cæli, terra et omnia, quæ in ea sunt;
  0.763   22.38:37   Quis recensebit nubes in sapientia, et utres cæli quis declinabit,
  0.756   23.96:11   Lætentur cæli, et exsultet terra, sonet mare et plenitudo eius;

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.948    1.1:1    In principio creavit Deus cælum et terram.
  0.694    1.2:4    Istæ sunt generationes cæli et terræ, quando creata sunt. In die quo fecit Dominus Deus t…
  0.681   23.121:2    Auxilium meum a Domino, qui fecit cælum et terram.
  0.672   23.115:15   Benedicti vos a Domino, qui fecit cælum et terram.
  0.662   28.17:1    Deus creavit de terra hominem et secundum imaginem suam fecit illum;

========================================================================================
Concept: Love your neighbor
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Diliges proximum tuum sicut te ipsum"

  MiniLM  (embedding)
  ───────────────────
  0.840   47.22:39   Secundum autem simile est huic: Diliges proximum tuum sicut teipsum.
  0.823   28.44:18   propter eum dimissum est reliquum terræ, cum factum est diluvium:
  0.805   47.19:19   honora patrem et matrem et diliges proximum tuum sicut teipsum ".
  0.788   48.12:31   Secundum est illud: "Diliges proximum tuum tamquam teipsum". Maius horum aliud mandatum n…
  0.772   23.7:13   Nonne iterum gladium suum exacuit, arcum suum tetendit et paravit illum?

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.645   47.22:39   Secundum autem simile est huic: Diliges proximum tuum sicut teipsum.
  0.602   28.13:19   Omne animal diligit simile sibi: sic et omnis homo proximum sibi.
  0.582   47.5:43   Audistis quia dictum est: "Diliges proximum tuum et odio habebis inimicum tuum".
  0.550    3.19:18   Non quæres ultionem nec irasceris civibus tuis. Diliges proximum tuum sicut teipsum. Ego …
  0.541   50.15:12   Hoc est præceptum meum, ut diligatis invicem, sicut dilexi vos;

========================================================================================
Concept: Sermon on the mount — poor in spirit
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Beati pauperes spiritu, quoniam ipsorum est regnum caelorum"

  MiniLM  (embedding)
  ───────────────────
  0.947   47.5:3    " Beati pauperes spiritu, quoniam ipsorum est regnum cælorum.
  0.789   47.5:10   Beati, qui persecutionem patiuntur propter iustitiam, quoniam ipsorum est regnum cælorum.
  0.783   29.28:6    et spiritus iudicii sedenti ad iudicandum et fortitudo vertentibus prœlium usque ad porta…
  0.781   65.3:7    Quapropter, sicut dicit Spiritus Sanctus: " Hodie, si vocem eius audieritis,
  0.778    1.41:4    devoraveruntque septem boves pulchras et crassas. Expergefactus pharao

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.840   47.5:3    " Beati pauperes spiritu, quoniam ipsorum est regnum cælorum.
  0.617   47.5:10   Beati, qui persecutionem patiuntur propter iustitiam, quoniam ipsorum est regnum cælorum.
  0.554   23.144:15   Beatus populus, cui hæc sunt; beatus populus, cui Dominus est Deus.
  0.541   17.13:15   Anima mea, benedic Domino, regi magno,
  0.541   47.5:9    Beati pacifici, quoniam filii Dei vocabuntur.

========================================================================================
Concept: Last Supper — institution of the Eucharist
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Hoc est corpus meum, quod pro vobis datur; hoc facite in meam commemorationem"

  MiniLM  (embedding)
  ───────────────────
  0.931   53.11:24   et gratias agens fregit et dixit: " Hoc est corpus meum, quod pro vobis est; hoc facite i…
  0.856   49.22:19   Et accepto pane, gratias egit et fregit et dedit eis dicens: " Hoc est corpus meum, quod …
  0.801    1.17:10   Hoc est pactum meum, quod observabitis, inter me et vos et semen tuum post te. Circumcide…
  0.787   50.15:12   Hoc est præceptum meum, ut diligatis invicem, sicut dilexi vos;
  0.776   28.18:24   Memento iræ in die consummationis et, suo tempore, retributionis in conversione faciei.

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.805   53.11:24   et gratias agens fregit et dixit: " Hoc est corpus meum, quod pro vobis est; hoc facite i…
  0.706   49.22:19   Et accepto pane, gratias egit et fregit et dedit eis dicens: " Hoc est corpus meum, quod …
  0.679   64.1:12   quem remisi tibi: eum, hoc est viscera mea;
  0.621   23.116:7    Convertere, anima mea, in requiem tuam, quia Dominus benefecit tibi;
  0.618    1.23:4    "Advena sum et inquilinus apud vos; date mihi possessionem sepulcri vobiscum, ut sepeliam…

========================================================================================
Concept: Resurrection — empty tomb
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Non est hic, surrexit enim sicut dixit"

  MiniLM  (embedding)
  ───────────────────
  0.874   47.28:6    Non est hic: surrexit enim, sicut dixit. Venite, videte locum, ubi positus erat.
  0.831   25.4:10   quia si unus ceciderit, ab altero fulcietur. Væ soli! Cum ceciderit, non habet sublevante…
  0.829   22.34:31   Si enim dixit quispiam Deo: "Ferre debui! Iam non perverse agam.
  0.825   25.2:25   Quis enim comedet et deliciis affluet sine eo?
  0.824   65.13:5    Sint mores sine avaritia; contenti præsentibus. Ipse enim dixit: " Non te deseram neque d…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.766   47.28:6    Non est hic: surrexit enim, sicut dixit. Venite, videte locum, ubi positus erat.
  0.617   49.24:6    Non est hic, sed surrexit. Recordamini qualiter locutus est vobis, cum adhuc in Galilæa e…
  0.578   52.4:23   Non est autem scriptum tantum propter ipsum: reputatum est illi,
  0.484   49.22:57   " Et hic cum illo erat! ". At ille negavit eum dicens:
  0.479    1.26:2    Apparuitque ei Dominus et ait: "Ne descendas in Aegyptum, sed habita in terra, quam dixer…

========================================================================================
Concept: Shepherd Psalm
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Dominus pascit me, et nihil mihi deerit"

  MiniLM  (embedding)
  ───────────────────
  0.844   23.94:17   Nisi quia Dominus adiuvit me, paulo minus habitasset in loco silentii anima mea.
  0.823   23.23:1    PSALMUS. David. Dominus pascit me, et nihil mihi deerit:
  0.814   29.8:5    Et adiecit Dominus loqui ad me adhuc dicens:
  0.808   46.3:13   Invaluerunt super me verba vestra, dicit Dominus;
  0.803   23.119:57   HETH. Portio mea Dominus: dixi custodire verba tua.

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.767   23.23:1    PSALMUS. David. Dominus pascit me, et nihil mihi deerit:
  0.677   23.118:18   Castigans castigavit me Dominus et morti non tradidit me.
  0.662   29.45:5    Ego Dominus, et non est amplius: extra me non est Deus. Accinxi te, et non cognovisti me,
  0.655   29.43:11   Ego, ego sum Dominus, et non est absque me salvator.
  0.627   29.50:5    Dominus Deus aperuit mihi aurem; ego autem non rebellavi, retrorsum non abii.

========================================================================================
Concept: Word made flesh
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Et Verbum caro factum est et habitavit in nobis"

  MiniLM  (embedding)
  ───────────────────
  0.853   23.5:5    Quoniam non Deus volens iniquitatem tu es; neque habitabit iuxta te malignus,
  0.849   32.6:51   Cui ergo non notum est quod non sunt dii?
  0.841   50.11:10   si quis autem ambulaverit in nocte, offendit, quia lux non est in eo ".
  0.838   33.31:1    Et factum est in anno unde cimo, in tertio, una mensis, factum est verbum Domini ad me di…
  0.838   33.26:1    Et factum est in undecimo anno, prima mensis, factus est sermo Domini ad me dicens:

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.578   70.1:2    propter veritatem, quæ permanet in nobis et nobiscum erit in sempiternum.
  0.575   50.1:14   Et Verbum caro factum est et habitavit in nobis; et vidimus gloriam eius, gloriam quasi U…
  0.549   62.1:14   bonum depositum custodi per Spiritum Sanctum, qui habitat in nobis.
  0.547   29.40:8    Exsiccatum est fenum, et cecidit flos; verbum autem Dei nostri manet in æternum.
  0.524   10.23:2    Spiritus Domini locutus est per me, et sermo eius super linguam meam.

========================================================================================
Concept: Bread of life
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Ego sum panis vitae; qui veniet ad me non esuriet"

  MiniLM  (embedding)
  ───────────────────
  0.841   23.25:16   PHE. Respice in me et miserere mei, quia unicus et pauper sum ego.
  0.837   50.6:48   Ego sum panis vitæ.
  0.834   23.39:11   Amove a me plagas tuas: ab ictu manus tuæ ego defeci.
  0.833   23.26:11   Ego autem in innocentia mea ingressus sum; redime me et miserere mei.
  0.832   51.20:29   Ego scio quoniam intrabunt post discessionem meam lupi graves in vos non parcentes gregi;

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.592   50.6:35   Dixit eis Iesus: " Ego sum panis vitæ. Qui venit ad me, non esuriet; et, qui credit in me…
  0.553   22.7:16   Desperavi; nequaquam ultra iam vivam. Parce mihi, nihil enim sunt dies mei.
  0.546   50.11:25   Dixit ei Iesus: " Ego sum resurrectio et vita. Qui credit in me, etsi mortuus fuerit, viv…
  0.541   52.7:10   ego autem mortuus sum; et inventum est mihi mandatum, quod erat ad vitam, hoc esse ad mor…
  0.538   50.12:46   Ego lux in mundum veni, ut omnis, qui credit in me, in tenebris non maneat.

========================================================================================
Concept: Way, truth, life
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Ego sum via et veritas et vita; nemo venit ad Patrem nisi per me"

  MiniLM  (embedding)
  ───────────────────
  0.885   50.8:15   Vos secundum carnem iudicatis, ego non iudico quemquam.
  0.878   50.14:6    Dicit ei Iesus: " Ego sum via et veritas et vita; nemo venit ad Patrem nisi per me.
  0.877   50.8:50   Ego autem non quæro gloriam meam; est qui quærit et iudicat.
  0.871   50.17:16   De mundo non sunt, sicut ego non sum de mundo.
  0.871   29.43:25   Ego, ego sum ipse, qui deleo iniquitates tuas propter me et peccatorum tuorum non recorda…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.821   50.14:6    Dicit ei Iesus: " Ego sum via et veritas et vita; nemo venit ad Patrem nisi per me.
  0.629   50.16:10   de iustitia vero, quia ad Patrem vado, et iam non videtis me;
  0.607   50.16:32   Ecce venit hora et iam venit, ut dispergamini unusquisque in propria et me solum relinqua…
  0.604   50.8:16   Et si iudico ego, iudicium meum verum est, quia solus non sum, sed ego et, qui me misit, …
  0.603   50.16:28   Exivi a Patre et veni in mundum; iterum relinquo mundum et vado ad Patrem ".

========================================================================================
Concept: True vine
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Ego sum vitis, vos palmites"

  MiniLM  (embedding)
  ───────────────────
  0.847   50.6:48   Ego sum panis vitæ.
  0.778   28.24:7    Ego in altissimis habitavi, et thronus meus in columna nubis.
  0.761   50.15:5    Ego sum vitis, vos palmites. Qui manet in me, et ego in eo, hic fert fructum multum, quia…
  0.739   28.24:40   Ego sapientia effudi flumina,
  0.728   50.17:18   Sicut me misisti in mundum, et ego misi eos in mundum;

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.518   23.56:13   Super me sunt, Deus, vota tua; reddam laudationes tibi,
  0.494   23.73:23   Ego autem semper tecum; tenuisti manum dexteram meam.
  0.455   50.15:5    Ego sum vitis, vos palmites. Qui manet in me, et ego in eo, hic fert fructum multum, quia…
  0.451    3.21:19   si fracto pede vel manu,
  0.443   23.39:11   Amove a me plagas tuas: ab ictu manus tuæ ego defeci.

========================================================================================
Concept: Light of the world
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Ego sum lux mundi; qui sequitur me non ambulabit in tenebris"

  MiniLM  (embedding)
  ───────────────────
  0.902   50.12:46   Ego lux in mundum veni, ut omnis, qui credit in me, in tenebris non maneat.
  0.845   23.71:14   Ego autem semper sperabo et adiciam super omnem laudem tuam.
  0.839   57.3:4    quamquam ego habeam confidentiam et in carne. Si quis alius videtur confidere in carne, e…
  0.837   23.38:14   Ego autem tamquam surdus non audiebam et sicut mutus non aperiens os suum;
  0.834   49.22:29   et ego dispono vobis, sicut disposuit mihi Pater meus regnum,

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.717   50.12:46   Ego lux in mundum veni, ut omnis, qui credit in me, in tenebris non maneat.
  0.715   50.8:12   Iterum ergo locutus est eis Iesus dicens: " Ego sum lux mundi; qui sequitur me, non ambul…
  0.612   50.9:5    Quamdiu in mundo sum, lux sum mundi ".
  0.578   50.17:16   De mundo non sunt, sicut ego non sum de mundo.
  0.576   47.5:14   Vos estis lux mundi. Non potest civitas abscondi supra montem posita;

========================================================================================
Concept: Lamb of God
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Ecce agnus Dei, qui tollit peccatum mundi"

  MiniLM  (embedding)
  ───────────────────
  0.856   53.1:29   ut non glorietur omnis caro in conspectu Dei.
  0.846   25.2:25   Quis enim comedet et deliciis affluet sine eo?
  0.844   52.3:23   omnes enim peccaverunt et egent gloria Dei,
  0.838   25.8:2    Os regis observa et propter iuramenta Dei
  0.838   51.27:23   astitit enim mihi hac nocte angelus Dei, cuius sum ego, cui et deservio,

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.577   50.1:29   Altera die videt Iesum venientem ad se et ait: " Ecce agnus Dei, qui tollit peccatum mund…
  0.551   53.11:32   dum iudicamur autem, a Domino corripimur, ut non cum hoc mundo damnemur
  0.537   28.30:20   sic qui effugatur a Domino portans mercedes iniquitatis,
  0.522    3.7:11   Hæc est lex hostiæ pacificorum quæ offertur Domino;
  0.509   33.7:5    Hæc dicit Dominus Deus: Afflictio super afflictionem ecce venit.

========================================================================================
Concept: Fear of the Lord
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Principium sapientiae timor Domini"

  MiniLM  (embedding)
  ───────────────────
  0.799   24.3:13   Beatus homo, qui invenit sapientiam et qui affluit prudentia:
  0.798    4.14:17   Magnificetur ergo fortitudo Domini, sicut iurasti dicens:
  0.797    5.28:28   Percutiet te Dominus amentia et cæcitate ac stupore mentis;
  0.797   57.4:5    Modestia vestra nota sit omnibus hominibus. Dominus prope.
  0.791   23.9:8    HE. Dominus autem in æternum sedebit, paravit in iudicium thronum suum;

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.691   24.9:10   Principium sapientiæ timor Domini, et scientia Sancti est prudentia.
  0.568   28.21:13   consummatio timoris Dei sapientia et sensus.
  0.556   28.1:34   Sapientia enim et disciplina timor Domini, et quod beneplacitum est illi,
  0.547   28.1:14   Dilectio Dei honorabilis sapientia;
  0.537   28.1:22   Corona sapientiæ timor Domini, repollens pacem et salutis fructum:

========================================================================================
Concept: Vanity of vanities
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Vanitas vanitatum, omnia vanitas"

  MiniLM  (embedding)
  ───────────────────
  0.850   25.1:2    " Vanitas vanitatum, dixit Ecclesiastes, vanitas vanitatum et omnia vanitas ".
  0.844   25.12:8    Vanitas vanitatum, dixit Ecclesiastes, et omnia vanitas.
  0.740   61.1:6    a quibus quidam aberrantes conversi sunt in vaniloquium,
  0.719    2.35:8    et oleum ad luminaria concinnanda et aromata, ut conficiatur unguentum et thymiama suavis…
  0.718    2.31:8    mensamque et vasa eius, candelabrum purissimum cum vasis suis et altaria thymiamatis

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.540   25.1:2    " Vanitas vanitatum, dixit Ecclesiastes, vanitas vanitatum et omnia vanitas ".
  0.503   25.12:8    Vanitas vanitatum, dixit Ecclesiastes, et omnia vanitas.
  0.458   13.28:14   de auro in pondere per singula vasa ministerii, de omnibus vasis argenteis in pondere per…
  0.440   29.56:9    Omnes bestiæ agri, venite ad devorandum, universæ bestiæ saltus.
  0.432   52.12:10   caritate fraternitatis invicem diligentes, honore invicem prævenientes,

========================================================================================
Concept: Our Father
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Pater noster, qui es in caelis, sanctificetur nomen tuum; adveniat regnum tuum, fiat voluntas tua"

  MiniLM  (embedding)
  ───────────────────
  0.847   28.36:4    Sicut enim in conspectu eorum sanctificatus es in nobis, sic in conspectu nostro magnific…
  0.843   18.3:2    " Ecce nos pueri Nabuchodonosor regis magni adstamus coram te; utere nobis, quemadmodum p…
  0.843   65.1:2    in novissimis his diebus locutus est nobis in Filio, quem constituit heredem universorum,…
  0.843   29.62:2    Et videbunt gentes iustitiam tuam, et cuncti reges gloriam tuam; et vocaberis nomine novo…
  0.842   30.22:15   Numquid regnabis, quoniam gloriaris in cedris? Pater tuus numquid non comedit et bibit? S…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.779   47.6:9    Sic ergo vos orabitis: Pater noster, qui es in cælis, sanctificetur nomen tuum,
  0.732   49.11:2    Et ait illis: " Cum oratis, dicite: Pater, sanctificetur nomen tuum, adveniat regnum tuum;
  0.726   23.8:2    Domine, Dominus noster, quam admirabile est nomen tuum in universa terra, quoniam elevata…
  0.691   23.102:16   Et timebunt gentes nomen tuum, Domine, et omnes reges terræ gloriam tuam,
  0.687   23.8:10   Domine, Dominus noster, quam admirabile est nomen tuum in universa terra!

========================================================================================
Concept: Magnificat
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Magnificat anima mea Dominum, et exsultavit spiritus meus in Deo salvatore meo"

  MiniLM  (embedding)
  ───────────────────
  0.865   49.1:47   et exsultavit spiritus meus in Deo salvatore meo,
  0.854    9.26:24   Et sicut magnificata est anima tua hodie in oculis meis, sic magnificetur anima mea in oc…
  0.853   28.25:1    In tribus placitum est spiritui meo, quæ sunt probata coram Deo et hominibus:
  0.849   23.146:2    laudabo Dominum in vita mea, psallam Deo meo, quamdiu fuero.
  0.834   23.71:3    Esto mihi in rupem præsidii et in domum munitam, ut salvum me facias, quoniam fortitudo m…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.856   49.1:47   et exsultavit spiritus meus in Deo salvatore meo,
  0.760   23.104:1    Benedic, anima mea, Domino. Domine Deus meus, magnificatus es vehementer! Maiestatem et d…
  0.755   23.86:2    Custodi animam meam, quoniam sanctus sum; salvum fac servum tuum, Deus meus, sperantem in…
  0.739    2.15:2    Fortitudo mea et robur meum Dominus, et factus est mihi in salutem. Iste Deus meus, et gl…
  0.734   42.3:18   Ego autem in Domino gaudebo et exsultabo in Deo salvatore meo.

========================================================================================
Concept: Faith, hope, love
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Nunc autem manet fides, spes, caritas, tria haec; maior autem ex his est caritas"

  MiniLM  (embedding)
  ───────────────────
  0.962   53.13:13   Nunc autem manet fides, spes, caritas, tria hæc; maior autem ex his est caritas.
  0.865   20.6:58   Nunc itaque demus dextras hominibus istis et faciamus cum illis pacem et cum omni gente e…
  0.860   59.3:6    Nunc autem, veniente Timotheo ad nos a vobis et annuntiante nobis fidem et caritatem vest…
  0.857   50.17:7    Nunc cognoverunt quia omnia, quæ dedisti mihi, abs te sunt,
  0.853   52.11:30   Sicut enim aliquando vos non credidistis Deo, nunc autem misericordiam consecuti estis pr…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.972   53.13:13   Nunc autem manet fides, spes, caritas, tria hæc; maior autem ex his est caritas.
  0.612   73.2:19   Novi opera tua et caritatem et fidem et ministerium et patientiam tuam et opera tua novis…
  0.559   61.6:11   Tu autem, o homo Dei, hæc fuge; sectare vero iustitiam, pietatem, fidem, caritatem, patie…
  0.553   56.3:17   habitare Christum per fidem in cordibus vestris, in caritate radicati et fundati,
  0.549   54.8:7    Sed sicut in omnibus abundatis, fide et sermone et scientia et omni sollicitudine et cari…

========================================================================================
Concept: All things work for good
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Diligentibus Deum omnia cooperantur in bonum"

  MiniLM  (embedding)
  ───────────────────
  0.849   59.5:21   omnia autem probate, quod bonum est tenete,
  0.839   25.7:11   Bona est sapientia cum divitiis et prodest videntibus solem.
  0.836   28.20:4    Quam bonum est correptum manifestare pænitentiam! Sic enim effugies voluntarium peccatum.
  0.829   27.19:1    Impiis autem usque in novissimum sine misericordia ira supervenit; præsciebat enim et fu…
  0.826   28.41:3    O mors, bonum est iudicium tuum homini indigenti et, qui minoratur viribus,

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.684   52.8:28   Scimus autem quoniam diligentibus Deum omnia cooperantur in bonum, his, qui secundum prop…
  0.637   21.1:17   Per omnia benedictus Deus, qui tradidi
  0.596   63.2:11   Apparuit enim gratia Dei salutaris omnibus hominibus
  0.587   58.1:10   ut ambuletis digne Domino per omnia placentes, in omni opere bono fructificantes et cresc…
  0.572   54.9:8    Potens est autem Deus omnem gratiam abundare facere in vobis, ut, in omnibus semper omnem…

========================================================================================
Concept: A time for everything
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Omnia tempus habent, et suis spatiis transeunt universa sub caelo"

  MiniLM  (embedding)
  ───────────────────
  0.787    5.34:2    et universum Nephthali terramque Ephraim et Manasse et omnem terram Iudæ usque ad mare oc…
  0.743    1.7:19   Et aquæ prævaluerunt nimis super terram, opertique sunt omnes montes excelsi sub universo…
  0.726   34.11:37   Et deos patrum suorum non reputabit neque concupiscentiam feminarum nec quemquam deorum c…
  0.720    2.9:24   Et grando et ignis immixta pariter ferebantur; tantæque fuit magnitudinis, quanta ante nu…
  0.712   58.1:16   quia in ipso condita sunt universa in cælis et in terra, visibilia et invisibilia, sive t…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.738   25.3:1    Omnia tempus habent, et momentum suum cuique negotio sub cælo:
  0.569    1.1:17   Et posuit eas Deus in firmamento cæli, ut lucerent super terram
  0.567    1.2:1    Igitur perfecti sunt cæli et terra et omnis exercitus eorum.
  0.550   23.146:6    qui fecit cælum et terram, mare et omnia, quæ in eis sunt; qui custodit veritatem in sæcu…
  0.550   23.108:6    Exaltare super cælos, Deus, et super omnem terram gloria tua.

========================================================================================
Concept: Damascus road — Saul, why do you persecute me?
========================================================================================

── NVBSE ─────────────────────────────────────────────────────────────
Query: "Saule, Saule, quid me persequeris?"

  MiniLM  (embedding)
  ───────────────────
  0.789   51.22:7    et decidi in terram et audivi vocem dicentem mihi: "Saul, Saul, quid me persequeris?".
  0.764    9.9:26   Cumque mane surrexissent, et iam elucesceret, vocavit Samuel Saul in solario dicens: " Su…
  0.755   51.9:4    et cadens in terram audivit vocem dicentem sibi: " Saul, Saul, quid me persequeris? ".
  0.708   13.10:11   Hoc cum audissent viri Iabes Galaad, omnia scilicet quæ Philisthim fecerunt super Saul,
  0.708   23.57:1    Magistro chori. Secundum " Ne destruxeris ". David. Miktam. Quando a Saul in cavernam fug…

  LaBSE   (embedding_labse)
  ─────────────────────────
  0.666   51.9:4    et cadens in terram audivit vocem dicentem sibi: " Saul, Saul, quid me persequeris? ".
  0.657   51.22:7    et decidi in terram et audivi vocem dicentem mihi: "Saul, Saul, quid me persequeris?".
  0.576   40.6:3    " Popule meus, quid feci tibi et quid molestus fui tibi? Responde mihi.
  0.562   24.31:2    Quid, fili mi? Quid, fili uteri mei? Quid, fili votorum meorum?
  0.536   23.39:8    Et nunc quæ est exspectatio mea, Domine? Spes mea apud te est.

The English and Italian probe outputs are produced by scripts/evaluate_labse_vs_minilm.py on the branch — re-runnable locally.

1 reply

JohnRDOrazio May 12, 2026
Maintainer Author

probe_queries.json
Here is the full probe output

JohnRDOrazio · 2026-05-12T23:43:27Z

JohnRDOrazio
May 12, 2026
Maintainer Author

Investigation: are the LaBSE English regressions a systematic weakness?

The previous follow-up identified four cases where LaBSE missed verses MiniLM caught: Lamb of God EN, Our Father EN, Magnificat IT, Word made flesh EN/IT/LA. The shared property — all liturgical / Christological — suggested a possible category-level weakness in LaBSE's English handling. That would be a real concern if true, because English is the largest user base for the API.

To test this, I ran a focused supplementary probe set of 10 additional English-only Christological / liturgical verses: I-Am sayings, eucharistic narrative, Annunciation, Isaian messianic prophecies, Pauline Christological compression, the Gloria. Probe file: scripts/probe_queries_liturgical_followup.json on the branch.

Result: the hypothesis does not survive

`#`	Concept	MiniLM	LaBSE	Notes
1	I am the resurrection and the life	✓	✓	tie
2	I am the good shepherd	✓	✓	tie (both 0.92+)
3	I am the Alpha and the Omega	✓	✓	tie
4	Take, eat — this is my body	✗ `Acts 27:34` "take some food"	✓	LaBSE wins
5	Annunciation — Gabriel sent to a virgin	✗ `Luke 1:19` (Gabriel's other speech)	✓	LaBSE wins
6	For a child is born	✓	✓	tie
7	Virgin shall conceive	✓	✓	tie
8	He humbled himself unto death	✓	✗ (`2 Macc 6:19`)	MiniLM wins — `Phil 2:8` not in LaBSE top-5
9	Christ in you, the hope of glory	✗ `Titus 2:13` (token: `glory`)	✓	LaBSE wins
10	Gloria — Glory to God in the highest	✗ `Rom 1:7` "grace to you and peace"	✓	LaBSE wins

Tally: LaBSE 4 wins, MiniLM 1 win, 5 ties.

On a dedicated set of 10 English Christological / liturgical probes, LaBSE out-performs MiniLM. The MiniLM #1 misses here are the exact token-noise pattern observed in the Latin column earlier (random Pauline openings, take some food, etc.), just less frequent in English than in Latin.
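For anyone re-checking the tally, here is a minimal sketch of the counting rule, assuming a "win" means one model has the expected verse at #1 and the other does not. The outcomes are transcribed by hand from the table above; `RESULTS` and `tally` are illustrative names, not part of the evaluation script.

```python
# Top-1 outcomes per probe number, transcribed from the table above:
# probe number -> (MiniLM hit at #1, LaBSE hit at #1)
RESULTS = {
    1: (True, True),
    2: (True, True),
    3: (True, True),
    4: (False, True),
    5: (False, True),
    6: (True, True),
    7: (True, True),
    8: (True, False),
    9: (False, True),
    10: (False, True),
}


def tally(results):
    """Count wins per model; identical outcomes (both hit or both miss) count as a tie."""
    counts = {"LaBSE": 0, "MiniLM": 0, "tie": 0}
    for minilm_hit, labse_hit in results.values():
        if minilm_hit == labse_hit:
            counts["tie"] += 1
        elif labse_hit:
            counts["LaBSE"] += 1
        else:
            counts["MiniLM"] += 1
    return counts


print(tally(RESULTS))  # {'LaBSE': 4, 'MiniLM': 1, 'tie': 5}
```

Note that this rule would also count a probe that both models miss as a tie; that case does not occur in this set.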

What the original four regressions actually were

Looking back at the misses with this updated context, they read as isolated per-verse artifacts rather than a class:

Verse	Diagnosis
`Lamb of God EN`	Query is unusually short. LaBSE's neighbors are generic theophany / judgment psalms. Probable tokenization / sentence-length interaction.
`Our Father EN`	Query is unusually long with two parallel clauses. LaBSE picks up `your kingdom` and `name` as independent anchors, surfacing kingship verses. Possible clause-blending.
`Magnificat IT`	LaBSE picks `Ps 84:3` ("l'anima mia anela…") — same opening phrase, different verse. Close paraphrase, not a wild miss.
`Word made flesh EN/IT/LA`	The only case with consistent cross-language failure (in Latin the correct verse is `#2`, not `#1`). Highly compressed theological language. Single data point.
`He humbled himself unto death EN` (new)	Same shape as Word-made-flesh: Pauline metaphor in compressed form (`obedient unto death`).

Of the five, two share a property — highly compressed Pauline/Johannine theological metaphor (Verbum caro factum est, humiliavit semetipsum). These may be a genuine LaBSE blind spot at the tightest end of theological compression. The other three are language-/length-specific quirks.

Updated read

The case for LaBSE strengthens. Across the now 30 probes × 3 languages = 90 queries (plus 10 EN-only = 100 queries total):

Latin: LaBSE clearly better — the original concern of this discussion, decisively addressed.
Italian: LaBSE slightly better.
English: roughly equal, with LaBSE recovering more token-noise misses than it loses on tightly-compressed theological metaphor.

The previously-flagged "liturgical regression" caveat from the prior follow-up was overstated. The actual residual issue is narrower: ~1–2 known verses with abstract Pauline / Johannine theological compression. Worth tracking post-cutover but not blocking.

Updated recommended next steps

Proceed toward LaBSE cutover. The Latin gains are decisive and English is at worst neutral.
Recalibrate threshold default on /v3/search/semantic (LaBSE median #1 score is 0.72 vs MiniLM's 0.88; today's 0.5 default would let through significantly more weak matches).
Document known-edge-case verses (Word made flesh, Phil 2:8) so they're in the user-visible release notes rather than discovered in the wild.
Resolve #71 (Pin SentenceTransformer model revision for reproducible builds) by pinning to the LaBSE commit hash chosen for the cutover.
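As a starting point for the threshold recalibration, here is a rough sketch that picks a candidate cut-off from observed scores. The inputs are the LaBSE scores for the six complete NVBSE concepts in the raw output above (#1 score and #5 score per concept). Treating the #5 neighbour as a stand-in for a "weak match" is an assumption, and the midpoint-of-medians rule in `suggest_threshold` is a hypothetical heuristic, not something the API does today.

```python
from statistics import median

# LaBSE scores transcribed from the NVBSE probe output in this thread.
# Order: Our Father, Magnificat, Faith/hope/love, All things work for good,
# A time for everything, Damascus road.
LABSE_TOP1 = [0.779, 0.856, 0.972, 0.684, 0.738, 0.666]
# Rank-5 neighbours, used here as a rough proxy for weak matches (assumption).
LABSE_RANK5 = [0.687, 0.734, 0.549, 0.572, 0.550, 0.536]


def suggest_threshold(strong, weak):
    """Hypothetical heuristic: midpoint between the median strong and median weak score."""
    return (median(strong) + median(weak)) / 2


def passed(scores, threshold):
    """Number of scores that a given threshold would let through."""
    return sum(score >= threshold for score in scores)


candidate = suggest_threshold(LABSE_TOP1, LABSE_RANK5)
print(f"candidate threshold: {candidate:.2f}")
print("rank-5 neighbours passing 0.5:", passed(LABSE_RANK5, 0.5))
print("rank-5 neighbours passing candidate:", passed(LABSE_RANK5, candidate))
print("top-1 hits passing candidate:", passed(LABSE_TOP1, candidate))
```

Six Latin concepts is far too small a sample to set a production default; the same calculation would need the full probe set across all three languages before changing anything on /v3/search/semantic.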

1 reply

JohnRDOrazio May 12, 2026
Maintainer Author

probe_queries_liturgical_followup.json

JohnRDOrazio · 2026-05-17T05:35:06Z

JohnRDOrazio
May 17, 2026
Maintainer Author

LaBSE is highly effective for Biblical and Ecclesiastical Latin because its training corpus includes large-scale parallel data from sources like the Latin Vulgate and subsequent liturgical translations. [1, 2, 3]

Why LaBSE works for Church Latin

Parallel Training: Unlike "monolingual" models (e.g., LatinBERT), LaBSE was trained on bitext (sentence pairs). Since the Bible is the most translated text in history, the model has a very deep "understanding" of the semantic relationship between the Latin Vulgate and its modern language counterparts.
Vocabulary Coverage: It handles the specialized theological vocabulary (e.g., gratia, salvatio, incarnatio) better than models trained only on Classical texts (like Cicero or Virgil).
Intertextual Mapping: Research shows LaBSE is particularly strong at identifying biblical citations and allusions in Patristic literature (e.g., the works of St. Augustine). [1, 4, 5, 6, 7, 8]

Performance vs. Alternatives

Feature [4, 9, 10, 11]	LaBSE	LatinBERT
Best Use Case	Semantic Search & Cross-lingual retrieval	Grammatical analysis & Word-level tasks
Source Data	Multilingual bitext (inc. Vulgate)	Large Latin-only corpus (Classical focused)
Search Accuracy	Higher for sentence-level similarity	Lower for search without fine-tuning

Tips for your Implementation

Normalization: Ecclesiastical Latin often uses "j" for "i" (e.g., Jesus vs. Iesus) and "v" for "u". While LaBSE is robust, pre-processing your text to a standard orthography (usually all "u" and "i") can slightly improve matching.
Cross-Lingual Queries: You can search your Latin corpus using English or Italian queries directly. The model's shared vector space makes it possible to find "The Prodigal Son" by searching for those English terms against a Latin text of the Vulgate.
Use Cosine Similarity: For the retrieval step, always use cosine similarity rather than Euclidean distance for the best semantic results with LaBSE. [1, 10, 11, 12, 13]
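The normalization and cosine-similarity tips above can be sketched as follows, using only the standard library. `normalize_latin` and `cosine_similarity` are illustrative names, not existing project functions. Expanding the æ/œ ligatures is an extra assumption beyond the tip itself, prompted by the probe output in this thread, where queries use "caelis" while NVBSE verses use "cælis". Whether any of this actually improves LaBSE retrieval would need to be measured with the probe set.

```python
import math
import unicodedata

# Ligature expansion is an assumption added for this sketch (see the "cælis" vs "caelis" mismatch).
_LIGATURES = str.maketrans({"æ": "ae", "Æ": "Ae", "œ": "oe", "Œ": "Oe"})
# Fold j -> i and v -> u, as suggested in the normalization tip.
_JV = str.maketrans({"j": "i", "J": "I", "v": "u", "V": "U"})


def normalize_latin(text: str, fold_jv: bool = True) -> str:
    """Normalize Latin orthography; apply the same function to both query and corpus."""
    text = unicodedata.normalize("NFC", text).translate(_LIGATURES)
    if fold_jv:
        text = text.translate(_JV)
    return text


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


print(normalize_latin("qui es in cælis"))  # qui es in caelis
print(normalize_latin("Jesus"))            # Iesus
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

Folding "v" to "u" also rewrites words such as "Vanitas", so it only makes sense if the stored verse text is re-embedded with the same normalization; passing `fold_jv=False` keeps just the ligature expansion.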

Are you building a tool for scholarly research (like tracking how a specific verse is used across different Church Fathers) or a more general liturgical search engine?

[1] [https://iris.unimore.it](https://iris.unimore.it/handle/11380/1371269)
[2] [https://iris.unimore.it](https://iris.unimore.it/handle/11380/1371269)
[3] [https://discourse.computational-humanities-research.org](https://discourse.computational-humanities-research.org/t/testing-the-limits-of-neural-sentence-alignment-models-on-classical-greek-and-latin-texts-and-translations/2076)
[4] [https://www.youtube.com](https://www.youtube.com/watch?v=7tAWk_Coj-s)
[5] [https://www.youtube.com](https://www.youtube.com/watch?v=WJqz9kzLjF8&t=34)
[6] [https://www.researchgate.net](https://www.researchgate.net/publication/352365029_Profiling_of_Intertextuality_in_Latin_Literature_Using_Word_Embeddings)
[7] [https://journals.openedition.org](https://journals.openedition.org/ijcol/624)
[8] [https://www.itserr.it](https://www.itserr.it/the-biblical-heritage-in-ancient-latin-christian-literature-advancing-intertextual-mapping-through-sentence-embeddings/)
[9] [https://pubs.galatolo.me](https://pubs.galatolo.me/galatololatin.pdf)
[10] [https://www.scitepress.org](https://www.scitepress.org/Papers/2023/121347/121347.pdf)
[11] [https://arpi.unipi.it](https://arpi.unipi.it/handle/11568/1217095)
[12] [https://arxiv.org](https://arxiv.org/html/2501.10731v1)
[13] [https://arxiv.org](https://arxiv.org/pdf/2211.00046)

0 replies
