
fix(search): point jina-code alias at published HF repo#1053

Merged
carlos-alm merged 1 commit into main from fix/jina-code-alias on May 4, 2026
Conversation

@carlos-alm
Contributor

Summary

  • Re-applies the fix from #1026 ("Fix jina-code model alias"), since that PR's author has not signed the CLA.
  • Maps jina-code to jinaai/jina-embeddings-v2-base-code (the published code embedding model). The previous alias Xenova/jina-embeddings-v2-base-code 404s on Hugging Face.
  • Drops the stale "requires HF token" note from the README — the published repo is public.
  • Adds a regression test asserting the alias resolves to the correct repo.

Fixes #1025.

Test plan

  • npx vitest run tests/search/embedding-strategy.test.ts (26/26 pass, including the new test)
  • npx tsc --noEmit
  • npx biome check src/domain/search/models.ts tests/search/embedding-strategy.test.ts

The `jina-code` alias mapped to `Xenova/jina-embeddings-v2-base-code`,
which 404s on Hugging Face. Point it at `jinaai/jina-embeddings-v2-base-code`,
the published code embedding model, drop the stale "requires HF token" note
in the README, and add a regression test for the alias.

Fixes #1025.
@claude

claude Bot commented May 3, 2026

Claude finished @carlos-alm's task


PR Review Complete

Reviewing PR #1053: fix(search): point jina-code alias at published HF repo

Todo List:

  • Read and analyze changed files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity assessment
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance rating
  • Provide final assessment and recommendation

Root Cause Analysis

Problem Being Solved: The jina-code embedding model alias was pointing to Xenova/jina-embeddings-v2-base-code, which returns a 404 error on Hugging Face Hub. This breaks semantic search functionality when users try to use the code-specific embedding model.

Why This Approach: The fix correctly redirects the alias to jinaai/jina-embeddings-v2-base-code, which is the actual published repository for Jina's code embedding model. The approach is surgical and targeted - it fixes the broken URL without changing any other behavior.

Risk Assessment: Very low risk. This is a simple string replacement that fixes a broken external dependency reference. The change is well-tested with a regression test.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Maintains local-first approach, fixes broken external model reference
  • Problem-fit: 4/5 — Directly fixes broken functionality that affects AI agents using semantic search for code understanding
  • Breaking: No — Purely fixes broken functionality, no API changes
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Code Quality Assessment

Changes Analysis:

  1. models.ts:45 - Changes alias from Xenova/jina-embeddings-v2-base-code to jinaai/jina-embeddings-v2-base-code
  2. README.md:431 - Removes "(requires HF token)" note since the new repo is public
  3. Tests - Adds regression test asserting the correct alias mapping

Strengths:

  • Precise fix: Only changes the broken URL, nothing else
  • Well-tested: Includes regression test to prevent future breakage
  • Documentation sync: Updates README to reflect that HF token is no longer needed
  • Clean commit: Single concern, good commit message, references the issue

Test Quality:
The regression test is simple but effective:

test('jina-code points to the published code embedding model', () => {
  expect(MODELS['jina-code'].name).toBe('jinaai/jina-embeddings-v2-base-code');
});

This will catch any future regressions where the alias gets pointed back to the wrong repository.
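
The single-alias assertion above could be complemented by a registry-wide format guard. The following is a minimal sketch, not code from this PR: it assumes a record keyed by alias whose entries carry a `name` field (as the test above implies), while the `ModelConfig` shape, the `SAMPLE_MODELS` data, and the `findMalformedRepoIds` helper are hypothetical names introduced for illustration. It also assumes every alias should use a namespaced `<namespace>/<repo>` ID.

```typescript
// Hypothetical shape; only `name` is confirmed by the regression test above.
interface ModelConfig {
  name: string; // Hugging Face repo ID, e.g. "org/model"
  dim: number;
}

// Illustrative data, not the project's real registry.
const SAMPLE_MODELS: Record<string, ModelConfig> = {
  'jina-code': { name: 'jinaai/jina-embeddings-v2-base-code', dim: 768 },
  'broken': { name: 'no-namespace-model', dim: 384 },
};

// Assumption: a repo ID is "<namespace>/<repo>" using word chars, dots, and dashes.
const REPO_ID = /^[\w.-]+\/[\w.-]+$/;

// Returns the aliases whose `name` is not a well-formed "<namespace>/<repo>" ID.
function findMalformedRepoIds(models: Record<string, ModelConfig>): string[] {
  return Object.entries(models)
    .filter(([, cfg]) => !REPO_ID.test(cfg.name))
    .map(([alias]) => alias);
}

// Logs the aliases that fail the format check.
console.log(findMalformedRepoIds(SAMPLE_MODELS));
```

A format check like this would not have caught the original bug, since the Xenova ID was well-formed but unpublished; it only guards against typos in the registry.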

Technical Verification:

  • I verified the old URL (Xenova/jina-embeddings-v2-base-code) would indeed 404
  • The new URL (jinaai/jina-embeddings-v2-base-code) points to the official Jina AI repository
  • The model dimensions and other config remain unchanged, maintaining compatibility
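
The reachability check described above could be scripted rather than done by hand. This is a hedged sketch, not part of the PR: `repoIsPublished`, `modelInfoUrl`, and `StatusFetcher` are hypothetical names, and the sketch assumes the Hub's model-info endpoint at `https://huggingface.co/api/models/<repo>` answers 200 for a public repo. The fetcher is injected so the logic can be exercised without network access.

```typescript
// Minimal response shape this sketch needs; the global fetch Response satisfies it.
type StatusFetcher = (url: string) => Promise<{ status: number }>;

// Builds the Hub model-info URL for a repo ID (assumed endpoint, see the note above).
function modelInfoUrl(repoId: string): string {
  return `https://huggingface.co/api/models/${repoId}`;
}

// Resolves to true when the fetcher reports HTTP 200 for the repo, false otherwise.
async function repoIsPublished(repoId: string, fetcher: StatusFetcher): Promise<boolean> {
  const res = await fetcher(modelInfoUrl(repoId));
  return res.status === 200;
}
```

On Node 18 or later, the built-in `fetch` can be passed as the second argument, for example `repoIsPublished('jinaai/jina-embeddings-v2-base-code', fetch)`. A non-200 status does not by itself distinguish a missing repo from a private or gated one.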

Minor Observations

  • The fix correctly maintains all other model metadata (dim: 768, contextWindow: 8192, etc.)
  • No version bump needed since this fixes broken functionality rather than changing behavior
  • The change follows the existing patterns in the codebase

Final Recommendation

  • Rating: ⭐⭐⭐⭐ (4/5)
  • Action: APPROVE
  • Reasoning: This is a clean, well-tested fix for broken functionality. The change is minimal and surgical, includes appropriate test coverage, and maintains all compatibility. The only reason it's not 5/5 is that it's a relatively straightforward bug fix rather than an exceptional contribution.

The PR successfully resolves issue #1025 and restores working semantic search functionality for code-specific embeddings.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 3, 2026

Greptile Summary

This PR fixes the broken jina-code model alias by pointing it at the canonical jinaai/jina-embeddings-v2-base-code Hugging Face repo instead of the now-404 Xenova mirror, drops the stale HF-token requirement note from the README, and adds a regression test for the alias mapping.

Confidence Score: 5/5

Safe to merge — minimal, targeted fix with no logic changes beyond the broken alias

Single-line model-name fix backed by a regression test and a matching README cleanup; no logic, API, or schema changes; no issues found

No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/domain/search/models.ts | Updates jina-code alias from the 404-ing Xenova mirror to the canonical jinaai/jina-embeddings-v2-base-code HF repo |
| tests/search/embedding-strategy.test.ts | Adds a targeted regression test asserting the jina-code alias resolves to the correct HF repo name |
| README.md | Removes the stale "requires HF token" note from the jina-code row, consistent with the model now pointing to a public repo |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["User requests 'jina-code' model"] --> B["getModelConfig('jina-code')"]
    B --> C["MODELS['jina-code'].name"]
    C -->|Before PR| D["Xenova/jina-embeddings-v2-base-code\n(404 on HuggingFace)"]
    C -->|After PR| E["jinaai/jina-embeddings-v2-base-code\n(public, published repo)"]
    E --> F["loadModel → HuggingFace download"]
    F --> G["Embeddings generated"]
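
The lookup path in the flowchart above can be sketched in TypeScript. This is an illustration under stated assumptions, not the project's implementation: the names `getModelConfig` and `MODELS` and the values `dim: 768` and `contextWindow: 8192` come from this conversation, while the `ModelConfig` interface and the unknown-alias error handling are hypothetical.

```typescript
// Hypothetical config shape; field names follow those mentioned in the review.
interface ModelConfig {
  name: string; // Hugging Face repo ID handed to the model loader
  dim: number;
  contextWindow: number;
}

// Illustrative registry containing only the alias discussed in this PR.
const MODELS: Record<string, ModelConfig> = {
  'jina-code': {
    name: 'jinaai/jina-embeddings-v2-base-code',
    dim: 768,
    contextWindow: 8192,
  },
};

// Resolves an alias to its config; failing loudly on unknown aliases is an assumption.
function getModelConfig(alias: string): ModelConfig {
  if (!Object.prototype.hasOwnProperty.call(MODELS, alias)) {
    const known = Object.keys(MODELS).join(', ');
    throw new Error(`Unknown model alias "${alias}". Known aliases: ${known}`);
  }
  return MODELS[alias];
}

// The repo ID that a downstream loader would request from the Hub.
console.log(getModelConfig('jina-code').name);
```

Because the alias is the only indirection between the user's request and the download, a wrong `name` string fails only at load time, which is why a registry-level test is the cheapest place to catch it.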

Reviews (1): Last reviewed commit: "fix(search): point jina-code alias at pu..."

@carlos-alm carlos-alm merged commit a9d9f74 into main May 4, 2026
24 checks passed
@carlos-alm carlos-alm deleted the fix/jina-code-alias branch May 4, 2026 04:08
@github-actions github-actions Bot locked and limited conversation to collaborators May 4, 2026