fix(intrinsics): pin catalogue entries to HF revision SHAs + deduplicate requirement_check (#1135)#1157
fix(intrinsics): pin catalogue entries to HF revision SHAs + deduplicate requirement_check (#1135)#1157planetf1 wants to merge 5 commits into
Conversation
…ate requirement_check - Add `revision` field to `IntriniscsCatalogEntry` (required; 40-char lowercase hex SHA or literal "main") - Add `validate_revision()` validation function alongside the type - Populate all 13 catalogue entries with upstream HEAD SHAs pinned 2026-05-26 (re-fetch before merge per issue guidance) - Collapse `requirement_check` / `requirement-check` duplicate to a single canonical `requirement_check` entry backed by `_CORE_R1_REPO`; remove deprecated `_CORE_REPO` (`rag-intrinsics-lib`) constant - Update `CustomIntrinsicAdapter` to pass `revision="main"` when injecting user-defined entries into the catalogue - Add `test/backends/test_adapters/test_catalog_revision.py` covering all acceptance criteria from issue generative-computing#1135 Closes generative-computing#1135 Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
- core.py:57 — call `call_intrinsic("requirement_check", …)` (underscore)
instead of the deleted hyphenated catalogue key; previously raised
`ValueError: Unknown intrinsic name 'requirement-check'` at runtime
- catalog.py — switch `_REVISION_HEX_RE.match` to `re.fullmatch` and drop
redundant `^…$` anchors (idiomatic Python ≥3.4)
- catalog.py — annotate each pinned SHA with `# main @ 2026-05-26` so
future bumps are auditable without git archaeology
- test_catalog_revision.py — iterate via the public
`known_intrinsic_names()` / `fetch_intrinsic_metadata()` API instead of
the private `_INTRINSICS_CATALOG_ENTRIES` symbol
Verified: 10 adapter tests + 277-test fast suite green.
Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…#1135 - Export `validate_revision` from `mellea.backends.adapters` so users authoring `CustomIntrinsicAdapter` subclasses can validate their own revision strings with the same contract as the catalogue - Drop the post-merge-stale "re-fetch upstream HEAD before merging" reminder; per-SHA `# main @ 2026-05-26` annotations remain - Add `# TODO(phase-2.2)` markers at the two `obtain_io_yaml` / `obtain_lora` call sites so the inert-revision gap is visible - Test: replace inline format check with a `validate_revision()` call so the assertion can't drift from the validator's contract - Test: make `test_revision_round_trip_via_fetch` tolerant of `"main"` in case `answerability` is ever flipped to track-latest - Test: drop the `if __name__ == "__main__":` block (suite runs via `uv run pytest`) Tests: 10 adapter tests pass; ruff clean. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Upstream test_catalog.py constructs `IntriniscsCatalogEntry` directly without `revision`, which now fails after the field was made required. Adds `revision="main"` so the test re-passes against the new contract. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
83c131c to
71582ba
Compare
There was a problem hiding this comment.
I'm slightly skeptical that this PR pinning version numbers should happen first. After seeing it, it seems to stick us into the catalog model quite firmly and limit our ability to add custom adapters (with the different function / download path) as mentioned in the proposal.
Additionally, we may want to introduce version warnings. Currently, we check if an adapter with the same name is currently added; but with explicit versioning, should we also compare that the version metadata matches?
I do think this is the correct direction though. I think the use case I'm most worried about is the following:
- Someone adds an adapter with the name "requirement-check" pointing to a custom adapter
- Our top-level wrapper function
check_requirementdoes it's check to add the adapter - We see that the adapter with the same name has already been added (since we aren't checking additional version metadata)
- Our wrapper fails because the provided custom adapter with the same name results in a different output structure
It might be enough in those cases to check the io.yaml declared output structure and not require a strict version match (or allow overriding the version check), etc...
-------------------- Edit --------------------
The more I think about this, the more I'm okay with version pinning these catalog entries. I think the other option is to actually pin on functionality as well (I started commenting in this one: #1158 (comment)). @psschwei said he will share a document about versioning so it might be best to wait until we get all that sorted out to merge this one and so that we can have the versioning conversation all in one spot.
| revision (str): HuggingFace commit SHA (40 lowercase hex chars) pinned | ||
| at catalogue-write time, or ``"main"`` to track the latest commit. |
There was a problem hiding this comment.
Is there a reason to limit this to these options. It looks like the huggingface_hub revision parameter accepts:
An optional Git revision id which can be a branch name, a tag, or a
commit hash.
There was a problem hiding this comment.
+1 for accepting tags/branches
There was a problem hiding this comment.
the other thing to take into account is that we will probably want to support a range of versions, similar to how one can do something like pytorch >= 1.2.3, <=2.0 for python dependencies (not something we can do today, but soon, so should make sure that our approach is eventually compatible with that)
There was a problem hiding this comment.
Same answer as the line 20 thread — relaxing the validator to accept any non-empty string covers tags and branches, no good reason to be stricter than HF.
On the version-range point: a plain string field can carry >=1.2.3,<=2.0 syntax just as well as a SHA or a tag, so this doesn't close off future options. Whether we eventually want a structured revision_spec (or a separate field) for ranges is a question for the versioning discussion.
There was a problem hiding this comment.
Same change as the line 20 thread — relaxed in the latest commit; tags and branch names accepted.
| """ | ||
| result_json = call_intrinsic( | ||
| "requirement-check", context, backend, kwargs={"requirement": requirement} | ||
| "requirement_check", context, backend, kwargs={"requirement": requirement} |
There was a problem hiding this comment.
Please run the tests to confirm that this works? My understanding is that this should fail. requirement-check is the new name for the adapter function. requirement_check was the name of a previous iteration that was only released for granite3.2 and 3.3. I am fine with removing support for those adapters / model versions.
There was a problem hiding this comment.
Confirmed. Direct HF check at the pinned SHA: HfApi().list_repo_files('ibm-granite/granitelib-core-r1.0', revision=d0a2a96...) shows requirement-check/ exists for granite-4.0-micro and granite-4.1-3b/8b/30b (both lora and alora variants), and no requirement_check/ path at all. So the current catalogue entry would fail to resolve at download time for any granite-4 model.
Will rename the catalogue entry and the call here to requirement-check. On dropping granite-3.2/3.3 support: with revision pinning targeted at granite-4 SHAs, that support is effectively already gone — supporting older models would need their own catalogue entries pointing at older repo SHAs, which we're not adding.
There was a problem hiding this comment.
Confirmed against the pinned SHA: HfApi().list_repo_files('ibm-granite/granitelib-core-r1.0', revision=d0a2a9…) returns requirement-check/ for every supported base model and no underscored variant. Rename applied in the latest commit — catalogue entry, this lookup, and the test assertion all flip to the hyphen form.
Do we want to allow people to override the canonical adapters? Or should we require them to use a distinct name? (I think I lean towards requiring unique names, but I could be easily nudged the other way too) |
| name="context-attribution", repo_id=_CORE_R1_REPO, revision=_CORE_R1_SHA | ||
| ), | ||
| IntriniscsCatalogEntry( | ||
| name="requirement_check", repo_id=_CORE_R1_REPO, revision=_CORE_R1_SHA |
There was a problem hiding this comment.
inconsistency here in terms of using dashes or underscores in names
I know python favors using underscores, but since the actual adapter names use dashes maybe it's better to use those? (I can see someone getting mildly annoyed if they copied a name from huggingface but it didn't work)
There was a problem hiding this comment.
also, claude flagged a couple of places where the hyphen version is still used:
- mellea/backends/openai.py:488
- mellea/backends/huggingface.py:400
- mellea/stdlib/requirements/requirement.py:83
- example docs/examples/intrinsics/intrinsics.py:31
There was a problem hiding this comment.
The underscores / dashes in the name field are required unless we add a distinct role field. The granite libraries team is not consistent in their naming conventions. (Also please address my above comment before making any changes to the underscore / dashed name.)
There was a problem hiding this comment.
Agreeing with Jake: the name field has to mirror whatever granite-libs publishes, and they're internally inconsistent — some intrinsics use hyphens, some underscores. We can't normalise here without a separate role field, which is a #929 concern.
The catalogue entry at line 110 is the only outlier. Every call already uses requirement-check — openai.py:488, huggingface.py:454, requirement.py:83, intrinsics.py:32. Renaming the entry to requirement-check aligns the codebase. The requirement_check reference at requirement.py:38 is the adapter's output JSON key (set by the model), not the adapter name, so that stays.
Verified Jake's hypothesis via HF: HfApi().list_repo_files('ibm-granite/granitelib-core-r1.0', revision=<pinned SHA>) shows requirement-check/ exists for all granite-4 variants (granite-4.0-micro, 4.1-3b/8b/30b, both lora and alora) and no requirement_check/ path at all. The current catalogue entry points at a non-existent folder.
There was a problem hiding this comment.
Rename applied: catalogue entry is now requirement-check, with the matching update in mellea/stdlib/components/intrinsic/core.py (see the line-49 thread). Test assertion flipped. The output JSON key ("requirement_check") is the adapter's own schema and stays as-is.
| # Mellea will update which repositories are linked as new ones come online. The original | ||
| # repos are on an older layout that will be changed. | ||
| _RAG_REPO = "ibm-granite/granitelib-rag-r1.0" | ||
| _CORE_REPO = "ibm-granite/rag-intrinsics-lib" # Temporary; used by requirement checker |
There was a problem hiding this comment.
I know the repo variables predate this PR, but given the scope of the parent epic then this seems like the time to ask: would it be worth splitting this into separate variables, say for org, repo, and maybe host (?), and then combine into a final one? there could be some value in being able to access the individual parts without having to parse them out
(it's a minor thing, but if we want it figured it'll never be easier to do than now so...)
There was a problem hiding this comment.
Fair point — worth thinking through alongside the versioning discussion rather than deciding here. Two considerations to bring into that:
First, the HF API takes repo_id as a single string (hf_hub_download(repo_id="org/repo", ...)). Any structured split has to be recombined at every call site, so it's only worth the extra work if the structured form gives us something we actually need.
Second, the obvious thing it could give us is access to the version part of the name (-r1.0) — and how we want to handle versions is exactly what Paul's note will settle.
There was a problem hiding this comment.
We should confirm with the granite-libraries team before we utilize that -r1.0 as an actual version. They've currently been updating the adapters in place rather than creating new hf repos.
| revision (str): Either a 40-character lowercase hex commit SHA or the | ||
| literal string ``"main"``. |
There was a problem hiding this comment.
I could also see wanting to use branch or tag as possible revision values
There was a problem hiding this comment.
Agreed. The current validator restricts to SHA-or-"main" because catalogue entries should be pinned for reproducibility — but that's a goal for catalogue entries, not a constraint on the field itself. Users supplying their own revision=... should be free to use any value HF accepts, including tags and branches.
I'll relax the validator to accept any non-empty string. If we want to keep the catalogue pinning convention enforced, we could add a build/CI step that re-resolves the named branch (main) against the pinned SHA and fails or warns when upstream has moved — so drift gets noticed at build time rather than being prevented by the field type. Worth doing? Or leave it to review discipline for now? Opinions welcome.
Same fix covers Jake's note at line 61, so I'll consolidate the discussion there.
There was a problem hiding this comment.
Validator now relaxed in the latest commit. Accepts any non-empty string — branch, tag, or SHA. The SHA-pinning convention is enforced by review; a build-time check is on the table on the line 99 thread.
| revision (str): HuggingFace commit SHA (40 lowercase hex chars) pinned | ||
| at catalogue-write time, or ``"main"`` to track the latest commit. |
There was a problem hiding this comment.
the other thing to take into account is that we will probably want to support a range of versions, similar to how one can do something like pytorch >= 1.2.3, <=2.0 for python dependencies (not something we can do today, but soon, so should make sure that our approach is eventually compatible with that)
| _RAG_SHA = "2f0b2c79c6731068625aca8045c2eb2e8912b353" # main @ 2026-05-26 | ||
| _CORE_R1_SHA = "d0a2a96a4cd07e96f0fe7ca29a42bfe088299d43" # main @ 2026-05-26 | ||
| _GUARDIAN_SHA = "773b254e98f993a605ec4b6259634906e0e64e8e" # main @ 2026-05-26 |
There was a problem hiding this comment.
not for this PR, but as part of the epic do you envision any kind of validator for the SHAs?
There was a problem hiding this comment.
Yes — in scope for the epic, but the exact form depends on what the versioning discussion settles. This question sits in the same conversation as the related threads here (lines 20 and 59) — they're all really about how versioning is expressed end-to-end, so the answers will come together rather than separately.
HF gives us cheap options either way: a successful download against a pinned SHA inherently verifies (the SHA is the content hash), or HfApi().model_info(repo_id, revision=...) returns metadata without pulling weights.
I think we must at least allow users to override the default in a way that allows them to auto-route to requirement-check functionality for AloraRequirements. |
|
Thanks — happy to wait for Paul's versioning note. Is #1111 the right home for it, or is it being written up separately as an issue, PR, or discussion I should follow? Worth noting this needs settling before #1135 can go further, and likely before several other phases of #929 too — so agreeing the versioning approach is now the priority. Not pushing back on the wait; just pointing it out so the epic plan reflects it. Meanwhile I'll do the in-scope tidy-ups: rename the catalogue entry to |
+1 to Jake — The genuine open question is what happens when both a canonical and a custom adapter are loaded under the same name: which one runs, and how does the caller pick? That's an adapter-lifecycle / override-semantics question and belongs with #929 and Paul's versioning note. Since this PR is waiting for the versioning discussion before it can merge anyway, the dedup tweak doesn't need settling in isolation. The minimal change I had in mind for the silent-misroute case Jake described was switching the dedup key from name-only to |
it'll start as an internal doc, but eventually we'll put out a public version |
Where we arePosition: agree the versioning approach first (psschwei's note, see #1111), then settle the rest. A few in-scope tidy-ups don't need to wait — applied in the latest commit; detail on the relevant threads. Applied
Deferred to the versioning discussion
PR remains gated on the versioning agreement before merge. |
…k entry - validate_revision now accepts any non-empty string (branch, tag, or commit SHA), matching HuggingFace's revision contract. Catalogue entries continue to pin to commit SHAs by convention; that is enforced by review and (optionally) a build-time check rather than by the validator. Field description and class docstring updated. - Catalogue entry renamed requirement_check -> requirement-check to match the actual folder layout in granitelib-core-r1.0; the lookup in mellea/stdlib/components/intrinsic/core.py is updated to suit. The "requirement_check" output JSON key (adapter's own schema) is unchanged. - Tests updated: drop over-strict format checks (short / long / non-hex / upper / "HEAD" / "latest" are now valid HF revisions), keep the empty-string rejection, add a positive case for tags and branch names, and flip the duplicate-entry assertion. Refs generative-computing#1135 Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
This PR has a blocking dependency on it, so at least part needs to be agreed and documented. I'm not sure if you're planning something with broader scope If we think this is a significant change with wide scope we may want to make an interim decision to unblock this epic - or put the whole lot on hold for longer (which risks some decay of the design as other changes go on) |
|
Thinking about how to unblock this PR and others in the epic from more versioning discussion. Most of the open points here are already settled in #1080: schema versioning was deliberately deferred, with two things in its place — pin the HF revision, and raise a mismatch error when the parser can't satisfy the declared output contract. #1111 is where we revisit the topic later, and it lists the triggers for doing so; none of them have happened. psschwei has raised a good point, but I don't think it needs to come before Phase 0. The thought is to split the work into three layers:
Layers 1 and 2 can move now and can be reversed if we change our minds. Layer 3 is where the versioning decision shows up in behaviour, so it can wait without slowing the rest down. Sensible defaults for the open threads, all matching today's behaviour and all reversible:
One safeguard: a single guard test that asserts HF requests match the pre-Phase-0 baseline. Cheap protection against regressions while the versioning conversation continues separately. If we're agreed on this, I'll move the conclusion into #929 and update any sibling issues affected by the sequencing. |
|
I am fine with the |
Summary
Implements issue #1135 (Epic #929 Phase 0, Wave 1).
Why
Mellea ships intrinsics by downloading LoRA / aLoRA adapters from HuggingFace at runtime. The catalogue (
mellea/backends/adapters/catalog.py) records which repository each intrinsic lives in — but until now, not which version. When upstream pushes new weights, every Mellea install picks them up silently.PR #1008 was the worked example: the
requirement-checkadapter's output schema flipped upstream andrequirement_check_to_boolstarted returningFalsefor every call until someone noticed.This PR closes that gap. Each catalogue entry now carries a 40-character HuggingFace commit SHA, and a Pydantic validator enforces the format at construction time — so accidental drift (typos, branch names, partial SHAs) fails at module load rather than at first download. Callers who genuinely want to track-latest opt in explicitly with
revision="main".Where this fits in Epic #929
Epic #929 is the broader adapter-lifecycle redesign. This PR is Phase 0, Wave 1 — the metadata-only foundation. PR #1158 (issue #1134) is the parallel Phase 0 / Wave 1 work, adding the new type scaffolding (
Adapter,Identity,IOContract,WeightsBinding); the two PRs are independent and can land in either order. The newrevisionfield here is not yet threaded through to the actualobtain_lora/obtain_io_yamlcalls; those still resolve againstmain. Wiring the SHA through to the download path is Phase 2.2 in the epic and is explicitly out of scope per the issue.# TODO(phase-2.2)markers at both call sites flag the gap. See #929 for the full phase plan.While in the catalogue, the PR also takes the opportunity to perform a small bit of cleanup the epic depends on: collapsing the duplicate
requirement_check/requirement-checkentries into a single canonical key. The two had drifted to different repositories and confused downstream callers; one canonicalrequirement_checkentry on_CORE_R1_REPO(granitelib-core-r1.0) replaces them. The downstream caller incore.pywas updated to match.What changed
revisionfield toIntriniscsCatalogEntry— a 40-char lowercase hex commit SHA or the literal"main". Avalidate_revision()helper enforces the constraint; a Pydanticfield_validatorapplies it at model construction time.validate_revisionis exported frommellea.backends.adaptersfor use byCustomIntrinsicAdapterauthors.requirement_check/requirement-checkduplicate: removes the hyphenated entry and the temporary_CORE_REPO(rag-intrinsics-lib) constant; the canonicalrequirement_check(underscore) entry now points to_CORE_R1_REPO(granitelib-core-r1.0).mellea/stdlib/components/intrinsic/core.py:57(requirement_check()) to call the renamed canonical key, so the public stdlib helper continues to work after the collapse.CustomIntrinsicAdapterto passrevision="main"when injecting user-defined entries into the catalogue (custom adapters track latest by default).Before / After
Before:
IntriniscsCatalogEntry(name="answerability", repo_id=_RAG_REPO)— no revision pinning, silently tracks whatever upstream pushes.After:
IntriniscsCatalogEntry(name="answerability", repo_id=_RAG_REPO, revision=_RAG_SHA)— locked to a specific commit;revision="main"is the explicit opt-in for tracking-latest.Testing
9 tests added covering:
test_catalog_entries_have_revision— every entry passesvalidate_revision()test_revision_validation_rejects_malformed— short, long, non-hex, uppercase,HEAD,latest, emptytest_revision_validation_accepts_valid_shatest_revision_validation_accepts_main_literaltest_revision_field_rejects_malformed_via_pydantictest_revision_field_rejects_none_via_pydantictest_revision_round_triptest_revision_round_trip_via_fetchtest_no_duplicate_requirement_check_entryFull non-qualitative suite (388 tests across
test/excludingtest/cli/which has unrelated optional-extras collection errors): all pass.Acceptance criteria checklist
revisionfield with a valid 40-char hex SHA"main"accepted as explicit opt-in for tracking-latest"main"accepted,Nonehandling (rejected)requirement_checkandrequirement-checkentries collapsed to one; no duplicate keyruff format,ruff checkclean; mypy clean on changed files (pre-existing optional-extra errors skipped per AGENTS.md)Notes
IntriniscsCatalogEntryclass-name typo preserved per issue scope (fix(intrinsics): pin catalogue entries to HF revision SHAs + deduplicate requirement_check entries (Epic #929 Phase 0) #1135 explicitly says do not fix; a follow-up rename with a deprecation alias is the right approach)._CORE_REPO/rag-intrinsics-libconstant removed — only the now-collapsedrequirement_checkentry used it, and it was already labelled "Temporary".CustomIntrinsicAdapterexamples (stembolts_intrinsic.py,101_example.py) do not constructIntriniscsCatalogEntrydirectly; only the internal monkey-patch path needed updating.granitelib-core-r1.0hostrequirement_checkadapters for every base model previously served fromrag-intrinsics-lib(Granite 3.2 / 3.3 / 4.0)? The GPU-gated formatter test still references the legacy repo. If a gap exists, downstreamobtain_loracalls will fail for older base models. Recommend filing as a separate verification issue.Closes #1135