fix(intrinsics): pin catalogue entries to HF revision SHAs + deduplicate requirement_check (#1135) #1157

Open
planetf1 wants to merge 5 commits into generative-computing:main from planetf1:worktree-issue-1135

Conversation

@planetf1
Contributor

@planetf1 planetf1 commented May 26, 2026

Summary

Implements issue #1135 (Epic #929 Phase 0, Wave 1).

Why

Mellea ships intrinsics by downloading LoRA / aLoRA adapters from HuggingFace at runtime. The catalogue (mellea/backends/adapters/catalog.py) records which repository each intrinsic lives in — but until now, not which version. When upstream pushes new weights, every Mellea install picks them up silently.

PR #1008 was the worked example: the requirement-check adapter's output schema flipped upstream and requirement_check_to_bool started returning False for every call until someone noticed.

This PR closes that gap. Each catalogue entry now carries a 40-character HuggingFace commit SHA, and a Pydantic validator enforces the format at construction time — so accidental drift (typos, branch names, partial SHAs) fails at module load rather than at first download. Callers who genuinely want to track-latest opt in explicitly with revision="main".

Where this fits in Epic #929

Epic #929 is the broader adapter-lifecycle redesign. This PR is Phase 0, Wave 1 — the metadata-only foundation. PR #1158 (issue #1134) is the parallel Phase 0 / Wave 1 work, adding the new type scaffolding (Adapter, Identity, IOContract, WeightsBinding); the two PRs are independent and can land in either order. The new revision field here is not yet threaded through to the actual obtain_lora / obtain_io_yaml calls; those still resolve against main. Wiring the SHA through to the download path is Phase 2.2 in the epic and is explicitly out of scope per the issue. # TODO(phase-2.2) markers at both call sites flag the gap. See #929 for the full phase plan.

While in the catalogue, the PR also takes the opportunity to perform a small bit of cleanup the epic depends on: collapsing the duplicate requirement_check / requirement-check entries into a single canonical key. The two had drifted to different repositories and confused downstream callers; one canonical requirement_check entry on _CORE_R1_REPO (granitelib-core-r1.0) replaces them. The downstream caller in core.py was updated to match.

What changed

  • Adds a required revision field to IntriniscsCatalogEntry — a 40-char lowercase hex commit SHA or the literal "main". A validate_revision() helper enforces the constraint; a Pydantic field_validator applies it at model construction time. validate_revision is exported from mellea.backends.adapters for use by CustomIntrinsicAdapter authors.
  • Pins all 13 catalogue entries to the upstream HuggingFace HEAD SHAs as of 2026-05-26. Reviewer: re-fetch upstream HEAD before merging (see issue warning).
  • Collapses the requirement_check / requirement-check duplicate: removes the hyphenated entry and the temporary _CORE_REPO (rag-intrinsics-lib) constant; the canonical requirement_check (underscore) entry now points to _CORE_R1_REPO (granitelib-core-r1.0).
  • Updates mellea/stdlib/components/intrinsic/core.py:57 (requirement_check()) to call the renamed canonical key, so the public stdlib helper continues to work after the collapse.
  • Updates CustomIntrinsicAdapter to pass revision="main" when injecting user-defined entries into the catalogue (custom adapters track latest by default).
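
A minimal sketch of the validation contract described in the list above, as first proposed in this PR (40-char lowercase hex SHA or the literal "main"). The real catalogue entry is a Pydantic model with a field_validator; `CatalogEntrySketch` below is a hypothetical standard-library stand-in, and the function body is an assumption about how such a check could look, not the code in catalog.py.

```python
import re
from dataclasses import dataclass

# 40 lowercase hex characters; re.fullmatch makes ^...$ anchors unnecessary.
_REVISION_HEX_RE = re.compile(r"[0-9a-f]{40}")


def validate_revision(revision: str) -> str:
    """Return revision unchanged if it is a full commit SHA or the literal "main"."""
    if isinstance(revision, str) and (
        revision == "main" or _REVISION_HEX_RE.fullmatch(revision)
    ):
        return revision
    raise ValueError(
        f"revision must be a 40-char lowercase hex SHA or 'main', got {revision!r}"
    )


@dataclass(frozen=True)
class CatalogEntrySketch:
    """Hypothetical stand-in for the Pydantic-based catalogue entry."""

    name: str
    repo_id: str
    revision: str

    def __post_init__(self) -> None:
        # Validate at construction time so a bad pin fails at module load.
        validate_revision(self.revision)


entry = CatalogEntrySketch(
    name="answerability",
    repo_id="ibm-granite/granitelib-rag-r1.0",
    revision="2f0b2c79c6731068625aca8045c2eb2e8912b353",
)
```

Constructing the entry with a branch name other than "main", a truncated SHA, or an uppercase SHA raises ValueError immediately, which is the "fail at module load" behaviour the summary describes.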

Before / After

Before: IntriniscsCatalogEntry(name="answerability", repo_id=_RAG_REPO) — no revision pinning, silently tracks whatever upstream pushes.

After: IntriniscsCatalogEntry(name="answerability", repo_id=_RAG_REPO, revision=_RAG_SHA) — locked to a specific commit; revision="main" is the explicit opt-in for tracking-latest.

Testing

uv run pytest test/backends/test_adapters/test_catalog_revision.py -v

9 tests added covering:

  • test_catalog_entries_have_revision — every entry passes validate_revision()
  • test_revision_validation_rejects_malformed — short, long, non-hex, uppercase, HEAD, latest, empty
  • test_revision_validation_accepts_valid_sha
  • test_revision_validation_accepts_main_literal
  • test_revision_field_rejects_malformed_via_pydantic
  • test_revision_field_rejects_none_via_pydantic
  • test_revision_round_trip
  • test_revision_round_trip_via_fetch
  • test_no_duplicate_requirement_check_entry

Full non-qualitative suite (388 tests across test/ excluding test/cli/ which has unrelated optional-extras collection errors): all pass.

Acceptance criteria checklist

  • All 13 catalogue entries have a revision field with a valid 40-char hex SHA
  • Revision validation rejects malformed values with a clear error
  • "main" accepted as explicit opt-in for tracking-latest
  • Tests cover: valid SHA accepted, malformed SHA rejected, "main" accepted, None handling (rejected)
  • requirement_check and requirement-check entries collapsed to one; no duplicate key
  • Existing tests pass; helper functions unchanged (new field is metadata-only)
  • ruff format, ruff check clean; mypy clean on changed files (pre-existing optional-extra errors skipped per AGENTS.md)

Notes

  • IntriniscsCatalogEntry class-name typo preserved per issue scope (#1135 explicitly says do not fix; a follow-up rename with a deprecation alias is the right approach).
  • _CORE_REPO / rag-intrinsics-lib constant removed — only the now-collapsed requirement_check entry used it, and it was already labelled "Temporary".
  • The CustomIntrinsicAdapter examples (stembolts_intrinsic.py, 101_example.py) do not construct IntriniscsCatalogEntry directly; only the internal monkey-patch path needed updating.
  • A follow-up question worth tracking: does granitelib-core-r1.0 host requirement_check adapters for every base model previously served from rag-intrinsics-lib (Granite 3.2 / 3.3 / 4.0)? The GPU-gated formatter test still references the legacy repo. If a gap exists, downstream obtain_lora calls will fail for older base models. Recommend filing as a separate verification issue.

Closes #1135

@github-actions github-actions Bot added the bug Something isn't working label May 26, 2026
@planetf1 planetf1 marked this pull request as ready for review May 26, 2026 17:15
@planetf1 planetf1 requested a review from a team as a code owner May 26, 2026 17:15
@planetf1 planetf1 requested review from ajbozarth and jakelorocco May 26, 2026 17:15
planetf1 added 3 commits May 26, 2026 18:18
…ate requirement_check

- Add `revision` field to `IntriniscsCatalogEntry` (required; 40-char
  lowercase hex SHA or literal "main")
- Add `validate_revision()` validation function alongside the type
- Populate all 13 catalogue entries with upstream HEAD SHAs pinned
  2026-05-26 (re-fetch before merge per issue guidance)
- Collapse `requirement_check` / `requirement-check` duplicate to a
  single canonical `requirement_check` entry backed by `_CORE_R1_REPO`;
  remove deprecated `_CORE_REPO` (`rag-intrinsics-lib`) constant
- Update `CustomIntrinsicAdapter` to pass `revision="main"` when
  injecting user-defined entries into the catalogue
- Add `test/backends/test_adapters/test_catalog_revision.py` covering
  all acceptance criteria from issue generative-computing#1135

Closes generative-computing#1135

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
- core.py:57 — call `call_intrinsic("requirement_check", …)` (underscore)
  instead of the deleted hyphenated catalogue key; previously raised
  `ValueError: Unknown intrinsic name 'requirement-check'` at runtime
- catalog.py — switch `_REVISION_HEX_RE.match` to `re.fullmatch` and drop
  redundant `^…$` anchors (idiomatic Python ≥3.4)
- catalog.py — annotate each pinned SHA with `# main @ 2026-05-26` so
  future bumps are auditable without git archaeology
- test_catalog_revision.py — iterate via the public
  `known_intrinsic_names()` / `fetch_intrinsic_metadata()` API instead of
  the private `_INTRINSICS_CATALOG_ENTRIES` symbol

Verified: 10 adapter tests + 277-test fast suite green.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…#1135

- Export `validate_revision` from `mellea.backends.adapters` so users
  authoring `CustomIntrinsicAdapter` subclasses can validate their own
  revision strings with the same contract as the catalogue
- Drop the post-merge-stale "re-fetch upstream HEAD before merging"
  reminder; per-SHA `# main @ 2026-05-26` annotations remain
- Add `# TODO(phase-2.2)` markers at the two `obtain_io_yaml` /
  `obtain_lora` call sites so the inert-revision gap is visible
- Test: replace inline format check with a `validate_revision()` call
  so the assertion can't drift from the validator's contract
- Test: make `test_revision_round_trip_via_fetch` tolerant of `"main"`
  in case `answerability` is ever flipped to track-latest
- Test: drop the `if __name__ == "__main__":` block (suite runs via
  `uv run pytest`)

Tests: 10 adapter tests pass; ruff clean.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 requested review from nrfulton and psschwei May 26, 2026 17:21
Upstream test_catalog.py constructs `IntriniscsCatalogEntry` directly
without `revision`, which now fails after the field was made required.
Adds `revision="main"` so the test re-passes against the new contract.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 force-pushed the worktree-issue-1135 branch from 83c131c to 71582ba Compare May 26, 2026 17:27
Contributor

@jakelorocco jakelorocco left a comment

I'm slightly skeptical that this PR pinning version numbers should happen first. After seeing it, it seems to stick us into the catalog model quite firmly and limit our ability to add custom adapters (with the different function / download path) as mentioned in the proposal.

Additionally, we may want to introduce version warnings. Currently, we check if an adapter with the same name is currently added; but with explicit versioning, should we also compare that the version metadata matches?

I do think this is the correct direction though. I think the use case I'm most worried about is the following:

  1. Someone adds an adapter with the name "requirement-check" pointing to a custom adapter
  2. Our top-level wrapper function check_requirement does its check to add the adapter
  3. We see that the adapter with the same name has already been added (since we aren't checking additional version metadata)
  4. Our wrapper fails because the provided custom adapter with the same name results in a different output structure

It might be enough in those cases to check the io.yaml declared output structure and not require a strict version match (or allow overriding the version check), etc...

-------------------- Edit --------------------

The more I think about this, the more I'm okay with version pinning these catalog entries. I think the other option is to actually pin on functionality as well (I started commenting in this one: #1158 (comment)). @psschwei said he will share a document about versioning so it might be best to wait until we get all that sorted out to merge this one and so that we can have the versioning conversation all in one spot.

Comment thread mellea/backends/adapters/catalog.py Outdated
Comment on lines +60 to +61
revision (str): HuggingFace commit SHA (40 lowercase hex chars) pinned
at catalogue-write time, or ``"main"`` to track the latest commit.
Contributor

Is there a reason to limit this to these options? It looks like the huggingface_hub revision parameter accepts:

            An optional Git revision id which can be a branch name, a tag, or a
            commit hash.

Member

@psschwei psschwei May 28, 2026

+1 for accepting tags/branches

Member

the other thing to take into account is that we will probably want to support a range of versions, similar to how one can do something like pytorch >= 1.2.3, <=2.0 for python dependencies (not something we can do today, but soon, so should make sure that our approach is eventually compatible with that)

Contributor Author

Same answer as the line 20 thread — relaxing the validator to accept any non-empty string covers tags and branches, no good reason to be stricter than HF.

On the version-range point: a plain string field can carry >=1.2.3,<=2.0 syntax just as well as a SHA or a tag, so this doesn't close off future options. Whether we eventually want a structured revision_spec (or a separate field) for ranges is a question for the versioning discussion.

Contributor Author

Same change as the line 20 thread — relaxed in the latest commit; tags and branch names accepted.

     """
     result_json = call_intrinsic(
-        "requirement-check", context, backend, kwargs={"requirement": requirement}
+        "requirement_check", context, backend, kwargs={"requirement": requirement}
Contributor

Please run the tests to confirm that this works? My understanding is that this should fail. requirement-check is the new name for the adapter function. requirement_check was the name of a previous iteration that was only released for granite3.2 and 3.3. I am fine with removing support for those adapters / model versions.

Contributor Author

Confirmed. Direct HF check at the pinned SHA: HfApi().list_repo_files('ibm-granite/granitelib-core-r1.0', revision=d0a2a96...) shows requirement-check/ exists for granite-4.0-micro and granite-4.1-3b/8b/30b (both lora and alora variants), and no requirement_check/ path at all. So the current catalogue entry would fail to resolve at download time for any granite-4 model.

Will rename the catalogue entry and the call here to requirement-check. On dropping granite-3.2/3.3 support: with revision pinning targeted at granite-4 SHAs, that support is effectively already gone — supporting older models would need their own catalogue entries pointing at older repo SHAs, which we're not adding.
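
For anyone repeating the check, a minimal sketch of the folder lookup. `HfApi().list_repo_files(repo_id, revision=...)` is the call quoted above; the helper names and the sample paths are hypothetical, and the sketch assumes each adapter lives in a top-level folder named after it.

```python
from typing import Iterable, List, Optional


def has_adapter_folder(repo_files: Iterable[str], adapter_name: str) -> bool:
    """True if any file path sits under a top-level folder named adapter_name."""
    prefix = f"{adapter_name}/"
    return any(path.startswith(prefix) for path in repo_files)


def list_files_at_revision(repo_id: str, revision: Optional[str] = None) -> List[str]:
    """Fetch the file listing from the Hub (needs network and huggingface_hub)."""
    # Imported lazily so the pure helper above stays standard-library only.
    from huggingface_hub import HfApi

    return HfApi().list_repo_files(repo_id, revision=revision)


# Hypothetical listing, shaped like the result described in this thread.
sample_files = [
    "README.md",
    "requirement-check/granite-4.0-micro/lora/adapter_config.json",
    "requirement-check/granite-4.0-micro/alora/adapter_config.json",
]

print(has_adapter_folder(sample_files, "requirement-check"))  # True
print(has_adapter_folder(sample_files, "requirement_check"))  # False
```

Against the real repository, `has_adapter_folder(list_files_at_revision("ibm-granite/granitelib-core-r1.0", revision=<pinned SHA>), name)` would give the same answer for each candidate name.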

Contributor Author

Confirmed against the pinned SHA: HfApi().list_repo_files('ibm-granite/granitelib-core-r1.0', revision=d0a2a9…) returns requirement-check/ for every supported base model and no underscored variant. Rename applied in the latest commit — catalogue entry, this lookup, and the test assertion all flip to the hyphen form.

@psschwei
Member

  1. Someone adds an adapter with the name "requirement-check" pointing to a custom adapter

Do we want to allow people to override the canonical adapters? Or should we require them to use a distinct name? (I think I lean towards requiring unique names, but I could be easily nudged the other way too)

Comment thread mellea/backends/adapters/catalog.py Outdated
Comment on lines +107 to +110
name="context-attribution", repo_id=_CORE_R1_REPO, revision=_CORE_R1_SHA
),
IntriniscsCatalogEntry(
name="requirement_check", repo_id=_CORE_R1_REPO, revision=_CORE_R1_SHA
Member

inconsistency here in terms of using dashes or underscores in names
I know python favors using underscores, but since the actual adapter names use dashes maybe it's better to use those? (I can see someone getting mildly annoyed if they copied a name from huggingface but it didn't work)

Member

also, claude flagged a couple of places where the hyphen version is still used:

  • mellea/backends/openai.py:488
  • mellea/backends/huggingface.py:400
  • mellea/stdlib/requirements/requirement.py:83
  • example docs/examples/intrinsics/intrinsics.py:31

Contributor

@jakelorocco jakelorocco May 28, 2026

The underscores / dashes in the name field are required unless we add a distinct role field. The granite libraries team is not consistent in their naming conventions. (Also please address my above comment before making any changes to the underscore / dashed name.)

Contributor Author

Agreeing with Jake: the name field has to mirror whatever granite-libs publishes, and they're internally inconsistent — some intrinsics use hyphens, some underscores. We can't normalise here without a separate role field, which is a #929 concern.

The catalogue entry at line 110 is the only outlier. Every call already uses requirement-check: openai.py:488, huggingface.py:454, requirement.py:83, intrinsics.py:32. Renaming the entry to requirement-check aligns the codebase. The requirement_check reference at requirement.py:38 is the adapter's output JSON key (set by the model), not the adapter name, so that stays.

Verified Jake's hypothesis via HF: HfApi().list_repo_files('ibm-granite/granitelib-core-r1.0', revision=<pinned SHA>) shows requirement-check/ exists for all granite-4 variants (granite-4.0-micro, 4.1-3b/8b/30b, both lora and alora) and no requirement_check/ path at all. The current catalogue entry points at a non-existent folder.

Contributor Author

Rename applied: catalogue entry is now requirement-check, with the matching update in mellea/stdlib/components/intrinsic/core.py (see the line-49 thread). Test assertion flipped. The output JSON key ("requirement_check") is the adapter's own schema and stays as-is.

# Mellea will update which repositories are linked as new ones come online. The original
# repos are on an older layout that will be changed.
_RAG_REPO = "ibm-granite/granitelib-rag-r1.0"
_CORE_REPO = "ibm-granite/rag-intrinsics-lib" # Temporary; used by requirement checker
Member

I know the repo variables predate this PR, but given the scope of the parent epic then this seems like the time to ask: would it be worth splitting this into separate variables, say for org, repo, and maybe host (?), and then combine into a final one? there could be some value in being able to access the individual parts without having to parse them out

(it's a minor thing, but if we want it figured it'll never be easier to do than now so...)

Contributor Author

Fair point — worth thinking through alongside the versioning discussion rather than deciding here. Two considerations to bring into that:

First, the HF API takes repo_id as a single string (hf_hub_download(repo_id="org/repo", ...)). Any structured split has to be recombined at every call site, so it's only worth the extra work if the structured form gives us something we actually need.

Second, the obvious thing it could give us is access to the version part of the name (-r1.0) — and how we want to handle versions is exactly what Paul's note will settle.
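
To make the trade-off concrete, a small sketch of what a structured split could look like. `RepoRef` is a hypothetical type, not something in this PR; it only splits org and repository name and deliberately leaves the `-r1.0` suffix unparsed, since whether that suffix is a real version is still open.

```python
from typing import NamedTuple


class RepoRef(NamedTuple):
    """Hypothetical structured form of a Hub repository reference."""

    org: str
    name: str

    @property
    def repo_id(self) -> str:
        # Recombine into the single "org/repo" string the Hub client expects.
        return f"{self.org}/{self.name}"

    @classmethod
    def parse(cls, repo_id: str) -> "RepoRef":
        org, sep, name = repo_id.partition("/")
        if not sep or not org or not name or "/" in name:
            raise ValueError(f"expected 'org/repo', got {repo_id!r}")
        return cls(org=org, name=name)


rag = RepoRef.parse("ibm-granite/granitelib-rag-r1.0")
print(rag.org, rag.name)  # ibm-granite granitelib-rag-r1.0
print(rag.repo_id)  # ibm-granite/granitelib-rag-r1.0
```

Every call site would still pass `rag.repo_id`, which is the recombination cost mentioned above.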

Contributor

We should confirm with the granite-libraries team before we utilize that -r1.0 as an actual version. They've currently been updating the adapters in place rather than creating new hf repos.

Comment thread mellea/backends/adapters/catalog.py Outdated
Comment on lines +19 to +20
revision (str): Either a 40-character lowercase hex commit SHA or the
literal string ``"main"``.
Member

I could also see wanting to use branch or tag as possible revision values

Contributor Author

Agreed. The current validator restricts to SHA-or-"main" because catalogue entries should be pinned for reproducibility — but that's a goal for catalogue entries, not a constraint on the field itself. Users supplying their own revision=... should be free to use any value HF accepts, including tags and branches.

I'll relax the validator to accept any non-empty string. If we want to keep the catalogue pinning convention enforced, we could add a build/CI step that re-resolves the named branch (main) against the pinned SHA and fails or warns when upstream has moved — so drift gets noticed at build time rather than being prevented by the field type. Worth doing? Or leave it to review discipline for now? Opinions welcome.

Same fix covers Jake's note at line 61, so I'll consolidate the discussion there.
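
A rough sketch of the build-time drift check floated above, assuming the pins are available as a mapping from (repo_id, branch) to the pinned SHA. `find_drift` and `resolve_head_via_hub` are hypothetical names; the resolver is injected so the comparison logic can be tested without network access. `HfApi().model_info(repo_id, revision=...)` is a real huggingface_hub call whose result carries the resolved commit in `.sha`.

```python
from typing import Callable, Dict, List, Tuple

# (repo_id, branch) -> pinned commit SHA recorded in the catalogue.
Pins = Dict[Tuple[str, str], str]


def find_drift(pins: Pins, resolve_head: Callable[[str, str], str]) -> List[str]:
    """Return one message per pin whose branch head no longer matches the pinned SHA."""
    messages = []
    for (repo_id, branch), pinned_sha in sorted(pins.items()):
        current = resolve_head(repo_id, branch)
        if current != pinned_sha:
            messages.append(
                f"{repo_id}@{branch}: pinned {pinned_sha[:8]}, upstream is {current[:8]}"
            )
    return messages


def resolve_head_via_hub(repo_id: str, branch: str) -> str:
    """Resolve a branch to its head commit SHA (needs network and huggingface_hub)."""
    # Lazy import keeps find_drift usable offline with a fake resolver.
    from huggingface_hub import HfApi

    return HfApi().model_info(repo_id, revision=branch).sha
```

A CI job could call `find_drift(pins, resolve_head_via_hub)` and either warn or fail when the returned list is non-empty; which of the two is the open question above.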

Contributor Author

Validator now relaxed in the latest commit. Accepts any non-empty string — branch, tag, or SHA. The SHA-pinning convention is enforced by review; a build-time check is on the table on the line 99 thread.
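
For reference, a minimal sketch of what the relaxed contract amounts to. This is an illustration rather than the committed code; rejecting whitespace-only strings is an extra assumption beyond "non-empty", and "v1.0" is a hypothetical tag name.

```python
def validate_revision(revision: str) -> str:
    """Accept any non-empty string: a branch name, a tag, or a commit SHA."""
    # Assumption: whitespace-only values are treated as empty.
    if not isinstance(revision, str) or not revision.strip():
        raise ValueError(f"revision must be a non-empty string, got {revision!r}")
    return revision


# Values the stricter validator rejected are now accepted.
accepted = [
    validate_revision(candidate)
    for candidate in ("main", "v1.0", "HEAD", "d0a2a96a4cd07e96f0fe7ca29a42bfe088299d43")
]
print(len(accepted))  # 4
```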

Comment thread mellea/backends/adapters/catalog.py Outdated
Comment on lines +97 to +99
_RAG_SHA = "2f0b2c79c6731068625aca8045c2eb2e8912b353" # main @ 2026-05-26
_CORE_R1_SHA = "d0a2a96a4cd07e96f0fe7ca29a42bfe088299d43" # main @ 2026-05-26
_GUARDIAN_SHA = "773b254e98f993a605ec4b6259634906e0e64e8e" # main @ 2026-05-26
Member

not for this PR, but as part of the epic do you envision any kind of validator for the SHAs?

Contributor Author

Yes — in scope for the epic, but the exact form depends on what the versioning discussion settles. This question sits in the same conversation as the related threads here (lines 20 and 59) — they're all really about how versioning is expressed end-to-end, so the answers will come together rather than separately.

HF gives us cheap options either way: a successful download against a pinned SHA inherently verifies (the SHA is the content hash), or HfApi().model_info(repo_id, revision=...) returns metadata without pulling weights.

@jakelorocco
Contributor

  1. Someone adds an adapter with the name "requirement-check" pointing to a custom adapter

Do we want to allow people to override the canonical adapters? Or should we require them to use a distinct name? (I think I lean towards requiring unique names, but I could be easily nudged the other way too)

I think we must at least allow users to override the default in a way that allows them to auto-route to requirement-check functionality for AloraRequirements.

@planetf1
Contributor Author

Thanks — happy to wait for Paul's versioning note. Is #1111 the right home for it, or is it being written up separately as an issue, PR, or discussion I should follow?

Worth noting this needs settling before #1135 can go further, and likely before several other phases of #929 too — so agreeing the versioning approach is now the priority. Not pushing back on the wait; just pointing it out so the epic plan reflects it.

Meanwhile I'll do the in-scope tidy-ups: rename the catalogue entry to requirement-check, relax the revision validator to accept any HF-valid string, and switch dedup to (name, repo_id, revision). That last one closes the silent-misroute case without committing to override semantics, which belong with #929.

@planetf1
Contributor Author

Do we want to allow people to override the canonical adapters? Or should we require them to use a distinct name?

+1 to Jake — AloraRequirements already routes requirement checks by the canonical name requirement-check, and users can supply their own adapter to fulfil that role. If we forced unique names only, that override pattern would break.

The genuine open question is what happens when both a canonical and a custom adapter are loaded under the same name: which one runs, and how does the caller pick? That's an adapter-lifecycle / override-semantics question and belongs with #929 and Paul's versioning note.

Since this PR is waiting for the versioning discussion before it can merge anyway, the dedup tweak doesn't need settling in isolation. The minimal change I had in mind for the silent-misroute case Jake described was switching the dedup key from name-only to (name, repo_id, revision), so a canonical and a same-named custom adapter are recognised as distinct entries. Whether that's the right answer or whether Paul's note suggests something better can be picked up at the same time.
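
To illustrate the difference between the two dedup keys, a small sketch. `AdapterRef` is a hypothetical record (the real Adapter type does not carry repo_id / revision), and "example-org/custom-adapters" is a made-up repository standing in for a user's custom adapter.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AdapterRef:
    """Hypothetical record used only to compare the two dedup keys."""

    name: str
    repo_id: str
    revision: str


canonical = AdapterRef(
    "requirement-check",
    "ibm-granite/granitelib-core-r1.0",
    "d0a2a96a4cd07e96f0fe7ca29a42bfe088299d43",
)
custom = AdapterRef("requirement-check", "example-org/custom-adapters", "main")

# Name-only key: the later registration is seen as "already added" and skipped.
by_name: Dict[str, AdapterRef] = {}
for ref in (custom, canonical):
    by_name.setdefault(ref.name, ref)

# Composite key: both entries are kept and stay distinguishable.
by_identity: Dict[Tuple[str, str, str], AdapterRef] = {}
for ref in (custom, canonical):
    by_identity.setdefault((ref.name, ref.repo_id, ref.revision), ref)

print(len(by_name), len(by_identity))  # 1 2
print(by_name["requirement-check"].repo_id)  # example-org/custom-adapters
```

With the name-only key, a wrapper that expects the canonical adapter silently gets the custom one; the composite key keeps both but leaves "which one runs" unanswered, which is the override-semantics question.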

@psschwei
Member

happy to wait for Paul's versioning note. Is #1111 the right home for it, or is it being written up separately as an issue, PR, or discussion I should follow?

it'll start as an internal doc, but eventually we'll put out a public version

@planetf1
Contributor Author

planetf1 commented May 28, 2026

Where we are

Position: agree the versioning approach first (psschwei's note, see #1111), then settle the rest. A few in-scope tidy-ups don't need to wait — applied in the latest commit; detail on the relevant threads.

Applied

  • Validator relaxed to accept any non-empty HF revision — line 20 / line 61 threads.
  • requirement_check -> requirement-check rename (line 110 thread and core.py:49 thread).
  • Test suite updated to suit.

Deferred to the versioning discussion

  • Dedup key change to (name, repo_id, revision): on closer look, more invasive than my inline reply implied; it needs Adapter to carry repo_id / revision and changes in three call paths. Belongs with #929; walking that back here.
  • CI / build-time SHA check — line 99 thread.
  • repo_id org / library / version split — line 59 thread.
  • Override semantics for a canonical-vs-custom adapter — Jake's thread.

PR remains gated on the versioning agreement before merge.

…k entry

- validate_revision now accepts any non-empty string (branch, tag, or
  commit SHA), matching HuggingFace's revision contract. Catalogue
  entries continue to pin to commit SHAs by convention; that is
  enforced by review and (optionally) a build-time check rather than
  by the validator. Field description and class docstring updated.
- Catalogue entry renamed requirement_check -> requirement-check to
  match the actual folder layout in granitelib-core-r1.0; the lookup
  in mellea/stdlib/components/intrinsic/core.py is updated to suit.
  The "requirement_check" output JSON key (adapter's own schema) is
  unchanged.
- Tests updated: drop over-strict format checks (short / long /
  non-hex / upper / "HEAD" / "latest" are now valid HF revisions),
  keep the empty-string rejection, add a positive case for tags and
  branch names, and flip the duplicate-entry assertion.

Refs generative-computing#1135

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1
Contributor Author

happy to wait for Paul's versioning note. Is #1111 the right home for it, or is it being written up separately as an issue, PR, or discussion I should follow?

it'll start as an internal doc, but eventually we'll put out a public version

This PR has a blocking dependency on it, so at least part needs to be agreed and documented. I'm not sure if you're planning something with broader scope.

If we think this is a significant change with wide scope we may want to make an interim decision to unblock this epic - or put the whole lot on hold for longer (which risks some decay of the design as other changes go on)

@planetf1
Contributor Author

Thinking about how to unblock this PR and others in the epic from more versioning discussion.

Most of the open points here are already settled in #1080: schema versioning was deliberately deferred, with two things in its place — pin the HF revision, and raise a mismatch error when the parser can't satisfy the declared output contract. #1111 is where we revisit the topic later, and it lists the triggers for doing so; none of them have happened. psschwei has raised a good point, but I don't think it needs to come before Phase 0.

The thought is to split the work into three layers:

  1. Metadata (this PR and the #1158 scaffolding): fields and types only, nothing changes at runtime.
  2. Mechanism (AdapterSchemaMismatchError, parser hook points): new code paths, but nothing calls into them yet.
  3. Wiring (Phase 2.2: revision threaded through obtain_lora / obtain_io_yaml): the only step that genuinely depends on the versioning conclusion.

Layers 1 and 2 can move now and can be reversed if we change our minds. Layer 3 is where the versioning decision shows up in behaviour, so it can wait without slowing the rest down.
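
As an illustration of the layer 2 mechanism, a minimal sketch of a mismatch error raised when parsed output does not satisfy a declared contract. Only the name AdapterSchemaMismatchError comes from the plan above; the base class, the `parse_with_contract` helper, and the idea of expressing the contract as a set of required top-level keys are assumptions for the example.

```python
import json
from typing import Any, Dict, Iterable


class AdapterSchemaMismatchError(ValueError):
    """Raised when adapter output does not satisfy the declared output contract."""


def parse_with_contract(raw_output: str, required_keys: Iterable[str]) -> Dict[str, Any]:
    """Parse JSON adapter output and check that every declared key is present."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise AdapterSchemaMismatchError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise AdapterSchemaMismatchError("expected a JSON object at the top level")
    missing = sorted(set(required_keys) - set(parsed))
    if missing:
        raise AdapterSchemaMismatchError(f"missing declared keys: {missing}")
    return parsed


# Hypothetical payload; the value shape is not taken from the real adapter.
ok = parse_with_contract('{"requirement_check": {"score": 0.9}}', ["requirement_check"])
```

An upstream schema flip like the one behind PR #1008 would then surface as an explicit error at parse time instead of a helper quietly returning False.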

Sensible defaults for the open threads, all matching today's behaviour and all reversible:

  • dedup key: name-only
  • repo_id: single string
  • override semantics: last-write-wins
  • no CI drift check yet
  • pinned-SHA-only at the catalogue level

One safeguard: a single guard test that asserts HF requests match the pre-Phase-0 baseline. Cheap protection against regressions while the versioning conversation continues separately.
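
One possible shape for that guard test, sketched with unittest.mock. Everything here is hypothetical: `fetch_adapter` stands in for the real download path, the baseline list is invented for the example, and the sketch assumes "pre-Phase-0 behaviour" means no revision is passed to the Hub.

```python
from typing import Callable, List, Optional, Tuple
from unittest import mock

# Hypothetical baseline: one (repo_id, revision) pair per Hub request.
# revision=None models "resolve against the default branch".
BASELINE: List[Tuple[str, Optional[str]]] = [
    ("ibm-granite/granitelib-rag-r1.0", None),
]


def fetch_adapter(download: Callable[..., str], repo_id: str) -> str:
    """Hypothetical stand-in for the download path; passes no revision yet."""
    return download(repo_id=repo_id, revision=None)


def recorded_requests(download_mock: mock.Mock) -> List[Tuple[str, Optional[str]]]:
    """Flatten the mock's call list into (repo_id, revision) pairs."""
    return [
        (call.kwargs["repo_id"], call.kwargs.get("revision"))
        for call in download_mock.call_args_list
    ]


download = mock.Mock(return_value="/tmp/adapter")
fetch_adapter(download, "ibm-granite/granitelib-rag-r1.0")
assert recorded_requests(download) == BASELINE
```

If a later change starts passing a revision before the wiring phase lands, the recorded requests stop matching the baseline and the test fails.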

If we're agreed on this, I'll move the conclusion into #929 and update any sibling issues affected by the sequencing.

@jakelorocco
Contributor

I am fine with the Sensible defaults. We can postpone a larger versioning discussion as mentioned. Thank you.

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

fix(intrinsics): pin catalogue entries to HF revision SHAs + deduplicate requirement_check entries (Epic #929 Phase 0)

3 participants