Skip to content

feat: port context-attribution intrinsic from granite-common#661

Open
dennislwei wants to merge 6 commits intogenerative-computing:mainfrom
dennislwei:feat/context-attribution
Open

feat: port context-attribution intrinsic from granite-common#661
dennislwei wants to merge 6 commits intogenerative-computing:mainfrom
dennislwei:feat/context-attribution

Conversation

@dennislwei
Copy link

@dennislwei dennislwei commented Mar 16, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Ports the context-attribution intrinsic from the granite-common library into mellea. This follows up on granite-common-related PRs #571 and #609. Context attribution identifies which sentences in the context (including previous conversation turns and any documents) were most important to the LLM when generating each sentence in its response.

Changes:

  • Extends sentence boundary marking to previous conversation turns with shared index with documents
  • Extends DecodeSentences to previous conversation turns and multi-source decoding
  • Adds context-attribution catalog entry and updates repo to ibm-granite/granitelib-core-r1.0
  • Adds context-attribution test data and formatter tests

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code as added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

dennislwei and others added 5 commits March 13, 2026 17:32
…hared index

- Add `index` parameter to `mark_sentence_boundaries()` to allow callers to
  continue numbering across multiple calls; return the next available index
- Add `all_but_last_message` as a valid `sentence_boundaries` key
- Extend `_mark_sentence_boundaries()` to tag prior conversation turns when
  `all_but_last_message` is configured, using a shared running index with
  documents so that each context sentence has a globally unique tag

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>
… decoding

- Accept `source: str | list[str]` to allow a single DecodeSentences rule to
  decode sentences from multiple locations in one pass
- Add `all_but_last_message` as a valid source, decoding prior conversation
  turns with a running sentence index shared across all sources
- Add optional `message_index` output field that records which conversation
  turn each attributed sentence came from

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>
- Update _CORE_REPO to "ibm-granite/granitelib-core-r1.0"
- Add context-attribution intrinsic pointing to _CORE_REPO

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>
- Add input/test_canned_input/test_canned_output/expected_result JSON test data files
- Add YamlJsonCombo entry for context-attribution pointing to
  ibm-granite/granitelib-core-r1.0
- Exclude context-attribution from Ollama inference tests via _NO_OLLAMA_ADAPTER
  since an Ollama LoRA adapter is not yet available on the HF Hub

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>
The model consistently produces {"r": 1, "c": [2, 0, 1, 19, 3]} with the
mellea codebase, yielding 7 attribution records rather than the 12 produced
on the granite-common side. Update the expected output accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>
@dennislwei dennislwei requested a review from a team as a code owner March 16, 2026 21:56
@github-actions
Copy link
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@mergify
Copy link

mergify bot commented Mar 16, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

Copy link
Contributor

@jakelorocco jakelorocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. One comment about an unused file and you'll have to fix conflicts. I'm also assuming these two PRs will cause more conflicts (https://github.com/generative-computing/mellea/pull/551/changes and #666).

I think there is some test duplication that we will need to sort out now that granite common is in mellea, but for now, can also please add a test in the same format as the other intrinsic tests? https://github.com/generative-computing/mellea/pull/551/changes#diff-ce0d7b5a78b679a6137c7e41d361d6e1ff3bcb5cccbbb1b5221f666cc3b81619R1

Additionally, if you want others to use this intrinsic, I would suggest adding a snippet to the README and an example as well. Similar to what's done here: https://github.com/generative-computing/mellea/pull/551/changes#diff-1cf997e9d5dc1b90157611dba815452dcea041114ecce99f976af70654aba299L31

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just future proofing since ollama tests are skipped for this intrinsic currently?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Should I keep this file or remove it for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to keep it. Just wanted to confirm.

@dennislwei
Copy link
Author

Thank you @jakelorocco for reviewing so quickly.

Regarding the conflict in mellea/backends/adapters/catalog.py, I believe that _CORE_R1_REPO should eventually be renamed as just _CORE_REPO, i.e., _CORE_REPO = "ibm-granite/granitelib-core-r1.0". Do you know if anything is still using _CORE_REPO = "ibm-granite/rag-intrinsics-lib" (outdated version of _RAG_REPO)? If not, then I suggest that we rename _CORE_R1_REPO -> _CORE_REPO now.

I will fix the other conflicts.

I think there is some test duplication that we will need to sort out now that granite common is in mellea, but for now, can also please add a test in the same format as the other intrinsic tests? https://github.com/generative-computing/mellea/pull/551/changes#diff-ce0d7b5a78b679a6137c7e41d361d6e1ff3bcb5cccbbb1b5221f666cc3b81619R1

Additionally, if you want others to use this intrinsic, I would suggest adding a snippet to the README and an example as well. Similar to what's done here: https://github.com/generative-computing/mellea/pull/551/changes#diff-1cf997e9d5dc1b90157611dba815452dcea041114ecce99f976af70654aba299L31

An intrinsic function and associated test, example, and README snippet will be coming very soon in a separate PR. Perhaps I should have explained ahead of time, but I wanted to have two separate PRs for more clarity, one for the granite-common part, one for the intrinsic part. Let me know if you want it to be one larger PR instead.

@jakelorocco
Copy link
Contributor

Regarding the conflict in mellea/backends/adapters/catalog.py, I believe that _CORE_R1_REPO should eventually be renamed as just _CORE_REPO, i.e., _CORE_REPO = "ibm-granite/granitelib-core-r1.0". Do you know if anything is still using _CORE_REPO = "ibm-granite/rag-intrinsics-lib" (outdated version of _RAG_REPO)? If not, then I suggest that we rename _CORE_R1_REPO -> _CORE_REPO now.

I don't. I haven't looked through the code to see what is referencing which repo. To be honest, the intrinsics repos are somewhat confusing to me and I'm not sure what the expectations are with them all.

An intrinsic function and associated test, example, and README snippet will be coming very soon in a separate PR. Perhaps I should have explained ahead of time, but I wanted to have two separate PRs for more clarity, one for the granite-common part, one for the intrinsic part. Let me know if you want it to be one larger PR instead.

Works for me. Thank you for the clarification.

Conflict resolution in catalog.py: keep _CORE_REPO = "ibm-granite/rag-intrinsics-lib"
for now pending full migration away from it; add _CORE_R1_REPO =
"ibm-granite/granitelib-core-r1.0" for core intrinsics (context-attribution,
requirement-check, uncertainty).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dennis Wei <dwei@us.ibm.com>
@dennislwei dennislwei requested a review from a team as a code owner March 17, 2026 21:24
@dennislwei
Copy link
Author

Conflicts have been fixed.

Regarding the conflict in mellea/backends/adapters/catalog.py, I believe that _CORE_R1_REPO should eventually be renamed as just _CORE_REPO, i.e., _CORE_REPO = "ibm-granite/granitelib-core-r1.0". Do you know if anything is still using _CORE_REPO = "ibm-granite/rag-intrinsics-lib" (outdated version of _RAG_REPO)? If not, then I suggest that we rename _CORE_R1_REPO -> _CORE_REPO now.

Claude Code found a few places where _CORE_REPO = "ibm-granite/rag-intrinsics-lib" is still used. While I think these are legacy uses, I will defer on this and use _CORE_R1_REPO = "ibm-granite/granitelib-core-r1.0" for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants