Merged
47 commits
690e1ee
test: remove hf .to(device)
jakelorocco Oct 25, 2025
cdca73a
fix: remove alora model field from hf
jakelorocco Oct 25, 2025
3a1c1e8
test: init commit; need to refactor a bunch
jakelorocco Oct 26, 2025
f61efd5
test: basic doc support for openai
jakelorocco Oct 27, 2025
7377410
fix: move intrinsic config setup to adapters
jakelorocco Oct 28, 2025
2d627ad
test: push most recent changes
jakelorocco Oct 28, 2025
610ff74
test: split adapter add and load functions
jakelorocco Oct 29, 2025
e93b629
fix: start cleaning up some code
jakelorocco Oct 31, 2025
123eb30
Merge branch 'main' into jal/intrinsic-updates
jakelorocco Oct 31, 2025
d9394ac
Merge pull request #1 from generative-computing/jal/intrinsic-updates
frreiss Oct 31, 2025
0286003
feat: update vllm serve script for granite common intrinsic downloads
jakelorocco Nov 3, 2025
1229206
feat: add intrinsics; remove bespoke aloras
jakelorocco Nov 3, 2025
78364ad
feat: add docs
jakelorocco Nov 3, 2025
17b7f8e
feat: add specific intrinsic and adapter tests
jakelorocco Nov 3, 2025
51e9c50
fix: add note that only gcommon adapters/intrinsics are supported
jakelorocco Nov 3, 2025
ff91cbd
fix: add docstrings for types
jakelorocco Nov 3, 2025
9145f56
fix: update model in openai vllm test
jakelorocco Nov 3, 2025
4c61578
Merge branch 'main' into jal/intrinsic-updates
jakelorocco Nov 3, 2025
de75055
Merge pull request #2 from generative-computing/jal/intrinsic-updates
frreiss Nov 4, 2025
929c201
Add entry point for answerability
frreiss Nov 6, 2025
f5c1311
Rewrite, citations, and context relevance
frreiss Nov 8, 2025
bb10bd4
Add additional intrinsics
frreiss Nov 10, 2025
51cb257
Merge branch 'main' into intrinsics
frreiss Nov 11, 2025
d9ac085
Fix broken test after merge
frreiss Nov 11, 2025
da6de1e
Make linter happy and allow GPU usage with HF backend
frreiss Nov 11, 2025
765c91f
Fix stuff that was missed during manual merge
frreiss Nov 11, 2025
0582f24
More manual merge issues
frreiss Nov 11, 2025
869459e
More manual merge issues
frreiss Nov 11, 2025
d5732c3
Fix broken test
frreiss Nov 11, 2025
ac9ee44
Adjust type hints
frreiss Nov 11, 2025
c80109f
Merge branch 'main' into intrinsics
frreiss Nov 12, 2025
6c29eae
Add qualitative mark to tests
frreiss Nov 11, 2025
9ce2e32
Change import of obtain_lora()
frreiss Nov 12, 2025
c33d440
Update import
frreiss Nov 12, 2025
e6d8c1a
Merge branch 'main' into intrinsics
frreiss Nov 12, 2025
2c37d62
Update code after breaking API changes
frreiss Nov 12, 2025
0bb2087
feat: small changes to intrinsics; along with fixes to docs / tests
jakelorocco Nov 13, 2025
dd40832
Merge pull request #3 from jakelorocco/intrinsics
frreiss Nov 14, 2025
ff8ec87
Merge branch 'generative-computing:main' into intrinsics
frreiss Nov 14, 2025
59bf5a0
Fix errors introduced by merge
frreiss Nov 14, 2025
9ba28d1
Fix minor merge-induced issue
frreiss Nov 14, 2025
427ac23
Remove repo_id parameter
frreiss Nov 14, 2025
2a393b2
Upgrade to latest peft and granite-common
frreiss Nov 15, 2025
4a89e8a
Upgrade dependencies again
frreiss Nov 15, 2025
628a7d0
Fix broken test
frreiss Nov 15, 2025
42268d7
fix: make huggingface adapter test qualitative
jakelorocco Nov 15, 2025
25196e1
fix: issues with tests after refactor
jakelorocco Nov 17, 2025
38 changes: 38 additions & 0 deletions docs/dev/intrinsics_and_adapters.md
@@ -0,0 +1,38 @@
# Intrinsics and Adapters
Note: Mellea currently only supports GraniteCommonAdapters and Intrinsics.

## Basics
In Mellea, intrinsics are Components that signal one or more of the following to a backend:
- a special adapter must be used for generation
- the input/output for generation must be transformed in a particular way
- the model options must be modified in a particular way

These changes only happen when the intrinsic is the "action" of the request. Intrinsics should usually not be used as an item in the context of generation (in fact, by default, Intrinsics have no string representation).

These changes are specified by the Adapter that corresponds to a given Intrinsic. Matching happens based on the adapter name and type.

## Parts of an Intrinsic
Intrinsics specify:
- an adapter name (e.g., `requirement_check`)
- the types of adapters suitable for use (e.g., `alora`)
- any kwargs necessary (e.g., a requirement like "make sure the last user message is...")
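Those three parts can be sketched as a plain dataclass (a hypothetical mirror for illustration only; Mellea's actual `Intrinsic` class lives in `mellea/stdlib/intrinsics/intrinsic.py` and may differ):

```python
from dataclasses import dataclass, field
from enum import Enum


class AdapterType(Enum):
    """Kinds of adapters an intrinsic can request."""

    ALORA = "alora"
    LORA = "lora"


@dataclass
class IntrinsicSketch:
    """Hypothetical mirror of the three parts of an Intrinsic listed above."""

    adapter_name: str  # e.g. "requirement_check"
    adapter_types: list[AdapterType] = field(
        default_factory=lambda: [AdapterType.ALORA, AdapterType.LORA]
    )
    intrinsic_kwargs: dict = field(default_factory=dict)


check = IntrinsicSketch(
    "requirement_check",
    intrinsic_kwargs={"requirement": "Make sure the last user message is polite."},
)
```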

## Parts of an Adapter
Adapters specify:
- compatible backends
- adapter type
- functions for getting a path to load them
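That contract can be sketched as a small abstract class (hypothetical names; the real adapter classes live under `mellea/backends/adapters/`):

```python
from abc import ABC, abstractmethod


class AdapterSketch(ABC):
    """Hypothetical shape of an Adapter, mirroring the list above."""

    compatible_backends: tuple[str, ...]  # which backends can load this adapter
    adapter_type: str  # e.g. "alora" or "lora"

    @abstractmethod
    def get_load_path(self) -> str:
        """Return a path the backend can load the adapter weights from."""


class RequirementCheckAdapter(AdapterSketch):
    compatible_backends = ("huggingface", "openai-vllm")
    adapter_type = "alora"

    def get_load_path(self) -> str:
        # A real implementation would resolve a Hugging Face repo or local dir.
        return "/tmp/adapters/requirement_check"
```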

## Using Intrinsics
Mellea intrinsics currently use the [granite-common](https://github.com/ibm-granite/granite-common) package for loading adapters and formatting inputs/outputs. As a result, Mellea only supports intrinsics/adapters that follow this pattern.

## Needed Future Work
### Custom Adapters / Intrinsics
Mellea should support custom intrinsic / adapter implementations. To do this:
- make backend `_generate_from_intrinsic` functions generic and utilize only common adapter functions
- adapters must specify a transformation function that encapsulates the input/output modifications necessary for their generation requests

### Concurrency Checks
Some backends that allow adapters to be loaded (currently only `LocalHFBackend`) cannot use those adapters independently without impacting other generation requests.

These backends should support a generation lock that ensures requests are only performed when the correct set of adapters (or no adapters) are active.
9 changes: 5 additions & 4 deletions docs/dev/requirement_aLoRA_rerouting.md
@@ -14,14 +14,14 @@ The actual rule is slightly more complicated.

## The Actual Rule

If a `Requirement` is validated using a backend that could either use a `constraint` aLoRA or perform an LLMaJ prompt on the underlying model, then the aLoRA is used for validation, even if the `backend.generate_from_context` method is called instead of the `alora.generate_from_strings` method.
If a `Requirement` is validated using a backend that could either use a `requirement_check` aLoRA or perform an LLMaJ prompt on the underlying model, then the aLoRA is used for validation, even if the `backend.generate_from_context` method is called instead of the `backend._generate_from_intrinsic` method.

There are three exceptions to this rule:
1. `Backend.default_to_constraint_checking_alora` is set to `False` (this parameter defaults to `True`).
2. The `Requirement` has a more specific subtype that indicates a more specific intent (`LLMaJRequirement`).
3. The `ALoRA` requirement checker throws an exception.

There is an exception (or disambiguation) to the first exception: If the user provides an `ALoRARequirement`, then the `backend.generate_from_context` call is rerouted to the constraint checking LoRA, regardless of the value of `deault_to_constraint_checking_alora`.
There is an exception (or disambiguation) to the first exception: If the user provides an `ALoRARequirement`, then the `backend.generate_from_context` call is rerouted to the constraint checking LoRA, regardless of the value of `default_to_constraint_checking_alora`.
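Putting the rule, its exceptions, and the disambiguation together, the decision can be sketched in plain Python (a mock for illustration; the stubbed `Requirement` classes stand in for Mellea's real ones, and this is not the Mellea source):

```python
class Requirement: ...  # stand-in for mellea.stdlib.requirement.Requirement


class LLMaJRequirement(Requirement): ...  # "always use LLM-as-a-judge"


class ALoraRequirement(Requirement): ...  # "always use the aLoRA"


def should_use_alora(
    action: object,
    alora_loaded: bool,
    default_to_constraint_checking_alora: bool = True,
) -> bool:
    """Sketch of the rerouting rule described above."""
    if not isinstance(action, Requirement):
        return False
    # General rule: reroute to the aLoRA if one is loaded.
    reroute = alora_loaded
    # Exception 1: the backend opted out of defaulting to the aLoRA.
    if not default_to_constraint_checking_alora:
        reroute = False
    # Exception 2: the requirement explicitly asks for LLM-as-a-judge.
    if isinstance(action, LLMaJRequirement):
        reroute = False
    # Disambiguation: an explicit ALoraRequirement always reroutes.
    if isinstance(action, ALoraRequirement):
        reroute = True
    return reroute
```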

## Decision Rationale

@@ -33,12 +33,13 @@ Suppose that the user creates a backend and then adds a generic constraint check

```python
from mellea import start_session
from mellea.backends.aloras.granite_aloras import add_granite_aloras
from mellea.stdlib.requirement import Requirement

m = start_session(
    "huggingface.LocalHFBackend:ibm-granite/granite-3.2-8b-instruct")
add_granite_aloras(m)  # This will load the Constraint checking aLoRA.

# By default, the AloraRequirement uses a GraniteCommonAdapter with "requirement_check".
m.backend.add_adapter(GraniteCommonAdapter("ibm-granite/rag-intrinsics-lib", "requirement_check", base_model_name="granite-3.2-8b-instruct"))

m.instruct(
"Corporate wants you to find the difference between these two strings:\n\naaa\naba")
54 changes: 54 additions & 0 deletions docs/examples/intrinsics/intrinsics.py
@@ -0,0 +1,54 @@
from mellea.backends.huggingface import LocalHFBackend
from mellea.backends.openai import OpenAIBackend, _ServerType
from mellea.backends.adapters.adapter import AdapterType, GraniteCommonAdapter
from mellea.stdlib.base import ChatContext, ModelOutputThunk
from mellea.stdlib.chat import Message
import mellea.stdlib.functional as mfuncs
from mellea.stdlib.intrinsics.intrinsic import Intrinsic

# This is an example for how you would directly use intrinsics. See `mellea/stdlib/intrinsics/rag.py`
# for helper functions.

# Create the backend. The vLLM server example below is commented out in favor of the Hugging Face code for now.
# # Assumes a locally running VLLM server.
# backend = OpenAIBackend(
# model_id="ibm-granite/granite-3.3-8b-instruct",
# base_url="http://0.0.0.0:8000/v1",
# api_key="EMPTY",
# )

# # If using a remote VLLM server, utilize the `test/backends/test_openai_vllm/serve.sh`
# # script with `export VLLM_DOWNLOAD_RAG_INTRINSICS=True`. This will download the granite_common
# # adapters on the server.
# backend._server_type = _ServerType.REMOTE_VLLM

backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-8b-instruct")

# Create the Adapter. GraniteCommonAdapters default to ALoRAs.
req_adapter = GraniteCommonAdapter(
    "requirement_check", base_model_name=backend.base_model_name
)

# Add the adapter to the backend.
backend.add_adapter(req_adapter)

ctx = ChatContext()
ctx = ctx.add(Message("user", "Hi, can you help me?"))
ctx = ctx.add(Message("assistant", "Hello; yes! What can I help with?"))

# Generate from an intrinsic with the same name as the adapter. By default, it will look for
# ALORA and then LORA adapters.
out, new_ctx = mfuncs.act(
    Intrinsic(
        "requirement_check",
        intrinsic_kwargs={"requirement": "The assistant is helpful."},
    ),
    ctx,
    backend,
)

# Print the output. The requirement_check adapter has a specific output format:
print(out) # {"requirement_likelihood": 1.0}

# The AloraRequirement uses this adapter. It automatically parses that output
# when validating the output.
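For reference, parsing that JSON output by hand is a one-liner (hypothetical helper, not a Mellea API; the field name is taken from the printed output above, and the 0.5 threshold is an arbitrary choice):

```python
import json


def requirement_satisfied(raw_output: str, threshold: float = 0.5) -> bool:
    """Parse the requirement_check adapter's JSON output and apply a cutoff."""
    return json.loads(raw_output)["requirement_likelihood"] >= threshold


print(requirement_satisfied('{"requirement_likelihood": 1.0}'))  # True
```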
25 changes: 0 additions & 25 deletions mellea/backends/_utils.py
@@ -4,7 +4,6 @@
from collections.abc import Callable
from typing import Any, Literal

from mellea.backends.aloras import Alora
from mellea.backends.formatter import Formatter
from mellea.backends.tools import parse_tools
from mellea.helpers.fancy_logger import FancyLogger
@@ -57,30 +56,6 @@ def to_chat(
return ctx_as_conversation


def use_alora(
    action: Component | CBlock,
    alora: Alora | None,
    default_to_constraint_checking_alora: bool,
) -> bool:
    """Returns True when the condition for using alora is met.

    See `docs/dev/requirement_aLoRA_rerouting.md` for an explanation of the following code block.
    """
    if issubclass(type(action), Requirement):
        # The general rule is that we reroute to the alora if it exists.
        reroute_to_alora = alora is not None
        # However, there are some exceptions:
        if not default_to_constraint_checking_alora:
            reroute_to_alora = False
        if issubclass(type(action), LLMaJRequirement):
            reroute_to_alora = False
        if issubclass(type(action), ALoraRequirement):
            reroute_to_alora = True
        return reroute_to_alora
    else:
        return False


def to_tool_calls(
    tools: dict[str, Callable], decoded_result: str
) -> dict[str, ModelToolCall] | None: