Merged
47 commits
690e1ee
test: remove hf .to(device)
jakelorocco Oct 25, 2025
cdca73a
fix: remove alora model field from hf
jakelorocco Oct 25, 2025
3a1c1e8
test: init commit; need to refactor a bunch
jakelorocco Oct 26, 2025
f61efd5
test: basic doc support for openai
jakelorocco Oct 27, 2025
7377410
fix: move intrinsic config setup to adapters
jakelorocco Oct 28, 2025
2d627ad
test: push most recent changes
jakelorocco Oct 28, 2025
610ff74
test: split adapter add and load functions
jakelorocco Oct 29, 2025
e93b629
fix: start cleaning up some code
jakelorocco Oct 31, 2025
123eb30
Merge branch 'main' into jal/intrinsic-updates
jakelorocco Oct 31, 2025
d9394ac
Merge pull request #1 from generative-computing/jal/intrinsic-updates
frreiss Oct 31, 2025
0286003
feat: update vllm serve script for granite common intrinsic downloads
jakelorocco Nov 3, 2025
1229206
feat: add intrinsics; remove bespoke aloras
jakelorocco Nov 3, 2025
78364ad
feat: add docs
jakelorocco Nov 3, 2025
17b7f8e
feat: add specific intrinsic and adapter tests
jakelorocco Nov 3, 2025
51e9c50
fix: add note that only gcommon adapters/intrinsics are supported
jakelorocco Nov 3, 2025
ff91cbd
fix: add docstrings for types
jakelorocco Nov 3, 2025
9145f56
fix: update model in openai vllm test
jakelorocco Nov 3, 2025
4c61578
Merge branch 'main' into jal/intrinsic-updates
jakelorocco Nov 3, 2025
de75055
Merge pull request #2 from generative-computing/jal/intrinsic-updates
frreiss Nov 4, 2025
929c201
Add entry point for answerability
frreiss Nov 6, 2025
f5c1311
Rewrite, citations, and context relevance
frreiss Nov 8, 2025
bb10bd4
Add additional intrinsics
frreiss Nov 10, 2025
51cb257
Merge branch 'main' into intrinsics
frreiss Nov 11, 2025
d9ac085
Fix broken test after merge
frreiss Nov 11, 2025
da6de1e
Make linter happy and allow GPU usage with HF backend
frreiss Nov 11, 2025
765c91f
Fix stuff that was missed during manual merge
frreiss Nov 11, 2025
0582f24
More manual merge issues
frreiss Nov 11, 2025
869459e
More manual merge issues
frreiss Nov 11, 2025
d5732c3
Fix broken test
frreiss Nov 11, 2025
ac9ee44
Adjust type hints
frreiss Nov 11, 2025
c80109f
Merge branch 'main' into intrinsics
frreiss Nov 12, 2025
6c29eae
Add qualitative mark to tests
frreiss Nov 11, 2025
9ce2e32
Change import of obtain_lora()
frreiss Nov 12, 2025
c33d440
Update import
frreiss Nov 12, 2025
e6d8c1a
Merge branch 'main' into intrinsics
frreiss Nov 12, 2025
2c37d62
Update code after breaking API changes
frreiss Nov 12, 2025
0bb2087
feat: small changes to intrinsics; along with fixes to docs / tests
jakelorocco Nov 13, 2025
dd40832
Merge pull request #3 from jakelorocco/intrinsics
frreiss Nov 14, 2025
ff8ec87
Merge branch 'generative-computing:main' into intrinsics
frreiss Nov 14, 2025
59bf5a0
Fix errors introduced by merge
frreiss Nov 14, 2025
9ba28d1
Fix minor merge-induced issue
frreiss Nov 14, 2025
427ac23
Remove repo_id parameter
frreiss Nov 14, 2025
2a393b2
Upgrade to latest peft and granite-common
frreiss Nov 15, 2025
4a89e8a
Upgrade dependencies again
frreiss Nov 15, 2025
628a7d0
Fix broken test
frreiss Nov 15, 2025
42268d7
fix: make huggingface adapter test qualitative
jakelorocco Nov 15, 2025
25196e1
fix: issues with tests after refactor
jakelorocco Nov 17, 2025
38 changes: 38 additions & 0 deletions docs/dev/intrinsics_and_adapters.md
@@ -0,0 +1,38 @@
# Intrinsics and Adapters
Note: Mellea currently only supports GraniteCommonAdapters and Intrinsics.

## Basics
In Mellea, intrinsics are Components that signal one or more of the following to a backend:
- a special adapter must be used for generation
- the input/output for generation must be transformed in a particular way
- the model options must be modified in a particular way

These changes only happen when the intrinsic is the "action" of the request. Intrinsics should usually not be used as an item in the context of generation (in fact, by default, Intrinsics have no string representation).

These changes are specified by the Adapter that corresponds to a given Intrinsic. Matching happens based on the adapter name and type.

## Parts of an Intrinsic
Intrinsics specify:
- an adapter name (e.g., `requirement_check`)
- the types of adapters suitable for use (e.g., `alora`)
- any kwargs necessary (e.g., a requirement like "make sure the last user message is...")
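Those three parts can be sketched as a plain dataclass (a hypothetical mirror for illustration only; Mellea's actual `Intrinsic` class lives in `mellea/stdlib/intrinsics/intrinsic.py` and may differ):

```python
from dataclasses import dataclass, field
from enum import Enum


class AdapterType(Enum):
    """Kinds of adapters an intrinsic can request."""

    ALORA = "alora"
    LORA = "lora"


@dataclass
class IntrinsicSketch:
    """Hypothetical mirror of the three parts of an Intrinsic listed above."""

    adapter_name: str  # e.g. "requirement_check"
    adapter_types: list[AdapterType] = field(
        default_factory=lambda: [AdapterType.ALORA, AdapterType.LORA]
    )
    intrinsic_kwargs: dict = field(default_factory=dict)


check = IntrinsicSketch(
    "requirement_check",
    intrinsic_kwargs={"requirement": "Make sure the last user message is polite."},
)
```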

## Parts of an Adapter
Adapters specify:
- compatible backends
- adapter type
- functions for getting a path to load them
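That contract can be sketched as a small abstract class (hypothetical names; the real adapter classes live under `mellea/backends/adapters/`):

```python
from abc import ABC, abstractmethod


class AdapterSketch(ABC):
    """Hypothetical shape of an Adapter, mirroring the list above."""

    compatible_backends: tuple[str, ...]  # which backends can load this adapter
    adapter_type: str  # e.g. "alora" or "lora"

    @abstractmethod
    def get_load_path(self) -> str:
        """Return a path the backend can load the adapter weights from."""


class RequirementCheckAdapter(AdapterSketch):
    compatible_backends = ("huggingface", "openai-vllm")
    adapter_type = "alora"

    def get_load_path(self) -> str:
        # A real implementation would resolve a Hugging Face repo or local dir.
        return "/tmp/adapters/requirement_check"
```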

## Using Intrinsics
Mellea intrinsics currently use the [granite-common](https://github.com/ibm-granite/granite-common) package for loading adapters and formatting inputs/outputs. As a result, Mellea only supports intrinsics/adapters that follow this pattern.

## Needed Future Work
### Custom Adapters / Intrinsics
Mellea should support custom intrinsic / adapter implementations. To do this:
- make backend `_generate_from_intrinsic` functions generic and utilize only common adapter functions
- adapters must specify a transformation function that encapsulates the input/output modifications necessary for their generation requests

### Concurrency Checks
Some backends that allow adapters to be loaded (currently only `LocalHFBackend`) cannot use those adapters independently without impacting other generation requests.

These backends should support a generation lock that ensures requests are only performed when the correct set of adapters (or no adapters) are active.
9 changes: 5 additions & 4 deletions docs/dev/requirement_aLoRA_rerouting.md
@@ -14,14 +14,14 @@ The actual rule is slightly more complicated.

## The Actual Rule

If a `Requirement` is validated using a backend that could either use a `constraint` aLoRA or perform an LLMaJ prompt on the underlying model, then the aLoRA is used for validation, even if the `backend.generate_from_context` method is called instead of the `alora.generate_from_strings` method.
If a `Requirement` is validated using a backend that could either use a `requirement_check` aLoRA or perform an LLMaJ prompt on the underlying model, then the aLoRA is used for validation, even if the `backend.generate_from_context` method is called instead of the `backend._generate_from_intrinsic` method.

There are three exceptions to this rule:
1. `Backend.default_to_constraint_checking_alora` is set to `False` (this parameter defaults to `True`).
2. The `Requirement` has a more specific subtype that indicates a more specific intent (`LLMaJRequirement`).
3. The `ALoRA` requirement checker throws an exception.

There is an exception (or disambiguation) to the first exception: If the user provides an `ALoRARequirement`, then the `backend.generate_from_context` call is rerouted to the constraint checking LoRA, regardless of the value of `deault_to_constraint_checking_alora`.
There is an exception (or disambiguation) to the first exception: If the user provides an `ALoRARequirement`, then the `backend.generate_from_context` call is rerouted to the constraint checking LoRA, regardless of the value of `default_to_constraint_checking_alora`.
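Putting the rule, its exceptions, and the disambiguation together, the decision can be sketched in plain Python (a mock for illustration; the stubbed `Requirement` classes stand in for Mellea's real ones, and this is not the Mellea source):

```python
class Requirement: ...  # stand-in for mellea.stdlib.requirement.Requirement


class LLMaJRequirement(Requirement): ...  # "always use LLM-as-a-judge"


class ALoraRequirement(Requirement): ...  # "always use the aLoRA"


def should_use_alora(
    action: object,
    alora_loaded: bool,
    default_to_constraint_checking_alora: bool = True,
) -> bool:
    """Sketch of the rerouting rule described above."""
    if not isinstance(action, Requirement):
        return False
    # General rule: reroute to the aLoRA if one is loaded.
    reroute = alora_loaded
    # Exception 1: the backend opted out of defaulting to the aLoRA.
    if not default_to_constraint_checking_alora:
        reroute = False
    # Exception 2: the requirement explicitly asks for LLM-as-a-judge.
    if isinstance(action, LLMaJRequirement):
        reroute = False
    # Disambiguation: an explicit ALoraRequirement always reroutes.
    if isinstance(action, ALoraRequirement):
        reroute = True
    return reroute
```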

## Decision Rationale

@@ -33,12 +33,13 @@ Suppose that the user creates a backend and then adds a generic constraint check

```python
from mellea import start_session
from mellea.backends.aloras.granite_aloras import add_granite_aloras
from mellea.stdlib.requirement import Requirement

m = start_session(
    "huggingface.LocalHFBackend:ibm-granite/granite-3.2-8b-instruct")
add_granite_aloras(m)  # This will load the Constraint checking aLoRA.

# By default, the AloraRequirement uses a GraniteCommonAdapter with "requirement_check".
m.backend.add_adapter(GraniteCommonAdapter("ibm-granite/rag-intrinsics-lib", "requirement_check", base_model_name="granite-3.2-8b-instruct"))

m.instruct(
"Corporate wants you to find the difference between these two strings:\n\naaa\naba")
54 changes: 54 additions & 0 deletions docs/examples/intrinsics/intrinsics.py
@@ -0,0 +1,54 @@
from mellea.backends.huggingface import LocalHFBackend
from mellea.backends.openai import OpenAIBackend, _ServerType
from mellea.backends.adapters.adapter import AdapterType, GraniteCommonAdapter
from mellea.stdlib.base import ChatContext, ModelOutputThunk
from mellea.stdlib.chat import Message
import mellea.stdlib.functional as mfuncs
from mellea.stdlib.intrinsics.intrinsic import Intrinsic

# This is an example for how you would directly use intrinsics. See `mellea/stdlib/intrinsics/rag.py`
# for helper functions.

# Create the backend. The vLLM server example below is commented out in favor of the Hugging Face code for now.
# # Assumes a locally running VLLM server.
# backend = OpenAIBackend(
# model_id="ibm-granite/granite-3.3-8b-instruct",
# base_url="http://0.0.0.0:8000/v1",
# api_key="EMPTY",
# )

# # If using a remote VLLM server, utilize the `test/backends/test_openai_vllm/serve.sh`
# # script with `export VLLM_DOWNLOAD_RAG_INTRINSICS=True`. This will download the granite_common
# # adapters on the server.
# backend._server_type = _ServerType.REMOTE_VLLM

backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-8b-instruct")

# Create the Adapter. GraniteCommonAdapters default to ALoRAs.
req_adapter = GraniteCommonAdapter(
    "requirement_check", base_model_name=backend.base_model_name
)

# Add the adapter to the backend.
backend.add_adapter(req_adapter)

ctx = ChatContext()
ctx = ctx.add(Message("user", "Hi, can you help me?"))
ctx = ctx.add(Message("assistant", "Hello; yes! What can I help with?"))

# Generate from an intrinsic with the same name as the adapter. By default, it will look for
# ALORA and then LORA adapters.
out, new_ctx = mfuncs.act(
    Intrinsic(
        "requirement_check",
        intrinsic_kwargs={"requirement": "The assistant is helpful."},
    ),
    ctx,
    backend,
)

# Print the output. The requirement_check adapter has a specific output format:
print(out) # {"requirement_likelihood": 1.0}

# The AloraRequirement uses this adapter. It automatically parses that output
# when validating the output.
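For reference, parsing that JSON output by hand is a one-liner (hypothetical helper, not a Mellea API; the field name is taken from the printed output above, and the 0.5 threshold is an arbitrary choice):

```python
import json


def requirement_satisfied(raw_output: str, threshold: float = 0.5) -> bool:
    """Parse the requirement_check adapter's JSON output and apply a cutoff."""
    return json.loads(raw_output)["requirement_likelihood"] >= threshold


print(requirement_satisfied('{"requirement_likelihood": 1.0}'))  # True
```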
25 changes: 0 additions & 25 deletions mellea/backends/_utils.py
@@ -4,7 +4,6 @@
from collections.abc import Callable
from typing import Any, Literal

from mellea.backends.aloras import Alora
from mellea.backends.formatter import Formatter
from mellea.backends.tools import parse_tools
from mellea.helpers.fancy_logger import FancyLogger
@@ -57,30 +56,6 @@ def to_chat(
return ctx_as_conversation


def use_alora(
    action: Component | CBlock,
    alora: Alora | None,
    default_to_constraint_checking_alora: bool,
) -> bool:
    """Returns True when the condition for using alora is met.

    See `docs/dev/requirement_aLoRA_rerouting.md` for an explanation of the following code block.
    """
    if issubclass(type(action), Requirement):
        # The general rule is that we reroute to the alora if it exists.
        reroute_to_alora = alora is not None
        # However, there are some exceptions:
        if not default_to_constraint_checking_alora:
            reroute_to_alora = False
        if issubclass(type(action), LLMaJRequirement):
            reroute_to_alora = False
        if issubclass(type(action), ALoraRequirement):
            reroute_to_alora = True
        return reroute_to_alora
    else:
        return False


def to_tool_calls(
    tools: dict[str, Callable], decoded_result: str
) -> dict[str, ModelToolCall] | None: