fix: evict Ollama models between test modules to prevent memory starvation#804

Merged
planetf1 merged 4 commits into generative-computing:main from planetf1:fix/example-ollama-eviction-798
Apr 9, 2026

Conversation

@planetf1
Contributor

@planetf1 planetf1 commented Apr 9, 2026

Type of PR

  • Bug Fix

Description

Problem

When running the full test suite (uv run pytest), Ollama's default 5-minute
keep-alive means models from earlier tests stay resident in memory long after
their subprocess or test function exits. Heavyweight models accumulate and
starve later tests of headroom — the SOFAI graph-colouring example crashed
with a 500 because an 11 GB guardian model from an earlier example was still
resident.

test/conftest.py already managed this for the regular test suite via
--group-by-backend warm-up/eviction, but that flag is not on by default and
docs/examples/ had no equivalent at all.

Ollama models across the test suite

The tests and examples use a wide variety of models, so the eviction logic cannot rely on a hardcoded list:

| Model | ~Size | Where |
| --- | --- | --- |
| granite4:micro | 2 GB | test (conftest warmup, payloads, intrinsics) |
| granite4:micro-h | 2 GB | test (telemetry, components, genslot), examples (m_serve) |
| granite3.2-vision | 3 GB | test (conftest warmup, vision), examples (vision) |
| llama3.2:1b | 1.3 GB | test (SOFAI sampling, graph colouring) |
| granite4:latest | 5 GB | examples (melp ×5) |
| granite3.3:8b | 5 GB | examples (decompose) |
| deepseek-r1:8b | 5 GB | examples (guardian, guardian_hf) |
| qwen2.5vl:7b | 6 GB | examples (vision_openai) |
| pielee/qwen3-4b-thinking-2507_q8 | 4 GB | examples (SOFAI graph colouring) |
| phi:2.7b | 2 GB | examples (SOFAI graph colouring) |
| granite3-guardian:2b | 1.5 GB | examples (mini_researcher) |
| llama3.2:3b | 2 GB | examples (tutorial, mify, notebook) |

Running even a handful of these consecutively without eviction can exhaust
available memory.

Solution

Add per-module Ollama model eviction to test/conftest.py via a
pytest_runtest_teardown hook. When pytest crosses a file boundary between
Ollama-marked tests, the hook queries /api/ps for all loaded models and
evicts each with keep_alive=0 (a sketch of the hook follows the list below).

  • Always active — no flags required
  • Covers both test/ and docs/examples/
  • Discovers models dynamically rather than hardcoding a list
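
For reference, a minimal sketch of what such a hook can look like. This is illustrative only: the endpoint constant, the requests dependency, and the file-boundary check are assumptions, not the literal code merged in this PR.

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama endpoint


def _evict_loaded_ollama_models() -> None:
    """Best-effort: ask Ollama which models are loaded and unload each one."""
    try:
        loaded = requests.get(f"{OLLAMA_URL}/api/ps", timeout=5).json().get("models", [])
        for model in loaded:
            # A generate request with no prompt and keep_alive=0 asks Ollama
            # to unload the model immediately.
            requests.post(
                f"{OLLAMA_URL}/api/generate",
                json={"model": model["name"], "keep_alive": 0},
                timeout=30,
            )
    except requests.RequestException:
        pass  # no Ollama server reachable; nothing to evict


def pytest_runtest_teardown(item, nextitem):
    # Only act after Ollama-marked tests.
    if item.get_closest_marker("ollama") is None:
        return
    # Evict when the next test lives in a different file (or the run is ending).
    next_module = getattr(nextitem, "module", None) if nextitem else None
    if next_module is not getattr(item, "module", None):
        _evict_loaded_ollama_models()
```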

Compromise

Eviction happens at module boundaries, not per-test. Tests within a single
file share the loaded model with no overhead. When crossing files, any loaded
model is evicted — even if the next file uses the same one. This means a
redundant unload/reload (~5–15 s) when consecutive files share a model.

This is a deliberate trade-off: predictable memory behaviour matters more than
saving a reload, particularly on constrained CI runners where an OOM crash
costs far more than a few seconds of model loading.

Eviction also targets all loaded Ollama models, not just those loaded by the
test. If you are using Ollama interactively whilst the suite runs, your model
will be evicted between test modules.

Missing ollama markers

Two test files under test/core/ called start_session() (real Ollama
backend) but lacked pytest.mark.ollama. They had the inert
# pytest: ollama, llm comment syntax, which is only parsed for
docs/examples/ items; regular test files require a proper pytestmark
or decorator (both forms are shown in the sketch after the list below).
Without the marker, the eviction hook never fired for them, leaving models
resident through subsequent non-Ollama tests.

  • test/core/test_streaming_sync_functions.py — added module-level
    pytestmark = [pytest.mark.ollama, pytest.mark.e2e]
  • test/core/test_computed_model_output_thunk.py — added per-function
    @pytest.mark.ollama and @pytest.mark.e2e on the 3 tests that use
    a real Ollama session (8 pure unit tests left unmarked)
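
For illustration, the two marker forms look roughly like this (a hedged sketch; the test names and bodies here are placeholders, not the real tests):

```python
import pytest

# Module-level: marks every test in the file
# (the approach used for test_streaming_sync_functions.py).
pytestmark = [pytest.mark.ollama, pytest.mark.e2e]


# Per-function: marks only the tests that need a real Ollama session
# (the approach used for test_computed_model_output_thunk.py).
@pytest.mark.ollama
@pytest.mark.e2e
def test_uses_real_ollama_session():
    ...
```

The inert `# pytest: ollama, llm` comment form is only parsed for docs/examples/ items; pytest itself ignores it in regular test files.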

Other changes

  • Removed the now-redundant OLLAMA_KEEP_ALIVE=1m tip from test/README.md
    (active eviction supersedes a manual idle-timeout workaround)
  • Added an "Ollama Model Eviction" section to test/README.md documenting
    both eviction mechanisms

Testing

The eviction logic is best-effort infrastructure — it queries a live Ollama
server, so unit testing without mocking the HTTP calls is not meaningful.
Verified locally by running the full suite with -s and confirming models
are evicted between files via ollama-evict log output.

@github-actions github-actions Bot added the bug Something isn't working label Apr 9, 2026
@github-actions
Contributor

github-actions Bot commented Apr 9, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@planetf1 planetf1 changed the title fix: evict Ollama models between test modules to prevent memory starv… fix: evict Ollama models between test modules to prevent memory starvation Apr 9, 2026
@planetf1
Contributor Author

planetf1 commented Apr 9, 2026

This worked significantly better -- I was able to run through all (selected) Ollama tests. Memory usage correctly dropped at various points, and checking on the tty shows models were unloaded (the little peaks and drops in the memory-usage graphs). The largest model (14 GB) loaded OK as it was the only one resident (on a 32 GB machine), and memory was cleared up afterwards. Reloading small models took about 1 s. Looks like a good fix - will run the full test suite next.
[Four screenshots attached: memory-usage graphs captured between 18:03 and 18:08 showing models loading and unloading]

@ajbozarth
Contributor

I ran uv run pytest -v on this branch and got the following:

Terminal output
$ uv run pytest -v
      Built mellea @ file:///Users/ajbozarth/workspace/ai/mellea
Uninstalled 91 packages in 1.58s
Installed 92 packages in 257ms
============================================================================================================ test session starts ============================================================================================================
platform darwin -- Python 3.12.8, pytest-9.0.2, pluggy-1.6.0 -- /Users/ajbozarth/workspace/ai/mellea/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.3.1-arm64-arm-64bit', 'Packages': {'pytest': '9.0.2', 'pluggy': '1.6.0'}, 'Plugins': {'nbmake': '1.5.5', 'recording': '0.13.4', 'cov': '7.1.0', 'xdist': '3.8.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'asyncio': '1.3.0', 'Faker': '37.12.0', 'langsmith': '0.7.24', 'anyio': '4.13.0'}}
rootdir: /Users/ajbozarth/workspace/ai/mellea
configfile: pyproject.toml
testpaths: test, docs
plugins: nbmake-1.5.5, recording-0.13.4, cov-7.1.0, xdist-3.8.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, asyncio-1.3.0, Faker-37.12.0, langsmith-0.7.24, anyio-4.13.0
timeout: 900.0s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1017 items / 18 deselected / 999 selected                                                                                                                                                                                         

test/backends/test_adapters/test_adapter.py::test_adapter_init PASSED                                                                                                                                                                 ...
test/telemetry/test_tracing_backend.py::test_streaming_span_duration SKIPPED (Telemetry not initialized)                                                                                                                              [100%]

================================================================================================================= FAILURES ==================================================================================================================
______________________________________________________________________________________________________ test_find_context_attributions _______________________________________________________________________________________________________

backend = <mellea.backends.huggingface.LocalHFBackend object at 0x151aee090>

    @pytest.mark.qualitative
    def test_find_context_attributions(backend):
        """Verify that the context-attribution intrinsic functions properly."""
        context, assistant_response, documents = _read_rag_input_json(
            "context-attribution.json"
        )
        expected = _read_rag_output_json("context-attribution.json")
    
>       result = core.find_context_attributions(
            assistant_response, documents, context, backend
        )

test/stdlib/components/intrinsic/test_core.py:102: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mellea/stdlib/components/intrinsic/core.py:90: in find_context_attributions
    result_json = call_intrinsic(
mellea/stdlib/components/intrinsic/_util.py:39: in call_intrinsic
    model_output_thunk, _ = mfuncs.act(
mellea/stdlib/functional.py:100: in act
    out = _run_async_in_thread(
mellea/helpers/event_loop_helper.py:105: in _run_async_in_thread
    return __event_loop_handler(co)
           ^^^^^^^^^^^^^^^^^^^^^^^^
mellea/helpers/event_loop_helper.py:77: in __call__
    return asyncio.run_coroutine_threadsafe(co, self._event_loop).result()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/concurrent/futures/_base.py:456: in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
../../../.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/concurrent/futures/_base.py:401: in __get_result
    raise self._exception
mellea/stdlib/functional.py:634: in aact
    await result.avalue()
mellea/core/base.py:393: in avalue
    await self.astream()
mellea/core/base.py:480: in astream
    raise chunks[-1]
mellea/helpers/async_helpers.py:31: in send_to_queue
    aresponse = await co
                ^^^^^^^^
../../../.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/asyncio/threads.py:25: in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/concurrent/futures/thread.py:59: in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
mellea/backends/huggingface.py:464: in _generate_with_adapter_lock
    out = generate_func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
mellea/formatters/granite/base/util.py:363: in generate_with_transformers
    generate_result = model.generate(input_tokens, **generate_input)  # type: ignore[operator]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:120: in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/transformers/generation/utils.py:2566: in generate
    result = decoding_method(
.venv/lib/python3.12/site-packages/transformers/generation/utils.py:2805: in _sample
    next_token_scores = logits_processor(input_ids, next_token_logits)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/transformers/generation/logits_process.py:93: in __call__
    scores = processor(input_ids, scores)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <xgrammar.contrib.hf.LogitsProcessor object at 0x1473514c0>, input_ids = tensor([[100264,   9125, 100265,  ...,     58,    220,      0]],
       device='mps:0')
scores = tensor([[nan, nan, nan,  ..., nan, nan, nan]], device='mps:0')

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        """
        Accept token sampled in the last iteration, fill in bitmask, and apply bitmask to logits.
    
        Returns:
            scores: Logits modified with bitmask.
        """
        # Lazily initialize GrammarMatchers and bitmask
        if len(self.matchers) == 0:
            self.batch_size = input_ids.shape[0]
            self.compiled_grammars = (
                self.compiled_grammars
                if len(self.compiled_grammars) > 1
                else self.compiled_grammars * self.batch_size
            )
            assert (
                len(self.compiled_grammars) == self.batch_size
            ), "The number of compiled grammars must be equal to the batch size."
            self.matchers = [
                xgr.GrammarMatcher(self.compiled_grammars[i]) for i in range(self.batch_size)
            ]
            self.token_bitmask = xgr.allocate_token_bitmask(self.batch_size, self.full_vocab_size)
    
        if input_ids.shape[0] != self.batch_size:
            raise RuntimeError(
                "Expect input_ids.shape[0] to be LogitsProcessor.batch_size."
                + f"Got {input_ids.shape[0]} for the former, and {self.batch_size} for the latter."
            )
    
        if not self.prefilled:
            # Have not sampled a token yet
            self.prefilled = True
        else:
            for i in range(self.batch_size):
                if not self.matchers[i].is_terminated():
                    sampled_token = input_ids[i][-1]
>                   assert self.matchers[i].accept_token(sampled_token)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                   AssertionError

.venv/lib/python3.12/site-packages/xgrammar/contrib/hf.py:96: AssertionError
----------------------------------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------------------------------
=== 12:12:18-INFO ======
passing in model options when generating with an adapter; some model options may be overwritten / ignored
------------------------------------------------------------------------------------------------------------- Captured log call -------------------------------------------------------------------------------------------------------------
INFO     fancy_logger:huggingface.py:475 passing in model options when generating with an adapter; some model options may be overwritten / ignored
--------------------------------------------------------------------------------------------------------- Captured stdout teardown ----------------------------------------------------------------------------------------------------------
=== 12:12:25-INFO ======
Cleaning up test_core backend GPU memory...
=== 12:12:25-INFO ======
  Cleared LRU cache
=== 12:12:25-INFO ======
  Removed accelerate dispatch hooks
----------------------------------------------------------------------------------------------------------- Captured log teardown -----------------------------------------------------------------------------------------------------------
INFO     fancy_logger:conftest.py:281 Cleaning up test_core backend GPU memory...
INFO     fancy_logger:conftest.py:304   Cleared LRU cache
INFO     fancy_logger:conftest.py:341   Removed accelerate dispatch hooks
============================================================================================================= warnings summary ==============================================================================================================
test/backends/test_openai_vllm.py:18
  /Users/ajbozarth/workspace/ai/mellea/test/backends/test_openai_vllm.py:18: PytestUnknownMarkWarning: Unknown pytest.mark.vllm - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    pytest.mark.vllm,

test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
test/backends/test_litellm_ollama.py::test_generate_from_raw
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_litellm_ollama.py::test_async_avalue
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[non-streaming]
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/aiohttp/connector.py:1003: DeprecationWarning: enable_cleanup_closed ignored because https://github.com/python/cpython/pull/118960 is fixed in Python version sys.version_info(major=3, minor=12, micro=8, releaselevel='final', serial=0)
    super().__init__(

test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='The answ...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="Subject:...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='no', rol...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='yes', ro...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Subject:...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_gen_slot
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='{\n    "...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/streaming_handler.py:1855: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.12/migration/
    obj_dict = processed_chunk.dict()

test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_litellm_ollama.py::test_async_avalue
test/backends/test_litellm_ollama.py::test_async_avalue
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Hello! H...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_litellm_ollama.py::test_async_parallel_requests
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="Goodbye!...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/backends/test_tool_calls.py::test_tool_called_from_context_action
  <frozen abc>:106: DeprecationWarning: Use BaseMetaSerializer() instead.

test/cli/test_alora_train.py::test_alora_config_creation
test/cli/test_alora_train.py::test_lora_config_creation
test/cli/test_alora_train.py::test_invocation_prompt_tokenization
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/trl/trainer/sft_config.py:257: DeprecationWarning: `max_seq_length` is deprecated and will be removed in version 0.20.0. Use `max_length` instead.
    warnings.warn(

test/formatters/granite/test_intrinsics_formatters.py::test_run_transformers[answerability_simple]
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

test/formatters/granite/test_intrinsics_formatters.py::test_run_transformers[answerability_simple]
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

test/helpers/test_event_loop_helper.py::test_event_loop_handler_with_forking
  /Users/ajbozarth/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=40757) is multi-threaded, use of fork() may lead to deadlocks in the child.
    self.pid = os.fork()

test/stdlib/components/docs/test_richdocument.py::test_richdocument_basics
test/stdlib/components/docs/test_richdocument.py::test_richdocument_markdown
test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/docling_core/transforms/serializer/markdown.py:490: DeprecationWarning: Field `annotations` is deprecated; use `meta` instead.
    for ann in item.annotations

test/stdlib/components/intrinsic/test_core.py: 2 warnings
test/stdlib/components/intrinsic/test_guardian.py: 3 warnings
test/stdlib/components/intrinsic/test_rag.py: 5 warnings
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
    warnings.warn(

test/stdlib/test_spans.py::test_lazy_spans
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/torch/nn/functional.py:5294: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:468.)
    return torch._C._nn.pad(input, pad, mode, value)

test/telemetry/test_logging.py::test_otlp_logging_enabled_without_endpoint_warns
  /Users/ajbozarth/workspace/ai/mellea/mellea/telemetry/logging.py:97: UserWarning: OTLP logs exporter is enabled (MELLEA_LOGS_OTLP=true) but no endpoint is configured. Set OTEL_EXPORTER_OTLP_LOGS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT to export logs.
    _logger_provider = _setup_logger_provider()

test/telemetry/test_metrics.py: 24 warnings
test/telemetry/test_metrics_backend.py: 8 warnings
test/telemetry/test_metrics_token.py: 4 warnings
  /Users/ajbozarth/workspace/ai/mellea/mellea/telemetry/metrics.py:245: UserWarning: Metrics are enabled (MELLEA_METRICS_ENABLED=true) but no exporters are configured. Metrics will be collected but not exported. Set MELLEA_METRICS_PROMETHEUS=true, set MELLEA_METRICS_OTLP=true with an endpoint (OTEL_EXPORTER_OTLP_METRICS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT), or set MELLEA_METRICS_CONSOLE=true to export metrics.
    _meter_provider = _setup_meter_provider()

test/telemetry/test_metrics.py: 28 warnings
test/telemetry/test_metrics_backend.py: 8 warnings
test/telemetry/test_metrics_token.py: 4 warnings
  /Users/ajbozarth/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/importlib/__init__.py:131: UserWarning: TokenMetricsPlugin already registered: Plugin token_metrics.generation_post_call already registered
    _bootstrap._exec(spec, module)

test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[non-streaming]
test/telemetry/test_tracing.py::test_session_with_tracing_disabled
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Understo...ields={'refusal': None}), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/model_response_utils.py:206: PydanticDeprecatedSince211: Accessing the 'model_computed_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
    or callable(getattr(delta, attr_name))

test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/model_response_utils.py:206: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
    or callable(getattr(delta, attr_name))

test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
  /Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="It appea...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================================= Skipped Examples ==============================================================================================================
The following examples were skipped during collection:

  • 102_example.py: Example marked with skip marker
  • example_readme_generator.py: Example marked with skip marker
  • make_training_data.py: Example marked with skip marker
  • stembolts_intrinsic.py: Example marked with skip marker
  • bedrock_litellm_example.py: Example marked with skip marker
  • bedrock_openai_example.py: Example marked with skip marker
  • qiskit_code_validation.py: Example marked with skip marker
  • validation_helpers.py: Example marked with skip marker
  • python_decompose_result.py: Example marked to always skip (skip_always marker)
  • m_decomp_result.py: Example marked to always skip (skip_always marker)
  • client.py: Example marked to always skip (skip_always marker)
  • pii_serve.py: Example marked to always skip (skip_always marker)
  • mcp_example.py: Example marked to always skip (skip_always marker)
  • rich_document_advanced.py: Example marked with skip marker
  • mellea_pdf.py: Example marked to always skip (skip_always marker)
  • simple_rag_with_filter.py: Example marked to always skip (skip_always marker)
============================================================================================================== tests coverage ===============================================================================================================
_____________________________________________________________________________________________ coverage: platform darwin, python 3.12.8-final-0 ______________________________________________________________________________________________

Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
========================================================================================================== short test summary info ==========================================================================================================
FAILED test/stdlib/components/intrinsic/test_core.py::test_find_context_attributions - AssertionError
================================================================= 1 failed, 935 passed, 59 skipped, 18 deselected, 2 xfailed, 2 xpassed, 124 warnings in 904.59s (0:15:04) ==================================================================

Only one test failed for me (a qualitative HF test with high RAM needs) and the memory pressure looked OK, though I didn't watch it like a hawk while it was running (I just had the monitor at the side so I could see if yellow/red showed).

Overall I think this is a good step forward.

@planetf1
Contributor Author

planetf1 commented Apr 9, 2026

Spotted two files missing markers; updated. (Without the marker the model isn't evicted, and those tests run just before some local transformers tests which use lots of memory.)

@planetf1
Contributor Author

planetf1 commented Apr 9, 2026

@ajbozarth Thanks - will look at your failure here & other issue later or tomorrow.

@planetf1
Contributor Author

planetf1 commented Apr 9, 2026

test_hallucination_detection failure

Not memory-related. Qualitative flake — faithfulness_likelihood drifted outside tolerance.

_________________________ test_hallucination_detection _________________________

>       assert pytest.approx(r, abs=3e-2) == e
E       AssertionError: assert approx({'resp...he sentence.}) == {'explanation...end': 31, ...}
E
E         comparison failed. Mismatched elements: 1 / 5:
E         Max absolute difference: 0.03558173654518482
E         Max relative difference: 0.05138319547510355
E         Index                   | Obtained           | Expected
E         faithfulness_likelihood | 0.7280598165124975 | 0.6924780799673127 ± 0.03

test/stdlib/components/intrinsic/test_rag.py:161: AssertionError
FAILED test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection
= 1 failed, 935 passed, 59 skipped, 18 deselected, 2 xfailed, 2 xpassed in 982.41s (0:16:22) =

Model produced valid structured output — not garbled/incoherent. This is standard floating-point non-determinism in transformer inference; the abs=3e-2 tolerance is too tight.

Not related to #783. That issue produces raw token IDs under memory pressure. Here the model ran fine, the score just wobbled. Fix: widen tolerance (e.g. abs=5e-2) or mark xfail.
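
A hedged illustration of the proposed tolerance change, using the values from the failure above (the real assertion lives in test/stdlib/components/intrinsic/test_rag.py and compares a larger dict):

```python
import pytest

# Observed vs expected faithfulness_likelihood from the failing run.
r = {"faithfulness_likelihood": 0.7280598165124975}
e = {"faithfulness_likelihood": 0.6924780799673127}

# abs=3e-2 rejects a drift of ~0.036; abs=5e-2 absorbs it.
assert pytest.approx(r, abs=5e-2) == e
```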

planetf1 added 3 commits April 9, 2026 19:38
…ation (generative-computing#798)

Add per-module Ollama model eviction to test/conftest.py. When pytest
crosses a file boundary between Ollama-marked tests, all loaded models
are discovered via /api/ps and evicted with keep_alive=0. This prevents
heavyweight models from accumulating in memory across the test suite.

Covers both test/ and docs/examples/ without requiring --group-by-backend.
Restore docs/examples/README.md to match main — the original
command and heading were correct.
test_streaming_sync_functions.py and test_computed_model_output_thunk.py
call start_session() (Ollama backend) but lacked pytest.mark.ollama,
so the per-module eviction hook never fired for them. The inert
`# pytest: ollama` comment is only parsed for docs/examples/.
@planetf1
Contributor Author

planetf1 commented Apr 9, 2026

@ajbozarth I didn't see the issues you did. I propose that this change is a net positive (even though it doesn't address your garbled output): it should make test execution more predictable, but it doesn't change the baseline. Let's continue looking at your failures under #783.

I have also now rebased on main and will do another full test run so that we cover the examples too (the fix I made separately is now merged and fixes collection, so the examples are now picked up).

@planetf1
Contributor Author

planetf1 commented Apr 9, 2026

The second run, now including the examples, had two failures, both pre-existing and unrelated to the eviction changes:

  1. test_hallucination_detection — same flakiness as before. Tolerance too tight (abs=3e-2), score drifted by 0.036. Not memory-related.

  2. sofai_graph_coloring.py — Ollama returned a 500: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details

On 2 -- the logs (run with -s) confirm the previous model (granite3.3-guardian:8b) was evicted, so the fix here is working as intended. The crash does not in fact look RAM-related: it happens when running the phi:2.7b model, which is only 1.7 GB. Running the test standalone also failed, and memory usage remained low throughout - no memory pressure, a flat line of usage. Raising a new issue, #806, to address it.

Testing concluded; I propose this is ready to merge.

@planetf1 planetf1 marked this pull request as ready for review April 9, 2026 21:42
@planetf1 planetf1 requested a review from a team as a code owner April 9, 2026 21:42
Examples run as isolated subprocesses. Ollama's default keep_alive keeps
models resident after exit, starving later examples of memory. Add
teardown hook to evict after every ollama-marked example.
@planetf1 planetf1 force-pushed the fix/example-ollama-eviction-798 branch from 0bf06cb to 0b09d4e Compare April 9, 2026 22:07
@ajbozarth
Contributor

I noticed you had to add the helpers to both conftest files. Should we look at creating a test util file to hold the duplicated conftest code? (As a follow-up issue.)

@planetf1 planetf1 enabled auto-merge April 9, 2026 22:18
@planetf1
Contributor Author

planetf1 commented Apr 9, 2026

I noticed you had to add the helpers to both conftest files. Should we look at creating a test util file to hold the duplicated conftest code? (As a follow-up issue.)

Sounds reasonable!

@planetf1
Contributor Author

planetf1 commented Apr 9, 2026

The test_hallucination_detection flakiness reported here is tracked in #809.

@planetf1 planetf1 added this pull request to the merge queue Apr 9, 2026
Merged via the queue into generative-computing:main with commit 417b7c8 Apr 9, 2026
6 checks passed
@planetf1 planetf1 deleted the fix/example-ollama-eviction-798 branch April 9, 2026 23:17

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Example tests leave Ollama models resident, starving subsequent tests
