fix: evict Ollama models between test modules to prevent memory starvation (#804)
Conversation
|
The PR description has been updated. Please fill out the template for your PR to be reviewed.
|
I ran the suite locally. Terminal output:

$ uv run pytest -v
Built mellea @ file:///Users/ajbozarth/workspace/ai/mellea
Uninstalled 91 packages in 1.58s
Installed 92 packages in 257ms
============================================================================================================ test session starts ============================================================================================================
platform darwin -- Python 3.12.8, pytest-9.0.2, pluggy-1.6.0 -- /Users/ajbozarth/workspace/ai/mellea/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.3.1-arm64-arm-64bit', 'Packages': {'pytest': '9.0.2', 'pluggy': '1.6.0'}, 'Plugins': {'nbmake': '1.5.5', 'recording': '0.13.4', 'cov': '7.1.0', 'xdist': '3.8.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'asyncio': '1.3.0', 'Faker': '37.12.0', 'langsmith': '0.7.24', 'anyio': '4.13.0'}}
rootdir: /Users/ajbozarth/workspace/ai/mellea
configfile: pyproject.toml
testpaths: test, docs
plugins: nbmake-1.5.5, recording-0.13.4, cov-7.1.0, xdist-3.8.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, asyncio-1.3.0, Faker-37.12.0, langsmith-0.7.24, anyio-4.13.0
timeout: 900.0s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1017 items / 18 deselected / 999 selected
test/backends/test_adapters/test_adapter.py::test_adapter_init PASSED ...
test/telemetry/test_tracing_backend.py::test_streaming_span_duration SKIPPED (Telemetry not initialized) [100%]
================================================================================================================= FAILURES ==================================================================================================================
______________________________________________________________________________________________________ test_find_context_attributions _______________________________________________________________________________________________________
backend = <mellea.backends.huggingface.LocalHFBackend object at 0x151aee090>
@pytest.mark.qualitative
def test_find_context_attributions(backend):
"""Verify that the context-attribution intrinsic functions properly."""
context, assistant_response, documents = _read_rag_input_json(
"context-attribution.json"
)
expected = _read_rag_output_json("context-attribution.json")
> result = core.find_context_attributions(
assistant_response, documents, context, backend
)
test/stdlib/components/intrinsic/test_core.py:102:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mellea/stdlib/components/intrinsic/core.py:90: in find_context_attributions
result_json = call_intrinsic(
mellea/stdlib/components/intrinsic/_util.py:39: in call_intrinsic
model_output_thunk, _ = mfuncs.act(
mellea/stdlib/functional.py:100: in act
out = _run_async_in_thread(
mellea/helpers/event_loop_helper.py:105: in _run_async_in_thread
return __event_loop_handler(co)
^^^^^^^^^^^^^^^^^^^^^^^^
mellea/helpers/event_loop_helper.py:77: in __call__
return asyncio.run_coroutine_threadsafe(co, self._event_loop).result()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/concurrent/futures/_base.py:456: in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
../../../.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/concurrent/futures/_base.py:401: in __get_result
raise self._exception
mellea/stdlib/functional.py:634: in aact
await result.avalue()
mellea/core/base.py:393: in avalue
await self.astream()
mellea/core/base.py:480: in astream
raise chunks[-1]
mellea/helpers/async_helpers.py:31: in send_to_queue
aresponse = await co
^^^^^^^^
../../../.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/asyncio/threads.py:25: in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/concurrent/futures/thread.py:59: in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
mellea/backends/huggingface.py:464: in _generate_with_adapter_lock
out = generate_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
mellea/formatters/granite/base/util.py:363: in generate_with_transformers
generate_result = model.generate(input_tokens, **generate_input) # type: ignore[operator]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:120: in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/transformers/generation/utils.py:2566: in generate
result = decoding_method(
.venv/lib/python3.12/site-packages/transformers/generation/utils.py:2805: in _sample
next_token_scores = logits_processor(input_ids, next_token_logits)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/transformers/generation/logits_process.py:93: in __call__
scores = processor(input_ids, scores)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <xgrammar.contrib.hf.LogitsProcessor object at 0x1473514c0>, input_ids = tensor([[100264, 9125, 100265, ..., 58, 220, 0]],
device='mps:0')
scores = tensor([[nan, nan, nan, ..., nan, nan, nan]], device='mps:0')
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
"""
Accept token sampled in the last iteration, fill in bitmask, and apply bitmask to logits.
Returns:
scores: Logits modified with bitmask.
"""
# Lazily initialize GrammarMatchers and bitmask
if len(self.matchers) == 0:
self.batch_size = input_ids.shape[0]
self.compiled_grammars = (
self.compiled_grammars
if len(self.compiled_grammars) > 1
else self.compiled_grammars * self.batch_size
)
assert (
len(self.compiled_grammars) == self.batch_size
), "The number of compiled grammars must be equal to the batch size."
self.matchers = [
xgr.GrammarMatcher(self.compiled_grammars[i]) for i in range(self.batch_size)
]
self.token_bitmask = xgr.allocate_token_bitmask(self.batch_size, self.full_vocab_size)
if input_ids.shape[0] != self.batch_size:
raise RuntimeError(
"Expect input_ids.shape[0] to be LogitsProcessor.batch_size."
+ f"Got {input_ids.shape[0]} for the former, and {self.batch_size} for the latter."
)
if not self.prefilled:
# Have not sampled a token yet
self.prefilled = True
else:
for i in range(self.batch_size):
if not self.matchers[i].is_terminated():
sampled_token = input_ids[i][-1]
> assert self.matchers[i].accept_token(sampled_token)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E AssertionError
.venv/lib/python3.12/site-packages/xgrammar/contrib/hf.py:96: AssertionError
----------------------------------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------------------------------
=== 12:12:18-INFO ======
passing in model options when generating with an adapter; some model options may be overwritten / ignored
------------------------------------------------------------------------------------------------------------- Captured log call -------------------------------------------------------------------------------------------------------------
INFO fancy_logger:huggingface.py:475 passing in model options when generating with an adapter; some model options may be overwritten / ignored
--------------------------------------------------------------------------------------------------------- Captured stdout teardown ----------------------------------------------------------------------------------------------------------
=== 12:12:25-INFO ======
Cleaning up test_core backend GPU memory...
=== 12:12:25-INFO ======
Cleared LRU cache
=== 12:12:25-INFO ======
Removed accelerate dispatch hooks
----------------------------------------------------------------------------------------------------------- Captured log teardown -----------------------------------------------------------------------------------------------------------
INFO fancy_logger:conftest.py:281 Cleaning up test_core backend GPU memory...
INFO fancy_logger:conftest.py:304 Cleared LRU cache
INFO fancy_logger:conftest.py:341 Removed accelerate dispatch hooks
============================================================================================================= warnings summary ==============================================================================================================
test/backends/test_openai_vllm.py:18
/Users/ajbozarth/workspace/ai/mellea/test/backends/test_openai_vllm.py:18: PytestUnknownMarkWarning: Unknown pytest.mark.vllm - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
pytest.mark.vllm,
test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
test/backends/test_litellm_ollama.py::test_generate_from_raw
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_litellm_ollama.py::test_async_avalue
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[non-streaming]
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/aiohttp/connector.py:1003: DeprecationWarning: enable_cleanup_closed ignored because https://github.com/python/cpython/pull/118960 is fixed in Python version sys.version_info(major=3, minor=12, micro=8, releaselevel='final', serial=0)
super().__init__(
test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='The answ...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="Subject:...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='no', rol...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='yes', ro...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Subject:...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_gen_slot
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='{\n "...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/streaming_handler.py:1855: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.12/migration/
obj_dict = processed_chunk.dict()
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_litellm_ollama.py::test_async_avalue
test/backends/test_litellm_ollama.py::test_async_avalue
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Hello! H...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_async_parallel_requests
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="Goodbye!...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_tool_calls.py::test_tool_called_from_context_action
<frozen abc>:106: DeprecationWarning: Use BaseMetaSerializer() instead.
test/cli/test_alora_train.py::test_alora_config_creation
test/cli/test_alora_train.py::test_lora_config_creation
test/cli/test_alora_train.py::test_invocation_prompt_tokenization
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/trl/trainer/sft_config.py:257: DeprecationWarning: `max_seq_length` is deprecated and will be removed in version 0.20.0. Use `max_length` instead.
warnings.warn(
test/formatters/granite/test_intrinsics_formatters.py::test_run_transformers[answerability_simple]
<frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
test/formatters/granite/test_intrinsics_formatters.py::test_run_transformers[answerability_simple]
<frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
test/helpers/test_event_loop_helper.py::test_event_loop_handler_with_forking
/Users/ajbozarth/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=40757) is multi-threaded, use of fork() may lead to deadlocks in the child.
self.pid = os.fork()
test/stdlib/components/docs/test_richdocument.py::test_richdocument_basics
test/stdlib/components/docs/test_richdocument.py::test_richdocument_markdown
test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/docling_core/transforms/serializer/markdown.py:490: DeprecationWarning: Field `annotations` is deprecated; use `meta` instead.
for ann in item.annotations
test/stdlib/components/intrinsic/test_core.py: 2 warnings
test/stdlib/components/intrinsic/test_guardian.py: 3 warnings
test/stdlib/components/intrinsic/test_rag.py: 5 warnings
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
warnings.warn(
test/stdlib/test_spans.py::test_lazy_spans
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/torch/nn/functional.py:5294: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:468.)
return torch._C._nn.pad(input, pad, mode, value)
test/telemetry/test_logging.py::test_otlp_logging_enabled_without_endpoint_warns
/Users/ajbozarth/workspace/ai/mellea/mellea/telemetry/logging.py:97: UserWarning: OTLP logs exporter is enabled (MELLEA_LOGS_OTLP=true) but no endpoint is configured. Set OTEL_EXPORTER_OTLP_LOGS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT to export logs.
_logger_provider = _setup_logger_provider()
test/telemetry/test_metrics.py: 24 warnings
test/telemetry/test_metrics_backend.py: 8 warnings
test/telemetry/test_metrics_token.py: 4 warnings
/Users/ajbozarth/workspace/ai/mellea/mellea/telemetry/metrics.py:245: UserWarning: Metrics are enabled (MELLEA_METRICS_ENABLED=true) but no exporters are configured. Metrics will be collected but not exported. Set MELLEA_METRICS_PROMETHEUS=true, set MELLEA_METRICS_OTLP=true with an endpoint (OTEL_EXPORTER_OTLP_METRICS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT), or set MELLEA_METRICS_CONSOLE=true to export metrics.
_meter_provider = _setup_meter_provider()
test/telemetry/test_metrics.py: 28 warnings
test/telemetry/test_metrics_backend.py: 8 warnings
test/telemetry/test_metrics_token.py: 4 warnings
/Users/ajbozarth/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/importlib/__init__.py:131: UserWarning: TokenMetricsPlugin already registered: Plugin token_metrics.generation_post_call already registered
_bootstrap._exec(spec, module)
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[non-streaming]
test/telemetry/test_tracing.py::test_session_with_tracing_disabled
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Understo...ields={'refusal': None}), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/model_response_utils.py:206: PydanticDeprecatedSince211: Accessing the 'model_computed_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
or callable(getattr(delta, attr_name))
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/model_response_utils.py:206: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
or callable(getattr(delta, attr_name))
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="It appea...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================================= Skipped Examples ==============================================================================================================
The following examples were skipped during collection:
• 102_example.py: Example marked with skip marker
• example_readme_generator.py: Example marked with skip marker
• make_training_data.py: Example marked with skip marker
• stembolts_intrinsic.py: Example marked with skip marker
• bedrock_litellm_example.py: Example marked with skip marker
• bedrock_openai_example.py: Example marked with skip marker
• qiskit_code_validation.py: Example marked with skip marker
• validation_helpers.py: Example marked with skip marker
• python_decompose_result.py: Example marked to always skip (skip_always marker)
• m_decomp_result.py: Example marked to always skip (skip_always marker)
• client.py: Example marked to always skip (skip_always marker)
• pii_serve.py: Example marked to always skip (skip_always marker)
• mcp_example.py: Example marked to always skip (skip_always marker)
• rich_document_advanced.py: Example marked with skip marker
• mellea_pdf.py: Example marked to always skip (skip_always marker)
• simple_rag_with_filter.py: Example marked to always skip (skip_always marker)
============================================================================================================== tests coverage ===============================================================================================================
_____________________________________________________________________________________________ coverage: platform darwin, python 3.12.8-final-0 ______________________________________________________________________________________________
Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
========================================================================================================== short test summary info ==========================================================================================================
FAILED test/stdlib/components/intrinsic/test_core.py::test_find_context_attributions - AssertionError
================================================================= 1 failed, 935 passed, 59 skipped, 18 deselected, 2 xfailed, 2 xpassed, 124 warnings in 904.59s (0:15:04) =================================================================

Only one test failed for me (the qualitative HF test with a high RAM need), and the memory pressure looked OK, but I didn't watch it like a hawk while it was running (I just had it at the side so I could see if yellow/red showed). Overall I think this is a good step forward.
|
Spotted two files missing markers; updated. (Without the marker the model isn't evicted, and those tests run just before some local transformers tests which use lots of memory.)
|
@ajbozarth Thanks - will look at your failure here & the other issue later or tomorrow.
test_hallucination_detection failure: not memory-related. Qualitative flake — the model produced valid structured output, not garbled/incoherent. This is standard floating-point non-determinism in transformer inference. Not related to #783: that issue produces raw token IDs under memory pressure; here the model ran fine, the score just wobbled. Fix: widen the tolerance.
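A sketch of that fix (the helper name, assertion shape, and tolerance value are hypothetical, not the actual test code):

```python
import pytest

# Hypothetical shape of the flaky check: compare the detection score against
# the expected value with a wider absolute tolerance, so normal run-to-run
# floating-point wobble no longer trips the assertion.
def assert_score_close(score: float, expected: float) -> None:
    assert score == pytest.approx(expected, abs=0.1)  # previously a tighter bound
```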
…ation (generative-computing#798) Add per-module Ollama model eviction to test/conftest.py. When pytest crosses a file boundary between Ollama-marked tests, all loaded models are discovered via /api/ps and evicted with keep_alive=0. This prevents heavyweight models from accumulating in memory across the test suite. Covers both test/ and docs/examples/ without requiring --group-by-backend.
Restore docs/examples/README.md to match main — the original command and heading were correct.
test_streaming_sync_functions.py and test_computed_model_output_thunk.py call start_session() (Ollama backend) but lacked pytest.mark.ollama, so the per-module eviction hook never fired for them. The inert `# pytest: ollama` comment is only parsed for docs/examples/.
|
@ajbozarth I didn't see the issues you did. I propose that this change is a positive (even though it doesn't address your garbled output); it should make test execution more predictable, but it doesn't change the baseline. Let's continue looking at your failures under #783. I have also now rebased on main, so I will do another full test run to cover the examples too (the fix I did separately is now merged and fixes collection, so we now pick up the examples as well).
|
Second run with examples failed with two failures, both pre-existing and unrelated to the eviction changes.

On 2 -- the logs (use …). Testing concluded - I propose this is ready to merge.
Examples run as isolated subprocesses. Ollama's default keep_alive keeps models resident after exit, starving later examples of memory. Add teardown hook to evict after every ollama-marked example.
Force-pushed from 0bf06cb to 0b09d4e
|
I noticed you had to add the helpers to both conftest files. Should we look at creating a test util file to hold duplicated conftest code (as a follow-up issue)?
sounds reasonable! |
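A minimal sketch of what that follow-up could look like (the module name and wiring are assumptions, not part of this PR):

```python
# test/_ollama_utils.py (hypothetical shared module)
import requests

OLLAMA_URL = "http://localhost:11434"


def evict_all_ollama_models() -> None:
    """Query /api/ps and unload each resident model by sending keep_alive=0."""
    for m in requests.get(f"{OLLAMA_URL}/api/ps", timeout=5).json().get("models", []):
        requests.post(
            f"{OLLAMA_URL}/api/generate",
            json={"model": m["name"], "keep_alive": 0},
            timeout=30,
        )

# test/conftest.py and docs/examples/conftest.py would then both import this
# helper (for example by registering it as a pytest plugin) instead of keeping
# two copies.
```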
|
(Four screenshots attached.)
Type of PR
Description
Problem
When running the full test suite (`uv run pytest`), Ollama's default 5-minute keep-alive means models from earlier tests stay resident in memory long after their subprocess or test function exits. Heavyweight models accumulate and starve later tests of headroom — the SOFAI graph-colouring example crashed with a 500 because an 11 GB guardian model from an earlier example was still resident.

`test/conftest.py` already managed this for the regular test suite via `--group-by-backend` warm-up/eviction, but that flag is not on by default and `docs/examples/` had no equivalent at all.

Ollama models across the test suite
The examples use a wide variety of models that cannot be hardcoded:
- granite4:micro
- granite4:micro-h
- granite3.2-vision
- llama3.2:1b
- granite4:latest
- granite3.3:8b
- deepseek-r1:8b
- qwen2.5vl:7b
- pielee/qwen3-4b-thinking-2507_q8
- phi:2.7b
- granite3-guardian:2b
- llama3.2:3b

Running even a handful of these consecutively without eviction can exhaust available memory.
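For context, this is roughly how the resident models can be inspected by hand (a minimal sketch against Ollama's documented `/api/ps` endpoint; the URL and printed fields are assumptions, not code from this repository):

```python
import requests

# List what the local Ollama server is currently holding in memory.
resp = requests.get("http://localhost:11434/api/ps", timeout=5)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model.get("name"), "- expires:", model.get("expires_at"))
```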
Solution
Add per-module Ollama model eviction to `test/conftest.py` via a `pytest_runtest_teardown` hook. When pytest crosses a file boundary between Ollama-marked tests, the hook queries `/api/ps` for all loaded models and evicts each with `keep_alive=0`. Covers both `test/` and `docs/examples/`.
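Roughly, the hook looks like this (a minimal sketch, not the exact code in `test/conftest.py`; the helper name, base URL, and timeouts are assumptions):

```python
import requests

OLLAMA_URL = "http://localhost:11434"


def _evict_all_ollama_models() -> None:
    """Best-effort: ask Ollama what is loaded (/api/ps), then unload each model."""
    try:
        loaded = requests.get(f"{OLLAMA_URL}/api/ps", timeout=5).json().get("models", [])
        for m in loaded:
            # A generate request with keep_alive=0 tells Ollama to unload the model now.
            requests.post(
                f"{OLLAMA_URL}/api/generate",
                json={"model": m["name"], "keep_alive": 0},
                timeout=30,
            )
    except requests.RequestException:
        pass  # eviction is best-effort; never fail the run because of it


def pytest_runtest_teardown(item, nextitem):
    """Evict when leaving the last Ollama-marked test in a file."""
    if item.get_closest_marker("ollama") is None:
        return
    current_file = item.nodeid.split("::")[0]
    next_file = nextitem.nodeid.split("::")[0] if nextitem is not None else None
    if next_file != current_file:
        _evict_all_ollama_models()
```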
Compromise
Eviction happens at module boundaries, not per-test. Tests within a single
file share the loaded model with no overhead. When crossing files, any loaded
model is evicted — even if the next file uses the same one. This means a
redundant unload/reload (~5–15 s) when consecutive files share a model.
This is a deliberate trade-off: predictable memory behaviour matters more than
saving a reload, particularly on constrained CI runners where an OOM crash
costs far more than a few seconds of model loading.
Eviction also targets all loaded Ollama models, not just those loaded by the
test. If you are using Ollama interactively whilst the suite runs, your model
will be evicted between test modules.
Missing `ollama` markers

Two test files under `test/core/` called `start_session()` (real Ollama backend) but lacked `pytest.mark.ollama`. They had the inert `# pytest: ollama, llm` comment syntax, which is only parsed for `docs/examples/` items — regular test files require a proper `pytestmark` or decorator. Without the marker, the eviction hook never fired for them, leaving models resident through subsequent non-Ollama tests.

- `test/core/test_streaming_sync_functions.py` — added module-level `pytestmark = [pytest.mark.ollama, pytest.mark.e2e]`
- `test/core/test_computed_model_output_thunk.py` — added per-function `@pytest.mark.ollama` and `@pytest.mark.e2e` on the 3 tests that use a real Ollama session (8 pure unit tests left unmarked); both styles are sketched below
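Roughly, the two marking styles look like this (a sketch; the function name is made up and the real files contain more tests):

```python
# Module-level (test/core/test_streaming_sync_functions.py): every test in the
# file carries the markers, so the eviction hook fires at the file boundary.
import pytest

pytestmark = [pytest.mark.ollama, pytest.mark.e2e]


# Per-function (test/core/test_computed_model_output_thunk.py): only the tests
# that open a real Ollama session are marked; pure unit tests stay unmarked.
@pytest.mark.ollama
@pytest.mark.e2e
def test_uses_real_ollama_session():  # illustrative name, not the real test
    ...
```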
Other changes
- `OLLAMA_KEEP_ALIVE=1m` tip removed from `test/README.md` (active eviction supersedes a manual idle-timeout workaround)
- `test/README.md` updated with a note documenting both eviction mechanisms
Testing
The eviction logic is best-effort infrastructure — it queries a live Ollama
server, so unit testing without mocking the HTTP calls is not meaningful.
Verified locally by running the full suite with `-s` and confirming models are evicted between files via the `ollama-evict` log output.