Conversation
- Add session-scoped `shared_vllm_backend` fixture using Granite 4 Micro
- Update test_vllm.py and test_vllm_tools.py to use shared backend
- Fall back to module-scoped backends when `--isolate-heavy` flag is set
- Both modules now use consistent Granite 4 Micro model
- Enhance CUDA OOM error message with actionable solutions
- Maintain backward compatibility with existing isolation mechanism

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit: this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
@planetf1: Hmm, this is a little weird. You should never run out of memory on an 80GB GPU for the tests that we are running (with the aggressive cleanup in place). Do you have a stack trace of the skips/failures that I can look at?

Caused by the MPS flag. My second run correctly had this off - it affects CUDA isolation.
@ajbozarth I think the test you saw fail is flaky. It's marked qualitative, and there's probably a race with ollama not handling requests in parallel. Maybe worth checking if there's already an issue and, if not, opening one.
|
I ran inside of and got
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
|
Finally did some local testing on my Mac (M1 Max, 32GB):
845 passed, 81 skipped, 3 deselected, 2 xfailed, 1 xpassed, 112 warnings in 1130.18s (0:18:50)
2 failed, 843 passed, 81 skipped, 3 deselected, 2 xfailed, 1 xpassed, 111 warnings in 1119.48s (0:18:39)

(The two failures are common qualitative failures.) I also ran these after removing the marks. @avinash2692 I hope you don't mind, but I pushed a commit removing those marks and updated the description to auto-close those issues.
@ajbozarth there's a ruff format error that needs fixing.

Odd that pre-commit didn't catch that. @avinash2692 I've started another benchmark, so you'll need to pull and fix it, sorry.
```python
# Cleanup GPU memory
base_model.cpu()
del model_with_adapter
del base_model
import gc

gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```
Should we be cleaning up after the second test as well?
I think so, let me have a look at it.
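One way to make the cleanup run after every test that uses the model, rather than only where it is done inline, is a yield-style fixture. A minimal sketch of the pattern, where `make_model_fixture`, `load_model`, and `cleanup` are hypothetical names, not mellea's actual API:

```python
import gc


def make_model_fixture(load_model, cleanup):
    """Generator usable as a pytest fixture body: yields the loaded model
    to the test, then always runs cleanup, even if the test fails."""
    model = load_model()
    try:
        yield model
    finally:
        cleanup(model)  # e.g. model.cpu(); torch.cuda.empty_cache() on CUDA
        gc.collect()
```

Driving the generator by hand (as pytest would) shows the teardown ordering: the model is yielded first, and the cleanup callback fires exactly once when the generator is exhausted.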
```python
# 1. Clear the LRU cache (holds DynamicCache KV tensors on GPU)
if hasattr(backend, "_cache") and hasattr(backend._cache, "cache"):
```
I feel like we should just grab the backend by class; I'm not the biggest fan of these hasattr calls when we know the types of backends we will be processing.
Fair enough. This is just general-purpose enough to clear any CUDA memory if we decide to reuse the LRU cache in another backend.

Also, I'd like to keep this more generic because I don't want to assume whether the user does or does not have hf installed in their mellea installation.
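The two options being weighed can be sketched side by side; `HFBackendStub` is an illustrative stand-in class, not mellea's real backend:

```python
# Duck-typed check (the approach in the diff): works for any backend that
# later grows an LRU cache, at the cost of attribute probing.
def clear_cache_duck(backend):
    if hasattr(backend, "_cache") and hasattr(backend._cache, "cache"):
        backend._cache.cache.clear()
        return True
    return False


class HFBackendStub:
    """Stand-in for a concrete backend type holding an LRU cache."""

    def __init__(self):
        self._cache = type("LRU", (), {"cache": {}})()


# Class-based alternative raised in the review: explicit, but couples the
# cleanup code to a concrete backend type (and its import).
def clear_cache_typed(backend):
    if isinstance(backend, HFBackendStub):
        backend._cache.cache.clear()
        return True
    return False
```

The duck-typed version degrades gracefully (returns `False` on unrelated objects), which is the "don't assume hf is installed" property argued for above; the typed version fails closed on anything it doesn't know.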
```python
try:
    import torch

    if torch.cuda.is_available():
        free_before, total = torch.cuda.mem_get_info()
        logger.info(
            f" GPU before cleanup: {free_before / 1024**3:.1f}GB free "
            f"/ {total / 1024**3:.1f}GB total"
        )
    else:
        free_before = 0
```
Meta question: is this backend cleanup stuff we should just be doing in the `__del__` functions of each backend? Like, if I spin up a bunch of huggingface backends in my own code, shouldn't this garbage collection code be executed in those cases as well?
hmm, maybe? Right now, the reason this exists for hf backends is because we do tend to hold on to GPU memory in our LRUCache. This could be included in the teardown for hf (and vllm) backends, but we might need to figure out where that would happen in the execution of a mellea program (at the end of a session? at the end of the program?)
> but we might need to figure out where that would happen in the execution of a mellea program (at the end of a session? at the end of the program?)

could we put it in a function and let the user (or test) decide when to call it?
++ on Alex's point - it seems like we should call it at the end of a session, and we should ensure we document the behaviour (maybe it's obvious). But given the size of these models, being able to do this explicitly may be needed?
I'm happy to let the user call it, but the issue is that we will be introducing complexity again by asking the user to do GPU memory management. Maybe this is a helper function on the LocalhfBackend that can be used at the discretion of the user.
> Maybe this is a helper function in the LocalhfBackend that can be used based on the discretion of the user.

This was what I was thinking
```python
# Reorder tests by backend if requested
if config.getoption("--group-by-backend", default=False):
    logger = FancyLogger.get_logger()
    logger.info("Grouping tests by backend (--group-by-backend enabled)")

    # Group items by backend
    grouped_items = []
    seen = set()

    for group_name in BACKEND_GROUP_ORDER:
        marker = BACKEND_GROUPS[group_name]["marker"]
        group_tests = [
            item
            for item in items
            if item.get_closest_marker(marker) and id(item) not in seen
        ]

        if group_tests:
            logger.info(
                f"Backend group '{group_name}': {len(group_tests)} tests "
                f"({BACKEND_GROUPS[group_name]['description']})"
            )
            grouped_items.extend(group_tests)
            for item in group_tests:
                seen.add(id(item))

    # Add tests without backend markers at the end
    unmarked = [item for item in items if id(item) not in seen]
    if unmarked:
        logger.info(f"Unmarked tests: {len(unmarked)} tests")
        grouped_items.extend(unmarked)

    # Reorder in place
    items[:] = grouped_items
    logger.info(f"Total tests reordered: {len(items)}")
```
If we do switch to using session fixtures for the backends like we do for vllm, this code becomes unnecessary.
hmm, why do you say so? We do still need to group the tests so that we can tear down backends between fixtures. But maybe I'm missing something here.
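The grouping logic in the diff amounts to a stable sort on group rank, which may make its behavior easier to see; the group names and the `(test_name, markers)` tuples below are illustrative stand-ins for the real pytest items:

```python
# Illustrative group order; the real BACKEND_GROUP_ORDER lives in conftest.py.
BACKEND_GROUP_ORDER = ["ollama", "huggingface", "vllm"]


def group_rank(markers):
    """Rank of the first matching backend group; unmarked tests sort last."""
    for rank, name in enumerate(BACKEND_GROUP_ORDER):
        if name in markers:
            return rank
    return len(BACKEND_GROUP_ORDER)


def reorder(items):
    # items: list of (test_name, set_of_markers). sorted() is stable, so the
    # relative order of tests within each backend group is preserved, exactly
    # as the seen-set loop in the diff preserves it.
    return sorted(items, key=lambda item: group_rank(item[1]))
```

Keeping all tests of one backend adjacent is what makes a single between-group teardown point possible at all, whichever of the two implementations is used.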
```python
if prev_group in ("vllm", "openai_vllm"):
    try:
        shared_backend_defs = (
            item.session._fixturemanager._arg2fixturedefs.get(
                "shared_vllm_backend"
            )
        )
        if shared_backend_defs:
            backend_instance = shared_backend_defs[-1].cached_result[0]
            if backend_instance is not None:
                cleanup_gpu_backend(
                    backend_instance, "shared-vllm-transition"
                )
```
Don't the individual fixtures call this as well? Why call it again here?
I don't think this happens for vllm tests, because the backend is at session level and so the teardown also happens at session level.
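That scope mismatch is the crux: a session-scoped fixture's teardown runs only once, at the very end of the session, so freeing the GPU between backend groups has to happen outside the fixture. A toy model of the sequencing, with all names hypothetical:

```python
log = []


def shared_vllm_backend():
    """Session-scoped fixture body: created once, torn down at session end."""
    backend = "vllm-backend"
    yield backend
    log.append("session-teardown")  # too late for a between-group cleanup


def on_group_transition(prev_group, backend):
    # Mirrors the conftest hook: explicitly free GPU memory when the run
    # leaves the vllm group, long before the session fixture tears down.
    if prev_group in ("vllm", "openai_vllm"):
        log.append(f"cleanup:{backend}")
```

Walking through it in order, the explicit transition cleanup fires while the fixture is still alive, and the fixture's own teardown only appends its entry once the session (generator) is exhausted.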
I'm doing some test classification in #742 which, whilst not colliding at a code level (I hope), does effectively depend on getting this change in. So as a general point, if we think this improves things, even if there are follow-ups, I'd err on the side of merging. But I will comment on the discussions above.
jakelorocco
left a comment
Looks good to me; I think there might be additional improvements that could be made, but we should try to get the nightly tests running. Thanks @avinash2692
…M gates

- Remove --isolate-heavy flag, _run_heavy_modules_isolated(), pytest_collection_finish(), and require_gpu_isolation() predicate — superseded by cleanup_gpu_backend() from PR generative-computing#721
- Remove dead requires_gpu/requires_api_key branches from docs/examples/conftest.py
- Bump min_vram_gb from 8 → 12 on test_guardian, test_core, test_rag, test_spans — correct gate for 3B base model (6 GB) + adapters + inference overhead; 8 GB was wrong and masked by the now-fixed MPS pool leak
- Add adapter accumulation signals to audit-markers skill
- Update AGENTS.md, test/README.md, MARKERS_GUIDE.md to remove --isolate-heavy references
CI: memory management in tests
Type of PR
Issues that it fixes
Problem
Running the full test suite on a single GPU causes OOM errors when vLLM tests start after HuggingFace tests. This could be because the 8B HF model stays resident in GPU memory: the existing cleanup (`gc.collect()` + `empty_cache()`) cannot free tensors held by indirect references — LRU caches, PEFT adapter hooks, accelerate dispatch hooks, and class-level `_cached_blocks`. There are also redundancies in the backends, and this simplifies some of it.

Solution
`cleanup_gpu_backend()` — a unified cleanup function that calls `model.cpu()` to forcefully move tensors off GPU, then clears all GPU-resident state. This replaces the previous `cleanup_vllm_backend()` and `aggressive_gpu_cleanup()`, which relied solely on `gc.collect()`. Cleanup logs before/after GPU memory for easy debugging.
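In outline, the unified cleanup plausibly looks something like the sketch below. This is a reconstruction from the description, not the PR's actual code; the `model` and `_cache` attribute names are assumptions:

```python
import gc
import logging

logger = logging.getLogger(__name__)


def cleanup_gpu_backend(backend, label=""):
    """Move the model off GPU, drop cached state, then collect garbage."""
    model = getattr(backend, "model", None)
    if model is not None and hasattr(model, "cpu"):
        model.cpu()  # forcefully move tensors off the GPU
    cache = getattr(backend, "_cache", None)
    if cache is not None and hasattr(cache, "clear"):
        cache.clear()  # LRU cache can pin KV tensors on GPU
    gc.collect()
    try:
        import torch

        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            free, total = torch.cuda.mem_get_info()
            logger.info(
                "[%s] GPU after cleanup: %.1fGB free / %.1fGB total",
                label, free / 1024**3, total / 1024**3,
            )
    except ImportError:
        pass  # CPU-only environment: nothing to free
```

The `model.cpu()` step is the important difference from `gc.collect()`-only cleanup: it moves tensors off the device even while indirect references (adapter hooks, dispatch hooks, caches) keep the Python objects alive.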
A new pytest option `--group-by-backend` that groups tests based on the backend marker. This gives us an opportunity to

Files changed
- `test/conftest.py`: Added `cleanup_gpu_backend()`. Removed redundant cleanup functions. Ungated `memory_cleaner()`. Replaced `get_device_properties()` with `nvidia-smi` to prevent CUDA fork errors. Added between-group GPU cleanup.
- `test/backends/test_huggingface.py`: `return` → `yield` + `cleanup_gpu_backend()`
- `test/backends/test_huggingface_tools.py`:
- `test/backends/test_vllm.py`: (`return` inside generator), updated cleanup
- `test/backends/test_vllm_tools.py`:
- `test/telemetry/test_metrics_backend.py`: replaced `del` with `cleanup_gpu_backend()`
- `test/stdlib/components/intrinsic/test_rag.py`:
- `test/stdlib/test_spans.py`: `cleanup_gpu_backend()` on teardown
- `test/cli/test_alora_train_integration.py`: `model.cpu()` + cleanup after GPU usage
- `test/backends/test_openai_vllm.py`: `vllm serve` subprocess output — skip reasons now show actual errors
- `mellea/backends/vllm.py`:
- `test/scripts/run_tests_with_ollama.sh`:

How to run
Testing