fix: hf metrics tests run out of memory #623
ajbozarth merged 2 commits into generative-computing:main from
Conversation
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
Would this be useful for other HF tests (in addition to the telemetry ones)?

For any test that uses
When you were digging into this, did you figure out why this is happening? I know we have an explicit garbage collection method used when CICD=1: https://github.com/generative-computing/mellea/blob/main/test/conftest.py#L528. Were you able to run the other huggingface tests when CICD=0 before this? And only this one new test was causing issues? I don't understand what the difference is between instantiating multiple hf backends in a single test file vs. across several tests that would cause different issues.
This fix copies what other Hugging Face tests do by loading the model only once and making sure to clean it up after the tests are done. Otherwise, as the pytest suite goes on, memory fills up more and more; this is also why it passes fine when only running the telemetry tests.

I looked into this to double-check that no other tests were running into this, and only my test pushed the RAM over the edge. It does seem like there were already other tests with this issue; I'll address those as well and push an update.
After some more testing, this doesn't actually solve the problem. The core issue is that when a LocalHFBackend is created it loads the model into memory, and that model stays there until the process exits. No form of Python garbage collection can solve this. AFAIK it can only be solved with the isolation that #605 is working on; I would argue that that PR needs to be updated to isolate any such test.

As it stands, this PR does add good practice, but it doesn't solve #620. We can still merge it, but I'll be removing the auto-close.
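The claim that garbage collection can't help here follows from how CPython reclaims memory: an object stays alive as long as anything references it, and a backend cached at module or session level is referenced for the life of the process. A minimal stdlib illustration (not mellea code):

```python
import gc
import weakref


class Model:
    """Stand-in for a multi-gigabyte HF model."""


backend_model = Model()           # long-lived reference, like a cached backend
ref = weakref.ref(backend_model)  # lets us observe when the object is freed

gc.collect()
alive_after_gc = ref() is not None    # True: gc cannot free a referenced object

backend_model = None                  # only dropping the last reference...
gc.collect()
alive_after_drop = ref() is not None  # ...lets the memory go
```

Since the test process itself holds the reference, the only way to guarantee the memory is returned to the OS mid-suite is to load the model in a separate process that exits when the test is done — which is the isolation approach #605 pursues.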
Add @pytest.mark.requires_heavy_ram to tests that instantiate LocalHFBackend to address memory leak issues when running these tests in pytest. This ensures tests are skipped on systems without sufficient RAM.

Changes:
- test/telemetry/test_metrics_backend.py: Added marker to test_huggingface_token_metrics_integration
- test/stdlib/test_spans.py: Added marker to module-level pytestmark

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
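The `requires_heavy_ram` marker is defined in mellea's own conftest; a hypothetical way to gate tests on physical RAM looks like the following. The 16 GiB threshold and the `total_ram_bytes` helper are assumptions for illustration, not the project's real logic:

```python
import os

import pytest

MIN_RAM_BYTES = 16 * 1024**3  # assumed threshold for "heavy" HF tests


def total_ram_bytes() -> int:
    """Best-effort physical RAM lookup; returns 0 where sysconf is unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return 0


# pytest.mark.skipif evaluates the condition once, at collection time
requires_heavy_ram = pytest.mark.skipif(
    total_ram_bytes() < MIN_RAM_BYTES,
    reason="loads a full HF model into memory; needs a machine with ample RAM",
)


@requires_heavy_ram
def test_huggingface_token_metrics_integration():
    ...  # would instantiate LocalHFBackend here
```

Applying the same marker via a module-level `pytestmark = requires_heavy_ram` covers every test in a file at once, which matches the test_spans.py change above.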
5411760
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
* fix(vllm): implement shared backend to prevent GPU OOM errors
  - Add session-scoped shared_vllm_backend fixture using Granite 4 Micro
  - Update test_vllm.py and test_vllm_tools.py to use shared backend
  - Fall back to module-scoped backends when --isolate-heavy flag is set
  - Both modules now use consistent Granite 4 Micro model
  - Enhance CUDA OOM error message with actionable solutions
  - Maintains backward compatibility with existing isolation mechanism
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* reduce vllm GPU allocation for tests
* implement backend test grouping via reordering
* add gpu cleanup between backend groups
* delay vllm backend creation until after openai vllm group
* adding explicit served model name for vllm openai test
* fix: rag intrinsics are not for the hybrid model (I think)
* testing a fix in tests for all the gpu issues
* more gpu cleaning
* adding docs tooling to mypy exclude
* removing kv cache also from GPU in cleanup for tests
* moving test order around and also fixing a fixture bug
* rolling back some changes from exclusive process
* some changes to the error message in vllm and also conftest cleaning
* adding an end-to-end script for tests with ollama
* adding a port finder (just in case)
* adding direct download of ollama binary from github
* warm starting ollama
* adding cuda paths for ollama
* some extra checks for vllm and teardown
* making group by backend default
* making the script executable
* test: remove heavy ram pytest marks added in #623
  Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
* ruff formatting
* small changes to script and adding cleanup to guardian and core
* making log dir more easy to set
* increasing ollama startup to 2 mins
* adding pytest-json-report

---------

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: AVINASH BALAKRISHNAN <avinashbala@p5-r03-n2.bluevela.rmf.ibm.com>
Co-authored-by: Alex Bozarth <ajbozart@us.ibm.com>
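The "gpu cleanup between backend groups" step in the commit list above can be sketched as a best-effort helper that collects garbage and empties the CUDA allocator cache. This is an assumption about the approach, not the PR's actual conftest code; torch availability is treated as optional so the sketch degrades gracefully on CPU-only machines:

```python
import gc


def free_gpu_memory() -> bool:
    """Best-effort GPU memory release; returns True only if a CUDA cache was cleared."""
    gc.collect()  # drop unreferenced Python objects first
    try:
        import torch  # treated as an optional dependency in this sketch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached allocator blocks to the driver
        torch.cuda.ipc_collect()  # clean up CUDA IPC shared-memory handles
        return True
    return False
```

Calling a helper like this between backend groups keeps one group's cached allocations from pushing the next group into an OOM, though it cannot free memory still referenced by a live backend object.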
Type of PR
Misc PR
Description
Previously, the huggingface tests all fell under the high GPU/RAM usage flags and were run in isolation, so we were not aware that by default no cleanup happens: each test loads a model into memory and leaves it there until the whole test run finishes.
This adds a fixture that loads the model only once and cleans it up after the test file finishes.
Testing