
fix: hf metrics tests run out of memory#623

Merged
ajbozarth merged 2 commits into generative-computing:main from ajbozarth:fix/hf-fixture
Mar 12, 2026

Conversation

@ajbozarth
Contributor

@ajbozarth ajbozarth commented Mar 11, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Previous huggingface tests all fell under the high GPU/RAM usage flags and were run in isolation, so we were not aware that by default no cleanup happens: each test loaded a model into memory and left it there until the whole test run finished.

This adds a fixture that loads the model only once and cleans it up after the test file finishes.
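The pattern can be sketched as a module-scoped pytest fixture with teardown. This is a minimal, self-contained illustration rather than the actual mellea code: `FakeHFBackend` stands in for `LocalHFBackend`, and the model id and test names are hypothetical.

```python
import gc

import pytest


class FakeHFBackend:
    """Stand-in for mellea's LocalHFBackend, which loads a
    HuggingFace model into RAM when it is constructed."""

    def __init__(self, model_id: str):
        self.model_id = model_id
        self.model = object()  # placeholder for the loaded weights


@pytest.fixture(scope="module")
def hf_backend():
    # Load the model once for the whole test file...
    backend = FakeHFBackend("some/model-id")  # hypothetical model id
    yield backend
    # ...and release it after the last test in the file finishes,
    # instead of holding the weights until the process exits.
    del backend
    gc.collect()


def test_streaming(hf_backend):
    assert hf_backend.model is not None


def test_non_streaming(hf_backend):
    # Reuses the same loaded model; no second copy in memory.
    assert hf_backend.model is not None
```

With `scope="module"`, pytest constructs the backend on first use in the file and runs the teardown after the yield once the file's tests are done.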

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation once the rest of the PR is populated)

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
@ajbozarth ajbozarth self-assigned this Mar 11, 2026
@ajbozarth ajbozarth requested a review from a team as a code owner March 11, 2026 17:55
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@mergify

mergify bot commented Mar 11, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:
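The title rule above can be checked locally with a short sketch; `is_conventional` is a hypothetical helper, and the pattern is copied from the mergify rule:

```python
import re

# Pattern from the mergify rule above: a conventional-commit type,
# an optional scope in parentheses, then a colon.
TITLE_RE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)"
    r"(?:\(.+\))?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the PR title matches the enforced pattern."""
    return TITLE_RE.match(title) is not None


print(is_conventional("fix: hf metrics tests run out of memory"))  # True
print(is_conventional("feat(cli): add a new flag"))                # True
print(is_conventional("Fix memory issue"))                         # False
```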

@psschwei
Member

Would this be useful for other HF tests (in addition to the telemetry ones)?

@ajbozarth
Contributor Author

Would this be useful for other HF tests (in addition to the telemetry ones)?

Yes, for any test that uses LocalHFBackend but doesn't use the high CPU/RAM marks. The existing ones already have similar fixtures; it was just missed for this new test.

@jakelorocco
Contributor

When you were digging into this, did you figure out why this is happening? I know we have an explicit garbage collection method used when CICD=1: https://github.com/generative-computing/mellea/blob/main/test/conftest.py#L528.

Were you able to run the other huggingface tests with CICD=0 before this? And only this one new test was causing issues? I don't understand what the difference is between instantiating multiple HF backends in a single test file vs. across several test files that would cause different behavior.

@ajbozarth
Contributor Author

When you were digging into this, did you figure out why this is happening?

LocalHFBackend loads the model into memory (RAM) and has no built-in cleanup. So in the case of my test (which ran twice: streaming and non-streaming), it loaded the model into memory once for the first run and then a second time for the second run. On top of that, the model stays in memory until the Python process finishes.

This fix copies what the other huggingface tests do by loading the model only once and making sure to clean it up after the tests are done. Otherwise, as the pytest suite goes on, memory fills up more and more; that is also why everything passes fine when only the telemetry tests are run.
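The cleanup step described here can be sketched roughly as follows; `release_backend` is a hypothetical helper, and the torch cache call only fires when torch is already loaded in the process:

```python
import gc
import sys


def release_backend(backend_holder: dict) -> None:
    """Drop the only reference to the loaded backend and force collection.

    Without this, the model weights stay resident until the Python
    process exits, so every test file that builds a backend keeps
    adding to peak RAM for the rest of the suite.
    """
    backend_holder.clear()  # drop the reference to the backend object
    gc.collect()            # reclaim the (large) model tensors
    if "torch" in sys.modules:
        # Also release cached GPU blocks when torch is in use;
        # empty_cache() is a no-op if CUDA was never initialized.
        sys.modules["torch"].cuda.empty_cache()
```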

Were you able to run the other huggingface tests when CICD=0 before this? And only this one new tests was causing issues? I don't understand what the difference is between instantiating multiple hf backends in a single test file vs. across several tests that would cause different issues?

I looked into this to double-check that no other tests were running into this, and only my test pushed the RAM over the edge. It does seem like there were already other tests with this issue; I'll address those as well and push an update.

@ajbozarth
Contributor Author

After some more testing, this doesn't actually solve the problem. The core issue is that when a LocalHFBackend is created it loads the model into memory, and that model stays there until the process exits. No form of Python garbage collection can solve this. AFAIK it can only be solved with the isolation that #605 is working on. I would argue that that PR needs to be updated to isolate any @pytest.mark.huggingface test.

As it stands, this PR does add good practice, but it doesn't solve #620. We can still merge it, but I'll be removing the auto-close.

Add @pytest.mark.requires_heavy_ram to tests that instantiate LocalHFBackend
to address memory leak issues when running these tests in pytest. This ensures
tests are skipped on systems without sufficient RAM.

Changes:
- test/telemetry/test_metrics_backend.py: Added marker to test_huggingface_token_metrics_integration
- test/stdlib/test_spans.py: Added marker to module-level pytestmark

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
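For reference, the two marker styles named in the commit message above can be sketched like this (the test body is elided; the marker name comes from the commit, while the function name mirrors the test it mentions):

```python
import pytest

# Style 1: mark a single heavy test function, as in
# test/telemetry/test_metrics_backend.py.
@pytest.mark.requires_heavy_ram
def test_huggingface_token_metrics_integration():
    ...


# Style 2: mark every test in a module at once via the
# module-level pytestmark variable, as in test/stdlib/test_spans.py.
pytestmark = pytest.mark.requires_heavy_ram
```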
Contributor

@planetf1 planetf1 left a comment


LGTM

@ajbozarth
Contributor Author

Updated to flag the problem tests to be skipped due to heavy RAM, and opened a follow-up issue, #630, to address the core problem in the future.

Once CI passes I will be merging this and #605 together to solve our test issues in the short term for this release

@ajbozarth ajbozarth added this pull request to the merge queue Mar 12, 2026
Merged via the queue into generative-computing:main with commit 5411760 Mar 12, 2026
4 checks passed
@ajbozarth ajbozarth deleted the fix/hf-fixture branch March 12, 2026 16:15
@avinash2692 avinash2692 mentioned this pull request Mar 23, 2026
7 tasks
ajbozarth added a commit that referenced this pull request Mar 24, 2026
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 26, 2026
* fix(vllm): implement shared backend to prevent GPU OOM errors

- Add session-scoped shared_vllm_backend fixture using Granite 4 Micro
- Update test_vllm.py and test_vllm_tools.py to use shared backend
- Fall back to module-scoped backends when --isolate-heavy flag is set
- Both modules now use consistent Granite 4 Micro model
- Enhance CUDA OOM error message with actionable solutions
- Maintains backward compatibility with existing isolation mechanism

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* reduce vllm GPU allocation for tests

* implement backend test grouping via reordering

* add gpu cleanup between backend groups

* delay vllm backend creation until after openai vllm group

* adding explicit served model name for vllm openai test

* fix: rag intrinsics are not for the hybrid model (I think)

* testing a fix in tests for all the gpu issues

* more gpu cleaning

* adding docs tooling to mypy exclude

* removing kv cache also from GPU in cleanup for tests

* moving test order around and also fixing a fixture bug

* rolling back some changes from exclusive process

* some changes to the error message in vllm and also conftest cleaning

* adding an end-to-end script for tests with ollama

* adding a port finder (just in case)

* adding direct download of ollama binary from github

* warm starting ollama

* warm starting ollama

* adding cuda paths for ollama

* some extra checks for vllm and teardown

* making group by backend default

* making the script executable

* test: remove heavy ram pytest marks added in #623

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* ruff formatting

* small changes to script and adding cleaup to guardian and core

* making log dir more easy to set

* increasing ollama startup to 2 mins

* adding pytest-json-report

---------

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: avinash.bala@us.ibm.com;0J8455897;AVINASH BALAKRISHNAN <avinashbala@p5-r03-n2.bluevela.rmf.ibm.com>
Co-authored-by: Alex Bozarth <ajbozart@us.ibm.com>
