fix: flush MPS cache in alora test GPU cleanup (#790) #800

Merged

planetf1 merged 2 commits into generative-computing:main from planetf1:fix/mps-cache-cleanup-790 on Apr 10, 2026

Conversation

@planetf1
Contributor

@planetf1 planetf1 commented Apr 9, 2026

Flush MPS cache in alora test GPU cleanup

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

PR #765 added CUDA GPU cleanup to the alora integration tests, but the
MPS equivalent was missing. On Apple Silicon with MPS-capable PyTorch,
GPU memory isn't reclaimed between tests.

Changes in this PR:

  • Added torch.mps.empty_cache() after the CUDA cleanup blocks in both
    test_alora_training_integration and test_lora_training_integration
  • Updated cleanup comment to say "device caches" instead of "CUDA cache"

Mirrors the existing pattern in test/conftest.py:373-374.
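
For reference, the cleanup shape being added looks roughly like the sketch below (a minimal illustration, not the exact code in the test file; variable names, the double `gc.collect()`, and the surrounding teardown differ per test):

```python
import gc

import torch

# Drop Python references first so the backend allocators can actually free memory;
# a second pass helps collect reference cycles that still hold GPU tensors.
gc.collect()
gc.collect()

# Flush the CUDA cache when a CUDA device is present.
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Added in this PR: flush the MPS cache on Apple Silicon so GPU memory is
# reclaimed between tests (mirrors the pattern in test/conftest.py).
if torch.backends.mps.is_available():
    torch.mps.empty_cache()
```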

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

@github-actions github-actions Bot added the bug (Something isn't working) label Apr 9, 2026
@github-actions
Contributor

github-actions Bot commented Apr 9, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@planetf1 planetf1 marked this pull request as ready for review April 9, 2026 13:58
@planetf1 planetf1 requested a review from a team as a code owner April 9, 2026 13:58
Contributor

@jakelorocco jakelorocco left a comment

Should we extract this into a function and then use it everywhere we currently apply this pattern? For instance, we do a very similar garbage-collection dance in pytest_runtest_setup in test/conftest.py without the MPS cleanup, but other cleanup functions in that same file do the MPS-style cleanup as well.

Or, are there reasons we might not want to do MPS cleanup every time it's available?

Add torch.mps.empty_cache() after CUDA cleanup blocks in both
alora integration tests, matching the existing conftest pattern.
Prevents MPS memory from accumulating between tests on Apple Silicon.

Consolidate the duplicated gc.collect + CUDA/MPS cache flush pattern
into a single flush_device_caches() function in test/conftest.py.

- Replaces 4 inline flush sites with a single call
- Adds MPS support to sites that previously only handled CUDA
  (pytest_runtest_setup backend transitions, memory_cleaner fixture)
- Fixes a bug where gc.collect() was conditional on CUDA availability
  in pytest_runtest_setup (now runs unconditionally)
- Adds torch.mps.synchronize() for parity with CUDA synchronize()
- Enriches cleanup_gpu_backend() VRAM logging: device-aware reporting
  for both CUDA (free/total/allocated/reserved/fragmentation) and MPS
  (allocated/max), with reclaimed bytes on both paths
- Removes unused shutil/sys imports from test_alora_train_integration
@planetf1 planetf1 force-pushed the fix/mps-cache-cleanup-790 branch from a3e95ae to 29922a3 on April 10, 2026 06:37
@planetf1
Contributor Author

planetf1 commented Apr 10, 2026

@jakelorocco Done - I think the cleanup is safe everywhere. I extracted a flush_device_caches() helper and replaced all 5 call sites with it, adding MPS support to the two places that were CUDA-only. cleanup_gpu_backend's VRAM logging is now device-aware for both CUDA and MPS, though the data each backend exposes differs (and is roughly the inverse). Kept it simple.

| Site | Location | Before | After |
|---|---|---|---|
| 1 | conftest.py cleanup_gpu_backend() | Inline gc.collect() x2 + CUDA flush + MPS empty_cache() only; VRAM logging CUDA-only (free/total) | Calls flush_device_caches(); VRAM logging device-aware: CUDA reports free/total/allocated/reserved/fragmentation, MPS reports allocated/max; both report reclaimed bytes |
| 2 | conftest.py pytest_runtest_setup() | gc.collect() inside if torch.cuda.is_available(); no GC or flush on MPS-only machines | Calls flush_device_caches(); GC runs unconditionally, flushes both CUDA and MPS |
| 3 | conftest.py memory_cleaner() | gc.collect() x2 + CUDA flush only; no MPS support | Calls flush_device_caches(); adds MPS flush |
| 4 | test_alora_train_integration.py test_alora_training_integration() | Inline gc.collect() x2 + CUDA flush + MPS empty_cache(); model-specific cleanup (accelerate hooks, del adapter, move to CPU) | Model-specific cleanup unchanged; inline flush replaced by flush_device_caches() (adds mps.synchronize()) |
| 5 | test_alora_train_integration.py test_lora_training_integration() | Inline gc.collect() x2 + CUDA flush + MPS empty_cache() | Calls flush_device_caches() (adds mps.synchronize()) |
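
As a rough sketch of what that helper could look like (assuming only the torch calls named in the commit message; the device-aware VRAM logging described above is omitted here, and the real flush_device_caches() in test/conftest.py may differ in detail):

```python
import gc

import torch


def flush_device_caches() -> None:
    """Drop dead references and flush accelerator caches between tests.

    Sketch only; the actual helper lives in test/conftest.py.
    """
    # Run GC unconditionally so MPS-only machines also release stale tensors.
    gc.collect()

    if torch.cuda.is_available():
        # Wait for in-flight kernels, then return cached blocks to the driver.
        torch.cuda.synchronize()
        torch.cuda.empty_cache()

    if torch.backends.mps.is_available():
        # synchronize() mirrors the CUDA path; empty_cache() releases MPS memory.
        torch.mps.synchronize()
        torch.mps.empty_cache()
```

Each of the five sites in the table then reduces to a single flush_device_caches() call after any model-specific teardown (removing accelerate hooks, deleting adapters, moving models to CPU).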

@planetf1 planetf1 enabled auto-merge April 10, 2026 06:41
@planetf1 planetf1 requested a review from jakelorocco April 10, 2026 06:41
@planetf1
Contributor Author

Have also run this on LSF as well as locally (macOS). Just one qualitative failure, in test/backends/test_ollama.py: one of the 4 raw prompts ("What is 2+2?") returned an empty string, while the other 3 prompts got valid responses. So it's an LLM quality failure rather than a code issue; happy with the test run.

@planetf1 planetf1 added this pull request to the merge queue Apr 10, 2026
Merged via the queue into generative-computing:main with commit 839eead Apr 10, 2026
4 checks passed
@planetf1 planetf1 deleted the fix/mps-cache-cleanup-790 branch April 10, 2026 13:05

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

fix: MPS cache not flushed in alora test GPU cleanup

2 participants