release_available_cached_blocks() adds wrong releaseSize to totalReleased and can crash if cur is pointing to last position #159567

### 🐛 Describe the bug

In the code below, inside `c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::release_available_cached_blocks()`, `totalReleased += (*cur)->size;` is executed after `*cur` has been released, so the wrong size can be added to `totalReleased`, and it can crash if `cur` ends up pointing at `pool.blocks.end()`. I have observed this crash using C++ libTorch 2.7.1 with CUDA 12.8.1, with `expandable_segments:true` and `max_split_size_mb:2048`, but it is hard to hit reliably, as it only triggers a crash when the last segment is released.
 
The order of the two statements inside the `if` block should be reversed, from
```
        if (!(*cur)->expandable_segment_) {
          release_block(*cur, context);
          totalReleased += (*cur)->size;
        }
```
to
```
        if (!(*cur)->expandable_segment_) {
          totalReleased += (*cur)->size;
          release_block(*cur, context);
        }
```
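For illustration, here is a minimal, self-contained sketch of the proposed ordering. It is not the allocator's actual code: `Block`, `BlockPool`, `release_block`, and `release_largest_first` are hypothetical stand-ins, and the pool is assumed to be a size-ordered `std::set` of raw pointers whose entries are erased and freed on release.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical stand-in for the allocator's Block; only the fields used here.
struct Block {
  std::size_t size;
  bool expandable_segment;
};

// Order by size, then by address (assumed to resemble a size-sorted pool).
struct BlockLess {
  bool operator()(const Block* a, const Block* b) const {
    if (a->size != b->size) return a->size < b->size;
    return a < b;
  }
};

using BlockPool = std::set<Block*, BlockLess>;

// Hypothetical release_block: erases the block from the pool and frees it.
// Afterwards the pointer and any iterator to its pool entry are invalid.
void release_block(BlockPool& pool, Block* block) {
  assert(pool.count(block) == 1);
  pool.erase(block);
  delete block;
}

// Releases non-expandable blocks, largest first, until at least `target`
// bytes are released or the pool is exhausted. Returns the bytes released.
std::size_t release_largest_first(BlockPool& pool, std::size_t target) {
  std::size_t total_released = 0;
  if (pool.empty()) return total_released;
  auto it = std::prev(pool.end());
  while (total_released < target) {
    auto cur = it;
    const bool is_first = (cur == pool.begin());
    if (!is_first) --it;  // step away before cur is invalidated
    if (!(*cur)->expandable_segment) {
      total_released += (*cur)->size;  // read the size while the block is alive
      release_block(pool, *cur);       // cur must not be used after this call
    }
    if (is_first) break;
  }
  return total_released;
}
```

Reading the size first means the loop never dereferences `cur` once the block it referred to has been erased and freed.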

### Versions

N/A (using libTorch release)

cc @ptrblck @msaroufim @eqy @jerryzh168
