
Conversation

ezyang
Contributor

@ezyang ezyang commented Nov 8, 2023

Stack from ghstack (oldest at bottom):

The previous documentation did not appear to accurately describe
the actual semantics of the CUDA caching allocator.

When you record a stream, we only record a stream use:

```
  void recordStream(Block* block, cuda::CUDAStream stream) {
    std::lock_guard<std::recursive_mutex> lock(mutex);
    if (stream.stream() == block->stream) {
      // ignore uses on the allocation stream, since those don't require any
      // special synchronization
      return;
    }
    block->stream_uses.insert(stream);
  }
```

It is only at deallocation time that we actually install events on the
recorded stream uses, which we subsequently query to determine whether
the block can be reused.
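
For illustration, here is a minimal, self-contained C++/CUDA sketch of that
deallocation-time scheme. This is not the PyTorch implementation: the helper
names (insert_events, process_events, free_block) and types are simplified
stand-ins for the example, and error handling plus the actual block pool are
omitted.

```
#include <cuda_runtime.h>

#include <deque>
#include <set>
#include <utility>

struct Block {
  void* ptr = nullptr;
  cudaStream_t stream{};               // stream the block was allocated on
  std::set<cudaStream_t> stream_uses;  // extra streams recorded via recordStream
  int event_count = 0;                 // events still outstanding after free
};

// Outstanding (event, block) pairs, oldest first.
static std::deque<std::pair<cudaEvent_t, Block*>> cuda_events;

// recordStream is pure bookkeeping: no event, no synchronization.
void recordStream(Block* block, cudaStream_t stream) {
  if (stream == block->stream) {
    return;  // uses on the allocation stream need no special handling
  }
  block->stream_uses.insert(stream);
}

// Called when the block is freed: record one event per recorded stream use,
// fencing all work queued on that stream so far.
void insert_events(Block* block) {
  for (cudaStream_t s : block->stream_uses) {
    cudaEvent_t event;
    cudaEventCreateWithFlags(&event, cudaEventDisableTiming);
    cudaEventRecord(event, s);
    block->event_count++;
    cuda_events.emplace_back(event, block);
  }
  block->stream_uses.clear();
}

// Called opportunistically (e.g. on the next allocation): pop events that
// have completed; a block with no outstanding events can be reused.
void process_events() {
  while (!cuda_events.empty()) {
    auto [event, block] = cuda_events.front();
    if (cudaEventQuery(event) == cudaErrorNotReady) {
      break;  // oldest event still pending; try again later
    }
    cudaEventDestroy(event);
    if (--block->event_count == 0) {
      // free_block(block): the block may be returned to the pool here.
    }
    cuda_events.pop_front();
  }
}
```

The design point the sketch is meant to surface: a block used on many streams
pays for event creation once, at free time, rather than on every recordStream
call, and reuse is gated on a cheap cudaEventQuery rather than a blocking
synchronization.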

Signed-off-by: Edward Z. Yang <ezyang@meta.com>


pytorch-bot bot commented Nov 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113282

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 958446d with merge base 9e6e958:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Nov 8, 2023
ghstack-source-id: 46c3c2c
Pull Request resolved: #113282
@ezyang ezyang requested review from colesbury and janeyx99 November 8, 2023 18:05
Collaborator

@albanD albanD left a comment

SGTM

@ezyang
Contributor Author

ezyang commented Nov 8, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Nov 8, 2023
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: Raised by workflow job

@ezyang ezyang added the release notes: cuda and topic: docs labels Nov 8, 2023
@ezyang
Contributor Author

ezyang commented Nov 8, 2023

@pytorchbot merge -f "only the doc job matters"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@facebook-github-bot facebook-github-bot deleted the gh/ezyang/2421/head branch November 12, 2023 15:24
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
…ch#113282)

Pull Request resolved: pytorch#113282
Approved by: https://github.com/Skylion007, https://github.com/albanD