Make DeviceCachingAllocator's error handling more defensive and a bit easier to read #51158

mcarilli · 2021-01-27T00:02:32Z

Currently, alloc_block's error handling has a couple of (imo) minor flaws. It might clear the error state even if the error had nothing to do with memory allocation. It might also clear the error state even if it never attempted a cudaMalloc, meaning it could clear an error state left by some completely unrelated earlier CUDA call.

The diffs and comments are the best explanation of my preferred (new) error-checking policy.

The diffs add very little work to the common hot path (a successful allocation satisfied by an existing block). Most of the additional logic occurs in alloc_block, which is a slow path anyway (it tries cudaMalloc).
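To make the policy described above concrete, here is a minimal sketch. It is not the code from this PR. It assumes a hypothetical `FakeRuntime` that stands in for the CUDA runtime so it runs without a GPU: `try_malloc` plays the role of cudaMalloc and `get_last_error` plays the role of cudaGetLastError. The names `FakeError`, `FakeRuntime`, and `alloc_block_sketch` are invented for illustration. The check for a preexisting error is also an assumption about how "more defensive" could be realized.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the CUDA runtime so the sketch runs without a GPU.
enum class FakeError { Success, MemoryAllocation, LaunchFailure };

struct FakeRuntime {
  FakeError last_error = FakeError::Success;  // models CUDA's recorded error state
  std::size_t free_bytes = 0;                 // models remaining device memory

  // Models cudaMalloc: records and returns an error when memory runs out.
  FakeError try_malloc(std::size_t size) {
    if (size > free_bytes) {
      last_error = FakeError::MemoryAllocation;
      return last_error;
    }
    free_bytes -= size;
    return FakeError::Success;
  }

  // Models cudaGetLastError: returns the recorded error and resets it.
  FakeError get_last_error() {
    FakeError e = last_error;
    last_error = FakeError::Success;
    return e;
  }
};

// Returns true if the block was allocated, false on out-of-memory.
// Throws for any error that is not an allocation failure.
bool alloc_block_sketch(FakeRuntime& rt, std::size_t size) {
  // Assumption: a preexisting error is surfaced, not cleared, because it
  // came from some earlier call rather than from this allocation.
  if (rt.last_error != FakeError::Success) {
    throw std::runtime_error("preexisting error before allocation");
  }
  FakeError err = rt.try_malloc(size);
  if (err == FakeError::Success) {
    return true;
  }
  if (err == FakeError::MemoryAllocation) {
    // Only an allocation failure we just caused is forgiven and cleared.
    rt.get_last_error();
    return false;
  }
  throw std::runtime_error("allocation failed for a non-memory reason");
}
```

In this sketch the error state is cleared in exactly one place: after an allocation attempt that failed for lack of memory. A caller could then free cached blocks and retry, while any unrelated error stays visible.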

facebook-github-bot · 2021-01-27T00:02:41Z

💊 CI failures summary and remediations

As of commit 402f3ea (more details on the Dr. CI page):

1/1 failures possibly introduced in this PR
- 1/1 non-CircleCI failure(s)

Extra GitHub checks: 1 failed

Failed: GitHub Actions - quick-checks

codecov · 2021-01-27T03:36:53Z

Codecov Report

Merging #51158 (402f3ea) into master (5748410) will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #51158   +/-   ##
=======================================
  Coverage   80.86%   80.87%           
=======================================
  Files        1931     1931           
  Lines      210542   210542           
=======================================
+ Hits       170261   170266    +5     
+ Misses      40281    40276    -5

facebook-github-bot

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-01-28T18:58:59Z

@ezyang merged this pull request in cedfa4c.

mcarilli added 3 commits January 26, 2021 16:45

compiles

fe3ae60

Merge remote-tracking branch 'upstream/master' into allocator_reserve

91d7920

Use existing hot-path assert

402f3ea

facebook-github-bot added the cla signed label Jan 27, 2021

mcarilli changed the title ~~Make caching allocator's error handling more defensive and a bit easier to read~~ Make DeviceCachingAllocator's error handling more defensive and a bit easier to read Jan 27, 2021

mcarilli requested review from colesbury, ngimel and ezyang January 27, 2021 00:03

pytorchbot added the open source label Jan 27, 2021

ezyang approved these changes Jan 27, 2021

facebook-github-bot reviewed Jan 27, 2021

facebook-github-bot closed this in cedfa4c Jan 28, 2021

facebook-github-bot added the Merged label Jan 28, 2021
