
torch.cuda.memory_allocated to return {} if not initialized #51179

Merged

Conversation

@malfet (Contributor) commented Jan 27, 2021

Fixes #49952
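
For context, a minimal sketch of the behavior this change targets (semantics inferred from the PR title and #49952; this is not the patch itself): allocator statistics queried before any CUDA context exists should come back empty/zero instead of misbehaving, and the query itself should not force initialization.

    import torch

    # Fresh process: CUDA is available but no context has been created yet.
    if torch.cuda.is_available():
        assert not torch.cuda.is_initialized()
        # Post-fix expectation: these succeed without initializing a context.
        print(torch.cuda.memory_allocated())    # expected: 0
        print(len(torch.cuda.memory_stats()))   # expected: 0 (empty stats dict)
        assert not torch.cuda.is_initialized()  # still uninitialized

        x = torch.ones(4, device="cuda")          # first allocation creates the context
        print(torch.cuda.memory_allocated() > 0)  # True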

@facebook-github-bot (Contributor) commented Jan 27, 2021

💊 CI failures summary and remediations

As of commit 776ed5c (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 28 01:55:00 AssertionError: mypy failed: test/test_complex.py:21: error: Signature of "forward" incompatible with supertype "ScriptModule" [override]
Jan 28 01:53:41 ----------------------------------------------------------------------
Jan 28 01:53:49   test_doc_examples (__main__.TestTypeHints) ... ok (7.880s)
Jan 28 01:55:00   test_run_mypy (__main__.TestTypeHints) ... FAIL (70.719s)
Jan 28 01:55:00 
Jan 28 01:55:00 ======================================================================
Jan 28 01:55:00 FAIL [70.719s]: test_run_mypy (__main__.TestTypeHints) [mypy.ini]
Jan 28 01:55:00 ----------------------------------------------------------------------
Jan 28 01:55:00 Traceback (most recent call last):
Jan 28 01:55:00   File "test_type_hints.py", line 171, in test_run_mypy
Jan 28 01:55:00     self.fail(f"mypy failed: {stdout} {stderr}")
Jan 28 01:55:00 AssertionError: mypy failed: test/test_complex.py:21: error: Signature of "forward" incompatible with supertype "ScriptModule"  [override]
Jan 28 01:55:00 Found 1 error in 1 file (checked 1211 source files)
Jan 28 01:55:00  
Jan 28 01:55:00 
Jan 28 01:55:00 ----------------------------------------------------------------------
Jan 28 01:55:00 Ran 2 tests in 78.599s
Jan 28 01:55:00 
Jan 28 01:55:00 FAILED (failures=1)
Jan 28 01:55:00 
Jan 28 01:55:00 Generating XML reports...
Jan 28 01:55:00 Generated XML report: test-reports/python-unittest/TEST-TestTypeHints-20210128015341.xml

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.
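
For context on the failure above: mypy's [override] error fires when a subclass redeclares a method with a signature the stubs consider incompatible with the supertype. A hypothetical minimal reproducer (not the actual test/test_complex.py code) might look like:

    import torch

    class Model(torch.jit.ScriptModule):
        # Under the type stubs of that era, redeclaring `forward` on a
        # ScriptModule subclass could be flagged by mypy as:
        #   error: Signature of "forward" incompatible with supertype
        #   "ScriptModule"  [override]
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x + 1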

@facebook-github-bot (Contributor) left a comment

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@malfet requested a review from a team, January 27, 2021 04:09
@ngimel (Collaborator) commented Jan 27, 2021

Thanks for the fix!
Can we add tests? Also, another case reported in #49952 still does not work (it reports equal memory usage for both GPUs):

python -c 'import torch; x=torch.ones(10<<20).to(0); y=torch.ones(10).to(1);print([torch.cuda.memory_allocated(torch.cuda.device(id)) for id in range(torch.cuda.device_count())])'

Weirdly, it works if torch.cuda.device(id) is replaced by id.
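
The same repro, expanded for readability (a sketch assuming a machine with at least two GPUs):

    import torch

    # Allocate very different amounts on the two devices.
    x = torch.ones(10 << 20).to(0)  # ~40 MiB of float32 on device 0
    y = torch.ones(10).to(1)        # 40 bytes on device 1

    # Buggy path: torch.cuda.device objects made both devices report
    # the same allocated-memory figure.
    print([torch.cuda.memory_allocated(torch.cuda.device(i))
           for i in range(torch.cuda.device_count())])

    # Working path: plain integer device indices.
    print([torch.cuda.memory_allocated(i)
           for i in range(torch.cuda.device_count())])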

@codecov (bot) commented Jan 27, 2021

Codecov Report

Merging #51179 (d9666fb) into master (ba316a7) will increase coverage by 0.00%.
The diff coverage is 83.33%.

@@           Coverage Diff           @@
##           master   #51179   +/-   ##
=======================================
  Coverage   80.88%   80.88%           
=======================================
  Files        1931     1931           
  Lines      210560   210562    +2     
=======================================
+ Hits       170311   170315    +4     
+ Misses      40249    40247    -2     

@malfet (Contributor, Author) commented Jan 27, 2021

> Weirdly, it works if torch.cuda.device(id) is replaced by id.

Fixed, although according to the type annotations this function is not supposed to be called with torch.cuda.device, which, despite the name, does not inherit from torch.device; filed #51224 to track that.
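
One plausible shape for such a fix, sketched under the assumption that the accepted argument types are int, torch.device, torch.cuda.device, or None (the helper below is hypothetical, not the actual patch): reduce every device argument to a plain index before consulting the allocator.

    from typing import Union
    import torch

    # Hypothetical normalization helper. torch.cuda.device is a context
    # manager that keeps its target index in the `idx` attribute; despite
    # the name it does not inherit from torch.device.
    def _device_index(device: Union[int, torch.device, "torch.cuda.device", None]) -> int:
        if device is None:
            return torch.cuda.current_device()
        if isinstance(device, int):
            return device
        if isinstance(device, torch.device):
            return device.index if device.index is not None else torch.cuda.current_device()
        if isinstance(device, torch.cuda.device):
            return device.idx
        raise TypeError(f"unsupported device argument: {device!r}")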

@ngimel (Collaborator) commented Jan 27, 2021

Awesome, can you please add tests?
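
A sketch of the kind of regression test being requested (hypothetical; the test actually added under test/test_cuda.py may differ). Running the check in a subprocess guarantees the CUDA context starts uninitialized:

    import subprocess
    import sys
    import textwrap
    import unittest

    import torch

    class TestUninitializedCudaStats(unittest.TestCase):
        @unittest.skipUnless(torch.cuda.is_available(), "CUDA required")
        def test_memory_allocated_before_init(self):
            # A fresh interpreter cannot have inherited a CUDA context.
            script = textwrap.dedent("""\
                import torch
                assert torch.cuda.memory_allocated() == 0
                assert len(torch.cuda.memory_stats()) == 0
                assert not torch.cuda.is_initialized()
            """)
            subprocess.check_call([sys.executable, "-c", script])

    if __name__ == "__main__":
        unittest.main()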

Review thread on test/test_cuda.py (outdated, resolved)
@facebook-github-bot (Contributor) left a comment

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented

@malfet merged this pull request in 43f0ccd.

@malfet deleted the malfet/torch.cuda.memory_allocated-return-empty-dict branch, September 23, 2021 18:12

Successfully merging this pull request may close these issues:

torch.cuda.memory_allocated() doesn't correctly work until the context is initialized (#49952)