@zasdfgbnm
Collaborator

@zasdfgbnm zasdfgbnm commented Aug 11, 2021

  • batch_isend_irecv returns a list of requests instead of a single request
  • remove some unused variables
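A minimal sketch of what the first bullet means for calling code, written without PyTorch so it runs anywhere. `FakeRequest` and `fake_batch_isend_irecv` are hypothetical stand-ins, not PyTorch APIs. The stand-in returns one request per op for simplicity, which is an assumption about the shape of the list rather than something this PR states.

```python
class FakeRequest:
    """Hypothetical stand-in for one request handle; not a PyTorch class."""

    def __init__(self, name):
        self.name = name
        self.done = False

    def wait(self):
        # A real request would block until the send/recv completes.
        self.done = True


def fake_batch_isend_irecv(op_names):
    """Hypothetical stand-in that returns a list of requests, like the real call."""
    return [FakeRequest(name) for name in op_names]


def wait_all(reqs):
    """Wait on every request in the returned list."""
    for req in reqs:
        req.wait()


reqs = fake_batch_isend_irecv(["send", "recv"])

# Old pattern: treating the return value as a single request fails,
# because a list has no wait() method.
try:
    reqs.wait()
    old_pattern_error = None
except AttributeError as exc:
    old_pattern_error = exc

# Updated pattern: iterate over the list and wait on each request.
wait_all(reqs)
```

With the real API, the same loop would apply to the value returned by `dist.batch_isend_irecv`.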

@facebook-github-bot
Contributor

facebook-github-bot commented Aug 11, 2021

💊 CI failures summary and remediations

As of commit 87ee5ac (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) (1/1)

Step: "Test PyTorch"

2021-09-15T06:59:37.0188258Z AssertionError: Fa... dtypes. Got dtypes torch.float32 and torch.int64.
2021-09-15T06:59:37.0177917Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 111, in wrapper
2021-09-15T06:59:37.0178771Z     return func(*args, **kwargs)
2021-09-15T06:59:37.0179748Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 2848, in wrapper
2021-09-15T06:59:37.0180547Z     return func(*args, **kwargs)
2021-09-15T06:59:37.0181737Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 4644, in test_post_localSGD_optimizer_parity
2021-09-15T06:59:37.0182802Z     self.assertEqual(p1.data, p2.data)
2021-09-15T06:59:37.0183895Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1875, in assertEqual
2021-09-15T06:59:37.0185091Z     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
2021-09-15T06:59:37.0186008Z   File "/opt/conda/lib/python3.6/unittest/case.py", line 682, in assertTrue
2021-09-15T06:59:37.0186780Z     raise self.failureException(msg)
2021-09-15T06:59:37.0188258Z AssertionError: False is not true : Tensors failed to compare as equal!Attempted to compare equality of tensors with different dtypes. Got dtypes torch.float32 and torch.int64.
2021-09-15T06:59:37.0189193Z 
2021-09-15T06:59:37.0189426Z 
2021-09-15T06:59:37.0189731Z 		
2021-09-15T06:59:37.0190295Z ✅ 534 Passed
2021-09-15T06:59:37.0190810Z 💨 197 Skipped
2021-09-15T06:59:37.0191288Z 🚨 1 Failed
2021-09-15T06:59:37.0396276Z ##[group]Run # Remove any previous test reports if they exist
2021-09-15T06:59:37.0397180Z # Remove any previous test reports if they exist
2021-09-15T06:59:37.0397806Z rm -f test-reports-*.zip
2021-09-15T06:59:37.0398476Z zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'

This comment was automatically generated by Dr. CI.

Comment on lines 1171 to 1173
reqs = dist.batch_isend_irecv([send_op])
for req in reqs:
    req.wait()
Contributor

I'm assuming this unit test passed since the code after dist.batch_isend_irecv was never executed and batch_isend_irecv just threw an exception. If so, should we just call dist.batch_isend_irecv here and ignore the return value?

Collaborator Author

Yes, I agree, will update soon.
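The point raised in this thread can be sketched with the standard `unittest` module alone. The sketch assumes the test in question wraps the call in an `assertRaises`-style block, which the thread implies but does not show. `raising_call` is a hypothetical stand-in for a call that rejects its input. It is not the PyTorch function.

```python
import io
import unittest


def raising_call(ops):
    """Hypothetical stand-in for a call that rejects its input."""
    raise RuntimeError("Invalid op")


class DeadCodeDemo(unittest.TestCase):
    def test_with_dead_code(self):
        reached = []
        with self.assertRaises(RuntimeError):
            req = raising_call(["bad_op"])
            # Nothing below runs: the exception leaves the block first,
            # so a wrong use of the return value would go unnoticed.
            reached.append("after call")
            req.wait()
        self.assertEqual(reached, [])

    def test_simplified(self):
        # Equivalent check with the unreachable lines removed.
        with self.assertRaises(RuntimeError):
            raising_call(["bad_op"])


def run_demo():
    """Run both tests quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeadCodeDemo)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

Both tests pass, which shows why the original test could succeed regardless of how the return value was handled, and why dropping the unreachable lines loses no coverage.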

@facebook-github-bot
Contributor

@pritamdamania87 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zasdfgbnm
Collaborator Author

@pritamdamania87 I have resolved your review comment.

@codecov

codecov bot commented Sep 15, 2021

Codecov Report

Merging #63112 (87ee5ac) into master (d6d286f) will decrease coverage by 0.07%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master   #63112      +/-   ##
==========================================
- Coverage   66.47%   66.39%   -0.08%     
==========================================
  Files         725      725              
  Lines       93457    93448       -9     
==========================================
- Hits        62122    62046      -76     
- Misses      31335    31402      +67     

@zasdfgbnm zasdfgbnm deleted the fix-nccl-tests branch October 12, 2021 02:53
wconstab pushed a commit that referenced this pull request Oct 20, 2021
Summary:
- `batch_isend_irecv` returns a list of requests instead of a single request
- remove some unused variables

Pull Request resolved: #63112

Reviewed By: pbelevich, wayi1, fduwjj

Differential Revision: D30921265

fbshipit-source-id: e2075925172805d33974ef0de6fb631bdf33b5ea

Labels

cla signed, oncall: distributed, open source
