From 003c30ba824861603b2ea99395771af995c51eef Mon Sep 17 00:00:00 2001
From: Luca Wehrstedt <lcw@fb.com>
Date: Thu, 10 Dec 2020 03:45:30 -0800
Subject: [PATCH] Fix FutureNCCL's completed() disagreeing with wait() (#48503)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48503

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

 ---

My impression is that one property of the upstream Future class is that once .wait() returns, or once a callback is invoked, then .completed() should return True. This was not the case for FutureNCCL because .wait() would return immediately, and callbacks would be invoked inline, but .completed() could return False if the CUDA async operations hadn't completed yet.

That was odd and confusing. Since there are other ways for users to check the status of CUDA operations (if they really need, and typically I don't think it's so common), perhaps it's best to avoid checking the status of CUDA events in .completed().
ghstack-source-id: 118180028

Test Plan: Unit tests

Reviewed By: mrshenli

Differential Revision: D25180531

fbshipit-source-id: e1207f6b91f010f278923cc5fec1190d0fcdab30
---
 torch/lib/c10d/ProcessGroupNCCL.hpp | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/torch/lib/c10d/ProcessGroupNCCL.hpp b/torch/lib/c10d/ProcessGroupNCCL.hpp
index 383fc21fa0d8..8f19cb280d82 100644
--- a/torch/lib/c10d/ProcessGroupNCCL.hpp
+++ b/torch/lib/c10d/ProcessGroupNCCL.hpp
@@ -339,15 +339,8 @@ class ProcessGroupNCCL : public ProcessGroup {
       return fut;
     }
 
-    // Checks cudaEventQuery with cudaEvents. Returns true if a FutureError was
-    // recorded or the entire operation is completed on the GPU.
     bool completed() const override {
-      if (error_) {
-        return true;
-      }
-      // Checking the work's corresponding CUDA events' status
-      auto ret = cudaEventQuery((*cudaEvents_)[0]);
-      return ret != cudaErrorNotReady || ret == cudaSuccess;
+      return true;
     }
 
     bool hasValue() const override {