Conversation

@ailzhang (Contributor) commented Sep 11, 2020

Stack from ghstack:

Differential Revision: D23698386

@dr-ci bot commented Sep 11, 2020

💊 CI failures summary and remediations

As of commit ff5d225 (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/3)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Sep 18 20:02:57 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future
Sep 18 20:02:57 At: 
Sep 18 20:02:57   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 18 20:02:57   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 18 20:02:57  
Sep 18 20:02:57 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 18 20:02:57  
Sep 18 20:02:57 At: 
Sep 18 20:02:57   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 18 20:02:57   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 18 20:02:57  
Sep 18 20:02:57 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 18 20:02:57  
Sep 18 20:02:57 At: 
Sep 18 20:02:57   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 18 20:02:57   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 18 20:02:57  
Sep 18 20:02:58 ok (1.626s) 
Sep 18 20:02:59   test_return_future_remote (__main__.ProcessGroupRpcTestWithSpawn) ... ok (1.531s) 
Sep 18 20:03:01   test_return_local_rrefs (__main__.ProcessGroupRpcTestWithSpawn) ... ok (1.628s) 
Sep 18 20:03:03   test_rpc_profiling_remote_record_function (__main__.ProcessGroupRpcTestWithSpawn) ... ok (1.633s) 
Sep 18 20:03:04   test_rpc_return_rref (__main__.ProcessGroupRpcTestWithSpawn) ... ok (1.618s) 

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/3)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Sep 19 00:58:19 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/workspace/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:8:19: error: use of undeclared identifier \'strtod_l\'\n return ((int*)(&strtod_l))[argc];\n ^\n1 error generated.\n" }
Sep 19 00:58:18     assert check_overrides(overrides, overridden) 
Sep 19 00:58:18 AssertionError 
Sep 19 00:58:18 Building torch_xla version: 1.6 
Sep 19 00:58:18 XLA Commit ID: 14d617a2ec11f7e579aa7eb39d5331331e2940c3 
Sep 19 00:58:18 PyTorch Commit ID: ff5d225a5cec67aee8999d829ce81e06684dc1ce 
Sep 19 00:58:18 Failed to generate ATEN bindings: ['/var/lib/jenkins/workspace/xla/scripts/generate_code.sh'] 
Sep 19 00:58:19 =================== sccache compilation log =================== 
Sep 19 00:58:19 + cleanup 
Sep 19 00:58:19 + retcode=1 
Sep 19 00:58:19 + set +x 
Sep 19 00:58:19 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/workspace/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:8:19: error: use of undeclared identifier \'strtod_l\'\n  return ((int*)(&strtod_l))[argc];\n                  ^\n1 error generated.\n" } 
Sep 19 00:58:19  
Sep 19 00:58:19 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Sep 19 00:58:19 Compile requests               5992 
Sep 19 00:58:19 Compile requests executed      3551 
Sep 19 00:58:19 Cache hits                     2359 
Sep 19 00:58:19 Cache misses                   1176 
Sep 19 00:58:19 Cache timeouts                    0 
Sep 19 00:58:19 Cache read errors                 0 
Sep 19 00:58:19 Forced recaches                   0 
Sep 19 00:58:19 Cache write errors                0 

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test (3/3)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Sep 19 03:10:34 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future
Sep 19 03:10:34 At: 
Sep 19 03:10:34   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 19 03:10:34   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 19 03:10:34  
Sep 19 03:10:34 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 19 03:10:34  
Sep 19 03:10:34 At: 
Sep 19 03:10:34   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 19 03:10:34   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 19 03:10:34  
Sep 19 03:10:34 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 19 03:10:34  
Sep 19 03:10:34 At: 
Sep 19 03:10:34   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 19 03:10:34   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 19 03:10:34  
Sep 19 03:10:34 ok (1.541s) 
Sep 19 03:10:36   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:577] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 19 03:10:36 [W tensorpipe_agent.cpp:577] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Sep 19 03:10:36 [W tensorpipe_agent.cpp:577] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 19 03:10:36 ok (1.537s) 


@ailzhang requested review from bhosmer and ezyang on September 11, 2020 at 18:01
ailzhang pushed a commit that referenced this pull request Sep 12, 2020
ghstack-source-id: 8fdfb0d
Pull Request resolved: #44556
ailzhang pushed a commit that referenced this pull request Sep 13, 2020
ghstack-source-id: 4fdb953
Pull Request resolved: #44556

// Ideally we want to test both forward and backward on math kernels but I
// haven't found an easy way to do it. Currently we only test forward here
// and rely on backward tests of each at:: function used in math kernels.
Contributor commented:

This is definitely something we should work on making easier, though I'm not exactly sure how. Perhaps APIs for supporting static runtime would make it easier, as they would make it possible to ask for a specific kernel.

Absent machinery like this, I think the easiest way to do it is to make the math kernel available under another name (i.e., add a new native_functions.yaml entry) and then do the testing from Python. Maybe @mruberry has some other ideas.

Collaborator commented:

Without too much context, I'll also vote for (re)using Python.
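
For concreteness, here is a minimal sketch of the Python-side test the two comments above are pointing at, assuming the math kernel were also exposed under a separate name via a new native_functions.yaml entry. The alias torch._test_math_gelu and the choice of reference op are hypothetical placeholders for illustration only, not something this PR adds:

```python
import torch
from torch.autograd import gradcheck

def test_math_kernel():
    x = torch.randn(4, 5, dtype=torch.double, requires_grad=True)

    # Forward: the composite math kernel should agree with the reference op.
    # torch._test_math_gelu is a hypothetical alias created by an extra
    # native_functions.yaml entry; it does not exist in this PR.
    torch.testing.assert_allclose(
        torch._test_math_gelu(x),
        torch.nn.functional.gelu(x),
    )

    # Backward: the math kernel is a composition of at:: ops that each have
    # their own derivative tests, so gradcheck here is mainly a sanity check
    # that autograd flows through the composition.
    assert gradcheck(torch._test_math_gelu, (x,), eps=1e-6, atol=1e-4)
```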

ailzhang pushed a commit that referenced this pull request Sep 15, 2020
ghstack-source-id: 43a1497
Pull Request resolved: #44556
ailzhang pushed a commit that referenced this pull request Sep 15, 2020
ghstack-source-id: 7c5d79e
Pull Request resolved: #44556
@ailzhang requested review from bhosmer and ezyang on September 15, 2020 at 02:22
@bhosmer left a comment:

Looks good!

ailzhang pushed a commit that referenced this pull request Sep 15, 2020
ghstack-source-id: dbed1d7
Pull Request resolved: #44556
@ezyang requested a review from bdhirsh on September 16, 2020 at 00:09
ailzhang pushed a commit that referenced this pull request Sep 16, 2020
ghstack-source-id: c3c0a16
Pull Request resolved: #44556
ailzhang pushed a commit that referenced this pull request Sep 16, 2020
ghstack-source-id: 87c40aa
Pull Request resolved: #44556
ailzhang pushed a commit that referenced this pull request Sep 16, 2020
ghstack-source-id: 3d40c5e
Pull Request resolved: #44556
@facebook-github-bot (Contributor) commented:

@ailzhang merged this pull request in 4b42f0b.

loadbxh pushed a commit to loadbxh/Torch that referenced this pull request Sep 23, 2020