Rework compat bindings. #47863

Closed
wants to merge 9 commits into from

robieta commented Nov 12, 2020

Stack from ghstack:

Differential Revision: D25199261

[ghstack-poisoned]

dr-ci bot commented Nov 12, 2020

💊 CI failures summary and remediations

As of commit cf28c05 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_bazel_build (1/1)

Step: "Bazel Build" (full log | diagnosis details | 🔁 rerun)

Dec 01 18:27:09 FAILED: Build did NOT complete successfully
Dec 01 18:27:09    static char *vparameterStr[7] = { "v", "vv", "", "vv", "v", "vvv", "" }; 
Dec 01 18:27:09                 ^~~~~~~~~~~~~ 
Dec 01 18:27:09 ERROR: missing input file 'external/sleef/src/libm/sleeflibm_header.h.org', owner: '@sleef//:src/libm/sleeflibm_header.h.org' 
Dec 01 18:27:09 ERROR: /var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/external/sleef/BUILD.bazel:178:1: @sleef//:sleef_h: missing input file '@sleef//:src/libm/sleeflibm_header.h.org' 
Dec 01 18:27:09 Target //:torch failed to build 
Dec 01 18:27:09 Use --verbose_failures to see the command lines of failed build steps. 
Dec 01 18:27:09 ERROR: /var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/external/sleef/BUILD.bazel:178:1 1 input file(s) do not exist 
Dec 01 18:27:09 INFO: Elapsed time: 112.353s, Critical Path: 1.40s 
Dec 01 18:27:09 INFO: 196 processes: 196 processwrapper-sandbox. 
Dec 01 18:27:09 FAILED: Build did NOT complete successfully 
Dec 01 18:27:09 FAILED: Build did NOT complete successfully 
Dec 01 18:27:09 + cleanup 
Dec 01 18:27:09 + retcode=1 
Dec 01 18:27:09 + set +x 
Dec 01 18:27:09 =================== sccache compilation log =================== 
Dec 01 18:27:09 ERROR 2020-12-01T18:25:25Z: sccache::server: ["null"] fatal error: Permission denied (os error 13) at path "/dev/.tmpXDOmUG" 
Dec 01 18:27:09  
Dec 01 18:27:09 ERROR 2020-12-01T18:25:25Z: sccache::server: ["null"] 	Permission denied (os error 13) at path "/dev/.tmpXDOmUG" 
Dec 01 18:27:09  
Dec 01 18:27:09 ERROR 2020-12-01T18:25:25Z: sccache::server: ["null"] fatal error: Permission denied (os error 13) at path "/dev/.tmp5JCkDk" 
Dec 01 18:27:09  

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun) ❄️

Dec 01 20:37:13 RuntimeError: Process 0 terminated or timed out after 118.06462502479553 seconds
Dec 01 20:37:13 ====================================================================== 
Dec 01 20:37:13 ERROR [118.104s]: test_function_not_on_callee (__main__.TensorPipeRpcTestWithSpawn) 
Dec 01 20:37:13 ---------------------------------------------------------------------- 
Dec 01 20:37:13 Traceback (most recent call last): 
Dec 01 20:37:13   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 278, in wrapper 
Dec 01 20:37:13     self._join_processes(fn) 
Dec 01 20:37:13   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 395, in _join_processes 
Dec 01 20:37:13     self._check_return_codes(elapsed_time) 
Dec 01 20:37:13   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 436, in _check_return_codes 
Dec 01 20:37:13     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time)) 
Dec 01 20:37:13 RuntimeError: Process 0 terminated or timed out after 118.06462502479553 seconds 
Dec 01 20:37:13  
Dec 01 20:37:13 ---------------------------------------------------------------------- 
Dec 01 20:37:13 Ran 369 tests in 1587.130s 
Dec 01 20:37:13  
Dec 01 20:37:13 FAILED (errors=1, skipped=31) 
Dec 01 20:37:13  
Dec 01 20:37:13 Generating XML reports... 
Dec 01 20:37:13 Generated XML report: test-reports/dist-gloo/TEST-TensorPipeDdpComparisonTestWithSpawn-20201201201046.xml 
Dec 01 20:37:13 Generated XML report: test-reports/dist-gloo/TEST-TensorPipeDdpUnderDistAutogradTestWithSpawn-20201201201046.xml 
Dec 01 20:37:13 Generated XML report: test-reports/dist-gloo/TEST-TensorPipeDistAutogradTestWithSpawn-20201201201046.xml 

This comment has been revised 31 times.

# load will automatically search /usr/include, but not conda include.
EXTRA_INCLUDE_PATHS.append(os.path.join(CONDA_PREFIX, "include"))


Contributor

It would probably be nice to have a Note explaining at a high level what's going on here.

Contributor

I understand that this is "temporary" stuff but because it interacts nontrivially with some code that isn't in PyTorch itself, it will be harder for other people to figure out how this relates to the bigger picture. A note here will help a lot.

Author

No, it's a valid point. Check out BACK_TESTING_NOTE: and let me know if it seems reasonable.
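
For readers without the full diff, the include-path line quoted at the top of this thread can be sketched as follows. This is a minimal illustration, not the PR's code: the helper name `extra_include_paths` and its `env` parameter are hypothetical, and only the `CONDA_PREFIX`/`include` convention comes from the snippet under review.

```python
import os
from typing import List, Mapping, Optional


def extra_include_paths(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Collect header directories that a JIT C++ build would not search by default.

    Hypothetical helper: the snippet under review only shows that
    <CONDA_PREFIX>/include is appended to a list of extra include paths.
    """
    env = os.environ if env is None else env
    paths: List[str] = []

    # /usr/include is searched automatically; the conda include directory is
    # not, so add it only when a conda environment appears to be active.
    conda_prefix = env.get("CONDA_PREFIX")
    if conda_prefix:
        paths.append(os.path.join(conda_prefix, "include"))
    return paths
```

A list built this way could plausibly be passed as the `extra_include_paths` argument of `torch.utils.cpp_extension.load`, although how the PR actually wires it up is not visible in this conversation.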

Taylor Robie added 3 commits November 13, 2020 16:38
# PyTorch the cost and complexity of such shims will increase. Once back
# testing is no longer required (which is to say we have done enough historic
# analysis and the shims no longer justify their maintenance and code
# complexity costs) back testing paths will be removed.
Contributor

Thanks, this note is super helpful. One extra benefit of the note is that it also clues the reader in on which kinds of code changes to Timer are permissible and which are not (changes that add more dependencies on C symbols mean you need to add more backtesting support).
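
To make the point about C symbol dependencies concrete, here is one generic shape a compatibility shim can take. It is an assumption-laden sketch and not the mechanism this PR implements: `get_binding`, its parameters, and the fallback callable are all hypothetical names chosen for illustration.

```python
import importlib
from typing import Any, Callable


def get_binding(module_name: str, symbol: str, fallback: Callable[[], Any]) -> Any:
    """Return `symbol` from `module_name` when available, else build a shim.

    Hypothetical helper: every symbol looked up this way is one more thing a
    back-testing path has to provide when the installed build lacks it.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The whole module is missing in this environment, so use the shim.
        return fallback()

    if hasattr(module, symbol):
        # The installed build already exports the symbol; prefer it.
        return getattr(module, symbol)

    # The module exists but predates the symbol; fall back to the shim.
    return fallback()
```

Under this pattern, each new C symbol that Timer relies on needs a matching fallback, which is why such changes carry extra back-testing cost.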

@facebook-github-bot
Contributor

@robieta merged this pull request in 17ea112.

@facebook-github-bot facebook-github-bot deleted the gh/robieta/3/head branch December 5, 2020 15:30