xlogy: Port to structured #60814

Closed · Freey0 wants to merge 5 commits

Conversation

@Freey0 (Contributor) commented Jun 26, 2021

Stack from ghstack:

Differential Revision: D29449373

[ghstack-poisoned]
@facebook-github-bot (Contributor) commented Jun 26, 2021

💊 CI failures summary and remediations

As of commit 7b3350f (more details on the Dr. CI page and at hud.pytorch.org/pr/60814):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Jul 15 13:20:30 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Jul 15 13:20:30 ++++ extract_trap_cmd
Jul 15 13:20:30 ++++ printf '%s\n' ''
Jul 15 13:20:30 +++ printf '%s\n' cleanup
Jul 15 13:20:30 ++ trap -- '
Jul 15 13:20:30 cleanup' EXIT
Jul 15 13:20:30 ++ [[ pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build != *pytorch-win-* ]]
Jul 15 13:20:30 ++ which sccache
Jul 15 13:20:30 ++ sccache --stop-server
Jul 15 13:20:30 ++ true
Jul 15 13:20:30 ++ rm /var/lib/jenkins/sccache_error.log
Jul 15 13:20:30 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Jul 15 13:20:30 ++ true
Jul 15 13:20:30 ++ [[ -n '' ]]
Jul 15 13:20:30 ++ [[ pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build == *rocm* ]]
Jul 15 13:20:30 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Jul 15 13:20:30 ++ SCCACHE_IDLE_TIMEOUT=1200
Jul 15 13:20:30 ++ RUST_LOG=sccache::server=error
Jul 15 13:20:30 ++ sccache --start-server
Jul 15 13:20:30 sccache: Starting the server...
Jul 15 13:20:30 ++ sccache --zero-stats
Jul 15 13:20:30 Compile requests                      0

1 job timed out:

  • pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build

❄️ 1 failure tentatively classified as flaky, but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Jul 15 16:41:07 RuntimeError: tensorflow/compil...OK() (Unknown: Could not start gRPC server vs. OK)
Jul 15 16:41:07   File "/opt/conda/lib/python3.6/site-packages/torch_xla-1.10-py3.6-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 314, in _setup_replication
Jul 15 16:41:07     device = xm.xla_device()
Jul 15 16:41:07   File "/opt/conda/lib/python3.6/site-packages/torch_xla-1.10-py3.6-linux-x86_64.egg/torch_xla/core/xla_model.py", line 232, in xla_device
Jul 15 16:41:07     devkind=devkind if devkind is not None else None)
Jul 15 16:41:07   File "/opt/conda/lib/python3.6/site-packages/torch_xla-1.10-py3.6-linux-x86_64.egg/torch_xla/core/xla_model.py", line 137, in get_xla_supported_devices
Jul 15 16:41:07     xla_devices = _DEVICES.value
Jul 15 16:41:07   File "/opt/conda/lib/python3.6/site-packages/torch_xla-1.10-py3.6-linux-x86_64.egg/torch_xla/utils/utils.py", line 32, in value
Jul 15 16:41:07     self._value = self._gen_fn()
Jul 15 16:41:07   File "/opt/conda/lib/python3.6/site-packages/torch_xla-1.10-py3.6-linux-x86_64.egg/torch_xla/core/xla_model.py", line 19, in <lambda>
Jul 15 16:41:07     _DEVICES = xu.LazyProperty(lambda: torch_xla._XLAC._xla_get_devices())
Jul 15 16:41:07 RuntimeError: tensorflow/compiler/xla/xla_client/xrt_local_service.cc:56 : Check failed: tensorflow::NewServer(server_def, &server_) == ::tensorflow::Status::OK() (Unknown: Could not start gRPC server vs. OK)
Jul 15 16:41:07 Traceback (most recent call last):
Jul 15 16:41:07   File "/var/lib/jenkins/workspace/xla/test/test_mp_replication.py", line 30, in <module>
Jul 15 16:41:07     xmp.spawn(_mp_fn, args=())
Jul 15 16:41:07   File "/opt/conda/lib/python3.6/site-packages/torch_xla-1.10-py3.6-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 394, in spawn
Jul 15 16:41:07     start_method=start_method)
Jul 15 16:41:07   File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
Jul 15 16:41:07     while not context.join():
Jul 15 16:41:07   File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 144, in join
Jul 15 16:41:07     exit_code=exitcode
Jul 15 16:41:07 torch.multiprocessing.spawn.ProcessExitedException: process 2 terminated with exit code 17

ci.pytorch.org: 1 failed


Preview docs built from this PR

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@ezyang (Contributor) commented Jun 29, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@Freey0 mentioned this pull request Jul 7, 2021
@ezyang (Contributor) commented Jul 7, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang (Contributor) commented Jul 12, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang (Contributor) commented Jul 19, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:
@ezyang merged this pull request in 11cc179.

@facebook-github-bot deleted the gh/Feey0/35/head branch on July 23, 2021 at 14:17