@driazati driazati commented Jun 22, 2021

This replaces Ninja with asyncio for parallel builds. asyncio is more idiomatic Python and is easier to debug when things go wrong, since the data never leaves Python.
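For readers unfamiliar with the pattern, here is a minimal sketch of running build commands in parallel with asyncio. It is not the code from this PR: the names `run_command`, `run_all`, `build`, and `max_jobs` are made up for illustration, and the commands are stand-ins for real compiler invocations. It assumes Python 3.7+ (for `asyncio.run`).

```python
import asyncio
import sys
from typing import List, Sequence, Tuple


async def run_command(cmd: Sequence[str], limit: asyncio.Semaphore) -> Tuple[int, str]:
    """Run one command and return (exit code, combined stdout/stderr)."""
    # The semaphore caps how many subprocesses are alive at the same time.
    async with limit:
        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        output, _ = await proc.communicate()
        return proc.returncode, output.decode()


async def run_all(cmds: Sequence[Sequence[str]], max_jobs: int) -> List[Tuple[int, str]]:
    """Run all commands concurrently, at most max_jobs at a time."""
    limit = asyncio.Semaphore(max_jobs)
    # gather() returns results in the same order as the input commands.
    return await asyncio.gather(*(run_command(cmd, limit) for cmd in cmds))


def build(cmds: Sequence[Sequence[str]], max_jobs: int = 4) -> List[Tuple[int, str]]:
    """Synchronous entry point (hypothetical name)."""
    return asyncio.run(run_all(cmds, max_jobs))


if __name__ == "__main__":
    # Stand-in "build steps": each one just prints its index.
    commands = [[sys.executable, "-c", f"print({i})"] for i in range(4)]
    for code, out in build(commands, max_jobs=2):
        print(code, out.strip())
```

Because exit codes and output come back as ordinary Python values, a failing step can be inspected or re-run from a debugger without parsing a separate build tool's log.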

facebook-github-bot commented Jun 22, 2021

💊 CI failures summary and remediations

As of commit c2c9521 (more details on the Dr. CI page and at hud.pytorch.org/pr/60495):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_test (1/1)

Step: "Run tests"

Jun 22 23:57:57 AssertionError: False is not tr... was 1.0 (1.0 vs. 0.0), which occurred at index 0.
Jun 22 23:57:57 ----------------------------------------------------------------------
Jun 22 23:57:57 Traceback (most recent call last):
Jun 22 23:57:57   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 397, in instantiated_test
Jun 22 23:57:57     result = test_fn(self, *args)
Jun 22 23:57:57   File "/var/lib/jenkins/workspace/xla/test/../../test/test_view_ops.py", line 458, in test_transpose_inplace_view
Jun 22 23:57:57     self.assertEqual(t[1, 0], v[0, 1])
Jun 22 23:57:57   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 605, in assertEqual
Jun 22 23:57:57     return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Jun 22 23:57:57   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1407, in assertEqual
Jun 22 23:57:57     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jun 22 23:57:57 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=0.001, found 1 element(s) (out of 1) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 1.0 (1.0 vs. 0.0), which occurred at index 0.
Jun 22 23:57:57 
Jun 22 23:57:57 ----------------------------------------------------------------------
Jun 22 23:57:57 Ran 138 tests in 3.648s
Jun 22 23:57:57 
Jun 22 23:57:57 FAILED (failures=2, skipped=102)
Jun 22 23:57:57 
Jun 22 23:57:57 Generating XML reports...
Jun 22 23:57:57 Generated XML report: test-reports/python-unittest/test.......test.test_view_ops/TEST-TestViewOpsXLA-20210622235753.xml
Jun 22 23:57:57 + cleanup
Jun 22 23:57:57 + retcode=1

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 (1/1)

Step: "Run tests"

Jun 23 01:10:36 unknown file: Failure
Jun 23 01:10:24 frame #7: build/bin/test_api() [0xc06f65]
Jun 23 01:10:24 frame #8: build/bin/test_api() [0xc07205]
Jun 23 01:10:24 frame #9: testing::internal::UnitTestImpl::RunAllTests() + 0xbf9 (0xc08249 in build/bin/test_api)
Jun 23 01:10:24 frame #10: testing::UnitTest::Run() + 0x8f (0xc084ef in build/bin/test_api)
Jun 23 01:10:24 frame #11: main + 0xc8 (0x582c48 in build/bin/test_api)
Jun 23 01:10:24 frame #12: __libc_start_main + 0xf0 (0x7f2531af7840 in /lib/x86_64-linux-gnu/libc.so.6)
Jun 23 01:10:24 frame #13: _start + 0x29 (0x5b92b9 in build/bin/test_api)
Jun 23 01:10:24 " thrown in the test body.
Jun 23 01:10:24 [  FAILED  ] IntegrationTest.MNIST_CUDA (12051 ms)
Jun 23 01:10:24 [ RUN      ] IntegrationTest.MNISTBatchNorm_CUDA
Jun 23 01:10:36 unknown file: Failure
Jun 23 01:10:36 C++ exception with description "Error opening images file at test/cpp/api/mnist/t10k-images-idx3-ubyte
Jun 23 01:10:36 Exception raised from read_images at /var/lib/jenkins/workspace/torch/csrc/api/src/data/datasets/mnist.cpp:67 (most recent call first):
Jun 23 01:10:36 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f254b2398cb in /var/lib/jenkins/workspace/build/lib/libc10.so)
Jun 23 01:10:36 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xce (0x7f254b2350de in /var/lib/jenkins/workspace/build/lib/libc10.so)
Jun 23 01:10:36 frame #2: <unknown function> + 0x43d5bd5 (0x7f254fa5abd5 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
Jun 23 01:10:36 frame #3: torch::data::datasets::MNIST::MNIST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::data::datasets::MNIST::Mode) + 0x46 (0x7f254fa5bbf6 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
Jun 23 01:10:36 frame #4: IntegrationTest_MNISTBatchNorm_CUDA_Test::TestBody() + 0x1229 (0x782389 in build/bin/test_api)
Jun 23 01:10:36 frame #5: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x4a (0xc0ff3a in build/bin/test_api)
Jun 23 01:10:36 frame #6: build/bin/test_api() [0xc06976]
Jun 23 01:10:36 frame #7: build/bin/test_api() [0xc06f65]

ci.pytorch.org: 1 failed


@facebook-github-bot

@driazati has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@driazati driazati requested a review from 1ntEgr8 June 22, 2021 22:37

@1ntEgr8 1ntEgr8 left a comment

Tried it locally and observed the output of top. lgtm!

@facebook-github-bot

@driazati merged this pull request in 555c154.
