thomasjpfan

Fixes #57109

Adds comment to clarify a=sqrt(5) in nn.Linear.reset_parameters.

facebook-github-bot commented May 18, 2021

💊 CI failures summary and remediations

As of commit 8543d05:



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/1)

Step: "Run tests"

May 19 23:15:43 AssertionError: False is not tr...lowed difference with rtol=0 and atol=0 is only 0!
May 19 23:15:43 ----------------------------------------------------------------------
May 19 23:15:43 Traceback (most recent call last):
May 19 23:15:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 391, in wrapper
May 19 23:15:43     self._join_processes(fn)
May 19 23:15:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 583, in _join_processes
May 19 23:15:43     self._check_return_codes(elapsed_time)
May 19 23:15:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 645, in _check_return_codes
May 19 23:15:43     msg="Expected zero exit code but got {}".format(first_process.exitcode)
May 19 23:15:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1400, in assertEqual
May 19 23:15:43     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
May 19 23:15:43 AssertionError: False is not true : Scalars failed to compare as equal! Comparing -11 and 0 gives a difference of 11, but the allowed difference with rtol=0 and atol=0 is only 0!
May 19 23:15:43 Expected zero exit code but got -11
May 19 23:15:43 
May 19 23:15:43 ----------------------------------------------------------------------
May 19 23:15:43 Ran 46 tests in 16.661s
May 19 23:15:43 
May 19 23:15:43 FAILED (failures=31, skipped=15)
May 19 23:15:43 
May 19 23:15:43 Generating XML reports...
May 19 23:15:43 Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20210519231526.xml
May 19 23:15:43 Traceback (most recent call last):

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_bionic_cuda10_2_cudnn7_py3_9_gcc7_test2 (1/1)

Step: "Run tests"

May 20 00:18:05 unknown file: Failure
May 20 00:18:05 frame #7: <unknown function> + 0x899e25 (0x55a74668ce25 in build/bin/test_api)
May 20 00:18:05 frame #8: <unknown function> + 0x89a0d5 (0x55a74668d0d5 in build/bin/test_api)
May 20 00:18:05 frame #9: testing::internal::UnitTestImpl::RunAllTests() + 0xc1c (0x55a74668e12c in build/bin/test_api)
May 20 00:18:05 frame #10: testing::UnitTest::Run() + 0x98 (0x55a74668e3f8 in build/bin/test_api)
May 20 00:18:05 frame #11: main + 0xc8 (0x55a746012de8 in build/bin/test_api)
May 20 00:18:05 frame #12: __libc_start_main + 0xe7 (0x7fe5c5dedbf7 in /lib/x86_64-linux-gnu/libc.so.6)
May 20 00:18:05 frame #13: _start + 0x2a (0x55a746048cca in build/bin/test_api)
May 20 00:18:05 " thrown in the test body.
May 20 00:18:05 [  FAILED  ] IntegrationTest.MNIST_CUDA (3 ms)
May 20 00:18:05 [ RUN      ] IntegrationTest.MNISTBatchNorm_CUDA
May 20 00:18:05 unknown file: Failure
May 20 00:18:05 C++ exception with description "Error opening images file at test/cpp/api/mnist/train-images-idx3-ubyte
May 20 00:18:05 Exception raised from read_images at /var/lib/jenkins/workspace/torch/csrc/api/src/data/datasets/mnist.cpp:67 (most recent call first):
May 20 00:18:05 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7fe5c65adb6b in /var/lib/jenkins/workspace/build/lib/libc10.so)
May 20 00:18:05 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xce (0x7fe5c65a984e in /var/lib/jenkins/workspace/build/lib/libc10.so)
May 20 00:18:05 frame #2: <unknown function> + 0x4061055 (0x7fe5d98c1055 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
May 20 00:18:05 frame #3: torch::data::datasets::MNIST::MNIST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::data::datasets::MNIST::Mode) + 0x46 (0x7fe5d98c2076 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
May 20 00:18:05 frame #4: IntegrationTest_MNISTBatchNorm_CUDA_Test::TestBody() + 0x9de (0x55a74620e69e in build/bin/test_api)
May 20 00:18:05 frame #5: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x4a (0x55a74669611a in build/bin/test_api)
May 20 00:18:05 frame #6: <unknown function> + 0x899815 (0x55a74668c815 in build/bin/test_api)
May 20 00:18:05 frame #7: <unknown function> + 0x899e25 (0x55a74668ce25 in build/bin/test_api)

This comment was automatically generated by Dr. CI.


thomasjpfan added the "module: nn" and "module: docs" labels and removed the "module: docs" label on May 18, 2021
jbschlosser left a comment

Thanks for adding the clarification! Might be worth adding a link to the issue discussion for those interested in the details on how using kaiming_uniform_ with sqrt(5) results in uniform(-1/sqrt(in_features), 1/sqrt(in_features)).
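
For readers who want the arithmetic behind that remark, here is a minimal sketch in plain Python. It assumes the formulas documented for `torch.nn.init.kaiming_uniform_` with its default `mode='fan_in'` and `nonlinearity='leaky_relu'`: gain = sqrt(2 / (1 + a^2)), std = gain / sqrt(fan_in), bound = sqrt(3) * std. The helper name `kaiming_uniform_bound` is hypothetical and is not part of PyTorch.

```python
import math


def kaiming_uniform_bound(fan_in: int, a: float) -> float:
    """Return the bound b such that weights are drawn from uniform(-b, b).

    Hypothetical helper: it re-derives the bound from the documented
    Kaiming-uniform formulas rather than calling into PyTorch.
    """
    # Gain for leaky_relu with negative slope `a`.
    gain = math.sqrt(2.0 / (1.0 + a ** 2))
    # Target standard deviation in fan_in mode.
    std = gain / math.sqrt(fan_in)
    # A uniform(-b, b) distribution has std b / sqrt(3), so b = sqrt(3) * std.
    return math.sqrt(3.0) * std


# With a = sqrt(5): gain = sqrt(2 / 6) = 1 / sqrt(3), so the sqrt(3) factors
# cancel and the bound reduces to 1 / sqrt(in_features).
for in_features in (1, 16, 300, 4096):
    bound = kaiming_uniform_bound(in_features, a=math.sqrt(5))
    assert math.isclose(bound, 1.0 / math.sqrt(in_features))
```

The check passes for each `in_features` value tried, which matches the uniform(-1/sqrt(in_features), 1/sqrt(in_features)) range mentioned in the comment above.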

codecov bot commented May 18, 2021

Codecov Report

Merging #58487 (c01ba70) into master (821a975) will decrease coverage by 0.00%.
The diff coverage is n/a.

❗ Current head c01ba70 differs from pull request most recent head 8543d05. Consider uploading reports for the commit 8543d05 to get more accurate results

@@            Coverage Diff             @@
##           master   #58487      +/-   ##
==========================================
- Coverage   76.46%   76.46%   -0.01%     
==========================================
  Files        1992     1992              
  Lines      199937   199937              
==========================================
- Hits       152879   152872       -7     
- Misses      47058    47065       +7     

@facebook-github-bot

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@jbschlosser merged this pull request in 145a6f7.

Successfully merging this pull request may close these issues.

nn.Linear weight initalization - uniform or kaiming_uniform?