[NNC] Added matmul for NNC lowering/unified dtypes #56456

Closed
wants to merge 9 commits into from

Chillee (Contributor) commented Apr 20, 2021

Stack from ghstack:

Differential Revision: D27977532

[ghstack-poisoned]
facebook-github-bot added the "oncall: jit" and "cla signed" labels Apr 20, 2021
facebook-github-bot (Contributor) commented Apr 20, 2021

💊 CI failures summary and remediations

As of commit 88c77cc (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test1 (1/1)

Step: "Test"

RuntimeError: CUDA error: device-side assert triggered
Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "C:\Users\circleci\project\build\win_tmp\build\torch\cuda\__init__.py", line 444, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
C:/Users/circleci/project/aten/src/ATen/native/cuda/TensorCompare.cu:68: block: [0,0,0], thread: [0,0,0] Assertion `input[0] != c10::complex<float>(0, 0)` failed.
Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "C:\Users\circleci\project\build\win_tmp\build\torch\cuda\__init__.py", line 444, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: device-side assert triggered
ok (55.422s)
  test_gather_bool (__main__.TestCuda) ... ok (0.005s)
  test_get_device_index (__main__.TestCuda) ... ok (0.005s)
  test_get_set_rng_state_all (__main__.TestCuda) ... skip (0.003s)
  test_grad_scaling_accumulation (__main__.TestCuda) ... ok (0.015s)
  test_grad_scaling_autocast (__main__.TestCuda) ... ok (0.058s)
  test_grad_scaling_clipping (__main__.TestCuda) ... ok (0.041s)
  test_grad_scaling_clipping_separate_unscale (__main__.TestCuda) ... test_cuda.py:2216: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
  torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm, error_if_nonfinite=False)
ok (0.034s)

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_macos_10_13_py3_lite_interpreter_build_test (1/1)

Step: "Test" ❄️

Apr 24 02:09:51 unknown file: Failure
Apr 24 02:09:47 /Applications/Xcode-12.GM.seed.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: /Users/distiller/project/torch/lib/libCaffe2_perfkernels_avx512.a(common_avx512.cc.o) has no symbols
Apr 24 02:09:47 warning: /Applications/Xcode-12.GM.seed.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: archive library: /Users/distiller/project/torch/lib/libCaffe2_perfkernels_avx512.a the table of contents is empty (no object file members in the library define global symbols)
Apr 24 02:09:49 + popd
Apr 24 02:09:49 ~/project
Apr 24 02:09:49 + /Users/distiller/project/../cpp-build/caffe2/build/bin/test_lite_interpreter_runtime
Apr 24 02:09:50 Note: Google Test filter = *-*_CUDA:*_MultiCUDA
Apr 24 02:09:50 [==========] Running 2 tests from 1 test case.
Apr 24 02:09:50 [----------] Global test environment set-up.
Apr 24 02:09:50 [----------] 2 tests from RunTimeTest
Apr 24 02:09:50 [ RUN      ] RunTimeTest.LoadAndForward
Apr 24 02:09:51 unknown file: Failure
Apr 24 02:09:51 C++ exception with description "open file failed, file path: /Users/distiller/project/test/cpp/lite_interpreter_runtime/sequence.ptl
Apr 24 02:09:51 Exception raised from FileAdapter at /Users/distiller/project/caffe2/serialize/file_adapter.cc:11 (most recent call first):
Apr 24 02:09:51 frame #0: decltype(std::__1::forward<c10::(anonymous namespace)::GetFetchStackTrace()::$_0&>(fp)()) std::__1::__invoke<c10::(anonymous namespace)::GetFetchStackTrace()::$_0&>(c10::(anonymous namespace)::GetFetchStackTrace()::$_0&) + 54 (0x10f65a746 in libc10.dylib)
Apr 24 02:09:51 frame #1: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > std::__1::__invoke_void_return_wrapper<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >::__call<c10::(anonymous namespace)::GetFetchStackTrace()::$_0&>(c10::(anonymous namespace)::GetFetchStackTrace()::$_0&) + 54 (0x10f65a6e6 in libc10.dylib)
Apr 24 02:09:51 frame #2: std::__1::__function::__alloc_func<c10::(anonymous namespace)::GetFetchStackTrace()::$_0, std::__1::allocator<c10::(anonymous namespace)::GetFetchStackTrace()::$_0>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > ()>::operator()() + 54 (0x10f65a6a6 in libc10.dylib)
Apr 24 02:09:51 frame #3: std::__1::__function::__func<c10::(anonymous namespace)::GetFetchStackTrace()::$_0, std::__1::allocator<c10::(anonymous namespace)::GetFetchStackTrace()::$_0>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > ()>::operator()() + 45 (0x10f6593dd in libc10.dylib)
Apr 24 02:09:51 frame #4: std::__1::__function::__value_func<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > ()>::operator()() const + 75 (0x10f66052b in libc10.dylib)
Apr 24 02:09:51 frame #5: std::__1::function<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > ()>::operator()() const + 35 (0x10f6504d3 in libc10.dylib)
Apr 24 02:09:51 frame #6: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 78 (0x10f650a4e in libc10.dylib)
Apr 24 02:09:51 frame #7: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 130 (0x10f64c9a2 in libc10.dylib)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Chillee added a commit that referenced this pull request Apr 20, 2021
ghstack-source-id: fa880c7195a2881d2ba7d48eb6792a8aef9f08da
Pull Request resolved: #56456
Chillee added a commit that referenced this pull request Apr 20, 2021
ghstack-source-id: 13e72b9c9d4c42d488b1d20af40f9918dd859d56
Pull Request resolved: #56456
Chillee added a commit that referenced this pull request Apr 21, 2021
ghstack-source-id: 9a49cc9f14e8515770bb47616878a719fad36bfb
Pull Request resolved: #56456
Chillee added a commit that referenced this pull request Apr 22, 2021
ghstack-source-id: c184315270675c09531729c0e218b8bece0d8375
Pull Request resolved: #56456
Chillee added a commit to Chillee/pytorch that referenced this pull request Apr 22, 2021
ghstack-source-id: 781d73f88440e49d02be63a82d542c955444afb4
Pull Request resolved: pytorch#56456
ZolotukhinM left a comment

Overall looks good! Please include a more detailed description in the commit message and add a tag ([NNC] or [TensorExpr]) - it helps to find relevant commits for release notes.

Chillee changed the title from "Added matmul/unified dtypes" to "[NNC] Added matmul/unified dtypes" Apr 23, 2021
Chillee changed the title from "[NNC] Added matmul/unified dtypes" to "[NNC] Added matmul for NNC lowering/unified dtypes" Apr 23, 2021
facebook-github-bot (Contributor):

@Chillee merged this pull request in bcef7eb.

facebook-github-bot deleted the gh/chillee/44/head branch April 28, 2021 14:17
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary: Pull Request resolved: pytorch#56456

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27977532

Pulled By: Chillee

fbshipit-source-id: c04372d988c8ef795f27037348a155894c2eddad
Labels: cla signed, Merged, oncall: jit
3 participants