
[DTensor] Computed DTensorSpec hash lazily #114322

Closed
wants to merge 1 commit into from
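The PR description is not captured on this page. To show what "computing a hash lazily" generally means, here is a minimal sketch under stated assumptions. It is not the actual diff of this PR or the real `DTensorSpec` code. `SpecSketch`, `mesh_shape`, `placements`, and `_hash` are hypothetical names. The object caches its hash on the first `hash()` call and clears the cache whenever a field is reassigned, so it never pays for hashing unless something asks for the hash.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class SpecSketch:
    # Hypothetical stand-in for a spec object; field names are illustrative only.
    mesh_shape: Tuple[int, ...]
    placements: Tuple[str, ...]
    # Cached hash: excluded from __init__, repr, and equality comparisons.
    _hash: Optional[int] = field(default=None, init=False, repr=False, compare=False)

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        # Reassigning any hashed field invalidates the cached hash.
        if name != "_hash":
            super().__setattr__("_hash", None)

    def __hash__(self) -> int:
        # Compute the hash only on first use, then reuse the cached value.
        if self._hash is None:
            self._hash = hash((self.mesh_shape, self.placements))
        return self._hash
```

A design note on the sketch: defining `__hash__` explicitly makes `@dataclass` keep it rather than setting it to `None`. The invalidation in `__setattr__` only keeps the cached value correct. It does not make it safe to mutate an object while it is stored as a dict key or set member.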


awgu
Contributor

@awgu awgu commented Nov 22, 2023


pytorch-bot bot commented Nov 22, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114322

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 0f5e76b with merge base 140c54e:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

awgu added a commit that referenced this pull request Nov 22, 2023
ghstack-source-id: 11dcb0e9dd8505fb88c37e37bfbab6b46bb99e6c
Pull Request resolved: #114322
Contributor

@wanchaol wanchaol left a comment


sounds good, thanks for the quick fix!

@awgu awgu marked this pull request as ready for review November 22, 2023 00:18
@awgu awgu added the ciflow/trunk, ciflow/periodic, and release notes: distributed (dtensor) labels Nov 22, 2023
@awgu
Contributor Author

awgu commented Nov 22, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: periodic / macos-12-py3-x86-64 / build

Details for Dev Infra team Raised by workflow job

@awgu
Contributor Author

awgu commented Nov 22, 2023

Failure unrelated: periodic / macos-12-py3-x86-64 / build (gh)

FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterMPS.cpp.o 
/var/folders/3s/vfzpb5r51gs6y328rmlgzm7c0000gn/T/tmp.4xorXdVd/clang++ -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/Users/runner/work/pytorch/pytorch/build/aten/src -I/Users/runner/work/pytorch/pytorch/aten/src -I/Users/runner/work/pytorch/pytorch/build -I/Users/runner/work/pytorch/pytorch -I/Users/runner/work/pytorch/pytorch/cmake/../third_party/benchmark/include -I/Users/runner/work/pytorch/pytorch/third_party/onnx -I/Users/runner/work/pytorch/pytorch/build/third_party/onnx -I/Users/runner/work/pytorch/pytorch/third_party/foxi -I/Users/runner/work/pytorch/pytorch/build/third_party/foxi -I/Users/runner/work/pytorch/pytorch/torch/csrc/api -I/Users/runner/work/pytorch/pytorch/torch/csrc/api/include -I/Users/runner/work/pytorch/pytorch/caffe2/aten/src/TH -I/Users/runner/work/pytorch/pytorch/build/caffe2/aten/src/TH -I/Users/runner/work/pytorch/pytorch/build/caffe2/aten/src -I/Users/runner/work/pytorch/pytorch/build/caffe2/../aten/src -I/Users/runner/work/pytorch/pytorch/torch/csrc -I/Users/runner/work/pytorch/pytorch/third_party/miniz-2.1.0 -I/Users/runner/work/pytorch/pytorch/third_party/kineto/libkineto/include -I/Users/runner/work/pytorch/pytorch/third_party/kineto/libkineto/src -I/Users/runner/work/pytorch/pytorch/aten/src/ATen/.. -I/Users/runner/work/pytorch/pytorch/third_party/FXdiv/include -I/Users/runner/work/pytorch/pytorch/c10/.. 
-I/Users/runner/work/pytorch/pytorch/third_party/pthreadpool/include -I/Users/runner/work/pytorch/pytorch/third_party/cpuinfo/include -I/Users/runner/work/pytorch/pytorch/third_party/QNNPACK/include -I/Users/runner/work/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/Users/runner/work/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/Users/runner/work/pytorch/pytorch/third_party/cpuinfo/deps/clog/include -I/Users/runner/work/pytorch/pytorch/third_party/NNPACK/include -I/Users/runner/work/pytorch/pytorch/third_party/fbgemm/include -I/Users/runner/work/pytorch/pytorch/third_party/fbgemm -I/Users/runner/work/pytorch/pytorch/third_party/fbgemm/third_party/asmjit/src -I/Users/runner/work/pytorch/pytorch/third_party/ittapi/src/ittnotify -I/Users/runner/work/pytorch/pytorch/third_party/FP16/include -I/Users/runner/work/pytorch/pytorch/third_party/fmt/include -I/Users/runner/work/pytorch/pytorch/build/third_party/ideep/mkl-dnn/include -I/Users/runner/work/pytorch/pytorch/third_party/ideep/mkl-dnn/src/../include -I/Users/runner/work/pytorch/pytorch/third_party/flatbuffers/include -isystem /Users/runner/work/pytorch/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/runner/work/pytorch/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/runner/work/pytorch/pytorch/third_party/protobuf/src -isystem /Users/runner/work/_temp/conda_environment_6951118264/include -isystem /Users/runner/work/pytorch/pytorch/third_party/gemmlowp -isystem /Users/runner/work/pytorch/pytorch/third_party/neon2sse -isystem /Users/runner/work/pytorch/pytorch/third_party/XNNPACK/include -isystem /Users/runner/work/pytorch/pytorch/third_party/ittapi/include -isystem /Users/runner/work/pytorch/pytorch/cmake/../third_party/eigen -isystem /Users/runner/work/pytorch/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /Users/runner/work/pytorch/pytorch/third_party/ideep/include -isystem 
/Users/runner/work/pytorch/pytorch/build/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -faligned-new -Werror -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -isysroot /Applications/Xcode_13.3.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk -mmacosx-version-min=11.0 -fPIC -DMKL_HAS_SBGEMM -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing -fvisibility=hidden -O2 -Wmissing-prototypes -Werror=missing-prototypes -DASMJIT_STATIC -Xpreprocessor -fopenmp -I/Users/runner/work/_temp/conda_environment_6951118264/include -std=gnu++17 -Wno-missing-prototypes -Wno-error=missing-prototypes -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterMPS.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterMPS.cpp.o.d -o 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterMPS.cpp.o -c /Users/runner/work/pytorch/pytorch/build/aten/src/ATen/RegisterMPS.cpp
clang: error: unable to execute command: Segmentation fault: 11
clang: error: clang frontend command failed due to signal (use -v to see invocation)
Apple clang version 13.1.6 (clang-1316.0.21.2.3)
Target: x86_64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Applications/Xcode_13.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
clang: note: diagnostic msg: 
********************

@awgu
Contributor Author

awgu commented Nov 22, 2023

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: periodic / macos-12-py3-x86-64 / build

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@huydhn
Contributor

huydhn commented Nov 22, 2023

This is weird, as I still see the test failing in trunk on the multigpu job https://hud.pytorch.org/pytorch/pytorch/commit/e7326ec295559c16795088e79a5631e784bb4d61. Any thoughts?

@awgu
Contributor Author

awgu commented Nov 22, 2023

Looking into it!

@facebook-github-bot facebook-github-bot deleted the gh/awgu/465/head branch November 25, 2023 15:28
Labels
ciflow/periodic, ciflow/trunk, Merged, release notes: distributed (dtensor)

4 participants