Send Pull Request #1

miladm · 2022-05-19T02:39:50Z

@mruberry

Re-enable previously filtered op tests. Expecting lotsa failures. Should dtype also be wrapped in list? cc @mruberry, @suo Pull Request resolved: #77330 Approved by: https://github.com/suo

Removes azure_pipelines removal from create_release.yml Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: #77369 Approved by: https://github.com/suo, https://github.com/janeyx99

Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: #77370 Approved by: https://github.com/suo, https://github.com/janeyx99

This reverts commit 56bed0d. Reverted #76823 on behalf of https://github.com/rohan-varma

Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #77376 Approved by: https://github.com/atalman, https://github.com/malfet

This infrastructure was used at some point but I do not believe it is used anymore. We should remove this to reduce the amount of confusion one has when trying to contribute to pytorch ci. Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: #77364 Approved by: https://github.com/janeyx99

Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: #77383 Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Approved by: https://github.com/malfet

These jobs take a while to spin up, so let's make it so that they use custom runners Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: #77384 Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Approved by: https://github.com/malfet

Exposes `cuFuncSetAttribute` & `cuFuncGetAttribute` Used for runtime compilation by nvfuser Pull Request resolved: #77296 Approved by: https://github.com/davidberard98

Decompositions can be used to fill in meta support where necessary, assuming the operations they decompose to support the meta key. This PR adds a `register_meta` kwarg to `register_decomposition` that optionally lets you register the meta to the C++ dispatch table for meta tensors. I use this to then get the meta function for where and huber_loss for free. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #77353 Approved by: https://github.com/mruberry

Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: #77390 Approved by: https://github.com/malfet

In `_need_symbolic_context`, when evaluation of an annotation is postponed, the annotation is a string and not a type. We need to use `get_type_hints` to get the real type. For example,

```python
def g(a: int) -> int:
    return a

def f(a: "int") -> "int":
    return a
```

we will get the correct type `int` for both `g` and `f` with `typing.get_type_hints`. Otherwise, the type for `a` in `f` will be a string that is not comparable to the type `int`, and `issubclass` will complain. This is necessary as we will use postponed typing evaluation to break circular dependencies.

Pull Request resolved: #77365 Approved by: https://github.com/BowenBao
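The difference between the raw annotation and the resolved hint can be checked directly. The following is a minimal sketch using only the standard library; the helper name `resolved_hints` is illustrative and does not come from the PR.

```python
import typing


def g(a: int) -> int:
    return a


def f(a: "int") -> "int":
    return a


def resolved_hints(fn):
    """Return the annotations of fn with string annotations evaluated to types."""
    return typing.get_type_hints(fn)


# The raw annotation on f is still the string "int" ...
raw_annotation = f.__annotations__["a"]

# ... while get_type_hints gives the class int for both functions,
# so issubclass can be applied safely.
f_type = resolved_hints(f)["a"]
g_type = resolved_hints(g)["a"]
is_int_like = issubclass(f_type, int)
```

Calling `issubclass(raw_annotation, int)` instead would raise `TypeError`, because its first argument must be a class.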

This adds basic coverage, but can be easily made more efficient by providing a native implementation. Follow up work includes supporting CSR gradients for strided Tensors. Pull Request resolved: #77177 Approved by: https://github.com/nikitaved, https://github.com/mikaylagawarecki

None as input is legal per ONNX spec for representing optional inputs. For [example](https://github.com/onnx/onnx/blob/main/docs/Operators.md#inputs-2---3-7) `constant_value` for `ONNX::Pad`. This PR removes such constraint check that was set prior to calling onnx shape inference. For the issue below, such constraint prevents the onnx shape inference of `ONNX::Pad`, which leads to falling back on an incorrect constant traced shape. For the unit test in this PR, prior to this PR, the ONNX shape inference for `ONNX::Pad` would be skipped, and would return `None` instead. Fixes pytorch/vision#5971 Pull Request resolved: #77379 Approved by: https://github.com/garymm

Support quantization for maxpool exporting to ONNX. Pull Request resolved: #77393 Approved by: https://github.com/BowenBao

Adds support for scripting ParameterDicts and getattr() on them. It does not support iterating on ParameterDicts because torch/nn/container.py implementation of ParameterDict.items() uses a generator, which is not supported by torchscript. torch/nn/container.py would need to be updated so that iter gets correctly registered in python_sugared_value.cpp Added a test in test_module_containers.py Pull Request resolved: #77143 Approved by: https://github.com/eellison

Pull Request resolved: #77371 Approved by: https://github.com/mruberry

Reduce circular dependencies

- Lift constants and flags from `symbolic_helper` to `_constants` and `_globals`
- Standardize constant naming to make it consistent
- Make `utils` strictly dependent on `symbolic_helper`, removing inline imports from `symbolic_helper`
- Move side effects from `utils` to `_patch_torch`

Pull Request resolved: #77142 Approved by: https://github.com/garymm, https://github.com/BowenBao

Main question mark is that `log_sigmoid_forward` uses `acc_t` instead of `opmath_t` - not sure if we have a decorator today for that? Glad to add one if we don't. Pull Request resolved: #77329 Approved by: https://github.com/ezyang

…dows) (#77192) Ref: #74537 Pull Request resolved: #77192 Approved by: https://github.com/anjali411

Pull Request resolved: #77347 Approved by: https://github.com/datumbox, https://github.com/frank-wei

This PR makes the following changes...

Prims
- adds as_strided
- fixes errors in flatten meta

Testing
- enables view consistency checking (which can be opted out of, see issues below)
- adds reference inputs for view, reshape, and flatten
- adds error inputs for reshape

Refs
- adds as_strided, reshape, and view
- fixes an error in the flatten ref where it was not returning self on no-op
- fixes a bug in transpose where it was not returning a view when the transposed tensor has 1 or fewer dims

Issues
- #77218
- #77216

Pull Request resolved: #77220 Approved by: https://github.com/ngimel

…idesOf` (#77387) Summary: s/size/in_size/ in outer func Test Plan: CI Differential Revision: D36357483 Pull Request resolved: #77387 Approved by: https://github.com/seemethere, https://github.com/mehtanirav

Pull Request resolved: #77358 Approved by: https://github.com/ezyang

Pull Request resolved: #76296 Approved by: https://github.com/cpuhrsch, https://github.com/ngimel

…t[value={0}]" Retry of #76875. It was reverted due to torchvision failures, but it turned out that the failures were caused by a different PR. irparser previously didn't support these, which would cause failures in log_extract.py Pull Request resolved: #77377 Approved by: https://github.com/datumbox

Summary: The new PrivateUse1 DeviceType is associated with the PrivateUse1 DispatchKey, which can be used for non-public devices without introducing a new device type. Note that the stringified name of the PrivateUse1 device is "privateuseone". Test Plan: All CI should pass. Differential Revision: D35859437 Pull Request resolved: #77208 Approved by: https://github.com/bdhirsh

Summary: The root module may have different forward functions. The current implementation assumes only the func `forward` can be traced. In this diff, we add an argument for the forward func name so that users can trace different forward functions. Test Plan: N1903198 Differential Revision: D36157032 Pull Request resolved: #77109 Approved by: https://github.com/jamesr66a

…ance on CPU Pull Request resolved: #73953 Approved by: https://github.com/frank-wei

) Fixes #ISSUE_NUMBER Pull Request resolved: #76888 Approved by: https://github.com/eellison

Fixes #77412 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #77488 Approved by: https://github.com/mruberry

…representing tensor sizes (#76836)"" This reverts commit c35bd8d. Pull Request resolved: #77719 Approved by: https://github.com/Chillee, https://github.com/malfet

Updating nvfuser code base. This should fix the indexing issue observed in pytorch/vision#6015. Running tests locally as well. Will update the description here at a later point @bypass-github-export-checks Pull Request resolved: #77471 Approved by: https://github.com/seemethere, https://github.com/eellison

This reverts commit d03d43d. Reverted #77673 on behalf of https://github.com/suo

In preparation for adopting future rocblas library options, it is necessary to track when the backward pass of training is executing. The scope-based helper class `BackwardPassGuard` is provided to toggle state. Pull Request resolved: #71881 Approved by: https://github.com/albanD
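The scope-based pattern can be illustrated with a Python analogue. This is only a sketch of the idea: the actual `BackwardPassGuard` is a C++ class whose implementation is not shown here, and the names `backward_pass_guard`, `is_backward_pass`, and the use of thread-local storage are assumptions made for illustration.

```python
import contextlib
import threading

# Hypothetical per-thread state; the real guard's storage is not shown in the PR text.
_state = threading.local()


def is_backward_pass() -> bool:
    """Report whether the current thread is inside a guarded scope."""
    return getattr(_state, "backward", False)


@contextlib.contextmanager
def backward_pass_guard():
    """Mark the enclosed scope as a backward pass and restore the old value on exit."""
    previous = is_backward_pass()
    _state.backward = True
    try:
        yield
    finally:
        # Restoring (rather than clearing) keeps nested guards correct,
        # and the finally clause covers exits caused by exceptions.
        _state.backward = previous
```

A library call that behaves differently during the backward pass would consult `is_backward_pass()` while code runs inside `with backward_pass_guard():`.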

Pull Request resolved: #77653 Approved by: https://github.com/albanD

…ong type for detach() Pull Request resolved: #77655 Approved by: https://github.com/albanD

Pull Request resolved: #77782 Approved by: https://github.com/albanD

Fixing #77748 Pull Request resolved: #77767 Approved by: https://github.com/soulitzer

Pull Request resolved: #77605 Approved by: https://github.com/cpuhrsch

Introduce error handling across all ranks when loading and saving checkpoints. This makes it a lot simpler for users to handle failures and, as a positive side effect, lets ranks coordinate on when the operation finished successfully. This change requires 3 collectives when saving and 1 when loading. All those collectives carry a small payload, so they will be latency bound and write time should dominate. Pull Request resolved: #77091 Approved by: https://github.com/pritamdamania87, https://github.com/wanchaol
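One way to picture cross-rank error handling is the single-process simulation below. It is a sketch under stated assumptions, not the PR's code: ranks are modeled as plain callables, `all_gather` is a stand-in for a real collective, and `CheckpointError`, `run_local`, and `coordinated_step` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional


class CheckpointError(RuntimeError):
    """Hypothetical error raised when at least one rank failed."""

    def __init__(self, failures: Dict[int, str]):
        super().__init__(f"checkpoint step failed on ranks {sorted(failures)}")
        self.failures = failures


def run_local(step: Callable[[], None]) -> Optional[str]:
    """Run one rank's work and turn an exception into a small, shareable payload."""
    try:
        step()
        return None
    except Exception as exc:
        return repr(exc)


def all_gather(values: List[Optional[str]]) -> List[Optional[str]]:
    """Stand-in for a collective: every rank would receive the full list."""
    return list(values)


def coordinated_step(steps: List[Callable[[], None]]) -> None:
    """Simulate all ranks running a step and agreeing on the outcome."""
    local_results = [run_local(step) for step in steps]
    gathered = all_gather(local_results)
    failures = {rank: msg for rank, msg in enumerate(gathered) if msg is not None}
    if failures:
        # Every rank sees the same failures, so all of them can raise consistently.
        raise CheckpointError(failures)
```

Because only a short error string per rank is exchanged, the payload stays small, which matches the description's point that such collectives are latency bound.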

This reverts commit a7cf95a. Reverted #77708 on behalf of https://github.com/suo

Makes debugging of failures like #76999 (comment) easier by posting links to check runs that have failed or are still pending. Pull Request resolved: #77763 Approved by: https://github.com/seemethere

This is a workaround for EFA for TensorPipe. This allows RPC-enabled tests to be run on AWS clusters. Pull Request resolved: #77363 Approved by: https://github.com/wanchaol

more! Pull Request resolved: #77803 Approved by: https://github.com/seemethere

Resubmit of #77673, which was reverted due to Windows test failures: #77673 (comment). I suspect these failures happened because I don't explicitly set a side stream for graph capture in the new test. Not setting a side stream explicitly is alright on Linux because CUDA tests implicitly use a side stream. I think Windows CUDA tests implicitly use the default stream, breaking capture and leaving the backend in a bad state. Other graph tests explicitly set side streams and don't error in Windows builds, so I'm 95% sure doing the same for the new test will work. Pull Request resolved: #77789 Approved by: https://github.com/ezyang

Pull Request resolved: #77800 Approved by: https://github.com/pritamdamania87, https://github.com/fduwjj

This is the first PR to make DataPipe deterministic. Users should be able to use `torch.manual_seed(seed)` to control the shuffle order for the following cases:

- Directly over `DataPipe`
- For single-process DataLoader
- Multiprocessing DataLoader

Unfortunately, for distributed training, users have to run `apply_shuffle_seed` manually to make sure all distributed processes have the same shuffle order.

Pull Request resolved: #77741 Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT

Pull Request resolved: #77663 Approved by: https://github.com/cpuhrsch

Rehash of #75426 now that a revised version of load_state_dict_post_hook has landed. Pull Request resolved: #76912 Approved by: https://github.com/awgu

Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #77682 Approved by: https://github.com/ngimel, https://github.com/mruberry

Pull Request resolved: #77761 Approved by: https://github.com/jbschlosser

…78136)

This prevents `import torch` from accidentally crashing on machines with no Metal devices.

Should prevent crashes reported in pytorch#77662 (comment) and https://github.com/pytorch/functorch/runs/6560056366?check_suite_focus=true

Backtrace to the crash:

```
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff7202be57 libobjc.A.dylib`objc_msgSend + 23
    frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
    frame #2: 0x000000010fda011d libtorch_cpu.dylib`_GLOBAL__sub_I_MPSAllocator.mm + 125
    frame #3: 0x000000010ada81e3 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 535
    frame #4: 0x000000010ada85ee dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
(lldb) up
frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl:
->  0x10fd9f524 <+436>: movq   %rax, 0x1b0(%rbx)
    0x10fd9f52b <+443>: movw   $0x0, 0x1b8(%rbx)
    0x10fd9f534 <+452>: addq   $0x8, %rsp
    0x10fd9f538 <+456>: popq   %rbx
(lldb) disassemble
...
    0x10fd9f514 <+420>: movq   0xf19ad15(%rip), %rsi ; "maxBufferLength"
    0x10fd9f51b <+427>: movq   %r14, %rdi
    0x10fd9f51e <+430>: callq  *0xeaa326c(%rip) ; (void *)0x00007fff7202be40: objc_msgSend
```

which corresponds to the `[m_device maxBufferLength]` call, where `m_device` is not initialized in https://github.com/pytorch/pytorch/blob/2ae3c59e4bcb8e6e75b4a942cacc2d338c88e609/aten/src/ATen/mps/MPSAllocator.h#L171

Pull Request resolved: pytorch#78136 Approved by: https://github.com/seemethere

… of libtorch_python (pytorch#78028)

Summary: This moves `torch::class_<WorkerInfo>` into `rpc_agent.cpp` so it gets registered in libtorch instead of libtorch_python. This is intermediate work to getting torch::deploy to load an unmodified copy of libtorch. Current RPC is incompatible due to duplicate registrations.

```
unknown file: Failure
C++ exception with description "Exception Caught inside torch::deploy embedded library:
Custom class with name __torch__.torch.classes.dist_rpc.WorkerInfo is already registered. Ensure that registration with torch::class_ is only called once.
Exception raised from registerCustomClass at ../aten/src/ATen/core/custom_class.cpp:61 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f3bd9adb92e in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7f3bd9ab7068 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: torch::registerCustomClass(std::shared_ptr<c10::ClassType>) + 0x110 (0x7f3bc2258980 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::detail::class_base::class_base(std::string const&, std::string const&, std::string, std::type_info const&, std::type_info const&) + 0x3b9 (0x7f3bc225a419 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: [0x7f3ba45cfea1]
frame #5: <unknown function> + 0x1b5334 (0x5652bdab9334 in ./test_deploy)
frame #6: <unknown function> + 0x1b4f3e (0x5652bdab8f3e in ./test_deploy)
frame #7: <unknown function> + 0x1b519b (0x5652bdab919b in ./test_deploy)
frame #8: loadSearchFile(char const*) + 0x23e (0x7f3ba62f37f8 in /tmp/torch_deploy9ATEFg)
frame #9: deploy_set_self + 0x51 (0x7f3ba62f38f9 in /tmp/torch_deploy9ATEFg)
frame #10: torch::deploy::Interpreter::Interpreter(torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>) + 0x274 (0x5652bdaaa790 in ./test_deploy)
frame #11: void __gnu_cxx::new_allocator<torch::deploy::Interpreter>::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x81 (0x5652bdaaf58b in ./test_deploy)
frame #12: void std::allocator_traits<std::allocator<torch::deploy::Interpreter> >::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(std::allocator<torch::deploy::Interpreter>&, torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x4a (0x5652bdaae320 in ./test_deploy)
frame #13: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::_M_realloc_insert<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(__gnu_cxx::__normal_iterator<torch::deploy::Interpreter*, std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> > >, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xee (0x5652bdaae4a0 in ./test_deploy)
frame #14: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::emplace_back<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xb6 (0x5652bdaad258 in ./test_deploy)
frame #15: torch::deploy::InterpreterManager::InterpreterManager(unsigned long, std::shared_ptr<torch::deploy::Environment>) + 0x123 (0x5652bdaa83b1 in ./test_deploy)
frame #16: TorchpyTest_InitTwice_Test::TestBody() + 0x65 (0x5652bda075a9 in ./test_deploy)
frame #17: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x65 (0x5652bda944b7 in ./test_deploy)
frame #18: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x5a (0x5652bda8cfe7 in ./test_deploy)
frame #19: testing::Test::Run() + 0x100 (0x5652bda68622 in ./test_deploy)
frame #20: testing::TestInfo::Run() + 0x10f (0x5652bda68fb3 in ./test_deploy)
frame #21: testing::TestSuite::Run() + 0x121 (0x5652bda6980d in ./test_deploy)
frame #22: testing::internal::UnitTestImpl::RunAllTests() + 0x38e (0x5652bda756e6 in ./test_deploy)
frame #23: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x65 (0x5652bda9586b in ./test_deploy)
frame #24: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x5a (0x5652bda8e0f7 in ./test_deploy)
frame #25: testing::UnitTest::Run() + 0xc9 (0x5652bda73fd1 in ./test_deploy)
frame #26: RUN_ALL_TESTS() + 0x11 (0x5652bda169fa in ./test_deploy)
frame #27: main + 0x27 (0x5652bda10ce2 in ./test_deploy)
frame #28: <unknown function> + 0x2d310 (0x7f3bc0431310 in /usr/lib/libc.so.6)
frame #29: __libc_start_main + 0x81 (0x7f3bc04313c1 in /usr/lib/libc.so.6)
frame #30: _start + 0x25 (0x5652bda063b5 in ./test_deploy)
```

Test Plan: CI

Differential Revision: D36564258

Pull Request resolved: pytorch#78028 Approved by: https://github.com/rohan-varma

…ytorch#78276) Fixes pytorch#325 **Summary**: Currently, the pytorchbot only allows for rebasing to the master branch. These modifications add functionality for rebasing to the 'viable/strict' branch of pytorch/pytorch by adding a flag to the comment. **Test Plan:** tested manually on personal fork ([#1](swang392#1)), and included a test case in test_tryrebase.py that checks if rebasing to viable/strict branch was successful. Pull Request resolved: pytorch#78276 Approved by: https://github.com/clee2000, https://github.com/janeyx99

… to conform with non-quantized counterpart filenames

Summary: Names of analogous files in the quantized directory (previously snake case) were inconsistent with their non-quantized filename counterparts (pascal case). This is the first of a series of PRs that changes all files in the quantized dir (and sub-directories) to have pascal case.

`aten/src/ATen/native/quantized/qconv_unpack.cpp` has not been renamed yet because (for reasons currently unknown) after making the name change, `import torch` produces the below error (`qlinear_unpack.cpp` renaming also seems to fail some phabricator CI tests for similar reasons). We suspect that these may be undefined errors and will revisit naming these files in a future PR.

```
terminate called after throwing an instance of 'c10::Error'
  what():  Type c10::intrusive_ptr<ConvPackedParamsBase<2> > could not be converted to any of the known types.
Exception raised from operator() at ../aten/src/ATen/core/jit_type.h:1735 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7f26745c0c65 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7f26745bdcd1 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1494e24 (0x7f2663b14e24 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0xfed0bc (0x7f266366d0bc in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #4: c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x5a (0x7f266366d71a in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x7b (0x7f266366e06b in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x1493f32 (0x7f2663b13f32 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe227dd (0x7f26634a27dd in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x14e0a (0x7f268c934e0a in /lib64/ld-linux-x86-64.so.2)
..........................truncated.............
```

Test Plan:

```
python test/test_quantization.py
```

Pull Request resolved: pytorch#77037 Approved by: https://github.com/jerryzh168

### Summary:
This PR implements PTQ for APoT FakeQuant. It runs models (Resnet-18 pre-trained model, ImageNet dataset) to compare accuracy metrics for different qconfig settings of uniform vs. APoT quantized activation and weight.

According to the collected accuracy stats, model #2 (uniform activation and APoT weight) appears to have a slight improvement in accuracy compared to model #1 (uniform activation and uniform weight) for 8-bit and significant improvement for 4-bit (see "Accuracy Stats" section below).

### Test Plan:
Run models with: `python test/quantization/core/experimental/fx_graph_mode_apot.py`

### Accuracy Stats:
8-bit (Uniform int8, APoT b = 8 k = 2)

**Model #1:** Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.43% (Top-1), 85.62% (Top-5)

**Model #2:** Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.51% (Top-1), 85.78% (Top-5)

**Model #3:** APoT activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.32% (Top-1), 85.78% (Top-5)

4-bit (Uniform int4, APoT b = 4 k = 2)

**Model #1:** Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 45.63% (Top-1), 71.96% (Top-5)

**Model #2:** Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.24% (Top-1), 85.56% (Top-5)

**Model #3:** APoT activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 45.40% (Top-1), 76.21% (Top-5)

**Full Precision model (FX Graph Mode quantized)**
Evaluation accuracy on test dataset: 69.76% (Top-1), 89.08% (Top-5)

**Eager mode quantized model**
Evaluation accuracy on test dataset: 69.49% (Top-1), 88.90% (Top-5)

Pull Request resolved: pytorch#81040 Approved by: https://github.com/jerryzh168

Hi! I was playing with libfuzzer and found a bug when loading a model from a file via the `torch::jit::load` function. There is an unhandled exception in caffe2/serialize when calling `stoull` on an unsanitized version string.

The bug can be reproduced with the `aot_model_compiler` binary:

```
aot_model_compiler --model=crash-stoull --model_name=name --model_version=1 --input_dims='1,3,224,224;2,2' --input_types='float;float'
```

Crash file is provided in [crash.zip](https://github.com/pytorch/pytorch/files/8701504/crash.zip).

gdb output:

```
Temporary breakpoint 1, main (argc=6, argv=0x7ffcd160f9f8) at /pytorch_master/binaries/aot_model_compiler.cc:87
87	      "Run NNC AOT compiler for pytorch model. Example usage:\n"
(gdb) c
Continuing.
terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoull

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fa637f16859 in __GI_abort () at abort.c:79
#2  0x00007fa6381c1911 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fa6381cd38c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fa6381cd3f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fa6381cd6a9 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fa6381c42ce in std::__throw_invalid_argument(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x000000000247d567 in __gnu_cxx::__stoa<unsigned long long, unsigned long long, char, int> (__str=0x7ffcd160f228 "ZZ", __idx=0x0, __base=10, __convf=<optimized out>, __name=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ext/string_conversions.h:83
#8  std::__cxx11::stoull (__str="ZZ", __idx=0x0, __base=10) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/basic_string.h:6577
#9  caffe2::serialize::PyTorchStreamReader::init (this=this@entry=0x8c11ce0) at /pytorch_master/caffe2/serialize/inline_container.cc:145
#10 0x000000000247d9c7 in caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader (this=0x8c11ce0, in=std::shared_ptr<class caffe2::serialize::ReadAdapterInterface> (empty) = {...}) at /pytorch_master/caffe2/serialize/inline_container.cc:88
#11 0x00000000035b7ba4 in __gnu_cxx::new_allocator<caffe2::serialize::PyTorchStreamReader>::construct<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__p=0x2, __args=..., this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ext/new_allocator.h:150
#12 std::allocator_traits<std::allocator<caffe2::serialize::PyTorchStreamReader> >::construct<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__a=..., __p=0x2, __p@entry=0x8c11ce0, __args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/alloc_traits.h:512
#13 0x00000000035b1988 in std::_Sp_counted_ptr_inplace<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x8c11cd0, __a=..., __args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:551
#14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a8, __p=@0x7ffcd160f3a0: 0x10, __args=..., __a=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:683
#15 std::__shared_ptr<caffe2::serialize::PyTorchStreamReader, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a0, __args=..., __tag=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:1371
#16 std::shared_ptr<caffe2::serialize::PyTorchStreamReader>::shared_ptr<std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a0, __args=..., __tag=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:408
#17 std::allocate_shared<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__args=..., __a=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:859
#18 std::make_shared<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:875
#19 torch::jit::load (rai=std::shared_ptr<class caffe2::serialize::ReadAdapterInterface> (empty) = {...}, device=device@entry=..., Python Exception <class 'gdb.error'> No type named std::__detail::_Hash_node<struct std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, true>.: extra_files=std::unordered_map with 0 elements) at /pytorch_master/torch/csrc/jit/serialization/import.cpp:474
#20 0x00000000035b1ef6 in torch::jit::load (filename="crash-stoull", device=device@entry=..., Python Exception <class 'gdb.error'> No type named std::__detail::_Hash_node<struct std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, true>.: extra_files=std::unordered_map with 0 elements) at /pytorch_master/torch/csrc/jit/serialization/import.cpp:444
#21 0x00000000035b1d22 in torch::jit::load (filename="", device=device@entry=...) at /pytorch_master/torch/csrc/jit/serialization/import.cpp:424
#22 0x00000000008f9be3 in main (argc=1, argv=0x7ffcd160f9f8) at /pytorch_master/binaries/aot_model_compiler.cc:128
```

Pull Request resolved: pytorch#77557 Approved by: https://github.com/Gamrix
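The failure mode above is a numeric conversion applied to untrusted bytes (the trace shows the string `"ZZ"` reaching `stoull`). The sketch below illustrates the general remedy of validating before converting. It is written in Python for illustration only; `parse_version` is a hypothetical helper and does not reproduce the C++ change made in the PR.

```python
def parse_version(raw: bytes) -> int:
    """Parse a version record, rejecting anything that is not plain ASCII digits."""
    # Undecodable bytes become U+FFFD, which then fails the digit check below.
    text = raw.decode("ascii", errors="replace").strip()
    if not text.isdigit():
        # Fail with a descriptive, catchable error instead of letting a
        # low-level conversion error escape and abort the process.
        raise ValueError(f"invalid version string in archive: {text!r}")
    return int(text)
```

A caller loading an archive can catch `ValueError` and report a corrupt file to the user, for example when `parse_version(b"ZZ")` is rejected.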

### Summary:
This PR implements QAT for APoT FakeQuant. It runs QAT with FX graph mode quantized models (Resnet-18 pre-trained model, full ImageNet dataset) to compare accuracy metrics for different qconfig settings of uniform vs. APoT quantized activation and weight. It also refactors the APoT PTQ module `apot_fx_graph_mode_ptq.py` (previously `fx_graph_mode_apot.py`) such that shared helper functions between PTQ and QAT are in a separate file `quantization_util.py`.

Model #2 (uniformly quantized activation, APoT quantized weight) shows comparable accuracy compared to model #1 (uniformly quantized activation, uniformly quantized weight) for 8-bit and significant accuracy improvement for 4-bit (see "Accuracy Stats" section below).

### Test Plan:
Run QAT models with: `python test/quantization/core/experimental/apot_qat.py`
Run PTQ models with: `python test/quantization/core/experimental/apot_ptq.py`

### Accuracy Stats
8-bit (Uniform int8, APoT b = 8 k = 2)

Model #1: Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 69.67% (Top-1), 89.04% (Top-5)

Model #2: Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 69.72% (Top-1), 89.06% (Top-5)

4-bit (Uniform int4, APoT b = 4 k = 2)

Model #1: Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 46.85% (Top-1), 72.85% (Top-5)

Model #2: Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 66.45% (Top-1), 86.23% (Top-5)

Pull Request resolved: pytorch#83282 Approved by: https://github.com/jerryzh168

Natalia Gimelshein and others added 30 commits May 12, 2022 20:11

reenable filtered op tests (#77330)

14f84a8

Re-enable previously filtered op tests. Expecting lotsa failures. Should dtype also be wrapped in list? cc @mruberry, @suo Pull Request resolved: #77330 Approved by: https://github.com/suo

ci: Cleanup create_release.yml workflow

35af5f3

Removes azure_pipelines removal from create_release.yml Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: #77369 Approved by: https://github.com/suo, https://github.com/janeyx99

ci: Set create_release.yml to run on nightly

3e92cae

Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: #77370 Approved by: https://github.com/suo, https://github.com/janeyx99

Revert "Load state dict post hook"

d92b0a5

This reverts commit 56bed0d. Reverted #76823 on behalf of https://github.com/rohan-varma

Remove unnecessary ifdef, fixes fbcode build

4ef6407

Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #77376 Approved by: https://github.com/atalman, https://github.com/malfet

ci: switch trymerge to custom runner

b1214ba

Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: #77383 Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Approved by: https://github.com/malfet

exposing more CUDA driver API (#77296)

1a91975

Exposes `cuFuncSetAttribute` & `cuFuncGetAttribute` Used for runtime compilation by nvfuser Pull Request resolved: #77296 Approved by: https://github.com/davidberard98

ci: Add pr number to job name for trymerge

65f71c0

Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: #77390 Approved by: https://github.com/malfet

[ONNX] Add quantization support for maxpool (#77393)

1dd7336

Support quantization for maxpool exporting to ONNX. Pull Request resolved: #77393 Approved by: https://github.com/BowenBao

make scalar clamp overloads propagate nan (#77371)

5c7d916

Pull Request resolved: #77371 Approved by: https://github.com/mruberry
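
The commit title says the scalar clamp overloads now propagate NaN. A minimal pure-Python sketch of that semantics follows; it is not the ATen kernel, and both function names are hypothetical. It shows how a clamp built only from `min`/`max` can silently drop a NaN input, and how an explicit check keeps it.

```python
import math


def clamp_naive(x, lo, hi):
    # Comparisons with NaN are always False, so max(lo, nan) returns lo
    # and the NaN input is lost.
    return max(lo, min(x, hi))


def clamp_propagate_nan(x, lo, hi):
    # Return NaN inputs unchanged, then clamp as usual.
    if math.isnan(x):
        return x
    return max(lo, min(x, hi))


nan = float("nan")
print(clamp_naive(nan, 0.0, 1.0))          # NaN is replaced by the lower bound
print(clamp_propagate_nan(nan, 0.0, 1.0))  # NaN is preserved
```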

Add trace and log_sigmoid_forward decomps (#77329)

8626f76

Main question mark is that `log_sigmoid_forward` uses `acc_t` instead of `opmath_t` - not sure if we have a decorator today for that? Glad to add one if we don't. Pull Request resolved: #77329 Approved by: https://github.com/ezyang
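
For context, a log-sigmoid decomposition is usually written in a numerically stable form. The sketch below uses that common formula on Python floats; it is an illustration rather than the decomposition added in this commit, and it does not address the `acc_t` versus `opmath_t` question raised above.

```python
import math


def log_sigmoid(x):
    """Stable log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|))."""
    # The exponent is never positive, so exp() cannot overflow;
    # for very negative x the result is dominated by the min(x, 0) term.
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


print(log_sigmoid(0.0))      # equals -log(2)
print(log_sigmoid(-1000.0))  # stays finite where a naive log(1 / (1 + exp(-x))) would overflow
```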

[complex32] sum, prod : cuda only (disable jiterator reduction on win…

39bd37f

…dows) (#77192) Ref: #74537 Pull Request resolved: #77192 Approved by: https://github.com/anjali411

fix torchvhsion failed case test_classification_model on slow_conv2d

2b7943c

Pull Request resolved: #77347 Approved by: https://github.com/datumbox, https://github.com/frank-wei

[BE] Fix shadowed variable warning in `c10::TensorType::contiguousStr…

9fcf75e

…idesOf` (#77387) Summary: s/size/in_size/ in outer func Test Plan: CI Differential Revision: D36357483 Pull Request resolved: #77387 Approved by: https://github.com/seemethere, https://github.com/mehtanirav

fix StridesPolicy logic for FunctionalTensorWrapper

5762c7b

Pull Request resolved: #77358 Approved by: https://github.com/ezyang

Index reduction CUDA support

1141b45

Pull Request resolved: #76296 Approved by: https://github.com/cpuhrsch, https://github.com/ngimel

add simd horizantal reduce to improve log_softmax and softmax perform…

dcc255d

…ance on CPU Pull Request resolved: #73953 Approved by: https://github.com/frank-wei

extend replaceConvolutionWithAtenConv to handle conv_transpose3d (#76888)

e867831

Fixes #ISSUE_NUMBER Pull Request resolved: #76888 Approved by: https://github.com/eellison

ezyang and others added 22 commits May 18, 2022 18:25

Fix typo

befa4e3

Fixes #77412 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #77488 Approved by: https://github.com/mruberry

Revert "Revert "Implement sym_sizes to create proper IR for sym ints …

4941e72

…representing tensor sizes (#76836)"" This reverts commit c35bd8d. Pull Request resolved: #77719 Approved by: https://github.com/Chillee, https://github.com/malfet

Revert "Adds torch.cuda.is_current_stream_capturing (#77673)"

0d8a0f1

This reverts commit d03d43d. Reverted #77673 on behalf of https://github.com/suo

Support no-batch-dim for CrossEntropyLoss with prob target

8881d7a

Pull Request resolved: #77653 Approved by: https://github.com/albanD

Throw a nice error when SubTensor.__torch_dispatch__() returns the wr…

0794d59

…ong type for detach() Pull Request resolved: #77655 Approved by: https://github.com/albanD

Add USE_MPS option to cmake summary

1f7d243

Pull Request resolved: #77782 Approved by: https://github.com/albanD

improve mps note to describe the different functions available (#77767)

dcd2ba3

Fixing #77748 Pull Request resolved: #77767 Approved by: https://github.com/soulitzer

Support copy_ for Sparse Compressed tensors.

8b5f11c

Pull Request resolved: #77605 Approved by: https://github.com/cpuhrsch

Revert "Add sharding tests to multigpu-test.sh (#77708)"

5e0f559

This reverts commit a7cf95a. Reverted #77708 on behalf of https://github.com/suo

[GHF] Add URL for pending/failed mandatory checks (#77763)

d40a240

Makes debugging of failures like #76999 (comment) easier by posting links to check runs that have failed or are still pending. Pull Request resolved: #77763 Approved by: https://github.com/seemethere

Add testing workaround for EFA and TensorPipe (#77363)

dac3fba

This is a workaround for EFA for TensorPipe. It allows RPC-enabled tests to be run on AWS clusters. Pull Request resolved: #77363 Approved by: https://github.com/wanchaol

Update scale-config.yml (#77803)

0f328f3

more! Pull Request resolved: #77803 Approved by: https://github.com/seemethere

[shard] fix failed tests in sharded tensor

4124307

Pull Request resolved: #77800 Approved by: https://github.com/pritamdamania87, https://github.com/fduwjj

masked cumsum/cumprod

ea27244

Pull Request resolved: #77663 Approved by: https://github.com/cpuhrsch

[FSDP] Use post load_state_dict hooks (#76912)

4a57321

Rehash of #75426 now that a revised version of load_state_dict_post_hook has landed. Pull Request resolved: #76912 Approved by: https://github.com/awgu

square support

e3403ff

Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #77682 Approved by: https://github.com/ngimel, https://github.com/mruberry

MHA forward pass bug fix

f9db8b7

Pull Request resolved: #77761 Approved by: https://github.com/jbschlosser

miladm merged commit 3a2009d into miladm:master May 19, 2022


miladm commented May 19, 2022


74 participants
