
Add correction parameter to std/var #50903


Closed · wants to merge 43 commits

Conversation

peterbell10
Collaborator

@peterbell10 peterbell10 commented Jan 21, 2021

Stack from ghstack:

First part of #50010. Also fixes #51127.

Added overloads for torch.{std, var, std_mean, var_mean} with a correction argument specifying the difference between the sample size and the number of degrees of freedom. For example, correction=1 is equivalent to Bessel's correction, which can also be obtained using the unbiased=True overload.
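
A minimal sketch of the intended usage (assuming the Python-level keyword is `correction=` as described above; the exact overload signature that ships may differ):

```python
import torch

x = torch.randn(100)

# correction is the difference between the sample size N and the degrees of
# freedom, i.e. the sum of squared deviations is divided by (N - correction).
v_bessel = torch.var(x, correction=1)  # Bessel's correction
v_biased = torch.var(x, correction=0)  # divide by N (population variance)

# These match the pre-existing unbiased= overload:
assert torch.allclose(v_bessel, torch.var(x, unbiased=True))
assert torch.allclose(v_biased, torch.var(x, unbiased=False))

# correction plays the same role as NumPy's ddof:
# numpy.var(x.numpy(), ddof=2)  ~  torch.var(x, correction=2)
```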

Differential Revision: D27911345

@peterbell10 peterbell10 requested a review from albanD as a code owner January 21, 2021 20:34
@facebook-github-bot facebook-github-bot added the cla signed and oncall: jit labels Jan 21, 2021
@facebook-github-bot
Contributor

facebook-github-bot commented Jan 21, 2021

💊 CI failures summary and remediations

As of commit 94a0fa4 (more details on the Dr. CI page):



🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/3)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

May 04 17:01:03 torch_xla/csrc/aten_xla_type_de...:22: error: no matching function for call to 'var'
May 04 17:01:03 /var/lib/jenkins/workspace/xla/torch_xla/csrc/aten_xla_type.h:987:21: note: candidate function not viable: requires 2 arguments, but 4 were provided
May 04 17:01:03   static at::Tensor std(const at::Tensor& self, bool unbiased);
May 04 17:01:03                     ^
May 04 17:01:03 In file included from torch_xla/csrc/init_python_bindings.cpp:35:
May 04 17:01:03 In file included from /var/lib/jenkins/workspace/torch/csrc/jit/python/pybind.h:11:
May 04 17:01:03 In file included from /var/lib/jenkins/workspace/torch/csrc/jit/python/pybind_utils.h:26:
May 04 17:01:03 In file included from /var/lib/jenkins/workspace/torch/csrc/utils/python_arg_parser.h:62:
May 04 17:01:03 /var/lib/jenkins/workspace/torch/csrc/utils/python_strings.h:80:19: warning: unused function 'PyObject_FastGetAttrString' [-Wunused-function]
May 04 17:01:03 static py::object PyObject_FastGetAttrString(PyObject *obj, char *name)
May 04 17:01:03                   ^
May 04 17:01:03 torch_xla/csrc/aten_xla_type_default.cpp:4707:22: error: no matching function for call to 'var'
May 04 17:01:03   auto var_out_tmp = AtenXlaType::var(self, dim, correction, keepdim);
May 04 17:01:03                      ^~~~~~~~~~~~~~~~
May 04 17:01:03 /var/lib/jenkins/workspace/xla/torch_xla/csrc/aten_xla_type.h:1114:21: note: candidate function not viable: no known conversion from 'c10::optional<at::IntArrayRef>' (aka 'optional<ArrayRef<long> >') to 'at::IntArrayRef' (aka 'ArrayRef<long>') for 2nd argument
May 04 17:01:03   static at::Tensor var(const at::Tensor& self, at::IntArrayRef dim,
May 04 17:01:03                     ^
May 04 17:01:03 /var/lib/jenkins/workspace/xla/torch_xla/csrc/aten_xla_type.h:1112:21: note: candidate function not viable: requires 2 arguments, but 4 were provided
May 04 17:01:03   static at::Tensor var(const at::Tensor& self, bool unbiased);
May 04 17:01:03                     ^
May 04 17:01:04 clang-9 -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/var/lib/jenkins/workspace/xla -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-tensorflow -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-bin -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-tensorflow/external/protobuf_archive/src -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-tensorflow/external/com_google_protobuf/src -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-tensorflow/external/eigen_archive -I/var/lib/jenkins/workspace/xla/third_party/tensorflow/bazel-tensorflow/external/com_google_absl -I/var/lib/jenkins/workspace -I/var/lib/jenkins/workspace/torch/csrc -I/var/lib/jenkins/workspace/torch/lib/tmp_install/include -I/opt/conda/lib/python3.6/site-packages/torch/include -I/opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.6/site-packages/torch/include/TH -I/opt/conda/lib/python3.6/site-packages/torch/include/THC -I/opt/conda/include/python3.6m -c torch_xla/csrc/layout_manager.cpp -o build/temp.linux-x86_64-3.6/torch_xla/csrc/layout_manager.o -std=c++14 -Wno-sign-compare -Wno-deprecated-declarations -Wno-return-type -Wno-macro-redefined -Wno-return-std-move -DNDEBUG -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_clang" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1002" -DTORCH_EXTENSION_NAME=_XLAC -D_GLIBCXX_USE_CXX11_ABI=1
May 04 17:01:12 2 errors generated.

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build (2/3)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

May 04 15:14:01 sccache: error: couldn't connect to server
May 04 15:14:01 +++ eval 'extract_trap_cmd '
May 04 15:14:01 ++++ extract_trap_cmd
May 04 15:14:01 ++++ printf '%s\n' ''
May 04 15:14:01 +++ printf '%s\n' cleanup
May 04 15:14:01 ++ trap -- '
May 04 15:14:01 cleanup' EXIT
May 04 15:14:01 ++ [[ pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build != *pytorch-win-* ]]
May 04 15:14:01 ++ which sccache
May 04 15:14:01 ++ sccache --stop-server
May 04 15:14:01 Stopping sccache server...
May 04 15:14:01 sccache: error: couldn't connect to server
May 04 15:14:01 sccache: caused by: Connection refused (os error 111)
May 04 15:14:01 ++ true
May 04 15:14:01 ++ rm /var/lib/jenkins/sccache_error.log
May 04 15:14:01 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
May 04 15:14:01 ++ true
May 04 15:14:01 ++ [[ -n '' ]]
May 04 15:14:01 ++ [[ pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build == *rocm* ]]
May 04 15:14:01 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
May 04 15:14:01 ++ SCCACHE_IDLE_TIMEOUT=1200
May 04 15:14:01 ++ RUST_LOG=sccache::server=error

See CircleCI build pytorch_libtorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build (3/3)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

May 04 15:14:02 sccache: error: couldn't connect to server
May 04 15:14:02 +++ eval 'extract_trap_cmd '
May 04 15:14:02 ++++ extract_trap_cmd
May 04 15:14:02 ++++ printf '%s\n' ''
May 04 15:14:02 +++ printf '%s\n' cleanup
May 04 15:14:02 ++ trap -- '
May 04 15:14:02 cleanup' EXIT
May 04 15:14:02 ++ [[ pytorch-libtorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build != *pytorch-win-* ]]
May 04 15:14:02 ++ which sccache
May 04 15:14:02 ++ sccache --stop-server
May 04 15:14:02 Stopping sccache server...
May 04 15:14:02 sccache: error: couldn't connect to server
May 04 15:14:02 sccache: caused by: Connection refused (os error 111)
May 04 15:14:02 ++ true
May 04 15:14:02 ++ rm /var/lib/jenkins/sccache_error.log
May 04 15:14:02 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
May 04 15:14:02 ++ true
May 04 15:14:02 ++ [[ -n '' ]]
May 04 15:14:02 ++ [[ pytorch-libtorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7-build == *rocm* ]]
May 04 15:14:02 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
May 04 15:14:02 ++ SCCACHE_IDLE_TIMEOUT=1200
May 04 15:14:02 ++ RUST_LOG=sccache::server=error

2 jobs timed out:

  • pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build
  • pytorch_libtorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build

❄️ 1 failure tentatively classified as flaky, but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_ios_12_0_0_x86_64_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun) ❄️

fatal: Could not read from remote repository.
remote: Compressing objects: ... (git clone progress output trimmed)
remote: Total 158 (delta 109), reused 15 (delta 4), pack-reused 0
Receiving objects: 100% (158/158), 210.27 KiB | 15.02 MiB/s, done.
Resolving deltas: 100% (109/109), completed with 97 local objects.
From ssh://github.com/google/googletest
 * branch            7aca84427f224eeed3144123d5230d5871e93347 -> FETCH_HEAD
Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347'
remote: Total 0 (delta 0), reused 0 (delta 0), pack-reused 0        
Received disconnect from 140.82.114.4 port 22:11: Bye Bye

Disconnected from 140.82.114.4 port 22

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Fetched in submodule path 'third_party/nccl/nccl', but it did not contain 033d799524fb97629af5ac2f609de367472b2696. Direct fetching of that commit failed.


Exited with code exit status 1



peterbell10 added a commit that referenced this pull request Jan 21, 2021
ghstack-source-id: f25f46a
Pull Request resolved: #50903
@peterbell10 peterbell10 requested a review from mruberry January 21, 2021 21:03
peterbell10 added a commit that referenced this pull request Jan 21, 2021
ghstack-source-id: f6a6e24
Pull Request resolved: #50903
@peterbell10 peterbell10 marked this pull request as draft January 21, 2021 22:39
peterbell10 added a commit that referenced this pull request Jan 22, 2021
ghstack-source-id: a93431c
Pull Request resolved: #50903
peterbell10 added a commit that referenced this pull request Jan 22, 2021
ghstack-source-id: 1ca5a15
Pull Request resolved: #50903
peterbell10 added a commit that referenced this pull request Jan 22, 2021
ghstack-source-id: 2dad5ea
Pull Request resolved: #50903
peterbell10 added a commit that referenced this pull request Jan 23, 2021
ghstack-source-id: d8eae9e
Pull Request resolved: #50903
@mruberry
Collaborator

Looks like a complex test is failing internally:

test_std_correction_vs_numpy_cpu_complex64 (test_reductions.TestReductionsCPU) (architecture: x86_64, buildmode: opt, buildsystem: buck, compiler: clang, sanitizer: none)


AssertionError: False is not true : Scalars failed to compare as equal! Comparing the real part nan and inf gives a difference of nan, but the allowed difference with rtol=1.3e-06 and atol=1e-05 is only inf!

torch/testing/_internal/common_device_type.py", line 292, in instantiated_test
    result = test_fn(self, *args)
test_reductions.py", line 2174, in test_std_correction_vs_numpy
    self.assertEqual(torch_res, numpy_res, exact_dtype=False)

Maybe the test is sensitive to different random values, a different version of NumPy, or a different compiler? We might need to make the test "easier" or simplify it. The complex double version of the test is failing with the same error.
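
For reference, the "allowed difference" in that message is the usual rtol/atol closeness bound; a minimal sketch of how it is computed (this mirrors numpy.isclose-style comparisons, not code from this PR or the exact test-framework formula):

```python
def allowed_diff(expected: float, rtol: float = 1.3e-6, atol: float = 1e-5) -> float:
    # |actual - expected| must not exceed atol + rtol * |expected|.
    return atol + rtol * abs(expected)

# In the failure above the expected real part is inf, so the bound itself is
# inf, but |nan - inf| is nan and nan never compares <= inf -- hence the
# "difference of nan ... allowed difference ... is only inf" message.
print(allowed_diff(float("inf")))  # inf
```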

@peterbell10
Collaborator Author

Maybe the test is sensitive to different random values, a different version of NumPy, or a different compiler? We might need to make the test "easier" or simplify it. The complex double version of the test is failing with the same error.

@mruberry the test is sensitive to machine precision, but only when the variance is exactly zero (exactly zero gives nan, while a finite but small value gives inf). However, zero variance seems very unlikely on random data, so that's still a bit surprising and might be worth looking into.

For now, I've changed the test to treat nan and inf as equal.
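
A minimal illustration of that zero-variance corner case (hypothetical inputs, and assuming the correction overload behaves as described in this PR; any degrees-of-freedom warning is omitted here):

```python
import torch

# With correction equal to the sample size N, the denominator N - correction
# is zero, so the result is sum_of_squared_deviations / 0:
x_const = torch.tensor([2.0, 2.0, 2.0])         # zero variance -> 0/0   -> nan
x_close = torch.tensor([2.0, 2.0, 2.0 + 1e-6])  # tiny variance -> pos/0 -> inf

print(torch.var(x_const, correction=3))  # nan
print(torch.var(x_close, correction=3))  # inf
```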

@mruberry
Collaborator

mruberry commented Apr 22, 2021

Unfortunately the same tests are failing:

AssertionError: False is not true : Scalars failed to compare as equal! Comparing the imaginary part nan and 0.0 gives a difference of nan, but the allowed difference with rtol=1.3e-06 and atol=1e-05 is only 1e-05!

If it's a pain to sort out what's going on, I could try to get someone with access to the internal test harness to take a look?

@peterbell10
Collaborator Author

@mruberry I've loosened the restrictions on the imaginary component this time, allowing it to be a non-finite value if the real component also isn't finite. If that doesn't work, I really need to know which arguments to std are failing, including the tensor shape and contents.
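
For illustration, a hypothetical sketch of the relaxed check described above (names and structure are invented here, not the actual test code):

```python
import math

def ignore_imag_mismatch(actual: complex, expected: complex) -> bool:
    # A non-finite imaginary part is only tolerated when the real part is
    # also non-finite (e.g. nan/inf arising from a zero-variance input).
    real_nonfinite = not (math.isfinite(actual.real) and math.isfinite(expected.real))
    imag_nonfinite = not (math.isfinite(actual.imag) and math.isfinite(expected.imag))
    return imag_nonfinite and real_nonfinite
```

Under this relaxation, comparing complex(inf, nan) against complex(inf, 0.0) would no longer fail, while complex(1.0, nan) against complex(1.0, 0.0) still would.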

@mruberry
Collaborator

@mruberry I've loosened the restrictions on the imaginary component this time, allowing it to be a non-finite value if the real component also isn't finite. If that doesn't work, I really need to know which arguments to std are failing, including the tensor shape and contents.

Sounds good; let's see what the internal tests say.

@mruberry
Collaborator

Good news! Internal tests are now passing. I'll get the land process started tomorrow morning and ping @JackCaoG with updates

@mruberry
Collaborator

Darn, looks like the PR CI failures for fx are real:

test_normalize_operator_exhaustive_std_cpu_float32
test_normalize_operator_exhaustive_var_cpu_float32

Probably just a tweak to that test is needed?

@peterbell10
Collaborator Author

@mruberry since the fx PR got reverted, I've removed it from this stack and instead just added a skip in the test for std and var. That should unblock this PR.

peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Apr 29, 2021
ghstack-source-id: 49aef26
Pull Request resolved: pytorch#50903
@mruberry
Collaborator

Tests look good. @Chillee's fix may also have resolved the fx issue, but we don't need to worry about unskipping it for this PR. Internal tests look good. I'll check in with @JackCaoG during business hours to validate we're OK to land this, and we'll try to merge it today.

@mruberry
Collaborator

We'll have to try again on Monday, sorry @peterbell10. Some landing issues let this accumulate another merge conflict. Let's rebase it Monday morning and I'll start the land process while no one's committing.

peterbell10 added a commit to peterbell10/pytorch that referenced this pull request May 4, 2021
ghstack-source-id: 49aef26
Pull Request resolved: pytorch#50903
@peterbell10
Collaborator Author

@mruberry I've rebased on viable/strict and fixed merge conflicts.

@mruberry
Collaborator

mruberry commented May 6, 2021

Internal tests look good; unfortunately I lost power this morning, but I'll try to coordinate this with @JackCaoG tomorrow morning

@facebook-github-bot
Contributor

@mruberry merged this pull request in 2043093.

@facebook-github-bot facebook-github-bot deleted the gh/peterbell10/47/head branch May 11, 2021 14:17
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Pull Request resolved: pytorch#50903

First part of pytorch#50010. Also fixes pytorch#51127.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27911345

Pulled By: mruberry

fbshipit-source-id: 7138fddc935802918ab9ff19f4bc1b9f4d745d41
jasperzhong pushed a commit to jasperzhong/swift that referenced this pull request Nov 25, 2021
ghstack-source-id: 98ac0e1
Pull Request resolved: pytorch/pytorch#50903
Labels
cla signed · Merged · oncall: jit · open source