Expand the coverage of test_addmm and test_addmm_sizes #43831


Closed
zasdfgbnm wants to merge 11 commits into master from test_addmm

Conversation

@zasdfgbnm (Collaborator) commented Aug 29, 2020

  • This test is very fast and very important, so it makes no sense to mark it as a slowTest
  • This test should also run on CUDA
  • This test should check alpha and beta support
  • This test should check out= support
  • The manual reference computation should build a Python list instead of using index_put, because the list approach is much faster (a sketch of this is shown below)
  • The precision for TF32 still needs to be fixed; that will be done in a future PR
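
A minimal, self-contained sketch (not the PR's exact code) of the list-based reference computation, including the alpha/beta and out= checks; the helper name addmm_reference and the shapes/tolerances here are illustrative:

    import torch

    def addmm_reference(M, m1, m2, alpha=1, beta=1):
        # Hypothetical helper: accumulate each output element in Python floats
        # and build a nested list, converting to a tensor once at the end.
        # This avoids the per-element index_put calls the old test used.
        n, k = m1.shape
        p = m2.shape[1]
        rows = []
        for i in range(n):
            row = []
            for j in range(p):
                acc = sum(m1[i, l].item() * m2[l, j].item() for l in range(k))
                row.append(beta * M[i, j].item() + alpha * acc)
            rows.append(row)
        return torch.tensor(rows, dtype=M.dtype, device=M.device)

    # Compare torch.addmm against the reference, including the out= variant.
    M = torch.randn(3, 4)
    m1, m2 = torch.randn(3, 5), torch.randn(5, 4)
    expected = addmm_reference(M, m1, m2, alpha=1.2, beta=0.8)
    res = torch.addmm(M, m1, m2, alpha=1.2, beta=0.8)
    out = torch.empty_like(res)
    torch.addmm(M, m1, m2, alpha=1.2, beta=0.8, out=out)
    assert torch.allclose(res, expected, atol=1e-4, rtol=0)
    assert torch.allclose(out, expected, atol=1e-4, rtol=0)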

@zasdfgbnm zasdfgbnm requested review from ngimel and mruberry August 29, 2020 08:32
Inline review thread on the diff:

    self.assertEqual(res1, res2, atol=prec, rtol=0)
    }[dtype]

    if False and dtype.is_complex:  # bug to be fixed in another PR
@zasdfgbnm (Collaborator, Author)

To be fixed in #43827


codecov bot commented Aug 29, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@4e4626a).
The diff coverage is n/a.


@@            Coverage Diff            @@
##             master   #43831   +/-   ##
=========================================
  Coverage          ?   40.16%           
=========================================
  Files             ?      378           
  Lines             ?    46728           
  Branches          ?        0           
=========================================
  Hits              ?    18767           
  Misses            ?    27961           
  Partials          ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e4626a...511266f.


dr-ci bot commented Aug 29, 2020

💊 CI failures summary and remediations

As of commit c9b002c (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed



@zasdfgbnm zasdfgbnm changed the title from "Expand the coverage of test_addmm" to "Expand the coverage of test_addmm and test_addmm_sizes" on Aug 29, 2020
Inline review thread on the diff:

    @dtypesIfCUDA(*torch.testing.get_all_complex_dtypes(), *torch.testing.get_all_fp_dtypes(include_bfloat16=False))
    @dtypes(*torch.testing.get_all_complex_dtypes(), *torch.testing.get_all_fp_dtypes())
    def test_addmm(self, device, dtype):
        prec = {
@ngimel (Collaborator) commented Aug 31, 2020

can you use @precisionOverride for this?

@zasdfgbnm (Collaborator, Author) commented Aug 31, 2020

Fixed. Note that the precision for bfloat16 is bumped to 0.6, because we are now accumulating in float32 instead of in bfloat16 scalars.

AssertionError: False is not true : Tensors failed to compare as equal! With rtol=0.016 and atol=0.1, found 2 element(s) (out of 250) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.53125 (8.375 vs. 7.84375), which occurred at index (6, 4).

@ngimel (Collaborator) left a comment

Thank you! Waiting for CI.
Not for this PR, but we should also test that with beta=0, NaNs and Infs in M are not propagated; right now I think we are only doing this for empty inputs.
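
Not part of this PR, but a minimal sketch of the follow-up test being suggested here (shapes are illustrative): with beta=0 the input matrix M is ignored, so NaN/Inf values placed in it must not leak into the result.

    import torch

    # Fill M with NaN (plus one Inf); with beta=0 these must be ignored rather
    # than multiplied through, so the result should equal m1 @ m2 and contain
    # no non-finite values.
    M = torch.full((3, 4), float('nan'))
    M[0, 0] = float('inf')
    m1, m2 = torch.randn(3, 5), torch.randn(5, 4)
    res = torch.addmm(M, m1, m2, beta=0)
    assert torch.isfinite(res).all()
    assert torch.allclose(res, m1 @ m2)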

@facebook-github-bot (Contributor) left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel (Collaborator) commented Sep 1, 2020

@ailzhang do you know why xla does not inherit the precisionOverride here for bfloat16 (0.6)? Is it ok if we disable bfloat16 on xla?

Inline review thread on the diff:

    @precisionOverride({torch.double: 1e-8, torch.float: 1e-4, torch.bfloat16: 0.6,
                        torch.half: 1e-1, torch.cfloat: 1e-4, torch.cdouble: 1e-8})
    @dtypesIfCUDA(*torch.testing.get_all_complex_dtypes(), *torch.testing.get_all_fp_dtypes(include_bfloat16=False))
    @dtypes(*torch.testing.get_all_complex_dtypes(), *torch.testing.get_all_fp_dtypes())
Collaborator

can you please not include bfloat16 here, and include it only in @dtypesIfCPU so that bfloat16 is not run on XLA?
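
A sketch of the decorator arrangement being requested, assuming the test framework's @dtypesIfCPU decorator (the class scaffolding below is illustrative, not this PR's code): bfloat16 stays in the CPU-specific and precisionOverride lists, but is dropped from the generic @dtypes list that XLA picks up.

    import torch
    from torch.testing._internal.common_device_type import (
        dtypes, dtypesIfCPU, dtypesIfCUDA, precisionOverride,
        instantiate_device_type_tests)
    from torch.testing._internal.common_utils import TestCase, run_tests

    class TestAddmmExample(TestCase):
        @precisionOverride({torch.double: 1e-8, torch.float: 1e-4, torch.bfloat16: 0.6,
                            torch.half: 1e-1, torch.cfloat: 1e-4, torch.cdouble: 1e-8})
        @dtypesIfCUDA(*torch.testing.get_all_complex_dtypes(),
                      *torch.testing.get_all_fp_dtypes(include_bfloat16=False))
        # bfloat16 runs only through the CPU-specific list...
        @dtypesIfCPU(*torch.testing.get_all_complex_dtypes(),
                     *torch.testing.get_all_fp_dtypes(include_bfloat16=True))
        # ...while the generic list, which XLA inherits, omits it.
        @dtypes(*torch.testing.get_all_complex_dtypes(),
                *torch.testing.get_all_fp_dtypes(include_bfloat16=False))
        def test_addmm(self, device, dtype):
            ...

    instantiate_device_type_tests(TestAddmmExample, globals())

    if __name__ == '__main__':
        run_tests()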

@ailzhang (Contributor) commented Sep 1, 2020

cc @JackCaoG, who made some changes to the bfloat16 test skipping. Do you have an idea what the best workaround here is?

@JackCaoG (Collaborator) commented Sep 1, 2020

@ailzhang the change was to auto-skip all of the float16 tests; we still want to run bfloat16 tests. My guess is that pt/xla has its own precision override for each type and ignores pytorch's precision.

@ngimel (Collaborator) commented Sep 1, 2020

@JackCaoG @ailzhang so what do you guys suggest we do? Currently bfloat16 test is failing with

Tensors failed to compare as equal! With rtol=0.001 and atol=0.001, found 182 element(s) (out of 250) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.53125 (8.375 vs. 7.84375), which occurred at index (6, 4).

tbh, rtol 0.001 and atol 0.001 seem incredibly low for bfloat16 even under the best of circumstances; typically errors are much larger, and here the test definitely won't pass with such tolerances. Is there a way to override tolerances on the xla side?

@JackCaoG (Collaborator) commented Sep 1, 2020

(quoting @ngimel's comment above)

@ngimel Let me submit a PR on the pt/xla side to disable this test on our end. Ideally we should take the pytorch precision override if it is bigger than pt/xla's; I will investigate this a bit too.

@zasdfgbnm (Collaborator, Author)

Looks like GitHub automatically closed this by mistake? I will reopen and rebase.

@zasdfgbnm zasdfgbnm reopened this Sep 2, 2020
@facebook-github-bot (Contributor) left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@ngimel merged this pull request in bc45c47.

@zasdfgbnm zasdfgbnm deleted the test_addmm branch September 3, 2020 16:24
7 participants