
mm checks and correct int dtypes if A is on the gpu #658

Merged: 11 commits merged into master from bug/657-mm-int-gpu on Dec 4, 2020

Conversation

@coquelin77 (Member) commented Aug 25, 2020

Description

Fixes the bug by casting integer tensors to floats before the mm operation on GPUs.

Issue/s resolved: #657

Changes proposed:

  • for matmul: cast tensors to floats if the devices are GPUs and the dtypes are ints (see the sketch below)
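
A minimal sketch of the idea (the helper name and the exact casting rule are illustrative assumptions, not the actual code in heat/core/linalg/basics.py):

import torch

def _mm_int_gpu_workaround(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # torch (at the time of this PR) had no integer matmul kernel on CUDA,
    # so integer operands are cast to a floating dtype before the local mm
    if a.is_cuda and not a.is_floating_point():
        a = a.double() if a.dtype == torch.int64 else a.float()
    if b.is_cuda and not b.is_floating_point():
        b = b.double() if b.dtype == torch.int64 else b.float()
    return a @ b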

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

@mtar (Collaborator) commented Aug 25, 2020

GPU cluster tests are currently disabled on this Pull Request.

@codecov (bot) commented Aug 25, 2020

Codecov Report

Merging #658 (48004d8) into master (0c9c763) will decrease coverage by 0.12%.
The diff coverage is 79.19%.


@@            Coverage Diff             @@
##           master     #658      +/-   ##
==========================================
- Coverage   97.54%   97.41%   -0.13%     
==========================================
  Files          87       87              
  Lines       18219    18331     +112     
==========================================
+ Hits        17771    17857      +86     
- Misses        448      474      +26     
Impacted Files                           Coverage Δ
heat/core/linalg/basics.py               90.14% <62.65%> (-4.12%) ⬇️
heat/core/linalg/tests/test_basics.py    100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mtar (Collaborator) commented Aug 25, 2020

test this please

@bhagemeier (Member) commented

It works, but the results are now dependent on whether you compute on the GPU or CPU. Do we have other places where we do similar stuff? The actual difference at first sight is the difference in the data type returned (int64 vs. float64). Further down the road, we're losing 11 bits of precision for very large numbers. This may be a corner case, but I wanted to point it out before approving this.

>>> a = ht.array([[0, 1], [2, 3]])
>>> b = ht.array([[4, 5], [6, 7]])
>>> ht.dot(a, b)
DNDarray([[ 6,  7],
          [26, 31]], dtype=ht.int64, device=cpu:0, split=None)
>>> a.gpu()
DNDarray([[0, 1],
          [2, 3]], dtype=ht.int64, device=gpu:0, split=None)
>>> b.gpu()
DNDarray([[4, 5],
          [6, 7]], dtype=ht.int64, device=gpu:0, split=None)
>>> ht.dot(a, b)
DNDarray([[ 6.,  7.],
          [26., 31.]], dtype=ht.float64, device=gpu:0, split=None)

@coquelin77 (Member, author) commented

As far as I know, this is the only place where we do a specific type change for GPUs. However, the torch function required for this operation, namely addmm_cuda, is not implemented for integer types, so we are forced to work around it. Even if we were to do this:

r = a_block @ b_block  # multiply the local blocks
c[c_start0 : c_start0 + mB, c_start1 : c_start1 + nB] += r  # accumulate into the output slice

It still fails with the same error; unfortunately, I do not see a way around this. We can loop back to it once this function is implemented in torch.
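
For reference, a minimal torch-only repro of the underlying failure (assuming a CUDA device is available; the error text is from torch versions current at the time and may differ in newer releases):

>>> import torch
>>> a = torch.arange(4, dtype=torch.int64, device="cuda").reshape(2, 2)
>>> a @ a
RuntimeError: "addmm_cuda" not implemented for 'Long'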

@mtar (Collaborator) commented Sep 1, 2020

The dtype of the returned array changes. As a user, I would expect it to be the same as the input.

@coquelin77 (Member, author) commented

Again, the operation we need is not implemented in torch, so this cannot be done a different way in this specific scenario. It could throw a warning, but that seems excessive.

@mtar (Collaborator) commented Sep 1, 2020

Can you change the type afterwards before returning?

@coquelin77 (Member, author) commented

The precision is already lost even if the type is changed back. We can cast back at the end, but that requires yet another check in all 8 return locations.
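
To make the loss concrete: float64 carries a 53-bit mantissa, so int64 values above 2**53 do not survive the round trip (a small CPU example; only the dtype conversion matters here):

>>> import torch
>>> x = torch.tensor([2**53 + 1])  # 9007199254740993, defaults to int64
>>> x.double().long()              # rounds to the nearest representable float64
tensor([9007199254740992])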

@Markus-Goetz (Member) commented

Even if we do lose precision, we should be consistent on the types. For GPU, cast the output tensor back to int64. The case where you actually do an int64 mm on GPU is relatively seldom to begin with anyway.

@mtar (Collaborator) commented Sep 22, 2020

rerun tests

@coquelin77 (Member, author) commented

This now casts back to the initial promoted dtype for ints on GPUs.
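
A sketch of the resulting pattern (the helper name is illustrative; in the PR the cast-back happens at the actual return locations in heat/core/linalg/basics.py):

import torch

def _mm_cast_back(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # the dtype the caller expects, per normal type promotion
    out_dtype = torch.promote_types(a.dtype, b.dtype)
    if a.is_cuda and not out_dtype.is_floating_point:
        # compute in float (no integer mm kernel on CUDA), then cast back
        return (a.double() @ b.double()).to(out_dtype)
    return a @ b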

@bhagemeier (Member) previously approved these changes Dec 4, 2020:

Looks good now from my point of view.

@bhagemeier (Member) left a comment:

Tests have been added. I have a feeling that codecov is having trouble just now. Therefore, I vote for ignoring the minute (0.01%) decrease in coverage.

@bhagemeier bhagemeier merged commit 6a10e94 into master Dec 4, 2020
@bhagemeier bhagemeier deleted the bug/657-mm-int-gpu branch December 8, 2021 11:10
Development

Successfully merging this pull request may close these issues: matmul fails with ints on GPU (#657).