mm checks and correct int dtypes if A is on the gpu #658
Conversation
GPU cluster tests are currently disabled on this Pull Request.
Codecov Report
@@ Coverage Diff @@
## master #658 +/- ##
==========================================
- Coverage 97.54% 97.41% -0.13%
==========================================
Files 87 87
Lines 18219 18331 +112
==========================================
+ Hits 17771 17857 +86
- Misses 448 474 +26
Continue to review full report at Codecov.
test this please
It works, but the results now depend on whether you compute on the GPU or the CPU. Do we have other places where we do similar things? The most visible difference is the returned data type (int64 vs. float64), but further down the road we are also losing 11 bits of precision for very large numbers. This may be a corner case, but I wanted to point it out before approving this.
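For reference, a minimal illustration of the precision concern (the specific values below are my own example, not from the PR): float64 carries a 53-bit significand while int64 has 63 magnitude bits, so very large integers cannot survive a round trip through float64.

```python
import torch

# float64 has a 53-bit significand, while int64 has 63 magnitude bits,
# so the high bits of very large integers are lost when the value takes
# a detour through float64.
x = torch.tensor(2**53 + 1, dtype=torch.int64)
y = x.to(torch.float64).to(torch.int64)
print(x.item())  # 9007199254740993
print(y.item())  # 9007199254740992 -- the last bit is gone
```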
As far as I know, this is the only place where we do a specific type change for GPUs. However, the operation required here, namely

r = a_block @ b_block
c[c_start0 : c_start0 + mB, c_start1 : c_start1 + nB] += r

still fails with the same error, and unfortunately I do not see a way around this. We can loop back to it once this operation is implemented in torch.
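A minimal sketch of the limitation being discussed, reusing the block names from the snippet above. Whether the integer matmul actually raises depends on the torch version and device, so treat the failure path as an assumption rather than a guaranteed reproduction:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a_block = torch.randint(0, 10, (4, 4), dtype=torch.int64, device=device)
b_block = torch.randint(0, 10, (4, 4), dtype=torch.int64, device=device)

try:
    r = a_block @ b_block  # integer mm may not be implemented on CUDA
except RuntimeError:
    # workaround discussed in this PR: compute in float64, then cast back
    r = (a_block.double() @ b_block.double()).to(torch.int64)
```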
The dtype of the returned array changes. As a user, I would expect it to be the same as the input.
Again, the operation we need is not implemented in torch, so this cannot be done a different way in this specific scenario. It could throw a warning, but that seems excessive.
Can you change the type back afterwards, before returning?
The precision is already lost even if the type is changed back. We can cast back at the end, but it requires yet another check in all 8 return locations.
Even if we do lose precision, we should be consistent with the types. For GPU, cast the output tensor back to int64. The case where you actually do an int64 mm on GPU is relatively rare to begin with anyway.
rerun tests
This now casts back to the initial promoted dtype for ints on GPUs.
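A minimal sketch of the approach as I read it (the helper name and the use of torch.promote_types are my own illustration, not heat's actual implementation): promote the two integer dtypes, compute the mm in float64, then cast the result back so CPU and GPU paths return the same dtype.

```python
import torch

def _gpu_int_mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: integer matmul on GPU via a float64 round trip."""
    needs_cast = (
        a.device.type == "cuda"
        and not a.dtype.is_floating_point
        and not b.dtype.is_floating_point
    )
    if not needs_cast:
        return a @ b
    out_dtype = torch.promote_types(a.dtype, b.dtype)  # e.g. int32 @ int64 -> int64
    return (a.to(torch.float64) @ b.to(torch.float64)).to(out_dtype)
```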
Looks good now from my point of view.
Tests have been added. I have a feeling that Codecov is having trouble just now. Therefore, I vote for ignoring the minute (0.01%) decrease in coverage.
Description
Fixes the bug by casting tensors to floats before the mm operation.
Issue/s resolved: #657
Changes proposed:
matmul: cast tensors to floats if the devices are GPUs and the dtypes are ints
Type of change
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
no