The matrix multiplication operator can't get correct results on 3090 !! #61890
Labels
module: cuda
Related to torch.cuda, and CUDA support in general
module: tf32
Related to tf32 data format
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Bug
The matrix multiplication operator returns incorrect results on a 3090!
To Reproduce
mini code sample:
output:
Expected behavior
Because cur_mat is an identity matrix, the output should be unchanged.
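The `module: tf32` label on this issue points at one possible explanation: on Ampere GPUs such as the 3090, float32 matrix multiplications may run in TF32 mode, which keeps only 10 explicit mantissa bits of each input. Under that assumption, even a product with an identity matrix would not reproduce the input exactly. The sketch below is not the original repro; it is a pure-Python illustration in which `to_tf32` and `matmul` are hypothetical helpers, the round-to-nearest-even mode is an assumption (the hardware may round differently), and accumulation happens in Python floats rather than FP32.

```python
import struct


def to_tf32(x: float) -> float:
    """Round a finite value to float32, then to TF32 precision.

    TF32 keeps float32's 8 exponent bits but only 10 explicit mantissa
    bits, so the low 13 mantissa bits are rounded away here.
    Assumption: round-to-nearest-even; real hardware may differ.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 13) & 1  # lowest mantissa bit that survives
    bits = (bits + 0x0FFF + lsb) & 0xFFFFE000  # round, then clear 13 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def matmul(a, b, convert=lambda v: v):
    """Naive matrix product; `convert` is applied to every input element."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(convert(a[i][k]) * convert(b[k][j]) for k in range(inner))
         for j in range(cols)]
        for i in range(rows)
    ]


# Illustrative values only, not the data from this issue.
x = [[0.1, 1.0], [3.14159265, 1000.001]]
identity = [[1.0, 0.0], [0.0, 1.0]]

exact = matmul(x, identity)            # full precision: equals x
approx = matmul(x, identity, to_tf32)  # inputs rounded to TF32 first

# Largest relative deviation introduced by the TF32 rounding of the inputs.
max_rel_err = max(
    abs(approx[i][j] - x[i][j]) / abs(x[i][j])
    for i in range(len(x))
    for j in range(len(x[0]))
)

print("exact == x:", exact == x)
print("approx == x:", approx == x)
print("max relative error:", max_rel_err)
```

If TF32 is the cause here, setting `torch.backends.cuda.matmul.allow_tf32 = False` before the multiplication would be one way to check, since that flag controls whether PyTorch uses TF32 for float32 matmuls on CUDA.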
Environment
Installation method (conda, pip, source): pip
Additional context
The problem described above occurs only on the 3090.
Testing shows that PyTorch version '1.7.1+cu110' returns the right result for the sample above. However, when given a tensor with a large batch size, the bmm operator is also unable to return the right result.
code sample:
Test data: tt_dict.pkl
output:
cc @ngimel @zasdfgbnm @ptrblck