tensor.new() can have nan's, and some pytorch code is thus unsafe #1347
Comments
Inside the BLAS side of things, we explicitly check for the special case of `alpha=0` and zero the output. If you see that it isn't true for a particular BLAS call, let me know and I'll fix it, but for `mm` and `mv` it should already be fixed.
I see. I had the issue with
For instance, undoing the zeroing of the new tensors in the four functions mentioned in the opening post causes tests to fail in #1306. The tests pass on one of the Python versions, and on my machine a different test suite fails (which is where I hunted down the root cause). Now, it seems very possible that I introduced this vulnerability somehow, because this doesn't seem to happen without my PR. I'll look into it.
For `ger`, if `beta=0`, tensors are explicitly zeroed on the CPU and GPU sides:
OK, I confirmed that this is not a problem without my PR (I manually inserted `nan`'s into
`tensor.new()` does not initialize memory, so it could end up containing `nan`'s. This could be unsafe in some cases. For example, in `torch/autograd/variable.py` there is a call that is dangerous because `output` could contain a `nan`, and even though `alpha` is being set to `0`, `nan * 0 = nan` and the result could contain a `nan` (I had an optim test failing because of a `nan` originating this way). I haven't done an exhaustive search, so there may be other places in the code that could have this issue.
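For reference, the `nan * 0 = nan` behavior this report hinges on is plain IEEE-754 arithmetic and can be reproduced without torch; the safe pattern is to branch on `alpha == 0` instead of relying on the multiply:

```python
import math

nan = float('nan')

# Under IEEE-754, nan * 0 is nan, so scaling uninitialized
# (possibly-nan) memory by alpha = 0 does not zero it out.
print(math.isnan(nan * 0.0))  # True

# The safe pattern: skip the multiply entirely when alpha == 0.
alpha = 0.0
value = 0.0 if alpha == 0 else alpha * nan
print(math.isnan(value))  # False
```

This is why the BLAS wrappers discussed in the comments need an explicit zeroing branch for `alpha=0`/`beta=0` rather than trusting `x * 0` to clean the buffer.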