tensor.new() can have nan's, and some pytorch code is thus unsafe #1347
Comments
Inside the BLAS side of things, we explicitly check for the special case of `alpha=0` and zero the output. If you see that it isn't true for a particular BLAS call, let me know and I'll fix it, but for `mm` and `mv` it should already be fixed.
I see. I had the issue with
For instance, undoing the zeroing of the new tensors in the four functions mentioned in the opening post causes tests to fail in #1306. The tests pass on one of the Python versions, and on my machine a different test suite fails (which is where I hunted down the root cause). Now, it seems very possible that I introduced this vulnerability somehow, because this doesn't seem to happen without my PR. I'll look into it.
For `ger`, if `beta=0`, tensors are explicitly zeroed on the CPU and GPU sides:
OK, I confirmed that this is not a problem without my PR (I manually inserted `nan`'s into
`tensor.new()` does not initialize memory, so it could end up containing `nan`'s. This could be unsafe in some cases. For example, in `torch/autograd/variable.py` there is a call that is dangerous because `output` could contain a `nan`, and even though `alpha` is being set to `0`, `nan * 0 = nan` and the result could contain a `nan` (I had an optim test failing because of a `nan` originating this way). I haven't done an exhaustive search, so there may be other places in the code that could have this issue.
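For reference, the `nan * 0 = nan` behavior this report hinges on is plain IEEE-754 arithmetic and can be reproduced without torch; the safe pattern is to branch on `alpha == 0` instead of relying on the multiply:

```python
import math

nan = float('nan')

# Under IEEE-754, nan * 0 is nan, so scaling uninitialized
# (possibly-nan) memory by alpha = 0 does not zero it out.
print(math.isnan(nan * 0.0))  # True

# The safe pattern: skip the multiply entirely when alpha == 0.
alpha = 0.0
value = 0.0 if alpha == 0 else alpha * nan
print(math.isnan(value))  # False
```

This is why the BLAS wrappers discussed in the comments need an explicit zeroing branch for `alpha=0`/`beta=0` rather than trusting `x * 0` to clean the buffer.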