Add stack emptiness checks inside interpreter.cpp #94298
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94298
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 535cc01.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
interpreter.cpp is sort of a hot path; can we guard these with #ifndef NDEBUG? You can see other examples in this file.
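For concreteness, here is a minimal, self-contained sketch of the guard pattern being suggested; it is illustrative only and does not reproduce the actual interpreter.cpp code (the Stack alias and pop_checked helper are stand-ins):

```cpp
// Sketch: compile the new stack-emptiness check only into debug builds.
// NDEBUG is defined for release builds, so the check disappears there.
#include <stdexcept>
#include <vector>

using Stack = std::vector<int>; // stand-in for the interpreter's std::vector<IValue>

int pop_checked(Stack& stack) {
#ifndef NDEBUG
  if (stack.empty()) {
    throw std::runtime_error("pop from an empty interpreter stack");
  }
#endif
  int top = stack.back(); // in release builds this stays unchecked, as before
  stack.pop_back();
  return top;
}

int main() {
  Stack stack{1, 2, 3};
  return pop_checked(stack) == 3 ? 0 : 1;
}
```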
Also, if possible it would be great if you could run some simple benchmarks (before/after this change) and add the results to the PR. Specify whether your PyTorch build is a debug build or not (debug builds can be turned on/off with the DEBUG=1/0 environment variable; the default is off).
An example test could be something like this (untested code, this is the general idea):
import torch
from time import time

x = torch.rand(1)

def fn(x):
    for i in range(10000):
        x = x + i
    return x

fn_t = torch.jit.trace(fn, x)

# time a single call of the traced function
begin = time()
fn_t(x)
finish = time()
print(f"{(finish - begin) * 1000} ms")  # time() returns seconds, so convert to ms
Force-pushed from 9800aac to 061b4b7.
Given this exceptionally coarse benchmark, which essentially executes the following:

import torch

graph = torch.parse_ir(
    """
graph(%x: Tensor):
  %type: int = prim::Constant[value=1]()
{}
  %ret: float[] = prim::tolist(%x, %dim, %type)
  return (%ret)
""".format(
        " %dim: int = aten::dim(%x)\n" * 100_000
    )
)

x = torch.randn(4)
res = torch._C._jit_interpret_graph(graph, (x,))
print(res)

I got the following results for the Debug (with checks) and Debug (without checks) configurations (timing tables not reproduced here).
Based on these results, I conclude that there is no perceptible slowdown*. The measurement error is greater than the time difference between runs.
Furthermore, I think it may also be a good idea to add
Thanks for the tests! Can you rebase onto the viable/strict branch so we can make sure the current test failures are all unrelated? You can also comment
@pytorchbot rebase -s
@pytorchbot successfully started a rebase job. Check the current status here.
Successfully rebased.
Force-pushed from 061b4b7 to 708db1b.
@pytorchbot merge
This PR needs to be approved by an authorized maintainer before merge.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 jobs have failed; the first few of them are: trunk / macos-12-py3-arm64 / test (default, 1, 2, macos-m1-12), trunk / macos-12-py3-arm64 / test (default, 2, 2, macos-m1-12). Details for Dev Infra team: raised by workflow job.
@pytorchbot rebase -s
@pytorchbot successfully started a rebase job. Check the current status here.
Successfully rebased.
Force-pushed from 708db1b to 535cc01.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hi!
I've been fuzzing different pytorch modules, and found a few crashes inside one of them.
Specifically, I'm talking about the module that interprets the JIT code and a function called InterpreterState::run(). Running this function with the provided crash file results in a crash, which occurs while calling dim() on a stack with 0 elements (line 686). The crash itself occurs later, when std::move is called with an incorrect value of type IValue.
The second crash is similar and occurs on line 328, where reg(inst.X + i - 1) = pop(stack); is executed. The error here is the same: Stack stack might not contain enough elements.
The third crash occurs on line 681. The problem here is the same as for the previous crashes: there are not enough elements in the stack.
In addition to these places, there are many others (in the same function) where bounds checking is also missing. I am not sure what the best way to fix these problems is; however, I suggest adding a bounds check inside each of these case statements.
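As an illustration of the kind of per-opcode check I mean, here is a minimal sketch; the names (Stack, require_stack_size, handle_store) are stand-ins and not the actual interpreter.cpp identifiers:

```cpp
// Sketch: verify that the stack holds enough operands before a case handler
// pops them, so a malformed program fails with an error instead of reading
// past the end of the vector.
#include <cstddef>
#include <stdexcept>
#include <vector>

using Stack = std::vector<int>; // stand-in for std::vector<IValue>

void require_stack_size(const Stack& stack, std::size_t needed) {
  if (stack.size() < needed) {
    throw std::runtime_error("interpreter stack underflow");
  }
}

// A handler that, like `reg(inst.X + i - 1) = pop(stack);` in a loop, pops n values.
void handle_store(Stack& stack, std::size_t n) {
  require_stack_size(stack, n); // boundary check before any element is popped
  for (std::size_t i = 0; i < n; ++i) {
    stack.pop_back();
  }
}

int main() {
  Stack stack{1, 2};
  handle_store(stack, 2);      // ok: exactly enough elements
  // handle_store(stack, 1);   // would throw: the stack is now empty
  return 0;
}
```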
All tests were performed on this pytorch version: abc54f93145830b502400faa92bec86e05422fbd
How to reproduce
To reproduce the crash, use the provided Dockerfile.
Build the container:
docker build -t oss-sydr-fuzz-pytorch-reproduce .
Copy these crash files to the current directory:
Run the container:
docker run --privileged --network host -v `pwd`:/homedir --rm -it oss-sydr-fuzz-pytorch-reproduce /bin/bash
And execute the binary:
/jit_differential_fuzz /homedir/crash-4f18c5128c9a5a94343fcbbd543d7d6b02964471
After execution completes you will see this stacktrace: