
Add stack emptiness checks inside interpreter.cpp #94298

Closed

Conversation

m4drat
Contributor

@m4drat m4drat commented Feb 7, 2023

Hi!

I've been fuzzing different PyTorch modules and found a few crashes inside one of them.

Specifically, I'm talking about the module that interprets JIT code, and the function InterpreterState::run(). Running this function on the provided crash file results in a crash: dim() is called on a stack with 0 elements (line 686), and the crash itself surfaces later, when std::move is called on an invalid IValue.

The second crash is similar and occurs on line 328, where reg(inst.X + i - 1) = pop(stack); is executed. The error is the same: the Stack might not contain enough elements.

The third crash occurs on line 681. The problem is the same as in the previous crashes: there are not enough elements on the stack.

In addition to these places, there are many others (in the same function) where bounds checking is also missing. I am not sure what the best way to fix these problems is, but I suggest adding a boundary check inside each of these case statements; a sketch of the idea is below.
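
To make the suggestion concrete, here is a minimal standalone sketch of such a boundary check (checked_pop is a hypothetical helper of our own, not PyTorch's actual code; in the real interpreter the check would sit at the top of each case arm):

#include <cassert>
#include <cstdio>
#include <utility>
#include <vector>

// Guarded pop over a plain vector-backed stack, mirroring the shape of
// torch::jit::Stack (std::vector<c10::IValue>). The assert turns a silent
// out-of-bounds read into a loud failure in debug builds.
template <typename T>
T checked_pop(std::vector<T>& stack) {
  assert(!stack.empty() && "interpreter stack underflow");
  T top = std::move(stack.back());
  stack.pop_back();
  return top;
}

int main() {
  std::vector<int> stack{1, 2, 3};
  std::printf("%d\n", checked_pop(stack)); // prints 3
}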

All tests were performed on this PyTorch commit: abc54f93145830b502400faa92bec86e05422fbd

How to reproduce

  1. To reproduce the crash, use the provided Docker setup: Dockerfile

  2. Build the container: docker build -t oss-sydr-fuzz-pytorch-reproduce .

  3. Copy the provided crash files (e.g. crash-4f18c5128c9a5a94343fcbbd543d7d6b02964471) to the current directory.

  4. Run the container: docker run --privileged --network host -v `pwd`:/homedir --rm -it oss-sydr-fuzz-pytorch-reproduce /bin/bash

  5. And execute the binary: /jit_differential_fuzz /homedir/crash-4f18c5128c9a5a94343fcbbd543d7d6b02964471

After execution completes, you will see the following stack trace:

==36==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060001657f8 at pc 0x00000060bc91 bp 0x7fff00b33380 sp 0x7fff00b33378
READ of size 4 at 0x6060001657f8 thread T0
    #0 0x60bc90 in c10::IValue::IValue(c10::IValue&&) /pytorch_fuzz/torch/include/ATen/core/ivalue.h:214:43
    #1 0xc20e7cd in torch::jit::pop(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/aten/src/ATen/core/stack.h:102:12
    #2 0xc20e7cd in torch::jit::dim(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/mobile/promoted_prim_ops.cpp:119:20
    #3 0xc893060 in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:686:13
    #4 0xc85c47b in torch::jit::InterpreterStateImpl::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:1010:9
    #5 0x600598 in runGraph(std::shared_ptr<torch::jit::Graph>, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) /jit_differential_fuzz.cc:66:38
    #6 0x601d99 in LLVMFuzzerTestOneInput /jit_differential_fuzz.cc:107:25
    #7 0x52ccf1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #8 0x516c0c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #9 0x51c95b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #10 0x545ef2 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #11 0x7f9ec069a082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #12 0x51152d in _start (/jit_differential_fuzz+0x51152d)

0x6060001657f8 is located 8 bytes to the left of 64-byte region [0x606000165800,0x606000165840)
allocated by thread T0 here:
    #0 0x5fd42d in operator new(unsigned long) /llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
    #1 0xa16ab5 in void std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_realloc_insert<c10::IValue&>(__gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue, std::allocator<c10::IValue> > >, c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:440:33
    #2 0xa168f1 in c10::IValue& std::vector<c10::IValue, std::allocator<c10::IValue> >::emplace_back<c10::IValue&>(c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:121:4
    #3 0xc89b53c in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:344:19
    #4 0xc85c47b in torch::jit::InterpreterStateImpl::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:1010:9
    #5 0x600598 in runGraph(std::shared_ptr<torch::jit::Graph>, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) /jit_differential_fuzz.cc:66:38
    #6 0x601d99 in LLVMFuzzerTestOneInput /jit_differential_fuzz.cc:107:25
    #7 0x52ccf1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #8 0x516c0c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #9 0x51c95b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #10 0x545ef2 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #11 0x7f9ec069a082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)

SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch_fuzz/torch/include/ATen/core/ivalue.h:214:43 in c10::IValue::IValue(c10::IValue&&)
Shadow bytes around the buggy address:
  0x0c0c80024aa0: fd fd fd fd fd fd fd fa fa fa fa fa 00 00 00 00
  0x0c0c80024ab0: 00 00 00 fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c80024ac0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c80024ad0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c80024ae0: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 00
=>0x0c0c80024af0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa[fa]
  0x0c0c80024b00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
  0x0c0c80024b10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c80024b20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c80024b30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c80024b40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==36==ABORTING
Executing the remaining crash files gives similar crash reports.

@pytorch-bot

pytorch-bot bot commented Feb 7, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94298

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 535cc01:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: jit (release notes category) label Feb 7, 2023
@soulitzer soulitzer added the triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Feb 8, 2023
Contributor

@davidberard98 davidberard98 left a comment


interpreter.cpp is sort of a hot path; can we guard these with #ifndef NDEBUG? You can see other examples in this file.
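
A minimal sketch of that pattern (illustrative names, not interpreter.cpp's actual code; the check is compiled in only when NDEBUG is undefined, i.e. in debug builds):

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Debug-only depth check: present in debug builds, a no-op in release
// builds compiled with -DNDEBUG, so the hot path pays nothing.
void check_stack_depth(const std::vector<int>& stack, std::size_t needed) {
#ifndef NDEBUG
  if (stack.size() < needed) {
    std::fprintf(stderr, "stack underflow: need %zu, have %zu\n",
                 needed, stack.size());
    std::abort();
  }
#endif
}

int main() {
  std::vector<int> stack{42};
  check_stack_depth(stack, 1); // fine
  check_stack_depth(stack, 2); // aborts in debug builds, no-op in release
}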

also, if possible it would be great if you could run some simple benchmarks (before/after this change) and add the results to the PR. Specify whether your PyTorch build is a debug build or not (debug can be toggled with the DEBUG=1/0 environment variable; it defaults to off).

An example test could be something like this (untested code; this is the general idea):

import torch
from time import time

x = torch.rand(1)

def fn(x):
    for i in range(10000):
        x = x + i
    return x

fn_t = torch.jit.trace(fn, x)
fn_t(x)  # warm-up call so first-run JIT work doesn't skew the timing
begin = time()
fn_t(x)
finish = time()
print(f"{(finish - begin) * 1000:.3f} ms")  # time() returns seconds

@m4drat m4drat force-pushed the interpreter-stack-emptiness-checks branch from 9800aac to 061b4b7 on February 9, 2023 at 13:49
@m4drat
Contributor Author

m4drat commented Feb 9, 2023

Given this exceptionally coarse benchmark, which essentially executes the dim opcode 100,000 times:

import torch

graph = torch._C.parse_ir(
    """
graph(%x: Tensor):
  %type: int = prim::Constant[value=1]()
{}
  %ret: float[] = prim::tolist(%x, %dim, %type)
  return (%ret)
""".format(
        "  %dim: int = aten::dim(%x)\n" * 100_000
    )
)

x = torch.randn(4)
res = torch._C._jit_interpret_graph(graph, (x,))
print(res)

I got the following results:

Debug (with checks)

$ hyperfine -i "python3 ./bench.py"
Benchmark 1: python3 ./bench.py
  Time (mean ± σ):      9.508 s ±  0.087 s    [User: 9.305 s, System: 0.180 s]
  Range (min … max):    9.382 s …  9.655 s    10 runs

Debug (without checks)

$ hyperfine -i "python3 ./bench.py"
Benchmark 1: python3 ./bench.py
  Time (mean ± σ):      9.489 s ±  0.062 s    [User: 9.278 s, System: 0.190 s]
  Range (min … max):    9.365 s …  9.577 s    10 runs

Based on these results, I conclude that there is no perceptible slowdown*: the measurement error is larger than the time difference between the two runs.

*For debug builds. Adding the same checks to release builds could show different results. Also, the benchmark stress-tests a single code path, which limits how much it says about the big picture.

Furthermore, I think it may be a good idea to add TORCH_INTERNAL_ASSERT_DEBUG_ONLY to every potentially problematic case arm; a sketch follows.
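
A sketch of what that could look like for one opcode (our simplification, not the PR's exact diff; TORCH_INTERNAL_ASSERT_DEBUG_ONLY comes from c10/util/Exception.h and is a no-op in release builds, and compiling this requires a PyTorch source tree):

#include <ATen/core/stack.h>
#include <c10/util/Exception.h>

// Hypothetical guarded handler for the DIM opcode; the real dispatch
// lives in the switch inside InterpreterStateImpl::runImpl.
void run_dim_op(torch::jit::Stack& stack) {
  TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
      !stack.empty(), "DIM: expected at least one value on the stack");
  // ... original case-arm body (pop the tensor, push its dim) ...
}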

@davidberard98
Contributor

thanks for the tests! can you rebase onto the viable/strict branch so we can make sure the current test failures are unrelated? You can also comment @ pytorchbot rebase -s (with no space) for the bot to do the rebase for you, if you prefer.

@m4drat
Contributor Author

m4drat commented Feb 10, 2023

@pytorchbot rebase -s

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased interpreter-stack-emptiness-checks onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout interpreter-stack-emptiness-checks && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the interpreter-stack-emptiness-checks branch from 061b4b7 to 708db1b on February 10, 2023 at 11:07
@davidberard98
Contributor

@pytorchbot merge

@pytorch-bot

pytorch-bot bot commented Feb 12, 2023

This PR needs to be approved by an authorized maintainer before merge.

@davidberard98
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (trigger trunk jobs on your pull request) label Feb 12, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed, first few of them are: trunk / macos-12-py3-arm64 / test (default, 1, 2, macos-m1-12), trunk / macos-12-py3-arm64 / test (default, 2, 2, macos-m1-12)

Details for Dev Infra team: raised by workflow job.

@davidberard98
Contributor

@pytorchbot rebase -s

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased interpreter-stack-emptiness-checks onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout interpreter-stack-emptiness-checks && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the interpreter-stack-emptiness-checks branch from 708db1b to 535cc01 on February 13, 2023 at 17:28
@davidberard98
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
