
Add stack emptiness checks inside interpreter.cpp #94298

Closed

Conversation

m4drat
Contributor

@m4drat m4drat commented Feb 7, 2023

Hi!

I've been fuzzing different PyTorch modules and found a few crashes inside one of them.

Specifically, I'm talking about the module that interprets JIT code, and the function InterpreterState::run(). Running this function on the provided crash file results in a crash: dim() is called on a stack with 0 elements (line 686), and the crash itself surfaces later, when std::move is called on an invalid IValue.

The second crash is similar and occurs on line 328, where reg(inst.X + i - 1) = pop(stack); is executed. The error is the same: the Stack might not contain enough elements.

The third crash occurs on line 681. The problem is the same as in the previous crashes: there are not enough elements on the stack.

In addition to these places, there are many others (in the same function) where bounds checking is also missing. I am not sure what the best way to fix these problems is, but I suggest adding a boundary check inside each of these case statements; a sketch of the idea is below.
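
To make the suggestion concrete, here is a minimal standalone sketch of such a boundary check (checked_pop is a hypothetical helper of our own, not PyTorch's actual code; in the real interpreter the check would sit at the top of each case arm):

#include <cassert>
#include <cstdio>
#include <utility>
#include <vector>

// Guarded pop over a plain vector-backed stack, mirroring the shape of
// torch::jit::Stack (std::vector<c10::IValue>). The assert turns a silent
// out-of-bounds read into a loud failure in debug builds.
template <typename T>
T checked_pop(std::vector<T>& stack) {
  assert(!stack.empty() && "interpreter stack underflow");
  T top = std::move(stack.back());
  stack.pop_back();
  return top;
}

int main() {
  std::vector<int> stack{1, 2, 3};
  std::printf("%d\n", checked_pop(stack)); // prints 3
}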

All tests were performed on this PyTorch commit: abc54f93145830b502400faa92bec86e05422fbd

How to reproduce

  1. To reproduce the crash, use the provided Docker setup: Dockerfile

  2. Build the container: docker build -t oss-sydr-fuzz-pytorch-reproduce .

  3. Copy the provided crash files (e.g. crash-4f18c5128c9a5a94343fcbbd543d7d6b02964471) to the current directory.

  4. Run the container: docker run --privileged --network host -v `pwd`:/homedir --rm -it oss-sydr-fuzz-pytorch-reproduce /bin/bash

  5. And execute the binary: /jit_differential_fuzz /homedir/crash-4f18c5128c9a5a94343fcbbd543d7d6b02964471

After execution completes, you will see the following stack trace:

==36==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060001657f8 at pc 0x00000060bc91 bp 0x7fff00b33380 sp 0x7fff00b33378
READ of size 4 at 0x6060001657f8 thread T0
    #0 0x60bc90 in c10::IValue::IValue(c10::IValue&&) /pytorch_fuzz/torch/include/ATen/core/ivalue.h:214:43
    #1 0xc20e7cd in torch::jit::pop(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/aten/src/ATen/core/stack.h:102:12
    #2 0xc20e7cd in torch::jit::dim(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/mobile/promoted_prim_ops.cpp:119:20
    #3 0xc893060 in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:686:13
    #4 0xc85c47b in torch::jit::InterpreterStateImpl::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:1010:9
    #5 0x600598 in runGraph(std::shared_ptr<torch::jit::Graph>, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) /jit_differential_fuzz.cc:66:38
    #6 0x601d99 in LLVMFuzzerTestOneInput /jit_differential_fuzz.cc:107:25
    #7 0x52ccf1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #8 0x516c0c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #9 0x51c95b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #10 0x545ef2 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #11 0x7f9ec069a082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #12 0x51152d in _start (/jit_differential_fuzz+0x51152d)

0x6060001657f8 is located 8 bytes to the left of 64-byte region [0x606000165800,0x606000165840)
allocated by thread T0 here:
    #0 0x5fd42d in operator new(unsigned long) /llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
    #1 0xa16ab5 in void std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_realloc_insert<c10::IValue&>(__gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue, std::allocator<c10::IValue> > >, c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:440:33
    #2 0xa168f1 in c10::IValue& std::vector<c10::IValue, std::allocator<c10::IValue> >::emplace_back<c10::IValue&>(c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:121:4
    #3 0xc89b53c in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:344:19
    #4 0xc85c47b in torch::jit::InterpreterStateImpl::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/jit/runtime/interpreter.cpp:1010:9
    #5 0x600598 in runGraph(std::shared_ptr<torch::jit::Graph>, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) /jit_differential_fuzz.cc:66:38
    #6 0x601d99 in LLVMFuzzerTestOneInput /jit_differential_fuzz.cc:107:25
    #7 0x52ccf1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #8 0x516c0c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #9 0x51c95b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #10 0x545ef2 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #11 0x7f9ec069a082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)

SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch_fuzz/torch/include/ATen/core/ivalue.h:214:43 in c10::IValue::IValue(c10::IValue&&)
Shadow bytes around the buggy address:
  0x0c0c80024aa0: fd fd fd fd fd fd fd fa fa fa fa fa 00 00 00 00
  0x0c0c80024ab0: 00 00 00 fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c80024ac0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c80024ad0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c80024ae0: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 00
=>0x0c0c80024af0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa[fa]
  0x0c0c80024b00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
  0x0c0c80024b10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c80024b20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c80024b30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c80024b40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==36==ABORTING
Executing the remaining crash files gives similar crash reports.

@pytorch-bot

pytorch-bot bot commented Feb 7, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94298

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 535cc01:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: jit (release notes category) label Feb 7, 2023
@soulitzer soulitzer added the triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Feb 8, 2023
Contributor

@davidberard98 davidberard98 left a comment


interpreter.cpp is sort of a hot path; can we guard these with #ifndef NDEBUG? You can see other examples in this file.
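
A minimal sketch of that pattern (illustrative names, not interpreter.cpp's actual code; the check is compiled in only when NDEBUG is undefined, i.e. in debug builds):

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Debug-only depth check: present in debug builds, a no-op in release
// builds compiled with -DNDEBUG, so the hot path pays nothing.
void check_stack_depth(const std::vector<int>& stack, std::size_t needed) {
#ifndef NDEBUG
  if (stack.size() < needed) {
    std::fprintf(stderr, "stack underflow: need %zu, have %zu\n",
                 needed, stack.size());
    std::abort();
  }
#endif
}

int main() {
  std::vector<int> stack{42};
  check_stack_depth(stack, 1); // fine
  check_stack_depth(stack, 2); // aborts in debug builds, no-op in release
}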

also, if possible it would be great if you could run some simple benchmarks (before/after this change) and add the results to the PR. Specify whether your PyTorch build is a debug build or not (debug can be toggled with the DEBUG=1/0 environment variable; it defaults to off).

An example test could be something like this (untested code; this is the general idea):

import torch
from time import time

x = torch.rand(1)

def fn(x):
    for i in range(10000):
        x = x + i
    return x

fn_t = torch.jit.trace(fn, x)
fn_t(x)  # warm-up call so first-run JIT work doesn't skew the timing
begin = time()
fn_t(x)
finish = time()
print(f"{(finish - begin) * 1000:.3f} ms")  # time() returns seconds

@m4drat m4drat force-pushed the interpreter-stack-emptiness-checks branch from 9800aac to 061b4b7 on February 9, 2023 at 13:49
@m4drat
Contributor Author

m4drat commented Feb 9, 2023

Given this exceptionally coarse benchmark, which essentially executes the dim opcode 100,000 times:

import torch

graph = torch._C.parse_ir(
    """
graph(%x: Tensor):
  %type: int = prim::Constant[value=1]()
{}
  %ret: float[] = prim::tolist(%x, %dim, %type)
  return (%ret)
""".format(
        "  %dim: int = aten::dim(%x)\n" * 100_000
    )
)

x = torch.randn(4)
res = torch._C._jit_interpret_graph(graph, (x,))
print(res)

I got the following results:

Debug (with checks)

$ hyperfine -i "python3 ./bench.py"
Benchmark 1: python3 ./bench.py
  Time (mean ± σ):      9.508 s ±  0.087 s    [User: 9.305 s, System: 0.180 s]
  Range (min … max):    9.382 s …  9.655 s    10 runs

Debug (without checks)

$ hyperfine -i "python3 ./bench.py"
Benchmark 1: python3 ./bench.py
  Time (mean ± σ):      9.489 s ±  0.062 s    [User: 9.278 s, System: 0.190 s]
  Range (min … max):    9.365 s …  9.577 s    10 runs

Based on these results, I conclude that there is no perceptible slowdown*: the measurement error is larger than the time difference between the two runs.

*For debug builds. Adding the same checks to release builds could show different results. Also, the benchmark stress-tests a single code path, which limits how much it says about the big picture.

Furthermore, I think it may be a good idea to add TORCH_INTERNAL_ASSERT_DEBUG_ONLY to every potentially problematic case arm; a sketch follows.
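
A sketch of what that could look like for one opcode (our simplification, not the PR's exact diff; TORCH_INTERNAL_ASSERT_DEBUG_ONLY comes from c10/util/Exception.h and is a no-op in release builds, and compiling this requires a PyTorch source tree):

#include <ATen/core/stack.h>
#include <c10/util/Exception.h>

// Hypothetical guarded handler for the DIM opcode; the real dispatch
// lives in the switch inside InterpreterStateImpl::runImpl.
void run_dim_op(torch::jit::Stack& stack) {
  TORCH_INTERNAL_ASSERT_DEBUG_ONLY(
      !stack.empty(), "DIM: expected at least one value on the stack");
  // ... original case-arm body (pop the tensor, push its dim) ...
}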

@davidberard98
Contributor

thanks for the tests! can you rebase onto the viable/strict branch so we can make sure the current test failures are unrelated? You can also comment @ pytorchbot rebase -s (with no space) for the bot to do the rebase for you, if you prefer.

@m4drat
Contributor Author

m4drat commented Feb 10, 2023

@pytorchbot rebase -s

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased interpreter-stack-emptiness-checks onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout interpreter-stack-emptiness-checks && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the interpreter-stack-emptiness-checks branch from 061b4b7 to 708db1b on February 10, 2023 at 11:07
@davidberard98
Contributor

@pytorchbot merge

@pytorch-bot

pytorch-bot bot commented Feb 12, 2023

This PR needs to be approved by an authorized maintainer before merge.

@davidberard98
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (trigger trunk jobs on your pull request) label Feb 12, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed, first few of them are: trunk / macos-12-py3-arm64 / test (default, 1, 2, macos-m1-12), trunk / macos-12-py3-arm64 / test (default, 2, 2, macos-m1-12)

Details for Dev Infra team: raised by workflow job.

@davidberard98
Contributor

@pytorchbot rebase -s

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased interpreter-stack-emptiness-checks onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout interpreter-stack-emptiness-checks && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the interpreter-stack-emptiness-checks branch from 708db1b to 535cc01 on February 13, 2023 at 17:28
@davidberard98
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
