
Using torch.overwrite.tensor.contents to overwrite input argument fails at runtime #17316

Closed
aviator19941 opened this issue May 8, 2024 · 9 comments · Fixed by #17339
Labels
bug 🐞 Something isn't working integrations/pytorch PyTorch integration work

Comments

@aviator19941
Contributor

What happened?

This lowering arises when we use index_copy_ to update an input argument's values in place. When trying to run the vmfb produced by compiling index_copy_repro.mlir, I get this error:

EXEC @test_index_copy
iree/runtime/src/iree/hal/command_buffer_validation.c:363: INVALID_ARGUMENT; source and target ranges overlap within the same buffer; while invoking native function hal.command_buffer.copy_buffer; while calling import; 
[ 2]   native hal.command_buffer.copy_buffer:0 -
[ 1] bytecode module.test_index_copy$async:1142 ../batch_llama_v4.mlir:2:3
[ 0] bytecode module.test_index_copy:66 ../batch_llama_v4.mlir:2:3; invoking function 'test_index_copy'
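The in-place update being lowered can be sketched as follows. This is a hypothetical, scaled-down NumPy stand-in (8x2 instead of 8192x16x8x128) for the torch `index_copy_` pattern the issue describes, not the actual reproducer; the variable names are illustrative only:

```python
import numpy as np

# Stand-ins for the three function inputs from the repro:
#   cache   ~ the 8192x16x8x128xf32 input being mutated in place
#   indices ~ the 4xi64 input selecting which rows to overwrite
#   updates ~ the 4x16x8x128xf32 input providing new values
cache = np.zeros((8, 2), dtype=np.float32)
indices = np.array([1, 3, 4, 6], dtype=np.int64)
updates = np.ones((4, 2), dtype=np.float32)

# The index_copy_-style scatter: rows of `updates` are written into
# `cache` at the positions named by `indices`, mutating the input
# argument's storage directly.
cache[indices] = updates
```

In the torch-mlir lowering this in-place mutation becomes a value-semantic `torch.aten.index_put` followed by `torch.overwrite.tensor.contents` writing the result back over the input argument.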

Steps to reproduce your issue

  1. Compile with:
    ../iree-build/tools/iree-compile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=rocm --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-rocm-target-chip=gfx940 --iree-opt-const-eval=false --iree-rocm-bc-dir=/opt/rocm/amdgcn/bitcode --iree-opt-strip-assertions=true ../index_copy_repro.mlir -o index_copy_repro.vmfb
  2. Run with:
    ../iree-build/tools/iree-run-module --module=index_copy_repro.vmfb --device=rocm --function=test_index_copy --input=8192x16x8x128xf32 --input=4xi64 --input=4x16x8x128xf32 --output=@output.npy
  3. Expected error output:
EXEC @test_index_copy
iree/runtime/src/iree/hal/command_buffer_validation.c:363: INVALID_ARGUMENT; source and target ranges overlap within the same buffer; while invoking native function hal.command_buffer.copy_buffer; while calling import; 
[ 2]   native hal.command_buffer.copy_buffer:0 -
[ 1] bytecode module.test_index_copy$async:1142 ../batch_llama_v4.mlir:2:3
[ 0] bytecode module.test_index_copy:66 ../batch_llama_v4.mlir:2:3; invoking function 'test_index_copy'

What component(s) does this issue relate to?

Runtime

Version information

355f56b

Additional context

No response

@aviator19941 added the bug 🐞 Something isn't working label May 8, 2024
@benvanik
Collaborator

benvanik commented May 8, 2024

I don't think this is going to work - we have memcpy semantics and such an operation requires memmove. You must ensure when using in-place operations on in-place calls that you aren't trying to do a memmove.
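The memcpy-vs-memmove distinction here can be shown with a minimal pure-Python sketch (illustrative only, not IREE code): copying a range forward within the same buffer corrupts the result once the ranges overlap, unless the source range is materialized first, which is exactly the extra copy that memmove semantics would require:

```python
buf = list(range(8))                 # [0, 1, 2, 3, 4, 5, 6, 7]

# memcpy-style forward copy of 4 elements from offset 0 to offset 2:
# once the ranges overlap, it reads elements it already overwrote.
naive = buf[:]
for i in range(4):
    naive[2 + i] = naive[0 + i]
# naive is now [0, 1, 0, 1, 0, 1, 6, 7] -- elements 4 and 5 are wrong.

# memmove-style copy: snapshot the source range first, so overlap is
# safe. The snapshot is the extra allocation/copy that hardware DMA
# primitives and generated dispatches do not provide for free.
safe = buf[:]
src = safe[0:4]
for i in range(4):
    safe[2 + i] = src[i]
# safe is now [0, 1, 0, 1, 2, 3, 6, 7] -- the intended result.
```

This is why `hal.command_buffer.copy_buffer` rejects overlapping source and target ranges in the same buffer at validation time rather than silently producing the naive result.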

@ScottTodd
Member

Oh, I might be hitting the same issue on nod-ai/SHARK-Platform#22 (comment) (among other things)

@benvanik
Collaborator

benvanik commented May 8, 2024

Probably something that's going to need to be identified/fixed in the frontend - we can't really support memmove in general and need to ensure we aren't generating programs that require it. Not only are there no memmove DMA primitives in hardware (there's no cuMemmove, or vkCmdMoveBuffer, etc) but ensuring all dispatches that we generate have memmove semantics when operating in place is not possible.

@ScottTodd
Member

I only see one of these ops in the program I'm looking at. Trying to find where it's coming from... maybe this: https://github.com/llvm/torch-mlir/blob/ec6d7aa5d28f110aa5b893e16e502e6198988801/python/torch_mlir/extras/fx_importer.py#L1111-L1118 ?

@benvanik
Collaborator

benvanik commented May 8, 2024

Probably - something @stellaraccident may have a pointer to/thoughts about - I'm not sure what the right solution is at that level. We can't do much in IREE as the only safe behavior is to silently insert copies for any externally-provided buffer and that defeats the purpose of the in-place ops. I'm not sure if analysis at the torch level could insert the copies, warn/error if they're required, or what :/

(this is one of the reasons we default to not doing in-place operations - they've got footguns :)

@ScottTodd added the integrations/pytorch PyTorch integration work label May 8, 2024
@stellaraccident
Collaborator

I think my eyes aren't calibrated right to see where this is becoming a move-like thing. Probably need some help figuring out how to structure it.

@benvanik
Collaborator

benvanik commented May 8, 2024

I don't know torch stuff (vtensor? what?), but maybe it's whatever to_vtensor is?
index_put ties operand 0 to its result so %0 is %2, assuming to_vtensor is not a forced copy %0 is %arg0, and the final overwrite aliases %2 to %arg0. so %arg0 == %0 == %2.

If to_vtensor must remain a clone and doesn't end up as a flow.tensor.clone then the above will happen. Even if it does become a clone maybe we're dropping it later and that's something we could fix locally but will be trickier in more complex programs to prove. A print-after-all would be useful.

@stellaraccident
Collaborator

Chatting with Ben, the repro is not strictly legal (it is returning a value that is in-placed). While we could support that form, it is presently a limitation. Also, I think it is a testing artifact. We should work on this case:

module @module {
  func.func @test_index_copy(%arg0: !torch.tensor<[8192,16,8,128],f32>, %arg1: !torch.vtensor<[4],si64>, %arg2: !torch.vtensor<[4,16,8,128],f32>) {
    %0 = torch.copy.to_vtensor %arg0 : !torch.vtensor<[8192,16,8,128],f32>
    %1 = torch.prim.ListConstruct %arg1 : (!torch.vtensor<[4],si64>) -> !torch.list<optional<vtensor>>
    %false = torch.constant.bool false
    %2 = torch.aten.index_put %0, %1, %arg2, %false : !torch.vtensor<[8192,16,8,128],f32>, !torch.list<optional<vtensor>>, !torch.vtensor<[4,16,8,128],f32>, !torch.bool -> !torch.vtensor<[8192,16,8,128],f32>
    torch.overwrite.tensor.contents %2 overwrites %arg0 : !torch.vtensor<[8192,16,8,128],f32>, !torch.tensor<[8192,16,8,128],f32>
    return
  }
}

@benvanik
Collaborator

benvanik commented May 9, 2024

notes to tomorrow me:

  • remove into from hal.tensor.export
  • add a hal.tensor.alias (or whatever) op: %tied = hal.tensor.alias %operand, %storage : tensor<..>, !hal.buffer_view -> %operand
  • change torch stuff to use that, drop optimization_barrier and hal.tensor.export
  • ensure emplace allocations uses the alias
  • error checking if two logical tensors (through tied ops) have an alias defined

benvanik added a commit that referenced this issue May 11, 2024
This fixes a design issue in the original `hal.tensor.export` optional
storage feature that would lead to the export happening after any
`hal.tensor.barrier` ops that may have been used on the source tensor.
The new op is intended to be inserted prior to the barriers and can also
be inserted elsewhere (not just at ABI boundaries).

Minor improvements were required to folding of `stream.async.update` in
order to ensure the aliased buffers are used in cases where barriers are
present between producers and the alias ops consuming the values. #17135
made the folder too conservative and would result in all in-place
operations of external values getting extra copies.

Fixes #17316.
bangtianliu pushed a commit to bangtianliu/iree that referenced this issue Jun 5, 2024
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this issue Jul 30, 2024