
RuntimeError: Missing out variants: {'aten::alias'} #1132

Closed
adonnini opened this issue Nov 2, 2023 · 19 comments
Labels: need-user-input (The issue needs more information from the reporter before moving forward), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module), wontfix (This will not be worked on)

Comments

@adonnini commented Nov 2, 2023

I was able to have exir.capture run the trace of my model (I think). However, the code now fails with the error listed below. Could you please take a look and let me know what you think I am doing wrong and what I should do next?
Thanks

<executorch.exir.program._program.ExirExportedProgram object at 0x7f59c4f14f40>
  0%|                                                                                                       | 0/25 [00:48<?, ?it/s]
Traceback (most recent call last):
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/fx/passes/infra/pass_manager.py", line 270, in __call__
    res = fn(module)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/fx/passes/infra/pass_base.py", line 41, in __call__
    self.ensures(graph_module)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/passes/__init__.py", line 311, in ensures
    raise RuntimeError(f"Missing out variants: {self.missing_out_vars}")
RuntimeError: Missing out variants: {'aten::alias'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/train.py", line 318, in <module>
    open("tfmodel.pte", "wb").write(exir.capture(m, (enc_input, dec_input, dec_source_mask, dec_target_mask))
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/program/_program.py", line 181, in to_executorch
    new_prog = ep._transform(*edge_to_executorch_passes(config))
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/export/exported_program.py", line 569, in _transform
    res = pm(self.graph_module)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/fx/passes/infra/pass_manager.py", line 296, in __call__
    raise Exception(msg) from e
Exception: An error occurred when running the 'ToOutVarPass' pass after the following passes: ['SpecPropPass', 'EdgeToBackendOpsPass', 'RemoveAssertAsyncPass', 'HintBasedSymShapeEvalPass']
@JacobSzwejbka (Contributor) commented

Hmm, alias shouldn't be appearing, only alias_copy. cc @SS-JIA to take a look.

@mergennachin (Contributor) commented

@adonnini

Do you have an example model that we can use to reproduce this issue on our end?

@adonnini (Author) commented Nov 3, 2023

@mergennachin
What exactly do you need? The source code for the model I want to run on Android devices?
Thanks

@larryliu0820 (Contributor) commented

@adonnini can you print out the program? I'm thinking maybe we should remove this node; I just want to verify that it's a no-op.
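If aten::alias really is a no-op here, removing it amounts to rewiring each alias node's consumers to the alias's input. A toy sketch of that rewiring on a minimal graph representation (the Node class and field names are hypothetical, not ExecuTorch's actual IR or pass API):

```python
class Node:
    """Minimal stand-in for a graph node: an op name plus input nodes."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = list(inputs)

def remove_noop_aliases(nodes):
    """Drop 'alias' nodes and redirect their consumers to the aliased input."""
    replacement = {}
    kept = []
    for n in nodes:
        # Redirect any input that points at an already-removed alias.
        n.inputs = [replacement.get(i, i) for i in n.inputs]
        if n.op == "alias":
            # An alias merely forwards its single input; record the redirect.
            replacement[n] = n.inputs[0]
        else:
            kept.append(n)
    return kept

# x -> alias -> relu becomes x -> relu
x = Node("placeholder")
a = Node("alias", [x])
r = Node("relu", [a])
graph = remove_noop_aliases([x, a, r])
```

A real FX pass would do the same redirect via node.replace_all_uses_with before erasing the node; the sketch only shows the shape of the transformation.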

@adonnini (Author) commented Nov 6, 2023

I thought it would be easier, and would give you the information you are seeking, if I sent you the link to the GitHub repository I got the model from:

https://github.com/sharonrichushaji/trajectory-prediction-transformers/tree/master

I added the executorch code to train.py after torch.save(

Please let me know if you need anything else

@adonnini (Author) commented Nov 6, 2023

@mergennachin

I tried the following code:

        print(exir.capture(m, (enc_input, dec_input, dec_source_mask, dec_target_mask)).to_edge())

        pre_autograd_aten_dialect = capture_pre_autograd_graph(m, (enc_input, dec_input, dec_source_mask, dec_target_mask))
        aten_dialect: ExportedProgram = export(pre_autograd_aten_dialect, (enc_input, dec_input, dec_source_mask, dec_target_mask))
        edge_program: exir.EdgeProgramManager = exir.to_edge(aten_dialect)
        executorch_program: exir.ExecutorchProgramManager = edge_program.to_executorch(
            ExecutorchBackendConfig()
        )

It failed with the following error:

<executorch.exir.program._program.ExirExportedProgram object at 0x7f5c823c5b20>
/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/utils/_pytree.py:590: UserWarning: pytree_to_str is deprecated. Please use treespec_dumps
  warnings.warn("pytree_to_str is deprecated. Please use treespec_dumps")
  0%|                                                    | 0/25 [01:17<?, ?it/s]
Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/train.py", line 326, in <module>
    executorch_program: exir.ExecutorchProgramManager = edge_program.to_executorch(
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/program/_program.py", line 787, in to_executorch
    return ExecutorchProgramManager(
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/program/_program.py", line 843, in __init__
    self._buffer: bytes = _serialize_pte_binary(
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/_serialize/_program.py", line 459, in serialize_pte_binary
    result: _FlatbufferResult = _program_json_to_flatbuffer(
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/_serialize/_flatbuffer.py", line 281, in _program_json_to_flatbuffer
    _flatc_compile(temp_dir, schema_info.root_path, json_path)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/_serialize/_flatbuffer.py", line 205, in _flatc_compile
    _run_flatc(
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/_serialize/_flatbuffer.py", line 191, in _run_flatc
    subprocess.run([flatc_path] + list(args), check=True)
  File "/home/adonnini1/anaconda3/lib/python3.9/subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/home/adonnini1/anaconda3/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/adonnini1/anaconda3/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'flatc'

Line 326 in train.py is:

        executorch_program: exir.ExecutorchProgramManager = edge_program.to_executorch(
            ExecutorchBackendConfig()
        )

@larryliu0820 (Contributor) commented

Re flatc: if you run bash build/install_flatc.sh it should fix the issue.
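The earlier FileNotFoundError: [Errno 2] for 'flatc' is the generic error subprocess raises when an executable is not on PATH. Checking with shutil.which up front gives a clearer failure; this is a generic sketch, not ExecuTorch's actual serialization code:

```python
import shutil
import subprocess

def run_tool(name, *args):
    """Run an external tool, failing with a helpful message if it's absent."""
    path = shutil.which(name)  # returns None if the binary is not on PATH
    if path is None:
        raise RuntimeError(
            f"'{name}' not found on PATH; install it first "
            "(for flatc: bash build/install_flatc.sh)"
        )
    return subprocess.run([path, *args], check=True)

# A missing binary is now reported up front instead of deep inside Popen:
try:
    run_tool("definitely-not-a-real-binary-xyz")
except RuntimeError as e:
    print(e)
```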

@larryliu0820 (Contributor) commented

Looking at your code, it seems it should be:

        pre_autograd_aten_dialect = capture_pre_autograd_graph(model_loaded, (enc_input, dec_input, dec_source_mask, dec_target_mask))
        aten_dialect: ExportedProgram = export(pre_autograd_aten_dialect, (enc_input, dec_input, dec_source_mask, dec_target_mask))
        edge_program: exir.EdgeProgramManager = exir.to_edge(aten_dialect)
        executorch_program: exir.ExecutorchProgramManager = edge_program.to_executorch()

Note that you should call model_loaded.eval() before running this code.
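The reason eval() matters is that layers like dropout behave differently in training mode, and the exported graph captures whichever mode is active at trace time. A tiny stand-in (not torch.nn.Dropout itself) illustrating the training flag:

```python
import random

class TinyDropout:
    """Toy dropout: randomly zeroes inputs in training mode, identity in eval mode."""
    def __init__(self, p=0.5):
        self.p = p
        self.training = True

    def eval(self):
        self.training = False
        return self

    def __call__(self, xs):
        if self.training:
            # Non-deterministic: bad for a captured inference graph.
            return [0.0 if random.random() < self.p else x / (1 - self.p) for x in xs]
        return list(xs)  # eval mode is a deterministic identity

drop = TinyDropout().eval()
out = drop([1.0, 2.0, 3.0])
```

In the real flow, model_loaded.eval() switches every such submodule to its deterministic inference behavior before capture.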

@larryliu0820 (Contributor) commented
I added the executorch code to train.py after torch.save(

BTW, I'm not able to follow your instructions to run train.py. It complains that the test dataset is missing from datasets/raw/test.

@kimishpatel (Contributor) commented

Assigning to you, @larryliu0820.

@adonnini (Author) commented Nov 6, 2023

Answering in order of occurrence:

  1. @larryliu0820
    Thanks. I should have taken care of the flatc issue on my own

  2. @larryliu0820 this code:

        pre_autograd_aten_dialect = capture_pre_autograd_graph(m, (enc_input, dec_input, dec_source_mask, dec_target_mask))
        aten_dialect: ExportedProgram = export(pre_autograd_aten_dialect, (enc_input, dec_input, dec_source_mask, dec_target_mask))
        edge_program: exir.EdgeProgramManager = exir.to_edge(aten_dialect)
        executorch_program: exir.ExecutorchProgramManager = edge_program.to_executorch(
            ExecutorchBackendConfig()
        )

        with open("/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/models/tfmodel.pte", "wb") as file:
            file.write(executorch_program.buffer)

seems to work now. It produces a .pte file of around 200 MB.

Please note that m is an instance of my model (i.e., I called the constructor).

  3. @larryliu0820 I am not sure why you are not able to run the model. Sorry for asking the obvious, but did you follow the instructions in the following link?
    https://github.com/sharonrichushaji/trajectory-prediction-transformers/tree/master#running-the-training-and-evaluation-loop
    When I first attempted to run the model, I followed the instructions in the readme.md page.

@adonnini (Author) commented Nov 6, 2023

Update on code execution: after running successfully for four epochs, execution failed with the error listed below.

Please note that the line numbers of model.py and train.py listed in the traceback do not correspond to the line numbers in the model on GitHub, as I made some small changes to the code.

Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/tracer.py", line 667, in dynamo_trace
    return torchdynamo.export(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 1213, in inner
    result_traced = opt_f(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 401, in _fn
    return fn(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 549, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 142, in _fn
    return fn(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 384, in _convert_frame_assert
    return _compile(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 570, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 221, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 492, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
    transformations(instructions, code_options)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 462, in transform
    tracer.run()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2107, in run
    super().run()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 747, in run
    and self.step()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 710, in step
    getattr(self, inst.opname)(inst)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 405, in wrapper
    return inner_fn(self, inst)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 1143, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 582, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 307, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 261, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 618, in inline_user_function_return
    result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2234, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2358, in inline_call_
    tracer.run()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 747, in run
    and self.step()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 710, in step
    getattr(self, inst.opname)(inst)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 405, in wrapper
    return inner_fn(self, inst)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 1143, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 582, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 307, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 261, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 618, in inline_user_function_return
    result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2234, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2358, in inline_call_
    tracer.run()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 747, in run
    and self.step()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 710, in step
    getattr(self, inst.opname)(inst)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 405, in wrapper
    return inner_fn(self, inst)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 1143, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 582, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 307, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 261, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 618, in inline_user_function_return
    result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2234, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2358, in inline_call_
    tracer.run()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 747, in run
    and self.step()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 710, in step
    getattr(self, inst.opname)(inst)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 405, in wrapper
    return inner_fn(self, inst)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 1143, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 582, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 261, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 618, in inline_user_function_return
    result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2234, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2358, in inline_call_
    tracer.run()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 747, in run
    and self.step()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 710, in step
    getattr(self, inst.opname)(inst)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 405, in wrapper
    return inner_fn(self, inst)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 1143, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 582, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/misc.py", line 648, in call_function
    return self.obj.call_method(tx, self.name, args, kwargs).add_options(self)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/tensor.py", line 703, in call_method
    return wrap_fx_proxy(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/builder.py", line 1304, in wrap_fx_proxy
    return wrap_fx_proxy_cls(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/variables/builder.py", line 1391, in wrap_fx_proxy_cls
    example_value = get_fake_value(proxy.node, tx)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 1422, in get_fake_value
    raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 1383, in get_fake_value
    return wrap_fake_exception(
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 952, in wrap_fake_exception
    return fn()
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 1384, in <lambda>
    lambda: run_node(tx.output, node, args, kwargs, nnmodule)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 1483, in run_node
    raise RuntimeError(fn_str + str(e)).with_traceback(e.__traceback__) from e
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 1464, in run_node
    return getattr(args[0], node.target)(*args[1:], **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/utils/_stats.py", line 20, in wrapper
    return fn(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_subclasses/fake_tensor.py", line 1323, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_subclasses/fake_tensor.py", line 1621, in dispatch
    r = func(*args, **kwargs)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_ops.py", line 516, in __call__
    return self._op(*args, **kwargs or {})
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_meta_registrations.py", line 3585, in meta_masked_fill_
    check_inplace_broadcast(self.shape, mask.shape)
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_meta_registrations.py", line 68, in check_inplace_broadcast
    broadcasted_shape = tuple(_broadcast_shapes(self_shape, *args_shape))
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/torch/_refs/__init__.py", line 398, in _broadcast_shapes
    raise RuntimeError(
torch._dynamo.exc.TorchRuntimeError: Failed running call_method masked_fill_(*(FakeTensor(..., size=(27, 8, 12, 12), grad_fn=<DivBackward0>), FakeTensor(..., size=(24, 1, 1, 1), dtype=torch.bool), -1000000000.0), **{}):
Attempting to broadcast a dimension of length 24 at -4! Mismatching argument at index 1 had torch.Size([24, 1, 1, 1]); but expected shape should be broadcastable to [27, 8, 12, 12]

from user code:
   File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/model.py", line 490, in forward
    decoder_output = self.decoder_block.forward(dec_embed, encoder_output, dec_source_mask, dec_target_mask)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/model.py", line 305, in forward
    x = layer.forward(x, enc_output, source_mask, target_mask)  # Shape = (B, N, C)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/model.py", line 253, in forward
    x = x + self.dropout(self.attn.forward(self.norm_attn(x), \
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/model.py", line 87, in forward
    attn_output = attention(Q, K, V, mask, self.dropout)  # Shape = (B, H, N, C//H)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/utils.py", line 53, in attention
    scores = scores.masked_fill_(mask == 0, -1e9)


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/train.py", line 321, in <module>
    print(exir.capture(m, (enc_input, dec_input, dec_source_mask, dec_target_mask)).to_edge())
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/capture/_capture.py", line 146, in capture
    graph_module, _ = dynamo_trace(
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/exir/tracer.py", line 686, in dynamo_trace
    raise InternalError(
executorch.exir.error.InternalError: torchdynamo internal error occured. Please see above stacktrace
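The root cause in the trace above is an in-place masked_fill_ whose mask shape (24, 1, 1, 1) cannot broadcast to the tensor shape (27, 8, 12, 12): for an in-place op, each mask dimension (right-aligned) must be 1 or match the target dimension exactly, and 24 matches neither 1 nor 27. A small check mirroring that rule (a hypothetical helper, not PyTorch's internals):

```python
def inplace_broadcastable(self_shape, mask_shape):
    """Can mask_shape broadcast to self_shape without resizing self?"""
    if len(mask_shape) > len(self_shape):
        return False  # the mask may not have more dimensions than the target
    # Right-align the shapes and compare dimension by dimension.
    for s, m in zip(reversed(self_shape), reversed(mask_shape)):
        if m != 1 and m != s:
            return False
    return True

ok = inplace_broadcastable((27, 8, 12, 12), (27, 1, 1, 12))   # fine
bad = inplace_broadcastable((27, 8, 12, 12), (24, 1, 1, 1))   # the failing case
```

A mask built for a batch of 24 is being applied to scores from a batch of 27, which is consistent with the input shapes changing between epochs.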

@larryliu0820 (Contributor) commented

Do you mind sharing your code?

@adonnini (Author) commented Nov 7, 2023

Below you will find a link to the GitHub repository with my code and dataset. A couple of points to note:

  1. Please do not make any changes to the files in the dataset folder and its sub-folders.
  2. The executorch folder is empty. It should be populated with a fully set up and initialized ExecuTorch checkout. I did not attempt to upload its contents from my system, for obvious reasons (57k+ files).

https://github.com/adonnini/trajectory-prediction-transformers-masterContextQ/tree/main/trajectory-prediction-transformers-master

Please let me know if you have any questions or encounter any problems

@kimishpatel (Contributor) commented

@adonnini are you running training on the exported model? If so, are the input sizes changing from epoch to epoch?

@larryliu0820 (Contributor) commented

@adonnini it seems like you are trying to export after every training epoch. One suspicion I have is that you may be using different input shapes in each epoch. Can you provide a minimal repro? For example, we would really appreciate a code snippet that contains only the model, the input, and the code to export it.
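One way the per-epoch shape difference shows up: a trace captures the concrete shapes of the example inputs, so feeding a later batch with a different leading dimension fails. A toy sketch of that failure mode (the names here are illustrative, not torch.export's API):

```python
def trace(fn, example):
    """Freeze the example input's length into the 'traced' function."""
    frozen_len = len(example)

    def traced(xs):
        # A real trace bakes in shapes similarly; mismatches fail at call time.
        if len(xs) != frozen_len:
            raise ValueError(f"traced for length {frozen_len}, got {len(xs)}")
        return fn(xs)

    return traced

double = trace(lambda xs: [2 * x for x in xs], [1, 2, 3])
ok = double([4, 5, 6])  # same shape as the example: fine
```

With torch.export, shapes are static by default, which is why exporting inside a loop whose batch sizes vary can fail on a later iteration even though earlier ones succeeded.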

@adonnini (Author) commented Nov 7, 2023

@larryliu0820 did you try to run the code I sent you? If you did, did it fail as I reported?
You could easily extract what you are seeking from the code I sent you; just look at the code inside the epoch loop.
What if the input shapes are different for each epoch?
a) If it is a problem, what is the suggested solution?
b) Why would it be a problem if the input shapes differ, and why would execution fail only after the fourth time around the loop?

@mergennachin (Contributor) commented Nov 7, 2023

@adonnini

Okay, before we go into the specifics of this particular issue and debugging, let's step back a bit. Could you elaborate on what kind of problem you are trying to solve and how ExecuTorch fits into your scenario?

It looks like you are using ExecuTorch for training, which is not the intended use. As of today, we don't support training; ExecuTorch is an inference engine for on-device deployment.

We expect developers to do training (either in eager mode or compiled mode). Once they have a trained model, we expect them to use torch.export and to_executorch to generate an ExecuTorch program artifact once, so that they can deploy it for inference on edge/embedded devices.

If you are trying to speed up training, we recommend using torch.compile instead.

@adonnini (Author) commented Nov 7, 2023

@mergennachin Thanks for your help. I think I resolved my problem and am all set for now. I will probably need assistance (questions and issue resolution) once I attempt to deploy for inference on Android devices.

@mergennachin added the need-user-input (The issue needs more information from the reporter before moving forward), wontfix (This will not be worked on), and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels on Nov 7, 2023