torch-compiled runtime does not work for any of our models #84

Closed
jeremyfowers opened this issue Jan 10, 2024 · 2 comments
Labels
bug Something isn't working p0 Top priority

Comments

@jeremyfowers (Collaborator) commented Jan 10, 2024

From my testing, it seems as though no models in our corpora currently work with our torch-compiled runtime.

I would certainly expect a basic model like resnet18 to work with torch-compiled.

Reproduction: `turnkey benchmark timm/resnet18.py --runtime torch-compiled`

Error message:

Models discovered during profiling:

resnet18.py:
        model (executed 1x)
                Model Type:     Pytorch (torch.nn.Module)
                Class:          ResNet (<class 'timm.models.resnet.ResNet'>)
                Location:       /home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/timm/models/_builder.py, line 394
                Parameters:     11,689,512 (44.59 MB)
                Input Shape:    'Positional Arg 1': (1, 3, 224, 224)
                Hash:           465f6391
                Build dir:      /home/jfowers/.cache/turnkey/resnet18_timm_465f6391
                Status:         Unknown turnkey error: from user code: File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/timm/models/resnet.py", line 579, in forward x = self.forward_head(x) File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/timm/models/resnet.py", line 575, in forward_head return x if pre_logits else self.fc(x) Set torch._dynamo.config.verbose=True for more information You can suppress this exception and fall back to eager by setting: torch._dynamo.config.suppress_errors = True
                Traceback (most recent call last):
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 1199, in run_node
                    return nnmodule(*args, **kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
                    return forward_call(*args, **kwargs)
                  File "/home/jfowers/turnkeyml/src/turnkeyml/analyze/script.py", line 596, in forward_spy
                    return old_forward(*args, **kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
                    return F.linear(input, self.weight, self.bias)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/utils/_stats.py", line 20, in wrapper
                    return fn(*args, **kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_subclasses/fake_tensor.py", line 987, in __torch_dispatch__
                    return self.dispatch(func, types, args, kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_subclasses/fake_tensor.py", line 1066, in dispatch
                    args, kwargs = self.validate_and_convert_non_fake_tensors(
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_subclasses/fake_tensor.py", line 1220, in validate_and_convert_non_fake_tensors
                    return tree_map_only(
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/utils/_pytree.py", line 266, in tree_map_only
                    return tree_map(map_only(ty)(fn), pytree)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/utils/_pytree.py", line 196, in tree_map
                    return tree_unflatten([fn(i) for i in flat_args], spec)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/utils/_pytree.py", line 196, in <listcomp>
                    return tree_unflatten([fn(i) for i in flat_args], spec)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/utils/_pytree.py", line 247, in inner
                    return f(x)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_subclasses/fake_tensor.py", line 1212, in validate
                    raise Exception(
                Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.t.default(*(Parameter containing:
                tensor([[-0.0203,  0.0187,  0.0376,  ...,  0.0188, -0.0052,  0.0296],
                        [-0.0113,  0.0263, -0.0063,  ..., -0.0364,  0.0129,  0.0371],
                        [ 0.0429, -0.0385,  0.0036,  ...,  0.0110, -0.0319,  0.0118],
                        ...,
                        [-0.0279,  0.0096, -0.0364,  ...,  0.0286,  0.0325,  0.0440],
                        [ 0.0202,  0.0066, -0.0420,  ...,  0.0435,  0.0141,  0.0165],
                        [ 0.0136,  0.0318,  0.0145,  ...,  0.0280, -0.0356,  0.0351]],
                       requires_grad=True),), **{}) 

                The above exception was the direct cause of the following exception:

                Traceback (most recent call last):
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 1152, in get_fake_value
                    return wrap_fake_exception(
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 808, in wrap_fake_exception
                    return fn()
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 1153, in <lambda>
                    lambda: run_node(tx.output, node, args, kwargs, nnmodule)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 1206, in run_node
                    raise RuntimeError(
                RuntimeError: Failed running call_module self_fc(*(FakeTensor(FakeTensor(..., device='meta', size=(1, 512), grad_fn=<ViewBackward0>), cpu),), **{}):
                Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.t.default(*(Parameter containing:
                tensor([[-0.0203,  0.0187,  0.0376,  ...,  0.0188, -0.0052,  0.0296],
                        [-0.0113,  0.0263, -0.0063,  ..., -0.0364,  0.0129,  0.0371],
                        [ 0.0429, -0.0385,  0.0036,  ...,  0.0110, -0.0319,  0.0118],
                        ...,
                        [-0.0279,  0.0096, -0.0364,  ...,  0.0286,  0.0325,  0.0440],
                        [ 0.0202,  0.0066, -0.0420,  ...,  0.0435,  0.0141,  0.0165],
                        [ 0.0136,  0.0318,  0.0145,  ...,  0.0280, -0.0356,  0.0351]],
                       requires_grad=True),), **{}) 
                (scroll up for backtrace)

                The above exception was the direct cause of the following exception:

                Traceback (most recent call last):
                  File "/home/jfowers/turnkeyml/src/turnkeyml/analyze/script.py", line 325, in explore_invocation
                    perf = model_handle.benchmark()
                  File "/home/jfowers/turnkeyml/src/turnkeyml/run/torchrt/runtime.py", line 74, in benchmark
                    self.model(**self.inputs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
                    return forward_call(*args, **kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 82, in forward
                    return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
                    return fn(*args, **kwargs)
                  File "/home/jfowers/turnkeyml/src/turnkeyml/analyze/script.py", line 587, in forward_spy
                    tracer = sys.getprofile()
                  File "/home/jfowers/turnkeyml/src/turnkeyml/analyze/script.py", line 596, in <graph break in forward_spy>
                    return old_forward(*args, **kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 337, in catch_errors
                    return callback(frame, cache_size, hooks)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 404, in _convert_frame
                    result = inner_convert(frame, cache_size, hooks)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 104, in _fn
                    return fn(*args, **kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 262, in _convert_frame_assert
                    return _compile(
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
                    r = func(*args, **kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
                    out_code = transform_code_object(code, transform)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
                    transformations(instructions, code_options)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 311, in transform
                    tracer.run()
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
                    super().run()
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
                    and self.step()
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
                    getattr(self, inst.opname)(inst)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 342, in wrapper
                    return inner_fn(self, inst)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 965, in CALL_FUNCTION
                    self.call_function(fn, args, {})
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 474, in call_function
                    self.push(fn.call_function(self, args, kwargs))
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/variables/functions.py", line 291, in call_function
                    return super().call_function(tx, args, kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/variables/functions.py", line 259, in call_function
                    return super().call_function(tx, args, kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/variables/functions.py", line 92, in call_function
                    return tx.inline_user_function_return(
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 510, in inline_user_function_return
                    result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 1806, in inline_call
                    return cls.inline_call_(parent, func, args, kwargs)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 1862, in inline_call_
                    tracer.run()
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
                    and self.step()
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
                    getattr(self, inst.opname)(inst)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 342, in wrapper
                    return inner_fn(self, inst)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 965, in CALL_FUNCTION
                    self.call_function(fn, args, {})
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 474, in call_function
                    self.push(fn.call_function(self, args, kwargs))
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/variables/nn_module.py", line 203, in call_function
                    return wrap_fx_proxy(
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/variables/builder.py", line 754, in wrap_fx_proxy
                    return wrap_fx_proxy_cls(
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/variables/builder.py", line 789, in wrap_fx_proxy_cls
                    example_value = get_fake_value(proxy.node, tx)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 1173, in get_fake_value
                    raise TorchRuntimeError() from e
                torch._dynamo.exc.TorchRuntimeError: 

                from user code:
                   File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/timm/models/resnet.py", line 579, in forward
                    x = self.forward_head(x)
                  File "/home/jfowers/miniconda3/envs/tkml/lib/python3.8/site-packages/timm/models/resnet.py", line 575, in forward_head
                    return x if pre_logits else self.fc(x)

                Set torch._dynamo.config.verbose=True for more information


                You can suppress this exception and fall back to eager by setting:
                    torch._dynamo.config.suppress_errors = True
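
For anyone trying to isolate this, here is a minimal standalone sketch that exercises the same torch.compile entry point. It assumes torch >= 2.0 and timm are installed, and it may not reproduce the failure on its own, since the error above also involves turnkey's forward_spy profiling hook; it is a sketch, not the turnkey code path.

```python
import timm
import torch

# Build the same timm resnet18 the benchmark discovers and compile it the way
# a torch.compile-based runtime would. Input shape matches the report above.
model = timm.create_model("resnet18", pretrained=False).eval()
inputs = torch.rand(1, 3, 224, 224)

compiled = torch.compile(model)
with torch.no_grad():
    print(compiled(inputs).shape)  # expect torch.Size([1, 1000])
```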

cc @danielholanda

@jeremyfowers jeremyfowers added the bug Something isn't working label Jan 10, 2024
@danielholanda danielholanda added the p0 Top priority label Jan 10, 2024
@danielholanda (Collaborator) commented Feb 6, 2024

`turnkey benchmark timm/resnet18.py --runtime torch-compiled` works for me on the latest version of main.
@jeremyfowers, can you please also try it on your side so that we can close this issue?

If this does not work on your system, please share which version of torch you are using.
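
For reference, one quick way to grab the version (plain Python, nothing turnkey-specific):

```python
import torch
print(torch.__version__)  # e.g. "2.0.1" or "2.2.0"
```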

@jeremyfowers (Collaborator, Author)

I have root-caused the problem: this bug only shows up on certain versions of torch 2.x. What I have tried so far:

  • torch 2.0.0: fine
  • torch 2.0.1: doesn't work
  • torch 2.2.0: fine

If anyone runs into this problem in the future, let's suggest that they update their torch installation. Closing the issue.

PS: this is yet more motivation for us to enforce a stricter environment for benchmarking!
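
As a follow-up thought, here is a minimal sketch of what such a guard could look like. The KNOWN_BAD_TORCH_VERSIONS set and the warning text are illustrative assumptions based on the findings above, not existing turnkeyml code.

```python
import warnings
import torch

# Illustrative only: torch releases observed in this issue to break the
# torch-compiled runtime (2.0.1 failed; 2.0.0 and 2.2.0 worked).
KNOWN_BAD_TORCH_VERSIONS = {"2.0.1"}

def warn_on_known_bad_torch() -> None:
    # Strip local build tags such as "+cu118" before comparing.
    version = torch.__version__.split("+")[0]
    if version in KNOWN_BAD_TORCH_VERSIONS:
        warnings.warn(
            f"torch {version} is known to fail with the torch-compiled runtime; "
            "consider upgrading (torch 2.2.0 has been verified to work)."
        )

warn_on_known_bad_torch()
```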
