Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[torchscript] stop serializing default arguments #54613

Closed
suo opened this issue Mar 24, 2021 · 9 comments
Closed

[torchscript] stop serializing default arguments #54613

suo opened this issue Mar 24, 2021 · 9 comments
Assignees
Labels
high priority oncall: jit Add this issue/PR to JIT oncall triage queue triage review
Projects

Comments

@suo
Copy link
Member

suo commented Mar 24, 2021

Today, when serializing TorchScript code we make no distinction between arguments that were explicitly specified by the user, and arguments that can from defaults. This leads us to "burn in" default arguments.

This behavior causes forward compatibility problems when a new argument with a default value is added to an operator. Consider:

# version 1
aten.foo(Tensor arg1)

becoming

# version 2
aten.foo(Tensor arg1, String new_arg="my_default_value")

The serialized code at version 2 looks like:

aten.foo(my_tensor_input, "my_default_value")

so when you try to run it in a binary at version 1, you get a "too many arguments for schema" error.
One option is to not serialize the default value, so then the serialized code at version 2 looks like:

aten.foo(my_tensor_input)  # the schema matcher will insert new_arg="my_default_value"

and thus we retain compatibility.

This is a practically important issue, currently adding a new default value to a custom op will break PyPER due to forward compat issues: https://fburl.com/3fr2kjvn

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @gmagogsfm

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 24, 2021
@github-actions github-actions bot added this to Need triage in JIT Triage Mar 24, 2021
@suo
Copy link
Member Author

suo commented Mar 24, 2021

related SEV4: S226373

@gmagogsfm gmagogsfm self-assigned this Mar 24, 2021
@gmagogsfm gmagogsfm moved this from Need triage to In progress in JIT Triage Mar 24, 2021
@albanD
Copy link
Collaborator

albanD commented Mar 25, 2021

Is there any concern that the behavior of the underlying function will silently change? Because if the old runner does not support that argument, then running that code with an older version won't give the same result as running with the new version (which is the one expected by the user when serializing).

@dzhulgakov
Copy link
Collaborator

@albanD - if that's the case then whoever added the new argument in the first place has broken the backward compatibility already.

In the above example aten.foo(x, "my_default_value") should return exactly the same value as aten.foo(x) did in the previous version of the runtime. Otherwise just regular python eager code doing aten.foo(x) won't be backward compatible. If that's the case, we'd need to go through BC process with adapters and such. But that's not the common case.

The only case where there would be a different if we implement this issue if we at any point decide to change the default value of an argument. I'd argue it's a pretty rare situation and kind of a sketchy thing to do in any case because it might break some python code inadvertently.

cc @raziel fyi

@gmagogsfm
Copy link
Contributor

I think the right fix is to not include default values of arguments when user code doesn't specify it.

On top of the reasons @dzhulgakov mentioned, I think not saving default values is more aligned with PyTorch eager and future torch.package behavior, which both faithfully record exact Python code user writes. In this case, the user code is arguably forward-compatible, it is TorchScript's implementation that breaks originally forward-compatible user code.

Additionally, TorchScript is currently in a slightly strange state in terms of "providing hermetic packaging for PyTorch program". As we have seen in this case, TorchScript pull information (default value of argument) from aten op schema of version A , bakes it in IR, but attempts to execute the program with an aten library of version B. This is effectively only baking in part of aten library implementation while in theory we should either not baking in the default value or bake in default value + schema + aten kernels.

@ezyang
Copy link
Contributor

ezyang commented May 26, 2021

sooo anyone gonna work on this?

@gmagogsfm
Copy link
Contributor

Update:
On normal TS runtime side, this is already implemented.

The approach is to find values that are

  1. Generated by constant node
  2. Used as an argument to an operator invocation
  3. Its constant value matches the default value of that argument

During serialization, do not print this argument, instead relying on future compilation to fill up needed arguments according to schemas available.

On mobile side, this isn't enabled yet due to 1) needing backport feature because this changes breaks backward compatibility (WIP) 2) long deployment timeline on mobile devices

@gmagogsfm
Copy link
Contributor

Update:
This has been addressed for both server and mobile deployment. Closing.

JIT Triage automation moved this from In progress to Done Jun 23, 2021
@ezyang
Copy link
Contributor

ezyang commented Jun 25, 2021

@gmagogsfm could you please tag which PR fixed this for server and mobile?

@gmagogsfm
Copy link
Contributor

Somehow Github won't let me add PRs to "linked PR" column, so I will just list them here:
#57498
#57778
#56845
#58852
#59714

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high priority oncall: jit Add this issue/PR to JIT oncall triage queue triage review
Projects
JIT Triage
  
Done
Development

No branches or pull requests

6 participants