[export] Initial deserialization v2 #102716
Conversation
🔗 Helpful Links: see artifacts and rendered test results at hud.pytorch.org/pr/102716
Note: links to docs will display an error until the doc builds have completed.
❌ 3 New Failures as of commit bf91008.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary: v2 of #102125 because of git issues; corresponding deserialization diff: #102716.

Implementing serialization of the exported program to a Python dataclass, and then from that dataclass to JSON. This is split into a couple of sections:

- `serialize(ep: ep.ExportedProgram, opset_version: Dict[str, int]) -> Tuple[bytes, bytes]` -- takes an exported program object and a dictionary mapping opset namespaces to versions, and returns the serialized exported program in bytes, and separately the state dict serialized in bytes
- `GraphModuleSerializer` class that serializes torch.fx.GraphModule to the schema.GraphModule dataclass
- `ExportedProgramSerializer` class that serializes torch._export.exported_program.ExportedProgram to the schema.ExportedProgram dataclass

Serialization TODOs:
- [x] pytree spec: #102577
- [ ] higher order ops
- [ ] node metadata (specifically nn_module_stack/source_fn)
- [ ] constraints
- [ ] graph module metadata

The tests are not super comprehensive, but that's because I think it'll be better tested, and easier to test, once deserialization is implemented.

Pull Request resolved: #102707
Reviewed By: zhxchen17
Differential Revision: D46362466
Pulled By: angelayi
fbshipit-source-id: 1d3fc157a7a5c2e615dbcc7f0e87d76f2f4c43ed
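The two-stage pipeline the summary describes (dataclass, then JSON) can be sketched as follows. This is a minimal stand-in, not the actual `torch._export` implementation; the `GraphModule`/`ExportedProgram` classes and the body of `serialize` here are assumptions for illustration only, with the real schema living in the PR's `schema` module.

```python
import dataclasses
import json
from typing import Dict, Tuple

# Stand-in for schema.GraphModule (hypothetical; real schema is richer).
@dataclasses.dataclass
class GraphModule:
    nodes: list

# Stand-in for schema.ExportedProgram (hypothetical).
@dataclasses.dataclass
class ExportedProgram:
    graph_module: GraphModule

def serialize(ep: ExportedProgram, opset_version: Dict[str, int]) -> Tuple[bytes, bytes]:
    # Stage 1: exported program -> plain dict via the dataclass layer.
    payload = dataclasses.asdict(ep)
    payload["opset_version"] = opset_version
    # Stage 2: dict -> JSON bytes.
    serialized_ep = json.dumps(payload).encode("utf-8")
    state_dict_bytes = b""  # the state dict is serialized separately
    return serialized_ep, state_dict_bytes

ep = ExportedProgram(GraphModule(nodes=["add", "relu"]))
ep_bytes, sd_bytes = serialize(ep, {"aten": 11})
assert json.loads(ep_bytes)["opset_version"] == {"aten": 11}
```

Returning the program bytes and state-dict bytes as separate values matches the signature given in the summary, which keeps (potentially large) tensor data out of the JSON payload.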
ret["stack_trace"] = stack_trace
# Need an explicit None check instead of walrus operator, because
# module_fqn can be the empty string if the node belongs to the root.
# The walrus operator returns False on an empty string :(
lol, I don't like the walrus operator anyway. Too many falsy values, including empty lists and dicts as well...
module_fqn = metadata.get("module_fqn")
if module_fqn is not None:
    ret["module_fqn"] = module_fqn
# TODO(angelayi) add nn_module_stack and source_fn
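The pitfall the comment describes can be shown directly (a minimal sketch, not code from the PR): with a walrus operator in the condition, an empty-string `module_fqn` is falsy and gets dropped, while the explicit `None` check preserves it.

```python
metadata = {"module_fqn": ""}  # node belongs to the root module

# Walrus in the condition: "" is falsy, so the root FQN is silently lost.
ret = {}
if module_fqn := metadata.get("module_fqn"):
    ret["module_fqn"] = module_fqn
assert "module_fqn" not in ret

# Explicit None check, as in the diff: the empty string survives.
ret = {}
module_fqn = metadata.get("module_fqn")
if module_fqn is not None:
    ret["module_fqn"] = module_fqn
assert ret["module_fqn"] == ""
```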
Are there existing tests for `nn_module_stack` preservation? Not here, but for the future: it might be sweet to run an existing set of tests through a serde roundtrip "automatically," say by wrapping `export` with `export` + `serde` via a test-only decorator.
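The test-only decorator idea above could look something like this sketch. The `serialize`/`deserialize` functions here are trivial JSON stand-ins (assumptions, not the PR's actual serde), and `with_serde_roundtrip` is a hypothetical name; the point is only the wrapping pattern, where every exported program is forced through a roundtrip before tests see it.

```python
import functools
import json

# Stand-ins for the PR's serialize/deserialize (assumption for illustration).
def serialize(obj):
    return json.dumps(obj).encode("utf-8")

def deserialize(data):
    return json.loads(data)

def with_serde_roundtrip(export_fn):
    """Test-only decorator: every export result must survive serde."""
    @functools.wraps(export_fn)
    def wrapper(*args, **kwargs):
        ep = export_fn(*args, **kwargs)
        # Any metadata (e.g. nn_module_stack) dropped by serde would
        # surface as failures in the existing test suite.
        return deserialize(serialize(ep))
    return wrapper

@with_serde_roundtrip
def export(model):
    # Hypothetical export returning a JSON-friendly program description.
    return {"graph": ["add"], "model": model}

assert export("m") == {"graph": ["add"], "model": "m"}
```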
Yup, I have an upcoming diff once this lands that checks for it.
I feel we should leave the symbolic int parts out of the scope of this diff.
For example, we try to deserialize sym ints and bools from the serialized buffer, but I don't think we have a clear plan for serializing them yet. Another example: we're trying to serialize "_operator" ops, but I would simply leave those out until we're clearer on what to do with them next.
clicked twice
After discussing offline, I think we can move forward for now once all the comments are addressed. Next time, please consider splitting up one big PR that touches different parts.
@pytorchbot merge -f "failures appear on master"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
v2 of #102126, mentally stacked on top of #102707.
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10