[export] allow register dataclass as pytree node #106160

ydwu4 · 2023-07-27T20:08:40Z

This PR lets users register custom flatten/unflatten and serialization/deserialization functions for a dataclass. Default implementations are provided for flatten/unflatten. A decorator could be built on top of this when needed.

Motivation:

HuggingFace and many internal models return dataclass outputs, and torch.export wants to maintain the invariant that the export result (i.e. exported_program) has the same calling convention and results as the original callable.

This is not yet supported in export: we cannot recover the original dataclass from the flattened output produced by the underlying graph module (produced by dynamo and processed further by aot_export). We need a place to store the dataclass metadata so that we can reconstruct it. To avoid adding hacky code in export and to allow principled extensibility, we think extending pytree may be a good option.
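To illustrate the problem, here is a minimal sketch in plain Python (no torch) of why metadata has to be stored alongside the flattened values. `ModelOutput`, `flatten`, and `unflatten` are hypothetical names chosen for this example; they are not the functions added by this PR.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Tuple


@dataclass
class ModelOutput:  # hypothetical output type, for illustration only
    logits: float
    loss: float


def flatten(obj: Any) -> Tuple[List[Any], Tuple[type, List[str]]]:
    """Split a dataclass into leaf values plus the metadata needed to rebuild it."""
    names = [f.name for f in fields(obj)]
    leaves = [getattr(obj, name) for name in names]
    return leaves, (type(obj), names)


def unflatten(leaves: List[Any], context: Tuple[type, List[str]]) -> Any:
    """Rebuild the dataclass from leaf values and the stored metadata."""
    typ, names = context
    return typ(**dict(zip(names, leaves)))


out = ModelOutput(logits=0.5, loss=1.25)
leaves, context = flatten(out)
# A graph only produces `leaves`; the type and field names live in `context`.
restored = unflatten(leaves, context)
```

Without `context`, the list of leaves alone does not say which class to construct or which field each value belongs to.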

Implementation:

@zou3519 mentioned https://github.com/pytorch/pytorch/pull/93214/files and jax-2371, which suggest that making dataclass a default pytree node is not a good idea, but that providing a default implementation for dataclasses could be useful. Since this currently seems to be an export-only feature, we added the extension point in export.

We also add a "return_none_fields" flag that controls whether fields whose value is None are returned after flattening; it is expected to be False in produce_matching of dynamo.export.
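A rough sketch of how such a flag could behave, again in plain Python. Only the flag name `return_none_fields` comes from this PR; the `Output` class, the `flatten`/`unflatten` helpers, and the layout of the context tuple are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Optional, Tuple


@dataclass
class Output:  # hypothetical output type, for illustration only
    logits: float
    loss: Optional[float] = None


def flatten(obj: Any, return_none_fields: bool = False) -> Tuple[List[Any], tuple]:
    """Flatten a dataclass, optionally dropping fields whose value is None."""
    flat_values, flat_names, none_names = [], [], []
    for f in fields(obj):
        value = getattr(obj, f.name)
        if value is not None or return_none_fields:
            flat_values.append(value)
            flat_names.append(f.name)
        else:
            # Remember the dropped field so unflatten can restore it as None.
            none_names.append(f.name)
    return flat_values, (type(obj), flat_names, none_names)


def unflatten(values: List[Any], context: tuple) -> Any:
    """Rebuild the dataclass, filling dropped fields back in as None."""
    typ, flat_names, none_names = context
    kwargs = dict(zip(flat_names, values))
    kwargs.update({name: None for name in none_names})
    return typ(**kwargs)


dropped, ctx = flatten(Output(logits=0.5))
kept, _ = flatten(Output(logits=0.5), return_none_fields=True)
```

With the flag off, the flattened list contains only non-None values, while the context still records which fields were None so the round trip stays lossless.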

Also added some tests.

pytorch-bot · 2023-07-27T20:08:43Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106160

✅ No Failures

As of commit 8eaea18:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ydwu4 · 2023-07-28T17:30:09Z

@pytorchbot merge

pytorchmergebot · 2023-07-28T17:33:08Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@zou3519

Pull Request resolved: pytorch#106160
Approved by: https://github.com/zhxchen17

Allow register dataclass as pytree node

8eaea18

ydwu4 requested review from tugsbayasgalan, gmagogsfm and angelayi and removed request for gmagogsfm July 27, 2023 20:08

github-actions bot added the module: export label Jul 27, 2023

ydwu4 requested a review from zhxchen17 July 27, 2023 20:08

ydwu4 added the release notes: export and ciflow/trunk labels Jul 27, 2023

ydwu4 changed the title ~~Allow to register dataclass as pytree node~~ [export] add entrypoint in export to register dataclass as pytree node Jul 27, 2023

ydwu4 changed the title ~~[export] add entrypoint in export to register dataclass as pytree node~~ [export] allow register dataclass as pytree node Jul 27, 2023

zhxchen17 approved these changes Jul 27, 2023

zou3519 self-requested a review July 27, 2023 21:31

pytorchmergebot added the merging label Jul 28, 2023

pytorchmergebot added Merged and removed merging labels Jul 28, 2023

pytorchmergebot closed this in 5237ed5 Jul 28, 2023

XuehaiPan mentioned this pull request Nov 30, 2023

[pytree] support PyStructSequence types for Python pytree #113258

Open
