Serialize pytree to string #102577

angelayi · 2023-05-30T22:00:10Z

list --> L(value1,value2)
tuple --> T(value1,value2)
dict --> D(key1:value1,key2:value2)
ordered dict --> O(key1:value1,key2:value2)
namedtuple --> N(type(key1, key2),value1,value2)
leaf --> *

Restrictions

serializing custom types is not supported
we only support serializing string keys in dictionaries
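
As a rough illustration of the notation above, here is a minimal sketch that renders the structure of a nested Python container as such a string. It is not the code from this PR: `sketch_spec_string` is a hypothetical helper, it uses the namedtuple's class name where the description writes `type`, and it treats anything that is not a built-in container as a leaf rather than rejecting custom types.

```python
from collections import OrderedDict, namedtuple


def sketch_spec_string(obj):
    """Hypothetical helper: render a container's structure in the PR's notation."""
    if isinstance(obj, dict):
        # Restriction from the description: only string keys are supported.
        for key in obj:
            if not isinstance(key, str):
                raise TypeError(f"only string keys are supported, got {key!r}")
        tag = "O" if isinstance(obj, OrderedDict) else "D"
        body = ",".join(f"{k}:{sketch_spec_string(v)}" for k, v in obj.items())
        return f"{tag}({body})"
    if isinstance(obj, tuple) and hasattr(obj, "_fields"):
        # namedtuple: type name and field names first, then the children.
        header = f"{type(obj).__name__}({', '.join(obj._fields)})"
        parts = [header] + [sketch_spec_string(v) for v in obj]
        return f"N({','.join(parts)})"
    if isinstance(obj, tuple):
        return "T(" + ",".join(sketch_spec_string(v) for v in obj) + ")"
    if isinstance(obj, list):
        return "L(" + ",".join(sketch_spec_string(v) for v in obj) + ")"
    # Assumption of this sketch: everything else is a leaf.
    return "*"


Point = namedtuple("Point", ["x", "y"])
example = sketch_spec_string([1, (2, 3), {"a": 4}, Point(5, 6)])
print(example)
```

The leaf values themselves never appear in the output; only the container structure and the dictionary keys do.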

pytorch-bot · 2023-05-30T22:00:13Z

✅ No Failures

As of commit 6a75431:
💚 Looks good so far! There are no failures yet. 💚

zhxchen17 · 2023-05-30T23:18:31Z

Why did we choose the word "type" for N(type(key1, key2),value1,value2)?

angelayi · 2023-05-30T23:19:32Z

> Why did we choose the word "type" for N(type(key1, key2),value1,value2)?

what's a better word.. 😅

zhxchen17 · 2023-05-31T01:00:18Z

wrong button pressed sorry lol

zhxchen17 · 2023-05-31T01:02:37Z

> > Why did we choose the word "type" for N(type(key1, key2),value1,value2)?
>
> what's a better word.. 😅

sorry I think I misunderstood. feel free to ignore my previous comment

Serialization TODOs:
- [ ] pytree spec: #102577
- [ ] higher order ops
- [ ] node metadata (specifically nn_module_stack/source_fn)
- [ ] shape env
- [ ] graph module metadata?

[ghstack-poisoned]

avikchaudhuri

Why not include dataclass support as well, while we're at it? Better than NamedTuple.

Do we really need both dict and OrderedDict? Why?

angelayi · 2023-06-01T06:32:03Z

> Why not include dataclass support as well, while we're at it? Better than NamedTuple. Do we really need both dict and OrderedDict? Why?

dataclass isn't pytree-ed right now. I'm just choosing what to pytree based on the existing default types that are pytree-ed.

updates to #102708

v2 of #102577 Pull Request resolved: #102708 Approved by: https://github.com/avikchaudhuri

Summary: v2 of #102125 because of git issues

corresponding deserialization diff: #102716

Implementing serialization of the exported program to a python dataclass, and then from that dataclass to json. This is split into a couple of sections:
- `serialize(ep: ep.ExportedProgram, opset_version: Dict[str, int]) -> Tuple[bytes, bytes]` -- takes an exported program object, a dictionary mapping opset namespaces to versions, and returns the serialized exported program in bytes, and separately the state dict serialized in bytes
- `GraphModuleSerializer` class that serializes torch.fx.GraphModule to the schema.GraphModule dataclass
- `ExportedProgramSerializer` class that serializes torch._export.exported_program.ExportedProgram to the schema.ExportedProgram dataclass

Serialization TODOs:
- [x] pytree spec: #102577
- [ ] higher order ops
- [ ] node metadata (specifically nn_module_stack/source_fn)
- [ ] constraints
- [ ] graph module metadata

The tests are not super comprehensive, but that's because I think it'll be better tested + easier to test once deserialization is implemented.

Pull Request resolved: #102707
Reviewed By: zhxchen17
Differential Revision: D46362466
Pulled By: angelayi
fbshipit-source-id: 1d3fc157a7a5c2e615dbcc7f0e87d76f2f4c43ed
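
The two-step idea in this summary (object to dataclass, dataclass to JSON bytes) can be sketched with the standard library alone. Everything below is hypothetical and stands in for the real schema: `SketchNode`, `SketchProgram` and `sketch_serialize` are illustration names, and encoding the state as JSON is an assumption of the sketch, not something the summary states.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SketchNode:
    # Hypothetical stand-in for one graph node in a schema dataclass.
    target: str
    inputs: List[str]


@dataclass
class SketchProgram:
    # Hypothetical stand-in for a serialized program schema.
    opset_version: Dict[str, int]
    nodes: List[SketchNode] = field(default_factory=list)


def sketch_serialize(program: SketchProgram, state: Dict[str, List[float]]) -> Tuple[bytes, bytes]:
    """Return (program bytes, state bytes), mirroring the shape of the described API."""
    # asdict() recursively converts nested dataclasses into plain dicts and lists.
    program_bytes = json.dumps(asdict(program), sort_keys=True).encode("utf-8")
    # Assumption: the state is JSON-encodable; the summary does not say how it is encoded.
    state_bytes = json.dumps(state, sort_keys=True).encode("utf-8")
    return program_bytes, state_bytes


program = SketchProgram(
    opset_version={"aten": 1},
    nodes=[SketchNode(target="aten.add", inputs=["x", "y"])],
)
program_bytes, state_bytes = sketch_serialize(program, {"weight": [0.5, 1.5]})
```

Going through a plain dataclass first keeps the on-disk layout independent of the in-memory classes, which is presumably why the summary separates the two steps.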

Pull Request resolved: #102707 Approved by: https://github.com/avikchaudhuri, https://github.com/zhxchen17

ezyang · 2023-07-26T03:04:59Z

Is there a design doc explaining why we picked this particular format for string serialization (and why we hand-rolled a string parser)? Single-letter signifiers mean that downstream pytree implementers are at significant risk of collisions (you can only support 26 distinct types under this parsing scheme). Additionally, the lack of quoting means that dictionary keys containing colons can confuse the parser. It would almost have been better to just use JSON instead...
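
The quoting concern can be made concrete with a small sketch. `naive_dict_spec` is a hypothetical helper that writes keys unquoted in the `D(key:value)` style; it is not the parser or serializer from this PR. Two different dictionaries end up with the same string, while JSON-quoted keys stay distinct.

```python
import json


def naive_dict_spec(d):
    """Hypothetical unquoted rendering of a flat dict whose values are all leaves."""
    return "D(" + ",".join(f"{key}:*" for key in d) + ")"


# Two keys "a" and "b" ...
plain = naive_dict_spec({"a": 0, "b": 0})
# ... versus a single key that happens to contain ":" and ",".
tricky = naive_dict_spec({"a:*,b": 0})

# With JSON, the keys are quoted, so the two cases stay distinguishable.
quoted_plain = json.dumps(["a", "b"])
quoted_tricky = json.dumps(["a:*,b"])

print(plain, tricky)
print(quoted_plain, quoted_tricky)
```

Because both unquoted strings are identical, no parser could recover which dictionary was meant; the JSON form round-trips the odd key unchanged.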

Fixes #102577 (comment)

Serializing to JSON is more stable. This also renames the API:

```
# Takes in a treespec and returns the serialized treespec as a string. Also optionally takes in a protocol version number.
def treespec_dumps(treespec: TreeSpec, protocol: Optional[int] = None) -> str: ...

# Takes in a serialized treespec and outputs a TreeSpec
def treespec_loads(data: str) -> TreeSpec: ...
```

If users want to register their own serialization format for a given pytree, they can go through the `_register_treespec_serializer` API, which optionally takes in a `getstate` and `setstate` function.

```
_register_treespec_serializer(type_, *, getstate, setstate)

# Takes in the context, and outputs a json-dumpable context
def getstate(context: Context) -> DumpableContext: ...

# Takes in a json-dumpable context, and reconstructs the original context
def setstate(dumpable_context: DumpableContext) -> Context: ...
```

We will serialize to the following dataclass, and then json.dump it to a string.

```
@dataclass
class TreeSpec:
    type: Optional[str]  # a string name of the type. null for the case of a LeafSpec
    context: Optional[Any]  # optional, a json dumpable format of the context
    children_specs: List["TreeSpec"]
```

If no getstate/setstate function is registered, we will by default serialize the context using `json.dumps/loads`. We will also serialize the type through `f"{typ.__module__}.{typ.__name__}"`.

Pull Request resolved: #106116
Approved by: https://github.com/zou3519
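
To make the schema above concrete, here is a standard-library sketch of a round trip through JSON. It does not call the PyTorch API: `SketchTreeSpec`, `sketch_spec`, `sketch_dumps` and `sketch_loads` are hypothetical names, and using the list of keys as the context of a dict is an assumption of the sketch. Only the three fields and the `f"{typ.__module__}.{typ.__name__}"` naming come from the description.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, List, Optional


@dataclass
class SketchTreeSpec:
    type: Optional[str]      # qualified type name, None for a leaf
    context: Optional[Any]   # JSON-dumpable context, None if unused
    children_specs: List["SketchTreeSpec"] = field(default_factory=list)


def qualified_name(typ):
    # Naming scheme quoted in the description.
    return f"{typ.__module__}.{typ.__name__}"


def sketch_spec(obj):
    """Build a spec for dicts, lists and tuples; anything else is a leaf."""
    if isinstance(obj, dict):
        # Assumption: the context of a dict is its list of keys.
        children = [sketch_spec(v) for v in obj.values()]
        return SketchTreeSpec(qualified_name(type(obj)), list(obj.keys()), children)
    if isinstance(obj, (list, tuple)):
        return SketchTreeSpec(qualified_name(type(obj)), None, [sketch_spec(v) for v in obj])
    return SketchTreeSpec(None, None, [])


def sketch_dumps(spec):
    # asdict() recurses into the nested children_specs list.
    return json.dumps(asdict(spec))


def sketch_loads(data):
    def build(d):
        return SketchTreeSpec(d["type"], d["context"], [build(c) for c in d["children_specs"]])
    return build(json.loads(data))


spec = sketch_spec({"a": [1, 2], "b:c": 3})
restored = sketch_loads(sketch_dumps(spec))
```

Note that the key `"b:c"` needs no special handling here, since `json.dumps` quotes and escapes strings on its own.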

init

66b4351

C

b2df383

angelayi requested a review from zhxchen17 May 30, 2023 22:01

angelayi added the release notes: export label May 30, 2023

namedtuple+dict

6a75431

angelayi marked this pull request as ready for review May 30, 2023 23:16

angelayi mentioned this pull request May 30, 2023

[export] Initial serialization #102125

Closed

5 tasks

zhxchen17 closed this May 31, 2023

zhxchen17 reopened this May 31, 2023

avikchaudhuri approved these changes May 31, 2023

angelayi mentioned this pull request Jun 1, 2023

Serialize pytree to string v2 #102708

Closed

angelayi closed this Jun 1, 2023

pytorchmergebot pushed a commit that referenced this pull request Jun 1, 2023

Serialize pytree to string v2 (#102708)

bd0a4e2

v2 of #102577 Pull Request resolved: #102708 Approved by: https://github.com/avikchaudhuri

angelayi mentioned this pull request Jun 2, 2023

[export] Initial serialization v2 #102707

Closed

5 tasks

alimoezzi pushed a commit to alimoezzi/pytorch that referenced this pull request Jun 3, 2023

Serialize pytree to string v2 (pytorch#102708)

f038732

v2 of pytorch#102577 Pull Request resolved: pytorch#102708 Approved by: https://github.com/avikchaudhuri

angelayi mentioned this pull request Jul 27, 2023

Serialize pytree to json string #106116

Closed
