Fix ONNXProgram.save to use torch.load(..., mmap=True) for large models #117295

thiagocrepaldi · 2024-01-11T21:07:37Z

Stack from ghstack (oldest at bottom):

-> Fix ONNXProgram.save to use torch.load(..., mmap=True) for large models #117295
Update initializer path for ONNXProgram.save due to onnx.checker limitation #117294

During ONNXProgram.save, the implicit or explicit state_dict passed in must
be loaded into memory so that each initializer can be read and an
external tensor proto created for it.

This PR ensures torch.load uses memory mapping to support large models that
cannot fit in memory.
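
As a rough illustration of why memory mapping helps here, the sketch below uses only Python's standard `mmap` module. It is not PyTorch's implementation: the file layout (a flat run of little-endian float32 values) and the helper names `write_floats`, `read_slice_mmap`, and `demo` are assumptions made up for this example. The point is that a mapped file can be read by offset, so only the pages backing the requested range need to be resident rather than the whole checkpoint.

```python
import mmap
import os
import struct
import tempfile


def write_floats(path, values):
    """Write values as little-endian float32 (a stand-in for raw tensor bytes)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))


def read_slice_mmap(path, start, count):
    """Read `count` float32 values starting at element index `start` via a memory map.

    The file is mapped read-only; the OS pages in only the byte range that is
    actually touched, instead of copying the whole file into process memory.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = start * 4  # each float32 occupies 4 bytes
            return list(struct.unpack_from(f"<{count}f", mm, offset))


def demo():
    """Write 1000 floats to a temporary file and read back three from the middle."""
    values = [float(i) for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_floats(path, values)
        return read_slice_mmap(path, 500, 3)
```

In the PR itself the equivalent effect is requested through `torch.load(path, map_location="cpu", mmap=True, weights_only=True)`, so each initializer can be read from the checkpoint without first materializing every tensor in RAM.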

pytorch-bot · 2024-01-11T21:07:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/117295

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d5c3f17 with merge base b4a3563:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 5b1b31e
Pull Request resolved: #117295

BowenBao

Very nice. Do you have a case study showcasing the efficiency/effectiveness before and after?

BowenBao · 2024-01-12T01:03:18Z

torch/onnx/_internal/exporter.py

-                    extra_state_dict = torch.load(path)
+                    # Loads checkpoint using memory-map on CPU to succeed with large models
+                    extra_state_dict = torch.load(
+                        path, map_location="cpu", mmap=True, weights_only=True


Noticed weights_only=True is added. Is this intended?

thiagocrepaldi · Jan 12, 2024

It is a security feature. When weights_only=False, malicious pickled checkpoints can execute code on the machine.
If the checkpoint can be loaded with weights_only=True, we should do so, but I am still experimenting with it.
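
To make the security concern concrete, here is a minimal sketch using only Python's standard `pickle` module. It is an illustration in the same spirit as a restricted load, not PyTorch's actual `weights_only` implementation: the `Malicious` class, the `ALLOWED` allowlist, and `restricted_loads` are hypothetical names chosen for this example. It shows that a plain unpickle runs a callable chosen by the payload, while an unpickler that restricts which globals may be resolved rejects the same payload.

```python
import io
import pickle
from collections import OrderedDict


class Malicious:
    """Hypothetical payload: unpickling it calls eval() with attacker-chosen text."""

    def __reduce__(self):
        # pickle will call eval("40 + 2") when this object is deserialized.
        return (eval, ("40 + 2",))


# Globals the restricted unpickler is willing to resolve (illustrative allowlist).
ALLOWED = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only allowlisted globals; refuse everything else.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Deserialize `data`, refusing any global outside ALLOWED."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


def is_rejected(data):
    """Return True if the restricted unpickler refuses to load `data`."""
    try:
        restricted_loads(data)
    except pickle.UnpicklingError:
        return True
    return False
```

With these definitions, `pickle.loads(pickle.dumps(Malicious()))` evaluates the embedded expression, whereas `restricted_loads` raises `pickle.UnpicklingError` for the same bytes and still loads a plain `OrderedDict` of numbers.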

thiagocrepaldi · 2024-01-12T01:41:29Z

Very nice. Do you have a case study showcasing the efficiency/effectiveness before and after?

Not yet.

thiagocrepaldi · 2024-01-12T01:42:52Z

@pytorchbot merge

pytorchmergebot · 2024-01-12T01:44:52Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

thiagocrepaldi requested review from BowenBao, abock and wschin as code owners January 11, 2024 21:07

thiagocrepaldi mentioned this pull request Jan 11, 2024

Update initializer path for ONNXProgram.save due to onnx.checker limitation #117294

Closed

pytorch-bot bot added the "release notes: onnx" label (torch.onnx related changes that should show up in the release notes) Jan 11, 2024

thiagocrepaldi added the "module: onnx" (Related to torch.onnx) and "onnx-triaged" (triaged by ONNX team) labels Jan 11, 2024

pytorchbot added the open source label Jan 11, 2024

BowenBao reviewed Jan 12, 2024

BowenBao approved these changes Jan 12, 2024

BowenBao reviewed Jan 12, 2024

pytorch-bot bot added the "ciflow/trunk" label (Trigger trunk jobs on your pull request) Jan 12, 2024

pytorchmergebot added the merging label Jan 12, 2024

pytorchmergebot closed this in d29bf0a Jan 12, 2024

pytorchmergebot added Merged and removed merging labels Jan 12, 2024

facebook-github-bot deleted the gh/thiagocrepaldi/19/head branch January 15, 2024 15:23
