Fix tensor.__deepcopy__ for lazy device #73197

comaniac · 2022-02-21T23:05:43Z

Fixes a small bug: `tensor.__deepcopy__` does not account for the `lazy` device, which results in a segmentation fault when deep-copying a lazy model.
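
As a rough illustration of the kind of fix described above (not the actual patch), the sketch below models a device-type check that sends devices without accessible storage down a clone path instead of a storage-copy path. The function name `deepcopy_strategy` and the contents of `STORAGELESS_DEVICE_TYPES` are assumptions made for this example, not PyTorch identifiers.

```python
# Hypothetical, simplified model of a device-type branch in a __deepcopy__
# implementation. The set below is illustrative only; it is not copied from
# PyTorch's source.
STORAGELESS_DEVICE_TYPES = {"xla", "meta", "lazy"}


def deepcopy_strategy(device_type: str) -> str:
    """Return which copy path a tensor on `device_type` would take."""
    if device_type in STORAGELESS_DEVICE_TYPES:
        # No storage to copy, so the tensor has to be cloned as a whole.
        return "clone"
    # Otherwise the copy can go through the underlying storage.
    return "storage"


print(deepcopy_strategy("lazy"))
print(deepcopy_strategy("cpu"))
```

If `"lazy"` were missing from the set, a lazy tensor would fall through to the storage path, which matches the failure reported in this PR.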

facebook-github-bot · 2022-02-21T23:05:47Z

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

pytorch-bot · 2022-02-21T23:05:47Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/comaniac/pytorch/blob/8aedb8eedb0816bc2e09f19c7a2159acc4b461ec/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
linux-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
linux-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
linux-binary-manywheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
linux-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/trunk`	✅ triggered
linux-bionic-rocm4.5-py3.7	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/rocm`, `ciflow/trunk`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-vulkan-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
macos-arm64-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
macos-arm64-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
macos-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
macos-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
macos-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
macos-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
windows-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
windows-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
windows-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
docker-builds	`ciflow/all`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`, `ciflow/trunk`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.1-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
periodic-win-vs2019-cuda11.5-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`, `ciflow/xla`	🚫 skipped

facebook-github-bot · 2022-02-21T23:05:48Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/73197
📄 Preview docs built from this PR
📄 Preview C++ docs built from this PR
🔧 Opt-in to CIFlow to control what jobs run on your PRs

💊 CI failures summary and remediations

As of commit 8aedb8e (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


facebook-github-bot · 2022-02-22T00:08:50Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

anijain2305 · 2022-02-22T05:55:39Z

@wconstab Can you please take a look?

wconstab · 2022-02-22T17:47:47Z

@comaniac Thanks, this looks right at first glance but can you say how you tested it? I will verify it on our staging branch too.

comaniac · 2022-02-22T18:00:37Z

> @comaniac Thanks, this looks right at first glance but can you say how you tested it? I will verify it on our staging branch too.

Thanks @wconstab for the review. I basically just did the following.

import copy
import torchvision

model = torchvision.models.resnet18(num_classes=1000)
model.to(device="lazy")  # requires a registered lazy backend (see Note 1)
model.train()
copy.deepcopy(model)  # error

The snippet above triggers the following error. It looks like PyTorch tries to access the storage of a lazy tensor when deep-copying the model, although a lazy tensor being a model parameter doesn't make sense to me.

[E tensor_impl.cpp:171] Lazy tensors do not have storage

free(): double free detected in tcache 2
Fatal Python error: Aborted

Current thread 0x00007fcea69252c0 (most recent call first):
  File "/home/ubuntu/anaconda3/envs/py37_torch/lib/python3.7/site-packages/torch/_tensor.py", line 181 in storage
  File "/home/ubuntu/anaconda3/envs/py37_torch/lib/python3.7/site-packages/torch/_tensor.py", line 98 in __deepcopy__
  File "/home/ubuntu/anaconda3/envs/py37_torch/lib/python3.7/copy.py", line 161 in deepcopy
  File "/home/ubuntu/anaconda3/envs/py37_torch/lib/python3.7/copy.py", line 307 in _reconstruct
  ...

Note 1: I registered the lazy backend myself, in much the same way as Torch/XLA does.
Note 2: I also tested with LeNet-5, which didn't trigger this bug. I haven't looked into why.
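
The traceback above shows `copy.deepcopy` reaching the tensor's `__deepcopy__` through `_reconstruct`. A minimal sketch in plain Python, with no PyTorch involved, shows how deep-copying a container object ends up calling the `__deepcopy__` hook of each attribute. `FakeParam` and `FakeModel` are hypothetical stand-ins invented for this example.

```python
import copy


class FakeParam:
    """Hypothetical stand-in for a tensor; not a PyTorch class."""

    def __init__(self, device_type):
        self.device_type = device_type
        self.copied_via = None

    def __deepcopy__(self, memo):
        # copy.deepcopy calls this hook instead of its generic reconstruction.
        new = FakeParam(self.device_type)
        new.copied_via = "__deepcopy__"
        memo[id(self)] = new
        return new


class FakeModel:
    """Hypothetical stand-in for a module holding a single parameter."""

    def __init__(self):
        self.weight = FakeParam("lazy")


# FakeModel defines no __deepcopy__, so copy.deepcopy rebuilds it generically
# and deep-copies its attributes, which invokes FakeParam.__deepcopy__.
model_copy = copy.deepcopy(FakeModel())
print(model_copy.weight.copied_via)
```

This is why a bug inside a tensor's `__deepcopy__` surfaces when copying a whole model, even though the user never calls it directly.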

facebook-github-bot · 2022-02-22T18:16:53Z

@wconstab has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: A small bug that misses `lazy` in tensor.__deepcopy__, which results in segmentation when deepcopy a lazy model.
Pull Request resolved: #73197
Reviewed By: jbschlosser
Differential Revision: D34394482
Pulled By: wconstab
fbshipit-source-id: c84fdb9b3a827677971fd3477a92679d7dbce3c0

Summary: A small bug that misses `lazy` in tensor.__deepcopy__, which results in segmentation when deepcopy a lazy model.
Pull Request resolved: pytorch/pytorch#73197
Reviewed By: jbschlosser
Differential Revision: D34394482
Pulled By: wconstab
fbshipit-source-id: c84fdb9b3a827677971fd3477a92679d7dbce3c0
(cherry picked from commit c003d150cea969a6595ef8004ea82596fb9431b6)

Fix tensor.__deepcopy__ for lazy device

8aedb8e

pytorch-bot bot added the ciflow/default label Feb 21, 2022

pytorchbot added the open source label Feb 21, 2022

facebook-github-bot added the cla signed label Feb 22, 2022

wconstab added a commit that referenced this pull request Feb 22, 2022

Fix tensor.__deepcopy__ for lazy device #73197

e3f13e6

- parallel change to that submitted on master by @comaniac

wconstab self-requested a review February 22, 2022 18:16

wconstab approved these changes Feb 22, 2022

pytorchmergebot closed this in 1ef244e Feb 23, 2022

comaniac deleted the patch-1 branch February 23, 2022 02:37

wconstab added a commit that referenced this pull request Feb 24, 2022

Fix tensor.__deepcopy__ for lazy device #73197 (#73216)

f2da2e1

- parallel change to that submitted on master by @comaniac
