

Add a link in RPC doc page to point to PT Distributed overview #41108

Closed
wants to merge 2 commits into from


mrshenli
Contributor

@mrshenli mrshenli commented Jul 8, 2020

Stack from ghstack:

Differential Revision: D22440751

@mrshenli
Contributor Author

mrshenli commented Jul 8, 2020

@jlin27 please let me know whether this stack should first go into master and then be cherry-picked to release/1.6, or whether it should go directly into release/1.6. Thanks!

@mrshenli mrshenli requested a review from jlin27 July 8, 2020 01:58
@dr-ci

dr-ci bot commented Jul 8, 2020

💊 CI failures summary and remediations

As of commit 6321236 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test"

Jul 08 14:46:19 [E request_callback_impl.cpp:168] Received error while processing request type 2: PickleError: ScriptModules cannot be deepcopied using copy.deepcopy or saved using torch.save. Mixed serialization of script and non-script modules is not supported. For purely script modules use my_script_module.save(<filename>) instead.
Jul 08 14:46:19   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Jul 08 14:46:19   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Jul 08 14:46:19  
Jul 08 14:46:19 [E request_callback_impl.cpp:168] Received error while processing request type 2: PickleError: ScriptModules cannot be deepcopied using copy.deepcopy or saved using torch.save. Mixed serialization of script and non-script modules is not supported. For purely script modules use my_script_module.save(<filename>) instead. 
Jul 08 14:46:19  
Jul 08 14:46:19 At: 
Jul 08 14:46:19   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/jit/__init__.py(1154): __getstate__ 
Jul 08 14:46:19   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Jul 08 14:46:19   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Jul 08 14:46:19  
Jul 08 14:46:19 ok (1.336s) 
Jul 08 14:46:20   test_unexepected_kwarg_is_specified (__main__.JitRpcTestWithSpawn) ... ok (1.297s) 
Jul 08 14:46:22   test_user_rrefs_confirmed (__main__.JitRpcTestWithSpawn) ... ok (1.273s) 
Jul 08 14:46:23   test_user_rrefs_confirmed_remote (__main__.JitRpcTestWithSpawn) ... ok (1.260s) 

❄️ 1 failure tentatively classified as flaky, but reruns have not yet been triggered to confirm:

See CircleCI build pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32 (1/1)

Step: "pytorch android gradle build only x86_32 (for PR)" ❄️

Error response from daemon: manifest for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:fff7795428560442086f7b2bb6004b65245dc11a-63212364ed248b48726d9945065143a94e22ce5f-android-x86_32 not found
docker_image_libtorch_android_x86_32: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:fff7795428560442086f7b2bb6004b65245dc11a-63212364ed248b48726d9945065143a94e22ce5f-android-x86_32 

🚧 3 fixed upstream failures:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch:

Since your merge base is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD



ci.pytorch.org: 2 failed


This comment was automatically generated by Dr. CI.

@jlin27
Contributor

jlin27 commented Jul 8, 2020

@mrshenli - Please have this first go into master and then cherry-pick to release/1.6. Thanks!

@mrshenli mrshenli requested a review from apaszke as a code owner July 8, 2020 20:19
mrshenli added a commit that referenced this pull request Jul 8, 2020
ghstack-source-id: 43f192adc53d48feb141034b82c2f77292ee3ef1
Pull Request resolved: #41108
mrshenli added a commit to mrshenli/pytorch that referenced this pull request Jul 8, 2020
…ch#41108)

Summary: Pull Request resolved: pytorch#41108

Test Plan: Imported from OSS

Differential Revision: D22440751

Pulled By: mrshenli

fbshipit-source-id: 9e7b002091a3161ae385fdfcc26484ae8fc243bb
@facebook-github-bot
Contributor

@mrshenli merged this pull request in 0edbe6b.

malfet pushed a commit that referenced this pull request Jul 9, 2020
… (#41156)

Summary: Pull Request resolved: #41108

Test Plan: Imported from OSS

Differential Revision: D22440751

Pulled By: mrshenli

fbshipit-source-id: 9e7b002091a3161ae385fdfcc26484ae8fc243bb
@facebook-github-bot facebook-github-bot deleted the gh/mrshenli/208/head branch July 12, 2020 14:18
4 participants