peterbell10
Collaborator

Closes #36977

This avoids the division by zero that was causing NaNs to appear in the output. AvgPooling2d and AvgPooling3d both had this issue on CPU and CUDA.
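
For readers unfamiliar with the failure mode, here is a minimal pure-Python sketch of how an average-pooling window can end up empty and how a guard avoids the 0/0. It is not the PyTorch kernel: the function name `avg_pool1d_sketch`, the 1-D simplification, the explicitly passed `out_len`, and the choice to emit 0 for an empty window are all assumptions made for illustration.

```python
def avg_pool1d_sketch(xs, kernel, stride, pad, out_len):
    """Average-pool a 1-D list, excluding padding from the divisor.

    Hypothetical helper for illustration only. out_len is passed in
    explicitly so that an out-of-range window can be demonstrated.
    """
    n = len(xs)
    out = []
    for i in range(out_len):
        # Window bounds in padded coordinates, clamped to the real input.
        start = max(i * stride - pad, 0)
        end = min(i * stride - pad + kernel, n)
        if start >= end:
            # Empty window: there is nothing to average. A float kernel
            # that divided here would compute 0/0 = NaN (plain Python
            # would raise ZeroDivisionError instead), so emit 0.
            out.append(0.0)
            continue
        out.append(sum(xs[start:end]) / (end - start))
    return out


# The third window starts at index 4, past the end of a 4-element input.
result = avg_pool1d_sketch([1.0, 2.0, 3.0, 4.0], kernel=2, stride=2, pad=0, out_len=3)
print(result)  # [1.5, 3.5, 0.0]
```

The extra comparison per output element is the kind of added work that could plausibly show up as a small CPU slowdown.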

@dr-ci

dr-ci bot commented Jul 13, 2020

💊 CI failures summary and remediations

As of commit 0b00129 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test"

Jul 13 16:44:18 [E request_callback_impl.cpp:168] Received error while processing request type 2: PickleError: ScriptModules cannot be deepcopied using copy.deepcopy or saved using torch.save. Mixed serialization of script and non-script modules is not supported. For purely script modules use my_script_module.save() instead.
Jul 13 16:44:18   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Jul 13 16:44:18   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Jul 13 16:44:18  
Jul 13 16:44:18 [E request_callback_impl.cpp:168] Received error while processing request type 2: PickleError: ScriptModules cannot be deepcopied using copy.deepcopy or saved using torch.save. Mixed serialization of script and non-script modules is not supported. For purely script modules use my_script_module.save(<filename>) instead. 
Jul 13 16:44:18  
Jul 13 16:44:18 At: 
Jul 13 16:44:18   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/jit/_script.py(570): __getstate__ 
Jul 13 16:44:18   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Jul 13 16:44:18   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Jul 13 16:44:18  
Jul 13 16:44:18 [E request_callback_impl.cpp:168] Received error while processing request type 2: PickleError: ScriptModules cannot be deepcopied using copy.deepcopy or saved using torch.save. Mixed serialization of script and non-script modules is not supported. For purely script modules use my_script_module.save(<filename>) instead. 
Jul 13 16:44:18  
Jul 13 16:44:18 At: 
Jul 13 16:44:18   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/jit/_script.py(570): __getstate__ 
Jul 13 16:44:18   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Jul 13 16:44:18   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Jul 13 16:44:18  
Jul 13 16:44:19 ok (2.252s) 
Jul 13 16:44:21   test_unexepected_kwarg_is_specified (__main__.JitRpcTestWithSpawn) ... ok (2.494s) 
Jul 13 16:44:24   test_user_rrefs_confirmed (__main__.JitRpcTestWithSpawn) ... ok (2.123s) 
Jul 13 16:44:26   test_user_rrefs_confirmed_remote (__main__.JitRpcTestWithSpawn) ... ok (2.316s) 

This comment was automatically generated by Dr. CI.

@ezyang
Contributor

ezyang commented Jul 14, 2020

Needs benchmarks before and after

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@ezyang ezyang left a comment

Logic looks pretty reasonable. Benchmarking should show a perf hit but hopefully it is not too bad.

@facebook-github-bot
Contributor

@ezyang merged this pull request in 87bf04f.

@peterbell10
Collaborator Author

peterbell10 commented Jul 15, 2020

I've run the long benchmarks for AvgPool2d and AvgPool3d. The CUDA benchmarks show no measurable difference; the CPU benchmarks show up to a 1% slowdown for AvgPool2d and up to 2% for AvgPool3d.

facebook-github-bot pushed a commit that referenced this pull request Jul 15, 2020
Summary:
Related to #41368

These benchmarks already support CUDA, so there is no reason for CUDA not to be in the benchmark config.

Pull Request resolved: #41438

Reviewed By: zhangguanheng66

Differential Revision: D22540756

Pulled By: ezyang

fbshipit-source-id: 621eceff37377c1ab06ff7483b39fc00dc34bd46
Development

Successfully merging this pull request may close these issues.

Under ceil_mode=True for AvgPooling2d, Pytorch fails in calculating pooling output shape as expected, and gets NaN results.