[PyTorch] Use real if constexpr behind macro in hot template#51368
[PyTorch] Use real if constexpr behind macro in hot template#51368swolchok wants to merge 7 commits intogh/swolchok/105/basefrom
Conversation
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit f309519 (more details on the Dr. CI page):
Extra GitHub checks: 1 failed
This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group. |
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 120697490 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 120761381 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 121256867 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 121300917 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 121363469 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
…hind macro in hot template" This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/) [ghstack-poisoned]
Pull Request resolved: #51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 121378959 Differential Revision: [D26154964](https://our.internmc.facebook.com/intern/diff/D26154964/)
|
Do we have any data on the perf benefit from this? I can believe the lambdas in the polyfill API add overhead (or inline disruption risk, etc.) over the native feature, but given that we'll want to use this kind of surface-level #ifdef sparingly for readability's sake, it'd be good to establish a ballpark quantitative threshold for when it's justified. |
bhosmer
left a comment
There was a problem hiding this comment.
Ah nvm my comment, saw the numbers in D26154964. To answer my own question, I'd say ~8% reduction in build time makes the surface-level splintering tolerable.
(BTW, wasn't marked as a reviewer on GH and missed it on Phab, hence the delay, apologies)
|
@bhosmer note that IIUC, Ubuntu 16.04 LTS is the blocker for raising our minimum C++ version to C++17. It hits end of life in April, so we should be able to move to C++17 and use |
Yup! @smessmer actually has the move to C++17 on his H1 roadmap, so you're right, it'll clean up pretty quick. |
|
This pull request has been merged in 6e1a5b1. |
…#51368) Summary: Pull Request resolved: pytorch#51368 This seems to noticably reduce build times, at least for RegisterCPU.cpp. It makes sense that a compiler builtin would be faster than simulating the same builtin with templates. Identified with templight. ghstack-source-id: 121378959 Test Plan: Confirmed this speeds up RegisterCPU.cpp optimized build by simply running builds under `time(1)`: previous diff: [50.53, 50.41, 50.57, 50.67, 50.94] mean: 50.6 std: 0.179 this diff: [45.71, 45.89, 46.21, 48.51, 45.84] mean: 46.4 std: 1.05 Reviewed By: bhosmer Differential Revision: D26154964 fbshipit-source-id: 62ee2f5a872007db032dfebf7ad4d1b6e1ce63d1
Stack from ghstack:
This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.
Identified with templight.
Differential Revision: D26154964