swolchok (Contributor) commented Apr 17, 2025

It ends up being templated over a bunch of reference-to-array-of-characters types with different lengths, such as `char const (&) [88]`, which is an annoyance when profiling and possibly a source of code bloat.
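
As an illustration of the instantiation fan-out described here, the following is a minimal C++17 sketch. `DeducedSchema`, `templated_def`, and `plain_def` are hypothetical stand-ins, not the actual `torch::Library` code. The sketch only shows that a forwarding-reference template deduces a distinct array-reference type for each string literal length, whereas a `const char*` parameter yields a single function.

```cpp
#include <type_traits>

// Hypothetical helper that records the type deduced for Schema.
template <typename Schema>
struct DeducedSchema {
  using type = Schema;
};

// Hypothetical stand-in for a def() templated on a forwarding reference.
// A string literal is an lvalue of type `const char[N]`, so Schema deduces to
// `const char (&)[N]` and each distinct N yields its own instantiation.
template <typename Schema>
constexpr DeducedSchema<Schema> templated_def(Schema&&) {
  return {};
}

// Hypothetical non-template alternative: literals of any length decay to
// `const char*`, so there is one function no matter how many call sites exist.
constexpr bool plain_def(const char* raw_schema) {
  return raw_schema != nullptr;
}

using ShortLiteral = decltype(templated_def("a() -> ()"))::type;
using LongLiteral =
    decltype(templated_def("add(Tensor a, Tensor b) -> Tensor"))::type;

// Nine characters plus the terminating NUL give an array of 10.
static_assert(std::is_same_v<ShortLiteral, const char (&)[10]>);
// Different literal lengths produce different template arguments.
static_assert(!std::is_same_v<ShortLiteral, LongLiteral>);
```

With many schema strings of different lengths, the templated form would be instantiated once per length, which is consistent with the profiling annoyance mentioned above.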

Differential Revision: [D73129450](https://our.internmc.facebook.com/intern/diff/D73129450/)

[ghstack-poisoned]

pytorch-bot bot commented Apr 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151626

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 7ca4a4b with merge base bedefa4:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot (Contributor) commented

This pull request was exported from Phabricator. Differential Revision: D73129450

pytorch-bot added the ciflow/trunk label Apr 18, 2025

albanD (Collaborator) left a comment

I have two questions:

  • Is there any impact on import time, since we have so many of these defs?
  • There are a lot of calls to this function from outside of core. Is this change BC-breaking for our C++ API?

malfet (Contributor) left a comment

Seems fine, but let's slap a BC-breaking label on it, as it could potentially result in some failures.
And if these changes are about perf, it would be nice to include some before/after stats on either binary size or import time.

torch/library.h (outdated)

      Library& def(
    -     Schema&& raw_schema,
    +     c10::FunctionSchema s,

Why not pass it by reference if we expect it to be moved anyway?

swolchok (Contributor, Author) commented Apr 18, 2025

Pass-by-value is a succinct way to implement "move if we can, copy if we have to". However, I checked the C++ Core Guidelines, and they appear to recommend the more efficient approach of having two overloads, one for const ref and one for rvalue ref, so I'll do that.

swolchok (Contributor, Author) commented

Turns out this wouldn't have built in the first place with `const c10::FunctionSchema&`, so we can just use an rvalue ref directly.
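
For readers comparing the two parameter-passing styles discussed in this thread, here is a minimal C++17 sketch. `Payload`, `store_by_value`, `store_by_rvalue_ref`, and the three helper functions are hypothetical names introduced for illustration; `Payload` is not `c10::FunctionSchema`. It counts copies and moves to show what a by-value sink costs relative to an rvalue-reference sink.

```cpp
#include <utility>

// Hypothetical payload that counts the copies and moves in its history; a
// stand-in for an expensive-to-copy type, not c10::FunctionSchema.
struct Payload {
  int copies = 0;
  int moves = 0;
  constexpr Payload() = default;
  constexpr Payload(const Payload& other)
      : copies(other.copies + 1), moves(other.moves) {}
  constexpr Payload(Payload&& other) noexcept
      : copies(other.copies), moves(other.moves + 1) {}
};

// Pass-by-value sink: accepts lvalues and rvalues, at the cost of one extra
// move compared with a dedicated overload.
constexpr Payload store_by_value(Payload p) {
  return Payload(std::move(p));
}

// Rvalue-reference sink: accepts only rvalues and performs a single move.
constexpr Payload store_by_rvalue_ref(Payload&& p) {
  return Payload(std::move(p));
}

// Lvalue argument: copied into the parameter, then moved into the result.
constexpr Payload by_value_from_lvalue() {
  Payload original{};
  return store_by_value(original);
}

// Rvalue argument: moved into the parameter, then moved into the result.
constexpr Payload by_value_from_rvalue() {
  Payload original{};
  return store_by_value(std::move(original));
}

// Rvalue argument bound by reference: moved once, into the result.
constexpr Payload by_rvalue_ref() {
  Payload original{};
  return store_by_rvalue_ref(std::move(original));
}

static_assert(by_value_from_lvalue().copies == 1 &&
              by_value_from_lvalue().moves == 1);
static_assert(by_value_from_rvalue().copies == 0 &&
              by_value_from_rvalue().moves == 2);
static_assert(by_rvalue_ref().copies == 0 && by_rvalue_ref().moves == 1);
```

When no caller needs to pass an lvalue, the rvalue-reference form alone is sufficient and avoids the extra move.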

malfet added the topic: bc breaking and module: cpp labels Apr 18, 2025
swolchok (Contributor, Author) commented Apr 18, 2025

Re: bc breaking, can you elaborate on the argument types that you think worked before this PR and won't work after? I think it should be fine and if you can point to a type I missed, we can just cover it.

To elaborate, prior to this change, def() would only compile if Schema&& could be passed to torch::schema(). torch::schema() is overloaded, not templated, and can accept either a const char* or a c10::FunctionSchema&&. I believe I covered those types with this change.
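
The argument above can be sketched in C++17 as follows. Everything here (`FakeSchema`, `make_schema`, `FakeLibrary`, `CanDef`) is a hypothetical stand-in chosen for illustration and is not the PyTorch API; the sketch assumes only what the comment states, namely an overloaded, non-template schema function taking either `const char*` or an rvalue schema. A detection trait then checks which argument types a matching pair of `def()` overloads accepts.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for c10::FunctionSchema.
struct FakeSchema {
  std::string text;
};

// Hypothetical stand-ins for an overloaded, non-template schema() function.
inline FakeSchema make_schema(const char* raw) { return FakeSchema{raw}; }
inline FakeSchema make_schema(FakeSchema&& parsed) { return std::move(parsed); }

// Hypothetical library whose def() mirrors the two overloads above.
struct FakeLibrary {
  std::string last_schema;
  FakeLibrary& def(const char* raw) {
    last_schema = make_schema(raw).text;
    return *this;
  }
  FakeLibrary& def(FakeSchema&& parsed) {
    last_schema = make_schema(std::move(parsed)).text;
    return *this;
  }
};

// Detects whether `library.def(arg)` compiles for a given argument type.
template <typename Arg, typename = void>
struct CanDef : std::false_type {};
template <typename Arg>
struct CanDef<Arg, std::void_t<decltype(std::declval<FakeLibrary&>().def(
                       std::declval<Arg>()))>> : std::true_type {};

static_assert(CanDef<const char (&)[88]>::value);  // string literal
static_assert(CanDef<const char*>::value);         // plain C string
static_assert(CanDef<FakeSchema>::value);          // rvalue schema
static_assert(!CanDef<FakeSchema&>::value);        // lvalue schema is rejected
```

Under these assumptions, the two non-template overloads accept the same argument categories that the overloaded schema function itself accepts.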

swolchok (Contributor, Author) commented

> if those changes are about perf, would be nice to include some before/after stats on either binary size or import time

The motivation here was convenience while profiling; it's annoying to see stack traces fan out into a bunch of different instantiations of `def`. That said, I had expected `def` to get boiled away by inlining, so I checked the assembly for RegisterSchema.cpp in an OSS PyTorch build on my Mac (just `python setup.py develop`, which seems to end up using -O2 despite a mention of -O3). Apparently `_def` was getting inlined into `def` instead. The size improvement was roughly 85500 bytes in RegisterSchema.cpp.o alone.

albanD (Collaborator) commented Apr 18, 2025

> Re: bc breaking, can you elaborate on the argument types that you think worked before this PR and won't work after? I think it should be fine and if you can point to a type I missed, we can just cover it.

I don't have a concrete example off the top of my head. But our standard approach for BC-breaking changes is that the PR author has to prove that everything that was possible before is still possible; it is not up to the reviewer to come up with an example where it breaks.
This is because we don't control call sites, and quite a lot of them are not even public code. So we have to consider the worst-case scenario all the time to avoid breaking users.

If you're saying only these two types worked before and they still work after this change, that sounds good to me!
But then we don't need the bc-breaking label?

pytorchmergebot (Collaborator) commented

Starting merge as part of PR stack under #151630

pytorchmergebot pushed a commit that referenced this pull request Apr 18, 2025
1) reserving is much better than not reserving
2) std::transform for a 1-line-body loop is generally not considered to be an improvement (and doesn't seem to get boiled away by clang under -Oz)

Differential Revision: [D73013363](https://our.internmc.facebook.com/intern/diff/D73013363/)
Pull Request resolved: #151627
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: #151626
pytorchmergebot pushed a commit that referenced this pull request Apr 18, 2025
Clear missing reserve (we should expect that pieces are not empty).

Differential Revision: [D73129445](https://our.internmc.facebook.com/intern/diff/D73129445/)

Pull Request resolved: #151628
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: #151626, #151627
pytorchmergebot pushed a commit that referenced this pull request Apr 18, 2025
Observed several ms taken during `import torch` by c10::str here.

Differential Revision: [D73129453](https://our.internmc.facebook.com/intern/diff/D73129453/)
Pull Request resolved: #151629
Approved by: https://github.com/cyyever, https://github.com/Skylion007, https://github.com/albanD, https://github.com/malfet
ghstack dependencies: #151626, #151627, #151628
pytorchmergebot pushed a commit that referenced this pull request Apr 18, 2025
No need to create an AliasInfo...unless we need it.

Differential Revision: [D73129452](https://our.internmc.facebook.com/intern/diff/D73129452/)

Pull Request resolved: #151630
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: #151626, #151627, #151628, #151629
pytorchmergebot pushed a commit that referenced this pull request Apr 21, 2025
…KeySet tests (#151697)

Doesn't seem to be a reason to have two test files for this.

Differential Revision: [D73274020](https://our.internmc.facebook.com/intern/diff/D73274020/)
Pull Request resolved: #151697
Approved by: https://github.com/Skylion007
ghstack dependencies: #151626, #151627, #151628, #151629, #151630
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
It ends up being templated over a bunch of reference-to-array-of-characters types with different lengths, such as `char const (&) [88]`, which is an annoyance when profiling and possibly a source of code bloat.

Pull Request resolved: pytorch/pytorch#151626
Approved by: https://github.com/Skylion007, https://github.com/malfet


**GitHub Author**: Scott Wolchok <swolchok@meta.com> (Meta Employee)
**GitHub Repo**: [pytorch/pytorch](https://github.com/pytorch/pytorch)
**GitHub Pull Request**: [#151626](pytorch/pytorch#151626)

Labels: ciflow/trunk, fb-exported, Merged, module: cpp, topic: not user facing