-
Notifications
You must be signed in to change notification settings - Fork 13.8k
arg: slightly reduce compilation time #17324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Quite an interesting thing, here is the breakdown of the compilation time of each object: https://gist.github.com/ngxson/5569a8ec8f380a2c51df029feab260da Summarized by AI:
|
|
Have you considered using precompiled headers? I've had good luck with them in other projects, at least with msvc. |
I tried to address #17329 using pre-compiled header but it doesn't resolve the problem. Free free to give it a try on your side. For the current PR, I'm not quite sure if pre-compiled header can help, as much of the time spent on creating copy constructor of |
ggerganov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't observe a significant compile-time reduction with this change on M2 Ultra. The reported 15% speed-up is good, though it seems a bit too large for such a change. If you can confirm it then I guess it's ok. Otherwise, if the speedup is small, I'd recommend keeping the old idiomatic l-value reference implementation.
|
@ggerganov Thanks for testing. I'll hold off this PR a bit more for verification. One of the goal of this PR is to avoid copy constructor, though I still haven't yet disassemble the compile code to verify that. Will check and merge if this really prevents some copies. |
| * - if both {LLAMA_EXAMPLE_COMMON, LLAMA_EXAMPLE_*,} are set, we will prioritize the LLAMA_EXAMPLE_* matching current example | ||
| */ | ||
| auto add_opt = [&](common_arg arg) { | ||
| auto add_opt = [&](common_arg && arg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is the only change that is needed, and which actually avoids copying.
All the other changes shouldn't make a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I just realized the change makes it an R-value.
I think, just change it to common_arg & (regular L-value reference). It still allows moving, should guarantee no copies, and no other changes needed. It's 10% faster for me compared to master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how it supposes to compile without other changes?
......./common/arg.cpp:771:5: error: no matching function for call to object of type '(lambda at ......./common/arg.cpp:720:20)'
771 | add_opt(common_arg(
| ^~~~~~~
......./common/arg.cpp:720:20: note: candidate function not viable: expects an rvalue for 1st argument
720 | auto add_opt = [&](common_arg && arg) {
| ^ ~~~~~~~~~~~~~~~~~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common_arg & doesn't work either, please test before comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, I tested it with MSVC and it works, but it's technically not correct (to bind temporary to l-value ref).
Hm I really wanted there to be a simpler solution, because the way the functions work now they implicitly transform L-values into R-values, which I think is not nice.
To illustrate the problem:
common_arg arg;
std::vector<common_arg> vec;
vec.push_back(arg.set_sparam());
// use arg here is UB, because set_sparam made it possible to be moved into the vector, which is not obvious
To avoid that, the functions can be declared like this:
common_arg && set_sparam() &&; // can only be callsed if it is a R-value already
but then you typically have to supply L-value overloads too, and that's a lot of boilerplate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a l-value overload as we never use that case in practice. The current infrastructure of arg.cpp is currently built on many assumptions like that, mainly for syntactic sugar.
But I'd agree that we're maybe going for too many boilerplates and hacks for too little improvement, I think the current PR is not worth merging as @ggerganov suggested above. Closing it now and maybe revisit it later if there are better ideas.
Tested on my Macbook M3 Max, time reduced from ~13.46s to ~11.33s
master
PR