#include <fmt/format.h> is expensive on Debian Testing #1998

Closed
ldalessa opened this issue Nov 9, 2020 · 12 comments

ldalessa commented Nov 9, 2020

I'd like to provide format capabilities to my objects, however #include <fmt/format.h> adds .2s - .25s time to every translation unit that includes the resulting header. This doesn't depend on any use of fmt::, it's just the include.

This might not seem like a lot, but can add up to meaningful times in larger projects. This seems like quite a large opportunity-cost just to provide the capability.

As a comparison, #include <iostream> adds a difficult-to-measure 0.02s, and #include <ranges> isn't adding any measurable time on my debian-testing platform.

@vitaut
Contributor

vitaut commented Nov 10, 2020

The cost comes from the common standard library includes that are likely to be transitively included anyway pretty much in every translation unit in a nontrivial C++ project. You can further improve compile times using precompiled headers.

That said I moved two includes, <functional> and <vector>, to a separate header in e01d26e since they are non-essential.

With this change ~90% of the compile time for fmt/core.h comes from 4 standard library includes, e.g. <string> highlighted in this time trace:

[image: time trace with <string> highlighted]

(fmt/format.h results are similar, with just a few more includes.)

Also 20 ms for <iostream> looks suspiciously low. For example

% cat > test.cc << EOF
#include <iostream>
EOF
time c++ -c test.cc
c++ -c test.cc  0.29s user 0.05s system 98% cpu 0.337 total

or 337 ms on clang/macOS/libc++ (and this is one of the faster runs =)). I guess it could depend on the standard library implementation, but I would be surprised to see a 15x difference. There is probably something wrong with the measurement.

@vitaut vitaut closed this as completed Nov 10, 2020
@ldalessa
Author

Yes, it looks like my iostream number was off by an order of magnitude, I see ~0.20 as well now.

However when I compile code with -std=c++20 I'm seeing dramatically different timing than you.

I was under the impression that I need <fmt/format.h> in order to provide formatting for my own classes, not <fmt/core.h>.

ldalessa@portland:~/test$ cat test.cc 
#include <fmt/format.h>
ldalessa@portland:~/test$ time g++-10.2 -std=c++20 -c test.cc

real	0m1.172s
user	0m1.134s
sys	0m0.038s
ldalessa@portland:~/test$ time clang++-10 -std=c++20 -c test.cc

real	0m0.847s
user	0m0.783s
sys	0m0.045s

I'm willing to believe that this is user error, but it is 100% repeatable on debian.

@vitaut
Contributor

vitaut commented Nov 10, 2020

You need fmt/format.h only if you want to reuse existing formatter specializations; otherwise you can just forward declare formatter and provide a specialization with just fmt/core.h. If you need fmt/format.h, then make sure to use {fmt} 7.0 or later because it has much better compile times: https://github.com/fmtlib/fmt/releases/tag/7.0.0.

@vitaut
Contributor

vitaut commented Nov 10, 2020

Unfortunately Debian only provides versions 5.x and 6.x at the moment: https://packages.debian.org/search?keywords=libfmt-dev but it's easy to install a newer one from source or embed it in your project.

@ldalessa
Author

I'm using trunk. You should try your test with -std=c++20.

@ldalessa
Author

Is there a forward declaration of format in core.h?

@ldalessa
Author

Oh, you mean that it's expected that fmt/format.h should take 1.3s to include, but core.h is better. So if there were a forward declare in core.h then I wouldn't have run into this at all. I see.

I don't know why the compile time goes through the roof for format.h and -std=c++20.

Thanks for looking at this.

@vitaut
Contributor

vitaut commented Nov 10, 2020

> you mean that it's expected that fmt/format.h should take 1.3s to include, but core.h is better.

Sort of. 1.3s is still surprisingly slow but fmt/core.h should be faster by design.

On macOS/clang I see that -std=c++2a is somewhat slower but not dramatically:

% time clang++ -std=c++11 -c test.cc -I include
clang++ -std=c++11 -c test.cc -I include  0.23s user 0.04s system 98% cpu 0.267 total
% time clang++ -std=c++2a -c test.cc -I include
clang++ -std=c++2a -c test.cc -I include  0.26s user 0.03s system 99% cpu 0.301 total

I'll check what's happening on Debian testing, thanks for reporting.

@vitaut vitaut reopened this Nov 10, 2020
@vitaut vitaut changed the title from "#include <fmt/format.h> is expensive" to "#include <fmt/format.h> is expensive on Debian Testing" Nov 10, 2020
@vitaut
Contributor

vitaut commented Nov 10, 2020

I managed to repro this on Debian Testing. The time trace shows that the problematic include is <algorithm> (highlighted) that takes 613ms out of the total 869ms when compiled with -std=c++20.

[image: time trace with <algorithm> highlighted]

Looks like the new algorithms aren't cheap =).
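
For anyone who wants to check whether <algorithm> is bringing in the C++20 range algorithms on their own toolchain, here is a minimal sketch. It relies on the standard feature-test macro __cpp_lib_ranges, which <algorithm> is expected to define when range support is present. The helper names algorithm_declares_ranges and sorted_copy are made up for this example, and a false result only says the macro was not visible through <algorithm>; it says nothing about how long the header takes to parse.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true when <algorithm> alone exposes the C++20
// range algorithms, as signalled by the __cpp_lib_ranges feature-test macro.
constexpr bool algorithm_declares_ranges() {
#if defined(__cpp_lib_ranges)
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: sorts a copy, using the range overload only when
// <algorithm> has declared it (typically with -std=c++20 or later).
std::vector<int> sorted_copy(std::vector<int> v) {
#if defined(__cpp_lib_ranges)
  std::ranges::sort(v);  // declared by <algorithm> itself, no <ranges> include
#else
  std::sort(v.begin(), v.end());  // classic iterator-pair overload
#endif
  return v;
}
```

Compiling this once with -std=c++17 and once with -std=c++20 and comparing the result of algorithm_declares_ranges() should show whether the extra declarations are part of what the compiler has to parse.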

@ldalessa
Author

Ouch. It's possible that <algorithm> sucks in the range-based APIs that all make heavy use of <concepts>. So this isn't directly a fmt problem and should hopefully get better on its own. Thanks for sorting this out.

In the meantime I'll try and figure out what the declaration of formatter looks like so that I can avoid these headers altogether unless the types are actually used by fmt::print.
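
A rough sketch of the general pattern, in case it helps later readers: a primary template is only declared in a cheap header, the user's header specializes it, and the expensive machinery is included only by translation units that actually format something. Everything here is a stand-in: the namespace demo, text_sink, and to_text are hypothetical and are not fmt's API. Whether fmt's own formatter declaration can safely be repeated by hand is not settled in this thread, so treat this purely as an illustration of the idea.

```cpp
// Part 1: stand-in for a lightweight header (hypothetical namespace, not fmt).
namespace demo {
template <typename T, typename Enable = void>
struct formatter;  // declaration only: nothing else needs to be parsed
}  // namespace demo

// Part 2: the user's header. No includes are needed up to this point.
struct point {
  int x;
  int y;
};

namespace demo {
template <>
struct formatter<point> {
  // Member template: the calls on `out` are type-dependent, so they are
  // only resolved where format() is instantiated, not here.
  template <typename Sink>
  void format(const point& p, Sink& out) const {
    out.put("(").put(p.x).put(", ").put(p.y).put(")");
  }
};
}  // namespace demo

// Part 3: stand-in for the heavy header, included only where formatting happens.
#include <string>

namespace demo {
// Hypothetical output sink that appends text to a std::string.
struct text_sink {
  std::string text;
  text_sink& put(const char* s) { text += s; return *this; }
  text_sink& put(int n) { text += std::to_string(n); return *this; }
};

// Formats a value through its formatter specialization.
template <typename T>
std::string to_text(const T& value) {
  text_sink out;
  formatter<T>{}.format(value, out);
  return out.text;
}
}  // namespace demo
```

With this layout, a translation unit that only passes point around pays for parts 1 and 2, while one that calls demo::to_text(point{1, 2}) also pulls in part 3.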

@vitaut
Contributor

vitaut commented Nov 10, 2020

Kicked one more header out of fmt/core.h: 14f6bd0

vitaut added a commit that referenced this issue Nov 11, 2020
vitaut added a commit that referenced this issue Nov 11, 2020
vitaut added a commit that referenced this issue Nov 11, 2020
@vitaut
Contributor

vitaut commented Nov 11, 2020

Removed the <algorithm> include in b5dac0f which should somewhat improve the situation although other headers are also more expensive in C++20.

@vitaut vitaut closed this as completed Nov 11, 2020