New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
iprintf/small_printf opts using Emscripten OS triple in LLVM backend #8348
Conversation
Doing some experimentation, iprintf hello world is 14720 bytes. Adding in long double makes it 22040. If long doubles are just doubles, that shrinks to 17468. So long doubles are the bigger deal here, and I lean towards replacing them with doubles, if we don't find a better solution. |
Ok, this is now ready for review. For landing, we need to wait for https://reviews.llvm.org/D57620 and then to land our own triple. (Tests will fail for the wasm backend path until then.) Summary for the squashed commit:
Code size wise, iprintf (no float support) is around 3K smaller, and small (just double, no long double) is around 750 bytes smaller. Full printf is larger than small because it has the code to cast a float128 to a double. Note that full printf now does not have long double printing at full precision, so we save 4K or so there. |
I remember one of the arguments for making long double be float128 is so that it would work seamlessly with library code i.e. libc i.e. printf. And I remember my argument against being something along the lines of, if the way to get reasonable codesize is to downcast to 64 bits, then we don't actually get any of that benefit, and just have the additional complexity of working around it. Not complaining about this implementation, that seems like the most reasonable thing given the constraints. Rather I'm complaining about the constraints, i.e. long double being float128. |
Yeah, I think that's fair. We are doing extra work here just to keep it an option to interoperate with wasi, basically. It's an imperfect tradeoff for sure. |
} | ||
''' % code) | ||
run_process([PYTHON, EMCC, 'src.c', '-O1']) | ||
return os.path.getsize('a.out.wasm') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to delete src.c at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, when the test ends the temp dir it ran in will be cleaned up.
LLVM side landed. This can land as soon as we have a new lkgr (the LLVM change should not break the bots, since we don't actually use the new OS triple until this PR). |
tests/runner.py
Outdated
@@ -135,6 +138,12 @@ def decorated(f): | |||
return decorated | |||
|
|||
|
|||
def only_wasm_backend(note=''): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about no_fastcomp
to match the existing no_wasm_backend
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, changing to no_fastcomp
.
src = open(path_from_root('tests', 'core', 'test_strtold.c')).read() | ||
expected = open(path_from_root('tests', 'core', expected_file)).read() | ||
self.do_run(src, expected) | ||
self.do_run_in_out_file_test('tests', 'core', 'test_strtold') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? Did fastcomp get fixed? Maybe a separate change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changed because the wasm backend used to print float128s with full precision, and after this PR, with float64 precision. There is no change to fastcomp - the wasm backend now emits the same as fastcomp here (so no more need to have two options for different outputs in this test).
@@ -952,7 +952,7 @@ def apply_configuration(): | |||
|
|||
# Target choice. | |||
ASM_JS_TARGET = 'asmjs-unknown-emscripten' | |||
WASM_TARGET = 'wasm32-unknown-unknown-wasm' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can drop the final -wasm
here too while you are there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, removed -wasm
.
…mscripten-core#8348) Use the new emscripten triple in LLVM. This lets us avoid needing to define __EMSCRIPTEN__ ourselves, and enables optimizations for iprintf (integer printf), __small_printf (printf with floats but not long double), and friends. Refactor/hack musl to have compact versions of those functions. To simplify things and avoid the duplicate code problem of having both printf and iprintf linked in (from separate object files) this uses an indirect call for fmt_fp (the key method that formats a floating point number) and pop_arg_long_double (a new method that pops a long double from the arguments). We then pass around either NULL or function pointers to the real implementation, e.g. iprintf sends NULL for both. This means an extra indirect call on them when they are actually used, but this is not too bad since fmt_fp is fairly heavy anyhow - adding an indirect call or two on top is probably not a huge deal. This also makes printf only use doubles for actual formatting. That is, it casts a float128 to a float64 for printing. This makes things simpler (no need to create two fmt_fps and avoids having duplicate code if both are linked in. This also keeps us consistent with how fastcomp prints currently, so this is not a regression. If users ask for full float128 printing, we can add it later (it has never come up so far). Code size wise, iprintf (no float support) is around 3K smaller, and small (just double, no long double) is around 750 bytes smaller. Full printf is larger than small because it has the code to cast a float128 to a double. Note that full printf now does not have long double printing at full precision, so we save 4K or so there.
…backend (emscripten-core#8348)" This reverts commit 9e837aa.
…backend (emscripten-core#8348)" This reverts commit 9e837aa.
…mscripten-core#8348) Use the new emscripten triple in LLVM. This lets us avoid needing to define __EMSCRIPTEN__ ourselves, and enables optimizations for iprintf (integer printf), __small_printf (printf with floats but not long double), and friends. Refactor/hack musl to have compact versions of those functions. To simplify things and avoid the duplicate code problem of having both printf and iprintf linked in (from separate object files) this uses an indirect call for fmt_fp (the key method that formats a floating point number) and pop_arg_long_double (a new method that pops a long double from the arguments). We then pass around either NULL or function pointers to the real implementation, e.g. iprintf sends NULL for both. This means an extra indirect call on them when they are actually used, but this is not too bad since fmt_fp is fairly heavy anyhow - adding an indirect call or two on top is probably not a huge deal. This also makes printf only use doubles for actual formatting. That is, it casts a float128 to a float64 for printing. This makes things simpler (no need to create two fmt_fps and avoids having duplicate code if both are linked in. This also keeps us consistent with how fastcomp prints currently, so this is not a regression. If users ask for full float128 printing, we can add it later (it has never come up so far). Code size wise, iprintf (no float support) is around 3K smaller, and small (just double, no long double) is around 750 bytes smaller. Full printf is larger than small because it has the code to cast a float128 to a double. Note that full printf now does not have long double printing at full precision, so we save 4K or so there.
The optimization option was changed from `-Oz` to `-O0` due to `iprintf` calls in emscripten-core#8348.
The optimization option was changed from `-Oz` to `-O0` due to `iprintf` calls in emscripten-core#8348.
The optimization option was changed from `-Oz` to `-O0` due to `iprintf` calls in #8348.
This uses LLVM with @sunfishcode 's printf-optimizing-patch plus some local patches to define an Emscripten OS in clang, so that we get the printf opt (note the triple change). This PR then has some hackery to split out iprintf in musl, that is, it adds a printf variant with no float support.
This mostly works - printf with only integers leads to 7K less code. Code using floats is a tiny bit bigger, 50 bytes. Code using both float and int printf is an additional 70 bytes - so many programs may be 120 bytes or so larger. I don't see an obvious way to avoid that without adding magic elsewhere in the toolchain.
I worry about optimizing
__printf_small
, that is, the version of printf with double but not long double, which is not done in this PR. That seems even trickier as the keyfmt_fp
method in musl is long double specific, and it seems we'd need to duplicate and modify it. That's not easy because the code is tricky, and it's not obvious to me that we can do it without a code size cost if both versions end up linked in. But I didn't look into it very carefully - perhaps @sunfishcode has a good idea?If there isn't a good option here then overall I lean towards optimizing iprintf, despite the musl hackery, but that we fix the long double issue by removing float128 as we've been considering, and so don't need the
__printf_small
stuff.