-
-
Notifications
You must be signed in to change notification settings - Fork 8.5k
mpprint: Correctly format leading zeros with separators. #18092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18092 +/- ##
=======================================
Coverage 98.38% 98.38%
=======================================
Files 171 171
Lines 22299 22307 +8
=======================================
+ Hits 21939 21947 +8
Misses 360 360 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Code size report:
|
8140ef4
to
52bfb7c
Compare
py/mpprint.c
Outdated
// strings with minimal flash size: | ||
// 0000000000000000 <- pad_zeros | ||
// 0000_000 <- pad_zeros_comma (offset: 12) | ||
// 000,00 <- pad_zeros_comma (offset: 17) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few typos in the comment.
zeros
-> zeroes
pad_zeros_comma (offset: 12)
-> pad_zeroes_underscore (offset: 12)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I think I fixed this now.
(projectwide zeros vs zeroes seems to be inconsistent but I'm happy to be consistent in this file!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The #defines in the following lines use zeroes
. Sorry, my comment was wrong then as well.
83f1b76
to
7bfb342
Compare
Besides that is works in my test at a SAMD device. I could have used the UNIX port. |
fwiw this was actually giving me trouble when I was working on #17688 and wanted to print out the constants in the uctypes module in hex with leading zeros and grouping chars. it's not just a random bug find. |
Having the digits grouped is pretty convenient, so IMHO it's a good change. |
A cpydiff for this would be good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor tweaks, but nothing that isn't just a strict formal equivalent.
I've tested this on my Pico2 / RP2350 / Cortex M33 @ 300MHz and can confirm that all relevant tests pass.
""" | ||
categories: Types,str | ||
description: MicroPython accepts but does not properly implement the "," or "_" grouping character for float values | ||
cause: To reduce code size, MicroPython does not implement this combination. Grouping characters will not appear in the number's significant digits and will appear at incorrect locations in leading leading zeros. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cause: To reduce code size, MicroPython does not implement this combination. Grouping characters will not appear in the number's significant digits and will appear at incorrect locations in leading leading zeros. | |
cause: To reduce code size, MicroPython does not implement this combination. Grouping characters will not appear in the number's significant digits and will appear at incorrect locations in leading zeros. |
py/mpprint.c
Outdated
} else if (fill == '0' && !grouping) { | ||
pad_chars = pad_zeroes; | ||
pad_size = sizeof(pad_zeroes) - 1; | ||
pad_size = 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps move these size values into #define
constants? Just to keep all the information about these overlapping strings all together in one place.
pad_size = 16; | |
pad_size = pad_zeroes_size; |
py/mpprint.c
Outdated
} else if (fill == '0') { | ||
if (grouping == '_') { | ||
pad_chars = pad_zeroes_underscore; | ||
pad_size = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pad_size = 5; | |
pad_size = pad_zeroes_underscore_size; |
py/mpprint.c
Outdated
pad_size = 5; | ||
} else { | ||
pad_chars = pad_zeroes_comma; | ||
pad_size = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pad_size = 4; | |
pad_size = pad_zeroes_comma_size; |
py/mpprint.c
Outdated
pad_chars = pad_spaces; | ||
pad_size = sizeof(pad_spaces) - 1; | ||
} else if (fill == '0') { | ||
pad_size = sizeof(pad_spaces); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this too, for symmetry?
pad_size = sizeof(pad_spaces); | |
pad_size = pad_spaces_size; |
Plus just this up with the other size definitions
#define pad_spaces_size (sizeof(pad_spaces))
#define pad_zeroes (pad_common + 0) | ||
#define pad_zeroes_comma (pad_common + 17) | ||
#define pad_zeroes_underscore (pad_common + 12) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#define pad_zeroes (pad_common + 0) | |
#define pad_zeroes_comma (pad_common + 17) | |
#define pad_zeroes_underscore (pad_common + 12) | |
#define pad_zeroes (pad_common + 0) | |
#define pad_zeroes_size (16) | |
#define pad_zeroes_comma (pad_common + 17) | |
#define pad_zeroes_comma_size (4) | |
#define pad_zeroes_underscore (pad_common + 12) | |
#define pad_zeroes_underscore_size (5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good ideas, done. thanks also for catching the doc mistake.
static const char pad_spaces[16] = {' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '}; | ||
static const char pad_common[23] = {'0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '_', '0', '0', '0', ',', '0', '0'}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would've been a perfect case for array range initializers if we didn't need to target MSVC. 🙃
#define pad_spaces_size (16)
static const char pad_spaces[pad_spaces_size] = { [0 ... pad_spaces_size - 1] = ' ' };
#define pad_common_size (23)
static const char pad_common[pad_common_size] = { [0 ... pad_common_size - 1] = '0', [16] = '_', [20] = ',' };
Did a little bit of benchmarking just to make sure I understood how much advantage this code actually gets from having those padding buffers btw, though I'd share. Tested formatting all integers 1 to 20000, comparing the performance of the base case padding with exclamation marks vs the optimized case, for different padding lengths, on my Pico2 / RP2350 / Cortex M33 @ 300MHz:
I also did a test with deleting and simplifying away all of the code from the optimized cases, and then padding with zeroes:
So it seems like doing this padding buffer optimization at all incurs about a 5% performance penalty to padding short integers 15 characters or less, but ends up cutting the execution time pretty well in half for long padding lengths. |
Since you're set up for benchmarking, maybe you'd see how using groups of 4 works out. It looks like it would be inexpensive in code size to let any character be padded in groups of 4. |
569ae3f
to
0a78e8e
Compare
text size of
The space economized version is https://github.com/micropython/micropython/compare/master...jepler:leading-zeroes-alternate?expand=1 and would need to be squashed up. It uses the hard coded patterns for zeroes+grouping and a synthesized 4 byte fill for everything else, getting rid of some of the static array data. |
The new padding patterns for commas-and-zeroes and underscores-and- zeroes are smooshed together into the existing pad_zeroes to save space. Only the two combinations of (decimal + commas) and (other bases + underscores) are properly supported. Add a test for it. Closes micropython#18082 Signed-off-by: Jeff Epler <jepler@unpythonic.net>
Signed-off-by: Jeff Epler <jepler@unpythonic.net>
0a78e8e
to
068e110
Compare
TBH a similar thought occurred to me as well. The real expensive bit of overhead here isn't the actual raw 1-byte-at-a-time data copying (instead of machine-word size transfers etc) --- it's that each print call involves dispatching a function pointer (and a whole lot more extra bookkeeping). It might be just as much of a speedup --- and possibly a code-size reduction --- to drop the extra .rodata and the conditionals to use it and just fill a new buffer on the stack every time. (Also another optimization that might be worthwhile is to see if there's a way to get the compiler to speculatively devirtualize some of those calls...) |
I've spent some time playing around with this and was able to confirm my theory --- using a 20 byte padding buffer on the stack (=lcm(4,5), so it divides neatly into the stride length for both the underscore and comma cases), I'm able to drop about 80ms off the benchmark times for the short-padding cases, while still getting the ~2x speedup over the original "one character at a time" base-case strategy for any padding character on the long-padding cases --- while also dropping 32 bytes off the RP2 build size. master...AJMansfield:micropython:leading-zeros-alt2 The benchmark results are from my Pico2 RP2350 running in Cortex M33 mode at 300 MHz. (Nice thing about using an embedded processor to benchmark is that you get near enough exact run-to-run repeatability.) Raw Benchmark ResultsmasterCurrent master behavior.
leading-zerosJepler's original version that implements grouping.
leading-zeros-alt2My new version that uses a fixed-size buffer on the stack that's filled at each call.
|
That looks like a real promising alternative, especially if it's smaller than the others. |
How to proceed here, then? It feels like there's really two different things here now, and I don't want to steal the grouping feature from you either. My thought is to PR another version of this that limits its scope just to updating |
If you have a branch that's fixes the bug I was trying to address and is better in other respects, I'm not worried about the git Author or Co-Author credit. |
This reworks `mp_print_strn` to use a stack-allocated padding buffer rather than special-cased hardcoded ROM strings in order to reduce code size and improve string formatting performance. Note that this is actually just as performant, even for zeroes and spaces! On my RP2350 Cortex M33 hardware, spaces are about 1% faster for short-padding cases, and 3.4% faster for long-padding cases. I've done some cursory tests for alternate values of `PAD_BUF_SIZE`, but the results definitely won't generalize to other architectures, and probably not even to other implementations of the same architecture. The buffer size of 20 is chosen as the smallest size that easily admits a later implementation of micropython#18092 to support padding with grouping characters, to avoid pessimizing the short-padding cases any more than required. I've also explored alternatives involving using `alloca` for the padding buffer, but the conditionals and fallback logic needed to bound stack usage for the pathological cases end up pessimizing code size beyond what's reasonable for the very marginal additional speed gains. Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
This reworks `mp_print_strn` to use a stack-allocated padding buffer rather than special-cased hardcoded ROM strings in order to reduce code size and improve string formatting performance. Note that this is actually just as performant, even for zeroes and spaces! On my RP2350 Cortex M33 hardware, spaces are about 1% faster for short-padding cases, and 3.4% faster for long-padding cases. I've done some cursory tests for alternate values of `PAD_BUF_SIZE`, but the results definitely won't generalize to other architectures, and probably not even to other implementations of the same architecture. The buffer size of 20 is chosen as the smallest size that easily admits a later implementation of micropython#18092 to support padding with grouping characters, to avoid pessimizing the short-padding cases any more than required. I've also explored alternatives involving using `alloca` for the padding buffer, but the conditionals and fallback logic needed to bound stack usage for the pathological cases end up pessimizing code size beyond what's reasonable for the very marginal additional speed gains. Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
If it was just about a vanity credit I wouldn't be fussed either lol. To me it's far more about preserving the chain of ideas and keeping the development history as easy to follow as possible for the next guy having to dig through a git blame trace to track down some obscure bug. And either way --- I still think the case for using a buffer on the stack is strong enough to stand on its own, and more easily defended without the whole factorial space of other micro-optimisations that doing it together with the grouping feature adds. |
This reworks `mp_print_strn` to use a stack-allocated padding buffer rather than special-cased hardcoded ROM strings in order to reduce code size and improve string formatting performance. Note that this is actually just as performant, even for zeroes and spaces! On my RP2350 Cortex M33 hardware, spaces are about 1% faster for short-padding cases, and 3.4% faster for long-padding cases. I've done some cursory tests for alternate values of `PAD_BUF_SIZE`, but the results definitely won't generalize to other architectures, and probably not even to other implementations of the same architecture. The buffer size of 20 is chosen as the smallest size that easily admits a later implementation of micropython#18092 to support padding with grouping characters, to avoid pessimizing the short-padding cases any more than required. I've also explored alternatives involving using `alloca` for the padding buffer, but the conditionals and fallback logic needed to bound stack usage for the pathological cases end up pessimizing code size beyond what's reasonable for the very marginal additional speed gains. Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
This reworks `mp_print_strn` to use a stack-allocated padding buffer rather than special-cased hardcoded ROM strings in order to reduce code size and improve string formatting performance. Note that this is actually just as performant, even for zeroes and spaces! On my RP2350 Cortex M33 hardware, spaces are about 1% faster for short-padding cases, and 3.4% faster for long-padding cases. I've done some cursory tests for alternate values of `PAD_BUF_SIZE`, but the results definitely won't generalize to other architectures, and probably not even to other implementations of the same architecture. The buffer size of 20 is chosen as the smallest size that easily admits a later implementation of micropython#18092 to support padding with grouping characters, to avoid pessimizing the short-padding cases any more than required. I've also explored alternatives involving using `alloca` for the padding buffer, but the conditionals and fallback logic needed to bound stack usage for the pathological cases end up pessimizing code size beyond what's reasonable for the very marginal additional speed gains. Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
This reworks `mp_print_strn` to use a stack-allocated padding buffer rather than special-cased hardcoded ROM strings in order to reduce code size and improve string formatting performance. Note that this is actually just as performant, even for zeroes and spaces! On my RP2350 Cortex M33 hardware, spaces are about 1% faster for short-padding cases, and 3.4% faster for long-padding cases. I've done some cursory tests for alternate values of `PAD_BUF_SIZE`, but the results definitely won't generalize to other architectures, and probably not even to other implementations of the same architecture. The buffer size of 20 is chosen as the smallest size that easily admits a later implementation of micropython#18092 to support padding with grouping characters, to avoid pessimizing the short-padding cases any more than required. I've also explored alternatives involving using `alloca` for the padding buffer, but the conditionals and fallback logic needed to bound stack usage for the pathological cases end up pessimizing code size beyond what's reasonable for the very marginal additional speed gains. Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
This reworks `mp_print_strn` to use a stack-allocated padding buffer rather than special-cased hardcoded ROM strings in order to reduce code size and improve string formatting performance. Note that this is actually just as performant, even for zeroes and spaces! On my RP2350 Cortex M33 hardware, spaces are about 1% faster for short-padding cases, and 3.4% faster for long-padding cases. I've done some cursory tests for alternate values of `PAD_BUF_SIZE`, but the results definitely won't generalize to other architectures, and probably not even to other implementations of the same architecture. The buffer size of 20 is chosen as the smallest size that easily admits a later implementation of micropython#18092 to support padding with grouping characters, to avoid pessimizing the short-padding cases any more than required. I've also explored alternatives involving using `alloca` for the padding buffer, but the conditionals and fallback logic needed to bound stack usage for the pathological cases end up pessimizing code size beyond what's reasonable for the very marginal additional speed gains. Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
This reworks `mp_print_strn` to use a stack-allocated padding buffer rather than special-cased hardcoded ROM strings in order to reduce code size and improve string formatting performance. Note that this is actually just as performant, even for zeroes and spaces! On my RP2350 Cortex M33 hardware, spaces are about 1% faster for short-padding cases, and 3.4% faster for long-padding cases. I've done some cursory tests for alternate values of `PAD_BUF_SIZE`, but the results definitely won't generalize to other architectures, and probably not even to other implementations of the same architecture. The buffer size of 20 is chosen as the smallest size that easily admits a later implementation of micropython#18092 to support padding with grouping characters, to avoid pessimizing the short-padding cases any more than required. I've also explored alternatives involving using `alloca` for the padding buffer, but the conditionals and fallback logic needed to bound stack usage for the pathological cases end up pessimizing code size beyond what's reasonable for the very marginal additional speed gains. Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
I can probably "rebuild" this atop your branch if that's how it ends up happening. |
Summary
Correctly format integers with a grouping character and leading zeroes. such as
"{:04,d}".format(0x100)
->"0,256"
.Closes #18082.
Testing
I added a new test to ensure the implementation matches standard Python for the tested cases.
Trade-offs and Alternatives
I combined three different padding strings into a single string to reduce growth in const data.
The separator format option is already accepted but not supported for floating point numbers. Now, incorrect separator characters would be inserted in the padding positions when formatting an FP number, like
+0,000,0003141.150