@jepler jepler commented Sep 17, 2025

Summary

Correctly format integers with a grouping character and leading zeroes, such as "{:04,d}".format(0x100) -> "0,256".

Closes #18082.
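As a reference for the behaviour being matched, here is a minimal Python sketch of zero padding with grouping. The helper name `zero_pad_grouped` is hypothetical and this is not the C code in `py/mpprint.c`; it only models the rule that padding zeroes are grouped like digits and the result never starts with a separator.

```python
def zero_pad_grouped(digits, width, sep=",", group=3):
    """Zero-pad `digits` to at least `width` chars, inserting `sep` every `group` digits."""
    out = []                          # built least-significant character first
    pending = list(reversed(digits))  # digits still to be emitted
    in_group = 0
    while pending or len(out) < width:
        if in_group == group:
            out.append(sep)           # a separator is always followed by a digit or zero
            in_group = 0
        out.append(pending.pop(0) if pending else "0")
        in_group += 1
    return "".join(reversed(out))


print(zero_pad_grouped("256", 4))            # 0,256
print("{:04,d}".format(0x100))               # 0,256
print(zero_pad_grouped("100", 5, "_", 4))    # 0_0100
```

The result can be wider than the requested width (5 characters for a width of 4 above), because a leading separator is never emitted.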

Testing

I added a new test to ensure the implementation matches standard Python for the tested cases.

Trade-offs and Alternatives

I combined three different padding strings into a single string to reduce growth in const data.

The separator format option is already accepted, but not supported, for floating point numbers. With this change, incorrect separator characters are inserted in the padding positions when formatting an FP number, e.g. +0,000,0003141.150.
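For comparison, this is what standard Python produces for a similar float. The format spec `+018,.3f` is an assumption: the text above shows only MicroPython's output, and this spec was picked because it also yields an 18-character result.

```python
# Assumed spec (not given in the PR text): sign, zero fill, width 18, comma grouping, 3 decimals.
spec = "+018,.3f"

# CPython groups the padding zeroes together with the integer digits.
print(format(3141.15, spec))   # +0,000,003,141.150
```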

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.38%. Comparing base (44986b1) to head (068e110).
⚠️ Report is 62 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #18092   +/-   ##
=======================================
  Coverage   98.38%   98.38%           
=======================================
  Files         171      171           
  Lines       22299    22307    +8     
=======================================
+ Hits        21939    21947    +8     
  Misses        360      360           

Code size report:

   bare-arm:   +76 +0.134% 
minimal x86:   +57 +0.030% 
   unix x64:   +64 +0.007% standard
      stm32:   +72 +0.018% PYBV10
     mimxrt:   +64 +0.017% TEENSY40
        rp2:   +80 +0.009% RPI_PICO_W
       samd:   +68 +0.025% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:   +61 +0.013% VIRT_RV32

py/mpprint.c Outdated
// strings with minimal flash size:
// 0000000000000000 <- pad_zeros
// 0000_000 <- pad_zeros_comma (offset: 12)
// 000,00 <- pad_zeros_comma (offset: 17)
@robert-hh robert-hh Sep 17, 2025

A few typos in the comment.
zeros -> zeroes
pad_zeros_comma (offset: 12) -> pad_zeroes_underscore (offset: 12)

Contributor Author

Thank you, I think I fixed this now.

(projectwide zeros vs zeroes seems to be inconsistent but I'm happy to be consistent in this file!)

Contributor

The #defines in the following lines use zeroes. Sorry, my comment was wrong then as well.

@jepler jepler force-pushed the leading-zeros branch 2 times, most recently from 83f1b76 to 7bfb342 Compare September 17, 2025 16:26
@robert-hh
Contributor

Besides that, it works in my test on a SAMD device. I could have used the UNIX port.

@jepler
Contributor Author

jepler commented Sep 17, 2025

fwiw this was actually giving me trouble when I was working on #17688 and wanted to print out the constants in the uctypes module in hex with leading zeros and grouping chars. it's not just a random bug find.

@robert-hh
Contributor

Having the digits grouped is pretty convenient, so IMHO it's a good change.

@AJMansfield
Contributor

> The separator format option is already accepted but not supported for floating point numbers. Now, incorrect separator characters would be inserted in the padding positions when formatting an FP number, like +0,000,0003141.150

A cpydiff for this would be good!

@jepler
Contributor Author

jepler commented Sep 17, 2025

good idea, added.

@AJMansfield AJMansfield left a comment

A few minor tweaks, but nothing that isn't just a strict formal equivalent.

I've tested this on my Pico2 / RP2350 / Cortex M33 @ 300MHz and can confirm that all relevant tests pass.

"""
categories: Types,str
description: MicroPython accepts but does not properly implement the "," or "_" grouping character for float values
cause: To reduce code size, MicroPython does not implement this combination. Grouping characters will not appear in the number's significant digits and will appear at incorrect locations in leading leading zeros.

Suggested change
- cause: To reduce code size, MicroPython does not implement this combination. Grouping characters will not appear in the number's significant digits and will appear at incorrect locations in leading leading zeros.
+ cause: To reduce code size, MicroPython does not implement this combination. Grouping characters will not appear in the number's significant digits and will appear at incorrect locations in leading zeros.

py/mpprint.c Outdated
} else if (fill == '0' && !grouping) {
pad_chars = pad_zeroes;
pad_size = sizeof(pad_zeroes) - 1;
pad_size = 16;

Perhaps move these size values into #define constants? Just to keep all the information about these overlapping strings all together in one place.

Suggested change
- pad_size = 16;
+ pad_size = pad_zeroes_size;

py/mpprint.c Outdated
} else if (fill == '0') {
if (grouping == '_') {
pad_chars = pad_zeroes_underscore;
pad_size = 5;

Suggested change
- pad_size = 5;
+ pad_size = pad_zeroes_underscore_size;

py/mpprint.c Outdated
pad_size = 5;
} else {
pad_chars = pad_zeroes_comma;
pad_size = 4;

Suggested change
- pad_size = 4;
+ pad_size = pad_zeroes_comma_size;

py/mpprint.c Outdated
pad_chars = pad_spaces;
pad_size = sizeof(pad_spaces) - 1;
} else if (fill == '0') {
pad_size = sizeof(pad_spaces);

Maybe this too, for symmetry?

Suggested change
- pad_size = sizeof(pad_spaces);
+ pad_size = pad_spaces_size;

Plus just put this up with the other size definitions:

#define pad_spaces_size (sizeof(pad_spaces))

Comment on lines 50 to 53
#define pad_zeroes (pad_common + 0)
#define pad_zeroes_comma (pad_common + 17)
#define pad_zeroes_underscore (pad_common + 12)

Suggested change
- #define pad_zeroes (pad_common + 0)
- #define pad_zeroes_comma (pad_common + 17)
- #define pad_zeroes_underscore (pad_common + 12)
+ #define pad_zeroes (pad_common + 0)
+ #define pad_zeroes_size (16)
+ #define pad_zeroes_comma (pad_common + 17)
+ #define pad_zeroes_comma_size (4)
+ #define pad_zeroes_underscore (pad_common + 12)
+ #define pad_zeroes_underscore_size (5)

Contributor Author

good ideas, done. thanks also for catching the doc mistake.

Comment on lines 43 to 45
static const char pad_spaces[16] = {' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '};
static const char pad_common[23] = {'0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '_', '0', '0', '0', ',', '0', '0'};
@AJMansfield AJMansfield Sep 18, 2025

This would've been a perfect case for array range initializers if we didn't need to target MSVC. 🙃

#define pad_spaces_size (16)
static const char pad_spaces[pad_spaces_size] = { [0 ... pad_spaces_size - 1] = ' ' };
#define pad_common_size (23)
static const char pad_common[pad_common_size] = { [0 ... pad_common_size - 1] = '0', [16] = '_', [20] = ',' };
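
To make the overlapping layout easier to see, here is a small Python model of the `pad_common` array and the three views into it. The offsets and sizes are copied from the `#define` lines in this thread; the names `PAD_COMMON` and `view` are hypothetical helpers for illustration only, not part of MicroPython.

```python
# Python model of the C array quoted above: 16 zeroes followed by "_000,00" (23 bytes).
PAD_COMMON = "0" * 16 + "_000,00"

# (offset, size) pairs taken from the #define lines in the review thread.
PAD_ZEROES = (0, 16)
PAD_ZEROES_UNDERSCORE = (12, 5)
PAD_ZEROES_COMMA = (17, 4)


def view(offset_size):
    """Return the slice of PAD_COMMON that a (pointer, size) pair would cover in C."""
    offset, size = offset_size
    return PAD_COMMON[offset:offset + size]


print(view(PAD_ZEROES))             # 0000000000000000
print(view(PAD_ZEROES_UNDERSCORE))  # 0000_
print(view(PAD_ZEROES_COMMA))       # 000,
```

The underscore view reuses the last four zeroes of the plain-zero run, which is where the flash saving comes from.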

@AJMansfield
Contributor

AJMansfield commented Sep 18, 2025

Did a little bit of benchmarking just to make sure I understood how much advantage this code actually gets from having those padding buffers btw, thought I'd share.

Tested formatting all integers 1 to 20000, comparing the performance of the base case padding with exclamation marks vs the optimized case, for different padding lengths, on my Pico2 / RP2350 / Cortex M33 @ 300MHz:

| sz | char='!' | char=' ' | change |
|---:|---:|---:|---:|
| 1 | 0.708 | 0.708 | +0.00% |
| 2 | 0.708 | 0.708 | +0.00% |
| 3 | 0.709 | 0.708 | -0.14% |
| 5 | 0.709 | 0.709 | +0.00% |
| 7 | 0.719 | 0.712 | -0.97% |
| 10 | 0.742 | 0.721 | -2.83% |
| 15 | 0.759 | 0.714 | -5.93% |
| 22 | 0.840 | 0.768 | -8.57% |
| 33 | 0.908 | 0.784 | -13.66% |
| 47 | 1.018 | 0.815 | -19.94% |
| 68 | 1.178 | 0.880 | -25.30% |
| 100 | 1.409 | 0.976 | -30.73% |
| 150 | 1.779 | 1.108 | -37.72% |
| 220 | 2.318 | 1.339 | -42.23% |
| 330 | 3.203 | 1.683 | -47.46% |
| 470 | 4.314 | 2.147 | -50.23% |
| 680 | 6.299 | 3.127 | -50.36% |
| 1000 | 9.408 | 4.664 | -50.43% |

I also did a test with deleting and simplifying away all of the code from the optimized cases, and then padding with zeroes:

| sz | deleted | retained | change |
|---:|---:|---:|---:|
| none | 0.653 | 0.686 | +5.05% |
| 1 | 0.676 | 0.712 | +5.33% |
| 2 | 0.676 | 0.712 | +5.33% |
| 3 | 0.676 | 0.712 | +5.33% |
| 5 | 0.676 | 0.712 | +5.33% |
| 7 | 0.685 | 0.719 | +4.96% |
| 10 | 0.708 | 0.728 | +2.82% |
| 15 | 0.722 | 0.723 | +0.14% |
| 22 | 0.810 | 0.774 | -4.44% |
| 33 | 0.875 | 0.793 | -9.37% |
| 47 | 0.979 | 0.824 | -15.83% |
| 68 | 1.133 | 0.885 | -21.89% |
| 100 | 1.352 | 0.981 | -27.44% |
| 150 | 1.706 | 1.114 | -34.70% |
| 220 | 2.222 | 1.348 | -39.33% |
| 330 | 3.070 | 1.692 | -44.89% |
| 470 | 4.135 | 2.152 | -47.96% |
| 680 | 6.049 | 3.135 | -48.17% |
| 1000 | 9.053 | 4.672 | -48.39% |

So it seems like doing this padding buffer optimization at all incurs about a 5% performance penalty to padding short integers 15 characters or less, but ends up cutting the execution time pretty well in half for long padding lengths.
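
A benchmark of the kind described above could be sketched roughly as follows. This is not the actual `internal_bench` script; `bench` and `percent_change` are hypothetical helpers, and `time.perf_counter` is the CPython timer (on a MicroPython board, `time.ticks_ms` and `time.ticks_diff` would be the likely substitutes).

```python
import time


def bench(fmt, count=20000):
    """Time formatting the integers 1..count with `fmt`; returns elapsed seconds."""
    start = time.perf_counter()
    for i in range(1, count + 1):
        fmt.format(i)
    return time.perf_counter() - start


def percent_change(base, other):
    """Relative change of `other` against `base`, in percent (as in the tables above)."""
    return (other - base) / base * 100


base = bench("{:!>100d}")   # '!' fill: the unoptimized base case
fast = bench("{: >100d}")   # space fill: the optimized case
print("%.3f %.3f %+.2f%%" % (base, fast, percent_change(base, fast)))

# Reproducing one row of the first table (sz 10): 0.742 -> 0.721
print("%+.2f%%" % percent_change(0.742, 0.721))   # -2.83%
```

Timings from a desktop CPython run will not resemble the RP2350 numbers; only the `percent_change` arithmetic is meant to line up with the tables.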

@jepler
Contributor Author

jepler commented Sep 19, 2025

Since you're set up for benchmarking, maybe you'd see how using groups of 4 works out. It looks like it would be inexpensive in code size to let any character be padded in groups of 4.

@jepler
Contributor Author

jepler commented Sep 19, 2025

text size of build-ADAFRUIT_ITSYBITSY_M4_EXPRESS/py/mpprint.o for various alternatives:

1924       - original version
1997 (+73) - implemented separators
1969 (+45) - space economized version 

The space economized version is https://github.com/micropython/micropython/compare/master...jepler:leading-zeroes-alternate?expand=1 and would need to be squashed up. It uses the hard coded patterns for zeroes+grouping and a synthesized 4 byte fill for everything else, getting rid of some of the static array data.

The new padding patterns for commas-and-zeroes and underscores-and-
zeroes are smooshed together into the existing pad_zeroes to save
space.

Only the two combinations of (decimal + commas) and
(other bases + underscores) are properly supported.

Add a test for it.

Closes micropython#18082

Signed-off-by: Jeff Epler <jepler@unpythonic.net>
@AJMansfield
Contributor

AJMansfield commented Sep 19, 2025

> Since you're set up for benchmarking, maybe you'd see how using groups of 4 works out. It looks like it would be inexpensive in code size to let any character be padded in groups of 4.

TBH a similar thought occurred to me as well. The real expensive bit of overhead here isn't the actual raw 1-byte-at-a-time data copying (instead of machine-word size transfers etc) --- it's that each print call involves dispatching a function pointer (and a whole lot more extra bookkeeping).

It might be just as much of a speedup --- and possibly a code-size reduction --- to drop the extra .rodata and the conditionals to use it and just fill a new buffer on the stack every time.

(Also another optimization that might be worthwhile is to see if there's a way to get the compiler to speculatively devirtualize some of those calls...)

@AJMansfield
Contributor

AJMansfield commented Sep 21, 2025

I've spent some time playing around with this and was able to confirm my theory. Using a 20-byte padding buffer on the stack (= lcm(4, 5), so it divides neatly into the stride length for both the underscore and comma cases), I'm able to drop about 80 ms off the benchmark times for the short-padding cases, while still getting the ~2x speedup over the original "one character at a time" base-case strategy for any padding character on the long-padding cases. It also drops 32 bytes off the RP2 build size compared with the leading-zeros branch (309952 vs 309984 bytes of text in the results below).
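
Why a buffer size of lcm(4, 5) = 20 is convenient can be shown with a small Python model. This is only a sketch under assumptions: `pad_chunks` is a hypothetical helper, and how the real branch fills, aligns and emits its buffer is not shown in this thread.

```python
from math import gcd

BUF_SIZE = 4 * 5 // gcd(4, 5)   # lcm(4, 5) == 20


def pad_chunks(total, pattern):
    """Yield `total` padding characters in pieces of at most BUF_SIZE characters."""
    assert BUF_SIZE % len(pattern) == 0          # the pattern must tile the buffer exactly
    buf = pattern * (BUF_SIZE // len(pattern))   # refilled on every call, like a stack buffer
    head = total % BUF_SIZE
    if head:
        yield buf[-head:]                        # leftmost, partial piece
    for _ in range(total // BUF_SIZE):
        yield buf                                # full pieces keep the pattern in phase


print(BUF_SIZE)                              # 20
print("".join(pad_chunks(7, "000,")))        # 00,000,
print("".join(pad_chunks(7, "0000_")))       # 0_0000_
```

Because 20 is a multiple of both strides, every full-buffer write starts at the same phase of the pattern, so only the first (partial) piece needs an offset.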

master...AJMansfield:micropython:leading-zeros-alt2

The benchmark results are from my Pico2 RP2350 running in Cortex M33 mode at 300 MHz. (Nice thing about using an embedded processor to benchmark is that you get near enough exact run-to-run repeatability.)

Raw Benchmark Results

master

Current master behavior.
branch: v1.26.0-162-g5284e0980

   text    data     bss     dec     hex filename
 309904       0    5020  314924   4ce2c /home/anson/mpy/micropython/ports/rp2/build-RPI_PICO2_M33/firmware.elf
internal_bench/format:
    0.682s (-00.00%) internal_bench/format-1-int.py
    0.709s (+03.86%) internal_bench/format-2.00-int-space-pad1.py
    0.708s (+03.85%) internal_bench/format-2.01-int-space-pad2.py
    0.708s (+03.83%) internal_bench/format-2.02-int-space-pad3.py
    0.708s (+03.83%) internal_bench/format-2.03-int-space-pad5.py
    0.710s (+04.11%) internal_bench/format-2.04-int-space-pad7.py
    0.720s (+05.53%) internal_bench/format-2.05-int-space-pad10.py
    0.706s (+03.49%) internal_bench/format-2.06-int-space-pad15.py
    0.767s (+12.50%) internal_bench/format-2.07-int-space-pad22.py
    0.784s (+14.87%) internal_bench/format-2.08-int-space-pad33.py
    0.811s (+18.86%) internal_bench/format-2.09-int-space-pad47.py
    0.870s (+27.59%) internal_bench/format-2.10-int-space-pad68.py
    0.959s (+40.63%) internal_bench/format-2.11-int-space-pad100.py
    1.079s (+58.17%) internal_bench/format-2.12-int-space-pad150.py
    1.296s (+90.03%) internal_bench/format-2.13-int-space-pad220.py
    1.616s (+136.90%) internal_bench/format-2.14-int-space-pad330.py
    2.048s (+200.21%) internal_bench/format-2.15-int-space-pad470.py
    2.984s (+337.45%) internal_bench/format-2.16-int-space-pad680.py
    4.452s (+552.57%) internal_bench/format-2.17-int-space-pad1000.py
    0.708s (+03.85%) internal_bench/format-3.00-int-unusual-pad1.py
    0.709s (+03.86%) internal_bench/format-3.01-int-unusual-pad2.py
    0.708s (+03.85%) internal_bench/format-3.02-int-unusual-pad3.py
    0.709s (+03.86%) internal_bench/format-3.03-int-unusual-pad5.py
    0.717s (+05.17%) internal_bench/format-3.04-int-unusual-pad7.py
    0.741s (+08.68%) internal_bench/format-3.05-int-unusual-pad10.py
    0.751s (+10.06%) internal_bench/format-3.06-int-unusual-pad15.py
    0.843s (+23.62%) internal_bench/format-3.07-int-unusual-pad22.py
    0.911s (+33.57%) internal_bench/format-3.08-int-unusual-pad33.py
    1.021s (+49.61%) internal_bench/format-3.09-int-unusual-pad47.py
    1.181s (+73.11%) internal_bench/format-3.10-int-unusual-pad68.py
    1.412s (+106.94%) internal_bench/format-3.11-int-unusual-pad100.py
    1.782s (+161.27%) internal_bench/format-3.12-int-unusual-pad150.py
    2.322s (+240.34%) internal_bench/format-3.13-int-unusual-pad220.py
    3.206s (+370.01%) internal_bench/format-3.14-int-unusual-pad330.py
    4.318s (+532.96%) internal_bench/format-3.15-int-unusual-pad470.py
    6.304s (+824.12%) internal_bench/format-3.16-int-unusual-pad680.py
    9.415s (+1280.15%) internal_bench/format-3.17-int-unusual-pad1000.py
    0.716s (+04.93%) internal_bench/format-4.00-int-group-pad1.py
    0.716s (+04.94%) internal_bench/format-4.01-int-group-pad2.py
    0.716s (+04.97%) internal_bench/format-4.02-int-group-pad3.py
    0.716s (+05.03%) internal_bench/format-4.03-int-group-pad5.py
    0.723s (+05.98%) internal_bench/format-4.04-int-group-pad7.py
    0.733s (+07.42%) internal_bench/format-4.05-int-group-pad10.py
    0.719s (+05.37%) internal_bench/format-4.06-int-group-pad15.py
    0.781s (+14.53%) internal_bench/format-4.07-int-group-pad22.py
    0.800s (+17.28%) internal_bench/format-4.08-int-group-pad33.py
    0.831s (+21.79%) internal_bench/format-4.09-int-group-pad47.py
    0.893s (+30.86%) internal_bench/format-4.10-int-group-pad68.py
    0.989s (+44.97%) internal_bench/format-4.11-int-group-pad100.py
    1.121s (+64.29%) internal_bench/format-4.12-int-group-pad150.py
    1.355s (+98.62%) internal_bench/format-4.13-int-group-pad220.py
    1.700s (+149.15%) internal_bench/format-4.14-int-group-pad330.py
    2.161s (+216.75%) internal_bench/format-4.15-int-group-pad470.py
    3.145s (+361.07%) internal_bench/format-4.16-int-group-pad680.py
    4.659s (+582.97%) internal_bench/format-4.17-int-group-pad1000.py
1 tests performed (55 individual testcases)

leading-zeros

Jepler's original version that implements grouping.
branch: v1.26.0-164-gf17e61759

   text    data     bss     dec     hex filename
 309984       0    5020  315004   4ce7c /home/anson/mpy/micropython/ports/rp2/build-RPI_PICO2_M33/firmware.elf
internal_bench/format:
    0.686s (+00.00%) internal_bench/format-1-int.py
    0.710s (+03.62%) internal_bench/format-2.00-int-space-pad1.py
    0.711s (+03.63%) internal_bench/format-2.01-int-space-pad2.py
    0.711s (+03.62%) internal_bench/format-2.02-int-space-pad3.py
    0.710s (+03.51%) internal_bench/format-2.03-int-space-pad5.py
    0.712s (+03.78%) internal_bench/format-2.04-int-space-pad7.py
    0.721s (+05.20%) internal_bench/format-2.05-int-space-pad10.py
    0.715s (+04.34%) internal_bench/format-2.06-int-space-pad15.py
    0.769s (+12.21%) internal_bench/format-2.07-int-space-pad22.py
    0.786s (+14.61%) internal_bench/format-2.08-int-space-pad33.py
    0.816s (+19.07%) internal_bench/format-2.09-int-space-pad47.py
    0.881s (+28.51%) internal_bench/format-2.10-int-space-pad68.py
    0.977s (+42.49%) internal_bench/format-2.11-int-space-pad100.py
    1.109s (+61.74%) internal_bench/format-2.12-int-space-pad150.py
    1.341s (+95.53%) internal_bench/format-2.13-int-space-pad220.py
    1.685s (+145.73%) internal_bench/format-2.14-int-space-pad330.py
    2.148s (+213.25%) internal_bench/format-2.15-int-space-pad470.py
    3.128s (+356.25%) internal_bench/format-2.16-int-space-pad680.py
    4.665s (+580.40%) internal_bench/format-2.17-int-space-pad1000.py
    0.711s (+03.62%) internal_bench/format-3.00-int-unusual-pad1.py
    0.711s (+03.64%) internal_bench/format-3.01-int-unusual-pad2.py
    0.711s (+03.62%) internal_bench/format-3.02-int-unusual-pad3.py
    0.710s (+03.56%) internal_bench/format-3.03-int-unusual-pad5.py
    0.719s (+04.84%) internal_bench/format-3.04-int-unusual-pad7.py
    0.743s (+08.30%) internal_bench/format-3.05-int-unusual-pad10.py
    0.760s (+10.85%) internal_bench/format-3.06-int-unusual-pad15.py
    0.842s (+22.76%) internal_bench/format-3.07-int-unusual-pad22.py
    0.910s (+32.65%) internal_bench/format-3.08-int-unusual-pad33.py
    1.019s (+48.60%) internal_bench/format-3.09-int-unusual-pad47.py
    1.179s (+71.99%) internal_bench/format-3.10-int-unusual-pad68.py
    1.410s (+105.63%) internal_bench/format-3.11-int-unusual-pad100.py
    1.781s (+159.71%) internal_bench/format-3.12-int-unusual-pad150.py
    2.320s (+238.33%) internal_bench/format-3.13-int-unusual-pad220.py
    3.204s (+367.29%) internal_bench/format-3.14-int-unusual-pad330.py
    4.315s (+529.35%) internal_bench/format-3.15-int-unusual-pad470.py
    6.300s (+818.79%) internal_bench/format-3.16-int-unusual-pad680.py
    9.410s (+1272.32%) internal_bench/format-3.17-int-unusual-pad1000.py
    0.719s (+04.90%) internal_bench/format-4.00-int-group-pad1.py
    0.719s (+04.89%) internal_bench/format-4.01-int-group-pad2.py
    0.719s (+04.90%) internal_bench/format-4.02-int-group-pad3.py
    0.720s (+04.99%) internal_bench/format-4.03-int-group-pad5.py
    0.727s (+06.05%) internal_bench/format-4.04-int-group-pad7.py
    0.739s (+07.81%) internal_bench/format-4.05-int-group-pad10.py
    0.740s (+07.97%) internal_bench/format-4.06-int-group-pad15.py
    0.795s (+16.01%) internal_bench/format-4.07-int-group-pad22.py
    0.824s (+20.12%) internal_bench/format-4.08-int-group-pad33.py
    0.887s (+29.31%) internal_bench/format-4.09-int-group-pad47.py
    0.974s (+42.00%) internal_bench/format-4.10-int-group-pad68.py
    1.092s (+59.22%) internal_bench/format-4.11-int-group-pad100.py
    1.284s (+87.31%) internal_bench/format-4.12-int-group-pad150.py
    1.580s (+130.39%) internal_bench/format-4.13-int-group-pad220.py
    2.076s (+202.72%) internal_bench/format-4.14-int-group-pad330.py
    2.694s (+292.84%) internal_bench/format-4.15-int-group-pad470.py
    3.941s (+474.81%) internal_bench/format-4.16-int-group-pad680.py
    5.977s (+771.71%) internal_bench/format-4.17-int-group-pad1000.py
1 tests performed (55 individual testcases)

leading-zeros-alt2

My new version that uses a fixed-size buffer on the stack that's filled at each call.
branch: v1.26.0-165-g1b42623f9

   text    data     bss     dec     hex filename
 309952       0    5020  314972   4ce5c /home/anson/mpy/micropython/ports/rp2/build-RPI_PICO2_M33/firmware.elf
internal_bench/format:
    0.615s (+00.00%) internal_bench/format-1-int.py
    0.625s (+01.52%) internal_bench/format-2.00-int-space-pad1.py
    0.625s (+01.52%) internal_bench/format-2.01-int-space-pad2.py
    0.625s (+01.53%) internal_bench/format-2.02-int-space-pad3.py
    0.633s (+02.88%) internal_bench/format-2.03-int-space-pad5.py
    0.644s (+04.61%) internal_bench/format-2.04-int-space-pad7.py
    0.653s (+06.19%) internal_bench/format-2.05-int-space-pad10.py
    0.647s (+05.18%) internal_bench/format-2.06-int-space-pad15.py
    0.693s (+12.66%) internal_bench/format-2.07-int-space-pad22.py
    0.713s (+15.84%) internal_bench/format-2.08-int-space-pad33.py
    0.757s (+23.05%) internal_bench/format-2.09-int-space-pad47.py
    0.795s (+29.22%) internal_bench/format-2.10-int-space-pad68.py
    0.881s (+43.27%) internal_bench/format-2.11-int-space-pad100.py
    0.995s (+61.80%) internal_bench/format-2.12-int-space-pad150.py
    1.180s (+91.81%) internal_bench/format-2.13-int-space-pad220.py
    1.487s (+141.75%) internal_bench/format-2.14-int-space-pad330.py
    1.874s (+204.54%) internal_bench/format-2.15-int-space-pad470.py
    2.687s (+336.72%) internal_bench/format-2.16-int-space-pad680.py
    3.994s (+549.14%) internal_bench/format-2.17-int-space-pad1000.py
    0.625s (+01.53%) internal_bench/format-3.00-int-unusual-pad1.py
    0.625s (+01.51%) internal_bench/format-3.01-int-unusual-pad2.py
    0.625s (+01.52%) internal_bench/format-3.02-int-unusual-pad3.py
    0.633s (+02.90%) internal_bench/format-3.03-int-unusual-pad5.py
    0.644s (+04.61%) internal_bench/format-3.04-int-unusual-pad7.py
    0.653s (+06.18%) internal_bench/format-3.05-int-unusual-pad10.py
    0.647s (+05.16%) internal_bench/format-3.06-int-unusual-pad15.py
    0.693s (+12.66%) internal_bench/format-3.07-int-unusual-pad22.py
    0.713s (+15.86%) internal_bench/format-3.08-int-unusual-pad33.py
    0.757s (+23.06%) internal_bench/format-3.09-int-unusual-pad47.py
    0.795s (+29.18%) internal_bench/format-3.10-int-unusual-pad68.py
    0.881s (+43.27%) internal_bench/format-3.11-int-unusual-pad100.py
    0.995s (+61.80%) internal_bench/format-3.12-int-unusual-pad150.py
    1.180s (+91.79%) internal_bench/format-3.13-int-unusual-pad220.py
    1.487s (+141.74%) internal_bench/format-3.14-int-unusual-pad330.py
    1.874s (+204.52%) internal_bench/format-3.15-int-unusual-pad470.py
    2.687s (+336.70%) internal_bench/format-3.16-int-unusual-pad680.py
    3.993s (+549.09%) internal_bench/format-3.17-int-unusual-pad1000.py
    0.632s (+02.74%) internal_bench/format-4.00-int-group-pad1.py
    0.632s (+02.74%) internal_bench/format-4.01-int-group-pad2.py
    0.632s (+02.77%) internal_bench/format-4.02-int-group-pad3.py
    0.633s (+02.94%) internal_bench/format-4.03-int-group-pad5.py
    0.656s (+06.56%) internal_bench/format-4.04-int-group-pad7.py
    0.665s (+08.16%) internal_bench/format-4.05-int-group-pad10.py
    0.659s (+07.11%) internal_bench/format-4.06-int-group-pad15.py
    0.705s (+14.56%) internal_bench/format-4.07-int-group-pad22.py
    0.725s (+17.76%) internal_bench/format-4.08-int-group-pad33.py
    0.769s (+24.99%) internal_bench/format-4.09-int-group-pad47.py
    0.808s (+31.39%) internal_bench/format-4.10-int-group-pad68.py
    0.895s (+45.45%) internal_bench/format-4.11-int-group-pad100.py
    1.007s (+63.73%) internal_bench/format-4.12-int-group-pad150.py
    1.193s (+93.98%) internal_bench/format-4.13-int-group-pad220.py
    1.500s (+143.75%) internal_bench/format-4.14-int-group-pad330.py
    1.885s (+206.44%) internal_bench/format-4.15-int-group-pad470.py
    2.758s (+348.33%) internal_bench/format-4.16-int-group-pad680.py
    4.050s (+558.32%) internal_bench/format-4.17-int-group-pad1000.py
1 tests performed (55 individual testcases)

@jepler
Contributor Author

jepler commented Sep 22, 2025

That looks like a real promising alternative, especially if it's smaller than the others.

@AJMansfield
Contributor

> That looks like a real promising alternative, especially if it's smaller than the others.

How to proceed here, then? It feels like there's really two different things here now, and I don't want to steal the grouping feature from you either.

My thought is to PR another version of this that limits its scope just to updating mp_print_strn to use a buffer on the stack, to be evaluated purely on the performance merits without the grouping logic to conflate against it. And then assuming that's accepted, this could then be rebased downstream to add the grouping tests and implement grouping against that new version.

@jepler
Contributor Author

jepler commented Sep 26, 2025

If you have a branch that fixes the bug I was trying to address and is better in other respects, I'm not worried about the git Author or Co-Author credit.

AJMansfield added a commit to AJMansfield/micropython that referenced this pull request Sep 26, 2025
This reworks `mp_print_strn` to use a stack-allocated padding buffer
rather than special-cased hardcoded ROM strings in order to reduce
code size and improve string formatting performance.

Note that this is actually just as performant, even for zeroes and
spaces! On my RP2350 Cortex M33 hardware, spaces are about 1% faster
for short-padding cases, and 3.4% faster for long-padding cases.

I've done some cursory tests for alternate values of `PAD_BUF_SIZE`, but
the results definitely won't generalize to other architectures, and
probably not even to other implementations of the same architecture.
The buffer size of 20 is chosen as the smallest size that easily admits
a later implementation of micropython#18092 to support padding with grouping
characters, to avoid pessimizing the short-padding cases any more than
required.

I've also explored alternatives involving using `alloca` for the padding
buffer, but the conditionals and fallback logic needed to bound stack
usage for the pathological cases end up pessimizing code size beyond
what's reasonable for the very marginal additional speed gains.

Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
@AJMansfield
Contributor

AJMansfield commented Sep 26, 2025

> If you have a branch that fixes the bug I was trying to address and is better in other respects, I'm not worried about the git Author or Co-Author credit.

If it was just about a vanity credit I wouldn't be fussed either lol. To me it's far more about preserving the chain of ideas and keeping the development history as easy to follow as possible for the next guy having to dig through a git blame trace to track down some obscure bug.

And either way --- I still think the case for using a buffer on the stack is strong enough to stand on its own, and more easily defended without the whole factorial space of other micro-optimisations that doing it together with the grouping feature adds.

AJMansfield added a commit to AJMansfield/micropython that referenced this pull request Sep 26, 2025
This reworks `mp_print_strn` to use a stack-allocated padding buffer
rather than special-cased hardcoded ROM strings in order to reduce
code size and improve string formatting performance.

Note that this is actually just as performant, even for zeroes and
spaces! On my RP2350 Cortex M33 hardware, spaces are about 1% faster
for short-padding cases, and 3.4% faster for long-padding cases.

I've done some cursory tests for alternate values of `PAD_BUF_SIZE`, but
the results definitely won't generalize to other architectures, and
probably not even to other implementations of the same architecture.
The buffer size of 20 is chosen as the smallest size that easily admits
a later implementation of micropython#18092 to support padding with grouping
characters, to avoid pessimizing the short-padding cases any more than
required.

I've also explored alternatives involving using `alloca` for the padding
buffer, but the conditionals and fallback logic needed to bound stack
usage for the pathological cases end up pessimizing code size beyond
what's reasonable for the very marginal additional speed gains.

Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
@jepler
Contributor Author

jepler commented Sep 28, 2025

I can probably "rebuild" this atop your branch if that's how it ends up happening.

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Sep 30, 2025
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

micropython incorrectly omits grouping character in leading zeros
4 participants