We often run all benchmarks of the std packages with `-count=5` or higher, which usually takes 5-6 hours to finish on our x86 and arm64 servers. The overall turnaround time could be reduced considerably if every round from the second onward skipped the automatic adjustment and reused the iteration count computed in the first round.
With an experimental change I got the following numbers:
| package | old | new |
| --- | --- | --- |
| math/big | 2865.780s | 2242.640s |
| runtime | 5936.460s | 4926.073s |
| encoding/json | 486.691s | 321.370s |
I realize the same attempt was made before but was dropped because of the benchmark regressions reported in #25622. A change to how RunParallel computes its 'grain' value, proposed in #37996, should avoid that regression. Furthermore, this feature could be guarded by an option if necessary.