Given a program with a large string addition
var x = 1 + 1 + 1 + ... 1
gofmt takes time quadratic in the size of the expression to print it back out. Note that this is an integer addition, not a string addition: the problem does not involve string concatenation.
go get -u rsc.io/tmp/bigprog
bigprog gofmt
prints on my system:
1000 int 0.035s strbal 0.008s strbigbal 0.030s str 0.026s strbig 0.047s
2000 int 0.071s strbal 0.012s strbigbal 0.051s str 0.079s strbig 0.115s
4000 int 0.291s strbal 0.022s strbigbal 0.092s str 0.271s strbig 0.362s
8000 int 1.049s strbal 0.032s strbigbal 0.173s str 1.092s strbig 1.249s
16000 int 4.332s strbal 0.070s strbigbal 0.346s str 4.338s strbig 4.852s
32000 int 18.897s strbal 0.112s strbigbal 0.662s str 18.212s strbig 18.889s
64000 int 82.845s strbal 0.224s strbigbal 1.292s str 84.138s strbig 83.718s
128000 int 382.780s strbal 0.425s strbigbal 2.588s str 381.781s strbig 392.788s
See #23222 for full description of this table, but the important part here is that the int, str, and strbig columns are addition chains like above, while the faster strbal and strbigbal columns have parentheses added to make the addition parse trees balanced. The balanced times double nicely as the input size doubles, so there is no problem with actually generating large outputs. In contrast the unbalanced inputs quadruple as input size doubles, a clear quadratic slowdown. The obvious guess is that it is in the code that decides how to format expressions, and probably in the code that decides whether to insert spaces around expressions. (If so, it's my fault and I apologize.)
This is a fairly minor issue, but it would make gofmt 4X faster even on concatenations of size 1000, which are plausible in generated code (I started this after finding one of size 729).
/cc @griesemer
Given a program with a large string addition
gofmt takes time quadratic in the size of the expression to print it back out. Note that this is an integer addition, not a string addition: the problem does not involve string concatenation.
prints on my system:
See #23222 for full description of this table, but the important part here is that the int, str, and strbig columns are addition chains like above, while the faster strbal and strbigbal columns have parentheses added to make the addition parse trees balanced. The balanced times double nicely as the input size doubles, so there is no problem with actually generating large outputs. In contrast the unbalanced inputs quadruple as input size doubles, a clear quadratic slowdown. The obvious guess is that it is in the code that decides how to format expressions, and probably in the code that decides whether to insert spaces around expressions. (If so, it's my fault and I apologize.)
This is a fairly minor issue, but it would make gofmt 4X faster even on concatenations of size 1000, which are plausible in generated code (I started this after finding one of size 729).
/cc @griesemer