The original Detab function uses a regex (_deTab) to convert tabs to the
appropriate number of spaces. This approach appears to generate many
strings unnecessarily and involves a lot of overhead.
Preliminary benchmark results:
input string length: 475
6000 iterations in 4691 ms (0,781833333333333 ms per iteration)
input string length: 2356
1500 iterations in 4489 ms (2,99266666666667 ms per iteration)
input string length: 27737
150 iterations in 5159 ms (34,3933333333333 ms per iteration)
input string length: 11075
300 iterations in 4506 ms (15,02 ms per iteration)
input string length: 88607
35 iterations in 4268 ms (121,942857142857 ms per iteration)
input string length: 354431
10 iterations in 4857 ms (485,7 ms per iteration)
By doing this fairly crucial part "manually" (many input lines match _deTab
at least once; see the attached diff file for the exact code), a performance
gain of about 10% can be achieved:
input string length: 475
6000 iterations in 4187 ms (0,697833333333333 ms per iteration)
input string length: 2356
1500 iterations in 4011 ms (2,674 ms per iteration)
input string length: 27737
150 iterations in 4464 ms (29,76 ms per iteration)
input string length: 11075
300 iterations in 4087 ms (13,6233333333333 ms per iteration)
input string length: 88607
35 iterations in 3922 ms (112,057142857143 ms per iteration)
input string length: 354431
10 iterations in 4451 ms (445,1 ms per iteration)
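For readers without the attached diff, here is a minimal sketch of what expanding tabs "manually" in a single pass can look like. It is written in Python purely for illustration and is not the code from the diff; the function name `detab` and the tab width of 4 (Markdown's conventional tab stop) are assumptions.

```python
TAB_WIDTH = 4  # assumption: Markdown's conventional tab stop


def detab(text: str, tab_width: int = TAB_WIDTH) -> str:
    """Expand tabs to spaces in one pass, padding to the next tab stop."""
    if "\t" not in text:
        return text  # fast path: nothing to expand, no new string built

    out = []
    column = 0  # current column within the line
    for ch in text:
        if ch == "\t":
            # Pad with spaces up to the next multiple of tab_width.
            pad = tab_width - (column % tab_width)
            out.append(" " * pad)
            column += pad
        elif ch == "\n":
            out.append(ch)
            column = 0  # a new line resets the column counter
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

The idea is that the input is walked once and the output is assembled in a single buffer, instead of producing intermediate strings for every regex match.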
Also, it is now clear that all '\t' characters are removed from the input
at this point, so all "[ \t]" (or similar) parts in the regular expressions
can have their tab matching stripped out. Apart from readability, the
savings from this vary considerably with the input:
input string length: 475
6000 iterations in 4114 ms (0,685666666666667 ms per iteration)
input string length: 2356
1500 iterations in 3826 ms (2,55066666666667 ms per iteration)
input string length: 27737
150 iterations in 4239 ms (28,26 ms per iteration)
input string length: 11075
300 iterations in 3884 ms (12,9466666666667 ms per iteration)
input string length: 88607
35 iterations in 3719 ms (106,257142857143 ms per iteration)
input string length: 354431
10 iterations in 4264 ms (426,4 ms per iteration)
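To illustrate why the tab matching is redundant, the sketch below compares a pattern with and without "\t" on input that has already been detabbed. It is a Python illustration only: both patterns are hypothetical examples rather than the project's actual regexes, and `str.expandtabs(4)` stands in for Detab.

```python
import re

# Hypothetical patterns for illustration: blank out whitespace-only lines.
WITH_TAB = re.compile(r"^[ \t]+$", re.MULTILINE)
WITHOUT_TAB = re.compile(r"^ +$", re.MULTILINE)


def strip_blank_lines(text: str, pattern: re.Pattern) -> str:
    """Remove the contents of lines that consist only of whitespace."""
    return pattern.sub("", text)


# str.expandtabs(4) stands in for Detab here (assumption: tab width 4).
detabbed = "a\n \t\nb\n  \n".expandtabs(4)

# Once no tabs remain, both patterns give the same result.
same = strip_blank_lines(detabbed, WITH_TAB) == strip_blank_lines(detabbed, WITHOUT_TAB)
```

Once the input contains no tabs, the simpler character class matches exactly the same text, so dropping "\t" changes nothing but the cost of matching.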
All unit tests still pass after these modifications.
Original issue reported on code.google.com by Shio...@gmail.com on 10 Jan 2010 at 1:43
Actually, I just noticed that "CodeBlockEvaluator" calls Detab again. Since no
tabs are inserted anywhere, this precaution is unnecessary and should be
removable. However, this does not have any measurable impact on the benchmark
results on my machine.
Original comment by Shio...@gmail.com on 10 Jan 2010 at 2:30
I was thinking that we needed to do this, as PHPMarkdown does the same. This
makes WAY more sense.
Excellent and very helpful contribution, checked in as r96
(and yes, I removed all the \t from the regexes and the Detab call in
CodeBlockEvaluator as well -- we are TAB-FREE!)
Original comment by wump...@gmail.com on 11 Jan 2010 at 2:29