input normalization #23

Closed
GoogleCodeExporter opened this issue Dec 29, 2015 · 6 comments

Comments

@GoogleCodeExporter

Attached diff includes code for combining newline normalization and Detab 
function into a function "NormalizeInput". It also ensures that the input 
ends in at least two newlines.
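
For readers without the attached diff, here is a rough Python sketch of what a combined pass like this could look like. It is not the C# from the patch: the name `normalize_input`, the `tab_width` default of 4, and the column-based tab expansion are assumptions made for illustration only.

```python
def normalize_input(text: str, tab_width: int = 4) -> str:
    """One pass: unify line endings, expand tabs, end with two newlines."""
    out = []      # collected output pieces
    column = 0    # current column in the line, used for tab stops
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\r":
            # Treat "\r\n" and a bare "\r" as a single newline.
            if i + 1 < n and text[i + 1] == "\n":
                i += 1
            out.append("\n")
            column = 0
        elif ch == "\n":
            out.append("\n")
            column = 0
        elif ch == "\t":
            # Pad with spaces up to the next tab stop (assumed behavior).
            spaces = tab_width - (column % tab_width)
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
        i += 1
    result = "".join(out)
    # Make sure the text ends in at least two newlines.
    trailing = len(result) - len(result.rstrip("\n"))
    if trailing < 2:
        result += "\n" * (2 - trailing)
    return result
```

The point of doing it this way is that the input is walked once, instead of once per replacement step.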

As expected, this change improves performance, although not by much:

Benchmark before changes:
input string length: 475
7000 iterations in 4816 ms (0,688 ms per iteration)
input string length: 2356
2000 iterations in 5040 ms (2,52 ms per iteration)
input string length: 27737
180 iterations in 5131 ms (28,5055555555556 ms per iteration)
input string length: 11075
300 iterations in 3951 ms (13,17 ms per iteration)
input string length: 88607
40 iterations in 4288 ms (107,2 ms per iteration)
input string length: 354431
10 iterations in 4260 ms (426 ms per iteration)

Benchmark after changes:
input string length: 475
7000 iterations in 4688 ms (0,669714285714286 ms per iteration)
input string length: 2356
2000 iterations in 4968 ms (2,484 ms per iteration)
input string length: 27737
180 iterations in 4953 ms (27,5166666666667 ms per iteration)
input string length: 11075
300 iterations in 3840 ms (12,8 ms per iteration)
input string length: 88607
40 iterations in 4226 ms (105,65 ms per iteration)
input string length: 354431
10 iterations in 4243 ms (424,3 ms per iteration)
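
To put "not a lot" into numbers, the totals above can be compared directly. The snippet below is only a convenience calculation over the figures reported in this comment; the variable names are made up for the example. By this arithmetic the gain is somewhere between roughly 0.4% and 3.5%, depending on input size.

```python
# (input length, total ms before, total ms after), copied from the runs above.
# Iteration counts are identical before and after, so totals compare directly.
runs = [
    (475, 4816, 4688),
    (2356, 5040, 4968),
    (27737, 5131, 4953),
    (11075, 3951, 3840),
    (88607, 4288, 4226),
    (354431, 4260, 4243),
]

# Relative time saved per input size, in percent.
improvements = {
    length: (before - after) / before * 100.0
    for length, before, after in runs
}

for length, percent in improvements.items():
    print(f"input length {length}: {percent:.2f}% faster")
```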

Original issue reported on code.google.com by Shio...@gmail.com on 11 Jan 2010 at 11:21

Attachments:

@GoogleCodeExporter

Excellent, checked in as r100 -- there is another opportunity to combine the _blankLines regex with this Normalize() routine as well, I think. But you'll have to use a line-oriented approach instead of the chunked way you're adding stuff to the StringBuilder now.
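
A rough Python sketch of what such a line-oriented pass might look like follows. It assumes that the job of _blankLines is to empty out lines that contain only spaces and tabs (as the original Markdown.pl does); the name `normalize_lines`, the `tab_width` default, and the per-line buffer are illustrative assumptions, not the code that was checked in.

```python
def normalize_lines(text: str, tab_width: int = 4) -> str:
    """Line-oriented pass: unify newlines, expand tabs, empty blank lines."""
    out = []             # finished output pieces
    line = []            # pieces of the line currently being built
    column = 0           # current column, used for tab stops
    has_content = False  # True once the line holds a non-space character
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\r" or ch == "\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # "\r\n" counts as one line ending
            # Emit the buffered line only if it is not whitespace-only.
            if has_content:
                out.extend(line)
            out.append("\n")
            line, column, has_content = [], 0, False
        elif ch == "\t":
            spaces = tab_width - (column % tab_width)
            line.append(" " * spaces)
            column += spaces
        else:
            line.append(ch)
            column += 1
            if ch != " ":
                has_content = True
        i += 1
    # Flush a final line that had no terminator.
    if has_content:
        out.extend(line)
    result = "".join(out)
    # Make sure the text ends in at least two newlines.
    trailing = len(result) - len(result.rstrip("\n"))
    if trailing < 2:
        result += "\n" * (2 - trailing)
    return result
```

Buffering one line at a time is what makes the blank-line decision possible: whether a line is whitespace-only is not known until its end is reached.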

Original comment by wump...@gmail.com on 12 Jan 2010 at 3:57

@GoogleCodeExporter

Ate the _blankLines regex as well.

Well, the chunked way did indeed get a bit cumbersome with all the cases in which stuff is added to the StringBuilder now ;)

Performance did improve, but not a lot:

Performance before changes:
input string length: 475
4000 iterations in 2642 ms (0,6605 ms per iteration)
input string length: 2356
1000 iterations in 2480 ms (2,48 ms per iteration)
input string length: 27737
100 iterations in 2740 ms (27,4 ms per iteration)
input string length: 11075
200 iterations in 2535 ms (12,675 ms per iteration)
input string length: 88607
30 iterations in 3090 ms (103 ms per iteration)
input string length: 354431
10 iterations in 4095 ms (409,5 ms per iteration)

Performance after changes:
input string length: 475
4000 iterations in 2597 ms (0,64925 ms per iteration)
input string length: 2356
1000 iterations in 2460 ms (2,46 ms per iteration)
input string length: 27737
100 iterations in 2709 ms (27,09 ms per iteration)
input string length: 11075
200 iterations in 2526 ms (12,63 ms per iteration)
input string length: 88607
30 iterations in 3056 ms (101,866666666667 ms per iteration)
input string length: 354431
10 iterations in 4044 ms (404,4 ms per iteration)


P.S.: Would you consider upping the default number of iterations, especially for the last 3 tests? They finish rather quickly otherwise ;)

Original comment by Shio...@gmail.com on 12 Jan 2010 at 1:06

Attachments:

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

checked in as r102

I have to incorporate your .diff files by hand because they are not in a format that Tortoise understands. **I am not sure I got it right this time, can you check?**

As for the benchmark, the last 3 benchmark calls are to measure the cost of calling the whole thing once. The previous 3 benchmark calls are loops of many thousands. We need both.
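
As a rough illustration of the two kinds of measurement, a timing helper could look like the Python sketch below. The `benchmark` function and its return shape are hypothetical and are not the project's actual benchmark code; the output line only mimics the format of the logs in this thread.

```python
import time


def benchmark(func, text, iterations):
    """Run func(text) `iterations` times; return (total ms, ms per iteration)."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(text)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, elapsed_ms / iterations


if __name__ == "__main__":
    sample = "line one\r\n\tline two\r\n" * 100
    # Many iterations on a small input smooths out timer noise;
    # a handful of iterations on a large input approximates one real call.
    for text, count in ((sample, 5000), (sample * 50, 10)):
        total, per_call = benchmark(str.expandtabs, text, count)
        print(f"input string length: {len(text)}")
        print(f"{count} iterations in {total:.0f} ms ({per_call:.4f} ms per iteration)")
```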

Original comment by wump...@gmail.com on 12 Jan 2010 at 10:56

@GoogleCodeExporter

Normalize looks right to me.

Original comment by Shio...@gmail.com on 13 Jan 2010 at 12:34

@GoogleCodeExporter

OK, very good -- closing this as fixed then. Thanks again for the contribution!

Original comment by wump...@gmail.com on 13 Jan 2010 at 12:44

  • Changed state: Fixed
