endDocument is called even after pause/resume used #28

Merged
merged 2 commits into from Sep 11, 2012

@alexlatchford

With the current code, once an interrupt is sent (i.e. pause() is called), the interrupt flag is never reset to false, so endDocument can never be called. I'm sure this could be improved further, but it works for the scenario I need it for, so enjoy :)
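To illustrate the idea, here is a minimal sketch of a pausable parser loop, not the library's actual code. The class `TinyParser`, its fields (`interrupted`, `done`) and the handler names `onToken` / `onEndDocument` are all hypothetical; the only point is that resume() has to clear the interrupt flag before parsing continues, otherwise the end-of-document branch is never reached.

```javascript
// Hypothetical sketch: names and structure are assumptions, not the library's API.
class TinyParser {
  constructor(handlers) {
    this.handlers = handlers; // { onToken(token, parser), onEndDocument() }
    this.tokens = [];
    this.pos = 0;
    this.interrupted = false;
    this.done = false;
  }

  parse(tokens) {
    this.tokens = tokens;
    this.pos = 0;
    this.interrupted = false;
    this.done = false;
    this.run();
  }

  pause() {
    this.interrupted = true;
  }

  resume() {
    // The key step: reset the flag before continuing. Without this line,
    // run() would return immediately and onEndDocument would never fire.
    this.interrupted = false;
    this.run();
  }

  run() {
    if (this.done) return; // guard against firing onEndDocument twice
    while (this.pos < this.tokens.length) {
      if (this.interrupted) return; // stop here; resume() picks up at this.pos
      this.handlers.onToken(this.tokens[this.pos++], this);
    }
    if (!this.interrupted) {
      this.done = true;
      this.handlers.onEndDocument();
    }
  }
}
```

In this sketch, a handler that calls `parser.pause()` on some token stops the loop before the next token, and a later `parser.resume()` delivers the remaining tokens followed by exactly one `onEndDocument` call.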

Alex Latchford added some commits Sep 10, 2012
Alex Latchford Now endDocument is still called even when using pause & resumes. cbfd608
Alex Latchford Speed up parsing considerably when dealing with large strings, basically only make it search the length of the needle at the start of the haystack instead of the rest of it. 048070b
@alexlatchford

This now also includes a major speed boost to the parser. It essentially rewrites the indexOf function to make it into something similar to a startsWith function, except that you have to specify the start position too.

The problem with the old _parse function was that as the haystack (i.e. the XML document to parse) grew longer, a search where the needle wasn't found kept looking all the way to the end of the string, which was very inefficient.
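As a rough sketch of the difference (the helper name `matchesAt` is hypothetical and is not the function added in this pull request): a check such as `haystack.indexOf(needle, start) === start` scans forward to the end of the string whenever the needle is absent, whereas a position-anchored comparison inspects at most `needle.length` characters.

```javascript
// Hypothetical helper: true if `needle` occurs in `haystack` exactly at index `start`.
// Work is bounded by needle.length, regardless of how long the haystack is.
function matchesAt(haystack, needle, start) {
  if (start < 0 || start + needle.length > haystack.length) return false;
  for (let i = 0; i < needle.length; i++) {
    if (haystack.charCodeAt(start + i) !== needle.charCodeAt(i)) return false;
  }
  return true;
}

const doc = "<a>" + "x".repeat(1000) + "</a>";

// Anchored check: stops at the first mismatching character.
console.log(matchesAt(doc, "<a>", 0));  // true
console.log(matchesAt(doc, "<!--", 0)); // false

// Unanchored check: same answer, but indexOf searches the rest of `doc` first.
console.log(doc.indexOf("<!--", 0) === 0); // false
```

In current JavaScript engines the built-in `String.prototype.startsWith(searchString, position)` performs the same anchored check, so `doc.startsWith("<a>", 0)` could replace the hand-written loop where it is available.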

By adding this function I've sped up my program by a thousand times, I'd estimate. I am reading in a 20 MB file non-chunked correctly, but in the near future the same parser will need to deal with upwards of 160 MB, so you may see some more improvements. Currently it can parse that and put it into a MongoDB in just under an hour, we're estimating, which is good enough for this project, but it wouldn't be possible without this second improvement :)

@robrighter robrighter merged commit 3038a07 into robrighter:master Sep 11, 2012
@robrighter
Owner

Thanks, I will get this pushed out to npm in the next day or so. Sorry for the delay.

Thanks again!
rob
