Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove regex replaces. #12

Closed
wants to merge 1 commit into from
Closed

Remove regex replaces. #12

wants to merge 1 commit into from

Conversation

artem-cliqz
Copy link

Problem: the library spends too much time locking regexes, according
to the discussion[1]

"If the lock there is dominating performance, then it means the regexp
is super trivial. In that case you will get an even bigger speedup by
just writing some code instead of doing a regexp match."

So, this commit replaces them with traversing the string by each byte
O(n) and replacing them.

[1] golang/go#8232

Problem: the library spends too much time locking regexes, according
to the discussion[1]

"If the lock there is dominating performance, then it means the regexp
is super trivial.  In that case you will get an even bigger speedup by
just writing some code instead of doing a regexp match."

So, this commit replaces them with traversing the string by each byte
O(n) and replacing them.

[1] golang/go#8232
spacingRe = regexp.MustCompile(`[ \r\n\t]+`)
newlineRe = regexp.MustCompile(`\n\n+`)
)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually not be so radical as to replace the REs at all. They seem to be not too CPU intensive. But as far as the solution works...

res = append(res, string(current))
}
return strings.Join(res, " ")
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider using strings.FieldsFunc (used in strings.Fields) as far as we split the string.

@artem-cliqz artem-cliqz closed this Feb 8, 2017
@jaytaylor
Copy link
Owner

@artem-cliqz Greetings, I appreciate your interest!

Did you perform any benchmarking? Would love to see the numbers and if we can get a benefit by avoiding the RE's.

Best,
Jay

@artem-cliqz
Copy link
Author

@jaytaylor Sorry, I made upstream PR by mistake, it was meant to be within our fork. If it proves useful, we make another PR. Thanks.

@jaytaylor
Copy link
Owner

@artem-cliqz Gotcha, thanks for the info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants