Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp: improve performance #19629

Closed
dongweigogo opened this issue Mar 21, 2017 · 5 comments
Closed

regexp: improve performance #19629

dongweigogo opened this issue Mar 21, 2017 · 5 comments

Comments

@dongweigogo
Copy link

@dongweigogo dongweigogo commented Mar 21, 2017

go 1.8, linux, amd64

Recently, I use regular expression a lot in golang. From the benchmarksgame, I see the regexp of go is very slow, about 5x-10x than many other languages.
After I made some modifications by removing unnecessary channels and allocations in the source code, the performance is improved about 1/4, but is still slower than even many dynamic languages.

from pprof:
(pprof) top 10
68.33s of 68.49s total (99.77%)
Dropped 31 nodes (cum <= 0.34s)
Showing top 10 nodes out of 17 (cum >= 63.59s)
flat flat% sum% cum cum%
32.46s 47.39% 47.39% 37.17s 54.27% regexp.(*machine).add
18.58s 27.13% 74.52% 27.05s 39.49% regexp.(*machine).step
7.06s 10.31% 84.83% 68.28s 99.69% regexp.(*machine).match
4.77s 6.96% 91.79% 4.77s 6.96% runtime.memmove
1.86s 2.72% 94.51% 1.86s 2.72% regexp/syntax.(*Inst).MatchRunePos
1.72s 2.51% 97.02% 1.72s 2.51% regexp.(*inputBytes).step
1.48s 2.16% 99.18% 1.48s 2.16% regexp/syntax.EmptyOpContext
0.39s 0.57% 99.75% 2.25s 3.29% regexp/syntax.(*Inst).MatchRune
0.01s 0.015% 99.77% 4.84s 7.07% regexp.(*Regexp).replaceAll
0 0% 99.77% 63.59s 92.85% main.countMatches

It seems that there are many type assertion in regexp, am I right? And is there any solution for this?

@ALTree
Copy link
Member

@ALTree ALTree commented Mar 21, 2017

The benchmark game is, as the name suggest, a game. A game that is won by tuning a specific program until it performs good on the specific input used in the benchmark and/or, in this case, using highly-optimized regexp libraries.

The page you linked, for example, shows that C++ is slower than PHP, Hack, Node.js, and Typescript.

Is it, though?

This issue is, IMHO, a little too broad to be actionable. "Make the regexp engine faster" well, yes, everyone likes fast regexp engines. We have a few (a little more specific) issues about that.

@igouy
Copy link

@igouy igouy commented Mar 21, 2017

@ALTree > The benchmark game is, as the name suggest, a game.
The name "benchmarks game" signifies nothing more than the fact that programmers contribute programs that compete (but try to remain comparable) for fun not money.

shows that C++ is slower than PHP, Hack, Node.js, and Typescript.

No. Shows a C++ program slower than PHP, Hack, Node.js and Typescript programs.

Is it, though?

Yes.

@dongweigogo

regex-redux is a new task, based on regexdna.

Try re-writing some of the old Go regex-dna programs to perform the new substitutions.

@dongweigogo
Copy link
Author

@dongweigogo dongweigogo commented Mar 21, 2017

@igouy ,
yes, the original code seems written a long time ago, and I've made some improvements.
After I tried various regular expressions and self-made input data to avoid biases, the conclusion is still the same that in all cases, the go regexp is beaten by many other languages (I tried python and rust. For python, I'm guessing it is because most of the part in python is implemented in C).
This bothers me! -_-!

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 21, 2017

As you say, in a language like Python, the regexp engine is implemented in C. So you are comparing a somewhat tuned Go implementation with a highly tuned C implementation. You also need to consider the characteristics of the engine. Go has chosen to follow the re2 path (not surprising, since Russ Cox is a major author of both Go and re2). re2 has much better performance characteristics than some other regexp engines, in that it never has an exponential slowdown, but that comes at a cost for other regexps (https://swtch.com/~rsc/regexp/). Even then, the C++ re2 implementation has many optimizations that are not in the Go implementation. See, for example, #11646.

So: can regexp be sped up? Yes. But it will take work. I'm not sure whether to bother leaving this issue open or not.

@ianlancetaylor ianlancetaylor changed the title can the regexp performance be improved in golang? regexp: improve performance Mar 21, 2017
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Mar 21, 2017

I'm going to close this in favor of concrete optimization bugs like #11646.

@bradfitz bradfitz closed this Mar 21, 2017
@golang golang locked and limited conversation to collaborators Mar 21, 2018
@dsnet dsnet added the Performance label Aug 13, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.