Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp: RegexpMatchEasy0_1K slower at tip #17456

Closed
0xmohit opened this issue Oct 15, 2016 · 10 comments
Closed

regexp: RegexpMatchEasy0_1K slower at tip #17456

0xmohit opened this issue Oct 15, 2016 · 10 comments
Assignees
Milestone

Comments

@0xmohit
Copy link
Contributor

@0xmohit 0xmohit commented Oct 15, 2016

What did you do?

Execute go1 benchmarks.

What did you see?

RegexpMatchEasy0_1K is much slower (at 86b2f29) as compared to 1.7.1

benchmark                          old ns/op     new ns/op     delta
BenchmarkRegexpMatchEasy0_32-4     102           100           -1.96%
BenchmarkRegexpMatchEasy0_1K-4     249           1016          +308.03%
BenchmarkRegexpMatchEasy1_32-4     105           99.3          -5.43%
BenchmarkRegexpMatchEasy1_1K-4     437           426           -2.52%

benchmark                          old MB/s     new MB/s     speedup
BenchmarkRegexpMatchEasy0_32-4     310.92       319.40       1.03x
BenchmarkRegexpMatchEasy0_1K-4     4110.44      1007.33      0.25x
BenchmarkRegexpMatchEasy1_32-4     304.17       322.33       1.06x
BenchmarkRegexpMatchEasy1_1K-4     2342.11      2398.62      1.02x

System details

go version devel +86b2f29 Sat Oct 15 07:13:17 2016 +0000 linux/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/home/mohit/tmp/t/go"
GOTOOLDIR="/home/mohit/tmp/t/go/pkg/tool/linux_amd64"
TERM="dumb"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build951092435=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
uname -sr: Linux 4.4.0-38-generic
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:    16.04
Codename:   xenial
/lib/x86_64-linux-gnu/libc.so.6: GNU C Library (Ubuntu GLIBC 2.23-0ubuntu3) stable release version 2.23, by Roland McGrath et al.
gdb --version: GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
@bradfitz bradfitz added this to the Go1.8Maybe milestone Oct 16, 2016
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Oct 16, 2016

That looks odd. Are you sure that's not a fluke?

Can you try again with many runs (use -count=20 on both), and then paste the output of https://godoc.org/rsc.io/benchstat ?

@0xmohit
Copy link
Contributor Author

@0xmohit 0xmohit commented Oct 16, 2016

That looks odd. Are you sure that's not a fluke?

It did seem odd to me too initially. But git log --grep=MatchEasy also suggests that the specified benchmark is slower. As an example, please refer to the stats in commit messages for c03ed49 and dd24b10.

Following are the stats with -count=20:

name                   old time/op    new time/op    delta
RegexpMatchEasy0_32-4     108ns ± 8%     102ns ±10%    -5.53%  (p=0.000 n=20+19)
RegexpMatchEasy0_1K-4     248ns ± 1%    1023ns ± 1%  +312.02%  (p=0.000 n=18+18)
RegexpMatchEasy1_32-4     103ns ± 1%      99ns ± 4%    -3.70%  (p=0.000 n=16+18)
RegexpMatchEasy1_1K-4     461ns ±18%     434ns ± 3%    -5.83%  (p=0.000 n=18+18)

name                   old speed      new speed      delta
RegexpMatchEasy0_32-4   295MB/s ± 8%   313MB/s ± 9%    +5.92%  (p=0.000 n=20+19)
RegexpMatchEasy0_1K-4  4.12GB/s ± 1%  1.00GB/s ± 1%   -75.69%  (p=0.000 n=18+18)
RegexpMatchEasy1_32-4   310MB/s ± 1%   324MB/s ± 5%    +4.28%  (p=0.000 n=16+18)
RegexpMatchEasy1_1K-4  2.19GB/s ±16%  2.36GB/s ± 3%    +7.57%  (p=0.000 n=20+18)
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Oct 16, 2016

Thanks. I'll let @randall77 decide the importance of this regression for Go 1.8.

@randall77
Copy link
Contributor

@randall77 randall77 commented Oct 17, 2016

Looks like this is related to the change in how bytes.Index works. @TocarIP

When looking for a smallish multicharacter string (in this case, 26 characters), we used to look for the first character, then check the remaining characters, then repeat. Now we use fun SSE (AVX?) instructions to look for multiple characters at once. The new way will definitely be faster when the first character matches but the rest don't (it is not slowed down by partial matches). The old way looks like it is faster when there are few matches of the first character. I'm surprised about how much faster it is, however.

It is possible that we should do some combination algorithm, looking for just the first character and switching once we get a few false positives (partial but not complete matches).

All that said, the regexp is still doing >1GB/sec. I think this may be an instance of "something blazingly fast got a bit less blazingly fast".

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Oct 17, 2016

Thanks for the investigation.

Fast or not, 3x slower doesn't seem like "a bit less".

@TocarIP
Copy link
Contributor

@TocarIP TocarIP commented Oct 17, 2016

I'll experiment with dynamically switching between indexbyte and substring search.

@randall77
Copy link
Contributor

@randall77 randall77 commented Oct 17, 2016

I was thinking we could, in each of those assembly loops, skip ahead by 16 if we compare 16 bytes against the first byte (copied 16 times) and find no match.

@rsc
Copy link
Contributor

@rsc rsc commented Oct 17, 2016

It does seem like we should fix this one way or the other.
The belief is that the kind of check in RegexpMatchEasy0_1K
is common enough that it shouldn't get 3x slower.

@rsc rsc modified the milestones: Go1.8, Go1.8Maybe Oct 17, 2016
@valyala
Copy link
Contributor

@valyala valyala commented Oct 19, 2016

Just FYI, RegexpMatchEasy0_1K is the main loser in go1 benchmarks when comparing go tip to go1.7:

BinaryTree17-4              2.88s ± 3%      2.81s ± 2%    -2.41%    (p=0.001 n=9+9)
Fannkuch11-4                3.10s ± 1%      3.03s ± 2%    -2.50%   (p=0.000 n=9+10)
FmtFprintfEmpty-4          54.7ns ± 7%     49.5ns ± 6%    -9.48%  (p=0.000 n=10+10)
FmtFprintfString-4          152ns ± 6%      140ns ±10%    -7.69%  (p=0.006 n=10+10)
FmtFprintfInt-4             142ns ± 8%      120ns ± 5%   -15.33%  (p=0.000 n=10+10)
FmtFprintfIntInt-4          218ns ± 9%      182ns ± 7%   -16.45%  (p=0.000 n=10+10)
FmtFprintfPrefixedInt-4     210ns ± 8%      189ns ± 7%    -9.78%   (p=0.000 n=10+9)
FmtFprintfFloat-4           308ns ±11%      285ns ± 8%    -7.69%  (p=0.003 n=10+10)
FmtManyArgs-4               877ns ± 5%      809ns ± 1%    -7.74%    (p=0.000 n=9+9)
GobDecode-4                8.13ms ± 4%     7.23ms ± 3%   -11.03%  (p=0.000 n=10+10)
GobEncode-4                6.66ms ± 1%     6.57ms ±10%      ~      (p=0.515 n=8+10)
Gzip-4                      248ms ± 0%      249ms ± 1%      ~       (p=1.000 n=7+8)
Gunzip-4                   38.4ms ± 6%     38.8ms ±10%      ~     (p=0.739 n=10+10)
HTTPClientServer-4         57.5µs ± 7%     62.6µs ± 7%    +8.93%   (p=0.000 n=9+10)
JSONEncode-4               18.8ms ± 4%     18.8ms ± 4%      ~      (p=0.780 n=9+10)
JSONDecode-4               53.0ms ± 4%     54.7ms ± 9%      ~     (p=0.218 n=10+10)
Mandelbrot200-4            4.85ms ± 1%     4.79ms ± 1%    -1.20%  (p=0.002 n=10+10)
GoParse-4                  3.61ms ± 5%     3.47ms ± 2%    -3.76%   (p=0.000 n=10+8)
RegexpMatchEasy0_32-4      86.6ns ± 4%     86.4ns ± 5%      ~      (p=0.765 n=10+9)
RegexpMatchEasy0_1K-4       206ns ± 5%      859ns ± 3%  +317.14%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-4      87.5ns ± 4%     80.3ns ± 3%    -8.25%  (p=0.000 n=10+10)
RegexpMatchEasy1_1K-4       389ns ± 6%      359ns ± 2%    -7.59%  (p=0.000 n=10+10)
RegexpMatchMedium_32-4      137ns ± 4%      126ns ± 3%    -7.97%    (p=0.000 n=9+9)
RegexpMatchMedium_1K-4     41.8µs ± 3%     39.6µs ± 5%    -5.24%   (p=0.000 n=8+10)
RegexpMatchHard_32-4       1.93µs ± 0%     1.96µs ± 2%    +1.37%   (p=0.011 n=9+10)
RegexpMatchHard_1K-4       59.6µs ± 7%     59.5µs ± 5%      ~      (p=0.497 n=10+9)
Revcomp-4                   520ms ± 7%      461ms ± 5%   -11.36%    (p=0.000 n=9+9)
Template-4                 62.7ms ± 3%     55.4ms ± 1%   -11.58%   (p=0.000 n=9+10)
TimeParse-4                 338ns ±11%      308ns ± 2%    -8.74%   (p=0.000 n=10+9)
TimeFormat-4                350ns ± 0%      324ns ± 1%    -7.40%  (p=0.000 n=10+10)
@quentinmit quentinmit added the NeedsFix label Oct 21, 2016
@gopherbot
Copy link

@gopherbot gopherbot commented Oct 21, 2016

CL https://golang.org/cl/31690 mentions this issue.

@gopherbot gopherbot closed this in f31492f Nov 1, 2016
@golang golang locked and limited conversation to collaborators Nov 1, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants
You can’t perform that action at this time.