net/url: optimize unescape and escape #17860
Comments
Based on the profiling data that you mention, it looks like the …

Regarding the allocations and …

See also: #18990
I'm not sure it's the kind of optimization you want, but I find that if I do some "hand optimization", just eliminating branches, I get pretty significant speed increases:

```go
const ESCAPE_ME = "net/url: optimize unescape and escape"

var s string

func BenchmarkPathEscape(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s = url.PathEscape(ESCAPE_ME)
	}
}

func shouldPathEscape(c byte) bool {
	if 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || '0' <= c && c <= '9' {
		return false
	}
	switch c {
	case '-', '_', '.', '~', ':', '@', '&', '=', '+', '$':
		return false
	}
	return true
}

func MyPathEscape(s string) string {
	hexCount := 0
	for i := 0; i < len(s); i++ {
		if shouldPathEscape(s[i]) {
			hexCount++
		}
	}
	if hexCount == 0 {
		return s
	}
	t := make([]byte, len(s)+2*hexCount)
	j := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if shouldPathEscape(c) {
			t[j] = '%'
			t[j+1] = "0123456789ABCDEF"[c>>4]
			t[j+2] = "0123456789ABCDEF"[c&15]
			j += 3
		} else {
			t[j] = c
			j++
		}
	}
	return string(t)
}

func BenchmarkMyPathEscape(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s = MyPathEscape(ESCAPE_ME)
	}
}
```
It seems to me like calls through functions like this one should be inlined but aren't:

```go
func PathEscape(s string) string {
	return escape(s, encodePathSegment) // not inlined, but probably should be
}
```
I also experimented with some optimizations (mostly unescape) but have not cleaned them up in a CL yet, so I am interested in what we can improve too. Maybe I will still get around to finishing them for 1.10. I was thinking of creating a table [256]uint8 where each byte contains some flags, e.g. shouldPathEscape (if we need more flags this can become uint32). For unhex I prefer using …

@ccbrown if this is not already recognized by the compiler to become a setCC instruction with an add on AMD64 instead of a branch, we should have a look at optimizing that (maybe there is even an add that just takes the flag without setCC). Does your new version pass all the tests?
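The tail of that comment is cut off, but as an illustration (not necessarily what was meant), a table-driven ishex/unhex could look like this; the table name and layout here are my own:

```go
// unhexTable maps each byte to its hex digit value, or 0xFF when the byte
// is not a hex digit, so one 256-byte lookup serves both ishex and unhex.
var unhexTable = func() (t [256]byte) {
	for i := range t {
		t[i] = 0xFF
	}
	for c := byte('0'); c <= '9'; c++ {
		t[c] = c - '0'
	}
	for c := byte('a'); c <= 'f'; c++ {
		t[c] = c - 'a' + 10
	}
	for c := byte('A'); c <= 'F'; c++ {
		t[c] = c - 'A' + 10
	}
	return
}()

// ishexLUT reports whether c is a hex digit.
func ishexLUT(c byte) bool { return unhexTable[c] != 0xFF }

// unhexLUT returns the value of hex digit c; callers must check ishexLUT first.
func unhexLUT(c byte) byte { return unhexTable[c] }
```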
Tried a few more things. Using a boolean LUT does make things faster:

```go
func MyBoolLUTEscape(s string, shouldEscapeLUT [256]bool) string {
	hexCount := 0
	for i := 0; i < len(s); i++ {
		if shouldEscapeLUT[s[i]] {
			hexCount++
		}
	}
	if hexCount == 0 {
		return s
	}
	t := make([]byte, len(s)+2*hexCount)
	j := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if shouldEscapeLUT[c] {
			t[j] = '%'
			t[j+1] = "0123456789ABCDEF"[c>>4]
			t[j+2] = "0123456789ABCDEF"[c&15]
			j += 3
		} else {
			t[j] = c
			j++
		}
	}
	return string(t)
}
```
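For completeness, here is one way the [256]bool table could be built from the shouldPathEscape function in the earlier comment, so the hot loop only pays for an array load; the variable name and benchmark are mine, reusing that snippet's helpers:

```go
// pathEscapeLUT reports, per byte value, whether it must be percent-encoded
// in a path segment. It is computed once from shouldPathEscape.
var pathEscapeLUT = func() (t [256]bool) {
	for i := 0; i < 256; i++ {
		t[i] = shouldPathEscape(byte(i))
	}
	return
}()

func BenchmarkMyBoolLUTEscape(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s = MyBoolLUTEscape(ESCAPE_ME, pathEscapeLUT)
	}
}
```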
Compacting all of the different encodings into a single flag-bit LUT (one bit per encoding mode) also works:

```go
func MyLUTEscape(s string, mode encoding) string {
	hexCount := 0
	mask := byte(1 << byte(mode))
	for i := 0; i < len(s); i++ {
		if shouldEscapeLUT[s[i]]&mask != 0 {
			hexCount++
		}
	}
	if hexCount == 0 {
		return s
	}
	t := make([]byte, len(s)+2*hexCount)
	j := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if shouldEscapeLUT[c]&mask != 0 {
			t[j] = '%'
			t[j+1] = "0123456789ABCDEF"[c>>4]
			t[j+2] = "0123456789ABCDEF"[c&15]
			j += 3
		} else {
			t[j] = c
			j++
		}
	}
	return string(t)
}
```
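A sketch of how the shared flag table could be populated, assuming access to net/url's internal encoding constants and shouldEscape(c, mode) predicate (those do exist inside the package; the table itself and its bit layout are hypothetical):

```go
// shouldEscapeLUT packs one "needs escaping" bit per encoding mode into a
// single byte per input byte, so all modes share one 256-byte table.
var shouldEscapeLUT = func() (t [256]uint8) {
	modes := []encoding{
		encodePath, encodePathSegment, encodeHost, encodeZone,
		encodeUserPassword, encodeQueryComponent, encodeFragment,
	}
	for i := 0; i < 256; i++ {
		for _, m := range modes {
			if shouldEscape(byte(i), m) {
				t[i] |= 1 << byte(m)
			}
		}
	}
	return
}()
```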
Note that while these functions can be reused by all of the encodings, I'm still excluding the …
Tried both. Neither really made a measurable difference for me.
That function actually got inlined for me, so it didn't use any conditional instructions like that there. But in the LUT version, maybe that sort of thing could work here:
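For instance, something along these lines, assuming the flag-bit table sketched above; whether the compiler turns the add into a branch-free sequence, and whether that actually wins, would need benchmarking:

```go
// countEscapes is an illustrative branchless version of the first loop in
// MyLUTEscape: (lut[c] & mask) >> shift is always 0 or 1, so the count is
// accumulated with an add instead of a branch.
func countEscapes(s string, mode encoding) int {
	shift := uint(mode)
	mask := byte(1) << shift
	hexCount := 0
	for i := 0; i < len(s); i++ {
		hexCount += int((shouldEscapeLUT[s[i]] & mask) >> shift)
	}
	return hexCount
}
```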
I'm not really gonna try to optimize this at such a low level yet though.
Change https://golang.org/cl/134296 mentions this issue:
Use a 64 byte array to avoid an allocation, on the assumption that most URL escaping is performed on short strings. Also adds a fast path for escaping strings whose only replacements are spaces, which is common in query components.

Adds benchmarks for QueryEscape, PathEscape, QueryUnescape and PathUnescape, but no optimizations are included for the unescape functions, so I don't include those benchmark results here.

Reduces allocations by 10% in the existing String benchmark with a modest performance increase.

```
name                old time/op    new time/op    delta
QueryEscape/#00-8     64.6ns ± 1%    43.8ns ± 0%  -32.14%  (p=0.000 n=9+9)
QueryEscape/#1-8       276ns ± 3%     249ns ± 0%   -9.62%  (p=0.000 n=10+7)
QueryEscape/#2-8       176ns ± 2%     155ns ± 3%  -12.21%  (p=0.000 n=10+10)
QueryEscape/#3-8       388ns ± 1%     362ns ± 0%   -6.55%  (p=0.000 n=10+8)
QueryEscape/#4-8      2.32µs ± 2%    2.27µs ± 2%   -2.26%  (p=0.001 n=10+10)
PathEscape/#00-8      78.0ns ± 3%    63.4ns ± 1%  -18.69%  (p=0.000 n=10+10)
PathEscape/#1-8        276ns ± 2%     260ns ± 0%   -6.01%  (p=0.000 n=10+10)
PathEscape/#2-8        175ns ± 0%     153ns ± 0%  -12.53%  (p=0.000 n=8+10)
PathEscape/#3-8        389ns ± 2%     361ns ± 0%   -7.21%  (p=0.000 n=10+9)
PathEscape/#4-8       2.30µs ± 2%    2.27µs ± 1%   -1.33%  (p=0.001 n=9+10)
String-8              3.56µs ± 4%    3.42µs ± 7%   -4.00%  (p=0.003 n=10+10)

name                old alloc/op   new alloc/op   delta
QueryEscape/#00-8      16.0B ± 0%      8.0B ± 0%  -50.00%  (p=0.000 n=10+10)
QueryEscape/#1-8        128B ± 0%       64B ± 0%  -50.00%  (p=0.000 n=10+10)
QueryEscape/#2-8       64.0B ± 0%     32.0B ± 0%  -50.00%  (p=0.000 n=10+10)
QueryEscape/#3-8        128B ± 0%       64B ± 0%  -50.00%  (p=0.000 n=10+10)
QueryEscape/#4-8        832B ± 0%      832B ± 0%     ~     (all equal)
PathEscape/#00-8       32.0B ± 0%     16.0B ± 0%  -50.00%  (p=0.000 n=10+10)
PathEscape/#1-8         128B ± 0%       64B ± 0%  -50.00%  (p=0.000 n=10+10)
PathEscape/#2-8        64.0B ± 0%     32.0B ± 0%  -50.00%  (p=0.000 n=10+10)
PathEscape/#3-8         128B ± 0%       64B ± 0%  -50.00%  (p=0.000 n=10+10)
PathEscape/#4-8         704B ± 0%      704B ± 0%     ~     (all equal)
String-8              1.84kB ± 0%    1.66kB ± 0%   -9.57%  (p=0.000 n=10+10)

name                old allocs/op  new allocs/op  delta
QueryEscape/#00-8       2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
QueryEscape/#1-8        2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
QueryEscape/#2-8        2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
QueryEscape/#3-8        2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
QueryEscape/#4-8        2.00 ± 0%      2.00 ± 0%     ~     (all equal)
PathEscape/#00-8        2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
PathEscape/#1-8         2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
PathEscape/#2-8         2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
PathEscape/#3-8         2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)
PathEscape/#4-8         2.00 ± 0%      2.00 ± 0%     ~     (all equal)
String-8                69.0 ± 0%      61.0 ± 0%  -11.59%  (p=0.000 n=10+10)
```

Updates #17860
Change-Id: I45c5e9d40b242f874c61f6ccc73bf94c494bb868
Reviewed-on: https://go-review.googlesource.com/134296
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
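As a rough illustration of the buffer trick the CL message above describes (a sketch of the stated idea, not the CL's actual code; the escape predicate is passed in as a parameter to keep it self-contained):

```go
// escapeWithStackBuf escapes into a fixed 64-byte array when the result
// fits, so short inputs avoid a heap allocation for the working buffer.
func escapeWithStackBuf(s string, escapeByte func(byte) bool) string {
	hexCount := 0
	for i := 0; i < len(s); i++ {
		if escapeByte(s[i]) {
			hexCount++
		}
	}
	if hexCount == 0 {
		return s
	}
	required := len(s) + 2*hexCount
	var buf [64]byte
	var t []byte
	if required <= len(buf) {
		t = buf[:required] // short result: reuse the fixed array, no make()
	} else {
		t = make([]byte, required)
	}
	j := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if escapeByte(c) {
			t[j] = '%'
			t[j+1] = "0123456789ABCDEF"[c>>4]
			t[j+2] = "0123456789ABCDEF"[c&15]
			j += 3
		} else {
			t[j] = c
			j++
		}
	}
	return string(t) // the final string conversion is the one remaining allocation
}
```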
While working on 134296 I saw that the benchmark for the String() method still shows about 60 allocations being made. This is because we don't pre-size the strings.Builder and also call escape at least 5 times for different URL components. I think this could be optimized by pre-sizing the builder with the known lengths (scheme, opaque, rawquery) and passing the builder to a new escape function that writes to a builder. The existing escape function could be modified to create a builder and call the new function, or the existing callers of escape (QueryEscape, PathEscape, etc.) could do it. I can work on this if there is interest in this approach.
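A minimal sketch of that idea, assuming a hypothetical escapeTo helper; the real escape works per encoding mode, which is reduced here to a plain byte predicate:

```go
import "strings"

// escapeTo writes the escaped form of s into a caller-supplied
// strings.Builder instead of returning a new string, so a caller like
// URL.String could build the whole result in one growing buffer.
func escapeTo(b *strings.Builder, s string, escapeByte func(byte) bool) {
	const upperhex = "0123456789ABCDEF"
	for i := 0; i < len(s); i++ {
		c := s[i]
		if escapeByte(c) {
			b.WriteByte('%')
			b.WriteByte(upperhex[c>>4])
			b.WriteByte(upperhex[c&15])
		} else {
			b.WriteByte(c)
		}
	}
}
```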
@iand, sounds reasonable.
After spending several more minutes examining the String benchmark I realised that the reported timings and allocations are the cumulative totals of all 20 test cases. I investigated modifying the various escape functions to write to a shared strings.Builder, but couldn't reduce the allocations and timings by enough to justify the extra complexity.

I also looked at pre-sizing the builder. It's easy enough to calculate the capacity needed for the fixed components and estimate the length of most of the encoded parts, but the path is a little more tricky. Performing these calculations adds timing overhead which makes the function slower overall, even with the one or two allocations it saves. I found I could get just as good a benefit by simply growing the builder to a fixed 64 bytes. Some timings for variants on this approach are below; however, it's likely that I'm just fitting to the test cases. I'm happy to send a CL for a one-line change setting the initial builder size to 64 bytes, but I'm not convinced it adds much value.

Current master: …

Pre-sized builder: …
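For concreteness, the one-line experiment amounts to something like this (a simplified stand-in for URL.String, not the real method; it only needs the strings package):

```go
// buildString mimics the shape of URL.String with the experiment applied:
// grow the builder to a fixed 64 bytes before writing anything.
func buildString(scheme, host, path, rawQuery string) string {
	var buf strings.Builder
	buf.Grow(64) // the one-line change: most URLs fit, avoiding regrowth allocations
	buf.WriteString(scheme)
	buf.WriteString("://")
	buf.WriteString(host)
	buf.WriteString(path)
	if rawQuery != "" {
		buf.WriteByte('?')
		buf.WriteString(rawQuery)
	}
	return buf.String()
}
```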
Change https://golang.org/cl/174998 mentions this issue:
What is the state of this issue? It looks like it stalled after the 1.13 freeze, but no other problems are mentioned. Kindly ping @iand.
I wrote a new version of … The details are here: https://www.cnblogs.com/ahfuzhang/p/17473178.html
Change https://go.dev/cl/514335 mentions this issue:
According to profiling data from a large number of servers at Google, URL-related functions are among the top 150 functions by CPU time. The simplest optimizations I see are around using LUTs in shouldEscape, isHex, and unHex.