nsqd: optimize the performance of httpServer's doPub #1423
Conversation
ready for review
nsqd/http.go
Outdated
```go
if err != nil {
	return nil, http_api.Err{500, "INTERNAL_ERROR"}
}

body := bodyBuffer.Bytes()
```
I think there is an issue with lifetimes. The `bodyBuffer` is put back to the pool at the end of the function. This `body` is a byte slice that references the internal buffer owned by the `bytes.Buffer`, and that internal buffer is reused once the `bytes.Buffer` is reset and taken from the pool again. Below, `NewMessage()` stores the byte slice (a reference) in the Message. That message is put to the Topic, where it can be retained in the memoryMsgChan for some time (usually briefly, but not always), and when the message is copied to the Channels the body byte slice still references the original buffer owned by the `bytes.Buffer`. The body contents are never actually copied (except to/from the "backend disk-queue").
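For illustration, here is a minimal standalone sketch of the aliasing hazard described above (not the nsqd code itself): once `buf.Bytes()` has been handed out, resetting and reusing the buffer silently rewrites the slice's contents.

```go
package main

import (
	"bytes"
	"fmt"
)

// AliasedBody returns the string seen through a slice obtained from
// buf.Bytes() *after* the buffer has been reset and reused, the way a
// pooled buffer would be. The slice still points into the buffer's
// backing array, so its contents change underneath us.
func AliasedBody() string {
	var buf bytes.Buffer

	buf.WriteString("hello") // first "request body"
	body := buf.Bytes()      // reference into buf's internal array, no copy

	buf.Reset()              // what happens when the buffer goes back to a pool
	buf.WriteString("WORLD") // next "request body" reuses the same array

	return string(body) // "WORLD", not "hello"
}

func main() {
	fmt.Println(AliasedBody()) // prints "WORLD": the first body was clobbered
}
```

Copying the bytes out of the buffer before returning it to the pool avoids this.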
Thank you for the guidance; I overlooked this point when I wrote it.
Later, I tried a global bufferPool for this optimization: reading the request body acquires a buffer and associates it with the message, and the buffer is released back to the pool after the consumer sends FIN, or when the message goes to the diskqueue. But the stress-test results were not ideal, and it increased the complexity of the code.
Thinking about it more, there are still problems with the design above. Because a topic may have multiple channels, a message may be consumed several times, so a single FIN cannot tell us when it is safe to recycle the buffer.
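To make the difficulty concrete, here is a hedged sketch of the reference-counting idea implied above (all names hypothetical, not nsqd code): each channel that receives the message would hold one reference, and only the last FIN could release the buffer. Even this only covers fan-out; requeues and timeouts would still re-deliver the message, which is part of why the approach gets complex.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pooledBody is a hypothetical wrapper: the body buffer may only return
// to the pool once every channel that received the message has FIN'd it.
type pooledBody struct {
	refs int32 // one reference per channel the message was copied to
	buf  []byte
}

func newPooledBody(buf []byte, channels int32) *pooledBody {
	return &pooledBody{refs: channels, buf: buf}
}

// fin records one channel's acknowledgement and reports whether this was
// the last outstanding reference, i.e. the buffer may now be recycled.
func (p *pooledBody) fin() bool {
	return atomic.AddInt32(&p.refs, -1) == 0
}

func main() {
	body := newPooledBody([]byte("payload"), 3) // fanned out to 3 channels
	fmt.Println(body.fin())                     // false: two channels still hold it
	fmt.Println(body.fin())                     // false
	fmt.Println(body.fin())                     // true: last FIN, safe to recycle
}
```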
I have no idea
With respect to the "dynamically resizing the slice", I suppose in cases where
That's a good idea, thanks
Use io.Copy instead of ioutil.ReadAll to improve performance
(b5276b6 to c4bd328)
I benchmarked ioutil.ReadAll, io.Copy, and a pre-allocated slice ("pre-make") respectively. The results show that io.Copy is the most efficient. Escape analysis shows that the pre-make approach causes the slice to escape to the heap, so its memory cost is the largest, which may be the cause of its low performance. The benchmark code:

```go
package main

import (
	"bytes"
	"io"
	"io/ioutil"
	"strings"
	"testing"
)

var str1 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
var str2 = str1 + str1 + str1
var str3 = str2 + str2 + str2
var str4 = str3 + str3 + str3

var strReader1 = strings.NewReader(str1)
var strReader2 = strings.NewReader(str2)
var strReader3 = strings.NewReader(str3)
var strReader4 = strings.NewReader(str4)

// handleBytes stands in for the code that consumes the body
// (its definition was not included in the original comment).
func handleBytes(b []byte) {}

func BenchmarkIOReadAll(b *testing.B) {
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		b, _ := ioutil.ReadAll(strReader2)
		handleBytes(b)
	}
}

func BenchmarkPreMake(b *testing.B) {
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		c := make([]byte, len(str2))
		_, _ = strReader2.Read(c)
		handleBytes(c)
	}
}

func BenchmarkIOCopy(b *testing.B) {
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		buf := bytes.Buffer{}
		_, _ = io.Copy(&buf, strReader2)
		handleBytes(buf.Bytes())
	}
}
```

Result:

```
goos: linux
goarch: amd64
pkg: gotest/bufferpoll
cpu: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
BenchmarkIOReadAll
BenchmarkIOReadAll-32    7866291    145.8 ns/op    512 B/op    1 allocs/op
BenchmarkPreMake
BenchmarkPreMake-32      5291023    216.2 ns/op    320 B/op    1 allocs/op
BenchmarkIOCopy
BenchmarkIOCopy-32      17156047    74.78 ns/op     48 B/op    1 allocs/op
PASS
ok      gotest/bufferpoll    4.040s
```
hmm I'm skeptical of how
When requests are dense and individual messages are large, ioutil.ReadAll hurts performance and can retain a large amount of memory, to the point of a leak; I have seen serious cases that ended in OOM. ioutil.ReadAll reads the whole body in one shot, so a large body forces repeated growth of its internal slice, which costs performance; the slice also escapes to the heap and increases the GC burden.
Using io.Copy avoids building the whole message in a fresh one-shot allocation, and combining it with a buffer pool improves memory reuse and reduces the GC burden.