
nsqd: optimize the performance of httpServer's doPub #1423

Closed

Conversation

guozhao-coder

When requests are dense and individual messages are large, ioutil.ReadAll hurts performance and causes excessive memory growth; I have seen serious cases that ended in OOM. ioutil.ReadAll reads the whole body in one go, so a large body repeatedly grows the underlying slice, which hurts performance; it also makes the buffer escape to the heap, increasing the GC burden.

Using io.Copy() avoids reading the whole message into a freshly grown slice, and using a buffer pool improves memory reuse and reduces the GC burden.
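For illustration, a minimal sketch of the proposed approach (the readBody helper and bufPool names are invented for this example; the actual change is in the PR diff):

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// readBody streams the request body into a pooled buffer instead of
// growing a fresh slice with ioutil.ReadAll.
func readBody(req *http.Request) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	if _, err := io.Copy(buf, req.Body); err != nil {
		return nil, err
	}
	// NOTE: the returned slice aliases the pooled buffer's memory;
	// the review comment below explains why that is a problem here.
	return buf.Bytes(), nil
}

func main() {
	req := httptest.NewRequest("POST", "/pub?topic=t", strings.NewReader("hello"))
	body, _ := readBody(req)
	fmt.Printf("%s\n", body)
}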

@guozhao-coder
Author

Here are some data before and after optimization.

Concurrency: 20
Load-test duration: 20 seconds
Size of each message: 1 MB

before optimization
Load-test results:
[screenshot]
Process stats (after the test, a lot of memory remains occupied; continuing the load test causes an OOM in my environment):
[screenshot]

after optimization
Load-test results (the throughput has improved):
[screenshot]
Process stats (memory consumption is much smaller):
[screenshot]

@guozhao-coder
Author

ready for review

@jehiah jehiah added the perf label Sep 2, 2022
nsqd/http.go Outdated
if err != nil {
return nil, http_api.Err{500, "INTERNAL_ERROR"}
}

body := bodyBuffer.Bytes()
Member

I think there is an issue with lifetimes.

The bodyBuffer is put back in the pool at the end of the function. This body is a byte slice that references the internal buffer owned by the bytes.Buffer, and that memory is reused when the bytes.Buffer is reset and handed out again. Below, NewMessage() stores the byte slice (the reference) in the Message. That message is put to the Topic, where it can be retained in the memoryMsgChan for some time (usually a short time, but not always), and when the message is copied to the Channels the body byte slice still references the original buffer owned by the bytes.Buffer. The body contents are never actually copied (except to/from the "backend disk-queue").
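To make the aliasing concrete, here is a small self-contained example (hypothetical, using the same pool pattern): reusing the pooled buffer overwrites memory that body still points at.

package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

var bufPool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

func main() {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	io.Copy(buf, strings.NewReader("first message"))
	body := buf.Bytes() // body aliases buf's internal array
	bufPool.Put(buf)    // buffer returned while body still references it

	buf2 := bufPool.Get().(*bytes.Buffer) // likely the same buffer
	buf2.Reset()
	io.Copy(buf2, strings.NewReader("other message"))

	// likely prints "other message", not "first message"
	fmt.Printf("%s\n", body)
}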

Author

Thank you for the guidance; I overlooked this point.

Later, I tried a global buffer pool as an optimization: reading the request body acquires a buffer and associates it with the message, and the buffer is returned to the pool after the consumer sends FIN, or when the message goes to the disk queue. But the stress-test results were not encouraging, and it increases the complexity of the code.
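For illustration, the attempted design was roughly the following (a hypothetical sketch with invented names, not the actual code):

package main

import (
	"bytes"
	"sync"
)

var bufPool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

// pooledMessage pairs a message body with the pooled buffer that backs it,
// so the buffer can be recycled once the message is done.
type pooledMessage struct {
	body []byte
	buf  *bytes.Buffer // owner of body's backing array
}

// release returns the buffer to the pool, e.g. after the consumer FINs the
// message or after the body has been written to the disk queue.
func (m *pooledMessage) release() {
	m.body = nil
	m.buf.Reset()
	bufPool.Put(m.buf)
}

func main() {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	buf.WriteString("hello")
	m := &pooledMessage{body: buf.Bytes(), buf: buf}
	m.release() // safe only if nothing else still holds m.body
}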

Author

Thinking about it more, there are still problems with my design above. A topic may have multiple channels, so a message may be consumed several times, which makes it impossible to decide from a single FIN whether the buffer can be recycled.

I'm out of ideas.

@mreiferson
Member

With respect to "dynamically resizing the slice": I suppose that in cases where Content-Length is specified (i.e. not chunked), we can do much better than ioutil.ReadAll (io.ReadAll) by pre-allocating a slice of exactly the right size?
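A minimal sketch of that idea (readBodySized is a hypothetical helper; the limit would come from nsqd's max-message-size option): when Content-Length is known, allocate once at exactly the right size and fill it with io.ReadFull; otherwise fall back to a growing read capped at the limit.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

func readBodySized(req *http.Request, limit int64) ([]byte, error) {
	if n := req.ContentLength; n > 0 && n <= limit {
		body := make([]byte, n) // single allocation of exactly the right size
		if _, err := io.ReadFull(req.Body, body); err != nil {
			return nil, err
		}
		return body, nil
	}
	// chunked or unknown length: fall back to a growing read, capped at limit
	return io.ReadAll(io.LimitReader(req.Body, limit))
}

func main() {
	req := httptest.NewRequest("POST", "/pub?topic=t", strings.NewReader("hello"))
	body, _ := readBodySized(req, 1<<20)
	fmt.Printf("%s\n", body)
}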

@mreiferson mreiferson changed the title nsqd:optimize the performance of httpServer's doPub nsqd: optimize the performance of httpServer's doPub Sep 25, 2022
@guozhao-coder guozhao-coder reopened this Sep 26, 2022
@guozhao-coder guozhao-coder marked this pull request as draft September 26, 2022 01:33
@guozhao-coder
Author

That's a good idea, thanks

Use io.Copy instead of ioutil.ReadAll to improve performance
@guozhao-coder
Author

I benchmarked ioutil.ReadAll, io.Copy, and a pre-allocated slice ("pre-make"). The results show that io.Copy is the most efficient.

Escape analysis shows that the pre-make approach causes the variable to escape to the heap, giving it the largest memory cost, which may explain its lower performance.

The benchmark code follows.

package main

import (
	"bytes"
	"io"
	"io/ioutil"
	"strings"
	"testing"
)

var str1 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
var str2 = str1 + str1 + str1
var str3 = str2 + str2 + str2
var str4 = str3 + str3 + str3

var strReader1 = strings.NewReader(str1)
var strReader2 = strings.NewReader(str2)
var strReader3 = strings.NewReader(str3)
var strReader4 = strings.NewReader(str4)
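// NOTE: these readers are shared across benchmark iterations and never reset,
// so after the first full read every subsequent Read returns io.EOF.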

func BenchmarkIOReadAll(b *testing.B) {
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
	data, _ := ioutil.ReadAll(strReader2) // renamed to avoid shadowing *testing.B
	handleBytes(data)
	}
}

func BenchmarkPreMake(b *testing.B) {
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		c := make([]byte, len(str2))
		_, _ = strReader2.Read(c)
		handleBytes(c)
	}
}

func BenchmarkIOCopy(b *testing.B) {
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		buf := bytes.Buffer{}
		_, _ = io.Copy(&buf, strReader2)
		handleBytes(buf.Bytes())
	}
}
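// handleBytes is not shown in this post; it must be defined elsewhere in the
// test file for the benchmark to compile.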

Result:

goos: linux
goarch: amd64
pkg: gotest/bufferpoll
cpu: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
BenchmarkIOReadAll
BenchmarkIOReadAll-32    	 7866291	       145.8 ns/op	     512 B/op	       1 allocs/op
BenchmarkPreMake
BenchmarkPreMake-32      	 5291023	       216.2 ns/op	     320 B/op	       1 allocs/op
BenchmarkIOCopy
BenchmarkIOCopy-32       	17156047	        74.78 ns/op	      48 B/op	       1 allocs/op
PASS
ok  	gotest/bufferpoll	4.040s

@guozhao-coder guozhao-coder marked this pull request as ready for review September 26, 2022 06:04
@ploxiln
Member

ploxiln commented Oct 1, 2022

Hmm, I'm skeptical that io.Copy() into a new bytes.Buffer{} without a pre-allocated size could be better. I think this benchmark may not be representative of a real workload, because the copied messages can be garbage-collected immediately, whereas in nsqd they need to live on for an indeterminate amount of time and be accessed by other goroutines. (We also can't see the implementation of handleBytes() ...)
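For instance, a variant like the following (a sketch, assuming it lives in the same package as the benchmark above) retains every body so it cannot be collected until the benchmark ends, which is closer to nsqd's usage:

func BenchmarkIOCopyRetained(b *testing.B) {
	retained := make([][]byte, 0, b.N)
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		r := strings.NewReader(str2) // fresh reader every iteration
		var buf bytes.Buffer
		if _, err := io.Copy(&buf, r); err != nil {
			b.Fatal(err)
		}
		// keep the body alive, as nsqd's memoryMsgChan would
		retained = append(retained, buf.Bytes())
	}
	_ = retained
}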

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants