
nsqd: optimize the performance of httpServer's doPub #1423

Closed

Conversation

guozhao-coder

When requests are dense and individual messages are large, ioutil.ReadAll hurts performance and causes excessive memory growth; I have seen serious cases that ended in OOM. ioutil.ReadAll reads the whole body in one go, so a large body repeatedly grows the underlying slice, which hurts performance; it also makes the buffer escape to the heap, increasing the GC burden.

Using io.Copy() avoids reading the whole message into a freshly grown slice, and using a buffer pool improves memory reuse and reduces the GC burden.
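For illustration, a minimal sketch of the proposed approach (the readBody helper and bufPool names are invented for this example; the actual change is in the PR diff):

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// readBody streams the request body into a pooled buffer instead of
// growing a fresh slice with ioutil.ReadAll.
func readBody(req *http.Request) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	if _, err := io.Copy(buf, req.Body); err != nil {
		return nil, err
	}
	// NOTE: the returned slice aliases the pooled buffer's memory;
	// the review comment below explains why that is a problem here.
	return buf.Bytes(), nil
}

func main() {
	req := httptest.NewRequest("POST", "/pub?topic=t", strings.NewReader("hello"))
	body, _ := readBody(req)
	fmt.Printf("%s\n", body)
}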

@guozhao-coder
Author

Here are some data before and after optimization.

Concurrency: 20
Load-test duration: 20 seconds
Size of each message: 1 MB

before optimization
Load-test results:
[screenshot]
Process stats (after the test, a lot of memory remains occupied; continuing the load test causes an OOM in my environment):
[screenshot]

after optimization
Load-test results (the throughput has improved):
[screenshot]
Process stats (memory consumption is much smaller):
[screenshot]

@guozhao-coder
Author

ready for review

@jehiah jehiah added the perf label Sep 2, 2022
nsqd/http.go Outdated
if err != nil {
return nil, http_api.Err{500, "INTERNAL_ERROR"}
}

body := bodyBuffer.Bytes()
Member

I think there is an issue with lifetimes.

The bodyBuffer is put back in the pool at the end of the function. This body is a byte slice that references the internal buffer owned by the bytes.Buffer, and that memory is reused when the bytes.Buffer is reset and handed out again. Below, NewMessage() stores the byte slice (the reference) in the Message. That message is put to the Topic, where it can be retained in the memoryMsgChan for some time (usually a short time, but not always), and when the message is copied to the Channels the body byte slice still references the original buffer owned by the bytes.Buffer. The body contents are never actually copied (except to/from the "backend disk-queue").
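To make the aliasing concrete, here is a small self-contained example (hypothetical, using the same pool pattern): reusing the pooled buffer overwrites memory that body still points at.

package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

var bufPool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

func main() {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	io.Copy(buf, strings.NewReader("first message"))
	body := buf.Bytes() // body aliases buf's internal array
	bufPool.Put(buf)    // buffer returned while body still references it

	buf2 := bufPool.Get().(*bytes.Buffer) // likely the same buffer
	buf2.Reset()
	io.Copy(buf2, strings.NewReader("other message"))

	// likely prints "other message", not "first message"
	fmt.Printf("%s\n", body)
}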

Author

Thank you for the guidance; I overlooked this point.

Later, I tried a global buffer pool as an optimization: reading the request body acquires a buffer and associates it with the message, and the buffer is returned to the pool after the consumer sends FIN, or when the message goes to the disk queue. But the stress-test results were not encouraging, and it increases the complexity of the code.
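For illustration, the attempted design was roughly the following (a hypothetical sketch with invented names, not the actual code):

package main

import (
	"bytes"
	"sync"
)

var bufPool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

// pooledMessage pairs a message body with the pooled buffer that backs it,
// so the buffer can be recycled once the message is done.
type pooledMessage struct {
	body []byte
	buf  *bytes.Buffer // owner of body's backing array
}

// release returns the buffer to the pool, e.g. after the consumer FINs the
// message or after the body has been written to the disk queue.
func (m *pooledMessage) release() {
	m.body = nil
	m.buf.Reset()
	bufPool.Put(m.buf)
}

func main() {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	buf.WriteString("hello")
	m := &pooledMessage{body: buf.Bytes(), buf: buf}
	m.release() // safe only if nothing else still holds m.body
}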

Author

Thinking about it more, there are still problems with my design above. A topic may have multiple channels, so a message may be consumed several times, which makes it impossible to decide from a single FIN whether the buffer can be recycled.

I'm out of ideas.

@mreiferson
Member

With respect to "dynamically resizing the slice": I suppose that in cases where Content-Length is specified (i.e. not chunked), we can do much better than ioutil.ReadAll (io.ReadAll) by pre-allocating a slice of exactly the right size?
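A minimal sketch of that idea (readBodySized is a hypothetical helper; the limit would come from nsqd's max-message-size option): when Content-Length is known, allocate once at exactly the right size and fill it with io.ReadFull; otherwise fall back to a growing read capped at the limit.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

func readBodySized(req *http.Request, limit int64) ([]byte, error) {
	if n := req.ContentLength; n > 0 && n <= limit {
		body := make([]byte, n) // single allocation of exactly the right size
		if _, err := io.ReadFull(req.Body, body); err != nil {
			return nil, err
		}
		return body, nil
	}
	// chunked or unknown length: fall back to a growing read, capped at limit
	return io.ReadAll(io.LimitReader(req.Body, limit))
}

func main() {
	req := httptest.NewRequest("POST", "/pub?topic=t", strings.NewReader("hello"))
	body, _ := readBodySized(req, 1<<20)
	fmt.Printf("%s\n", body)
}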

@mreiferson mreiferson changed the title nsqd:optimize the performance of httpServer's doPub nsqd: optimize the performance of httpServer's doPub Sep 25, 2022
@guozhao-coder guozhao-coder reopened this Sep 26, 2022
@guozhao-coder guozhao-coder marked this pull request as draft September 26, 2022 01:33
@guozhao-coder
Author

That's a good idea, thanks

Use io.Copy instead of ioutil.ReadAll to improve performance
@guozhao-coder
Author

I benchmarked ioutil.ReadAll, io.Copy, and a pre-allocated slice ("pre-make"). The results show that io.Copy is the most efficient.

Escape analysis shows that the pre-make approach causes the variable to escape to the heap, giving it the largest memory cost, which may explain its lower performance.

The benchmark code follows.

package main

import (
	"bytes"
	"io"
	"io/ioutil"
	"strings"
	"testing"
)

var str1 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
var str2 = str1 + str1 + str1
var str3 = str2 + str2 + str2
var str4 = str3 + str3 + str3

var strReader1 = strings.NewReader(str1)
var strReader2 = strings.NewReader(str2)
var strReader3 = strings.NewReader(str3)
var strReader4 = strings.NewReader(str4)
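// NOTE: these readers are shared across benchmark iterations and never reset,
// so after the first full read every subsequent Read returns io.EOF.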

func BenchmarkIOReadAll(b *testing.B) {
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
	data, _ := ioutil.ReadAll(strReader2) // renamed to avoid shadowing *testing.B
	handleBytes(data)
	}
}

func BenchmarkPreMake(b *testing.B) {
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		c := make([]byte, len(str2))
		_, _ = strReader2.Read(c)
		handleBytes(c)
	}
}

func BenchmarkIOCopy(b *testing.B) {
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		buf := bytes.Buffer{}
		_, _ = io.Copy(&buf, strReader2)
		handleBytes(buf.Bytes())
	}
}
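// handleBytes is not shown in this post; it must be defined elsewhere in the
// test file for the benchmark to compile.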

Result:

goos: linux
goarch: amd64
pkg: gotest/bufferpoll
cpu: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
BenchmarkIOReadAll
BenchmarkIOReadAll-32    	 7866291	       145.8 ns/op	     512 B/op	       1 allocs/op
BenchmarkPreMake
BenchmarkPreMake-32      	 5291023	       216.2 ns/op	     320 B/op	       1 allocs/op
BenchmarkIOCopy
BenchmarkIOCopy-32       	17156047	        74.78 ns/op	      48 B/op	       1 allocs/op
PASS
ok  	gotest/bufferpoll	4.040s

@guozhao-coder guozhao-coder marked this pull request as ready for review September 26, 2022 06:04
@ploxiln
Member

ploxiln commented Oct 1, 2022

Hmm, I'm skeptical that io.Copy() into a new bytes.Buffer{} without a pre-allocated size could be better. I think this benchmark may not be representative of a real workload, because the copied messages can be garbage-collected immediately, whereas in nsqd they need to live on for an indeterminate amount of time and be accessed by other goroutines. (We also can't see the implementation of handleBytes() ...)
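For instance, a variant like the following (a sketch, assuming it lives in the same package as the benchmark above) retains every body so it cannot be collected until the benchmark ends, which is closer to nsqd's usage:

func BenchmarkIOCopyRetained(b *testing.B) {
	retained := make([][]byte, 0, b.N)
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		r := strings.NewReader(str2) // fresh reader every iteration
		var buf bytes.Buffer
		if _, err := io.Copy(&buf, r); err != nil {
			b.Fatal(err)
		}
		// keep the body alive, as nsqd's memoryMsgChan would
		retained = append(retained, buf.Bytes())
	}
	_ = retained
}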

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants