encoding/csv: do not use bufio.Writer in csv.Writer #33486
Can you post a small reproduction? The allocation of the buffer in …
No, we are not creating lots of csv files.

The steps are roughly as below:

The csv.Writer is used at the third step: for each record, we use a new csv.Writer to encode the record into a csv string, encode that string into a custom message, and write the message to the user. Why not use a csv.Writer for each csv file instead of for each record?
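The per-record path described above can be sketched as follows (a minimal sketch; `encodeRecord` is a hypothetical helper name, not from the original report). Every call pays for a fresh bytes.Buffer and, inside csv.NewWriter, a fresh bufio.Writer with its default-size buffer:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// encodeRecord encodes a single record into a csv string using a
// brand-new csv.Writer, mirroring the per-record usage described above.
func encodeRecord(record []string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf) // allocates an internal bufio.Writer each call
	if err := w.Write(record); err != nil {
		return "", err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	s, err := encodeRecord([]string{"Hello", "World"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", s) // "Hello,World\n"
}
```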
And currently, we use global sync.Pools of bytes.Buffer and bufio.Writer to prevent csv.Writer from allocating a new buffer, something like this:

```go
var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

var bufioWriterPool = sync.Pool{
	New: func() interface{} {
		// ioutil.Discard is just used to create the writer. The actual
		// destination writer is set later by Reset() before use.
		return bufio.NewWriter(ioutil.Discard)
	},
}

bufioWriter := bufioWriterPool.Get().(*bufio.Writer)
buf := bufPool.Get().(*bytes.Buffer)
buf.Reset() // pooled buffer may hold data from a previous use
bufioWriter.Reset(buf)

w := csv.NewWriter(bufioWriter)
w.Write(record)
w.Flush()

str := buf.String()
// encode this str into a custom message, and send it to the user

bufioWriterPool.Put(bufioWriter)
bufPool.Put(buf)
```

Using this approach, we avoid the performance problem, but we still think the Go stdlib should use the raw io.Writer and leave the choice of buffering to the user.
@dsnet if this is a change that needs fixing, can you add the NeedsFix label? Thanks.

It's a usage pattern that is certainly not what the …

@yaozongyou could you please provide a minimal benchmark that demonstrates the slowness here?
here is a minimal benchmark to demonstrate the slowness:

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"io/ioutil"
	"sync"
	"testing"
)

func BenchmarkCSV_1(b *testing.B) {
	b.ResetTimer()
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			var buf bytes.Buffer
			w := csv.NewWriter(&buf)
			w.Write([]string{"Hello", "World"})
			w.Flush()
			// buf.String() will be "Hello,World\n"
		}
	})
}

var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

var bufioWriterPool = sync.Pool{
	New: func() interface{} {
		// ioutil.Discard is just used to create the writer. The actual
		// destination writer is set later by Reset() before use.
		return bufio.NewWriter(ioutil.Discard)
	},
}

func BenchmarkCSV_2(b *testing.B) {
	b.ResetTimer()
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			buf := bufPool.Get().(*bytes.Buffer)
			buf.Reset()

			// Use a pooled bufio.Writer to prevent csv.Writer from
			// allocating a new buffer.
			bufioWriter := bufioWriterPool.Get().(*bufio.Writer)
			bufioWriter.Reset(buf)

			w := csv.NewWriter(bufioWriter)
			w.Write([]string{"Hello", "World"})
			w.Flush()
			// buf.String() will be "Hello,World\n"

			bufPool.Put(buf)
			bufioWriterPool.Put(bufioWriter)
		}
	})
}
```

the output result in my test environment:

```
$ go test -cpu 10 -benchtime 100000x -bench .
goos: linux
goarch: amd64
BenchmarkCSV_1-10    100000    1311 ns/op    4272 B/op    4 allocs/op
BenchmarkCSV_2-10    100000    90.9 ns/op       0 B/op    0 allocs/op
PASS
ok  _/home/richardyao/xxx 0.146s
```

From the result, BenchmarkCSV_1 makes 4 allocations per op, while BenchmarkCSV_2 makes none.
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

What did you do?

Consider this situation: in my program, at the lowest level of the call chain, I need to convert my Record to a csv string, so I use csv.NewWriter with a bytes.Buffer, Write my record, and Flush; at last I get the csv string from the bytes.Buffer. Everything is fine and great so far. But after benchmarking, I found my program running very slowly, and the bufio used in csv.Writer is the performance killer: for each record, a slice of bufio.defaultBufSize bytes is allocated, and it is not easy to prevent this allocation.

What did you expect to see?
Leave the choice of using bufio or not to the user.

In the stdlib, use the raw io.Writer; if the user wants bufio, they can just wrap it like this:
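A sketch of the usage the report proposes (hedged: this is my reconstruction of the intent, not code from the original; `encodeBuffered` is a hypothetical name). The caller opts into buffering by wrapping the destination in a bufio.Writer before handing it to csv.NewWriter:

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"fmt"
)

// encodeBuffered shows buffering as the caller's choice: the caller
// wraps the destination in a bufio.Writer and is responsible for
// flushing it, instead of csv.Writer allocating its own buffer.
func encodeBuffered(record []string) string {
	var buf bytes.Buffer
	bw := bufio.NewWriter(&buf) // buffering chosen by the caller
	w := csv.NewWriter(bw)
	w.Write(record)
	w.Flush()  // flush the csv.Writer
	bw.Flush() // flush the caller-owned bufio.Writer into buf
	return buf.String()
}

func main() {
	fmt.Printf("%q\n", encodeBuffered([]string{"Hello", "World"}))
}
```

Note that today this pattern already avoids the extra allocation in practice, because bufio.NewWriter returns its argument unchanged when it is already a *bufio.Writer with a large enough buffer.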