Parse and aggregate 1B rows of text very quickly using Go for the 1BRC challenge.
Mainly done as an excuse to play around with pprof
.
NOTE: Originally developed at elh/1brc-go and copied into gunnarmorling/1brc.
UPDATE: with hyperfine parameter testing, got a 20% latency reduction by changing the read buffer size via PARSE_CHUNK_SIZE_MB
.
make evaluate
# Benchmark 1: ./bin/1brc-go 2>&1
# Time (mean ± σ): 4.665 s ± 0.028 s [User: 39.196 s, System: 1.816 s]
# Range (min … max): 4.634 s … 4.695 s 5 runs
On Jan 14, as measured on my 2023 Apple M2 Pro Macbook with 1brc's evaluate.sh
script, this performs very competitively against the top Java and Go submissions. At the very least this means, I have put in some effort optimizing this for the machine I have in front of me. There is a pretty funny lesson to be learned here that everyone's solution is the fastest on their own machine :)
Props to AlexanderYastrebov who is pushing on enabling language agnostic solutions in the leaderboard. Very curious now to see where this stacks up if I can run it on the test environment. I haven't looked at other Go solution code yet, but the discussion his started would motivate me to write my own.
# | Result (m:s.ms) | Implementation | JDK | Submitter | Notes |
---|---|---|---|---|---|
1 | 00:06.876 | link | 🔷 Go 1.21.5 | Eugene Huang | Mine! 👈 |
2 | 00:07.602 | link | 🔷 Go 1.21.5 | Corlin Palmer | Go implementation |
3 | 00:13.765 | link | 21.0.1-graal | Roy van Rijn | GraalVM native binary |
00:13.989 | link | 21.0.1-graal | Artsiom Korzun | ||
00:14.044 | link | 21.0.1-graal | Thomas Wuerthinger | GraalVM native binary | |
00:14.464 | link | 🔷 Go 1.21.5 | Alexander Yastrebov | Go implementation | |
00:14.839 | link | 21.0.1-graal | Marko Topolnik | ||
00:14.949 | link | 21.0.1-open | merykittyunsafe | ||
00:16.075 | link | 🔷 Go 1.21.5 | Jan-Otto Kröpke | Go implementation |
# builds the go executable and runs the following test
make run
# Result:
# ./bin/1brc-go 37.90s user 3.98s system 590% cpu 7.089 total
diff measurements.out <(time ./bin/1brc-go)
# takes an optional single command line arg for the measurements file path. If not provided, defaults to `measurements.txt`
# PROFILE is optional and enables profiling if "true"
go run main.go
Add a gitignore-d measurements.txt
file in the project root or use command line arg to override the path.
Times measured on 2023 Apple M2 Pro Macbook with 10 cores and 16GB RAM.
- 140s - Naive implementation. Reading lines with
Scanner.Text
. - 83s - Using
Scanner.ReadSlice
to reduce a massive string allocation. - 54s - Using
file.Read
directly and avoided somebytes
helper functions to iterate bytes in a single scan. - 45s - Using a far larger buffer to reduce IO latency. Removed a redundant map lookup. Using
unsafe.String
to avoid additional string allocs. - 35s - Implemented a custom, simplified replacement for
strconv.ParseFloat
.- Most of the time just goes into scanning byte slices. Biggest other bottleneck is
map[string]*Stats
lookups. Decently pleased with single threaded optimizations so now making it concurrent.
- Most of the time just goes into scanning byte slices. Biggest other bottleneck is
- 7s - Concurrently read and parse the file as byte index addressed chunks, then merge results in a single final goroutine. See tunable
numParsers
andparseChunkSize
parameters. - 5.3s - Select concurrency and read buffer size parameters for my machine using hyperfine parameter testing.
- 5.3s - Profile-guided optimization makes some compilation changes but does not affect runtime performance.
- 4.7s - Change concurrency and read buffer size parameters again. 14 goroutines (for 10 CPU cores) and 16MB parser chunk size.
- Optimize concurrency.
- Pipeline more work.
Tune concurrency and read buffer size parameters. I picked current values for my machine very quickly and unscientifically.Picked w/ hyperfine
- Custom map/set implementation for string->float
- try dolthub/swiss?
- Memory arena (or other optimizations of locality)?
- Faster way to iterate bytes?
- https://github.com/DataDog/go-profiler-notes/blob/main/guide/README.md#cpu-profiler ⭐️
- https://pkg.go.dev/bufio
- https://pkg.go.dev/unsafe#String
- https://go.dev/doc/pgo
- https://go.dev/src/arena/arena.go
Copyright © 2024 Eugene Huang