Skip to content

Latest commit

 

History

History
97 lines (71 loc) · 5.69 KB

README.md

File metadata and controls

97 lines (71 loc) · 5.69 KB

1brc-go

Parse and aggregate 1B rows of text very quickly using Go for the 1BRC challenge.

Mainly done as an excuse to play around with pprof.

NOTE: Originally developed at elh/1brc-go and copied into gunnarmorling/1brc.


Performance

UPDATE: with hyperfine parameter testing, got a 20% latency reduction by changing the read buffer size via PARSE_CHUNK_SIZE_MB.

make evaluate
# Benchmark 1: ./bin/1brc-go 2>&1
#   Time (mean ± σ):      4.665 s ±  0.028 s    [User: 39.196 s, System: 1.816 s]
#   Range (min … max):    4.634 s …  4.695 s    5 runs

Original Jan 14 testing [Outdated]

On Jan 14, as measured on my 2023 Apple M2 Pro Macbook with 1brc's evaluate.sh script, this performs very competitively against the top Java and Go submissions. At the very least this means, I have put in some effort optimizing this for the machine I have in front of me. There is a pretty funny lesson to be learned here that everyone's solution is the fastest on their own machine :)

Props to AlexanderYastrebov who is pushing on enabling language agnostic solutions in the leaderboard. Very curious now to see where this stacks up if I can run it on the test environment. I haven't looked at other Go solution code yet, but the discussion his started would motivate me to write my own.

# Result (m:s.ms) Implementation JDK Submitter Notes
1 00:06.876 link 🔷 Go 1.21.5 Eugene Huang Mine! 👈
2 00:07.602 link 🔷 Go 1.21.5 Corlin Palmer Go implementation
3 00:13.765 link 21.0.1-graal Roy van Rijn GraalVM native binary
00:13.989 link 21.0.1-graal Artsiom Korzun
00:14.044 link 21.0.1-graal Thomas Wuerthinger GraalVM native binary
00:14.464 link 🔷 Go 1.21.5 Alexander Yastrebov Go implementation
00:14.839 link 21.0.1-graal Marko Topolnik
00:14.949 link 21.0.1-open merykittyunsafe
00:16.075 link 🔷 Go 1.21.5 Jan-Otto Kröpke Go implementation

Usage

# builds the go executable and runs the following test
make run

# Result:
# ./bin/1brc-go  37.90s user 3.98s system 590% cpu 7.089 total
diff measurements.out <(time ./bin/1brc-go)

# takes an optional single command line arg for the measurements file path. If not provided, defaults to `measurements.txt`
# PROFILE is optional and enables profiling if "true"
go run main.go

Add a gitignore-d measurements.txt file in the project root or use command line arg to override the path.


Log

Times measured on 2023 Apple M2 Pro Macbook with 10 cores and 16GB RAM.

  • 140s - Naive implementation. Reading lines with Scanner.Text.
  • 83s - Using Scanner.ReadSlice to reduce a massive string allocation.
  • 54s - Using file.Read directly and avoided some bytes helper functions to iterate bytes in a single scan.
  • 45s - Using a far larger buffer to reduce IO latency. Removed a redundant map lookup. Using unsafe.String to avoid additional string allocs.
  • 35s - Implemented a custom, simplified replacement for strconv.ParseFloat.
    • Most of the time just goes into scanning byte slices. Biggest other bottleneck is map[string]*Stats lookups. Decently pleased with single threaded optimizations so now making it concurrent.
  • 7s - Concurrently read and parse the file as byte index addressed chunks, then merge results in a single final goroutine. See tunable numParsers and parseChunkSize parameters.
  • 5.3s - Select concurrency and read buffer size parameters for my machine using hyperfine parameter testing.
  • 5.3s - Profile-guided optimization makes some compilation changes but does not affect runtime performance.
  • 4.7s - Change concurrency and read buffer size parameters again. 14 goroutines (for 10 CPU cores) and 16MB parser chunk size.

Ideas

  • Optimize concurrency.
    • Pipeline more work.
    • Tune concurrency and read buffer size parameters. I picked current values for my machine very quickly and unscientifically. Picked w/ hyperfine
  • Custom map/set implementation for string->float
    • try dolthub/swiss?
  • Memory arena (or other optimizations of locality)?
  • Faster way to iterate bytes?

Reading



Copyright © 2024 Eugene Huang