1️⃣🐝🏎️ The One Billion Row Challenge

The fast-er Go variant of 1brc!

Because I also don't have access to the Hetzner Instance, here are measurements done locally (MBP M1 2020 16GB) on the generated 1B lines:

Program	Time
Java thomaswue (TOP)	15.350s
Go elh	7.571s
Go AlexanderYastrebov	17.966s
Go shraddhaag	18.809s
Go lunemec	7.128s
Go benhoyt	5.034s

I'm not including the latest variant by Renato Pereira because even though it was faster (6.674s), it did not produce valid output on my machine (after copy+paste from the blog, I could not find repo). Also it did use swissmap, which is against the rules (deps here are strictly tests).

Can we beat them? 🥁

Here are the results!

 λ hyperfine -w 2 'target/lunemec measurements.txt'
Benchmark 1: target/lunemec measurements.txt
  Time (mean ± σ):      7.128 s ±  0.019 s    [User: 47.592 s, System: 2.149 s]
  Range (min … max):    7.098 s …  7.163 s    10 runs

vs

 λ hyperfine -w 2 'target/elh measurements.txt'
Benchmark 1: target/elh measurements.txt
  Time (mean ± σ):      7.571 s ±  0.052 s    [User: 45.990 s, System: 2.245 s]
  Range (min … max):    7.498 s …  7.650 s    10 runs

Whole 3.5% faster! It ain't much, but its honest work!

However, there is new contender on the block:

 λ hyperfine -w 2 'target/benhoyt measurements.txt'
Benchmark 1: target/benhoyt measurements.txt
  Time (mean ± σ):      5.034 s ±  0.638 s    [User: 25.078 s, System: 2.786 s]
  Range (min … max):    4.548 s …  6.752 s    10 runs

### 10k unique station names

 λ hyperfine -w 2 'target/lunemec measurements3.txt'
Benchmark 1: target/lunemec measurements3.txt
  Time (mean ± σ):     14.423 s ±  0.083 s    [User: 101.053 s, System: 2.987 s]
  Range (min … max):   14.355 s … 14.584 s    10 runs

vs

 λ hyperfine -w 2 'target/elh measurements3.txt'
Benchmark 1: target/elh measurements3.txt
  Time (mean ± σ):     12.746 s ±  0.043 s    [User: 84.343 s, System: 2.981 s]
  Range (min … max):   12.672 s … 12.801 s    10 runs

and the fastest is:

 λ hyperfine -w 2 'target/benhoyt measurements3.txt'
Benchmark 1: target/benhoyt measurements3.txt
  Time (mean ± σ):      9.091 s ±  0.037 s    [User: 52.046 s, System: 4.699 s]
  Range (min … max):    9.038 s …  9.135 s    10 runs

Details

There are few differences in my approach, where some time is saved:

Custom hash function that does 2 bytes at once while not having too many collisions.
Custom hashmap implementation that allows me to hash only 1x and use the hashed position.
Reading only \n from the data chunk using bytes.IndexByte, which uses optimised assembly instructions and is very fast, then finding ; using byte index offset.
Unrolling the measurement by hand for all 4 variants.

There are some other general approaches that make it fast overall, but they were used in other implementations already:

Measurements (-99.9) are parsed into int16 (10x multiply), and transformed to float only 1x for the final print.
Using *stats in the hashmap so the values can be updated in-place.
Reading chunks of 20MiB per worker goroutine, mmap is 2-3x slower.
Using unsafe.String to avoid extra copies of the station name.

Measuring

By compiling the source code, and using hyperfine benchmarking tool.

Or by running the test benchmark harness like this:

func BenchmarkRun(b *testing.B) {
	bench = true
	for range b.N {
		run(defaultMeasurementsFile)
	}
}

This allows me to run each test 20x, and use benchstat to compare the versions.

go test -count 20 -run="^$" -bench "^BenchmarkRun$" . > full_lunemec.txt

Original 1BRC description snippet

The text file contains temperature values for a range of weather stations. Each row is one measurement in the format <string: station name>;<double: measurement>, with the measurement value having exactly one fractional digit.

The following shows ten rows as an example:

Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John's;15.2
Cracow;12.6
Bridgetown;26.9
Istanbul;6.2
Roseau;34.4
Conakry;31.2
Istanbul;23.0

The task is to write a program which reads the file, calculates the min, mean, and max temperature value per weather station, and emits the results on stdout like this (i.e. sorted alphabetically by station name, and the result values per station in the format <min>/<mean>/<max>, rounded to one fractional digit):

{Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.3, Abéché=-10.0/29.4/69.0, Accra=-10.1/26.4/66.4, Addis Ababa=-23.7/16.0/67.0, Adelaide=-27.8/17.3/58.5, ...}

Name		Name	Last commit message	Last commit date
Latest commit History 658 Commits
.gitattributes		.gitattributes
.gitignore		.gitignore
1brc.png		1brc.png
LICENSE.txt		LICENSE.txt
README.md		README.md
go.mod		go.mod
go.sum		go.sum
main.go		main.go
main_test.go		main_test.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitattributes

.gitattributes

.gitignore

.gitignore

1brc.png

1brc.png

LICENSE.txt

LICENSE.txt

README.md

README.md

go.mod

go.mod

go.sum

go.sum

main.go

main.go

main_test.go

main_test.go

Repository files navigation

1️⃣🐝🏎️ The One Billion Row Challenge

Can we beat them? 🥁

Details

Measuring

Original 1BRC description snippet

About

Releases

Packages

Languages

License

lunemec/1brc

Folders and files

Latest commit

History

Repository files navigation

1️⃣🐝🏎️ The One Billion Row Challenge

Can we beat them? 🥁

Details

Measuring

Original 1BRC description snippet

About

Resources

License

Stars

Watchers

Forks

Languages