Attempt of an initial version by rtreffer...... #186

rtreffer · 2024-01-06T20:31:09Z

Check List:

Tests pass (./test.sh rtreffer)
All formatting changes by the build are committed
Output matches that of calculate_average_baseline.sh

Execution time: 0:02.00 (2s)
Execution time of reference implementation: 1:34.59 (1m34s)

This one will need some judgement calls 🤷

It's a mashup pulling from @thomaswue (still 75% of the code) and @royvanrijn (SIMD idea). I really hope some of this makes it into other implementations.
The tiny unique ideas are: fnv64(alike) hashing of the city name as unique identifier + prime sized hash table.
This required some annoying work of the parse loop.

Using fnv64 (or any other hash function) on 10'000 station names is not guaranteed to be collision free, but it should be up to 99.99999999999995% correct ((1-1/(2^64))^10000). (Correction: should be closer to 99.999999999728922351%, ups)

That said it's really fast.

thomaswue · 2024-01-06T23:37:46Z

Very cool work! How much faster on your machine than my solution is it with those changes?

I think one can still read 8 bytes when parsing for the city length and just special case the last entry, but it adds some more complexity to the code.

The full collision check in 8 byte chunk comparisons instead of the hash compare should be quite OK. In my solution it is still on a per-byte basis and might be a bottleneck because of that.

rtreffer · 2024-01-07T00:40:51Z

@thomaswue yes, that's all correct, I am playing around with these alternatives, but it is getting way to late for my taste now 🙈

So this version right now is extremely fast as it skips the whole string comparison. That's most of the speedup. I think it would take back spot 1, but it is sorta questionable as I can be sure there must be names that have a collision, it's just very very hard to find 😆. I noted results on my machine in the code

 * - measurements3: rtreffer=3.9s baseline=2m6s thomaswue=8.5s
 * - measurements:  rtreffer=2.6s baseline=1m47 thomaswue=4.1s royvanrijn=3.2s

Although I must have gotten a bad run on your code, it seems a bit faster at the moment.

8 byte SIMD + a single 4 byte SIMD scan work. I have that lying around. I also went for 8 byte string comparisons, with a 4 and 2 byte comparison if needed (hack: you can always compare one more character, the ';'). However I am then close to your impl in speed, so I must have degraded at a few places.

I am right now considering to drop the whole OOP results space, make it one memory block and use that for the results.
At the end I can pack each city result into ~128 bytes (8 sum, 4 count, 4 min, 4 max, 100(+1) city name).
That's just 2-3x more than my L2 at an oversized hash table. But if I can make the hashtable dynamic then all of this should fit. Similar to the whole scanning approach.

🛌

gunnarmorling · 2024-01-07T18:19:18Z

Using fnv64 (or any other hash function) on 10'000 station names is not guaranteed to be collision free, but it should be up to 99.99999999999995% correct ((1-1/(2^64))^10000). (Correction: should be closer to 99.999999999728922351%, ups)

Hey @rtreffer, thanks a lot for this submission. Unfortunately, as is, it's not compliant with the requirements, specifically the rule "Implementations must not rely on specifics of a given data set, e.g. any valid station name as per the constraints above and any data distribution (number of measurements per station) must be supported". If there is an, albeit small, risk for hash collisions which may cause incorrect results to be emitted, that solution is not within the spec.

rtreffer · 2024-01-07T22:59:26Z

Imagined that, I'll do a new submission soon :-)

Initial version by rtreffer :-)

e45ad67

rtreffer closed this Jan 7, 2024

nietras mentioned this pull request Jan 15, 2024

My measurements xoofx/Fast1BRC#1

Closed

This was referenced Jan 15, 2024

Go implementation by AlexanderYastrebov #298

Merged

Vanilla implementation shipilev #183

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Attempt of an initial version by rtreffer...... #186

Attempt of an initial version by rtreffer...... #186

rtreffer commented Jan 6, 2024

thomaswue commented Jan 6, 2024

rtreffer commented Jan 7, 2024

gunnarmorling commented Jan 7, 2024

rtreffer commented Jan 7, 2024

Attempt of an initial version by rtreffer...... #186

Attempt of an initial version by rtreffer...... #186

Conversation

rtreffer commented Jan 6, 2024

Check List:

thomaswue commented Jan 6, 2024

rtreffer commented Jan 7, 2024

gunnarmorling commented Jan 7, 2024

rtreffer commented Jan 7, 2024