
Conversation

@pavpanchekha (Collaborator) commented Apr 9, 2025

My Herbie project spends about 3–4% of its runtime in the random-natural function. That led me to look at how the function works: instead of ordinary rejection sampling, it implements a clever algorithm that biases the sampling slightly to reduce the rejection probability. But the biasing involves a bignum multiplication, and after timing it, it looks quite a bit slower than ordinary rejection sampling. Specifically, I ran this code for timing:

(module+ main
  (for ([_ (in-range 10)])          ; 10 timed runs
    (collect-garbage)
    (time
     (for ([_ (in-range 1000000)])  ; 10^6 samples per run
       (random-natural 197823238423487)))))

Each run samples a natural number below 197823238423487 one million times; since time reports milliseconds for 10^6 iterations, the printed number reads directly as nanoseconds per iteration. I get 190–200 ns/iter for ordinary rejection sampling and 400–410 ns/iter for the clever version. I say we go back to the ordinary version.
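
For reference, plain rejection sampling is just the loop below. This is an illustrative sketch, not the library's actual code; it assumes random-bits from math/base and n ≥ 2.

(require math/base) ; for random-bits

;; Sketch only: draw just enough bits to cover [0, n), retry if the draw
;; lands at or above n. Since 2^bits < 2n, the acceptance probability is
;; above 1/2, so the expected number of draws is below 2.
(define (naive-random-natural n)
  (define bits (integer-length (sub1 n)))
  (let loop ()
    (define r (random-bits bits))
    (if (< r n) r (loop))))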

@rfindler (Member) commented Apr 9, 2025

Does it make sense to keep the other version around in case the situation changes? I guess someone did similar timing experiments when things were different (perhaps pre-Chez?) to get to the current code, but I don't actually know.

@samth (Member) commented Apr 9, 2025

For that number, I think you could do a lot better by special-casing fixnum-size values in random-bits and then in random-natural. Also, those functions could all be faster if they avoided the (implicit) call to (current-pseudo-random-generator) in random, either by (a) saving it at the beginning of random-bits or (b) allowing the caller to specify it, so that you could call it once in your benchmark instead of at least 1000000 times.
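
A minimal sketch of both ideas, with illustrative names rather than the library's (or #112's) actual API: look up the generator once per call, or let the caller pass one in, and give random-natural a fast path for values small enough to hand straight to random.

;; Sketch only; the real change is in #112. The generator is looked up once
;; (or supplied by the caller) instead of once per (random ...) call, and the
;; result is assembled from 24-bit chunks, which stay fixnum-sized.
(define (random-bits* bits [prg (current-pseudo-random-generator)])
  (let loop ([acc 0] [remaining bits])
    (if (<= remaining 0)
        acc
        (let ([chunk (min remaining 24)])
          (loop (bitwise-ior (arithmetic-shift acc chunk)
                             (random (expt 2 chunk) prg))
                (- remaining chunk))))))

(define (random-natural* n [prg (current-pseudo-random-generator)])
  ;; Fast path: n fits random's limit, so one call and no bignum arithmetic.
  (if (<= n 4294967087)
      (random n prg)
      (let loop ()
        (define r (random-bits* (integer-length (sub1 n)) prg))
        (if (< r n) r (loop)))))

With the optional argument, the benchmark above could create a generator once outside the inner loop and pass it to every call.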

@samth (Member) commented Apr 9, 2025

See #112 for a step towards this.

@pavpanchekha (Collaborator, Author)

@rfindler I think @samth's response in a sense rebuts yours: this code just isn't very optimized. It has been unchanged since it was split out of Neil Toronto's branch, where I think it went through exactly one revision. In view of that, I think simpler code is better! At least no one will accidentally think it's been hyper-optimized.

I do like #112, @samth; it's an optimization (I'll measure later) but also a good extension to the API.

@samth (Member) commented Apr 9, 2025

#112 now includes the other optimization I suggested. @pavpanchekha, it seems like including both optimizations is a speedup over what you had, but you should check.

@pavpanchekha (Collaborator, Author)

It is. Current results:

Branch   ns/iter
main     402
#111     190
#112     264
both     106

So I say we do both!

@pavpanchekha (Collaborator, Author)

Also, if you do just #112 but change the quotient to an arithmetic-shift [1], you get down to 230 ns/iter; still best to do both.

[1] This was the first optimization I tried; initially I was planning to PR just that change alone.
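
For the record, the rewrite in [1] is the usual power-of-two identity. The x and k below are arbitrary example values, and the exact expression in #112 may differ:

;; For non-negative x and divisor 2^k these agree, but the shift avoids a
;; general (bignum) division in favor of dropping the low k bits.
(define x 197823238423487)
(define k 10)
(equal? (quotient x (expt 2 k))
        (arithmetic-shift x (- k)))   ; => #t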

@samth (Member) commented Apr 9, 2025

Great. I think this one is obviously correct and I'll let you review #112.

@pavpanchekha merged commit 5644a7c into master on Apr 9, 2025.