Use hash table instead of binary search to look up 32-bit pseudoprimes in n_is_prime #2509

Merged
fredrik-johansson merged 1 commit into flintlib:main from fredrik-johansson:isprime4
Nov 28, 2025

@fredrik-johansson
Collaborator

Instead of branchy binary search, do a branch-free O(1) hash table lookup. Inspired in part by discussion with David Sparks.

Gives a 5-10% speedup for random integers and 15-40% speedup for primes:

                  random                          primes               composites without small factors
bits |    old       new    speedup |     old       new    speedup   |     old       new    speedup   
16   | 1.19e-08  1.12e-08   1.062  |  4.22e-08  3.64e-08   1.159  |  3.42e-08  3.40e-08   1.006 
17   | 1.25e-08  1.17e-08   1.068  |  4.68e-08  3.94e-08   1.188  |  3.72e-08  3.72e-08   1.000 
18   | 1.28e-08  1.22e-08   1.049  |  4.93e-08  4.28e-08   1.152  |  4.04e-08  4.00e-08   1.010 
19   | 1.38e-08  1.29e-08   1.070  |  5.83e-08  4.74e-08   1.230  |  4.60e-08  4.43e-08   1.038 
20   | 1.45e-08  1.33e-08   1.090  |  6.58e-08  5.23e-08   1.258  |  5.06e-08  4.93e-08   1.026 
21   | 1.50e-08  1.37e-08   1.095  |  7.30e-08  5.65e-08   1.292  |  5.45e-08  5.39e-08   1.011 
22   | 1.55e-08  1.43e-08   1.084  |  7.53e-08  5.99e-08   1.257  |  5.78e-08  5.71e-08   1.012 
23   | 1.61e-08  1.48e-08   1.088  |  8.20e-08  6.36e-08   1.289  |  6.14e-08  6.08e-08   1.010 
24   | 1.66e-08  1.51e-08   1.099  |  8.73e-08  6.71e-08   1.301  |  6.47e-08  6.43e-08   1.006 
25   | 1.67e-08  1.55e-08   1.077  |  9.11e-08  7.03e-08   1.296  |  6.81e-08  6.76e-08   1.007 
26   | 1.75e-08  1.60e-08   1.094  |  9.71e-08  7.35e-08   1.321  |  7.11e-08  7.09e-08   1.003 
27   | 1.79e-08  1.64e-08   1.091  |  1.02e-07  7.68e-08   1.328  |  7.44e-08  7.41e-08   1.004 
28   | 1.83e-08  1.68e-08   1.089  |  1.07e-07  8.00e-08   1.337  |  7.76e-08  7.72e-08   1.005 
29   | 1.87e-08  1.72e-08   1.087  |  1.12e-07  8.31e-08   1.348  |  8.06e-08  8.02e-08   1.005 
30   | 1.93e-08  1.78e-08   1.084  |  1.17e-07  8.63e-08   1.356  |  8.37e-08  8.34e-08   1.004 
31   | 1.96e-08  1.80e-08   1.089  |  1.22e-07  8.96e-08   1.362  |  8.71e-08  8.66e-08   1.006 
32   | 1.99e-08  1.83e-08   1.087  |  1.28e-07  9.28e-08   1.379  |  9.06e-08  8.99e-08   1.008 

I'm not an expert on designing perfect hash functions; these are the parameters I came up with after a bit of trial and error. The hash table stores the 2314 pseudoprimes in an array of 2560 entries, which means about 10% zero padding (plus a bit of extra space for hash keys).
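To make the idea concrete, here is a minimal sketch of a collision-free hash lookup on a toy data set. It is not the code or the parameters from this PR: the table size (64), the multiply-and-shift hash, the brute-force multiplier search and all names (`PSP`, `slot`, `build_table`, `is_listed_pseudoprime`) are assumptions made for illustration, and the data is only the 16 base-2 strong pseudoprimes below 100000.

```c
#include <stdint.h>
#include <string.h>

/* Toy data: the 16 base-2 strong pseudoprimes below 100000. */
static const uint32_t PSP[16] = {
    2047, 3277, 4033, 4681, 8321, 15841, 29341, 42799,
    49141, 52633, 65281, 74665, 80581, 85489, 88357, 90751
};

#define TABLE_BITS 6                      /* hypothetical size: 64 slots */
#define TABLE_SIZE (1u << TABLE_BITS)

static uint32_t table[TABLE_SIZE];        /* unused slots stay 0 (zero padding) */
static uint32_t mult;                     /* multiplier found by build_table() */

/* Multiply-and-shift hash: keep the top TABLE_BITS bits of n * m mod 2^32. */
static uint32_t slot(uint32_t n, uint32_t m)
{
    return (uint32_t) (n * m) >> (32 - TABLE_BITS);
}

/* Trial-and-error search for an odd multiplier that maps every key to its
   own slot. Returns 1 on success, 0 if no multiplier was found. */
static int build_table(void)
{
    for (uint32_t m = 1; m < (1u << 24); m += 2)
    {
        int ok = 1;
        memset(table, 0, sizeof(table));
        for (int i = 0; i < 16 && ok; i++)
        {
            uint32_t s = slot(PSP[i], m);
            if (table[s] != 0)
                ok = 0;                   /* collision: try the next multiplier */
            else
                table[s] = PSP[i];
        }
        if (ok)
        {
            mult = m;
            return 1;
        }
    }
    return 0;
}

/* Lookup: one multiply, one shift, one load, one compare.
   Requires n != 0, because 0 marks an empty slot. */
static int is_listed_pseudoprime(uint32_t n)
{
    return table[slot(n, mult)] == n;
}
```

After a successful `build_table()`, `is_listed_pseudoprime(2047)` returns 1 and `is_listed_pseudoprime(2049)` returns 0. In a real implementation the table would be precomputed and stored as a constant array rather than built at run time.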

Using an array of length 4096 instead would probably be a little bit faster by allowing a cheaper modulo operation, but would waste 6 KB of precious L1 cache. Maybe one could split by size of n into one small 512-entry table and one large 2048-entry table instead? I'll leave that for anyone who wants to revisit this after me.
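The size trade-off can be checked with a few lines of arithmetic. The sketch below assumes 4-byte table entries (which is what makes the difference come out to 6 KB) and ignores the extra space for hash keys; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { N_PSP = 2314, LEN_COMPACT = 2560, LEN_POW2 = 4096 };

/* Bytes used by a table of 32-bit entries (entry width is an assumption). */
static size_t table_bytes(size_t len)
{
    return len * sizeof(uint32_t);
}

/* Length 2560 is not a power of two, so reducing a hash value to an index
   needs a modulo (typically compiled to a multiply plus shifts). */
static uint32_t index_compact(uint32_t h)
{
    return h % LEN_COMPACT;
}

/* Length 4096 is a power of two, so the reduction is a single AND. */
static uint32_t index_pow2(uint32_t h)
{
    return h & (LEN_POW2 - 1);
}
```

Here `table_bytes(LEN_POW2) - table_bytes(LEN_COMPACT)` is 16384 - 10240 = 6144 bytes, i.e. 6 KB, and the compact table has 2560 - 2314 = 246 empty slots, a little under 10%.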

@fredrik-johansson fredrik-johansson merged commit 9ecc5eb into flintlib:main Nov 28, 2025
13 checks passed
@fredrik-johansson fredrik-johansson deleted the isprime4 branch November 28, 2025 09:11
@JASory
Contributor

JASory commented Dec 25, 2025

Does looking up the pseudoprimes perform favorably compared to looking up the witness to use, like Forisek and Jancina's 32-bit test?

If you want to reduce the number of pseudoprimes to store, you can use a different initial witness. 2 is actually very average when used by itself; it's strong because it eliminates the Monier-Rabin semiprimes {(2x+1)(4x+1) where x is odd} that many other witnesses are weak to.

Given pairs (witness, number of odd strong pseudoprimes under 2^32), we have (15, 1883), (34, 2009) and (37, 1959). Given their proximity to a power of two (16-1, 32+2, 32+5), you might be able to exploit their form for faster Montgomery exponentiation, as is done with 2.

I'm not aware of any witness that has only 1024 pseudoprimes under 2^32, as convenient as that might be.

@fredrik-johansson
Collaborator Author

> Does looking up the pseudoprimes perform favorably compared to looking up the witness to use, like Forisek and Jancina's 32-bit test?

Yes, though this may differ depending on CPU and minute implementation differences.

Random input will be declared composite by the base 2 test with high probability, so in the average case we benefit from the fast powering of base 2 and the fact that we don't need any hash table lookup at all. For primes (or, rarely, base-2 pseudoprimes), the additional hash table lookup after the strong probable prime test should have comparable cost to a hash table lookup before the test.
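The control flow described here can be sketched as follows. This is a toy, not FLINT's `n_is_prime`: it is only valid for n < 100000 (where there are exactly 16 base-2 strong pseudoprimes), it uses a plain 64-bit `%` instead of a fast reduction, and it scans a short list where the real code would do a hash table lookup. All function names are hypothetical.

```c
#include <stdint.h>

/* The 16 base-2 strong pseudoprimes below 100000. */
static const uint32_t BASE2_PSP[16] = {
    2047, 3277, 4033, 4681, 8321, 15841, 29341, 42799,
    49141, 52633, 65281, 74665, 80581, 85489, 88357, 90751
};

/* (a * b) mod n using a 64-bit intermediate product. */
static uint32_t mulmod(uint32_t a, uint32_t b, uint32_t n)
{
    return (uint32_t) (((uint64_t) a * b) % n);
}

/* Strong probable prime test to base 2, for odd n >= 3. */
static int is_strong_prp_base2(uint32_t n)
{
    uint32_t d = n - 1, s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }      /* n - 1 = d * 2^s, d odd */

    uint32_t x = 1, b = 2 % n, e = d;           /* x = 2^d mod n */
    while (e)
    {
        if (e & 1) x = mulmod(x, b, n);
        b = mulmod(b, b, n);
        e >>= 1;
    }

    if (x == 1 || x == n - 1) return 1;
    for (uint32_t i = 1; i < s; i++)
    {
        x = mulmod(x, x, n);
        if (x == n - 1) return 1;
    }
    return 0;
}

/* Stand-in for the hash table lookup: scan the short list. */
static int in_pseudoprime_list(uint32_t n)
{
    int found = 0;
    for (int i = 0; i < 16; i++) found |= (BASE2_PSP[i] == n);
    return found;
}

/* Toy primality test, valid only for n < 100000. */
static int toy_is_prime(uint32_t n)
{
    if (n < 2) return 0;
    if (n == 2) return 1;
    if ((n & 1) == 0) return 0;
    if (!is_strong_prp_base2(n)) return 0;      /* usual exit for random input */
    return !in_pseudoprime_list(n);             /* only primes and pseudoprimes get here */
}
```

Most composite inputs leave at the `is_strong_prp_base2` check and never touch the table; only primes and the rare base-2 strong pseudoprimes pay for the lookup.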

BTW, note that we use the Shoup modular reduction for the 32-bit powering and Montgomery only for >32 bit. This was faster on my machine, but again, this may differ on other machines (and with any micro-optimizations I've overlooked).
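For readers unfamiliar with it, here is a generic sketch of Shoup-style modular multiplication by a fixed multiplier for 32-bit moduli. It is written from the textbook description, not taken from FLINT, and it does not show how the reduction is wired into the base-2 powering; the names `shoup_precomp` and `shoup_mulmod` are hypothetical.

```c
#include <stdint.h>

/* Precompute w_pre = floor(w * 2^32 / n) for a fixed multiplier w < n. */
static uint32_t shoup_precomp(uint32_t w, uint32_t n)
{
    return (uint32_t) (((uint64_t) w << 32) / n);
}

/* Return (w * x) mod n, given w_pre = shoup_precomp(w, n). Requires w < n. */
static uint32_t shoup_mulmod(uint32_t w, uint32_t w_pre, uint32_t x, uint32_t n)
{
    /* Quotient estimate: equal to floor(w * x / n) or one less. */
    uint64_t q = ((uint64_t) w_pre * x) >> 32;
    /* Remainder candidate in [0, 2n), fixed up by one conditional subtraction. */
    uint64_t r = (uint64_t) w * x - q * n;
    return (uint32_t) (r >= n ? r - n : r);
}
```

The division happens once, in `shoup_precomp`; each later multiplication by the same `w` costs two multiplies, a shift, a subtraction and one correction step.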
