Description
This PR slightly changes the implementation of the reduce algorithm, saving one operation and yielding an improvement of about 8%.
The current implementation of the multiplication of two `M31` elements first computes the integer product of their representatives and then applies a reduce function to it. The reduce function works for every integer in the range $[0, p^2)$. This can be refined by a special reduce function for elements in the range $[0, p^2)$ that are not multiples of $p$.
General
The general idea is that taking the remainder modulo $p = 2^{31} - 1$ is easy if the elements are written in base $2^{31}$. This is because $2^{31}$ equals $1$ modulo $p$. So, an element of the form
$$a_n (2^{31})^n + a_{n-1} (2^{31})^{n-1} + \cdots + a_1 2^{31} + a_0$$
is equivalent to the element
$$a_{n} + \cdots + a_{0}.$$
This gives an integer in the same residue class as the original element. It may not be in the interval $[0, p)$, but it will be in the interval $[0, (n+1)(2^{31}-1)]$, which (for $n>0$) is smaller than the original range of values. So iterating this process eventually leads to a representative in the range $[0, p]$.
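As an illustration of the digit-folding idea, here is a minimal Python sketch. The names `fold_once` and `fold_until_small` are hypothetical helpers made up for this example and are not part of the codebase; Python's arbitrary-precision integers stand in for the machine words used in the real implementation.

```python
P = 2**31 - 1      # the Mersenne prime p; also the 31-bit mask
BASE_BITS = 31     # digits are taken in base 2**31


def fold_once(v: int) -> int:
    """Return the sum of the base-2**31 digits of a nonnegative integer v."""
    total = 0
    while v > 0:
        total += v & P     # lowest base-2**31 digit
        v >>= BASE_BITS    # drop that digit
    return total


def fold_until_small(v: int) -> int:
    """Repeat fold_once until the value lies in [0, p].

    The result is congruent to v modulo p, but it can be p itself
    (for nonzero multiples of p), so it is not always fully reduced.
    """
    while v > P:
        v = fold_once(v)
    return v
```

For instance, $p^2 = (2^{31} - 2) \cdot 2^{31} + 1$, so its digits sum to $2^{31} - 1 = p$, and `fold_until_small(P * P)` stops at `P` rather than `0`. This is the boundary case that the rest of the description deals with.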
General remarks
Let $b > 1$ be an integer.
Let's consider nonnegative integers in base $b$. If $v < b^2$, then there exist $a_0, a_1 \in [0, b)$ such that $$v = a_1 b + a_0.$$ Let $w = a_0 + a_1$. Then $w \in [0, 2b - 1)$. Since $w$ is bounded by $2b - 1$, if we write $w$ in base $b$ we obtain $$w = b_1 b + b_0,$$ with either $b_1 = 0$ and $b_0 \in [0, b)$, or $b_1 = 1$ and $b_0 \in [0, b - 1)$.
In particular, $b_0 + b_1 \in [0, b)$.
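The bound $b_0 + b_1 \in [0, b)$ can be checked exhaustively for small bases. The sketch below is only a sanity check, and `check_base` is a hypothetical helper name chosen for this example.

```python
def check_base(b: int) -> bool:
    """Exhaustively verify the two-step digit-sum bounds for every v in [0, b*b)."""
    for v in range(b * b):
        a1, a0 = divmod(v, b)      # v = a1*b + a0 with a0, a1 in [0, b)
        w = a0 + a1
        assert 0 <= w < 2 * b - 1  # w is at most 2b - 2
        b1, b0 = divmod(w, b)      # w = b1*b + b0
        assert b1 in (0, 1)
        assert 0 <= b0 + b1 < b    # the claim of this section
    return True
```

Running `check_base(b)` for a handful of small values of `b` exercises both cases, $b_1 = 0$ and $b_1 = 1$.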
The case of M31
Let $b = 2^{31}$ and $p = 2^{31} - 1 = b - 1$. By the argument above, for all $v \in [0, b^2)$, if we write $v = a_1 b + a_0$ and write $a_0 + a_1 = b_1 b + b_0$, we obtain $b_0 + b_1 \in [0, b)$, which is the same as $b_0 + b_1 \in [0, p]$.
Suppose in addition we know that $v = xy$ is the product of two elements $x, y \in (0, p)$. Since $p$ is prime and divides neither $x$ nor $y$, $p$ does not divide $v$. On the other hand, $v \equiv b_0 + b_1 \,\text{ mod } p$. This implies $b_0 + b_1$ is not divisible by $p$, and in particular $b_0 + b_1$ is different from $0$ and $p$.
Putting it all together, we obtain that $b_0 + b_1 \in [0, p)$ whenever $v \in [0, b^2)$ and $v = xy$ is the product of two nonnegative elements strictly less than $p$.
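A small Python sketch of this argument, using a hypothetical helper `fold_twice` that computes $b_0 + b_1$ directly from the definitions above. It is meant as a check of the reasoning, not as the implementation in the PR.

```python
import random

P = 2**31 - 1   # p = b - 1, also the 31-bit mask
BITS = 31       # b = 2**31


def fold_twice(v: int) -> int:
    """Return b0 + b1 for v in [0, b*b), following the two-step digit sum."""
    a1, a0 = v >> BITS, v & P   # v = a1*b + a0
    w = a1 + a0
    b1, b0 = w >> BITS, w & P   # w = b1*b + b0
    return b0 + b1


def check_products(samples: int = 10_000, seed: int = 0) -> bool:
    """Check that products of elements below p are fully reduced by fold_twice."""
    rng = random.Random(seed)
    for _ in range(samples):
        x = rng.randrange(P)    # x in [0, p)
        y = rng.randrange(P)    # y in [0, p)
        r = fold_twice(x * y)
        assert 0 <= r < P
        assert r == (x * y) % P
    return True
```

A nonzero multiple of $p$ shows why the hypothesis matters: `fold_twice(P)` returns `P`, which is congruent to $0$ but lies outside $[0, p)$.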
Alternative algorithm
This follows the same idea as the `reduce` algorithm already implemented. But by taking into account the particular case of $v$ being the product of two nonnegative integers less than $p$, we are able to remove the `+ 1` after the first shift in the current algorithm.
Let $b = 2^{31}$ and $p = 2^{31} - 1$. Suppose $v = xy$ is the product of two elements $x, y \in [0, p)$. Then $v$ belongs to the interval $[0, p^2)$ and is either $0$ or not a multiple of $p$. If we write $v = a_1 b + a_0$ with $a_1, a_0 \in [0, b)$, then $a_1$ can't be equal to $b-1$: otherwise $v = (b-1)b + a_0 = pb + a_0$, which is larger than $p^2$. So $a_1 \leq b-2$.
Going back to the algorithm, instead of computing $a_1, a_0$, adding them, then computing $b_1, b_0$ and adding them, there's a shortcut.
Say $v = a_1 b + a_0$ and let $b_1, b_0$ be the elements such that $a_1 + a_0 = b_1 b + b_0$. As before, we know that $b_1$ is either $0$ or $1$. Let's consider the following elements:
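One way to realize such a shortcut with shifts and a mask is sketched below in Python. Adding $a_1$ to $v$ gives $(a_1 + b_1) b + b_0$, so a second shift yields $a_1 + b_1$; adding $v$ once more gives $(a_1 + b_1) b + (b_0 + b_1)$, whose low 31 bits are $b_0 + b_1$. Both function names are hypothetical, and the shape of `reduce_with_plus_one` is an assumption based only on the statement that the current algorithm has a `+ 1` after the first shift.

```python
import random

P = 2**31 - 1   # p, also the 31-bit mask
BITS = 31


def reduce_with_plus_one(v: int) -> int:
    """Assumed shape of the existing reduce: '+ 1' after the first shift.

    Maps every v in [0, p*p) to its representative in [0, p).
    """
    return ((((v >> BITS) + v + 1) >> BITS) + v) & P


def reduce_product(v: int) -> int:
    """Same expression without the '+ 1'.

    Returns b0 + b1, which is fully reduced when v is a product of two
    integers in [0, p), but equals p for nonzero multiples of p.
    """
    return ((((v >> BITS) + v) >> BITS) + v) & P


def variants_agree_on_products(samples: int = 10_000, seed: int = 0) -> bool:
    """Compare both variants, and the true remainder, on random products."""
    rng = random.Random(seed)
    for _ in range(samples):
        x, y = rng.randrange(P), rng.randrange(P)
        v = x * y
        assert reduce_product(v) == reduce_with_plus_one(v) == v % P
    return True
```

The two variants differ only on nonzero multiples of $p$: `reduce_with_plus_one(P)` gives `0`, while `reduce_product(P)` gives `P`. Such inputs cannot occur as a product of two elements of $[0, p)$.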
This gives about an 8% improvement over the `reduce` algorithm on an x86 laptop (Core i7).