Perf: Improve M31 mul #622

Open
wants to merge 1 commit into
base: dev


@schouhy schouhy commented May 8, 2024

Description

This PR slightly changes the implementation of the reduce algorithm, saving one operation. This results in an improvement of about 8%.

The current implementation of the multiplication of two M31 elements first computes the integer product of their representatives and then applies a reduce function to it. The reduce function works for every integer in the range $[0, p^2)$. This can be refined by a special reduce function for elements in the range $[0, p^2)$ that are not multiples of $p$.


General

The general idea is that taking remainder modulo $p = 2^{31} - 1$ is easy if the elements are written in base $2^{31}$. This is because the element $2^{31}$ equals $1$ modulo $p$. So, an element of the form
$$a_n (2^{31})^n + a_{n-1} (2^{31})^{n-1} + \cdots + a_1 2^{31} + a_0$$
is equivalent to the element $$a_{n} + \cdots + a_{0}.$$ This gives an integer in the same residue class as the original element. It may not lie in the interval $[0, p)$, but it does lie in the interval $[0, (n+1)(2^{31}-1)]$, which (for $n>0$) is smaller than the original range of values. So iterating this process eventually leads to a representative in the range $[0, p]$.
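The folding idea can be sketched in a few lines. This is only an illustration of the digit-sum argument, not the implementation in this PR; the name `fold_digits` is made up for the example.

```rust
/// Mersenne prime p = 2^31 - 1.
const P: u64 = (1 << 31) - 1;

/// Replaces `v` by the sum of its base-2^31 digits until the value lies in [0, P].
/// The result is congruent to `v` modulo P. Note that a nonzero multiple of P
/// folds to P itself, not to 0, which is why the range is [0, P] and not [0, P).
fn fold_digits(mut v: u64) -> u64 {
    while v > P {
        let mut sum = 0u64;
        while v > 0 {
            sum += v & P; // lowest base-2^31 digit
            v >>= 31; // drop that digit
        }
        v = sum;
    }
    v
}
```

For instance, `fold_digits(1 << 31)` gives `1`, reflecting that $2^{31} \equiv 1$ modulo $p$.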

General remarks

Let $b > 1$ be an integer.

Let's consider nonnegative integers in base $b$. If $v < b^2$, then there exist $a_0, a_1 \in [0, b)$ such that $$v = a_1 b + a_0.$$ Let $w = a_0 + a_1$. Then, $w \in [0, 2b - 1)$. Since $w$ is bounded by $2b - 1$, if we write $w$ in base $b$ we obtain, $$w = b_1b + b_0,$$ with either

  • $b_1 = 0$ and $b_0 \in [0, b)$, or
  • $b_1 = 1$ and $b_0 \in [0, b-1)$.

In particular, $b_0 + b_1 \in [0, b)$.
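The bound $b_0 + b_1 < b$ can be checked by brute force for small bases. The sketch below does that; `split`, `double_digit_sum` and `claim_holds` are hypothetical helper names used only for this illustration.

```rust
/// Writes `v` in base `b` as (high, low) with v = high * b + low.
fn split(v: u64, b: u64) -> (u64, u64) {
    (v / b, v % b)
}

/// For v < b^2 with v = a_1 * b + a_0 and a_0 + a_1 = b_1 * b + b_0,
/// returns b_0 + b_1.
fn double_digit_sum(v: u64, b: u64) -> u64 {
    let (a1, a0) = split(v, b);
    let (b1, b0) = split(a0 + a1, b);
    b0 + b1
}

/// Exhaustively checks that b_0 + b_1 < b for every v in [0, b^2).
fn claim_holds(b: u64) -> bool {
    (0..b * b).all(|v| double_digit_sum(v, b) < b)
}
```

In base $10$, for example, $v = 99$ gives $a_0 + a_1 = 18$ and then $b_0 + b_1 = 8 + 1 = 9$, the largest value allowed by the bound.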

The case of M31

Let $b = 2^{31}$ and $p = 2^{31} - 1 = b - 1$. By the argument above, for all $v \in [0, b^2)$, if we write $v = a_1 b + a_0$ and write $a_0 + a_1 = b_1 b + b_0$, we obtain $b_0 + b_1 \in [0, b)$, which is the same as $b_0 + b_1 \in [0, p].$

Suppose in addition we know that $v = xy$ is the product of two elements $x, y \in (0, p)$. Since $p$ is prime and divides neither $x$ nor $y$, it does not divide $v$. On the other hand, $v \equiv b_0 + b_1 \pmod{p}$, because $b \equiv 1 \pmod{p}$. This implies $b_0 + b_1$ is not divisible by $p$ and in particular $b_0 + b_1$ is different from $0$ and $p$.

Putting it all together, we obtain that $b_0 + b_1 \in [0, p)$, so it is the canonical representative of $v$ modulo $p$, whenever $v \in [0, b^2)$ and $v = xy$ is the product of two nonzero elements strictly less than $p$.
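The same argument applies to any prime of the form $b - 1$, so it can be checked exhaustively on a small stand-in. The sketch below assumes $b = 2^5$ and $p = 31$ in place of the real M31 parameters; the function names are illustrative.

```rust
/// Small stand-in for M31: B = 2^5 and P = 2^5 - 1 = 31, also a Mersenne prime.
const B: u64 = 1 << 5;
const P: u64 = B - 1;

/// Two rounds of base-B digit summing, valid for v < B^2.
fn double_digit_sum(v: u64) -> u64 {
    let s = v / B + v % B;
    s / B + s % B
}

/// True if two rounds give exactly x * y mod P for all factors x, y in (0, P).
fn products_reduce_fully() -> bool {
    (1..P).all(|x| (1..P).all(|y| double_digit_sum(x * y) == (x * y) % P))
}
```

Nonzero multiples of $p$ are the inputs that do not reduce fully: `double_digit_sum(P)` and `double_digit_sum(2 * P)` both return `P` rather than `0`.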

Alternative algorithm

This follows the same idea as the reduce algorithm already implemented. However, by taking into account the particular case of $v$ being the product of two nonnegative integers less than $p$, we are able to remove the `+ 1` after the first shift in the current algorithm.

Let $b = 2^{31}$ and $p = 2^{31} - 1$. Suppose $v = xy$ is the product of two elements $x, y \in (0, p)$. Then $v$ belongs to the interval $(0, p^2)$ and is not a multiple of $p$. (If one of the factors is $0$, then $v = 0$ and the algorithm trivially returns $0$.) If we write $v = a_1 b + a_0$, with $a_1, a_0 \in [0, b)$, then $a_1$ can't be equal to $b-1$. Otherwise $v = (b-1)b + a_0 = pb + a_0$, which is larger than $p^2$. So, $a_1 \leq b-2$.
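The bound on $a_1$ can also be read off from the identity $p^2 = (b-2)b + 1$: every $v < p^2$ satisfies $v \leq (b-2)b$. A minimal sketch that checks this numerically, with `high_digit` as a hypothetical helper name:

```rust
const B: u64 = 1 << 31;
const P: u64 = B - 1;

/// High base-2^31 digit a_1 of `v`, for v < 2^62.
fn high_digit(v: u64) -> u64 {
    v >> 31
}

/// High digit of the largest product of two canonical nonzero representatives,
/// that is (P - 1) * (P - 1).
fn largest_product_high_digit() -> u64 {
    high_digit((P - 1) * (P - 1))
}
```

Here `largest_product_high_digit()` evaluates to `B - 4`, comfortably within the bound $a_1 \leq b - 2$.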

Going back to the algorithm: instead of computing $a_1, a_0$, adding them, then computing $b_1, b_0$ and adding them, there's a shortcut.

Say $v = a_1 b + a_0$ and let $b_1, b_0$ be the elements such that $a_1 + a_0 = b_1 b + b_0$. As before, we know that $b_1$ is either $0$ or $1$. Let's consider the following elements:

  1. Let $w := v + a_1$. If we expand this we obtain $$w = a_1 b + a_0 + a_1 = a_1 b + b_1 b + b_0 = (a_1 + b_1) b + b_0.$$ Since $b_1$ is either $0$ or $1$ and $a_1 \leq b-2$, we obtain that $a_1 + b_1 \leq b-1$. Therefore, the above expression is the decomposition of $v + a_1$ in base $b$.
  2. Let $u := v + a_1 + b_1$. Expanding once again we obtain $$u = a_1 b + a_0 + a_1 + b_1 = (a_1 + b_1) b + b_0 + b_1.$$ Since we know from the previous section that $b_0 + b_1$ is less than $p$, the above expression is the decomposition of $u$ in base $b$. In particular, the least significant base-$b$ digit of $u$ is $b_0 + b_1$, which is the representative of $v$ modulo $p$ in $[0, p)$.

In code, $a_1$ is `v >> 31` and $a_1 + b_1$ is `w >> 31`, so the reduced value is the low 31 bits of $u$:

/// Assumes that `v` is in the range [0, `P`.pow(2)) and `v` is not a multiple of `P`.
///
/// Returns `v` % `P`.
fn reduce_alternative_algorithm(v: u64) -> Self {
    let w = v + (v >> 31);
    let u = v + (w >> 31);
    Self(u as u32 & P)
}

This gives about 8% improvement over the current reduce algorithm on an x86 laptop (Core i7).
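To try the reduction outside the field type, here is a standalone sketch. It assumes `P` is a `u32` constant and returns a bare `u32` instead of `Self`; `reduce_product` and `mul_m31` are names invented for this example, not part of the codebase.

```rust
const P: u32 = (1 << 31) - 1;

/// Standalone copy of the reduction, returning a bare u32.
/// Assumes v = x * y with x, y in [0, P).
fn reduce_product(v: u64) -> u32 {
    let w = v + (v >> 31); // v + a_1
    let u = v + (w >> 31); // v + a_1 + b_1
    u as u32 & P // low 31 bits: b_0 + b_1
}

/// Multiplies two canonical M31 representatives (hypothetical free function).
fn mul_m31(x: u32, y: u32) -> u32 {
    reduce_product(x as u64 * y as u64)
}
```

Comparing `mul_m31(x, y)` with `(x as u64 * y as u64) % P as u64` on a handful of inputs is a quick sanity check. The precondition matters: for a nonzero multiple of `P`, such as `reduce_product(P as u64)`, the function returns `P` instead of `0`.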

