# Ekerå–Håstad's algorithm for computing short discrete logarithms
This notebook illustrates how to use [Quaspy](https://github.com/ekera/quaspy) to simulate Ekerå–Håstad's algorithm for computing short discrete logarithms [[EH17]](https://doi.org/10.1007/978-3-319-59879-6_20), with improvements and the classical post-processing from [[E20]](https://doi.org/10.1007/s10623-020-00783-2) and [[E23p]](https://doi.org/10.48550/arXiv.2309.01754) when making and not making tradeoffs, respectively.

To start off, let us define the MODP-3072 safe-prime group used in IKEv2, see [RFC 3526](https://datatracker.ietf.org/doc/html/rfc3526) and Tab. 25 on p. 133 in App. A of [NIST SP 800-56A Rev. 3](https://doi.org/10.6028/NIST.SP.800-56Ar3).

In [1]:
!pip3 install -q --pre quaspy # Make sure that quaspy is installed.

In [2]:
from quaspy.math.groups import IntegerModRingMulSubgroupElement;

# Define the safe prime.
p = 5809605995369958062791915965639201402176612226902900533702900882779736177890990861472094774477339581147373410185646378328043729800750470098210924487866935059164371588168047540943981644516632755067501626434556398193186628990071248660819361205119793693985433297036118232914410171876807536457391277857011849897410207519105333355801121109356897459426271845471397952675959440793493071628394122780510124618488232602464649876850458861245784240929258426287699705312584509625419513463605155428017165714465363094021609290561084025893662561222573202082865797821865270991145082200656978177192827024538990239969175546190770645685893438011714430426409338676314743571154537142031573004276428701433036381801705308659830751190352946025482059931306571004727362479688415574702596946457770284148435989129632853918392117997472632693078113129886487399347796982772784615865232621289656944284216824611318709764535152507354116344703769998514148343807;

# Define the generator.
g = IntegerModRingMulSubgroupElement(2, p);

# Define the order of the generator.
r = (p - 1) // 2;
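
As a sanity check on the structure used above, the following minimal sketch uses a toy safe prime (23, chosen purely for illustration; it is not the MODP-3072 prime) to show the generator 2 having order $(p - 1)/2$ in such a group. The helper `multiplicative_order` and the names `p_toy` and `r_toy` are hypothetical and not part of Quaspy.

```python
def multiplicative_order(g: int, p: int) -> int:
    """Return the order of g modulo the prime p by brute force (toy sizes only)."""
    order, value = 1, g % p
    while value != 1:
        value = (value * g) % p
        order += 1
    return order


# Toy safe prime: 23 = 2 * 11 + 1, where both 11 and 23 are prime.
p_toy = 23
r_toy = (p_toy - 1) // 2

# The generator 2 spans the subgroup of order r_toy = 11.
print("Order of 2 modulo", p_toy, "=", multiplicative_order(2, p_toy))
print("Expected order r =", r_toy)
```

Brute force is of course only feasible for toy moduli. For the 3072-bit group above, the order is known from the safe-prime structure of $p$.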

For this group, the exponent $d$ is selected uniformly at random from $[0, 2^{256}) \cap \mathbb Z$ (when using the NIST model to estimate the strength level of the group).
To select $d$, we use the [<code>sample_integer(B)</code>](../docs/math/random/sample_integer.md) convenience function provided by [Quaspy](https://github.com/ekera/quaspy).

In [3]:
from quaspy.math.random import sample_integer;

# Sample d.
m = 256;
d = sample_integer(2 ** m);

print("Sampled d =", d);

# Compute x.
x = g ** d;

print("\nComputed x =", x);

Sampled d = 14867624666158446636666179337991376830611190433112693090937780483120623068086

Computed x = 27611512647033879467607788018190846853952229749203849921921424818135710530843661193394718021432049438812026485988260822525938124351313252998448484453422719026317959053226270950160894170144515818038346696904989963887134425060796039202649268152656787619296432492745163979859476137923073371361071641914047946148219557591353372769268423864982072388138575016427967763838783689374264039543360257225456507793791084088867987316099765760519623771928367435930710959122332600887362504969380785271117088803090379984676927990470140972617985289916742865853788580028690991174882989678109676054540803895247529312577361917977464280978908200787904461792287707030984246525478453569235871064924286009151335993461857227950633353640559453461185123520318238599292349534910865918846470002856957349104833848238471434007321797394312705943438394817586734926028608545157799749357292213206145998738765592797015581585989932093

## 1. Solving for the logarithm $d$ in a single run
Let us first consider the setting where our goal is to solve for the discrete logarithm $d$ in a single run of the quantum part of Ekerå–Håstad's algorithm [[EH17]](https://doi.org/10.1007/978-3-319-59879-6_20).

### 1.1. Sampling a frequency pair $(j, k)$
[Quaspy](https://github.com/ekera/quaspy) provides a function [<code>sample_j_k_given_d_r_tau(d, r, m, ell, tau, ..)</code>](../docs/logarithmfinding/short/sampling/sample_j_k_given_d_r_tau.md) for simulating the quantum part of Ekerå–Håstad's algorithm exactly (up to arbitrary precision) for a given logarithm $d$, order $r$ (that need not be specified), and parameters $m$, $\ell$ and $\tau$, where $m$ is an upper bound on the bit length of $d$, and where $\tau$ specifies the search interval when sampling and when solving in the classical post-processing from [[E23p]](https://doi.org/10.48550/arXiv.2309.01754).
For further details, see [[E23p]](https://doi.org/10.48550/arXiv.2309.01754).

Below, we use said function to simulate running the algorithm for $d$ and $r$ with control registers of length $m + \ell$ qubits and $\ell$ qubits respectively.
More specifically, for $g$ a generator of unknown order $r \ge 2^{m+\ell} + (2^\ell - 1)d$ and $x = g^d$, we simulate inducing the state

$$\frac{1}{2^{m + 2 \ell}}
\sum_{a, \, j \, = \, 0}^{2^{m+\ell} - 1}
\sum_{b, \, k \, = \, 0}^{2^{\ell} - 1}
\mathrm{exp}
\left(
  \frac{2 \pi \mathrm{i}}{2^{m + \ell}} (aj + 2^m bk)
\right)
|\, j, k, g^a x^{-b} \,\rangle$$

and reading out the first two control registers.
This yields a frequency pair $(j, k)$ sampled from the probability distribution induced by the quantum part of Ekerå–Håstad's algorithm:

In [4]:
from quaspy.logarithmfinding.short.sampling import sample_j_k_given_d_r_tau;

l = m;
tau = 27;

[j, k] = sample_j_k_given_d_r_tau(d, r, m, l, tau);

print("Sampled j =", j);
print("Sampled k =", k);

Sampled j = 11537636737092939085202964713643412808303836285641461871897267188321025803643217469817969474479357606426998814070910890931223119606001144958522487371210079
Sampled k = 29246234808862741736430286986658896390178351944930753053991265061031474065059
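
For intuition, the distribution induced by the state in this section can be computed by brute force for toy parameters. The sketch below is not how Quaspy samples. It is a direct evaluation of the amplitudes under the assumption that $r \ge 2^{m+\ell} + (2^\ell - 1)d$, so that the third register $g^a x^{-b} = g^{a - bd}$ is uniquely identified by the integer $e = a - bd$. The function name `toy_distribution` and the toy values $d = 3$, $m = \ell = 2$ are assumptions made for illustration.

```python
import cmath
import math
from collections import defaultdict


def toy_distribution(d: int, m: int, l: int) -> dict:
    """Return {(j, k): probability} for the induced state, by brute force.

    Assumes r >= 2^(m + l) + (2^l - 1) d, so that g^a x^(-b) = g^(a - b d)
    is uniquely identified by the integer e = a - b d.
    """
    # Group the pairs (a, b) by the value e held in the third register.
    pairs_by_e = defaultdict(list)
    for a in range(2 ** (m + l)):
        for b in range(2 ** l):
            pairs_by_e[a - b * d].append((a, b))

    norm = 2 ** (m + 2 * l)
    probs = {}
    for j in range(2 ** (m + l)):
        for k in range(2 ** l):
            p_jk = 0.0
            # Amplitudes interfere within each value of e, and the squared
            # magnitudes are then summed over the distinct values of e.
            for pairs in pairs_by_e.values():
                amp = sum(
                    cmath.rect(
                        1.0,
                        2 * math.pi * (a * j + 2 ** m * b * k) / 2 ** (m + l),
                    )
                    for (a, b) in pairs
                )
                p_jk += abs(amp / norm) ** 2
            probs[(j, k)] = p_jk
    return probs


probs = toy_distribution(d=3, m=2, l=2)
print("Number of pairs (j, k):", len(probs))
print("Total probability:", round(sum(probs.values()), 9))
```

The probabilities sum to one up to floating-point error, as expected since the state is normalized. The cost of this direct evaluation grows exponentially in $m + \ell$, so it is only useful for building intuition.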


As explained in [[E23p]](https://doi.org/10.48550/arXiv.2309.01754), $\sim \ell$ bits of information on $d$ are computed in each run of the quantum part of Ekerå–Håstad's algorithm. To solve in a single run, we therefore select $\ell = m$ above.

Note that the analysis in [[EH17]](https://doi.org/10.1007/978-3-319-59879-6_20), [[E20]](https://doi.org/10.1007/s10623-020-00783-2) and [[E23p]](https://doi.org/10.48550/arXiv.2309.01754) furthermore requires that $r \ge 2^{m+\ell} + (2^\ell - 1)d$.
For the safe-prime group that we consider in this example, this requirement is met when picking $\ell = m$.
For other groups, this is not necessarily the case.

Note also that the [<code>sample_j_k_given_d_r_tau(d, r, m, ell, tau, ..)</code>](../docs/logarithmfinding/short/sampling/sample_j_k_given_d_r_tau.md) function checks that $r \ge 2^{m+\ell} + (2^\ell - 1)d$ when $r$ is passed to the function. This is the only reason why $r$ is passed to said function.
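
The requirement on $r$ can also be checked by hand. The sketch below uses a hypothetical helper `meets_order_requirement` (it is not part of Quaspy) and, instead of the actual order, the stand-in lower bound $2^{3070}$, which holds for $(p - 1)/2$ whenever $p$ is a 3072-bit prime.

```python
def meets_order_requirement(r: int, d: int, m: int, l: int) -> bool:
    """Check the requirement r >= 2^(m + l) + (2^l - 1) d."""
    return r >= 2 ** (m + l) + (2 ** l - 1) * d


# Worst case over all d < 2^m for m = l = 256. The right-hand side is then
# below 2^513, far smaller than the stand-in lower bound 2^3070 on r.
print(meets_order_requirement(r=2 ** 3070, d=2 ** 256 - 1, m=256, l=256))

# Toy counterexample: for r = 11, d = 3 and m = l = 2 the right-hand side
# is 16 + 3 * 3 = 25, so the requirement is not met.
print(meets_order_requirement(r=11, d=3, m=2, l=2))
```

For groups with much smaller order relative to $2^{m + \ell}$, as in the toy case, the requirement can fail, which is why it has to be checked for each group.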

### 1.2. Solving the frequency pair $(j, k)$ and $g$ and $x$ for $d$
We now proceed to solve the frequency pair $(j, k)$ for the logarithm $d$.

To this end, we use the [<code>solve_j_k_for_d(j, k, m, l, g, x, tau, t, ..)</code>](../docs/logarithmfinding/short/postprocessing/solve_j_k_for_d.md) function provided by [Quaspy](https://github.com/ekera/quaspy). 
It solves $(j, k)$ for $d$ by using the lattice-based post-processing from [[E23p]](https://doi.org/10.48550/arXiv.2309.01754).

The parameters $\tau$ and $t$ control the search space when solving $(j, k)$ for $d$.
Below, we gradually grow the search space, from $(\tau, t) = (4, 2)$ up to $(\tau, t) = (27, 2)$ for which the success probability is at least $1 - 10^{-8}$, and for which at most $2^{18.6}$ group operations have to be performed in the classical post-processing.
For further details, see [[E23p]](https://doi.org/10.48550/arXiv.2309.01754) (in particular, see Tab. 1).

Note that $\tau = 27$ for the last combination which explains why we specified $\tau = 27$ when sampling in the previous section.

In [5]:
from quaspy.logarithmfinding.short.postprocessing import solve_j_k_for_d;

for [tau, t] in [[4, 2], [7, 2], [14, 2], [17, 2], [27, 2]]:
  recovered_d = solve_j_k_for_d(j, k, m, l, g, x, tau = tau, t = t);
  if recovered_d is not None:
    break;

if recovered_d == d:
  print("Recovered d =", d);

  print("\n[ OK ] Successfully recovered d.");
else:
  print("[FAIL] Failed to recover d.");

Recovered d = 14867624666158446636666179337991376830611190433112693090937780483120623068086

[ OK ] Successfully recovered d.


## 2. Making tradeoffs and solving for $d$ in multiple runs
Let us now consider the case where we make tradeoffs by picking $\ell \approx m/s$ for $s$ some tradeoff factor.

As explained in [[E20]](https://doi.org/10.1007/s10623-020-00783-2), each run of the quantum part of Ekerå–Håstad's algorithm yields $\sim \ell$ bits of information on the logarithm $d$.
Hence, we expect to have to perform at least $s$ runs to solve for $d$ efficiently and with high probability of success in the classical post-processing.

According to the estimates in [[E20]](https://doi.org/10.1007/s10623-020-00783-2) (see Tab. 2), which were computed with the [Qunundrum](https://github.com/ekera/qunundrum) suite of MPI programs, when $m = 256$, $s = 8$ and $\ell = \lceil m / s \rceil$, we need to make no more than $n = 11$ runs to solve efficiently in the classical post-processing with $\ge 99\%$ success probability without enumerating the lattice.
In the below example, we use this specific parameterization.
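
The arithmetic behind this parameterization can be sketched as follows. The bound `min_runs` is only the naive counting bound that follows from each run yielding $\sim \ell$ bits. It is not the estimate from [[E20]](https://doi.org/10.1007/s10623-020-00783-2), which is where $n = 11$ comes from. The variable names are chosen for illustration.

```python
from math import ceil

m_bits = 256    # Upper bound on the bit length of d.
s_factor = 8    # Tradeoff factor s.
n_runs = 11     # Number of runs, taken from the estimates cited above.

# Each run yields roughly l_bits bits of information on d.
l_bits = ceil(m_bits / s_factor)

# Naive counting bound on the number of runs needed to cover all m_bits bits.
min_runs = ceil(m_bits / l_bits)

print("l =", l_bits)
print("Counting bound on the number of runs =", min_runs)
print("Approximate bits collected over n runs =", n_runs * l_bits)
```

With these values, the $n = 11$ runs collect roughly $352$ bits in total, compared to the $256$ bits of $d$.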

### 2.1. Sampling $n$ frequency pairs $((j_1, k_1), \, \ldots, \, (j_n, k_n))$
To start off, in analogy with Sect. 1.1 above, we sample $n$ frequency pairs $((j_1, k_1), \, \ldots, \, (j_n, k_n))$ from the distribution induced by the quantum part of Ekerå–Håstad's algorithm.
To this end, we use the [<code>sample_j_k_given_d_r(d, r, m, ell, ..)</code>](../docs/logarithmfinding/short/sampling/sample_j_k_given_d_r.md) function provided by [Quaspy](https://github.com/ekera/quaspy).

Note that the [<code>sample_j_k_given_d_r(d, r, m, ell, ..)</code>](../docs/logarithmfinding/short/sampling/sample_j_k_given_d_r.md) function is equivalent to the [<code>sample_j_k_given_d_r_tau(d, r, m, ell, tau, ..)</code>](../docs/logarithmfinding/short/sampling/sample_j_k_given_d_r_tau.md) function used above in Sect. 1.1, except that it selects the search interval in a more straightforward manner that is not specifically adapted to the post-processing from [[E23p]](https://doi.org/10.48550/arXiv.2309.01754).
Since we use the post-processing from [[E20]](https://doi.org/10.1007/s10623-020-00783-2) in what follows, it makes sense to call [<code>sample_j_k_given_d_r(d, r, m, ell, ..)</code>](../docs/logarithmfinding/short/sampling/sample_j_k_given_d_r.md) in this step to sample the frequency pairs.

In [6]:
from quaspy.logarithmfinding.short.sampling import sample_j_k_given_d_r;

from math import ceil;

s = 8;
l = ceil(m / s);
n = 11;

j_k_list = [sample_j_k_given_d_r(d, r, m, l, timeout = 30) for _ in range(n)];

print("Sampled pairs [j, k] =", j_k_list);

Sampled pairs [j, k] = [[49021096921192201463681429065967228089151692407369231665560475663624004115552865438031, 2469361454], [106130186551029264589224290693095770647827665035310062613166290891073007907083547655337, 394074144], [152871282366174042836066719014012820033220242245136590336961806844018589356017478760476, 4152781648], [178108523135547768352938472840044308618395271190623697048085769867499301694822481798303, 3072724253], [167214375564563082293855330351567661302287329029280597885451216920203264041163256654432, 2947219498], [296950654069873592330121648948838739294729409516369435435726912367673757751030072575584, 4135104007], [70152138809085613449764007558598433610517478751068727846051070951573369447522398804899, 1385824607], [157608244387551874059472611710581613293092116578498127241236437174612174587777757042084, 3564079824], [219245331240923438729306069137466145912033542376902853856805409134248635158937046439182, 3916868383], [453886153471626137315329744285421716793526320645489

### 2.2. Solving the $n$ frequency pairs $((j_1, k_1), \, \ldots, \, (j_n, k_n))$ and $g$ and $x$ for $d$
We now proceed to solve the $n$ frequency pairs $((j_1, k_1), \, \ldots, \, (j_n, k_n))$ for the logarithm $d$.

To this end, we use the [<code>solve_multiple_j_k_for_d(j_k_list, m, l, g, x, ..)</code>](../docs/logarithmfinding/short/postprocessing/solve_multiple_j_k_for_d.md) function provided by [Quaspy](https://github.com/ekera/quaspy).
It jointly solves $((j_1, k_1), \, \ldots, \, (j_n, k_n))$ for $d$ by using the lattice-based post-processing from [[E20]](https://doi.org/10.1007/s10623-020-00783-2).

In [7]:
from quaspy.logarithmfinding.short.postprocessing import solve_multiple_j_k_for_d;

recovered_d = solve_multiple_j_k_for_d(j_k_list,
                                       m,
                                       l,
                                       g,
                                       x,
                                       enumerate = False,
                                       timeout = 30);

if recovered_d == d:
  print("Recovered d =", d);

  print("\n[ OK ] Successfully recovered d.");
else:
  print("[FAIL] Failed to recover d.");

Recovered d = 14867624666158446636666179337991376830611190433112693090937780483120623068086

[ OK ] Successfully recovered d.
