# Multiprecision Number

- A multiple-precision algorithm augments the precision of the destination to accommodate the result, while a single-precision system truncates excess bits to maintain a fixed level of precision.
- The same algorithms based on multiple-precision integers can accommodate any reasonable input size without the designer’s explicit forethought. This lowers the cost of ownership for the code, as it only has to be written and tested once.
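The difference can be seen in a few lines of Python, used here only because its integers are already multiple precision. This is a minimal sketch: the 32-bit word size and the helper names `mul_fixed32` and `mul_multiprecision` are assumptions chosen for illustration.

```python
MASK32 = (1 << 32) - 1  # assumed single-precision word size: 32 bits


def mul_fixed32(a, b):
    """Single-precision multiply: excess bits of the product are truncated."""
    return (a * b) & MASK32


def mul_multiprecision(a, b):
    """Multiple-precision multiply: the result grows to hold every bit."""
    return a * b


a, b = 0xFFFFFFFF, 0x11111111
print(hex(mul_fixed32(a, b)))         # 0xeeeeeeef (low word only)
print(hex(mul_multiprecision(a, b)))  # 0x11111110eeeeeeef (full product)
```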

## MIRACL

- The MIRACL library defines two new data types: big for large integers and flash (short for floating-slash) for large rational numbers. The large integer routines are based on Knuth’s algorithms, described in Chapter 4 of his classic work ‘The Art of Computer Programming’. Floating-slash arithmetic, which works with rounded fractions, was originally proposed by D. Matula and P. Kornerup.

- The former is used to store multi-precision integers, and the latter stores multi-precision fractions as numerator and denominator in ‘floating-slash’ form. Both take the form of a fixed-length array of digits, with sign and length information encoded in a separate 32-bit integer. The data type mr_small, used to store the digits, is one of the built-in types, for example int, long or even double. This is referred to as the “underlying type”.

```
struct bigtype
{
     mr_unsign32 L;
     mr_small *d;
};
typedef struct bigtype *big;
typedef struct bigtype *flash;
```
> Observe that a big is just a pointer. The memory needed for each big or flash number instance is taken from the heap (or from the stack). Therefore each big or flash number must be initialised before use, and the required memory assigned to it.
> Structure of the big and flash data types, where s is the sign of the number, n and d are the lengths of the numerator and denominator respectively, and LSW and MSW mean ‘Least significant word’ and ‘Most significant word’ respectively.
![](https://i.imgur.com/4hmQ9S8.png)
> Because the size is fixed, the results of all intermediate calculations on big integers must fit within the fixed size initially specified to mirsys.
> Montgomery arithmetic is used internally by many of the MIRACL library routines that require extensive modular arithmetic, such as the highly optimised modular 
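As a rough illustration of how a sign and one or two lengths can share a single 32-bit word, the sketch below packs and unpacks such a header. The text does not give the actual bit positions, so the layout used here (sign in the top bit, numerator length in the low 16 bits, denominator length in the next 15 bits) and the function names are assumptions for illustration, not MIRACL's definition.

```python
SIGN_BIT = 1 << 31            # assumed: top bit holds the sign s
NUM_BITS = 16                 # assumed: low 16 bits hold the numerator length n
NUM_MASK = (1 << NUM_BITS) - 1
DEN_MASK = (1 << 15) - 1      # assumed: next 15 bits hold the denominator length d


def pack_header(negative, n, d=0):
    """Combine sign, numerator length and denominator length in one word."""
    if not (0 <= n <= NUM_MASK and 0 <= d <= DEN_MASK):
        raise ValueError("length does not fit in its field")
    word = (d << NUM_BITS) | n
    if negative:
        word |= SIGN_BIT
    return word


def unpack_header(word):
    """Recover (negative, n, d) from a packed header word."""
    negative = bool(word & SIGN_BIT)
    n = word & NUM_MASK
    d = (word >> NUM_BITS) & DEN_MASK
    return negative, n, d


# A negative fraction with a 3-word numerator and a 2-word denominator.
header = pack_header(True, 3, 2)
print(hex(header))            # 0x80020003
print(unpack_header(header))  # (True, 3, 2)
```

An integer (big) would simply use d = 0 in this layout.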

### Optimization
The Comba and KCM methods are implemented in the files mrcomba.c and mrkcm.c respectively. These files are created from the template files mrcomba.tpl and mrkcm.tpl by inserting macros defined in a .mcs file. This is done automatically using the supplied macro expansion utility mex. Compile and run config.c on your target system to automatically create a suitable mirdef.h and for advice on how to proceed. Also read kcmcomba.txt.
> To get the fastest possible performance for your embedded application it is recommended that you should develop your own x.mcs file, if one is not already provided for your processor/compiler.

By default MIRACL obtains memory for Big and Flash variables from the heap. This can be quite time-consuming, and all such objects ultimately need to be destroyed. It is faster to assign memory from the stack instead, especially for relatively small big numbers. This can be achieved by defining BIGS=m at compilation time. For example, using the Microsoft C++ compiler from the command line:

`C:\miracl>cl /O2 /GX /DBIGS=50 brent.cpp big.cpp zzn.cpp miracl.lib`

> Note that the value of m should be the same as or less than the value of n that is specified in the call to mirsys(n,0); or in Miracl precision=n; in the main program.
> When using finite-field arithmetic, valid numbers are always less than a certain fixed modulus. For example in the finite field mod n, the class defined in zzn.h and zzn.cpp might handle numbers with respect to a 512-bit modulus n, which is set by modulo(n). In this case one can define ZZNS=16 so that all elements are of a size 16x32=512, and are created on the stack. (This works particularly well in combination with the Comba mechanism described in Chapter 5.)

In a similar fashion, when working over the field $GF(2^{283})$, one can define GF2MS=9, so that all elements in the field are stored in a fixed memory allocation of 9 words taken from the stack.
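The two sizes quoted above (16 words for a 512-bit modulus, 9 words for 283-bit field elements) follow from rounding the bit length up to whole 32-bit words. A small sketch of that arithmetic, where `words_needed` is a hypothetical helper name and the 32-bit word size is taken from the 16x32=512 example:

```python
def words_needed(bits, word_bits=32):
    """Smallest number of word_bits-wide words that can hold `bits` bits."""
    return -(-bits // word_bits)  # ceiling division


print(words_needed(512))  # 16, the value used for ZZNS
print(words_needed(283))  # 9, the value used for GF2MS
```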

### Interface
The MIRACL library is interfaced to C++ via the header files big.h, flash.h, zzn.h, gf2m.h, ecn.h and ec2.h. Function implementation is in the associated files big.cpp, flash.cpp, zzn.cpp, gf2m.cpp, ecn.cpp and ec2.cpp, which must be linked into any application that requires them. The Chinese Remainder Theorem is also elegantly implemented as a class, in files crt.h and crt.cpp. See decode.cpp for an example of use. The Comb method for fast modular exponentiation with precomputation [HAC] is implemented in brick.h. See brick.cpp for an example of use. The GF(p) elliptic curve equivalents are in ebrick.h and ebrick.cpp, and the $GF(2^m)$ elliptic curve equivalents are in ebrick2.h and ebrick2.cpp.

In many of the example programs, particularly the factoring programs, all the arithmetic is done mod n. To avoid the tedious reduction mod n required after each operation, a C++ class ZZn is defined in the file zzn.h. This class ZZn (for ZZ(n), the ring of integers mod n) has its arithmetic operators defined to perform the reduction automatically. The function modulo(n) sets the modulus. Analogously, the C++ class GF2m deals with elements of the field $GF(2^m)$. In this case the “modulus” is set via modulo(m,a,b,c), which specifies either a trinomial basis $t^m + t^a + 1$ (with b=c=0) or a pentanomial basis $t^m + t^a + t^b + t^c + 1$.

Internally the ZZn class uses Montgomery representation (see zzn.h). The internal implementation of ZZn is hidden from the application programmer, a classic feature of C++, so the awkward internals of Montgomery representation need not concern the C++ programmer. The class ECn defined in ecn.h makes manipulation of points on GF(p) elliptic curves a simple matter, again hiding all the grisly details. The class EC2 defined in ec2.h does the same for $GF(2^m)$ elliptic curves.
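The idea of operators that reduce automatically can be sketched in Python. The class below is a hypothetical stand-in written for illustration only: the name `ModInt` and its methods are not part of MIRACL, and it stores plain residues rather than the Montgomery representation that ZZn uses internally.

```python
class ModInt:
    """Hypothetical illustration of the ZZn idea: operators that reduce mod n."""

    modulus = None  # shared modulus, set once (loosely analogous to modulo(n))

    @classmethod
    def set_modulus(cls, n):
        if n <= 0:
            raise ValueError("modulus must be positive")
        cls.modulus = n

    def __init__(self, value):
        if self.modulus is None:
            raise RuntimeError("call ModInt.set_modulus(n) first")
        self.value = int(value) % self.modulus

    @staticmethod
    def _raw(other):
        # Accept either another ModInt or a plain integer.
        return other.value if isinstance(other, ModInt) else int(other)

    def __add__(self, other):
        return ModInt(self.value + self._raw(other))

    def __sub__(self, other):
        return ModInt(self.value - self._raw(other))

    def __mul__(self, other):
        return ModInt(self.value * self._raw(other))

    def __repr__(self):
        return "ModInt(%d mod %d)" % (self.value, self.modulus)


ModInt.set_modulus(201)
c = ModInt(155) * ModInt(174)  # reduced automatically, no explicit "% 201"
print(c.value)                 # 36
```

The calling code never writes a reduction step, which is the convenience the ZZn class provides for the factoring programs.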

![](https://i.imgur.com/eaM7r9S.png)

## Barrett Reduction

$$c = a - b \cdot \lfloor (a \cdot \lfloor 2^q / b \rfloor)/2^q \rfloor$$

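Precomputing $\mu = \lfloor 2^q / b \rfloor$ turns the division by $b$ into a multiplication and a shift:

$$c = a - b \cdot \lfloor (a \cdot \mu)/2^q \rfloor$$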

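Let $t_0 = \lfloor a/\beta^{m-1} \rfloor$ represent the input with the irrelevant low digits trimmed, where $\beta$ is the radix and $m$ is the number of base-$\beta$ digits of $b$. With $\mu = \lfloor \beta^{2m}/b \rfloor$, the modular reduction becomes the almost equivalent equation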

$$c = a - b \cdot \lfloor (t_0 \cdot \mu) / \beta^{m+1} \rfloor$$


Let $\beta = 2$, which is the typical case in hardware design.

|Algorithm 1: Barrett Reduction|
|:--------------------------------------------------------------------|
|Input: $0 \leq T = (T_{2m-1}, T_{2m-2}, \ldots, T_1, T_0)_b < b^{2m}$, $N$, $b \geq 3$, $m = \lfloor \log_b N \rfloor + 1$ and $\mu = \lfloor b^{2m}/N \rfloor$|
|Output: $R = (R_{m-1}, \ldots, R_0)_b = T \bmod N$|
|1: $q_1 = \lfloor T/b^{m-1} \rfloor$ (Bitwise Shift Right)|
|2: $q_2 = q_1 \cdot \mu$ ($m \times m$ Multiplication)|
|3: $q_3 = \lfloor q_2/b^{m+1} \rfloor$ (Bitwise Shift Right)|
|4: $R_1 = T \bmod b^{m+1}$ (Truncation)|
|5: $R_2 = q_3 \cdot N \bmod b^{m+1}$ ($m \times m$ Multiplication + Truncation)|
|6: $R = R_1 - R_2$ (Subtraction)|
|7: If $(R < 0)$ then $R = R + b^{m+1}$ (Addition)|
|8: While $(R \geq N)$ do $R = R - N$ (Subtraction)|
|9: Return $R$|
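The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the author's implementation: the radix is assumed to be $b = 2^w$ with $w \geq 2$ (default $w = 4$) so that the condition $b \geq 3$ from the table holds, and the names `digit_count`, `barrett_mu` and `barrett_reduce` are hypothetical.

```python
def digit_count(n, w):
    """m = floor(log_b n) + 1, the number of base-2**w digits of n."""
    return -(-n.bit_length() // w)


def barrett_mu(n, w=4):
    """mu = floor(b**(2m) / n) for radix b = 2**w."""
    m = digit_count(n, w)
    return (1 << (2 * w * m)) // n


def barrett_reduce(t, n, w=4, mu=None):
    """Return t mod n following steps 1-9, for 0 <= t < b**(2m)."""
    if n <= 0 or w < 2:
        raise ValueError("need n > 0 and w >= 2")
    m = digit_count(n, w)
    if not 0 <= t < (1 << (2 * w * m)):
        raise ValueError("t must satisfy 0 <= t < b**(2m)")
    if mu is None:
        mu = barrett_mu(n, w)
    q1 = t >> (w * (m - 1))            # step 1: shift right
    q2 = q1 * mu                       # step 2: multiply by mu
    q3 = q2 >> (w * (m + 1))           # step 3: shift right
    mask = (1 << (w * (m + 1))) - 1
    r1 = t & mask                      # step 4: T mod b**(m+1)
    r2 = (q3 * n) & mask               # step 5: q3*N mod b**(m+1)
    r = r1 - r2                        # step 6
    if r < 0:                          # step 7
        r += mask + 1
    while r >= n:                      # step 8
        r -= n
    return r                           # step 9


print(barrett_reduce(155 * 174, 201))  # 36
```

Passing a precomputed `mu` avoids the one division per call, which is the point of the method when many values are reduced by the same modulus.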

|Algorithm 2: Computation of $\mu$|
|------------------------------------------|
|Input: $N$ and $k = \lfloor \log_b N \rfloor + 1$|
|Output: $\mu = \lfloor b^{2k}/N \rfloor$|
|1: $\mu = b^k$|
|2: Repeat|
|3: $S = \mu$|
|4: $\mu = 2\mu - \lfloor (\lfloor \mu^2/b^k \rfloor \cdot N) / b^k \rfloor$|
|5: Until $\mu \leq S$|
|6: $t = b^{2k} - N \cdot \mu$|
|7: While ($t < 0$) do|
|8: $\mu = \mu - 1$|
|9: $t = t + N$|
|10: Return $\mu$|
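Algorithm 2 is a Newton-style iteration that avoids a full division. A minimal Python sketch, assuming radix $b = 2$ so that $k$ is simply the bit length of $N$ and every division by $b^k$ is a shift; the function name `barrett_mu_newton` is hypothetical:

```python
def barrett_mu_newton(n):
    """Compute floor(2**(2k) / n), k = bit length of n, without dividing by n."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()                 # k = floor(log2 n) + 1
    mu = 1 << k                        # step 1: initial estimate b**k
    while True:                        # steps 2-5: iterate until no increase
        s = mu
        mu = 2 * mu - ((((mu * mu) >> k) * n) >> k)
        if mu <= s:
            break
    t = (1 << (2 * k)) - n * mu        # step 6: signed remainder
    while t < 0:                       # steps 7-9: correct an overshoot
        mu -= 1
        t += n
    return mu                          # step 10


print(barrett_mu_newton(201))  # 326
```

For $N = 201$ the estimates run 256, 311, 326, 327, 327, and the final correction loop steps back to 326.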

In [11]:
%%writefile barret.cpp


#include <iostream>
#include "big.h"
using namespace std;
Miracl precision(400,10);

Big barret_setup (Big b)
{
  Big     x;
  x = pow((Big)2, 2*bits(b));
  x = x/b;
  return x;
}

Big barret_reduction (Big x, Big m, Big mu)
{
  Big  q, b, temp;
  int  k = bits(m);              // bit length of the modulus (radix 2)

  /* q1 = floor(x / 2^(k-1)) */
  q = x >> (k-1);
  /* q2 = q1 * mu */
  q = mu * q;
  /* q3 = floor(q2 / 2^(k+1)) */
  q = q >> (k+1);
  /* r1 = x mod 2^(k+1) */
  temp = pow((Big)2, k+1);
  x = x % temp;
  cout << "x="<< x << endl;

  /* r2 = q3 * m mod 2^(k+1) */
  q = q * m;
  q = q % temp;
  cout << "q=" << q << endl <<"m=" << m << endl;
  /* r = r1 - r2, wrapped back into range if negative */
  x = x - q;
  if (x < 0) {
    b = temp;
    x = x + b;
    cout << "b=" << b << endl;
  }
  /* final correction: x == m must also be reduced, hence >= */
  while (x >= m)  x = x - m; 
  cout << "x=" << x << endl;

  return x;
}
int  main()
{
    Big a,b,c,mu;
    /* set a and b; we want to compute a mod b.
       Note: Barrett reduction is specified for a < 2^(2*bits(b)) = 65536;
       this a is larger, but the quotient estimate is still close enough here. */
    a = 1601613;
    b = 201;
    /*  get  mu  value  */
    mu = barret_setup(b);
    /*  now  reduce a modulo b */
    c = barret_reduction (a, b, mu);
    cout << "number = " << c << endl;
    
    return  0;
}



In [12]:
%%bash 
g++ barret.cpp big.cpp miracl.a -o barret
./barret

x=77
q=142
m=201
b=512
x=45
number = 45


## Montgomery Multiplication

In [13]:
%%writefile Mont.cpp
#include <iostream>
#include "big.h"
#include "zzn.h"
#include <math.h>

using namespace std;

Miracl precision(400,10);

Big bitvar(Big a, int base, int inradix ){
   Big x, temp;
   int i;
   x = 0;
   temp = 1;
   for (i=0; i < inradix; i++){
        x = x + bit(a,base+i)*temp;
        temp = temp *2;     
    }
   return x;
}    
   

Big mont_setup (Big p, int radix)
{
  Big     x, temp;
  temp = pow((Big)2, radix);
  x = inverse(p, temp);   // p^-1 mod 2^radix
  x = (temp-x)%temp;      // -p^-1 mod 2^radix
  return x;
}

Big mont_full (Big a, Big b, Big p, Big p_p)
{
    Big c, q, temp;
    temp = pow((Big)2, bits(p));
    c = a*b;
    q = ((c % temp) * p_p) % temp;
    c = (c+q*p)/temp;
    if(c>=p)  c = c -p;
    return c;
}


Big mont_mul (Big a, Big b, Big p, Big p_p, int inradix)
{
  Big  q, temp, c;
  int i, iter;
  temp = 0;
  c = 0;
  temp = pow((Big)2, inradix);
  // number of inradix-bit digits needed to cover p, rounded up
  if(bits(p)%inradix==0)  iter = bits(p)/inradix;
  else      iter = (bits(p)/inradix)+1;

  for (i=0; i< iter; i++){
    // method 1
        q = ((bitvar(c,0,inradix)+bitvar(a,i*inradix,inradix)*bitvar(b,0,inradix))*p_p) % temp;
 //       for(int j = inradix-1; j>=0; j--)
 //           cout<<bit(c,j);
 //       cout << endl;
         c = c+bitvar(a,i*inradix,inradix)*b+q*p;
  //      cout << "c=" << c << endl;
  //      for(int j =inradix-1; j>=0; j--)
  //          cout<<bit(c,j);
  //      cout << endl;
        c = c /temp;
        
        /*
       // method 2
         c = c + bit(a,i)*b;
         q = inverse(bit(p,0), 2);
         q = (bit(c,0)*q)%2;
         
         //cout << "q=" << q << endl;
         c = c+q*p;
         c = c/2;
      */ 
      /*
      // method 3
         c = c + bit(a,i)*b;
         if(bit(c,0)==1){
            c = c + p;
        }
         c=c/2;
        */
  }
 /* Back off if it's too big */
  if (c >= p) {
    c = c - p;
  }
  //cout << "c=" << c << endl;

  return c;
}
  

int  main()
{
    Big a, b, c, p, mu, ina, inb, outc, temp, test;
    ZZn na, nb, nc;
    int radix, inradix;
    a = 155;
    b = 174;
    p = 201;
    temp = pow((Big)2, 2*bits(p));   // R^2 with R = 2^bits(p)
    temp = temp % p;                 // R^2 mod p, used to enter Montgomery form
        
    inradix = 0;
    radix = 16;// 2^4
    while (radix > 1){
        inradix = inradix+1;
        radix = radix/2;
    }
    /*  get  mu  value  */
    mu = mont_setup(bitvar(p,0,inradix), inradix);
    cout << "mu=" << mu <<endl;
        
    // using radix2
    ina = mont_mul(a, temp, p, mu, inradix);
    inb = mont_mul(b, temp, p, mu, inradix);
    cout << "ina=" << ina << endl;
    cout << "inb=" << inb << endl;
    outc =  mont_mul(ina, inb, p, mu, inradix);
    cout << "outc=" << outc << endl;
    /*  multiply by 1 to convert the result out of Montgomery form  */
    c = mont_mul (outc,(Big)1, p, mu, inradix);
    cout << "number = " << c << endl;
    
    // using full word, no radix   
    temp = pow((Big)2, 2*bits(p));   // R^2 with R = 2^bits(p)
      
    mu = mont_setup(p, bits(p));
    cout << "mu2=" << mu <<endl;
       
    temp = temp % p;                 // R^2 mod p
    ina = mont_full(a, temp, p, mu);
    inb = mont_full(b, temp, p, mu);
    outc = mont_full(ina, inb, p, mu);
    cout <<"ina=" << ina << endl;
    cout <<"inb=" << inb << endl;
    cout <<"outc=" << outc << endl;
    c = mont_full(outc, (Big)1, p, mu);
    cout << "number2 = " << c << endl;
      
    // Using built-in montgomery multiplication
    modulo(p);
    //prepare_monty((Big)201);
    na = a;
    nb = b;
    nc = a*b;
    c = nc;
    /*
    nres(a, ina);
    nres(b, inb);
    nres_modmult(ina, inb , outc);
    redc(outc,c);
    */
    temp= inverse(a, p);
    //temp((Big)na.getzzn());
    temp = 26*temp;
    cout << "ina=";
    otnum(na.getzzn(), stdout);
    cout << "inb=" ;
    otnum(nb.getzzn(), stdout);
    cout << "outc=";
    otnum(nc.getzzn(), stdout);
    cout << "number3=" <<  c << endl;    
    return  0;
}




In [18]:
%%bash
g++ Mont.cpp big.cpp zzn.cpp miracl.a -o Mont
./Mont

mu=7
ina=83
inb=123
outc=171
number = 36
mu2=135
ina=83
inb=123
outc=171
number2 = 36
ina=26
inb=24
outc=102
number3=36
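The full-word values printed above (mu2, ina, inb, outc, number2) can be cross-checked with a short Python sketch of the same reduction step used in `mont_full`. This is an independent illustration under the same parameters (p = 201, R = 2^8); the function names are hypothetical, and `pow(p, -1, r)` requires Python 3.8 or later.

```python
def mont_setup(p, r_bits):
    """Return -p^-1 mod 2**r_bits (p must be odd)."""
    r = 1 << r_bits
    return (-pow(p, -1, r)) % r


def mont_redc_mul(a, b, p, p_prime, r_bits):
    """Return a*b*R^-1 mod p with R = 2**r_bits, for a*b < R*p."""
    mask = (1 << r_bits) - 1
    c = a * b
    q = ((c & mask) * p_prime) & mask  # makes c + q*p divisible by R
    c = (c + q * p) >> r_bits          # exact division by R
    return c - p if c >= p else c      # single conditional subtraction


p = 201
r_bits = p.bit_length()                # 8, so R = 256
p_prime = mont_setup(p, r_bits)
r2 = (1 << (2 * r_bits)) % p           # R^2 mod p

ina = mont_redc_mul(155, r2, p, p_prime, r_bits)   # 155 in Montgomery form
inb = mont_redc_mul(174, r2, p, p_prime, r_bits)   # 174 in Montgomery form
outc = mont_redc_mul(ina, inb, p, p_prime, r_bits) # product, still in form
c = mont_redc_mul(outc, 1, p, p_prime, r_bits)     # back to a plain residue

print(p_prime, ina, inb, outc, c)      # 135 83 123 171 36
```

The built-in ZZn section prints different intermediate values (26, 24, 102) because MIRACL chooses its own R, but the final product 36 agrees.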


## Karatsuba Multiplication

In [50]:
%%writefile karatsuba.cpp

#include <algorithm>
#include <iostream>
#include "big.h"
using namespace std;
Miracl precision(400,10);


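// Operands at or below this bit length are multiplied directly.
// The 256-bit operands in main() are below it, so they take the base case.
const int CUTOFF = 1536;

Big karatsuba_mul (Big x, Big y)
{
  Big  mask, xlow, ylow, xhigh, yhigh, a, b, c, d;
  int n, half;
  // Base case
  if ( bits(x) <= CUTOFF || bits(y) <= CUTOFF)  return x * y ;
  else {
        n = max(bits(x), bits(y));
        half = (n + 32) / 64 * 32 ;   // about n/2, rounded to a multiple of 32 bits
        mask = pow((Big)2, half) - 1; // low `half` bits; (1 << half) would overflow an int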
        xlow = land(x, mask);
        ylow = land(y, mask);
        xhigh = x >> half;
        yhigh = y >> half;

        a = karatsuba_mul(xhigh, yhigh);
        b = karatsuba_mul(xlow + xhigh, ylow + yhigh);
        c = karatsuba_mul(xlow, ylow);
        d = b - a - c;
        return (((a << half) + d) << half) + c;
  }
  
}
int  main()
{
    Big a,b,c;
    miracl *mip=&precision;
    char aa[100]="87609798870979228866001198790ACDFFFFEEEE756868ABAABB564312345678";
    char bb[100]="AAAA1122907567841367868987091789876582228769815290789878AAA7DCAC";
    mip->IOBASE=16;
    a = aa;
    b = bb;
    c = karatsuba_mul (a, b);
    cout << "number = " << c << endl;  
    return  0;
}



In [51]:
%%bash
g++ karatsuba.cpp big.cpp miracl.a -o karatsuba
./karatsuba

number = 5A4013DFA63FA51DB98A302172759F4140FF7F8F0100F7611012289B5371283D537FCCC3202CD44AE56E00920E05CC4D0B31B29143BB076927F702854DC138A0


In [22]:
# Requires Python version >= 2.7 because of long.bit_length().
# Requirement: _CUTOFF >= 64, or else there will be infinite recursion.
_CUTOFF = 1536

def multiply(x, y):
    if x.bit_length() <= _CUTOFF or y.bit_length() <= _CUTOFF:  # Base case
        return x * y  
    else:
        n = max(x.bit_length(), y.bit_length())
        half = (n + 32) // 64 * 32
        mask = (1 << half) - 1
        xlow = x & mask
        ylow = y & mask
        xhigh = x >> half
        yhigh = y >> half

        a = multiply(xhigh, yhigh)
        b = multiply(xlow + xhigh, ylow + yhigh)
        c = multiply(xlow, ylow)
        d = b - a - c
        return (((a << half) + d) << half) + c

In [24]:
hex(multiply(0xFFFFFFFF, 0x11111111))

'0x11111110eeeeeeef'

In [25]:
hex(multiply(0x87609798870979228866001198790ACDFFFFEEEE756868ABAABB564312345678, 0xAAAA1122907567841367868987091789876582228769815290789878AAA7DCAC))

'0x5a4013dfa63fa51db98a302172759f4140ff7f8f0100f7611012289b5371283d537fccc3202cd44ae56e00920e05cc4d0b31b29143bb076927f702854dc138a0'
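For reference, the identity that both `karatsuba_mul` and `multiply` rely on can be written out as follows, with $h$ standing for `half`. The symbols $x_h$, $x_l$, $y_h$, $y_l$ are shorthand introduced here for `xhigh`, `xlow`, `yhigh` and `ylow`.

$$x = x_h 2^h + x_l, \qquad y = y_h 2^h + y_l$$

$$a = x_h y_h, \qquad c = x_l y_l, \qquad b = (x_l + x_h)(y_l + y_h)$$

$$d = b - a - c = x_h y_l + x_l y_h$$

$$x y = a \cdot 2^{2h} + d \cdot 2^{h} + c = \big((a \cdot 2^{h} + d) \cdot 2^{h}\big) + c$$

Only the three products $a$, $b$ and $c$ require a recursive multiplication; $d$ costs two subtractions, and the powers of two are shifts.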