29 bn.tex
@@ -49,7 +49,7 @@
 \begin{document}
 \frontmatter
 \pagestyle{empty}
-\title{LibTomMath User Manual \\ v0.33}
+\title{LibTomMath User Manual \\ v0.34}
 \author{Tom St Denis \\ tomstdenis@iahu.ca}
 \maketitle
 This text, the library and the accompanying textbook are all hereby placed in the public domain. This book has been
@@ -263,12 +263,12 @@ \section{Purpose of LibTomMath}
 \begin{center}
 \begin{tabular}{|l|c|c|l|}
 \hline \textbf{Criteria} & \textbf{Pro} & \textbf{Con} & \textbf{Notes} \\
-\hline Few lines of code per file & X & & GnuPG $= 300.9$, LibTomMath $= 76.04$ \\
+\hline Few lines of code per file & X & & GnuPG $= 300.9$, LibTomMath $= 71.97$ \\
 \hline Commented function prototypes & X && GnuPG function names are cryptic. \\
 \hline Speed && X & LibTomMath is slower. \\
 \hline Totally free & X & & GPL has unfavourable restrictions.\\
 \hline Large function base & X & & GnuPG is barebones. \\
-\hline Four modular reduction algorithms & X & & Faster modular exponentiation. \\
+\hline Five modular reduction algorithms & X & & Faster modular exponentiation for a variety of moduli. \\
 \hline Portable & X & & GnuPG requires configuration to build. \\
 \hline
 \end{tabular}
@@ -284,9 +284,12 @@ \section{Purpose of LibTomMath}
 So it may feel tempting to just rip the math code out of GnuPG (or GnuMP where it was taken from originally) in your own application but I think there are reasons not to. While LibTomMath is slower than libraries such as GnuMP it is not normally significantly slower. On x86 machines the difference is normally a factor of two when performing modular
-exponentiations.
+exponentiations. It depends largely on the processor, compiler and the moduli being used.
-Essentially the only time you wouldn't use LibTomMath is when blazing speed is the primary concern.
+Essentially the only time you wouldn't use LibTomMath is when blazing speed is the primary concern. However,
+on the other side of the coin LibTomMath offers you a totally free (public domain) well structured math library
+that is very flexible, complete and performs well in resource constrained environments. Fast RSA for example can
+be performed with as little as 8KB of ram for data (again depending on build options).
 \chapter{Getting Started with LibTomMath}
 \section{Building Programs}
@@ -809,7 +812,7 @@ \subsection{Unsigned comparison}
 \index{mp\_cmp\_mag}
 \begin{alltt}
-int mp_cmp(mp_int * a, mp_int * b);
+int mp_cmp_mag(mp_int * a, mp_int * b);
 \end{alltt}
 This will compare $a$ to $b$ placing $a$ to the left of $b$. This function cannot fail and will return one of the three compare codes listed in figure \ref{fig:CMP}.
@@ -1220,12 +1223,13 @@ \section{Squaring}
 \end{alltt}
 Will square $a$ and store it in $b$. Like the case of multiplication there are four different squaring
-algorithms all which can be called from mp\_sqr(). It is ideal to use mp\_sqr over mp\_mul when squaring terms.
+algorithms all which can be called from mp\_sqr(). It is ideal to use mp\_sqr over mp\_mul when squaring terms because
+of the speed difference.
 \section{Tuning Polynomial Basis Routines}
 Both of the Toom-Cook and Karatsuba multiplication algorithms are faster than the traditional $O(n^2)$ approach that
-the Comba and baseline algorithms use. At $O(n^{1.464973})$ and $O(n^{1.584962})$ running times respectfully they require
+the Comba and baseline algorithms use. At $O(n^{1.464973})$ and $O(n^{1.584962})$ running times respectively they require
 considerably less work. For example, a 10000-digit multiplication would take roughly 724,000 single precision multiplications with Toom-Cook or 100,000,000 single precision multiplications with the standard Comba (a factor of 138).
@@ -1297,22 +1301,22 @@ \section{Straight Division}
 \section{Barrett Reduction}
 Barrett reduction is a generic optimized reduction algorithm that requires pre--computation to achieve
-a decent speedup over straight division. First a $mu$ value must be precomputed with the following function.
+a decent speedup over straight division. First a $\mu$ value must be precomputed with the following function.
 \index{mp\_reduce\_setup}
 \begin{alltt}
 int mp_reduce_setup(mp_int *a, mp_int *b);
 \end{alltt}
-Given a modulus in $b$ this produces the required $mu$ value in $a$. For any given modulus this only has to
+Given a modulus in $b$ this produces the required $\mu$ value in $a$. For any given modulus this only has to
 be computed once. Modular reduction can now be performed with the following.
 \index{mp\_reduce}
 \begin{alltt}
 int mp_reduce(mp_int *a, mp_int *b, mp_int *c);
 \end{alltt}
-This will reduce $a$ in place modulo $b$ with the precomputed $mu$ value in $c$. $a$ must be in the range
+This will reduce $a$ in place modulo $b$ with the precomputed $\mu$ value in $c$. $a$ must be in the range
 $0 \le a < b^2$.
 \begin{alltt}
@@ -1578,7 +1582,8 @@ \section{Root Finding}
 This algorithm uses the ``Newton Approximation'' method and will converge on the correct root fairly quickly. Since the algorithm requires raising $a$ to the power of $b$ it is not ideal to attempt to find roots for large values of $b$. If particularly large roots are required then a factor method could be used instead. For example,
-$a^{1/16}$ is equivalent to $\left (a^{1/4} \right)^{1/4}$.
+$a^{1/16}$ is equivalent to $\left (a^{1/4} \right)^{1/4}$ or simply
+$\left ( \left ( \left ( a^{1/2} \right )^{1/2} \right )^{1/2} \right )^{1/2}$
 \chapter{Prime Numbers}
 \section{Trial Division}
3 bn_fast_mp_invmod.c
@@ -21,8 +21,7 @@
  * Based on slow invmod except this is optimized for the case where b is
  * odd as per HAC Note 14.64 on pp. 610
  */
-int
-fast_mp_invmod (mp_int * a, mp_int * b, mp_int * c)
+int fast_mp_invmod (mp_int * a, mp_int * b, mp_int * c)
 {
   mp_int x, y, u, v, B, D;
   int res, neg;
3 bn_fast_mp_montgomery_reduce.c
@@ -23,8 +23,7 @@
  *
  * Based on Algorithm 14.32 on pp.601 of HAC.
  */
-int
-fast_mp_montgomery_reduce (mp_int * x, mp_int * n, mp_digit rho)
+int fast_mp_montgomery_reduce (mp_int * x, mp_int * n, mp_digit rho)
 {
   int ix, res, olduse;
   mp_word W[MP_WARRAY];
5 bn_fast_s_mp_mul_digs.c
@@ -31,8 +31,7 @@
  * Based on Algorithm 14.12 on pp.595 of HAC.
  *
  */
-int
-fast_s_mp_mul_digs (mp_int * a, mp_int * b, mp_int * c, int digs)
+int fast_s_mp_mul_digs (mp_int * a, mp_int * b, mp_int * c, int digs)
 {
   int olduse, res, pa, ix, iz;
   mp_digit W[MP_WARRAY];
@@ -81,7 +80,7 @@ fast_s_mp_mul_digs (mp_int * a, mp_int * b, mp_int * c, int digs)
   }
   /* store final carry */
-  W[ix] = _W;
+  W[ix] = _W & MP_MASK;
   /* setup dest */
   olduse = c->used;
5 bn_fast_s_mp_mul_high_digs.c
@@ -24,8 +24,7 @@
  *
  * Based on Algorithm 14.12 on pp.595 of HAC.
  */
-int
-fast_s_mp_mul_high_digs (mp_int * a, mp_int * b, mp_int * c, int digs)
+int fast_s_mp_mul_high_digs (mp_int * a, mp_int * b, mp_int * c, int digs)
 {
   int olduse, res, pa, ix, iz;
   mp_digit W[MP_WARRAY];
@@ -72,7 +71,7 @@ fast_s_mp_mul_high_digs (mp_int * a, mp_int * b, mp_int * c, int digs)
   }
   /* store final carry */
-  W[ix] = _W;
+  W[ix] = _W & MP_MASK;
   /* setup dest */
   olduse = c->used;
2 bn_fast_s_mp_sqr.c
@@ -101,7 +101,7 @@ int fast_s_mp_sqr (mp_int * a, mp_int * b)
       }
       /* store it */
-      W[ix] = _W;
+      W[ix] = _W & MP_MASK;
       /* make next carry */
       W1 = _W >> ((mp_word)DIGIT_BIT);
14 bn_mp_exptmod.c
@@ -65,29 +65,37 @@ int mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
 #endif
   }
+/* modified diminished radix reduction */
+#if defined(BN_MP_REDUCE_IS_2K_L_C) && defined(BN_MP_REDUCE_2K_L_C)
+  if (mp_reduce_is_2k_l(P) == MP_YES) {
+     return s_mp_exptmod(G, X, P, Y, 1);
+  }
+#endif
+
 #ifdef BN_MP_DR_IS_MODULUS_C
   /* is it a DR modulus? */
   dr = mp_dr_is_modulus(P);
 #else
+  /* default to no */
   dr = 0;
 #endif
 #ifdef BN_MP_REDUCE_IS_2K_C
-  /* if not, is it a uDR modulus? */
+  /* if not, is it an unrestricted DR modulus? */
   if (dr == 0) {
      dr = mp_reduce_is_2k(P) << 1;
   }
 #endif
-  /* if the modulus is odd or dr != 0 use the fast method */
+  /* if the modulus is odd or dr != 0 use the montgomery method */
 #ifdef BN_MP_EXPTMOD_FAST_C
   if (mp_isodd (P) == 1 || dr != 0) {
     return mp_exptmod_fast (G, X, P, Y, dr);
   } else {
 #endif
 #ifdef BN_S_MP_EXPTMOD_C
     /* otherwise use the generic Barrett reduction technique */
-    return s_mp_exptmod (G, X, P, Y);
+    return s_mp_exptmod (G, X, P, Y, 0);
 #else
     /* no exptmod for evens */
     return MP_VAL;
3 bn_mp_exptmod_fast.c
@@ -29,8 +29,7 @@
   #define TAB_SIZE 256
 #endif
-int
-mp_exptmod_fast (mp_int * G, mp_int * X, mp_int * P, mp_int * Y, int redmode)
+int mp_exptmod_fast (mp_int * G, mp_int * X, mp_int * P, mp_int * Y, int redmode)
 {
   mp_int M[TAB_SIZE], res;
   mp_digit buf, mp;
3 bn_mp_mul_d.c
@@ -57,8 +57,9 @@ mp_mul_d (mp_int * a, mp_digit b, mp_int * c)
     u = (mp_digit) (r >> ((mp_word) DIGIT_BIT));
   }
-  /* store final carry [if any] */
+  /* store final carry [if any] and increment ix offset */
   *tmpc++ = u;
+  ++ix;
   /* now zero digits above the top */
   while (ix++ < olduse) {
4 bn_mp_prime_random_ex.c
@@ -60,15 +60,15 @@ int mp_prime_random_ex(mp_int *a, int t, int size, int flags, ltm_prime_callback
    /* calc the maskOR_msb */
    maskOR_msb = 0;
-   maskOR_msb_offset = (size - 2) >> 3;
+   maskOR_msb_offset = ((size & 7) == 1) ? 1 : 0;
    if (flags & LTM_PRIME_2MSB_ON) {
       maskOR_msb |= 1 << ((size - 2) & 7);
    } else if (flags & LTM_PRIME_2MSB_OFF) {
       maskAND &= ~(1 << ((size - 2) & 7));
    }
    /* get the maskOR_lsb */
-   maskOR_lsb = 0;
+   maskOR_lsb = 1;
    if (flags & LTM_PRIME_BBS) {
       maskOR_lsb |= 3;
    }
3 bn_mp_reduce.c
@@ -19,8 +19,7 @@
  * precomputed via mp_reduce_setup.
  * From HAC pp.604 Algorithm 14.42
  */
-int
-mp_reduce (mp_int * x, mp_int * m, mp_int * mu)
+int mp_reduce (mp_int * x, mp_int * m, mp_int * mu)
 {
   mp_int q;
   int res, um = m->used;
3 bn_mp_reduce_2k.c
@@ -16,8 +16,7 @@
  */
 /* reduces a modulo n where n is of the form 2**p - d */
-int
-mp_reduce_2k(mp_int *a, mp_int *n, mp_digit d)
+int mp_reduce_2k(mp_int *a, mp_int *n, mp_digit d)
 {
    mp_int q;
    int p, res;
58 bn_mp_reduce_2k_l.c
@@ -0,0 +1,58 @@
+#include <tommath.h>
+#ifdef BN_MP_REDUCE_2K_L_C
+/* LibTomMath, multiple-precision integer library -- Tom St Denis
+ *
+ * LibTomMath is a library that provides multiple-precision
+ * integer arithmetic as well as number theoretic functionality.
+ *
+ * The library was designed directly after the MPI library by
+ * Michael Fromberger but has been written from scratch with
+ * additional optimizations in place.
+ *
+ * The library is free for all purposes without any express
+ * guarantee it works.
+ *
+ * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
+ */
+
+/* reduces a modulo n where n is of the form 2**p - d
+   This differs from reduce_2k since "d" can be larger
+   than a single digit.
+*/
+int mp_reduce_2k_l(mp_int *a, mp_int *n, mp_int *d)
+{
+   mp_int q;
+   int p, res;
+
+   if ((res = mp_init(&q)) != MP_OKAY) {
+      return res;
+   }
+
+   p = mp_count_bits(n);
+top:
+   /* q = a/2**p, a = a mod 2**p */
+   if ((res = mp_div_2d(a, p, &q, a)) != MP_OKAY) {
+      goto ERR;
+   }
+
+   /* q = q * d */
+   if ((res = mp_mul(&q, d, &q)) != MP_OKAY) {
+      goto ERR;
+   }
+
+   /* a = a + q */
+   if ((res = s_mp_add(a, &q, a)) != MP_OKAY) {
+      goto ERR;
+   }
+
+   if (mp_cmp_mag(a, n) != MP_LT) {
+      s_mp_sub(a, n, a);
+      goto top;
+   }
+
+ERR:
+   mp_clear(&q);
+   return res;
+}
+
+#endif
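Reviewer note: the folding step in mp_reduce_2k_l() above relies on 2**p being congruent to d modulo n = 2**p - d. The following is a minimal single-word sketch of that idea, not the library's code. The name `reduce_2k_sketch` and the use of `uint64_t` in place of `mp_int` are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-word analogue of mp_reduce_2k_l(): reduce a modulo
 * n = 2**p - d. Assumes 0 < p < 32 and 0 < d < 2**p so that every
 * intermediate value fits in 64 bits. */
uint64_t reduce_2k_sketch(uint64_t a, unsigned p, uint64_t d)
{
    assert(p > 0 && p < 32 && d > 0 && d < ((uint64_t)1 << p));
    const uint64_t n = ((uint64_t)1 << p) - d;
    const uint64_t mask = ((uint64_t)1 << p) - 1;

    for (;;) {
        uint64_t q = a >> p;   /* q = a / 2**p */
        a &= mask;             /* a = a mod 2**p */
        a += q * d;            /* fold the high part back in: 2**p == d (mod n) */
        if (a < n) {
            return a;
        }
        a -= n;                /* subtract once, then go around again */
    }
}
```

For example, with p = 8 and d = 5 (so n = 251), reducing 1000 splits it into q = 3 and a = 232, and 232 + 3 * 5 = 247 is already below n.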
3 bn_mp_reduce_2k_setup.c
@@ -16,8 +16,7 @@
  */
 /* determines the setup value */
-int
-mp_reduce_2k_setup(mp_int *a, mp_digit *d)
+int mp_reduce_2k_setup(mp_int *a, mp_digit *d)
 {
    int res, p;
    mp_int tmp;
40 bn_mp_reduce_2k_setup_l.c
@@ -0,0 +1,40 @@
+#include <tommath.h>
+#ifdef BN_MP_REDUCE_2K_SETUP_L_C
+/* LibTomMath, multiple-precision integer library -- Tom St Denis
+ *
+ * LibTomMath is a library that provides multiple-precision
+ * integer arithmetic as well as number theoretic functionality.
+ *
+ * The library was designed directly after the MPI library by
+ * Michael Fromberger but has been written from scratch with
+ * additional optimizations in place.
+ *
+ * The library is free for all purposes without any express
+ * guarantee it works.
+ *
+ * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
+ */
+
+/* determines the setup value */
+int mp_reduce_2k_setup_l(mp_int *a, mp_int *d)
+{
+   int res;
+   mp_int tmp;
+
+   if ((res = mp_init(&tmp)) != MP_OKAY) {
+      return res;
+   }
+
+   if ((res = mp_2expt(&tmp, mp_count_bits(a))) != MP_OKAY) {
+      goto ERR;
+   }
+
+   if ((res = s_mp_sub(&tmp, a, d)) != MP_OKAY) {
+      goto ERR;
+   }
+
+ERR:
+   mp_clear(&tmp);
+   return res;
+}
+#endif
8 bn_mp_reduce_is_2k.c
@@ -22,9 +22,9 @@ int mp_reduce_is_2k(mp_int *a)
    mp_digit iz;
    if (a->used == 0) {
-      return 0;
+      return MP_NO;
    } else if (a->used == 1) {
-      return 1;
+      return MP_YES;
    } else if (a->used > 1) {
       iy = mp_count_bits(a);
       iz = 1;
@@ -33,7 +33,7 @@ int mp_reduce_is_2k(mp_int *a)
       /* Test every bit from the second digit up, must be 1 */
       for (ix = DIGIT_BIT; ix < iy; ix++) {
          if ((a->dp[iw] & iz) == 0) {
-            return 0;
+            return MP_NO;
          }
          iz <<= 1;
          if (iz > (mp_digit)MP_MASK) {
@@ -42,7 +42,7 @@ int mp_reduce_is_2k(mp_int *a)
          }
       }
    }
-   return 1;
+   return MP_YES;
 }
 #endif
40 bn_mp_reduce_is_2k_l.c
@@ -0,0 +1,40 @@
+#include <tommath.h>
+#ifdef BN_MP_REDUCE_IS_2K_L_C
+/* LibTomMath, multiple-precision integer library -- Tom St Denis
+ *
+ * LibTomMath is a library that provides multiple-precision
+ * integer arithmetic as well as number theoretic functionality.
+ *
+ * The library was designed directly after the MPI library by
+ * Michael Fromberger but has been written from scratch with
+ * additional optimizations in place.
+ *
+ * The library is free for all purposes without any express
+ * guarantee it works.
+ *
+ * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
+ */
+
+/* determines if reduce_2k_l can be used */
+int mp_reduce_is_2k_l(mp_int *a)
+{
+   int ix, iy;
+
+   if (a->used == 0) {
+      return MP_NO;
+   } else if (a->used == 1) {
+      return MP_YES;
+   } else if (a->used > 1) {
+      /* if more than half of the digits are -1 we're sold */
+      for (iy = ix = 0; ix < a->used; ix++) {
+          if (a->dp[ix] == MP_MASK) {
+              ++iy;
+          }
+      }
+      return (iy >= (a->used/2)) ? MP_YES : MP_NO;
+
+   }
+   return MP_NO;
+}
+
+#endif
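Reviewer note: the comment in mp_reduce_is_2k_l() says "more than half", while the test is `iy >= (a->used/2)` with integer division, so a three-digit modulus qualifies with a single all-ones digit. A rough standalone sketch of the same counting rule follows. The 28-bit digit width and the names `SKETCH_DIGIT_BIT`, `SKETCH_MASK` and `is_2k_l_sketch` are assumptions for illustration only, and the real routine works on `mp_int`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed digit layout for this sketch: 28 significant bits per digit. */
#define SKETCH_DIGIT_BIT 28
#define SKETCH_MASK ((((uint32_t)1) << SKETCH_DIGIT_BIT) - 1)

/* Returns 1 when at least used/2 digits (integer division) are all ones,
 * mirroring the decision rule in the patch; returns 0 otherwise. */
int is_2k_l_sketch(const uint32_t *dp, int used)
{
    int ix, full = 0;

    assert(used >= 0);
    if (used == 0) {
        return 0;
    }
    if (used == 1) {
        return 1;
    }
    for (ix = 0; ix < used; ix++) {
        if (dp[ix] == SKETCH_MASK) {
            ++full;
        }
    }
    return full >= used / 2;
}
```

Whether the threshold should be strict is a question for the author. The sketch only reproduces the behaviour as written.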
3 bn_mp_to_signed_bin.c
@@ -16,8 +16,7 @@
  */
 /* store in signed [big endian] format */
-int
-mp_to_signed_bin (mp_int * a, unsigned char *b)
+int mp_to_signed_bin (mp_int * a, unsigned char *b)
 {
   int res;
27 bn_mp_to_signed_bin_n.c
@@ -0,0 +1,27 @@
+#include <tommath.h>
+#ifdef BN_MP_TO_SIGNED_BIN_N_C
+/* LibTomMath, multiple-precision integer library -- Tom St Denis
+ *
+ * LibTomMath is a library that provides multiple-precision
+ * integer arithmetic as well as number theoretic functionality.
+ *
+ * The library was designed directly after the MPI library by
+ * Michael Fromberger but has been written from scratch with
+ * additional optimizations in place.
+ *
+ * The library is free for all purposes without any express
+ * guarantee it works.
+ *
+ * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
+ */
+
+/* store in signed [big endian] format */
+int mp_to_signed_bin_n (mp_int * a, unsigned char *b, unsigned long *outlen)
+{
+   if (*outlen < (unsigned long)mp_signed_bin_size(a)) {
+      return MP_VAL;
+   }
+   *outlen = mp_signed_bin_size(a);
+   return mp_to_signed_bin(a, b);
+}
+#endif
3 bn_mp_to_unsigned_bin.c
@@ -16,8 +16,7 @@
  */
 /* store in unsigned [big endian] format */
-int
-mp_to_unsigned_bin (mp_int * a, unsigned char *b)
+int mp_to_unsigned_bin (mp_int * a, unsigned char *b)
 {
   int x, res;
   mp_int t;
27 bn_mp_to_unsigned_bin_n.c
@@ -0,0 +1,27 @@
+#include <tommath.h>
+#ifdef BN_MP_TO_UNSIGNED_BIN_N_C
+/* LibTomMath, multiple-precision integer library -- Tom St Denis
+ *
+ * LibTomMath is a library that provides multiple-precision
+ * integer arithmetic as well as number theoretic functionality.
+ *
+ * The library was designed directly after the MPI library by
+ * Michael Fromberger but has been written from scratch with
+ * additional optimizations in place.
+ *
+ * The library is free for all purposes without any express
+ * guarantee it works.
+ *
+ * Tom St Denis, tomstdenis@iahu.ca, http://math.libtomcrypt.org
+ */
+
+/* store in unsigned [big endian] format */
+int mp_to_unsigned_bin_n (mp_int * a, unsigned char *b, unsigned long *outlen)
+{
+   if (*outlen < (unsigned long)mp_unsigned_bin_size(a)) {
+      return MP_VAL;
+   }
+   *outlen = mp_unsigned_bin_size(a);
+   return mp_to_unsigned_bin(a, b);
+}
+#endif
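Reviewer note: both `_n` variants use `outlen` as an in/out parameter. The caller passes the buffer capacity and, on success, gets back the number of bytes written. The sketch below shows that contract on a plain 64-bit value. The names `u64_bin_size` and `u64_to_unsigned_bin_n` and the return codes 0 and -1 are stand-ins invented for this example and are not part of LibTomMath.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for mp_unsigned_bin_size(): bytes needed to hold v. */
unsigned long u64_bin_size(uint64_t v)
{
    unsigned long n = 0;
    while (v != 0) {
        ++n;
        v >>= 8;
    }
    return n;
}

/* Same contract as the patch: fail when the buffer is too small, otherwise
 * report the size through *outlen and store the value big endian. */
int u64_to_unsigned_bin_n(uint64_t v, unsigned char *b, unsigned long *outlen)
{
    unsigned long size = u64_bin_size(v);
    unsigned long i;

    assert(b != 0 && outlen != 0);
    if (*outlen < size) {
        return -1;                 /* plays the role of MP_VAL */
    }
    *outlen = size;
    for (i = 0; i < size; i++) {
        b[size - 1 - i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
    return 0;                      /* plays the role of MP_OKAY */
}
```

Note that on failure `*outlen` is left untouched, as in the patch, so a caller cannot use the failed call to learn the required size.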
3 bn_mp_unsigned_bin_size.c
@@ -16,8 +16,7 @@
  */
 /* get the size for an unsigned equivalent */
-int
-mp_unsigned_bin_size (mp_int * a)
+int mp_unsigned_bin_size (mp_int * a)
 {
   int size = mp_count_bits (a);
   return (size / 8 + ((size & 7) != 0 ? 1 : 0));
35 bn_s_mp_exptmod.c
@@ -21,11 +21,12 @@
   #define TAB_SIZE 256
 #endif
-int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
+int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y, int redmode)
 {
   mp_int M[TAB_SIZE], res, mu;
   mp_digit buf;
   int err, bitbuf, bitcpy, bitcnt, mode, digidx, x, y, winsize;
+  int (*redux)(mp_int*,mp_int*,mp_int*);
   /* find window size */
   x = mp_count_bits (X);
@@ -72,9 +73,18 @@ int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
   if ((err = mp_init (&mu)) != MP_OKAY) {
     goto LBL_M;
   }
-  if ((err = mp_reduce_setup (&mu, P)) != MP_OKAY) {
-    goto LBL_MU;
-  }
+
+  if (redmode == 0) {
+     if ((err = mp_reduce_setup (&mu, P)) != MP_OKAY) {
+        goto LBL_MU;
+     }
+     redux = mp_reduce;
+  } else {
+     if ((err = mp_reduce_2k_setup_l (P, &mu)) != MP_OKAY) {
+        goto LBL_MU;
+     }
+     redux = mp_reduce_2k_l;
+  }
   /* create M table
    *
@@ -96,11 +106,14 @@ int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
   }
   for (x = 0; x < (winsize - 1); x++) {
+    /* square it */
     if ((err = mp_sqr (&M[1 << (winsize - 1)], &M[1 << (winsize - 1)])) != MP_OKAY) {
       goto LBL_MU;
     }
-    if ((err = mp_reduce (&M[1 << (winsize - 1)], P, &mu)) != MP_OKAY) {
+
+    /* reduce modulo P */
+    if ((err = redux (&M[1 << (winsize - 1)], P, &mu)) != MP_OKAY) {
       goto LBL_MU;
     }
   }
@@ -112,7 +125,7 @@ int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
     if ((err = mp_mul (&M[x - 1], &M[1], &M[x])) != MP_OKAY) {
       goto LBL_MU;
     }
-    if ((err = mp_reduce (&M[x], P, &mu)) != MP_OKAY) {
+    if ((err = redux (&M[x], P, &mu)) != MP_OKAY) {
       goto LBL_MU;
     }
   }
@@ -161,7 +174,7 @@ int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
       if ((err = mp_sqr (&res, &res)) != MP_OKAY) {
         goto LBL_RES;
       }
-      if ((err = mp_reduce (&res, P, &mu)) != MP_OKAY) {
+      if ((err = redux (&res, P, &mu)) != MP_OKAY) {
         goto LBL_RES;
       }
       continue;
@@ -178,7 +191,7 @@ int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
         if ((err = mp_sqr (&res, &res)) != MP_OKAY) {
           goto LBL_RES;
         }
-        if ((err = mp_reduce (&res, P, &mu)) != MP_OKAY) {
+        if ((err = redux (&res, P, &mu)) != MP_OKAY) {
           goto LBL_RES;
         }
       }
@@ -187,7 +200,7 @@ int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
       if ((err = mp_mul (&res, &M[bitbuf], &res)) != MP_OKAY) {
         goto LBL_RES;
       }
-      if ((err = mp_reduce (&res, P, &mu)) != MP_OKAY) {
+      if ((err = redux (&res, P, &mu)) != MP_OKAY) {
         goto LBL_RES;
       }
@@ -205,7 +218,7 @@ int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
       if ((err = mp_sqr (&res, &res)) != MP_OKAY) {
         goto LBL_RES;
       }
-      if ((err = mp_reduce (&res, P, &mu)) != MP_OKAY) {
+      if ((err = redux (&res, P, &mu)) != MP_OKAY) {
         goto LBL_RES;
       }
@@ -215,7 +228,7 @@ int s_mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
         if ((err = mp_mul (&res, &M[1], &res)) != MP_OKAY) {
           goto LBL_RES;
         }
-        if ((err = mp_reduce (&res, P, &mu)) != MP_OKAY) {
+        if ((err = redux (&res, P, &mu)) != MP_OKAY) {
           goto LBL_RES;
         }
       }
     }
5 bncore.c
@@ -20,11 +20,12 @@
  CPU /Compiler /MUL CUTOFF/SQR CUTOFF
 -------------------------------------------------------------
  Intel P4 Northwood /GCC v3.4.1 / 88/ 128/LTM 0.32 ;-)
+ AMD Athlon64 /GCC v3.4.4 / 74/ 124/LTM 0.34
 */
-int KARATSUBA_MUL_CUTOFF = 88, /* Min. number of digits before Karatsuba multiplication is used. */
-    KARATSUBA_SQR_CUTOFF = 128, /* Min. number of digits before Karatsuba squaring is used. */
+int KARATSUBA_MUL_CUTOFF = 74, /* Min. number of digits before Karatsuba multiplication is used. */
+    KARATSUBA_SQR_CUTOFF = 124, /* Min. number of digits before Karatsuba squaring is used. */
     TOOM_MUL_CUTOFF = 350, /* no optimal values of these are known yet so set em high */
     TOOM_SQR_CUTOFF = 400;
12 changes.txt
@@ -1,3 +1,15 @@
+February 12th, 2005
+v0.34  -- Fixed two more small errors in mp_prime_random_ex()
+       -- Fixed overflow in mp_mul_d() [Kevin Kenny]
+       -- Added mp_to_(un)signed_bin_n() functions which do bounds checking for ya [and report the size]
+       -- Added "large" diminished radix support. Speeds up things like DSA where the modulus is of the form 2^k - P for some P < 2^(k/2) or so
+          Actually is faster than Montgomery on my AMD64 (and probably much faster on a P4)
+       -- Updated the manual a bit
+       -- Ok so I haven't done the textbook work yet... My current freelance gig has landed me in France till the
+          end of Feb/05. Once I get back I'll have tons of free time and I plan to go to town on the book.
+          As of this release the API will freeze. At least until the book catches up with all the changes. I welcome
+          bug reports but new algorithms will have to wait.
+
 December 23rd, 2004
 v0.33  -- Fixed "small" variant for mp_div() which would munge with negative dividends...
        -- Fixed bug in mp_prime_random_ex() which would set the most significant byte to zero when
