GitHub - sethtroisi/gmp-ecm: Imported from svn://scm.gforge.inria.fr/svnroot/ecm/

sethtroisi / gmp-ecm Public
Notifications You must be signed in to change notification settings
Fork 8
Star 21
Imported from svn://scm.gforge.inria.fr/svnroot/ecm/
ecm.gforge.inria.fr/
GPL-3.0, LGPL-3.0 licenses found
Licenses found

21 stars 8 forks Branches Tags Activity
Star
Notifications
Branches Tags
Name		Name	Last commit message	Last commit date
Latest commit History 2,930 Commits
aprtcle		aprtcle
arm		arm
build.vc12		build.vc12
build.vc14		build.vc14
build.vc15		build.vc15
build.vs		build.vs
generic		generic
hecm		hecm
hppa		hppa
ia64		ia64
m4		m4
mips		mips
nodist		nodist
powerpc32		powerpc32
powerpc64		powerpc64
sparc64		sparc64
techdocs		techdocs
x86		x86
x86_64		x86_64
.gitignore		.gitignore
.gitlab-ci.yml		.gitlab-ci.yml
.mailmap		.mailmap
AUTHORS		AUTHORS
COPYING		COPYING
COPYING.LIB		COPYING.LIB
ChangeLog		ChangeLog
Fgw.c		Fgw.c
INSTALL-ecm		INSTALL-ecm
M877.save		M877.save
M997.save		M997.save
Makefile.am		Makefile.am
NEWS		NEWS
README		README
README.dev		README.dev
README.dev.asm		README.dev.asm
README.gpu		README.gpu
README.lib		README.lib
TODO		TODO
TODO.gpu		TODO.gpu
TODO.sp		TODO.sp
Z2102.n		Z2102.n
acinclude.m4		acinclude.m4
addlaws.c		addlaws.c
addlaws.h		addlaws.h
auxarith.c		auxarith.c
auxi.c		auxi.c
auxlib.c		auxlib.c
b1_ainc.c		b1_ainc.c
basicdefs.h		basicdefs.h
batch.c		batch.c
bench_mulredc.c		bench_mulredc.c
bestd.c		bestd.c
c155		c155
c270		c270
candi.c		candi.c
cgbn_stage1.cu		cgbn_stage1.cu
cgbn_stage1.h		cgbn_stage1.h
champions.h		champions.h
check.mpl		check.mpl
check.pari		check.pari
check.sage		check.sage
check_gpuecm.sage		check_gpuecm.sage
checkprob		checkprob
configure.ac		configure.ac
coverage.sh		coverage.sh
coverage_gpu.sh		coverage_gpu.sh
cudacommon.cu		cudacommon.cu
cudacommon.h		cudacommon.h
cudawrapper.c		cudawrapper.c
dummy.save		dummy.save
dummy2.save		dummy2.save
ecm-ecm.h		ecm-ecm.h
ecm-gmp.h		ecm-gmp.h
ecm-gpu.h		ecm-gpu.h
ecm-impl.h		ecm-impl.h
ecm-params.h		ecm-params.h
ecm.1		ecm.1
ecm.c		ecm.c
ecm.h.in		ecm.h.in
ecm.xml		ecm.xml
ecm2.c		ecm2.c
ecm_int.h		ecm_int.h
ecm_ntt.c		ecm_ntt.c
ecmbench		ecmbench
ecmfactor.c		ecmfactor.c
ecmprob.magma		ecmprob.magma
eval.c		eval.c
factor.c		factor.c
getprime_r.c		getprime_r.c
getprime_r.h		getprime_r.h
gpu_throughput_test.sh		gpu_throughput_test.sh
howto-gitlab-making-releases.txt		howto-gitlab-making-releases.txt
ks-multiply.c		ks-multiply.c
listz.c		listz.c
listz_handle.c		listz_handle.c
listz_handle.h		listz_handle.h
longlong.h		longlong.h
lucas.c		lucas.c
main.c		main.c
makesmooth.gp		makesmooth.gp
Repository files navigation

[note: this file is outdated now that GMP-ECM moved to gitlab.]

This is the README file for GMP-ECM. (See INSTALL-ecm for installing GMP-ECM
and the ecm library, and README.lib for using the ecm library.)

Table of contents of this file:
1. Basic usage.
2. How to use P-1, P+1, and ECM efficiently?
3. Extra factors and Brent-Suyama's extension.
4. Memory usage.
5. Expression syntax reference for GMP-ECM's syntax parser.
6. Options -param, -sigma, -x0, -y0, -A, -torsion.
7. Options -save, -resume and -chkpnt.
8. How to get the best of GMP-ECM?
9. GMP-ECM and GPU.
11. Record factors.
11. Known problems.
12. GMP-ECM packages.

##############################################################################

1. Basic usage

GMP-ECM reads the numbers to be factored from stdin (one number on each
line) and requires a numerical parameter, the stage 1 bound B1. A reasonable 
stage 2 bound B2 for the given B1 is chosen by default, but can be overridden 
by a second numerical parameter. (For a given B1 value, the "default" B2 might
differ between ECM, P-1, and P+1.) By default, GMP-ECM uses the ECM factoring 
algorithm.

Example: To run one curve of ECM with B1=1000000 on each number in the file
"composites", run

ecm 1000000 < composites

To use a B2 value of ~5*10^8 instead of the default value of ~10^9, run

ecm 1000000 5e8 < composites

Scientific notation is accepted for B1 and B2 values. The B1 value might be
of the form B1min-B1, in which case primes up to B1min are skipped, for
example with 1e6-1e9, primes up to 10^6 are skipped. In particular using
the same value enables one to skip stage 1, for example 1e9-1e9 (this is
useful together with -resume when stage 1 was computed with another tool
or on another computer). The actual B2 value used may be larger than the
specified value to let parameters satisfy some conditions imposed by the
stage 2 algorithm.

To run one curve with B1=11e7 on M1061, simply do:

echo "2^1061-1" | ecm 11e7

To run more than one ECM curve on each input number, use the -c parameter.

Example: to run 100 curves with B1=1000000 and default B2 on each number 
in "composites", run

ecm -c 100 1000000 < composites

To use the P-1 or P+1 factoring methods, use the -pm1 or -pp1 parameter,
respectively.

Example: to use the P-1 method with B1=10^9 on all numbers in the file 
"composites", run

ecm -pm1 1e9 < composites

Note that, unlike for ECM, using the same B1,B2 bounds on one number is 
quite useless for P-1, and of limited use for P+1. See "2. How to use P-1, 
P+1, and ECM efficiently?"

##############################################################################

2. How to use P-1, P+1, and ECM efficiently?

The P-1 method works well when the input number has a prime factor P such
that P-1 is "smooth", i.e., has all its prime factor less or equal the 
step 1 bound B1, except one which may be less or equal the second step
bound B2. For P=67872792749091946529, we have P-1 = 2^5 * 11 * 17 * 19 *
43 * 149 * 8467 * 11004397, so this factor will be found as long as B1 >= 8467
and B2 >= 11004397:

$ echo 67872792749091946529 | ./ecm -pm1 -x0 2809890345 8467 11004397
GMP-ECM ... [powered by GMP ...] [P-1]
Input number is 67872792749091946529 (20 digits)
Using B1=8467, B2=6710-19370830, x0=2809890345
Step 1 took 3ms
Step 2 took 14ms
********** Factor found in step 2: 67872792749091946529
Found input number N

There is no need to run P-1 several times with the same B1 and B2 as there
is for ECM, since a factor found with one seed will (almost always) be found 
by another one.

Note: if a factor of P-1 if known, you can specify it with -go. For example
it is known that the Fermat number 2^(2^n)+1 has factors of the form
k*2^(n+2)+1, thus 2^(n+2) is a known factor of P-1. The -go expression has
a special placeholder 'N' which stands for the input number, for example:

$ echo "2^(2^12)+1" | ecm -go "N-1" -pm1 1e6

The P+1 method works well when the input number has a prime factor P such
that P+1 is "smooth". For P=4190453151940208656715582382315221647, we have
P+1 = 2^4 * 283 * 2423 * 21881 * 39839 * 1414261 * 2337233 * 132554351, so
this factor will be found as long as B1 >= 2337233 and B2 >= 132554351:

$ echo 4190453151940208656715582382315221647 | ./ecm -pp1 -x0 7 2337233 132554351
GMP-ECM ... [powered by GMP ...] [P+1]
Input number is 4190453151940208656715582382315221647 (37 digits)
Using B1=2337233, B2=2324738-343958122, x0=7
Step 1 took 750ms
Step 2 took 120ms
********** Factor found in step 2: 4190453151940208656715582382315221647
Found input number N

However not all seeds will succeed: only half of the seeds 'x0' work for P+1
(namely those where the Jacobi symbol of x0^2-4 and P is -1.) Unfortunately, 
since P is usually not known in advance, there is no way to ensure that this 
holds. However, if the seed is chosen randomly, there is a probability of 
about 1/2 that it will give a Jacobi symbol of -1 (i.e., the factor P will 
be found if P+1 is smooth enough). A rule of thumb is to run 3 times P+1 
with different random seeds. The seeds 2/7 and 6/5 have a slightly higher 
chance of success than average as they lead to a group order divisible by
6 or 4, respectively. When factoring Fibonacci numbers F_n or Lucas 
numbers L_n, using the seed 23/11 ensures that the group order is
divisible by 2n, making other P+1 (and probably P-1) work unnecessary.

As of version 6.2, a new stage 2 for the P-1 and P+1 algorithms is 
implemented. It uses less memory and is faster than the previous code, 
thus allowing larger B2 values. If GMP-ECM is configured with the
"--enable-openmp" flag and is compiled with a compiler that implements 
OpenMP, it uses multi-threading for computation of polynomial roots and
NTT multiplication. When not using the NTT, it benefits from multi-threading
only in the computation of roots phase. The number of threads to use can 
be controlled with the OMP_NUM_THREADS environment variable. Unlike the 
previous generic stage 2, the new stage 2 cannot use the Brent-Suyama 
extension (-power and -dickson parameters). Specifying these options on 
the command line forces use of the generic stage 2. Note: the notation of
the parameters follows that in the paper, the number of multi-point
evaluations (similar to "blocks") is given by s_2. You can specify a lower
limit for s_2 by the -k command line parameter.

The ECM method is a probabilistic method, and can be viewed in some sense
as a generalization of the P-1 and P+1 method, where we only require that
P+t+1 is smooth, where t depends on the curve we use and satisfies 
|t| <= 2*P^(1/2) (Hasse's theorem). The optimal B1 and B2 bounds have 
to be chosen according to the (usually unknown) size of P. The following
table gives a set of nearly optimal B1 and B2 pairs, with the corresponding
expected number of curves to find a factor of given size (column "-power 1"
does not take into account the extra factors found by Brent-Suyama's exten-
sion, whereas column "default poly" takes them into account, with the poly-
nomial used by default: D(n) means Dickson's polynomial of degree n):

       digits D  optimal B1   default B2           expected curves
                                                       N(B1,B2,D)
                                              -power 1         default poly
          20       11e3         1.9e6             74               74 [x^1]
          25        5e4         1.3e7            221              214 [x^2]
          30       25e4         1.3e8            453              430 [D(3)]
          35        1e6         1.0e9            984              904 [D(6)]
          40        3e6         5.7e9           2541             2350 [D(6)]
          45       11e6        3.5e10           4949             4480 [D(12)]
          50       43e6        2.4e11           8266             7553 [D(12)]
          55       11e7        7.8e11          20158            17769 [D(30)]
          60       26e7        3.2e12          47173            42017 [D(30)]
          65       85e7        1.6e13          77666            69408 [D(30)]

          Table 1: optimal B1 and expected number of curves to find a
          factor of D digits with GMP-ECM.

After performing the expected number of curves from Table 1, the
probability that a factor of D digits was missed is exp(-1), i.e.,
about 37%. After twice the expected number of curves, it is exp(-2),
i.e., about 14%, and so on.

Example: after performing 8266 curves with B1=43e6 and B2=2.4e11
(or 7553 curves with -dickson 12), the probability to miss a 50-digit
factor is about 37%.

From version 6.0 on, GMP-ECM prints the expected number of curves and expected
time to find factors of different sizes in verbose mode (option -v). This
makes it easy to further optimize parameters for a certain factor size if 
desired: simply try to minimize the expected time.

(lengthy NOTE: The order of an elliptic curve with Montgomery parameterization
is known to be divisible by 4. Depending on the parametrization that is used
(see Section 6) this number is increased in different ways. Let us call T the
known torsion (depending on the parametrization). One can assume that the
probability that the order is B1,B2 smooth should be about as great as for a
random integer 1/Tth in value.  However, Montgomery observed that the order
behaves even nicer than that: heuristically, it seems that the order is as more
likely to be smooth than a random integer about 1/Tth. In "Finding ECM-Friendly
Curves through a Study of Galois Properties", Barbulescu, Bos, Bouvier,
Kleinjung and Montgomery showed how to computed the average valuation that
should be used instead of the average valuation of random number.  This is the
values we use in GMP-ECM and the computed probabilities match those observed in
experiments very well. This however means that the so computed values for the
expected number of curves for given B1,B2 values and factor sizes may not match
those published in the previous literature.
The factor GMP-ECM uses is defined as ECM_EXTRA_SMOOTHNESS in rho.c,
EXTRA_SMOOTHNESS_SQUARE and EXTRA_SMOOTHNESS 32BITS_D in ecm.c depending on the
parametrization. You can change it to
  ECM_EXTRA_SMOOTHNESS 3.0
  EXTRA_SMOOTHNESS_SQUARE 0.333333333
  EXTRA_SMOOTHNESS 32BITS_D 0.333333333  
if you want to reproduce the more pessimistic values found in the literature.)

In summary, we advise the following method:

0 - choose a target factor size of D digits
1 - choose optimal B1 and B2 values to find factors of D digits (cf Table 1)
2 - run once P-1 with 10*B1, and the default B2 chosen by GMP-ECM
3 - (optional) run 3 times P+1 with 5*B1, and the default B2
4 - run N(B1,B2,D) times ECM with those B1 and B2, where N(B1,B2,D) is the
    expected number of ECM curves with step 1 bound B1, step 2 bound B2,
    to find a factor of D digits (cf above table). 
5 - if no factor is found, either increase D by 5 digits and go to 0, or use 
    another factorization method (MPQS, NFS)

Note: if a factor is found in steps 2, 3 or 4, simply continue the current
      step with the remaining cofactor (if composite). There is no need
      to start again from 0, since the cofactor was already tested, too.

##############################################################################

3. Extra factors and Brent-Suyama's extension.

GMP-ECM may sometimes find some "extra" factors, such that one factor of P-1, 
P+1 or P+t+1 exceeds the step 2 bound B2, thanks to Brent-Suyama's extension. 
Let's explain how it works for P-1, since it's simpler. The classical step 2 
(without Brent-Suyama's extension) considers s^(j*d) mod N and s^i mod N, 
where N is the number to factor, and s is the residue computed in stage 1. 
Here, d is fixed, and the integers i and j vary in two sets so that j*d-i 
covers all primes in [B1, B2]. Now consider a polynomial f(x), and compute 
s^f(j*d) and s^f(i) instead of s^(j*d) and s^i [thus the classical step 2 
corresponds to f(x)=x^1]. Then P will be found whenever all but one of the 
factors of P-1 are <= B1, and one factor divides some f(j*d) - f(i):

$ echo 1207946164033269799036708918081 | ./ecm -pm1 -k 3 -power 12 286493 25e6
GMP-ECM ... [powered by GMP ...] [P-1]
Input number is 1207946164033269799036708918081 (31 digits)
Using B1=286493, B2=30806172, polynomial x^12, x0=1548711558
Step 1 took 320ms
Step 2 took 564ms
********** Factor found in step 2: 1207946164033269799036708918081
Found input number N

Here the largest factor of P-1 is 83957197, which is 3.35 times larger than B2.
Warning: these "extra" factorizations may not be reproducible from one
version of GMP-ECM to another one, since they depend on some internal
parameters that may change.

For P-1 with the generic stage 2, the degree of the Brent-Suyama polynomial 
should be even. Since i^2k - (j*d)^2k = (i^k - (j*d)^k)(i^k + (j*d)^k), 
this allows testing two values symmetric around a multiple of d 
simultaneously, halving the amount of computation required in stage 2. 
P+1 with the generic stage 2 and ECM do this inherently.

The new fast stage 2 for P-1 and P+1 does not support the Brent-Suyama 
extension. By default, the fast stage 2 is used for P-1 and P+1; 
giving a -power or -dickson parameter on the command line forces use of
the previous, generic stage 2. It is recommended to use the new stage 2
(from version 6.2) for P-1 and P+1, which is the default: it is so much
faster that it largely compensates the few extra factors that are not found
because Brent-Suyama's extension is not available.

The default polynomial used for ECM with a given B2 should be near optimal, 
i.e., give only a marginal overhead in step 2, while enabling extra factors.

##############################################################################

4. Memory usage.

Step 1 does not require much memory: O(n) for an input number of n digits.
Step 2 may be quite memory expensive, especially for large B2, since its
efficient algorithms use some large tables. To reduce the memory usage of
step 2, you may increase the 'k' parameter, which controls the number of
"blocks" performed in step 2. Multiplying the default value of k by 4 will
decrease the memory usage by a factor of 2. For example with  B2=1e10 and
a 155-digit number, step 2 requires about 55MB with the default k=4, but
only 27MB with k=16. Increasing k does, however, slightly increase the time
required for step 2 (see section "How to get the best of GMP-ECM?").
An estimation of the memory usage is given at the start of stage 2:

$ echo 95209938255048826235189575712705128366296557149606415206280987204268594538412191641776798249266895999715600261737863698825644292938050707507901970225804581 > c155
$ ecm -v -k 4 10 1e10 < c155
...
Estimated memory usage: 55M
...
Step 2 took 18649ms

$ ecm -v -k 16 10 1e10 < c155
...
Estimated memory usage: 27M
...
Step 2 took 26972ms

Another way is to use the -treefile parameter, which causes some of the 
tables to be stored on disk instead of in memory. Using the option
"-treefile /var/tmp/ecmtree" will create the files "/var/tmp/ecmtree.1", 
"/var/tmp/ecmtree.2" etc. The files are deleted upon completion of stage 2:

$ ecm -v -treefile /tmp/ecmtree -k 4 10 1e10 < c155
...
Estimated memory usage: 36M
...
Step 2 took 18648ms

Due to time consuming disk I/O, this will cause stage 2 to take somewhat 
longer. How much memory is saved depends on stage 2 parameters, but a
typical value is that memory use is reduced by a factor of about 1.5.
Increasing the number of blocks with -k also reduces the amount of data that 
needs to get written to disk, thus reducing disk I/O time. Combining these 
parameters is a very effective way of reducing memory use.

Up from version 6.1, there is still another (better) possibility, with the
-maxmem option. The command-line -maxmem nnn option tells GMP-ECM to use at
most nnn MB in stage 2. It is better than -k because it takes into account
the size of the number to be factored, and automatically adjusts the number
of blocks to use:

$ ./ecm -v -maxmem 40 10 1e10 < c155
...
dF=8192, k=15, d=79170, d2=11, i0=-10
...
Estimated memory usage: 27M
...
Step 2 took 25456ms

##############################################################################

5. Expression syntax reference for GMP-ECM's syntax parser.

GMP-ECM can handle several kinds of expressions as input numbers. Here is the
syntax that is handled:

1.  Raw decimal numbers like 123456789
2.  Comments can be placed in the file.  The C++ "one line comment"  // 
    is used.  Everything after the // on a line (including the //) is ignored.
    Warning: no input number should appear on such a comment line.
3.  Any white space (space, tab, end of line) is ignored.  However, the "end of
    line" is used to end the expression. For example, processing this:
    1 2 3 4 5 6 7 8 9
    would be the same as processing
    123456789
4.  "common" arithmetic expressions (* / + - %), the period '.' might be used
    in place of * for multiply, and - can be unary minus (e.g., -55555551).
    Example: echo "3*5+2" | ./ecm 100
5.  Grouping  ( [ { for start of group (which symbol is used does not matter)
    and ) ] } to end a group (again all 3 symbols mean the SAME thing).
6.  Exponentiation with the ^ character  (i.e., 2^24 is the same as 16777216).
    Example: echo "2^24+1" | ./ecm 100
7.  Simple factorial using the exclamation point !  character.  Example is
    53! == 1*2*3*4...*52*53. Example: echo '53!+1' | ./ecm 1e2
8.  Multi-factorial as in: n!m with an example:  15!3 == 15.12.9.6.3.
9. Simple Primorial using the # character with example of 11# == 2*3*5*7*11
10. Reduced Primorial n#m with example of 17#5 == 5.7.11.13.17
11. Functions are possible with the expression parser. Currently, the only 
    available function is Phi(n,x), which is the value of the n-th cyclotomic
    polynomial at x, however other functions should be easy to add.

Note: Expressions are maintained as much as possible (even if the expression
becomes longer than the decimal expansion). Expressions are output as cofactors
(if the input was an expression), and are stored into save/resume files
(again if and only if the original input was an expression, and not an expanded
decimal number). When a factor is found, the cofactor expression is of the
form (original_expression)/factor_found (see however option -cofdec):

$ echo "3*2^210+1" | ./ecm -sigma 4007218240 2500
GMP-ECM ... [powered by GMP ...] [ECM]
Input number is 3*2^210+1 (64 digits)
Using B1=2500, B2=186156, polynomial x^1, sigma=4007218240
Step 1 took 16ms
Step 2 took 16ms
********** Factor found in step 2: 1358437
Found probable prime factor of  7 digits: 1358437
Probable prime cofactor (3*2^210+1)/1358437 has 58 digits

##############################################################################

6. Options -param, -sigma, -x0, -y0, -A, -torsion.

This section explains the behavior of the options 'param', 'sigma', 'x0', 'y0',
'A' and 'torsion' when GMP-ECM is used to run ECM.

The parameters A, x0 and y0 can have integer or rational values, for example
-A 22/7 -x0 1/3 -y0 2/7.

a) with -param <= 4 (default value is 0): curves in Montgomery form
-------------------------------------------------------------------

In stage 1, the elliptic curves used are of the following Montgomery form:
b*y^2 = x^3 + A*x^2 + x. The starting point in ( x0 : : 1).
In this case we do not need the value of y0.

Either sigma is given, in which case the values of A and x0 are deduced from
sigma and the params option (see below).
Or A and x0 are given (in which case both must be given).

The -param option is used to choose the parametrization from which the curves
are taken. It can take 4 values (s is the 'sigma' parameter):
    0: use Suyama parametrization. It was the parametrization used by previous
    versions of GMP-ECM.
	                         u = s^2-5, v = 4*s,
                      A = (v-u)^3*(3*u+v)/(4*u^3*v)-2,
	                               x0=u^3/v^3.
    1: (only for 64-bit processors)
                                A = 4*s^2/2^64-2,
                                  x0 = 2.
    2: use an elliptic parametrization to compute d(s) such that the curve has 
       a 6-torsion point.
                                A = 4*d(s)-2,
                                  x0 = 2.
    3: it is mostly used for the gpu computation.
                                 A = 4*s/2^32-2,
                                  x0 = 2.

Generally, the parameter s is chosen randomly but the -sigma option can be
used to choose its value.

* with -param 0, there is no constraint on sigma
* with -param 1 (only for 64-bit processors), s^2 should fit in a 64-bit
  integer, thus s < 2^32
* with -param 2, there is no constraint on sigma
* with -param 3 (mostly used for GPU), s should fit in a 32-bit word,
  thus s < 2^32

In the case where only "-sigma s" is used, "-param 0" is assumed in order to be
compatible with older version of GMP-ECM. The above is also true when resuming
from a file (see Section 7). In the case where only "-param p" is used, the
parameter is chosen at random by the ecm library. In the case where neither
"-sigma" nor "-param" is used, the choice of the parametrization is left to ecm
library and the parameter is chosen at random. In the case where the two
options are used, "-param p -sigma s" can be shorten in "-sigma p:s".

The curve can also be chosen with the -A option which directly specifies the
values of the coefficient a of the curve. In this case the -x0 option is
mandatory.

The -x0 option can be used to override the x0 value in the case where param=0
and is mandatory in the case the curve is given with the -A option. In the case
where param=1, param=2 or param=3, x0 must be equal to 2, so this should not be
used.

b) with param = 5: curves in Weierstrass form
---------------------------------------------

In this case we use curves in Weierstrass form:

       E: y^2 = x^3 + A*x + b

The user must provide all of A, x0 and y0. GMP-ECM will consider that
(x0, y0) is a point on E, where b = y0^2-x0^3-A*x0 mod N. Step 2 is performed
as usual, since it already uses the Weierstrass form.
The options of Section 7 apply.

Applications include using curves with some exotic prescribed torsion
subgroup (see d below), or CM (Complex Multiplication) curves.

c) with param = 6, 7: curves in (twisted) Hessian form
------------------------------------------------------

These curves have torsion group Z3xZ3 over Q(sqrt(-3)) and are useful when
the prime factor p we are looking for is known to satisfy p = 1 mod 3. 
Since p = 1 mod 3 is quite frequent, a try at this version is not unthinkable
in general.

Curves in projective Hessian form have equation 
    X^3+Y^3+Z^3=D*X*Y*Z.
They should be used with

                 -param 6 -A <D> -x0 <x0> -y0 <y0>

with D^3 mod N != 1, and (x0=X0/Z0, y0=Y0/Z0) is a base point on the curve.

Curves in projective twisted Hessian form have equation 
    a*X^3+Y^3+Z^3=d*X*Y*Z.
They should be used with

                 -param 7 -A <D> -x0 <x0> -y0 <y0>

where D=a^3/d and (x0=X0/Z0, y0=Y0/Z0) is a base point on the curve.

Ref: JKL-ECM in proceedings ANTS-XII.

d) -torsion:
------------
This option enables the user to ask for a given curve with specific torsion
group (over the rationals). For instance

      	    	 -torsion Z5 -sigma 2

will generate a curve E over the rationals with torsion group Z/5Z and a point
of infinite order, both generated from parameter sigma = 2.

Note that in the present case, -param is overriden by whatever "good" form
the curves are built with that prescribed torsion.

Available torsion groups (over the rationals) are: 
* Z5: for sigma != 0, -1/2, -1/3, -1/4, cf. [1]
* Z7: [1]
* Z9: [1]
* Z10: [1]
* Z2xZ8: [1]
* Z3xZ3
* Z4xZ4
References: [1] Atkin/Morain, Math. Comp., 1993.

##############################################################################

7. Options -save, -resume and -chkpnt.

These -save and -resume options are useful to save the current state of the 
computation after step 1, or to exchange data with other software. It allows 
to perform step 1 with GMP-ECM, and step 2 with another software (or 
vice-versa). 
Note: the residue from the end of stage 1 gets written to the file only 
after stage 2, if stage 2 is performed in the same program run. This way, 
if a factor is found, the save file entry will contain the new cofactor 
(if composite) or will be omitted (if cofactor is a probable prime). For 
periodic saving during stage 1 for crash recovery, use -chkpnt, 
described below.

Here is an example how to reuse some P-1 computation:

$ cat c71
13155161912808540373988986448257115022677318870175067553764004308210487
$ ./ecm -save toto -pm1 -mpzmod -x0 2 5000000 < c71
GMP-ECM ... [powered by GMP ...] [P-1]
Input number is 13155161912808540373988986448257115022677318870175067553764004308210487 (71 digits)
Using B1=5000000, B2=352526802, polynomial x^24, x0=2
Step 1 took 3116ms
Step 2 took 2316ms

The file "toto" now contains some information about the method used, the step
1 bound, the number to factor, the value X at the end of step 1 (in hexa-
decimal), and a checksum to verify that no data was corrupted:

$ cat toto
METHOD=P-1; B1=5000000; N=13155161912808540373988986448257115022677318870175067553764004308210487; X=0x12530157ae22ae14d54d6a5bc404ae9458e54032c1bb2ab269837d1519f; CHECKSUM=2287710189; PROGRAM=GMP-ECM 6.2; X0=0x2; WHO=zimmerma@clafoutis.loria.fr; TIME=Sat Apr 12 13:41:01 2008;

Then one can resume the computation with larger B1 and/or B2 as follows:

$ ./ecm -resume toto 1e7
GMP-ECM ... [powered by GMP ...] [ECM]
Resuming P-1 residue saved by zimmerma@clafoutis.loria.fr with GMP-ECM 6.2 on Sat Apr 12 13:41:01 2008 
Input number is 13155161912808540373988986448257115022677318870175067553764004308210487 (71 digits)
Using B1=5000000-10000000, B2=9946848-1326917772, 
Step 1 took 3076ms
Step 2 took 4304ms
********** Factor found in step 2: 1448595612076564044790098185437
Found probable prime factor of 31 digits: 1448595612076564044790098185437
Probable prime cofactor 9081321110693270343633073697474256143651 has 40 digits

The second run only considered the primes in [5e6-10e6] in step 1,
which saved half the time of step 1.

The format used is the following:
  - each line corresponds to a composite (expression ARE saved in the save file)
  - a line contains assignments <id>=<value> separated by semi-colons ';'
  - possible values for <id> are 
    - METHOD (value = ECM or P-1 or P+1)
    - PARAM (value = ECM parametrization) [ECM only] (see Section 6)
    - SIGMA (value = ECM sigma parameter) [ECM only] (see Section 6)
    - B1 (first step bound)
    - N (composite number to factor)
    - X (value at the end of step 1)
    - A (A-parameter of the elliptic curve) [ECM only]
    - CHECKSUM (internal value to check correctness of the format)
    - PROGRAM (program used to perform step 1, useful for factor credits) 
    - X0 (initial point for ECM, or initial residue for P-1/P+1) [optional]
    - WHO (who performed step 1) 
    - TIME (date and time of first step)
  PARAM, SIGMA and X0 would be optional, and would be mainly be used in case of
  a factor is found, to be able to reproduce the factorization.
  For ECM, one of the SIGMA or A values must be present, so that the
  computation can be continued on the correct curve.
  The B1 and X values satisfy the condition that X is a lcm(1,2,...,B1)-th
  power in the (multiplicatively written) group.

If consecutive lines in a save file being resumed contain the same number to
be factored, say when many ECM curves on one number have been saved, factors
discovered by GMP-ECM are carried over from one attempt to the next so that
factors will be reported only once. If the cofactor is a probable prime, or
if the -one option was given and a factor was found, the remaining 
consecutive lines for that number will be skipped.

Note: it is allowed to have both -save f1 and -resume f2 for the same run,
however the files f1 and f2 should be different.

Remark: you should not perform in parallel several -resume runs on the same
input with the same B1/B2 values, since those runs will do the same 
computations. Options -save/-resume are useful in the following cases:

(a) somebody did a previous step 1 computation with another software
    which is faster than GMP-ECM, and wants to perform Step 2 with
    GMP-ECM which is faster for that task.
(b) somebody did a previous step 1 for P-1 or P+1 up to a given bound
    B1, and you want to extend that computation with B1' > B1, without
    restarting from scratch. Note: this does not apply to ECM, where 
    the smoothness property depends on the (random) curve chosen, not
    only on the input number.
(c) you did a huge step 1 P-1 or P+1 computation on a given machine, and you
    want to perform a huge step 2 in parallel on several
    machines. For example machine 1 tests the range B2_1-B2_2, machine
    2 tests B2_2-B2_3, ... This also decreases the memory usage for
    each machine, which is function of the range width B2max-B2min.
    For the same reason as (b), this does not apply to ECM. 

The -chkpnt option causes GMP-ECM to write the current residue periodically
during the stage 1 computation. This is useful as a safeguard in case the
GMP-ECM process is terminated, or the computer loses power, etc. The 
checkpoint is written every ten minutes, when a signal (SIGINT, SIGTERM) is
received, and at the end of stage 1. The format of the checkpoint file is 
very similar to that of regular save files, and checkpoints can be resumed 
with the -resume option.

For example:

$ ecm -chkpnt pm1chkpoint -pm1 1e10 1 < largenumber.txt

[Computer crashes during computation]

$ ecm -resume pm1chkpoint 1e10 1

Note 1: if an existing file is specified as the checkpoint file, it will be
silently overwritten! 

Note 2: When resuming a checkpoint file, additional small primes may be 
processed in stage 1 when the checkpoint file is resumed, so the 
end-of-stage 1 residues of an uninterrupted run and a checkpointed run 
may not match. The extra primes do not reduce the probability of finding 
factors, however.

Note 3: for ECM, the -chkpnt option is only implemented with -param 0 so far.

##############################################################################

8. How to get the best of GMP-ECM?

After configuring GMP-ECM, and "make", do:

$ make ecm-params

This will optimize parameters for your machine and put them in ecm-params.h.

If you do an out-of-source build, then:
* copy the new ecm-params.h file into the source folder
* run "make clean" and "make" again

The ecm program automatically selects what it thinks is the best
arithmetic for the given input number. If that choice is not optimal, you may 
force the use of a certain arithmetic by trying options -modmulm, -mpzmod, 
-redc. (The best choice should depend on B1 and B2 only very little, so long 
as B1 is not too small, say >= 1000.)

Number of step 2 blocks. The step 2 range [B1,B2] is divided into k 
"big blocks". The default value of k is chosen to be near to optimal.
However, it may be that for a given (B1,B2) pair, another value of k
may be better. Try for this to give the option -k <value> to ecm,
where <value> is 1, 2, 3, ... This will force ecm to divide step 2
in at least <value> blocks.

Changing the value of the number of blocks will not modify the chance
of finding a factor (except for extra factors, but some will be lost,
and some will be won, so the balance should be nearly even). However it 
will change the time spent in Step 2 and modify the memory used by Step 2
(see the section "Memory usage").

Optimal thresholds. The thresholds for the algorithms used in ecm are defined
in ecm-params.h, which tries to select a good "params.h" file.
Several params.h files are included in the distribution
and the configure script will select one matching your machine if it exists
(see the output of ecm -printconfig).
If there is no params.h for your machine then see the section "How to get the
best of GMP-ECM?" to generate one optimized for your machine.

Stage 2 now uses Number-Theoretic Transforms (NTT) for polynomial arithmetic 
by default for numbers of at most 30 machine words (NTT_SIZE_THRESHOLD in
ecm-ecm.h). The NTT code forces dF to be a power of 2; it can be disabled by
passing the command-line option -no-ntt and unconditionally enabled by -ntt. 
Performance of NTT is dependent on:
- Architecture. NTT seems to give the greatest improvement on Athlons,
  and the least improvement on Pentiums without SSE2.
- Thresholds. It is vital to have ecm-params.h properly tuned for your machine.
- C compiler. The SSE2 assembly code for 32 bit and the assembly code for 
  64 bit only work for x86 using gcc or Intel cc, so it is compiler dependent.

Note on factoring Fermat numbers:
GMP-ECM features Schönhage-Strassen multiplication for polynomials in 
stage 2 when factoring Fermat numbers (not in the new, fast stage 2 for
P+1 and P-1. This is to be implemented.) This greatly reduces the number of 
modular multiplications required, thus improving speed. It does, however, 
restrict the length of the polynomials to powers of two, so that for a given 
number of blocks (-k parameter), the B2 value can only increase by factors of 
approximately 4.
For the number of blocks, choices of 2, 3 or 4 usually give best performance.
However, if the polynomial degree becomes too large, relatively expensive
Karatsuba or Toom-Coom methods are required to split the polynomial before
Schönhage-Strassen's method can handle them. That can make a larger number 
of blocks worthwhile. When factoring the m-th Fermat number F_m = 2^(2^m)+1,
degrees up to dF=2^(m+1) can be handled directly. If your B2 choice
requires a degree much larger than this (dF is printed with the -v
parameter), try increasing the number of blocks with -k and see if
performance improves.
The Brent-Suyama extension should not be used when factoring Fermat numbers,
it is more efficient to simply increase B2. Therefore, -power 1 for P+1 and 
ECM, and -power 2 for P-1 are the default for Fermat numbers. (Larger
degrees for Brent-Suyama may possibly become worthwhile for P-1 runs on
smaller Fermat numbers and extremely large B2, when Karatsuba and Toom-Cook
are used extensively.)
Factoring Fermat numbers uses a lot of memory, depending on the size of the
Fermat number and on dF. For dF=65536 and F_12, the memory used is about
1700MB. If your system does not have enough memory, you will have to use a
larger number of blocks to reach the desired B2 value with a smaller poly
degree dF, which sacrifices some performance. Additionally, you may use the
-treefile option (see 4. Memory usage)

                k=1             k=2             k=3             k=4

dF=256          582132          1222002         1864182         2504052
dF=512          2443992         5008092         7572192         10131672
dF=1024         10016172        20263332        30519732        40766892
dF=2048         42634420        85689250        128744080       171798910
dF=4096         173259252       347500242       521780502       696021492
dF=8192         711738310       1425139180      2138540050      2851940920
dF=16384        2850278350      5703881830      8557643650      11411247130
dF=32768        11702792020     23412731170     35122670320     46832609470
dF=65536        48071333326     96165459406     144259585486    192353711566
dF=131072       194020810630    388069884940    582118959250    776168033560

                Table 2: Stage 2 interval length B2-B2min, for dF
                a power of 2 and small values of k.

For example, if you'd like to run stage 2 on F_12 with B2 ~= 40G, try
parameters "-k 1 <B1> 48e9", "-k 3 <B1> 35e9" or "-k 4 <B1> 46e9".

##############################################################################

9. GMP-ECM and GPU

The stage 1 of ECM can be computed on a Nvidia GPU.
More details are given in README.gpu.

##############################################################################

10. Record factors.

If you find a very large factor, the program will print a message like:

Report your potential champion to <email address>
(see <url for record factors>)

This means that your factor might be a champion, i.e., one of the top-ten
largest factors ever found by the corresponding method (P-1, P+1 or ECM).
Cf the following URLs:

ECM: https://maths-people.anu.edu.au/~brent/ftp/champs.txt
P-1: http://www.loria.fr/~zimmerma/records/Pminus1.html
P+1: http://www.loria.fr/~zimmerma/records/Pplus1.html

##############################################################################

11. Known problems.

On some machines, GMP-ECM uses the clock() function to measure the cpu time
used by step 1 and 2. Since that function returns a 32-bit integer, there
is a possible wrap-around effect when the clock() value goes back from 2^32-1
to 0, which may produce negative timings.

The NTT code uses primes that fit in one machine word and that are 
congruent to 1 (mod l), where l is the largest transform length required 
for the desired stage 2 parameters. For very large B2 on 32-bit machines, 
there may not be enough suitable primes, which may limit the possible 
transform length to less than what available memory would permit. This 
problem occurs mostly in the fast stage 2 for P-1 and P+1, as the generic
stage 2 uses far more memory for a given polynomial degree, so that memory
on a 32-bit machine will be exhausted long before suitable NTT primes are.
The maximal transform length depends on the size of the input number. 
For a transform length l on a 32 bit machine, N must satisfy:
l=2^11:N<2^756200, l=2^12:N<2^379353, l=2^13:N<2^190044, l=2^14:N<2^94870, 
l=2^15:N<2^47414,  l=2^16:N<2^23322,  l=2^17:N<2^11620,  l=2^18:N<2^5891, 
l=2^19:N<2^2910,   l=2^20:N<2^1340,   l=2^21:N<2^578,    l=2^22:N<2^228.
Since log(N)*l is approximately constant, this limits the amount of memory
that can be used to about 600MB for P-1, and 1200MB for P+1.

##############################################################################

12. GMP-ECM packages.

GMP-ECM is a standard Debian package. See for example
https://packages.debian.org/sid/gmp-ecm.

An opam package was built by Michael Soegtrop and was tested on MaOS
(including Apple silicon), many Linux variants and Windows (MinGW with cygwin
build host). See ocaml/opam-repository#20105.