# jwood000/bigIntegerAlgos

R Package for Factoring Big Integers using the C Library GMP (GNU Multiple Precision Arithmetic)
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.
R
inst
man
src
tests
.DS_Store
.Rbuildignore
.gitignore
DESCRIPTION
NAMESPACE
NEWS
bigIntegerAlgos.Rproj
cleanup
config.cache
config.log
config.status
configure
configure.in

# bigIntegerAlgos

## Overview

bigIntegerAlgos uses the C library GMP (GNU Multiple Precision Arithmetic) for efficiently factoring big integers. For very large integers, prime factorization is carried out by a variant of the quadratic sieve algorithm that implements multiple polynomials. For smaller integers, a constrained version of the Pollard's rho algorithm is used (original code from https://gmplib.org/... this is the same algorithm found in the R gmp package (https://cran.r-project.org/web/packages/gmp/gmp.pdf) called by the function `factorize`). Finally, one can quickly obtain a complete factorization of given number `n` via `divisorsBig`.

## Installation

```install.packages("bigIntegerAlgos")

## Or install the development version
devtools::install_github("jwood000/bigIntegerAlgos")```

## Usage

First, we take a look at `divisorsBig`. It is vectorized and can also return a named list.

```## Get all divisors of a given number:
divisorsBig(1000)
Big Integer ('bigz') object of length 16:
[1] 1    2    4    5    8    10   20   25   40   50   100  125  200  250  500  1000

## Or, get all divisors of a vector:
divisorsBig(urand.bigz(nb = 2, size = 100, seed = 42), namedList = TRUE)
Seed initialisation
\$`153675943236425922379228498617`
Big Integer ('bigz') object of length 16:
[1] 1                              3
[3] 7                              9
[5] 21                             27
[7] 63                             189
[9] 813100228764158319466817453    2439300686292474958400452359
[11] 5691701601349108236267722171   7317902058877424875201357077
[13] 17075104804047324708803166513  21953706176632274625604071231
[15] 51225314412141974126409499539  153675943236425922379228498617

\$`261352009818227569107309994396`
Big Integer ('bigz') object of length 12:
[1] 1                              2
[3] 4                              155861
[5] 311722                         623444
[7] 419206873140534785974859       838413746281069571949718
[9] 1676827492562139143899436      65338002454556892276827498599
[11] 130676004909113784553654997198 261352009818227569107309994396```

It is very efficient as well. It is equipped with a modified merge sort algorithm that significantly outperforms the `std::sort`/`bigvec` (the class utilized in the `R gmp` package) combination.

```hugeNumber <- pow.bigz(2, 100) * pow.bigz(3, 100) * pow.bigz(5, 100)
system.time(overOneMillion <- divisorsBig(hugeNumber))
user  system elapsed
0.637   0.043   0.682
length(overOneMillion)
[1] 1030301

## Output is in ascending order
tail(overOneMillion)
Big Integer ('bigz') object of length 6:
[1] 858962534553352218394101882942702121170179203335000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[2] 1030755041464022662072922259531242545404215044002000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[3] 1288443801830028327591152824414053181755268805002500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[4] 1717925069106704436788203765885404242340358406670000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[5] 2576887603660056655182305648828106363510537610005000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[6] 5153775207320113310364611297656212727021075220010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000```

Another benefit is that it will return correct orderings on extremely large numbers when compared to sorting large vectors in `base R`. Typically in `base R` you must execute the following: `order(asNumeric(myVectorHere))`. When the numbers get large enough, precision is lost which leads to incorrect orderings. Observe:

```set.seed(101)
testBaseSort <- do.call(c, lapply(sample(100), function(x) add.bigz(pow.bigz(10,80), x)))
testBaseSort <- testBaseSort[order(asNumeric(testBaseSort))]
myDiff <- do.call(c, lapply(1:99, function(x) sub.bigz(testBaseSort[x+1], testBaseSort[x])))

## Should return integer(0) as the difference should always be positive
which(myDiff < 0)
[1]  1  3  4  7  9 11 14 17 19 22 24 25 26 28 31 32 33 36 37 38 40 42 45 47 48
[26] 50 51 54 57 58 59 63 64 65 66 69 70 72 75 78 81 82 85 87 89 91 93 94 97 98

## N.B. The first and second elements are incorrect order (among others)
Big Integer ('bigz') object of length 6:
[1] 100000000000000000000000000000000000000000000000000000000000000000000000000000038
[2] 100000000000000000000000000000000000000000000000000000000000000000000000000000005
[3] 100000000000000000000000000000000000000000000000000000000000000000000000000000070
[4] 100000000000000000000000000000000000000000000000000000000000000000000000000000064
[5] 100000000000000000000000000000000000000000000000000000000000000000000000000000024
[6] 100000000000000000000000000000000000000000000000000000000000000000000000000000029
```

## The Quadratic Sieve

The function `quadraticSieve` implements the multiple polynomial quadratic sieve algorithm. Currently, `quadraticSieve` can comfortably factor numbers with less than 50 digits (~160 bits).

```## Generate large semi-primes
semiPrime120bits <- prod(nextprime(urand.bigz(2,60,42)))
semiPrime130bits <- prod(nextprime(urand.bigz(2,65,1)))
semiPrime140bits <- prod(nextprime(urand.bigz(2,70,42)))

## Using factorize from gmp package which implements pollard's rho algorithm
## We did not test the 140 bit semi-prime as the 130 bit took a very long time

##**************gmp::factorize*********************
system.time(print(factorize(semiPrime120bits)))
Big Integer ('bigz') object of length 2:
[1] 638300143449131711  1021796573707617139
user  system elapsed
126.603   0.052 126.694

system.time(print(factorize(semiPrime130bits)))
Big Integer ('bigz') object of length 2:
[1] 14334377958732970351 29368224335577838231
user   system  elapsed
1513.055    1.455 1517.524

## quadraticSieve is much faster and scales better
Big Integer ('bigz') object of length 2:
[1] 638300143449131711  1021796573707617139
user  system elapsed
4.028   0.008   4.036

Big Integer ('bigz') object of length 2:
[1] 14334377958732970351 29368224335577838231
user  system elapsed
6.310   0.013   6.324

Big Integer ('bigz') object of length 2:
[1] 143600566714698156857  1131320166687668315849
user  system elapsed
12.990   0.260  13.249 ```

It can also be used as a general prime factoring function:

```quadraticSieve(urand.bigz(1,50,1))
Seed initialisation
Big Integer ('bigz') object of length 5:
[1] 5       31      307     2441    4702723```

However `gmp::factorize` is more suitable for numbers smaller than 60 bits and should be used in such cases.

## Current Research:

Improvements are being made to the section of the quadratic sieve algorithm that selects smooth numbers. Right now, we are only selecting M-smooth numbers where M is the largest prime in our sieving base. A potential efficiency gain is to also keep track of mostly M-smooth numbers (i.e. numbers that are almost completely factored by our sieving base but have one factor that contains a prime number greater than M). This way, we can obtain more smooth numbers by essentially eliminating these larger factors when we find a pair of them (N.B. square terms come out in the wash, so the large factors won't influence the outcome).

We are also working on efficiently integrating `divisorsBig` with `quadraticSieve` as currently, `divisorsBig` utilizes `gmp::factorize`.

## Contact

I welcome any and all feedback. If you would like to report a bug, have a question, or have suggestions for possible improvements, please contact me here: jwood000@gmail.com

You can’t perform that action at this time.