# Dixon's algorithm for factoring


This program was written for an assignment in the "Cryptography" class of my master's program:  
"Use Dixon's algorithm to factorize the number $n = 902831$."

<u>**Dixon's algorithm**</u>  
$\cdot$ Input: Odd composite integer $n>3$  
$\cdot$ Output: A non-trivial factor of $n$  
1. We choose a positive integer $y$ and create the "factor base*" $B$ consisting of $-1$ and all primes $\leq y$, i.e. $B=\{-1,p_1, \dots, p_{\pi(y)}\}$.
2. If none of the elements of $B$ divides $n$,  
   then we find integers $b_i \in \{2, \dots, n-1\}$ $(i=1, \dots, \pi(y)+2)$ which are "$B$-adapted*" over $n$.
3. We write $b_i^2 \equiv (-1)^{a_{i0}} p_1^{a_{i1}} \dots p_{\pi(y)}^{a_{i\pi(y)}} \, (mod \, n)$  
and associate each $b_i$ with a vector $u_i=(u_{i0}, \dots, u_{i\pi(y)})$, where $u_{ij} = \begin{cases} 0, & \text{if } a_{ij} \text{ is even} \\ 1, & \text{if } a_{ij} \text{ is odd} \end{cases}$
4. We find a set of indexes $T \subseteq \{1, \dots, \pi(y)+2\}$ such that $\sum_{i \in T}u_i = 0$ in $\mathbb{Z}_2^{\pi(y)+1}$.
5. We calculate $b=\prod_{i \in T}b_i$, $c=p_1^{\gamma_1} \dots p_{\pi(y)}^{\gamma_{\pi(y)}}$ with $\gamma_j = \frac{1}{2} \sum_{i \in T} a_{ij}$  $(j=1, \dots, \pi(y))$.
6. If $b \not\equiv \pm c \, (mod \, n)$, then we calculate $\gcd(b+c,n)$, which will be a non-trivial factor of $n$ (since $n$ divides $b^2-c^2=(b-c)(b+c)$ but divides neither $b-c$ nor $b+c$).
7. If $b \equiv \pm c \, (mod \, n)$, then we choose another $T$ or a larger $y$ and repeat the process.
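
Steps 5 and 6 can be illustrated with a small sketch. The modulus $84923$ and the two values $513$ and $537$ are illustrative numbers chosen for this example only, they are not part of the assignment. Both squares are smooth over $\{2,3,5,7\}$ and their product is a perfect square, so $T$ consists of both indexes.

```python
import math

# Illustrative numbers only (not the assignment's n).
n = 84923
b1, b2 = 513, 537

# The residues of the squares; both factor completely over {2, 3, 5, 7}.
c1 = b1 ** 2 % n   # 8400  = 2^4 * 3 * 5^2 * 7
c2 = b2 ** 2 % n   # 33600 = 2^6 * 3 * 5^2 * 7

# Step 5: b is the product of the b_i, c is the square root of the product of the c_i.
b = (b1 * b2) % n
c = math.isqrt(c1 * c2)
assert c * c == c1 * c2  # the product really is a perfect square

# Step 6: b is not congruent to +c or -c mod n, so the gcd is a non-trivial factor.
assert (b - c) % n != 0 and (b + c) % n != 0
factor = math.gcd(b + c, n)
print(factor, n // factor)
```

Reducing $b$ modulo $n$ before taking the gcd does not change the result and keeps the numbers small.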


##### Notes:
A set of the form $B=\{-1,p_1,\dots,p_m\}$, where $p_1, \dots, p_m$ are distinct primes, is called a factor base.    
An integer will be called "$B$-smooth" if it can be written as a product of elements of $B$.  
An integer $b$ will be called "$B$-adapted" over a positive integer $n$ if there exists a $B$-smooth integer $c$ such that $-\frac{n}{2}<c<\frac{n}{2}$ and $b^2 \equiv c \, (mod \, n)$.
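
The two definitions above can be sketched with plain trial division over the factor base. The helper names `factor_over_base` and `is_adapted` are hypothetical and do not appear in the notebook's own code; this is only a minimal illustration under the assumption that $n$ is odd.

```python
def factor_over_base(c, primes):
    """Return [sign exponent, e_1, ..., e_m] if c is smooth over {-1} + primes, else None."""
    if c == 0:
        return None
    sign_exponent = 1 if c < 0 else 0
    remaining = abs(c)
    exponents = []
    for p in primes:
        e = 0
        # Divide out p as often as possible and count how many times it worked.
        while remaining % p == 0:
            remaining //= p
            e += 1
        exponents.append(e)
    # c is smooth exactly when nothing is left after removing all base primes.
    if remaining != 1:
        return None
    return [sign_exponent] + exponents


def is_adapted(b, n, primes):
    """Check whether b is adapted over an odd n: b^2 has a smooth residue in (-n/2, n/2)."""
    c = b * b % n
    if c > n // 2:
        c -= n  # switch to the residue of smallest absolute value
    return factor_over_base(c, primes) is not None
```

For example, `factor_over_base(-12, [2, 3])` returns `[1, 2, 1]`, because $-12 = (-1)^1 \cdot 2^2 \cdot 3^1$.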

Author: Florias Papadopoulos

## Importing modules

We start by importing the modules that we will use.

In [1]:
import numpy as np
import math
import random
import sys
from collections import OrderedDict

If one or more modules are missing, the code below installs a pip package into the current Jupyter kernel. For example, if numpy is missing, we can use

In [2]:
import sys
!{sys.executable} -m pip install numpy



## Defining the functions

### Starter functions

#### (a) base_numbers

We first need a function that takes an integer $y$ as input and outputs the factor base $B$, which contains $-1$ and the primes $\leq y$.  
The function uses a for/else loop, a lesser-known Python construct: the `else` branch runs only when the loop finishes without hitting `break`. For more information, look [here](https://book.pythontips.com/en/latest/for_-_else.html) (working link as of 19/01/2023).

In [3]:
def base_numbers(y):
    base = [-1]
    for i in range(2, y + 1):
        for j in range(2, int(i ** 0.5) + 1):
            if i % j == 0:
                break
        else:
            # No divisor was found, so i is prime
            base.append(i)
    return base
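
To make the for/else behaviour concrete, here is a minimal sketch using a hypothetical helper `first_divisor_report` (not part of the notebook). The `else` branch belongs to the `for` loop, not to the `if`, and is skipped whenever `break` is executed.

```python
def first_divisor_report(i):
    """Describe i as prime or name its smallest divisor, using a for/else loop."""
    for j in range(2, int(i ** 0.5) + 1):
        if i % j == 0:
            result = f"{i} is divisible by {j}"
            break
    else:
        # Reached only if the loop ran to completion without a break.
        result = f"{i} is prime"
    return result


print(first_divisor_report(15))
print(first_divisor_report(13))
```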

#### (b) bi_testset

A common trick when trying to find the "$B$-adapted" $b_i$ is to search among the integers of the form $\lfloor \sqrt{kn} \rfloor + j$, where $k,j=1,2,\dots$.  
For this reason we need a function that will have as input the integers $n$, $k_{upper}$ (the max value that $k$ can be) and $j_{upper}$ (the max value that $j$ can be), and as output the "test-set" of these "possible $b_i$".

In [4]:
def bi_testset(n, k_upper, j_upper):
    candidates = set()
    for k in range(1,k_upper+1):
        # Integer square root, so that large k*n are not affected by float rounding
        v1 = math.isqrt(k*n)
        for j in range(1,j_upper+1):
            candidates.add(v1 + j)
    return candidates
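
The reason this trick helps is that $(\lfloor \sqrt{kn} \rfloor + j)^2 - kn$ is roughly $2j\sqrt{kn}$, which is much smaller than $n$ for small $j$ and $k$, and small residues are more likely to be smooth. The sketch below uses a hypothetical helper `candidates_with_residues` (not in the notebook) to list a few candidates for one value of $k$ together with their squares modulo $n$.

```python
import math


def candidates_with_residues(n, k, j_upper):
    """Return pairs (b, b^2 mod n) for b = isqrt(k*n) + j with j = 1..j_upper."""
    root = math.isqrt(k * n)
    return [(root + j, (root + j) ** 2 % n) for j in range(1, j_upper + 1)]


for b, residue in candidates_with_residues(902831, 1, 3):
    print(b, residue)
```

For $n=902831$ and $k=1$ the candidates start at $951$, and the residues stay in the low thousands.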

#### (c) sublist

We will also need a function that checks whether every element of one Python list is contained in another list.

In [5]:
def sublist(lst1, lst2):
    return all(x in lst2 for x in lst1)

#### (d) generatePrimeFactors

Finally, we need a function that can take a number $N$ and calculate its prime factors, along with their exponents.  
This function works with an accompanying algorithm (the Sieve of Eratosthenes).

In [6]:
# Python 3 program to compute prime factors and their powers using the Sieve of Eratosthenes (found online, with small changes by me -F)
# The sieve is used to find the smallest prime factor of every number up to N.
# For example, if N is 10, s[2] = s[4] = s[6] = s[8] = s[10] = 2, s[3] = s[9] = 3, s[5] = 5, s[7] = 7

def generatePrimeFactors(N):
    # s[i] is going to store smallest prime factor of i.
    s = [0] * (N+1)
    
    # Filling values in s[] using the sieve
    # Create a boolean array "prime[0..n]" and initialize all entries in it as false.
    prime = [False] * (N+1)
    # Initializing smallest factor equal to 2 for all the even numbers
    for i in range(2, N+1, 2):
        s[i] = 2
    # For odd numbers less than equal to n
    for i in range(3, N+1, 2):
        if (prime[i] == False):            
            # s(i) for a prime is the number itself
            s[i] = i
            # For all multiples of current prime number
            for j in range(i, int(N / i) + 1, 2):
                if (prime[i*j] == False):
                    prime[i*j] = True
                    # i is the smallest prime factor for number "i*j".
                    s[i * j] = i
    
    # Current prime factor of N
    curr = s[N]   
    # Power of current prime factor + my own change
    cnt = 1
    curr_set = []
    cnt_set = []
    # Collecting prime factors and their powers
    while (N > 1):
        N //= s[N]
        # N is now N/s[N]. If new N also has smallest prime factor as curr, increment power
        if (curr == s[N]):
            cnt += 1
            continue
        curr_set.append(curr)
        cnt_set.append(cnt)
        # Update current prime factor as s[N] and initializing count as 1.
        curr = s[N]
        cnt = 1
        
    return [curr_set, cnt_set]

### Main functions

#### (a) ci_all

A function that takes as input $n, k_{upper}, j_{upper} \text{ and } y$, and returns the $b_i$ that turn out to be $B$-adapted, the $c_i = b_i^2 \, mod \, n$, and the bases and exponents of the prime factorization of each $c_i$, together with the positions of those bases in $B$.

In [7]:
def ci_all(n,k_upper,j_upper,y):
    base = base_numbers(y)
    bi = bi_testset(n,k_upper,j_upper)
    bi_set = []
    c_set = []
    c_set_base = []
    c_set_exponent = []
    c_set_bases = []
    c_set_exponents = []
    c_set_indexes =[]
    c_set_index =[]

    for i in bi:
        j = i**2
        z = j % n
        [a,b] = generatePrimeFactors(z)
        if sublist(a,base) == True:
            bi_set.append(i)
            c_set_base = a
            c_set_exponent = b
            c_set_bases.append(c_set_base)
            c_set_exponents.append(c_set_exponent)

            for s in range(0,len(c_set_base)):
                e = base.index(c_set_base[s])
                c_set_index.append(e) # careful: index() only returns the first occurrence!
            c_set_indexes.append(c_set_index)
            c_set_index = []

            q=0
            u=0
            c_number=1
            while u < len(c_set_base):
                c_number = c_set_base[u]**c_set_exponent[q] * c_number
                u=u+1
                q=q+1
            c_set.append(c_number)

    return c_set, c_set_bases, c_set_exponents, c_set_indexes, bi_set

#### (b) basics

This function is a "superset" of the previous one.  
It returns everything the previous function returned, plus $|B|+1$ randomly chosen $b_i$ with their corresponding $c_i$ etc.  
Moreover, the set $U$ of the $u_i$ is also created, as defined in the algorithm.

In [8]:
def basics(n,k_upper,j_upper,y):
    base = base_numbers(y)
    (c_set, c_set_bases, c_set_exponents, c_set_indexes, bi_set) = ci_all(n,k_upper,j_upper,y)

    bi_set_list = list(bi_set)
    bi_random = []
    bi_randoms = []
    base = base_numbers(y)
    x = len(base)+1
    if x > len(bi_set_list):
        print("The size of the bi_set is: " + str(len(bi_set_list)) + " which is smaller than " + str(x))
        sys.exit("Not enough numbers inside the set of bi numbers, choose larger y or/and k or/and j.")
    else:
        bi_random = random.sample(bi_set_list, x)

        index_of_random_in_bi_set = []
        for i in bi_random:
            q = bi_set_list.index(i)
            index_of_random_in_bi_set.append(q)

        ci_random = []
        ci_random_bases = []
        ci_random_exponents = []
        u_set =[]
        for m in index_of_random_in_bi_set:

            c_set_exponents_m = c_set_exponents[m]
            c_set_indexes_m = c_set_indexes[m]

            u_m = []
            for i in range(0,len(base)):
                q = 0
                u_m.append(q)

            for t in range(0,len(c_set_indexes_m)):
                w = c_set_exponents_m[t]
                z = c_set_indexes_m[t]
                if w % 2 != 0:
                    u_m[z] = 1

            r0 = c_set[m]
            r1 = c_set_bases[m]
            r2 = c_set_exponents[m]
            r3 = u_m

            ci_random.append(r0)
            ci_random_bases.append(r1)
            ci_random_exponents.append(r2)
            u_set.append(r3)

    return c_set, c_set_bases, c_set_exponents, c_set_indexes, bi_set,  bi_random, ci_random, ci_random_bases, \
ci_random_exponents, u_set
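
The construction of a single vector $u_i$ can be sketched in isolation. The helper `parity_vector` below is hypothetical (the notebook builds the vectors inline inside `basics`); it assumes that the primes of $c_i$ and their exponents are given as two parallel lists, as `generatePrimeFactors` returns them.

```python
def parity_vector(primes_of_c, exponents_of_c, base):
    """Return u with u[j] = exponent of base[j] in c, reduced mod 2."""
    u = [0] * len(base)
    for p, e in zip(primes_of_c, exponents_of_c):
        # Place the parity at the position of p inside the factor base.
        u[base.index(p)] = e % 2
    return u


# 8400 = 2^4 * 3^1 * 5^2 * 7^1 over the base [-1, 2, 3, 5, 7]
print(parity_vector([2, 3, 5, 7], [4, 1, 2, 1], [-1, 2, 3, 5, 7]))
```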

#### (c) indexT

This function will be used after the random $b_i$ are chosen (and all their corresponding values are computed, e.g. $c_i$).  
It creates the index set $T$ as defined in the algorithm, by drawing random index sets until the corresponding $u_i$ sum to zero modulo 2.

In [9]:
def indexT(n,k_upper,j_upper,y,a,b,c,d,e):
    u_set = e

    base = base_numbers(y)

    while True:
        # Draw x random indexes (repetitions are possible)
        index_of_T = []
        x = random.randrange(math.floor(len(u_set)/2),len(u_set))
        v = 0
        while v < x:
            pick = random.randint(0,x)
            index_of_T.append(pick)
            v = v + 1
        # Keep every index only once, so that T is a set
        index_of_T_SET = np.unique(np.array(index_of_T))

        # Sum the u_i over i in T; the sum starts from zero on every attempt
        ut_sum = np.zeros(len(base))
        for i in index_of_T_SET:
            ut_sum = np.add(u_set[i],ut_sum)

        # T is accepted when every coordinate of the sum is even
        if all(j % 2 == 0 for j in ut_sum):
            return index_of_T_SET
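
As an alternative to drawing random index sets, a dependency among the $u_i$ can also be found deterministically with Gaussian elimination over $\mathbb{Z}_2$. The sketch below is a swapped-in technique, not the method used in this notebook, and the name `find_dependency` is hypothetical. Each vector is packed into an integer bitmask, and a second bitmask records which original vectors were combined.

```python
def find_dependency(vectors):
    """Return indexes T with the sum of vectors[i], i in T, equal to 0 mod 2 (or None)."""
    pivots = {}  # highest set bit -> (reduced vector mask, mask of combined indexes)
    for idx, vec in enumerate(vectors):
        vec_mask = 0
        for pos, entry in enumerate(vec):
            if entry % 2:
                vec_mask |= 1 << pos
        combo = 1 << idx
        while vec_mask:
            pivot = vec_mask.bit_length() - 1
            if pivot not in pivots:
                # New independent row: remember it and move on to the next vector.
                pivots[pivot] = (vec_mask, combo)
                break
            # Eliminate the leading bit with the stored row (XOR is addition mod 2).
            p_vec, p_combo = pivots[pivot]
            vec_mask ^= p_vec
            combo ^= p_combo
        else:
            # The vector was reduced to zero: combo encodes a dependency.
            return [i for i in range(len(vectors)) if (combo >> i) & 1]
    return None


print(find_dependency([[1, 0, 1], [0, 1, 1], [1, 1, 0]]))
```

With more vectors than coordinates, as in step 2 of the algorithm, a dependency always exists, so the function cannot return `None` in that situation.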

#### (d) b_final

This function computes the value $b$, mentioned in the 5th step of the algorithm.

In [10]:
def b_final(n,k_upper,j_upper,y,a,b,c,d,e,f):

    b_final = 1
    for i in f:
        b_final = a[i]*b_final

    return b_final

#### (e) gamma_j

This function computes the values $\gamma_j$ ($j=1, \dots, \pi(y)$), also defined in the 5th step.

In [11]:
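def gamma_j(n,k_upper,j_upper,y,a,b,c,d,e,f):
    base = base_numbers(y)
    # Total exponent of every prime p_j of the base over all c_i with i in T
    totals = [0] * (len(base) - 1)

    for r in f:
        # c[r] holds the primes dividing c_r and d[r] their exponents, in the same order,
        # so every exponent is added at the position of its own prime in the base
        for p, exponent in zip(c[r], d[r]):
            totals[base.index(p) - 1] += exponent

    # gamma_j is half of the total exponent of p_j
    gamma_j = [1/2*t for t in totals]
    return gamma_j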
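
The computation of the $\gamma_j$ and of $c$ can be sketched on exponent vectors that are already aligned with the primes of the base. The helper `combine_exponents` is hypothetical and uses integer arithmetic throughout; the two example rows are the exponents of $8400$ and $33600$ over $\{2,3,5,7\}$, chosen purely for illustration.

```python
def combine_exponents(exponent_rows, primes):
    """Return (gammas, c) where gammas[j] is half the column sum and c = prod p_j^gamma_j."""
    totals = [sum(column) for column in zip(*exponent_rows)]
    if any(t % 2 for t in totals):
        raise ValueError("the chosen rows do not sum to an even exponent vector")
    gammas = [t // 2 for t in totals]
    c = 1
    for p, g in zip(primes, gammas):
        c *= p ** g
    return gammas, c


# 8400 = 2^4 * 3 * 5^2 * 7 and 33600 = 2^6 * 3 * 5^2 * 7
print(combine_exponents([[4, 1, 2, 1], [6, 1, 2, 1]], [2, 3, 5, 7]))
```

Working with integers avoids the rounding that can occur when large powers are computed with fractional exponents.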

#### (f) c_final

This final function gives us the value $c$ as output, also defined in the 5th step.

In [12]:
def c_final(n,k_upper,j_upper,y,a,b,c,d,e,f,h):
    base = base_numbers(y)
    c_final = 1
    for p_index in range(1,len(base)):
        p_w = base[p_index]
        g_j = h[p_index-1]
        c_final = c_final * (p_w ** g_j)
    return (c_final)

## Solving the problem

We now create a script that takes as input $n, k_{upper}, j_{upper} \text{ and } y$ and repeats Dixon's algorithm until one iteration yields a non-trivial factor.  
In the script below, $n=902831$ and the other inputs are chosen so that the algorithm runs smoothly.  
However, due to the probabilistic nature of the algorithm, it takes on average 10-15 minutes to finish for a number the size of $902831$. 
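
Because the search is randomized, it is worth checking any reported factor independently. The sketch below uses a hypothetical helper `cofactor` (not part of the notebook) and applies it to the factor $823$ that appears in the recorded output of the run.

```python
def cofactor(n, factor):
    """Return n // factor after checking that factor is a non-trivial divisor of n."""
    if not 1 < factor < n:
        raise ValueError("factor must lie strictly between 1 and n")
    if n % factor != 0:
        raise ValueError("factor does not divide n")
    return n // factor


print(cofactor(902831, 823))
```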

In [15]:
#input
n, k_upper, j_upper, y = 902831, 30, 30, 30
#input 

base = base_numbers(y)
x = True
while x == True:
    (c_set, c_set_bases, c_set_exponents, c_set_indexes, bi_set, bi_random, ci_random, ci_random_bases, ci_random_exponents, \
     u_set) = basics(n,k_upper,j_upper,y)
    p0=0
    counter = 0
    while counter <1000:
        p0 = p0 + 1
        counter = counter + 1
        #print("This is try number " + str(p0) + " of this run.")
        f = indexT(n,k_upper,j_upper,y,bi_random, ci_random, ci_random_bases, ci_random_exponents, u_set)
        g = b_final(n,k_upper,j_upper,y,bi_random, ci_random, ci_random_bases, ci_random_exponents, u_set,f)
        h = gamma_j(n,k_upper,j_upper,y,bi_random, ci_random, ci_random_bases, ci_random_exponents, u_set,f)
        i = c_final(n,k_upper,j_upper,y,bi_random, ci_random, ci_random_bases, ci_random_exponents, u_set,f,h)
        #print("T=", f)
        #print("b=", g)
        #print("set of γ_j:", h)
        #print("c=", i)

        # Parentheses are needed: % binds tighter than + and -
        x1 = ((g - math.floor(i)) % n != 0)
        x2 = ((g + math.floor(i)) % n != 0)
        n1 = math.gcd(g + math.floor(i),n)
        if x1 != True or x2 != True or i - math.floor(i) > 0 or n1 == 1:
            #print("No proper result achieved - Will try again.")
            #print("")
            continue
        else:
            #print("")
            #print("")
            #print("")
            #print("")
            print("This number is a non-trivial factor of " + str(n) + " : " + str(n1))
            print("The number was found using algorithm of Dixon, using the following:")
            print("")
            print("In step 1, we chose y=" + str(y) + " which gave us the base B={p_1,...,p_π(y)}:", base)
            print("")
            print("In step 2, we had to find B-adapted numbers from the set {2,...,n-1}.")
            print("To make the search easier, a common hint is to look for these numbers in a set of the form { floor{sqrt{kn}}+j | k=1,2,...,K and j=1,2,...,J }")
            print("So, using K_upper=" + str(k_upper) + " and J_upper=" + str(j_upper) + " we got the set:", bi_set)
            print("From this set, we chose randomly |B|+1 of them and defined them as b_i, giving us the bi_set: ", bi_random)
            print("")
            print("In step 3, first we defined the numbers c_i = b_i^2 (modn), which gave us the set:", ci_random)
            print("Moreover, each c_i has a unique analysis of the form  (-1)^0 * p_1^a_i1 * ... * p_π(y)^a_iπ(y)  , from which we need the exponents and their respective bases.")
            print("So, we made a list of the exponents of each c_i:", ci_random_exponents)
            print("and a list of the respective bases for these exponents (all others have exponent 0):", ci_random_bases)
            print("From them we created the set of all vectors u_i=(u_i0, ... ,u_iπ(y)), which is:", u_set)
            print("")
            print("In step 4, we found the set of indexes T=", f)
            print("")
            print("In step 5, we found b=", g)
            print("and the set of γ_j", h)
            print("which gave us c=", i)
            print("")
            print("In step 6, we verified that n|b+c and n|b-c are not True and calculated the gcd(b+c,n)=" + str(n1) + " which is a non-trivial factor of n=" + str(n))
            x = False #to get out of the outside loop
            counter = 1001 #to get out of the inside loop

This number is a non-trivial factor of 902831 : 823
The number was found using algorithm of Dixon, using the following:

In step 1, we chose y=30 which gave us the base B={p_1,...,p_π(y)}: [-1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

In step 2, we had to find B-adapted numbers from the set {2,...,n-1}.
To make the search easier, a common hint is to look for these numbers in a set of the form { floor{sqrt{kn}}+j | k=1,2,...,K and j=1,2,...,J }
So, using K_upper=30 and J_upper=30 we got the set: [4142, 2125, 4250, 4361, 4374, 2331, 2344, 2515, 4662, 4663, 2689, 2693, 4941, 4950, 5030, 3005, 3010, 5122, 3154, 3294, 3300, 3442, 3557, 3577, 3578, 3681, 1647, 1650, 3922, 4040]
From this set, we chose randomly |B|+1 of them and defined them as b_i, giving us the bi_set:  [4941, 4662, 5122, 3557, 3300, 5030, 3154, 3681, 2125, 2693, 3922, 4142]

In step 3, first we defined the numbers c_i = b_i^2 (modn), which gave us the set: [37044, 66300, 52785, 12615, 56028, 21632, 16575, 7296, 1470, 29601, 33