<div style="text-align: right">Paul Novaes<br>July 2018</div> 

# Big Integers

The goal of this notebook is to show how to perform the 4 basic arithmetic operations on big integers.

By big integer, we mean an arbitrary nonnegative integer, with no limit on its size, for example with 1,000,000 digits.

Though Python already supports big integers, the following algorithms are interesting in themselves and can be used in other languages and environments.

## Big Integer Representation

We will represent big integers as arrays of decimal digits, most significant digit first. For example, 123 will be represented by $[1, 2, 3]$.

The following functions will be used later.

__Conversions between int and big integer__

In [1]:
# Returns a big integer from an int.
def big_integer_from_int(n):
    assert(isinstance(n, int) and n >= 0)
    if n == 0: return [0]
    big_integer = []
    while n > 0:
        big_integer.append(n % 10)
        n = n // 10
    return big_integer[::-1]

# Returns an int from a big_integer.
def big_integer_to_int(n):
    result = 0
    # Digits are stored most significant first, so accumulate left to right.
    for d in n:
        result = result*10 + d
    return result
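
In a language without native big integers, the input usually arrives as a decimal string rather than as an int. Here is a minimal sketch of that conversion. The helpers `big_integer_from_string` and `big_integer_to_string` are hypothetical names and are not part of this notebook.

```python
# Hypothetical helpers (not part of the original notebook): convert between
# a decimal string and the digit-list representation used here.
def big_integer_from_string(s):
    assert s and all(c in '0123456789' for c in s), 'expected decimal digits'
    digits = [int(c) for c in s]
    # Drop leading zeroes but always keep at least one digit.
    first = 0
    while first < len(digits) - 1 and digits[first] == 0:
        first += 1
    return digits[first:]

def big_integer_to_string(n):
    return ''.join(str(d) for d in n)

print(big_integer_from_string('000123'))   # [1, 2, 3]
print(big_integer_to_string([1, 2, 3]))    # 123
```

Going through strings keeps the conversion independent of the host language's integer size, which matters once the number no longer fits in a machine word.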

__Easy digit access__

In [2]:
# Gets n_i in n = n_k...n_1n_0.
def big_integer_get_digit(n, i):
    if i >= len(n): return 0
    return n[len(n) - 1 - i]

# Sets n_i to d in n = n_k...n_1n_0.
# Assumes i < len(n); a larger i could silently index the wrong digit.
def big_integer_set_digit(n, i, d):
    assert(0 <= i < len(n))
    n[len(n) - 1 - i] = d

__Utility functions__

In [3]:
# Trims leading zeroes, from a possibly improper representation of a big integer.
def big_integer_trim(n):
    leading_zeroes = 0
    for i in range(len(n) - 1):
        if n[i] == 0:
            leading_zeroes += 1
        else:
            break
    del n[0:leading_zeroes]
    big_integer_assert(n)
    
# Asserts n is a proper representation of a big integer.
def big_integer_assert(n):
    assert (not(len(n) > 1 and n[0] == 0))
    for d in n:
        is_digit = isinstance(d, int) and d >= 0 and d <= 9
        assert(is_digit)

__Unit Test__

In [4]:
def test_big_integer_representation():
    for i in range(100):
        n = big_integer_from_int(i)
        big_integer_assert(n)
        # Check the round trip for every i, not just the first one.
        if big_integer_to_int(n) != i:
            return False
    return True

def run_test(test):
    if test():
        print(test, 'succeeded')
    else:
        print(test, 'failed')
        assert(False)
        
run_test(test_big_integer_representation)

<function test_big_integer_representation at 0x10aed1950> succeeded


## Addition

We use the standard algorithm. It only assumes that we have an "addition table", that is, that we know how to add 2 individual digits.

In [5]:
# Returns a + b.
def big_integer_add(a, b):
    def add_digits(d1, d2, carry):
        assert(carry <= 1)
        total = d1 + d2 + carry
        # Integer division avoids a round trip through float.
        return total // 10, total % 10
    big_integer_assert(a)
    big_integer_assert(b)
    num_digits = max(len(a), len(b)) + 1
    sum = [0 for i in range(num_digits)]
    # Add pairs of digits starting from the right and set the carry bit appropriately.
    for i in range(num_digits):
        carry, digit_sum = add_digits(big_integer_get_digit(a, i), 
                                      big_integer_get_digit(b, i), 
                                      big_integer_get_digit(sum, i))
        big_integer_set_digit(sum, i, digit_sum)
        if (carry > 0):
            big_integer_set_digit(sum, i + 1, 1)
    big_integer_trim(sum)
    return sum

__Unit Test__

In [6]:
def test_big_integer_add():
    for i in range(200):
        for j in range(200):
            a = big_integer_from_int(i)
            b = big_integer_from_int(j)
            sum = big_integer_add(a, b)
            if i + j != big_integer_to_int(sum):
                return False
    return True
        
run_test(test_big_integer_add)

<function test_big_integer_add at 0x10aef1378> succeeded


__Example__

In [7]:
def twice(a):
    return big_integer_add(a, a)
        
n = 2**1000
a = big_integer_from_int(n)
result = big_integer_to_int(twice(a))

assert(result == n * 2)
print('\n2 *', n, '=', result)


2 * 10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376 = 21430172143725346418968500981200036211228096234110672148875007767407021022498722449863967576313917162551893458351062936503742905713846280871969155149397149607869135549648461970842149210124742283755908364306092949967163882534797535118331087892154125829142392955373084335320859663305248773674411336138752


## Subtraction

Again we use the standard algorithm, and we only assume that we have a "subtraction table" that lets us subtract 2 individual digits.

In [8]:
# Returns a - b.
# Assumes that a >= b.
def big_integer_subtract(a, b):
    def subtract_digits(d1, d2, borrow):
        assert(borrow <= 1)
        diff = d1 - d2 - borrow
        if diff >= 0: return 0, diff
        return 1, diff + 10
    big_integer_assert(a)
    big_integer_assert(b)
    num_digits = len(a) 
    diff = [0 for i in range(num_digits)]
    borrow = 0
    # Subtract pairs of digits starting from the right and set the borrow bit appropriately.
    for i in range(num_digits):
        borrow, digit_diff = subtract_digits(big_integer_get_digit(a, i), 
                                             big_integer_get_digit(b, i), 
                                             borrow)
        big_integer_set_digit(diff, i, digit_diff)
    assert(borrow == 0)
    big_integer_trim(diff)
    return diff

__Unit Test__

In [9]:
def test_big_integer_subtract():
    for i in range(100):
        for j in range(i + 1):
            a = big_integer_from_int(i)
            b = big_integer_from_int(j)
            diff = big_integer_subtract(a, b)
            if i - j != big_integer_to_int(diff):
                return False
    return True

run_test(test_big_integer_subtract)

<function test_big_integer_subtract at 0x10aef18c8> succeeded


__Example__

In [10]:
n = 10**100
a = big_integer_from_int(n)
result = big_integer_to_int(big_integer_subtract(a, [1]))

assert(result == n - 1)
print('\n10**100 - 1 =', result)


10**100 - 1 = 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999


## Multiplication

We essentially use the standard method that involves:
1. multiplying the multiplicand by the digits in the multiplier
2. shifting the partial results appropriately
3. adding the partial results

For the first step we avoid using a "multiplication table". Instead we precompute multiples of the multiplicand by every digit between 0 and 9. This is overkill for small multipliers but it works well here.

In [11]:
# Shifts n 'k' places to the left. That is, multiplies n by 10**k.
def shift_left(n, k):
    num_digits = len(n) + k
    shifted = [0 for i in range(num_digits)]
    for i in range(len(n)):
        big_integer_set_digit(shifted, i + k, big_integer_get_digit(n, i))
    big_integer_trim(shifted)
    return shifted

# Returns a * b.
def big_integer_multiply(a, b):
    product = [0]
    # Precompute multiples of 'a'. This is overkill if 'b' is small but it is
    # useful when 'b' has many digits, and simplifies the algorithm.
    multiples_of_a = [[0] for i in range(10)]
    for i in range(1, 10):
        multiples_of_a[i] = big_integer_add(multiples_of_a[i-1], a)
    # Standard algorithm, where we multiply 'a' by each digit in 'b', shifting the
    # result as necessary.
    for i in range(len(b)):
        b_digit = big_integer_get_digit(b, i)
        product = big_integer_add(product, 
                             shift_left(multiples_of_a[b_digit], i))
    return product  
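
To make the three steps concrete, here is a small trace of the same idea on 123 * 45. It is an illustration only: plain Python ints stand in for digit lists, and `multiply_trace` is a hypothetical helper that is not part of this notebook.

```python
# Illustration only: plain Python ints stand in for digit lists.
def multiply_trace(a, b):
    # Multiples 0*a .. 9*a, built by repeated addition (no multiplication table).
    multiples = [0]
    for _ in range(9):
        multiples.append(multiples[-1] + a)
    partials = []
    shift = 0
    while True:
        digit = b % 10
        # Pick the precomputed multiple and shift it 'shift' places to the left.
        partials.append(multiples[digit] * 10**shift)
        b //= 10
        shift += 1
        if b == 0:
            break
    return partials, sum(partials)

partials, product = multiply_trace(123, 45)
print(partials, product)   # [615, 4920] 5535
```

The partial results are 5 * 123 = 615 and 4 * 123 = 492 shifted one place, and their sum is the product.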

__Unit Test__

In [12]:
def test_big_integer_multiply():
    for i in range(100):
        for j in range(100):
            a = big_integer_from_int(i)
            b = big_integer_from_int(j)
            product = big_integer_multiply(a, b)
            if i * j != big_integer_to_int(product):
                return False
    return True
            
run_test(test_big_integer_multiply)

<function test_big_integer_multiply at 0x10aef1b70> succeeded


__Examples__

In [13]:
def factorial(n):
    result = [1]
    while n != [0]:
        result = big_integer_multiply(result, n)
        n = big_integer_subtract(n, [1])
    return result

def power(n, m):
    result = [1]
    while m != [0]:
        result = big_integer_multiply(result, n)
        m = big_integer_subtract(m, [1])
    return result

import math

result = big_integer_to_int(factorial(big_integer_from_int(100)))
assert(result == math.factorial(100))
print('\n100! =', result)

result = big_integer_to_int(power(big_integer_from_int(2), big_integer_from_int(40)))
assert(result == 2**40)
print('\n1 terabyte =', result)


100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

1 terabyte = 1099511627776


## Division

Standard division is pretty tedious. To compute a/b, we typically consider the first couple of digits of a and b, and try to estimate the first digit d of the quotient. We then compute b * d and hope that the product is neither too small nor too big. Otherwise we try d-1 or d+1.

Our algorithm avoids this guessing game, even if it is a little bit slower.

In [14]:
def big_integer_compare(a, b):
    big_integer_assert(a)
    big_integer_assert(b)
    if len(a) > len(b): return 1
    if len(a) < len(b): return -1    
    num_digits = len(a)    
    for i in range(num_digits):
        if a[i] > b[i]: return 1
        elif a[i] < b[i]: return -1
    return 0

# Returns q, r with a = q*b + r and r < b.
# Assumes that b > 0 (otherwise the loop below would never terminate).
def big_integer_divide(a, b):
    big_integer_assert(a)
    big_integer_assert(b)
    assert(b != [0])
    # At all times a = q*b + r and, at the end, r < b.
    q = [0]
    r = a
    while big_integer_compare(r, b) >= 0:
        # Remove from r, b*10**k, for k as large as possible.
        k = len(r) - len(b)
        if big_integer_compare(shift_left(b, k), r) > 0:
            k -= 1
        q = big_integer_add(shift_left([1], k), q)
        r = big_integer_subtract(r, shift_left(b, k))
    return q, r
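
The loop is easier to follow on a small case. Below is a sketch of the same shift-and-subtract idea for 1234 / 7. It is an illustration only: plain Python ints stand in for digit lists, and `divide_trace` is a hypothetical helper that is not part of this notebook.

```python
# Illustration only: plain Python ints stand in for digit lists.
def divide_trace(a, b):
    assert a >= 0 and b > 0
    q, r = 0, a
    steps = []
    while r >= b:
        # Find the largest k with b * 10**k <= r.
        k = len(str(r)) - len(str(b))
        if b * 10**k > r:
            k -= 1
        # Remove b * 10**k from r and record it in the quotient.
        q += 10**k
        r -= b * 10**k
        steps.append((k, q, r))
    return q, r, steps

q, r, steps = divide_trace(1234, 7)
for k, q_so_far, r_so_far in steps:
    print('k =', k, ' q =', q_so_far, ' r =', r_so_far)
print(q, r)   # 176 2
```

In this trace the loop runs 14 times, which equals 1 + 7 + 6, the sum of the quotient's digits. Each quotient digit therefore costs at most 9 iterations.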

__Example__

In [15]:
def gcd(a, b):
    if b == [0]:
        return a
    q, r = big_integer_divide(a, b)
    return gcd(b, r)

a = big_integer_from_int(2**512*3**256)
b = big_integer_from_int(2**256*3**512)

result = big_integer_to_int(gcd(a, b))
assert(result == 2**256*3**256)
print('\ngcd of 2**512*3**256 and 2**256*3**512 is', result)
    


gcd of 2**512*3**256 and 2**256*3**512 is 16096079122395561512061763913577304064976913480336184419870793713264277662647884811647537309195067118494671396772863385036070451166210844912750004569643271101628982439604781505910612662005364378566656


## Discussion

Overall, that was a bit easier than expected.

There were a few surprises:
* Subtraction is not more difficult than addition. In fact the 2 algorithms are nearly identical.
* We don't need a multiplication table to multiply big integers. We can just use addition. This is not optimal, but still OK from a big-O point of view.
* Division was surprisingly easy. We don't really need to guess each digit and we don't need to know how to multiply. Again this is not optimal, but OK from a big-O point of view.

A few more remarks:
* It would be even easier if we dealt with binary, instead of decimal, integers. But we would still need to convert between the 2 representations.
* Performance could be improved by representing big integers in a larger base, say $10^5$, but the multiplication and division algorithms would become more complicated.
* Addition and subtraction run in linear time, while multiplication and division run in quadratic time, which is standard. The space usage is probably suboptimal though, since every operation allocates new lists for its intermediate results.
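
As a rough illustration of the larger-base remark, here is a sketch of addition on "limbs" in base $10^5$. The name `add_base_limbs` and the layout (most significant limb first, each limb an int between 0 and 99999) are assumptions made for this example and are not part of this notebook.

```python
# Sketch under stated assumptions: limbs are stored most significant first,
# and each limb is an int in [0, BASE).
BASE = 10**5

def add_base_limbs(a, b):
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    # Walk both numbers from the least significant limb.
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += a[i]
            i -= 1
        if j >= 0:
            total += b[j]
            j -= 1
        carry, limb = divmod(total, BASE)
        result.append(limb)
    # Limbs were collected least significant first, so reverse them.
    return result[::-1] or [0]

# 9999999999 + 1 = 10000000000
print(add_base_limbs([99999, 99999], [1]))   # [1, 0, 0]
```

Addition keeps the same shape, with each limb holding five decimal digits. Multiplication is where the extra work appears, because the trick of precomputing ten multiples no longer covers every possible limb value.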