# Lecture 14: Number Theory/Cryptography (2 of ...?)

### Please note: This lecture will be recorded and made available for viewing online. If you do not wish to be recorded, please adjust your camera settings accordingly. 

# Reminders/Announcements:
- Assignment 4 has been collected. Assignment 5 out soon. (has been pushed)
- Final Project Topics are available. Take a look!
- Halfway point! https://forms.gle/fV8MiyZ922fJkXHY7

## Last Time...
Recall last time we discussed the *multiplicative order* of elements mod $N$. Namely, if $a$ is an integer coprime to $N$, then the multiplicative order of $a$ mod $N$ is the smallest positive integer $d$ with
$$
a^d \equiv 1 \mod N.
$$

In [0]:
N = 53
a = 4
gcd(a,N)

In [0]:
mod(a,N).multiplicative_order()

In [0]:
power_mod(a,26,N)

Euler's Theorem told us that $d$ is always bounded above by $\phi(N)$, where $\phi$ is the Euler totient function. In fact, $d$ will always divide $\phi(N)$.

In [0]:
euler_phi(N)

In [0]:
52%26
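To see what the built-in functions above are doing, here is a minimal plain-Python sketch. The helper names `order_naive` and `phi_naive` are made up for illustration (they are not Sage functions), and both are deliberately slow brute-force versions.

```python
from math import gcd

def order_naive(a, n):
    """Smallest d >= 1 with a^d = 1 mod n. Assumes n >= 2 and gcd(a, n) == 1."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    d, power = 1, a % n
    while power != 1:
        power = (power * a) % n   # power is now a^(d+1) mod n
        d += 1
    return d

def phi_naive(n):
    """Euler's totient by brute force: count k in 1..n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

d = order_naive(4, 53)
print(d, phi_naive(53), phi_naive(53) % d)   # the last value is 0 exactly when d divides phi(53)
```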

Finally, we learned that if $p$ is prime, there *is always* a number $a$ with multiplicative order mod $p$ equal to $\phi(p)$. Such an integer is called a *primitive root* mod $p$.

In [0]:
p = 19
euler_phi(p)

In [0]:
primitive_root(p)

In [0]:
for i in range(19):
    print(power_mod(2,i,19))

## ***** Participation Check ***************************

In the code cell below, find and display *all of the primitive roots* mod $53$.

In [0]:
# your code here

In the code cell below, compute $\phi(52)$, where $\phi$ is Euler's totient function. How does this relate to the answer in the previous part?

In [0]:
# your code here
euler_phi(52)

General fact: If $p$ is prime, there are $\phi(p-1)$ primitive roots mod $p$.
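One way to check this fact by brute force for $p = 53$ is sketched below in plain Python. The helpers `order_mod`, `primitive_roots`, and `phi_naive` are hypothetical names chosen for this example, and the approach is only sensible for small primes.

```python
from math import gcd

def order_mod(a, p):
    """Multiplicative order of a mod p, assuming p does not divide a."""
    d, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        d += 1
    return d

def primitive_roots(p):
    """All a in 1..p-1 whose order mod the prime p equals p - 1."""
    return [a for a in range(1, p) if order_mod(a, p) == p - 1]

def phi_naive(n):
    """Euler's totient by brute force."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

roots = primitive_roots(53)
print(roots)
print(len(roots), phi_naive(52))   # the two counts should agree
```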

## ***********************************************************

## This Time...
We want to start talking about "difficult" things from the lens of computation. The reason is that "difficult" things will often directly lead to cryptographic protocols. Our end goal: how can two users communicate securely through public channels? 

If they are allowed to conspire beforehand, this is very easy to imagine. People meet in secret and agree on a method to encode and decode their messages; for instance, a "one-time pad" is a common way of doing this (or at least, it *used* to be a common way of doing this):

In [0]:
binaryMsg = '01110100 01101000 01101001 01110011 00100000 01101001 01110011 00100000 01101101 01111001 00100000 01110011 01100101 01100011 01110010 01100101 01110100 00100000 01101101 01100101 01110011 01110011 01100001 01100111 01100101 00100001'

chars = binaryMsg.split()
chars = [chr(int(char,2)) for char in chars]
print(''.join(chars))

In [0]:
import random
myPad = [random.randint(0,1) for char in binaryMsg if char in ['0','1']]   #This is our shared secret key
show(myPad[0:10])

In [0]:
encodedMsg = ''
i = 0
for char in binaryMsg:
    if char == ' ':
        encodedMsg += char
    else:
        encodedMsg += str((int(char)+myPad[i])%2)
        i+=1

In [0]:
encodedMsg

In [0]:
binaryMsg

In [0]:
chars = encodedMsg.split()
chars = [chr(int(char,2)) for char in chars]
print(''.join(chars))

In [0]:
decodedMsg = ''
i = 0
for char in encodedMsg:
    if char == ' ':
        decodedMsg += char
    else:
        decodedMsg += str((int(char)+myPad[i])%2)
        i+=1

In [0]:
chars = decodedMsg.split()
chars = [chr(int(char,2)) for char in chars]
print(''.join(chars))

A one-time pad is effectively impossible to break, as long as:
- The key is truly random
- The key is kept secret
- The key is as long as the plaintext
- The key is never reused.
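The same idea can be written more compactly on bytes rather than on strings of `0`s and `1`s. This is a minimal sketch in plain Python, assuming the standard-library `secrets` module as the source of randomness; the function name `otp_xor` is made up for this example. (In a Sage cell, `^` means exponentiation, so XOR would be written `^^` there.)

```python
import secrets

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with a pad of the same length. The same call encrypts and decrypts."""
    if len(pad) != len(data):
        raise ValueError("the pad must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, pad))

message = b"this is my secret message!"
pad = secrets.token_bytes(len(message))   # random key, as long as the plaintext
ciphertext = otp_xor(message, pad)
recovered = otp_xor(ciphertext, pad)      # adding the pad twice cancels it out

print(ciphertext)
print(recovered)
```

Adding the pad mod $2$ is the same as XOR, which is why applying the pad a second time recovers the message.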

What if those individuals had never met before, so they couldn't even agree on a secret code or password to use?

This seems impossible! To communicate securely they need some sort of key, but how can they share the key without first being able to communicate securely?!

In fact, it's very easy to imagine how to do this:

![](crypto2.png)

![](crypto1.png)

In other words: 
- Alice encrypts. 
- Then Bob doubly encrypts. 
- Then Alice partially decrypts. 
- Then Bob fully decrypts. 

At every stage, Eve cannot unlock the box to read what's inside. This is a depiction of *asymmetric key cryptography*. What's more impressive is that we can do even better than this: we can communicate publicly and actually create a shared key, without any eavesdropper learning it.
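The locked-box picture can be imitated numerically if "locking" means raising to a secret power modulo a public prime, since such locks can be removed in any order. The sketch below is only an illustration under that assumption (it is not taken from the figures, and it is not a vetted protocol); the prime `P` and the names `make_lock` and `three_pass` are chosen for this example. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
from math import gcd
import secrets

P = 2**61 - 1   # a public prime (a Mersenne prime), assumed for this example

def make_lock(p):
    """Return (lock, unlock) with lock * unlock = 1 mod (p - 1)."""
    while True:
        lock = secrets.randbelow(p - 3) + 2        # random integer in 2..p-2
        if gcd(lock, p - 1) == 1:                  # invertible mod p - 1
            return lock, pow(lock, -1, p - 1)

def three_pass(message, p=P):
    """Send an integer message in 1..p-1 from Alice to Bob using two locks."""
    a_lock, a_unlock = make_lock(p)    # Alice's secret pair
    b_lock, b_unlock = make_lock(p)    # Bob's secret pair
    box1 = pow(message, a_lock, p)     # Alice encrypts
    box2 = pow(box1, b_lock, p)        # Bob doubly encrypts
    box3 = pow(box2, a_unlock, p)      # Alice partially decrypts
    return pow(box3, b_unlock, p)      # Bob fully decrypts

print(three_pass(123456789))
```

Only `box1`, `box2`, and `box3` travel over the public channel. Removing a lock works because $m^{k(p-1)+1} \equiv m \mod p$.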

## Discrete Logarithms

Recall the standard logarithm and exponential functions from calculus:

In [0]:
plot(log(x), (x, 0, 7))+plot(exp(x), (x, -6, 2), color = 'green')+plot(x, (-6,6), color = 'red')

In [0]:
a = exp(7.53)
RR(log(a))

In [0]:
b = 10^15
RR(log(b))

In [0]:
RR(log(b))/RR(log(10))

There are many fast algorithms for approximating $\log(a)$ for any constant $a$ (for example, there are power series approximations which can be computed fairly easily).
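As one example of such a power series, for $a > 0$ we have $\log(a) = 2\sum_{k \ge 0} \frac{y^{2k+1}}{2k+1}$ with $y = \frac{a-1}{a+1}$. Here is a rough plain-Python sketch; the function name `log_series` and the default number of terms are choices made for this example, and the series converges slowly when $a$ is very large or very close to $0$.

```python
import math

def log_series(a, terms=50):
    """Approximate the natural log of a > 0 by a truncated power series."""
    if a <= 0:
        raise ValueError("a must be positive")
    y = (a - 1) / (a + 1)          # |y| < 1 for every a > 0
    total, power = 0.0, y          # power holds y^(2k+1)
    for k in range(terms):
        total += power / (2 * k + 1)
        power *= y * y
    return 2 * total

for a in (0.5, 2.0, 5.0):
    print(a, log_series(a), math.log(a))
```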

Given the discussion from Wednesday, what if we tried to extend these notions to modular arithmetic? (By the way: for those with algebra background, this conversation directly generalizes to *finite fields*, but I want to try and avoid using that term.)

Let $p$ be a prime number and let $a$ be a primitive root $\mod p$. We already know exponentiation:

In [0]:
for i in range(0,19):
    print(power_mod(2,i,19))

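We will define the *discrete logarithm of $m$ with respect to $a \mod p$* (for $m$ not divisible by $p$) to be the unique integer $d$ in $\{0,1,\dots,p-2\}$ such that $a^d \equiv m\mod p$. Such a $d$ exists and is unique because $a$ is a primitive root, so its powers hit every nonzero residue mod $p$ exactly once.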
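For a small prime, the definition can be made concrete by tabulating every power of $a$ and reading the table backwards. This is a minimal plain-Python sketch for $p = 19$ and $a = 2$; the helper name `dlog_table` is made up for this example.

```python
def dlog_table(a, p):
    """Map each power a^d mod p to its exponent d, for d in 0..p-2."""
    table = {}
    power = 1                      # a^0
    for d in range(p - 1):
        table[power] = d
        power = (power * a) % p
    return table

table = dlog_table(2, 19)
print(len(table))    # 18 entries, one for each nonzero residue mod 19
print(table[3])      # 13, since 2^13 = 3 mod 19
```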

In [0]:
p = random_prime(2000)
print(p)

In [0]:
a = primitive_root(p)
print(a)

In [0]:
import random
exponent = random.randint(0,p-2)
m = power_mod(a,exponent,p)
print(m)

In [0]:
mod(m,p).log(mod(a,p))

In [0]:
print(exponent)

In [0]:
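d = mod(m,p).log(mod(a,p))   # recompute the log: a hard-coded value such as 544 only matches one particular random p
print(power_mod(a,d,p))      # should print m again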

This looks super easy! What's the big deal?

Thanks for asking. The big deal is that discrete logarithms are *hard* to compute in general! What I mean by "hard" is that there is no known efficient algorithm which can compute discrete logarithms in the general case. It is conjectured that this is an *NP-intermediate* problem.

A quick word on what this (heuristically) means. When we solve problems using a computer, we are interested in the number of operations it takes for our computer to spit out an answer. This is directly related to the *runtime* of the algorithm. 

The runtime of the algorithm is dependent on the size of the input. It's very fast to factor numbers with $3$ digits. It's very slow to factor numbers with $300$ digits. Etc.

What is the size of the input? Well this depends on what your input is. For a number $N$, the size of the input is (roughly) the number of digits required to write that number down. This is basically logarithmic in $N$.

In [0]:
N = 100
print(float(N.log(10)))
N.ndigits()

In [0]:
N = 55124
print(float(N.log(10)))
N.ndigits()

In [0]:
N = 9992934923949
print(float(N.log(10)))
N.ndigits()

Let's try to solve the discrete logarithm problem in the most naive way possible, and then discuss the runtime.

## ***** Participation Check ***************************

In the code cell below, write a function which 
- takes as input a prime number $p$, a primitive root $a\mod p$, and an integer $N$ which is coprime to $p$.
- iterates through the exponents $0,1,2,\dots,p-2$ and, at each step, tests if $a^e \equiv N\mod p$
- once it finds such an exponent $e$, return $e$.

In [0]:
def discreteLog(p,a,N):
    #Your code here
    pass  # placeholder so that the cell runs before you fill it in

## ***********************************************************

What is the runtime of this algorithm? Well, in general we are just iterating through the values $0,1,2,\dots,p-2$ and doing a test at each step. If you implemented this as efficiently as possible, this would take ~$p$ operations in the worst case. We say this algorithm has an $O(p)$ runtime.

But remember! The input size is *logarithmic* in $p$! How does the runtime compare to the input size? If $p$ has $n$ digits, then $n \approx \log_{10}(p)$, and the runtime is about $p \approx 10^n$ operations. Exponential!
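Here is one possible plain-Python sketch of the naive search, instrumented to count how many candidates it tests. The name `naive_dlog` and the choice to return the step count alongside the exponent are made for this example; the primes $19$, $53$, $101$ are used because $2$ is a primitive root modulo each of them.

```python
def naive_dlog(p, a, N):
    """Try e = 0, 1, ..., p-2 until a^e = N mod p. Returns (e, candidates tested)."""
    target = N % p
    power = 1                                  # a^0
    for e in range(p - 1):
        if power == target:
            return e, e + 1
        power = (power * a) % p                # one multiplication per step
    raise ValueError("no exponent found; is a a primitive root mod p?")

# Worst case: the target is a^(p-2), the last exponent tried.
for p in (19, 53, 101):
    e, steps = naive_dlog(p, 2, pow(2, p - 2, p))
    print(p.bit_length(), "bits of input ->", steps, "candidates tested")
```

Each extra bit in $p$ roughly doubles the worst-case number of candidates.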

There are many more algorithms for computing discrete logarithms. None of the known (classical) algorithms runs in time polynomial in $\log(p)$ on general inputs, although some of them are very fast on certain special inputs. A major open question in computational complexity theory is whether or not there exists a *polynomial time algorithm* for computing the discrete logarithm on general inputs.

In [0]:
p = random_prime(10^30)
print(p)

In [0]:
a = primitive_root(p)
print(a)

In [0]:
import random
exponent = random.randint(0,p-2)   # exponents live in 0..p-2, matching the definition of the discrete log
m = power_mod(a,exponent,p)
print(m)

In [0]:
import time
t = time.time()
print(mod(m,p).log(mod(a,p)))
print('ran in roughly: ',time.time()-t,' seconds')

In [0]:
print(exponent)

Practical implementations of Diffie-Hellman-Merkle use primes of 1000 bits or more (2048 bits is a common recommendation today), for which this method would be entirely infeasible.

## Diffie-Hellman-Merkle Key Exchange

Let's use this to our advantage to generate a key for secure communication.

(Historical note: you will often see this simply referred to as the *Diffie-Hellman key exchange*, as the math behind this was originally *published* by Whitfield Diffie and Martin Hellman. But Ralph Merkle was integral to the process. You can read more about this in an interview of Hellman, transcribed here: https://conservancy.umn.edu/bitstream/handle/11299/107353/oh375mh.pdf;jsessionid=0DBC6185AFF7B816D0F1D85C0911D058?sequence=1)

Let's say Alice and Bob want to communicate securely. To do so, they want to establish a key, or a "shared secret" that they can use to encode future messages.

Here is the idea:

- Step 1: Alice and Bob publicly choose a large prime $p$ and a multiplicative generator $g\mod p$.
- Step 2: Alice and Bob independently (and secretly) choose an integer in the range $0,\dots,p-2$. These are called their *private keys*. Alice's will be called $a$ and Bob's will be called $b$.
- Step 3: Alice and Bob publicly transmit $A = g^a\mod p$ and $B = g^b \mod p$.
- Step 4: Alice receives $B$ and computes $B^a \mod p$. Bob receives $A$ and computes $A^b\mod p$. Modulo $p$,
$$
B^a \equiv (g^b)^a \equiv g^{ab} \equiv (g^a)^b\equiv A^b,
$$
so Alice and Bob have created a shared key $K$.

If Eve wanted to break this protocol, she would have to be able to recreate $K$ from $g$, $g^a$, and $g^b$. This is believed to be as difficult as the discrete logarithm problem in general.
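To see why the size of $p$ matters, here is a toy plain-Python sketch of Eve's attack when $p$ is tiny: she solves one discrete logarithm by brute force and then computes the key exactly as Alice would. The values $p = 23$, $g = 5$ and the function names are assumptions chosen for this example.

```python
def brute_force_dlog(g, target, p):
    """Smallest e in 0..p-2 with g^e = target mod p (only feasible for tiny p)."""
    power = 1
    for e in range(p - 1):
        if power == target % p:
            return e
        power = (power * g) % p
    raise ValueError("target is not a power of g")

def eve_recovers_key(p, g, A, B):
    """Eve sees only the public values p, g, A, B."""
    a = brute_force_dlog(g, A, p)   # recover Alice's private key from A = g^a
    return pow(B, a, p)             # then compute K = B^a like Alice does

p_toy, g_toy = 23, 5                # toy public parameters (5 is a primitive root mod 23)
a_toy, b_toy = 4, 3                 # private keys, unknown to Eve
A_toy, B_toy = pow(g_toy, a_toy, p_toy), pow(g_toy, b_toy, p_toy)
K_toy = pow(B_toy, a_toy, p_toy)

print(K_toy, eve_recovers_key(p_toy, g_toy, A_toy, B_toy))
```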

In [0]:
# Public stuff
p = random_prime(2^64)
g = primitive_root(p)

# Alice's private and public keys
a = Integers().random_element(p-1)   # uniform in 0..p-2
A = power_mod(g,a,p)

# Bob's private and public keys
b = Integers().random_element(p-1)   # uniform in 0..p-2
B = power_mod(g,b,p)

# Alice computes the shared secret
K_alice = power_mod(B,a,p)

# Bob computes the shared secret
K_bob = power_mod(A,b,p)

# Finally, check that they are the same
print(K_alice == K_bob)

In [0]:
K_alice

In [0]:
K_bob

## Weak Primes

The Diffie-Hellman-Merkle key exchange is a very good protocol in general. That doesn't mean you can apply the method blindly, though. Here is an example of a terrible prime number:

In [0]:
p = 1298074214633668968809363113301611

Why is it terrible? Well, look at the factorization of $p-1$:

In [0]:
factor(p-1)

Whenever $p-1 = q_1*q_2$ for $q_1, q_2$ relatively prime factors of size $\approx \sqrt{p}$, the following happens. 

Recall Euler's Theorem: $x^{p-1}\mod p = 1$ for any $x$ not divisible by $p$. Thus $(x^{q_1})^{q_2}\mod p = 1$ for any such $x$. That is, if we only look at elements which are $q_1$-powers of something, their order divides $q_2$. An analogous thing happens if we switch $q_1$ and $q_2$.

Suppose we are trying to solve the mod $p$ discrete log problem for $A = g^a$, i.e. we want to recover $a$.

The idea is to recover $a\mod q_1$ and $a\mod q_2$, from which we can use the Chinese Remainder Theorem to recover $a$.
- Find the discrete logarithm of $A^{q_1}$ with respect to $g^{q_1}$, i.e. $a_2$ such that $g^{a_2*q_1} = A^{q_1}$. This implies that $a_2*q_1 \equiv a*q_1\mod p-1$, i.e. $a_2\equiv a\mod q_2$.
- Find the discrete logarithm of $A^{q_2}$ with respect to $g^{q_2}$, i.e. $a_1$ such that $g^{a_1*q_2} = A^{q_2}$. This implies that $a_1*q_2 \equiv a*q_2\mod p-1$, i.e. $a_1\equiv a\mod q_1$.
- Compute $a = CRT(a_1, a_2, q_1, q_2)$.
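The three steps above can be tried out on a small prime. The sketch below is plain Python and uses $p = 9901$, where $p - 1 = 99 \cdot 100$ with $\gcd(99, 100) = 1$; this prime, the secret exponent, and all helper names are assumptions made for illustration. The two small logarithms are found by brute force, and `pow(x, -1, n)` needs Python 3.8+.

```python
def prime_factors(n):
    """Set of prime factors of n, by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def find_primitive_root(p):
    """Smallest g >= 2 of order p - 1 mod the prime p."""
    rs = prime_factors(p - 1)
    for g in range(2, p):
        if all(pow(g, (p - 1) // r, p) != 1 for r in rs):
            return g
    raise ValueError("no primitive root found")

def small_dlog(base, target, p, order):
    """Brute-force log in the subgroup generated by base, which has the given order."""
    power = 1
    for e in range(order):
        if power == target:
            return e
        power = (power * base) % p
    raise ValueError("target is not in the subgroup")

def crt_pair(r1, m1, r2, m2):
    """The x in 0..m1*m2-1 with x = r1 mod m1 and x = r2 mod m2 (m1, m2 coprime)."""
    t = ((r2 - r1) * pow(m1, -1, m2)) % m2
    return r1 + m1 * t

def split_dlog(p, g, A, q1, q2):
    """Recover a from A = g^a mod p when p - 1 = q1 * q2 with gcd(q1, q2) == 1."""
    a2 = small_dlog(pow(g, q1, p), pow(A, q1, p), p, q2)   # a mod q2
    a1 = small_dlog(pow(g, q2, p), pow(A, q2, p), p, q1)   # a mod q1
    return crt_pair(a1, q1, a2, q2)

p_toy, q1_toy, q2_toy = 9901, 99, 100
g_toy = find_primitive_root(p_toy)
secret = 7777
recovered = split_dlog(p_toy, g_toy, pow(g_toy, secret, p_toy), q1_toy, q2_toy)
print(recovered == secret)
```

The two brute-force searches test at most $q_1 + q_2 = 199$ candidates in total, instead of up to $p - 1 = 9900$ for a direct search.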

In [0]:
q1 = 2 * 3 * 5 * 2487977 * 482705387
q2 = 36028797018963913

In [0]:
exponent = 983902092654374580967281794418725
g = primitive_root(p)
print(g)
print(power_mod(g,exponent,p))

In [0]:
A = power_mod(g,exponent,p)

A1 = power_mod(A,q1,p)
A2 = power_mod(A,q2,p)

g1 = power_mod(g,q1,p)
g2 = power_mod(g,q2,p)

In [0]:
import time
t = time.time()
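a1 = mod(A1,p).log(mod(g1,p))   # g1 = g^q1 has order q2, so a1 is the exponent mod q2 (called a_2 in the text above)
a2 = mod(A2,p).log(mod(g2,p))   # g2 = g^q2 has order q1, so a2 is the exponent mod q1 (called a_1 in the text above)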
print(time.time()-t)

In [0]:
print(a1)
print(a2)

In [0]:
crt([a2,a1],[q1,q2])   # a2 is the residue mod q1 and a1 is the residue mod q2

In [0]:
exponent

In [0]:
import time
t = time.time()
print(mod(A,p).log(mod(g,p)))
print(time.time()-t)

## Next Time: Baby Step Giant Step, Elliptic Curves