# Problem 1: Affine Cipher

> Affine cipher is a type of monoalphabetic substitution cipher that uses modular arithmetic to encrypt the letters of a message. The mathematical module formula is c = ap + b mod n where input p is plaintext, the output c is the ciphertext, n is a modular integer, a and b are non-negative integers less than n, a and n are relatively prime (to do decryption). We assume all the letters are encoded to unique integers.

## 1a

> Describe affine cipher in more detail

For the affine cipher, each letter of the alphabet maps to an integer. For example, A is 0, B is 1, C is 2, …, and Z is 25, so n = 26. The formula c = ap + b mod n is then applied to each letter individually to produce the ciphertext letter, and this repeats for every letter of the plaintext until the message is complete. The cipher key is the pair (a, b), where a must be chosen so that a and n have no common factors other than 1 (they are coprime). This guarantees that a has a modular multiplicative inverse, which is needed to decrypt each ciphertext letter.

Encryption example:

To encrypt the letter “S” using the formula c = ap + b mod n and encryption key of (3,7), we would follow these steps:

Find the integer equivalent of the plaintext letter: The integer value of the letter S is 18. 

The value of a = 3, b = 7, and n = 26. 

Now, apply the formula, and we get 3 x 18 + 7 = 61. 

61 mod 26 = 9 (when you divide 61 by 26, you get 2 and a remainder of 9.)

9 is the integer equivalent of the letter J. 
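
The steps above can be checked with a few lines of Python. This is a minimal sketch that assumes the A = 0, …, Z = 25 encoding described earlier; the helper name `encrypt_letter` is made up for illustration.

```python
import string

ALPHABET = string.ascii_uppercase  # assumed encoding: A=0, B=1, ..., Z=25

def encrypt_letter(letter: str, a: int, b: int) -> str:
    """Apply c = (a*p + b) mod n to a single uppercase letter."""
    n = len(ALPHABET)
    p = ALPHABET.index(letter)      # S -> 18
    return ALPHABET[(a * p + b) % n]

print(encrypt_letter("S", 3, 7))    # (3*18 + 7) mod 26 = 9, which is J
```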

To decrypt the message, the inverse operation is performed. The formula for decryption is p = a^-1 ( c - b ) mod n, where a^-1 is the modular multiplicative inverse of a modulo n. For instance, if a*x = 1 (mod 26), then x is the inverse of a modulo 26, and vice versa.

To decrypt the letter J back to S (while knowing the encryption key, which is 3 and 7), we would do the following:

The formula for decryption is p = a^-1 ( c - b ) mod n.

We know that c = 9 (the integer equivalent of J) and b = 7.

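Calculate the modular multiplicative inverse of a mod n, that is, the value a^-1 for which (3 * a^-1) mod 26 = 1.

One way to find it is to look for a number that is 1 more than a multiple of 26 and is also divisible by 3. The first such number is 27, and 27 divided by 3 gives 9, so a^-1 = 9.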

Check: (3*9) mod 26 = 1.

Now, plug the numbers into the decryption formula with a^-1 = 9, c = 9, and b = 7: p = 9 * (9 - 7) mod 26.

p = 9 * 2 mod 26

p = 18 mod 26 = 18

The letter equivalent of 18 is S. 
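
The decryption walk-through can be reproduced the same way. The sketch below assumes Python 3.8 or later, where the built-in `pow(a, -1, n)` returns the modular inverse; `decrypt_letter` is a hypothetical helper name, not part of the original solution.

```python
import string

ALPHABET = string.ascii_uppercase  # assumed encoding: A=0, B=1, ..., Z=25

def decrypt_letter(letter: str, a: int, b: int) -> str:
    """Apply p = a^-1 * (c - b) mod n to a single uppercase letter."""
    n = len(ALPHABET)
    a_inv = pow(a, -1, n)           # raises ValueError if a and n are not coprime
    c = ALPHABET.index(letter)      # J -> 9
    return ALPHABET[(a_inv * (c - b)) % n]

print(pow(3, -1, 26))               # 9
print(decrypt_letter("J", 3, 7))    # S
```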

The affine cipher is a form of symmetric encryption: the same key (a, b) is used for encryption and decryption, so the decryption formula only helps someone who knows the key. When the key is unknown, we can employ cryptanalysis techniques such as frequency analysis, known-plaintext attacks, or brute force to recover the plaintext. These take more work than decrypting with the key, although for n = 26 there are only 12 × 26 = 312 possible keys, so an exhaustive search is well within reach.
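
To give a sense of what brute force involves for n = 26, the sketch below tries every valid key and collects the candidate plaintexts. The function names and the sample message are illustrative assumptions; a reader (or some scoring function) still has to pick the candidate that reads as English.

```python
import math
import string

ALPHABET = string.ascii_uppercase  # assumed encoding: A=0, ..., Z=25
N = len(ALPHABET)

def affine(text: str, a: int, b: int) -> str:
    """Apply x -> (a*x + b) mod N to each letter; other characters pass through."""
    return "".join(
        ALPHABET[(a * ALPHABET.index(ch) + b) % N] if ch in ALPHABET else ch
        for ch in text
    )

def brute_force(ciphertext: str) -> dict:
    """Return {(a, b): candidate plaintext} for every valid key."""
    candidates = {}
    for a in range(1, N):
        if math.gcd(a, N) != 1:
            continue  # a has no inverse mod N, so it is not a valid key
        a_inv = pow(a, -1, N)  # Python 3.8+
        for b in range(N):
            # p = a_inv * (c - b) is itself an affine map of c
            candidates[(a, b)] = affine(ciphertext, a_inv, (-a_inv * b) % N)
    return candidates

secret = affine("ATTACK AT DAWN", 3, 7)
candidates = brute_force(secret)
print(len(candidates))      # 312
print(candidates[(3, 7)])   # ATTACK AT DAWN
```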


## 1b
> What is the size of key space for a fixed integer n? Hint: Use Euler's totient Φ(n)

The key space size for a fixed integer n is determined by the number of possible combinations of a and b. Given a number n, Euler's totient (phi) function Φ(n) gives the number of integers from 1 to n that are relatively prime (coprime) to n. For example, to find Φ(8), we count the integers from 1 to 8 that share no factor greater than 1 with 8: these are 1, 3, 5, and 7. Therefore, Φ(8) = 4.

There are Φ(n) valid choices for a (it must be coprime to n) and n choices for b (any value from 0 to n - 1). The size of the key space for a fixed integer n is therefore Φ(n) * n, where Φ(n) is the phi function and n is the modulus, i.e. the size of the alphabet or character set.

For example, using n = 26 to represent the English alphabet, Φ(26) = 12. 
12 x 26 = 312. 

Therefore, there is a total of 312 possible keys for a fixed integer of n = 26. 
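
The counts above can be verified with a naive totient. This is a sketch for small n only; the names `totient` and `affine_key_space` are assumptions introduced for illustration.

```python
import math

def totient(n: int) -> int:
    """Count the integers in 1..n that are coprime to n (naive O(n) version)."""
    return sum(1 for k in range(1, n + 1) if math.gcd(k, n) == 1)

def affine_key_space(n: int) -> int:
    """Number of (a, b) keys: totient(n) choices for a times n choices for b."""
    return totient(n) * n

print(totient(8))            # 4
print(totient(26))           # 12
print(affine_key_space(26))  # 312
```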


## 1c

> Let us assume the plaintext is made of 26 capital letters only. So, the $n=26$. Given the affine cipher $c = 5p + 9 \mod 26$, what is the ciphertext for the plaintext “CRYPTO IS FUN”. Here we keep space as is

The ciphertext is `TQZGAB XV IFW`

In [35]:
import string
from typing import List, Tuple

Coefficients = Tuple[int, int]
Keyspace = List[str]

def are_relatively_prime(a: int, b: int) -> bool:
    if a == 0 or b == 0:
        return False
    else:
        while b != 0:
            a, b = b, a % b
        return a == 1

def check_assertions(n: int, a: int, b: int):
    if a < 0 or b < 0 or a >= n or b >= n:
        raise ValueError(f'a and b must be non-negative and less than {n}')

    if not are_relatively_prime(a, n):
        raise ValueError('a and n must be relatively prime')

def encrypt(plaintext: str, coefficients: Coefficients, keyspace: Keyspace) -> str:
    n = len(keyspace)
    a, b = coefficients
    check_assertions(n, a, b)

    plaintext_words: List[str] = plaintext.split(' ')
    ciphertext_words: List[str] = []

    for word in plaintext_words:
        # Map each letter to its index, apply c = (a*p + b) mod n, and map back
        indices = [keyspace.index(p) for p in word]
        ciphertext_words.append(
            ''.join(keyspace[(a * idx + b) % n] for idx in indices)
        )

    return ' '.join(ciphertext_words)

def decrypt(ciphertext: str, coefficients: Coefficients, keyspace: Keyspace) -> str:
    n = len(keyspace)
    a, b = coefficients
    check_assertions(n, a, b)
    
    def inverse(a: int, n: int) -> int:
        # Trial search for x with (a * x) mod n == 1; fine for small n.
        # On Python 3.8+, pow(a, -1, n) computes the same value.
        for x in range(1, n):
            if (a * x) % n == 1:
                return x

        raise ValueError(f'No inverse for {a} mod {n}')
    
    ciphertext_words: List[str] = ciphertext.split(' ')
    plaintext_words: List[str] = []
    a_inverse = inverse(a, n)

    for word in ciphertext_words:
        # Map each letter to its index, apply p = a^-1 * (c - b) mod n, and map back
        indices = [keyspace.index(c) for c in word]
        plaintext_words.append(
            ''.join(keyspace[(a_inverse * (idx - b)) % n] for idx in indices)
        )

    return ' '.join(plaintext_words)
    

coefficients: Coefficients = (5, 9)
keyspace: Keyspace = list(string.ascii_uppercase)

c = encrypt("CRYPTO IS FUN", coefficients, keyspace)
assert(c == "TQZGAB XV IFW")
assert("CRYPTO IS FUN" == decrypt(c, coefficients, keyspace))

## 1d

> Eve has the ciphertext “QJKES REOGH GXXRE OXEO”. She magically knows the cipher is an affine cipher and the letter T is encrypted to H and O to E. Recover the decryption function and decipher the message. Students shall solve it manually first and then use code to solve it. They both shall give the same results. The code shall be more general, not just in this case.

In [37]:
ciphertext = 'QJKES REOGH GXXRE OXEO'
n = len(keyspace)

# Find all possible values of a (those relatively prime to len(keyspace)).
# a = 1 is included so that pure shift ciphers are also covered.
def get_possible_a() -> List[int]:
    possible_a = []
    for a in range(n):
        if are_relatively_prime(a, n):
            possible_a.append(a)

    return possible_a

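def get_a_b(first_plain, first_cipher, second_plain, second_cipher) -> Coefficients:
    # Try every valid (a, b) and return the pair that maps both known
    # plaintext letters to their ciphertext letters.
    for a in get_possible_a():
        for b in range(n):
            first = (a * keyspace.index(first_plain) + b) % n
            second = (a * keyspace.index(second_plain) + b) % n
            if first == keyspace.index(first_cipher) and second == keyspace.index(second_cipher):
                return (a, b)

    raise ValueError('No affine key matches the given letter pairs')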

coefficients = get_a_b('T','H','O','E')
decrypted = decrypt(ciphertext, coefficients, keyspace)
print(decrypted)
assert(decrypted == "IFYOU BOWAT ALLBO WLOW")

IFYOU BOWAT ALLBO WLOW
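
For the manual part, the two known pairs give two congruences. With T = 19 → H = 7 and O = 14 → E = 4, we have 19a + b ≡ 7 and 14a + b ≡ 4 (mod 26). Subtracting gives 5a ≡ 3, and since 5^-1 ≡ 21 (mod 26), a ≡ 3 * 21 = 63 ≡ 11. Then b ≡ 4 - 14 * 11 ≡ 6, so the encryption function is c = 11p + 6 mod 26. With 11^-1 ≡ 19 (mod 26), the decryption function is p = 19(c - 6) mod 26, and the message reads "IF YOU BOW AT ALL BOW LOW". The sketch below does the same algebra in code instead of searching over every key; `solve_affine_key` and `affine_decrypt` are hypothetical helper names, and the approach assumes the difference of the two plaintext values is invertible mod 26.

```python
import string

ALPHABET = string.ascii_uppercase  # assumed encoding: A=0, ..., Z=25
N = len(ALPHABET)

def solve_affine_key(p1: str, c1: str, p2: str, c2: str) -> tuple:
    """Solve a*x1 + b = y1 and a*x2 + b = y2 (mod N) for (a, b)."""
    x1, y1 = ALPHABET.index(p1), ALPHABET.index(c1)
    x2, y2 = ALPHABET.index(p2), ALPHABET.index(c2)
    # Subtracting the congruences gives a * (x1 - x2) = (y1 - y2) mod N
    dx_inv = pow((x1 - x2) % N, -1, N)  # ValueError if not invertible
    a = ((y1 - y2) * dx_inv) % N
    b = (y1 - a * x1) % N
    return a, b

def affine_decrypt(ciphertext: str, a: int, b: int) -> str:
    """Apply p = a^-1 * (c - b) mod N to each letter; spaces pass through."""
    a_inv = pow(a, -1, N)
    return "".join(
        ALPHABET[(a_inv * (ALPHABET.index(ch) - b)) % N] if ch in ALPHABET else ch
        for ch in ciphertext
    )

a, b = solve_affine_key("T", "H", "O", "E")
print(a, b)                                            # 11 6
print(affine_decrypt("QJKES REOGH GXXRE OXEO", a, b))  # IFYOU BOWAT ALLBO WLOW
```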


## 1e

> What is the affine formula if we want to include the space and lowercase letters in the encode set?

$$
\begin{aligned}
n &= \lvert \{A, \dots, Z\} \cup \{a, \dots, z\} \cup \{\text{space}\} \rvert = 53 \\
E(p) &= (ap + b) \bmod n \\
D(c) &= a^{-1}(c - b) \bmod n
\end{aligned}
$$
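
A sketch of the same cipher over the 53-symbol set follows. The symbol order (A-Z, then a-z, then space) and the function names `encrypt53` and `decrypt53` are assumptions made for illustration; any fixed order works as long as both parties agree on it. Since 53 is prime, every a from 1 to 52 is a valid choice.

```python
import string

# Assumed symbol order: uppercase, lowercase, then space (53 symbols in total)
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + " "
N = len(ALPHABET)  # 53

def encrypt53(plaintext: str, a: int, b: int) -> str:
    """E(p) = (a*p + b) mod 53, applied to every character including spaces."""
    return "".join(ALPHABET[(a * ALPHABET.index(ch) + b) % N] for ch in plaintext)

def decrypt53(ciphertext: str, a: int, b: int) -> str:
    """D(c) = a^-1 * (c - b) mod 53."""
    a_inv = pow(a, -1, N)  # Python 3.8+
    return "".join(ALPHABET[(a_inv * (ALPHABET.index(ch) - b)) % N] for ch in ciphertext)

message = "Crypto is fun"
secret = encrypt53(message, 5, 9)
print(decrypt53(secret, 5, 9) == message)  # True
```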

# Problem 2: Frequency Analysis

> Alice uses a simple substitution cipher to send her message to Bob. It reads as “TNFOS FOZSW PZLOC GQAOZ WAGQR PJZPN ABCZP QDOGR AMTHA RAXTB AGZJO GMTHA RAVAP ZW”. Space is treated as is. Eve gets the ciphertext and she also heard the word “liberty” appears in the plaintext.

## 2a

> Describe the substitution cipher.

## 2b

> What is the size of key space

The size of the key space of a simple substitution cipher is the number of possible permutations of the alphabet. The English alphabet has 26 letters, so the key space size is 26!, the product of all positive integers from 1 to 26:

26! = 26 × 25 × ... × 3 × 2 × 1

Therefore, the size of the key space for a simple substitution cipher is approximately 4 x 10^26, or about 88 bits.
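
These figures can be checked directly with the standard library; this is only a quick sanity check, not part of the original solution.

```python
import math

keys = math.factorial(26)
print(keys)             # 403291461126605635584000000, about 4.03 x 10^26
print(math.log2(keys))  # roughly 88.4, so a key carries about 88 bits
```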


## 2c

> Use the frequency of English letters as reference to recover the plaintext. We can do it manually. Optionally, we can do it by coding. It is a bit of a challenge. It is doable.

The plaintext is `iknow notwh atcou rseot hersm aytak ebuta sform egive melib ertyo rgive medea th` i.e. _I know not what course others may take, but as for me, give me liberty or give me death_

In [2]:
from typing import List, Tuple

# Ciphertext exactly as given in the problem statement, including the leading T
ciphertext = "TNFOS FOZSW PZLOC GQAOZ WAGQR PJZPN ABCZP QDOGR AMTHA RAXTB AGZJO GMTHA RAVAP ZW"
ciphertext_condensed = ciphertext.replace(' ', '')
known_word = 'LIBERTY'
known_word_chars = set(known_word)
known_word_length = len(known_word)

# Approximate relative frequency (%) of each letter in English text
english_letter_frequency = {
    'E' : 12.0,
    'T' : 9.10,
    'A' : 8.12,
    'O' : 7.68,
    'I' : 7.31,
    'N' : 6.95,
    'S' : 6.28,
    'R' : 6.02,
    'H' : 5.92,
    'D' : 4.32,
    'L' : 3.98,
    'U' : 2.88,
    'C' : 2.71,
    'M' : 2.61,
    'F' : 2.30,
    'Y' : 2.11,
    'W' : 2.09,
    'G' : 2.03,
    'P' : 1.82,
    'B' : 1.49,
    'V' : 1.11,
    'K' : 0.69,
    'X' : 0.17,
    'Q' : 0.11,
    'J' : 0.10,
    'Z' : 0.07
}

english_letter_frequency_sorted: List[Tuple[str, float]] = sorted(english_letter_frequency.items(), key=lambda x: x[1], reverse=True)

# Count how often each ciphertext letter occurs, most frequent first
letter_counts = {
    k: ciphertext.count(k) for k in set(ciphertext_condensed)
}
letter_counts_sorted: List[Tuple[str, int]] = sorted(letter_counts.items(), key=lambda x: x[1], reverse=True)

potential_plaintexts = []

# Find potential slices where 'LIBERTY' could be
for i in range(len(ciphertext_condensed) - known_word_length + 1):
    characters = ciphertext_condensed[i:i+known_word_length]
    unique_characters = set(characters)

    # 'LIBERTY' has seven distinct letters, so the slice must have seven too
    if len(unique_characters) == len(known_word_chars):

        mapping = {k: v for (k, v) in zip(characters, known_word)}

        # Most likely when the two most frequent letters in the ciphertext
        # are the two most frequent letters in the English language
        if mapping.get(letter_counts_sorted[0][0]) == english_letter_frequency_sorted[0][0] and\
            mapping.get(letter_counts_sorted[1][0]) == english_letter_frequency_sorted[1][0]:

            # Show recovered letters in lowercase, unknown ones in uppercase
            output = ciphertext
            for k, v in mapping.items():
                output = output.replace(k, v.lower())

            potential_plaintexts.append(output)

for p in potential_plaintexts:
    print("Potential: ", p)

# Manually solved below, starting from the second candidate:
# ... give me liberty or give me death
candidate = "iNFOS FOtSW PtLOC rQeOt WerQR PytPN ebCtP QDOrR eMiHe Relib ertyO rMiHe ReVeP tW"

candidate = candidate \
    .replace('O', 'o') \
    .replace('M', 'g') \
    .replace('H', 'v') \
    .replace('R', 'm') \
    .replace('V', 'd') \
    .replace('P', 'a') \
    .replace('W', 'h') \
    .replace('Q', 's') \
    .replace('N', 'k') \
    .replace('C', 'u') \
    .replace('L', 'c') \
    .replace('S', 'w') \
    .replace('F', 'n') \
    .replace('D', 'f')

assert(
    # i know not what course others may take but as for me give me liberty or give me death
    candidate == 'iknow notwh atcou rseot hersm aytak ebuta sform egive melib ertyo rgive medea th'
)


Potential:  TNFrS FrtSy PtLrl ibert yeibR PJtPN eBltP bDriR eMTHe ReXTB eitJr iMTHe ReVeP ty
Potential:  iNFOS FOtSW PtLOC rQeOt WerQR PytPN ebCtP QDOrR eMiHe Relib ertyO rMiHe ReVeP tW


## 2d

> (optional) The following message is from a Vigenère cipher with a 3-letter English keyword: “CTMYR DOIBS RESRR RIJYR EBYLD IYMLC CYQXS RRMLQ FSDXF OWFKT CYJRR IQZSM X”. Recover the plaintext.

The plaintext is `SPOON FEEDING IN THE LONG RUN TEACHES US NOTHING BUT THE SHAPE OF THE SPOON` with the key being `KEY`

In [45]:
from itertools import product
from string import ascii_uppercase
from typing import List

ciphertext = "CTMYR DOIBS RESRR RIJYR EBYLD IYMLC CYQXS RRMLQ FSDXF OWFKT CYJRR IQZSM X"
ciphertext_condensed = ciphertext.replace(' ', '')

def decrypt(ciphertext: str, key: str) -> str:
    plaintext: List[str] = []

    for idx, c in enumerate(ciphertext):
        p = (ord(c) - ord(key[idx % len(key)])) % 26
        p += ord('A')
        plaintext.append(chr(p))

    return ''.join(plaintext)

for key in product(ascii_uppercase, repeat=3):
    plaintext = decrypt(ciphertext_condensed, ''.join(key))
    # Keep only candidates containing two very common English words.
    # For this ciphertext, that narrows the 26^3 = 17576 keys down to 10.
    if "THE" in plaintext and "OF" in plaintext:
        print(key, plaintext)

# Manually solved below:
# SPOON FEEDING IN THE LONG RUN TEACHES US NOTHING BUT THE SHAPE OF THE SPOON
assert(
    "SPOONFEEDINGINTHELONGRUNTEACHESUSNOTHINGBUTTHESHAPEOFTHESPOON" == decrypt(ciphertext_condensed, 'KEY')
)


('G', 'D', 'Y') WQOSOFIFDMOGMOTLFLSOGVVNXFAGIEWVSRPTLJNKCUXUHITHEQESGTLFSTPOR
('H', 'E', 'K') VPCRNTHERLNULNHKEZRNUUUBWEOFHSVUGQOHKIBJBIWTVHSVDPSRFHKEGSOCQ
('K', 'E', 'Y') SPOONFEEDINGINTHELONGRUNTEACHESUSNOTHINGBUTTHESHAPEOFTHESPOON
('L', 'N', 'I') RGENEVDVTHEWHEJGVBNEWQLDSVQBYURLIMFJGZDFSKSKXDJXZGUNWJGVIOFEM
('L', 'N', 'L') RGBNESDVQHETHEGGVYNETQLASVNBYRRLFMFGGZAFSHSKUDJUZGRNWGGVFOFBM
('T', 'E', 'Y') JPOFNFVEDZNGZNTYELFNGIUNKEATHEJUSEOTYINXBUKTHVSHRPEFFTYESGOOE
('U', 'P', 'C') IEKECBUTZYCCYCPXTHECCHJJJTWSWAIJODDPXXJWQQJIDUHDQEAEUPXTOFDKD
('U', 'U', 'L') IZBEXSUOQYXTYXGXOYEXTHEAJONSRRIEFDYGXSAWLHJDUUCUQZREPGXOFFYBD
('X', 'Y', 'O') FVYBTPRKNVTQVTDUKVBTQEAXGKKPNOFACAUDUOXTHEGZRRYRNVOBLDUKCCUYA
('Y', 'F', 'H') EOFAMWQDUUMXUMKTDCAMXDTEFDROGVETJZNKTHESALFSYQRYMOVAEKTDJBNFZ
