# CyS 431 - HW 1
C1C Jim Wang

13 Jan 2021

## Problem 1

The ciphertext "YXCYIS" was produced by an affine cipher mod 26. We have reason to believe the plaintext starts with “cr”.

### (a) What's the key?
Mapping each letter to its index in the alphabet ($a = 0, b = 1, \dots, z = 25$), "YXCYIS" becomes:

$$ YXCYIS \to 24, 23, 2, 24, 8, 18 $$

That was kind of annoying, so we'll script that and its inverse function.

```python
def alphabet_map(string_or_array):
    """Map a string to letter indices (a = 0, ..., z = 25), or a list of indices back to a lowercase string."""
    if isinstance(string_or_array, str):
        # string -> list of 0-25 indices
        return [ord(ch) - ord('a') for ch in string_or_array.lower()]
    # list of 0-25 indices -> lowercase string
    return "".join(chr(i + ord('a')) for i in string_or_array)
```

```python
alphabet_map("YXCYIS")
```

[24, 23, 2, 24, 8, 18]

Cool, so that's done. However, we now need to derive the cipher keys. We are given that the first two letters are "cr". Thus:
$$ c \to Y\\ r \to X$$
Converting this to numbers:

```python
alphabet_map("cr")
```

[2, 17]

So we now have:
$$ 02 \to 24 \\ 17\to 23$$
and the system:
$$ \begin{align*}24 \equiv 2\alpha + \beta \mod{26}\\ 23 \equiv 17\alpha + \beta \mod{26} \end{align*}$$

Solving:

$$
    \begin{align*}
        24 &\equiv 2\alpha + \beta \mod{26}\\
        23 &\equiv 17\alpha + \beta \mod{26}\\
        \therefore -1 \equiv 25 &\equiv 15 \alpha \mod{26} \quad \text{(second minus first)}
    \end{align*}
$$

Since $15 * 7 \equiv 1 \mod{26}$:

$$
    \begin{align*}
    25 * 7 \equiv 15\alpha * 7 \equiv \alpha \mod{26}\\
    \therefore \alpha \equiv 25 * 7 \equiv 175 \equiv 19\mod{26}
    \end{align*}
$$
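Finding the inverse of 15 by inspection works here, but the extended Euclidean algorithm finds modular inverses systematically. Below is a minimal sketch; `egcd` and `mod_inverse` are illustrative helper names, not part of the original notebook. (From Python 3.8 on, the built-in `pow(a, -1, m)` computes the same inverse, which is what the `affine` function below relies on.)

```python
def egcd(a, b):
    """Extended Euclidean algorithm: return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    # back-substitute: b*x + (a % b)*y == g  =>  a*y + b*(x - (a // b)*y) == g
    return g, y, x - (a // b) * y


def mod_inverse(a, m):
    """Inverse of a mod m; raises ValueError when gcd(a, m) != 1."""
    g, x, _ = egcd(a % m, m)
    if g != 1:
        raise ValueError(f"{a} has no inverse mod {m}")
    return x % m


print(mod_inverse(15, 26))            # 7
print(25 * mod_inverse(15, 26) % 26)  # 19, i.e. alpha
```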

Plugging in our $\alpha$:

$$
    \begin{align*}
        24 &\equiv 2 \cdot 19 + \beta \equiv 38 + \beta \equiv 12 + \beta \mod{26}\\
        \therefore \beta &\equiv 24 - 12 \equiv 12 \mod{26}
    \end{align*}
$$


Thus, $\alpha = 19$ and $\beta = 12$. I don't feel like coding this.
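For reference, the same two-congruence solve can be scripted. This is a sketch under the assumption that the difference of the two known plaintext indices is invertible mod 26 (here $2 - 17 \equiv 11$, which is coprime to 26); otherwise `pow` raises `ValueError`, and the system may have zero or several solutions. `solve_affine_key` is a hypothetical helper name.

```python
def solve_affine_key(p1, c1, p2, c2, m=26):
    """Recover (alpha, beta) from two plaintext/ciphertext index pairs.

    Subtracting c2 = alpha*p2 + beta from c1 = alpha*p1 + beta gives
    c1 - c2 = alpha*(p1 - p2) (mod m); then beta = c1 - alpha*p1 (mod m).
    """
    diff_inv = pow((p1 - p2) % m, -1, m)  # needs gcd(p1 - p2, m) == 1 (Python 3.8+)
    alpha = (c1 - c2) * diff_inv % m
    beta = (c1 - alpha * p1) % m
    return alpha, beta


# c -> Y and r -> X, i.e. 2 -> 24 and 17 -> 23
print(solve_affine_key(2, 24, 17, 23))  # (19, 12)
```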


### (b) What's the message?
We will now decode our message using the affine decryption formula

$$P \equiv \gamma(C-\beta) \mod{26}$$

where $\gamma \alpha \equiv 1 \mod{26}$. With $\alpha = 19$, this gives $\gamma = 11$, since $19 \cdot 11 = 209 = 8 \cdot 26 + 1$.

```python
def affine(text, alpha, beta, decode):
    """Affine cipher mod 26: encrypt with C = alpha*P + beta, decrypt with P = gamma*(C - beta)."""
    if not isinstance(decode, bool):
        raise ValueError("decode must be True or False")

    nums = alphabet_map(text)
    if decode:
        gamma = pow(alpha, -1, 26)  # inverse of alpha mod 26 (Python 3.8+), computed once
        return alphabet_map([gamma * (c - beta) % 26 for c in nums])
    # encryption returns uppercase ciphertext, matching the convention above
    return alphabet_map([(alpha * p + beta) % 26 for p in nums]).upper()
```



```python
print("decoded message is: " + affine("YXCYIS", 19, 12, True))
print("plugging back into the affine: " + affine("crucio", 19, 12, False))
```

decoded message is: crucio
plugging back into the affine: YXCYIS


### (c) What kind of attack is this?
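This might look like a ciphertext-only attack, since we do not have the encryption or decryption machine ($\alpha$ and $\beta$ are not given in the problem statement). However, it is a known-plaintext attack: we know the plaintext begins with "cr", and we used the resulting pairs $c \to Y$ and $r \to X$ to solve for the key.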
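To see what the known plaintext buys us, here is a brute-force sketch over all 312 affine keys (the key-space size counted in Question 2). `keys_matching` is a hypothetical helper name. With no known plaintext every key survives, so a ciphertext-only attacker would have to decrypt with each candidate and judge which output looks like English; the known pair "cr" → "YX" leaves exactly one key.

```python
from math import gcd


def keys_matching(plain, cipher, m=26):
    """Every affine key (alpha, beta) that maps the known plaintext onto the ciphertext."""
    p = [ord(ch) - ord('a') for ch in plain.lower()]
    c = [ord(ch) - ord('a') for ch in cipher.lower()]
    return [
        (a, b)
        for a in range(m) if gcd(a, m) == 1  # only invertible alphas are valid keys
        for b in range(m)
        if all((a * x + b) % m == y for x, y in zip(p, c))
    ]


print(len(keys_matching("", "")))  # 312: nothing known, every key is a candidate
print(keys_matching("cr", "YX"))   # [(19, 12)]
```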

## Question 2
Suppose we encrypt a message with an affine cipher using key K1, then encrypt
the ciphertext with an affine cipher using key K2. Is this double encryption more
secure than just doing a single encryption? Support your answer mathematically.

No. Composing two affine encryptions just produces another affine encryption, so double encryption does not enlarge the effective key space.

Consider the affine cipher. Let $P$, $C_1$, $C_2$ be the plaintext, the ciphertext after one affine encryption, and the ciphertext after two affine encryptions, respectively.

Further, let $\alpha$, $\beta$ be keys for the affine cipher such that $C_1 \equiv \alpha P + \beta \mod{26}$. While $\beta$ can take any of 26 values, $\beta \in \{0,1,2,\dots,25\}$, $\alpha$ is restricted to the natural numbers under 26 such that $\text{gcd}(\alpha, 26) = 1$. Thus, $\alpha$ may take $\phi(26) = \phi(2) \cdot \phi(13) = 1 \cdot 12 = 12$ possible values, and the key space for the affine cipher has only $12 \cdot 26 = 312$ keys.

Let $\gamma$ and $\delta$ be the keys for the second encryption such that $C_2 \equiv \gamma C_1 + \delta \mod{26}$. Substituting for $C_1$:

$$C_2 \equiv \gamma(\alpha P + \beta) + \delta \equiv (\gamma\alpha) P + (\gamma\beta + \delta) \mod{26}$$

This is a single affine encryption with key $(\gamma\alpha, \gamma\beta + \delta)$, reduced mod 26. Since $\gcd(\alpha, 26) = \gcd(\gamma, 26) = 1$, also $\gcd(\gamma\alpha, 26) = 1$, so this is a valid affine key. There are $312 \cdot 312 = 97344$ key pairs $(K_1, K_2)$, but every pair is equivalent to one of the same 312 single-affine keys, so an attacker still only has to search 312 keys (and even 97344 keys would be trivial for a computer to brute-force). Thus, double affine is not more secure.
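The composition argument can also be checked exhaustively. This sketch (the helper names `encrypt_table` and `compose` are illustrative) builds the full substitution table for each of the 312 keys, verifies that encrypting with $K_1$ and then $K_2$ always matches the single affine key $(\gamma\alpha, \gamma\beta + \delta)$ obtained by expanding $\gamma(\alpha P + \beta) + \delta$, and counts how many distinct keys the 97344 pairs actually produce.

```python
from math import gcd

# All valid affine keys mod 26: alpha coprime to 26, beta in 0..25
ALPHAS = [a for a in range(26) if gcd(a, 26) == 1]
KEYS = [(a, b) for a in ALPHAS for b in range(26)]


def encrypt_table(key):
    """Substitution table (tuple of 26 ciphertext indices) induced by an affine key."""
    a, b = key
    return tuple((a * p + b) % 26 for p in range(26))


def compose(k1, k2):
    """Single affine key equivalent to encrypting with k1 and then with k2."""
    (a1, b1), (a2, b2) = k1, k2
    # a2 * (a1 * P + b1) + b2 = (a2 * a1) * P + (a2 * b1 + b2)
    return (a2 * a1 % 26, (a2 * b1 + b2) % 26)


TABLES = {k: encrypt_table(k) for k in KEYS}

# Double-encrypt all 26 letters for every key pair and compare with the composed key
for k1 in KEYS:
    t1 = TABLES[k1]
    for k2 in KEYS:
        t2 = TABLES[k2]
        double = tuple(t2[c] for c in t1)
        assert double == TABLES[compose(k1, k2)]  # a KeyError here would mean an invalid alpha

distinct_double = {compose(k1, k2) for k1 in KEYS for k2 in KEYS}
print(len(KEYS), len(KEYS) ** 2, len(distinct_double))  # 312 97344 312
```

Every one of the 97344 pairs collapses onto one of the original 312 keys, which is exactly why the second layer adds no security.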

## Question 3

Suppose our alphabet has only 3 letters, A, B, and C, which occur in plaintext
with frequency 75%, 15%, 10%, respectively. A message is encrypted with a
Vigenère cipher (mod 3, of course), using a key that is of length 1, 2, or 3 (you don’t
know which). The ciphertext is CBCABAAACA.

We're scripting!

```python
def find_key_size(cipher_text):
    """Guess the Vigenere key length by counting coincidences between the
    ciphertext and cyclic shifts of itself; the shift with the most matches wins."""
    n = len(cipher_text)
    shift_dict = {}  # shift value -> number of coincidences
    for shift_value in range(1, n):
        shift_dict[shift_value] = sum(
            cipher_text[i] == cipher_text[(i + shift_value) % n] for i in range(n)
        )
    # ties go to the smallest shift (dicts preserve insertion order)
    return max(shift_dict, key=shift_dict.get)

print("the most likely key size is: " + str(find_key_size("CBCABAAACA")))
```

the most likely key size is: 2


Using the book's technique (counting coincidences between the ciphertext and its shifts), we find that the most likely key length is 2. We will now conduct frequency analysis on the letters grouped by their index modulo 2.

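```python
from collections import Counter

def mod_freq_analysis(cipher_text):
    """Count letters in each coset: positions j, j + k, j + 2k, ... for key length k."""
    key_length = find_key_size(cipher_text)
    return [Counter(cipher_text[j::key_length]) for j in range(key_length)]
```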
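A minimal sketch of how the per-coset frequency analysis could finish, assuming the key length of 2 reported by `find_key_size` is correct. For each coset (letters at positions $j, j+2, j+4, \dots$) it compares the observed letter frequencies against the plaintext frequencies (75%, 15%, 10%) shifted by each candidate key letter, and keeps the shift with the largest dot product. `recover_key` and `vigenere_decrypt` are hypothetical names, and the dot-product scoring is the standard frequency-matching heuristic rather than anything stated in this assignment.

```python
# Plaintext letter frequencies given in the problem, in the order A, B, C
FREQS = [0.75, 0.15, 0.10]
LETTERS = "ABC"


def recover_key(cipher_text, key_length):
    """Pick each Vigenere shift by matching coset frequencies to shifted FREQS."""
    key = []
    for j in range(key_length):
        group = cipher_text[j::key_length]  # letters at positions j, j + k, j + 2k, ...
        w = [group.count(ch) / len(group) for ch in LETTERS]
        # If the shift is k, ciphertext letter i comes from plaintext letter (i - k) mod 3
        scores = [sum(w[i] * FREQS[(i - k) % 3] for i in range(3)) for k in range(3)]
        key.append(scores.index(max(scores)))
    return key


def vigenere_decrypt(cipher_text, key):
    """Undo the repeating shifts mod 3."""
    return "".join(
        LETTERS[(LETTERS.index(ch) - key[i % len(key)]) % 3]
        for i, ch in enumerate(cipher_text)
    )


key = recover_key("CBCABAAACA", 2)
print(key, "".join(LETTERS[k] for k in key))  # [2, 0] CA
print(vigenere_decrypt("CBCABAAACA", key))    # ABAACABAAA
```

Under these assumptions the key comes out as CA (shifts 2 and 0), and the decryption ABAACABAAA has letter frequencies of 70% A, 20% B, 10% C, close to the stated plaintext distribution.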
    

## Question 9

Write a small program that loads in a text file of any size and then prints the
frequency (as a percentage) of each character (‘a’..’z’). All characters should be made
lowercase for counting purposes. Ignore punctuation, spaces, etc.

```python
import re

# Read the whole file and drop newlines
with open('testFiles/hw1.txt', 'r') as file:
    raw_string = file.read().replace('\n', '')

# Drop spaces, lowercase everything, then strip punctuation
raw_string = raw_string.replace(' ', '').lower()
raw_string = re.sub(r'[^\w\s]', '', raw_string)


def frequency(txt, sign):
    """Count how many times the character `sign` appears in `txt`."""
    return txt.count(sign)


value_dict = {}
for s in 'abcdefghijklmnopqrstuvwxyz':
    value_dict[s] = 100 * frequency(raw_string, s) / len(raw_string)

# Sort letters from most to least frequent
value_dict = dict(sorted(value_dict.items(), key=lambda item: item[1], reverse=True))

for key, val in value_dict.items():
    print("'" + key + "' - " + str(round(val, 2)) + '%')
```


'e' - 11.96%
't' - 9.25%
'a' - 7.58%
'i' - 6.95%
'n' - 6.41%
'o' - 6.18%
'h' - 5.87%
'r' - 5.46%
's' - 5.01%
'c' - 3.84%
'l' - 3.75%
'y' - 3.02%
'p' - 2.84%
'u' - 2.66%
'd' - 2.08%
'f' - 1.99%
'g' - 1.99%
'w' - 1.85%
'b' - 1.67%
'm' - 1.67%
'k' - 1.08%
'v' - 0.63%
'x' - 0.63%
'q' - 0.23%
'z' - 0.18%
'j' - 0.09%
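The printed percentages sum to roughly 95% rather than 100%. A likely cause is that `[^\w\s]` keeps digits, underscores, and whitespace other than spaces and newlines (such as tabs or `\r`), and `\w` also matches non-ASCII letters, so those characters still count toward `len(raw_string)`. Below is a sketch that counts only `a`-`z` and streams the file line by line so it works for files of any size; `count_letters` and `letter_percentages` are illustrative names, and the command-line usage shown in the comment is an assumption.

```python
import sys
from collections import Counter
from string import ascii_lowercase


def count_letters(lines):
    """Count occurrences of a-z (case-insensitive) across an iterable of strings."""
    counts = Counter()
    for line in lines:
        counts.update(ch for ch in line.lower() if ch in ascii_lowercase)
    return counts


def letter_percentages(counts):
    """Turn letter counts into percentages of all a-z letters seen."""
    total = sum(counts.values())
    return {ch: (100 * counts[ch] / total if total else 0.0) for ch in ascii_lowercase}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python letter_freq.py testFiles/hw1.txt
    with open(sys.argv[1], encoding="utf-8") as f:  # iterating a file reads one line at a time
        percents = letter_percentages(count_letters(f))
    for ch, pct in sorted(percents.items(), key=lambda kv: kv[1], reverse=True):
        print(f"'{ch}' - {pct:.2f}%")
```

Because the denominator is the number of letters rather than the length of the filtered string, these percentages always sum to 100% (up to rounding).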
