# XOR decryption

> Each character on a computer is assigned a unique code and the preferred standard is ASCII (American Standard Code for Information Interchange). For example, uppercase A = 65, asterisk (*) = 42, and lowercase k = 107.
>
> A modern encryption method is to take a text file, convert the bytes to ASCII, then XOR each byte with a given value, taken from a secret key. The advantage with the XOR function is that using the same encryption key on the cipher text, restores the plain text; for example, 65 XOR 42 = 107, then 107 XOR 42 = 65.
>
> For unbreakable encryption, the key is the same length as the plain text message, and the key is made up of random bytes. The user would keep the encrypted message and the encryption key in different locations, and without both "halves", it is impossible to decrypt the message.
>
> Unfortunately, this method is impractical for most users, so the modified method is to use a password as a key. If the password is shorter than the message, which is likely, the key is repeated cyclically throughout the message. The balance for this method is using a sufficiently long password key for security, but short enough to be memorable.
>
> Your task has been made easy, as the encryption key consists of three lower case characters. Using p059_cipher.txt (right click and 'Save Link/Target As...'), a file containing the encrypted ASCII codes, and the knowledge that the plain text must contain common English words, decrypt the message and find the sum of the ASCII values in the original text.
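
The repeating-key scheme described in the problem statement can be sketched in a few lines. This is only an illustration: `xor_cycle`, the sample message, and the key `"abc"` are made up here and are not part of the puzzle data.

```julia
# Hypothetical helper (not from the problem statement): XOR each byte of
# `data` with the key byte at the matching position, cycling through the key.
function xor_cycle(data::AbstractVector{UInt8}, key::AbstractVector{UInt8})
    return [b ⊻ key[mod1(i, length(key))] for (i, b) in enumerate(data)]
end

plain  = Vector{UInt8}("attack at dawn")   # made-up sample message
key    = Vector{UInt8}("abc")              # made-up three-letter key
cipher = xor_cycle(plain, key)

@assert 0x41 ⊻ 0x2a == 0x6b                # 65 XOR 42 = 107, as in the statement
@assert xor_cycle(cipher, key) == plain    # the same key restores the plain text
```

Because XOR is its own inverse, the same function serves for both encryption and decryption.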


UTF-8 has supplanted ASCII as “the preferred standard” for character encoding, and it is what Julia uses for strings. UTF-8 was designed to be backward compatible with ASCII, though: every code point below 128 (0x80) is encoded as the same single byte in both. As long as we stay within that range, we don’t have to differentiate between the two standards.
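
A quick check of that compatibility, using the three characters from the problem statement. This is a small sketch; the variable names are illustrative only.

```julia
# For pure-ASCII text, each character is a single UTF-8 byte whose value
# equals the character's code point.
s = "A*k"
@assert isascii(s)

bytes = collect(codeunits(s))      # the raw UTF-8 bytes of the string
codes = [Int(c) for c in s]        # the code point of each character
@assert bytes == codes             # 65, 42, 107 either way

# Outside the ASCII range the two diverge: U+00E9 needs two bytes in UTF-8.
@assert ncodeunits("\u00e9") == 2
```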

In [12]:
using DelimitedFiles
C = readdlm("p059_cipher.txt", ',', UInt8)
extrema(C)

(0x00, 0x5f)

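We’re told that the encryption key is three characters long, so every third byte of the cipher text was encrypted with the same key character. Julia fills arrays column by column, so reshaping the cipher text into three rows puts all the bytes that share a key character in the same row. We can then do frequency analysis on each row separately.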

In [8]:
keysize = 3
C′ = reshape(C, (keysize, cld(size(C)[2], keysize)))

3×485 Array{UInt8,2}:
 0x24  0x00  0x17  0x11  0x04  0x0b  …  0x45  0x00  0x13  0x45  0x08  0x17
 0x16  0x00  0x19  0x58  0x13  0x58     0x19  0x58  0x1d  0x16  0x1a  0x0b
 0x50  0x04  0x13  0x04  0x15  0x16     0x02  0x15  0x1e  0x05  0x15  0x5e
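
To see why the rows line up with the key characters, here is the same reshape applied to a stand-in vector of positions 1 to 9 (made-up data, not the cipher text):

```julia
v = collect(1:9)          # stand-in for byte positions 1..9
M = reshape(v, 3, :)      # 3 rows; the column count is inferred

# Column-major order puts consecutive positions down each column:
#   1 4 7
#   2 5 8
#   3 6 9
@assert M[1, :] == v[1:3:end]   # positions encrypted with key character 1
@assert M[2, :] == v[2:3:end]   # ... key character 2
@assert M[3, :] == v[3:3:end]   # ... key character 3
```

Note that `reshape` requires the length to be an exact multiple of the row count, which holds for the 3×485 array above.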

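Now, for each row, we’ll find the most frequent byte. In ordinary English text the most common character is the space (0x20), so the most frequent cipher byte in each row is probably an encrypted space. Since cipher = plain ⊻ key implies key = cipher ⊻ plain, XORing that byte with 0x20 should recover the key character.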

In [31]:
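for R in eachrow(C′)
    # Count how often each distinct byte occurs in this row
    F = Dict(i => count(isequal(i), R) for i in unique(R))
    m = argmax(F)         # the byte with the highest count
    k = Char(m ⊻ 0x20)    # assume that byte is an encrypted space
    println((m, k))
end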

(0x45, 'e')
(0x58, 'x')
(0x50, 'p')


It’s a good sign that all three results are lowercase letters, as the problem statement says the key characters must be. Now that we have the key, ‘exp’, we can turn our attention to decrypting the cipher text.
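
One way the decryption step might look is sketched below. `decrypt_and_sum` is a hypothetical helper, not code from the original notebook, and it is exercised here on a made-up message rather than the real cipher file.

```julia
# Hypothetical helper: XOR every byte with the cyclically repeated key and
# return the decoded text together with the sum of its byte values.
function decrypt_and_sum(cipher::AbstractArray{UInt8}, key::AbstractString)
    k = codeunits(key)
    plain = [c ⊻ k[mod1(i, length(k))] for (i, c) in enumerate(vec(cipher))]
    # String() takes ownership of its argument, so pass a copy
    return String(copy(plain)), sum(Int, plain)
end

# Round-trip check on a made-up message encrypted with the same key
msg = "An extract of text"
k   = codeunits("exp")
enc = [b ⊻ k[mod1(i, length(k))] for (i, b) in enumerate(codeunits(msg))]

text, total = decrypt_and_sum(enc, "exp")
@assert text == msg
@assert total == sum(Int, codeunits(msg))
```

With the data loaded earlier, the call would presumably be `decrypt_and_sum(C, "exp")`. Reading the decoded text is the real test of whether the key is right.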

In [1]:
0b00100000 ⊻ 0b01000101

0x65

In [2]:
Char(0x65)

'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)

In [3]:
digits(0x65, base=2, pad=8)

8-element Array{Int64,1}:
 1
 0
 1
 0
 0
 1
 1
 0

In [4]:
0b01100101 ⊻ 0b00100000

0x45

In [5]:
digits(0x45, base=2, pad=8)

8-element Array{Int64,1}:
 1
 0
 1
 0
 0
 0
 1
 0