# XOR Encryption

In [None]:
#run this the first time to install the library
%pip install tabulate

XOR (eXclusive OR) is a logical operation just like AND, OR, and NOT, but it's a little harder to grasp intuitively. In C-style programming languages (and in Python) it's written with the caret (`^`).

It can help to think of it as a logical diff: two bits are compared, and the result is 1 if they're different or 0 if they're the same.


In [2]:
from tabulate import tabulate

table = [
    [' A',' B','A^B'],
    [0,0,0^0],
    [0,1,0^1],
    [1,0,1^0],
    [1,1,1^1],
]

print(tabulate(table, headers='firstrow', tablefmt='fancy_grid'))

╒══════╤══════╤═══════╕
│    A │    B │   A^B │
╞══════╪══════╪═══════╡
│    0 │    0 │     0 │
├──────┼──────┼───────┤
│    0 │    1 │     1 │
├──────┼──────┼───────┤
│    1 │    0 │     1 │
├──────┼──────┼───────┤
│    1 │    1 │     0 │
╘══════╧══════╧═══════╛
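
The same operator works on whole integers, comparing them bit by bit. As a small sketch (the two values are just the ASCII codes for `s` and `k`, picked here for illustration), XORing twice with the same value gets you back where you started:

```python
a = 0b01110011   # ASCII 's' (illustrative value)
b = 0b01101011   # ASCII 'k' (illustrative value)

diff = a ^ b                 # each bit is 1 where a and b differ
print(format(diff, '08b'))   # 00011000

# XORing with b a second time undoes the first XOR
print(diff ^ b == a)         # True
```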


## One Time Pads
XOR is useful for encryption when a random series of bytes is used as a secret key, because applying the same key a second time gives back the original value. You can encrypt a message by doing
```
cyphertext=xor(plaintext, key)
```
and then someone with the same series of bytes can decrypt with
```
plaintext=xor(cyphertext, key)
```
  
Assuming the key is truly random, the message cannot be recovered without a copy of the keyfile.

Unfortunately, this key needs to be as long as the message, and it can never be reused, because reuse opens the door to other types of decryption attacks.
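
Here's a minimal sketch of that idea working directly on bytes. The helper name `xor_bytes` and the variable `pad` are made up for this example, and it assumes Python's standard `secrets` module is an acceptable source of random bytes:

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    # hypothetical helper: XOR two equal-length byte strings
    if len(data) != len(key):
        raise ValueError("key must be the same length as the message")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"secret"
pad = secrets.token_bytes(len(message))   # fresh random key, used only once

cyphertext = xor_bytes(message, pad)      # encrypt
print(xor_bytes(cyphertext, pad))         # decrypt: b'secret'
```

The length check is there to make the "same length as the message" rule explicit instead of silently truncating.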

Before we continue, here's a reminder that ASCII characters are stored as bytes, and not every byte represents a printable character.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/ASCII-Table.svg/1200px-ASCII-Table.svg.png" width="60%" />

In [3]:
plaintext = "secret"
row=['char','ord','hex','bin']
table=[row]

for letter in plaintext:                #for each letter
    row=(
        [letter,                        #the character
        ord(letter),                    #the numeric (ASCII) code of that character
        hex(ord(letter)),               #in hex
        '{0:08b}'.format(ord(letter))   #in binary, zero-padded to 8 bits (bin() works too but doesn't pad)
        ])
    table.append(row)

print(tabulate(table, headers='firstrow', tablefmt='fancy_grid'))
# here's our plaintext broken into different encodings to show how it's stored in memory
# try changing the loop from "plaintext" to "key" 

╒════════╤═══════╤═══════╤══════════╕
│ char   │   ord │ hex   │      bin │
╞════════╪═══════╪═══════╪══════════╡
│ s      │   115 │ 0x73  │ 01110011 │
├────────┼───────┼───────┼──────────┤
│ e      │   101 │ 0x65  │ 01100101 │
├────────┼───────┼───────┼──────────┤
│ c      │    99 │ 0x63  │ 01100011 │
├────────┼───────┼───────┼──────────┤
│ r      │   114 │ 0x72  │ 01110010 │
├────────┼───────┼───────┼──────────┤
│ e      │   101 │ 0x65  │ 01100101 │
├────────┼───────┼───────┼──────────┤
│ t      │   116 │ 0x74  │ 01110100 │
╘════════╧═══════╧═══════╧══════════╛


In [4]:
# this function takes two strings and XORs them character by character
# (zip stops at the shorter string, so any extra characters are ignored)
def xor(s1,s2):
    return ''.join(chr(ord(a) ^ ord(b)) for a,b in zip(s1,s2))

#this one repeats the key until it's the proper length
def repeat(s, l):
    return (s*(l//len(s)+1))[:l]

In [5]:
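#here's how you might generate a random key
#note: the random module is fine for a demo but is not cryptographically secure;
#use the secrets module for real keys
import random
securekey =''
for char in range(6):
    securekey+=(chr(random.randint(0,255)))
#str.encode uses UTF-8, so any character above 127 prints as two bytes
print(str.encode(securekey))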

#but just for the demo, here's a less secure way to do it that you might see used in malware or CTFs
key = "key"
#stretch out the key
expanded_key=repeat(key,len(plaintext))
print(expanded_key)

b'H\x0fo5H\xc3\x90'
keykey
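
You may notice that the random key printed as seven bytes even though it has six characters. `str.encode` defaults to UTF-8, which stores any character above 127 as two bytes. A quick sketch, using the last character from that output (the variable name `ch` is just for illustration):

```python
ch = chr(0xD0)                 # one character with code point 208

print(ch.encode())             # b'\xc3\x90' (UTF-8, two bytes)
print(ch.encode('latin-1'))    # b'\xd0'     (one byte per character)
```

If you want exactly one byte per key character, encoding with `'latin-1'` or working with `bytes` from the start avoids the surprise.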


In [6]:
#plaintext^key
cyphertext=xor(plaintext,expanded_key)
#output encrypted message
print(str.encode(cyphertext))

b'\x18\x00\x1a\x19\x00\r'


And that's your encrypted string. Here's how to decrypt it:

In [7]:
print(xor(cyphertext,expanded_key))

secret


Let's take a closer look at what happened there.

In [8]:
table=[]

table.append(list(plaintext))

row=[]
for letter in plaintext:
    row.append('{0:08b}'.format(ord(letter)))   #show the plaintext in binary
table.append(row)

table.append(list(expanded_key))
row=[]
for letter in expanded_key:
    row.append('{0:08b}'.format(ord(letter)))   #show the key in binary
table.append(row)

table.append(list(str.encode(cyphertext)))
row=[]
for letter in cyphertext:
    row.append('{0:08b}'.format(ord(letter)))   #show the XOR encrypted string in binary
table.append(row)

print(tabulate(table, headers='firstrow', tablefmt='fancy_grid'))

╒══════════╤══════════╤══════════╤══════════╤══════════╤══════════╕
│ s        │ e        │ c        │ r        │ e        │ t        │
╞══════════╪══════════╪══════════╪══════════╪══════════╪══════════╡
│ 01110011 │ 01100101 │ 01100011 │ 01110010 │ 01100101 │ 01110100 │
├──────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ k        │ e        │ y        │ k        │ e        │ y        │
├──────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ 01101011 │ 01100101 │ 01111001 │ 01101011 │ 01100101 │ 01111001 │
├──────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ 24       │ 0        │ 26       │ 25       │ 0        │ 13       │
├──────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ 00011000 │ 00000000 │ 00011010 │ 00011001 │ 00000000 │ 00001101 │
╘══════════╧══════════╧══════════╧══════════╧══════════╧══════════╛


You should see that for each bit that matches between the plaintext and the key there's a `0` in the bottom row, and where those bits differ there's a `1`.

It's interesting to notice that if you use the same byte in both your key and your plaintext the output is a zero byte.

Also notice that because we used only lowercase characters (in the range `0110 0001` - `0111 1010`) for both the plaintext and the key, the first three bits match for every character.
This means that every cyphertext byte starts with three `0`s. This shows why using a repeating password is not as secure as a totally random series of bytes.
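
You can check this for every pair of lowercase letters, not just the ones in this demo. A small sketch (the name `lower` is just a local shorthand):

```python
import string

lower = string.ascii_lowercase

# every lowercase letter starts with the bits 011, so the XOR of any two
# starts with 000 and is therefore less than 32 (0b00100000)
print(all((ord(p) ^ ord(k)) < 32 for p in lower for k in lower))   # True
```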

If the plaintext had been a binary file, it might have large sections of null bytes (`0x00`). If you XOR `0x00 ^ key`, the output is just the key, so the key is sometimes visible as a repeating pattern in the "encrypted" file.
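
Here's a rough sketch of that leak. The helper `xor_repeating` and the sample data are made up for the example; it works on bytes and uses `itertools.cycle` to repeat the key:

```python
from itertools import cycle

def xor_repeating(data: bytes, key: bytes) -> bytes:
    # hypothetical helper: XOR data against a key that repeats as needed
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

data = b"\x00" * 8 + b"hi"          # pretend file: a run of nulls, then some text
out = xor_repeating(data, b"key")

print(out[:8])                      # b'keykeyke' (the key shows through)
```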


## OTP reuse
Another thing you might see is one-time pads being reused. This is a big no-no because

`cyphertext1 ^ cyphertext2` 

is the same as

`plaintext1 ^ key ^ plaintext2 ^ key`

the two encryption operations cancel each other out and you're left with

`plaintext1 ^ plaintext2`

I'll leave you to experiment. 
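
As a starting point, here's a minimal sketch of the cancellation. The messages, the `pad` variable, and the `xor_bytes` helper are all invented for this example:

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # hypothetical helper: XOR two byte strings of the same length
    return bytes(x ^ y for x, y in zip(a, b))

plaintext1 = b"attack"
plaintext2 = b"defend"
pad = secrets.token_bytes(6)            # one pad, wrongly used twice

cyphertext1 = xor_bytes(plaintext1, pad)
cyphertext2 = xor_bytes(plaintext2, pad)

# the pad cancels out, leaving plaintext1 ^ plaintext2
combined = xor_bytes(cyphertext1, cyphertext2)
print(combined == xor_bytes(plaintext1, plaintext2))   # True

# anyone who knows (or guesses) one message can now read the other
print(xor_bytes(combined, plaintext1))                 # b'defend'
```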

# See also
[Wikipedia](https://en.wikipedia.org/wiki/Exclusive_or)

[RNG for one time pads](https://www.random.org/bytes/) (but don't reuse them!)

[CyberChef web app for XORing quickly](https://gchq.github.io/CyberChef/)