# Seminar 03: Mapping, Encoding, Stream Ciphers

## <span style="color:red">Caesar-like ciphers</span>

## Mapping and permutations

First, we introduce a function that constructs a mapping from inputs to outputs (and vice versa) in various ways. The constructed mappings take the form of dictionaries, where keys correspond to inputs and values to outputs. You don't need to spend time analysing this function; we will show you examples of how it can be used.

In [None]:
def construct_mappings(inputs, perm=None, key=None, outputs=None):
    l = len(inputs)
    if outputs is None:
        outputs = [item for item in inputs]
    if perm is not None:
        tmp = [item for item in outputs]
        for idx in range(l):
            idx_mapped = perm(idx, key)
            outputs[idx] = tmp[idx_mapped]
    mapping = dict(zip(inputs, outputs))
    inverse = dict(zip(outputs, inputs)) 
    return mapping, inverse

In our context, a permutation is a mapping whose input set is equal to its output set. It is also one-to-one, meaning that every input symbol is mapped to a different output symbol.

Below is a simple example of a permutation which 'shifts' outputs by one.
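
As a small sanity check of this definition, the sketch below tests whether a dictionary is a permutation. The helper name `is_permutation` and the two sample dictionaries are illustrative assumptions and are not part of the seminar code.

```python
def is_permutation(mapping):
    """Return True if `mapping` maps a set one-to-one onto itself."""
    # Dictionary keys are distinct, so if the set of values equals the set
    # of keys, no two inputs can share an output.
    return set(mapping.keys()) == set(mapping.values())

rotation = {'A': 'B', 'B': 'C', 'C': 'A'}    # every symbol appears once as an output
collision = {'A': 'B', 'B': 'B', 'C': 'A'}   # 'A' and 'B' both map to 'B'

print(is_permutation(rotation))
print(is_permutation(collision))
```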

In [None]:
def shift_by_one(idx, key=None):
    return (idx + 1) % 3 # depends on size of the permutation

**Task 1**: Print out the `mapping` and  `inverse` dictionaries to see that they are one-to-one.

In [None]:
mapping, inverse = construct_mappings(inputs=['A', 'B', 'C'], perm = shift_by_one)
# print both mappings

**Task 2**: Verify that `mapping` and `inverse` are inverse operations. Apply one of them, then the other one and you should see the original input.

In [None]:
# here test composition of mapping and inverse

**Task 3:** Define `shift_by_two` and create a mapping equivalent to `'A': 'C', 'C': 'E'...` for the whole English alphabet. Print out the mapping and visually verify its correctness.

In [None]:
def shift_by_two(idx, key=None):
    return idx

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
mapping, inverse = construct_mappings(inputs=alphabet, perm = shift_by_two) 

### Original Caesar cipher

The original Caesar cipher is very similar to the mapping (`mapping`) we just defined. The only difference is that it shifts input symbols by 3 positions instead of 2. We will also extend the input alphabet by " " (space) and "." (dot), which will come in handy for encrypting natural text.

In [None]:
def caesar_perm(idx, key = None):
    return (idx + 3) % 28
caesar_mapping, caesar_inverse_mapping = construct_mappings("ABCDEFGHIJKLMNOPQRSTUVWXYZ .", caesar_perm)

**Task 4**: The function `txt_process_with_mapping` allows us to apply the mapping to messages instead of just a single symbol. Use it to decrypt the following ciphertext. 

In [None]:
def txt_process_with_mapping(message, character_mapping):
    res_list = [character_mapping[char] for char in message] 
    res_text = "".join(res_list)
    return res_text
ciphertext = "KHOORBZRUOG"
# print(txt_process_with_mapping(??,??)) #uncomment and replace ?? 

**Task 5**: Use `txt_process_with_mapping` and the appropriate mapping to encrypt the plaintext `HELLO WORLD`.

In [None]:
# txt_process_with_mapping('HELLO WORLD', ??)  #uncomment and replace ?? 

## Ciphers with keys

The problem with the original Caesar cipher is that it relies on security through obscurity: if our adversary (an enemy) learns how the cipher works, they will be able to decrypt any captured ciphertext. To counter this problem, ciphers use keys, an additional piece of information that is required to decrypt messages.

### Generalized Caesar cipher

We can construct a cipher with keys from the original Caesar cipher. In the generalized Caesar cipher, the key is the number of positions by which the inputs are shifted.
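
The shift is ordinary addition modulo the alphabet size, and decryption subtracts the same key. A minimal sketch of that arithmetic follows; the names `ALPHABET` and `shift_symbol` are illustrative and are not defined elsewhere in this notebook.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ."   # 26 letters + space + dot

def shift_symbol(symbol, key):
    """Shift one symbol by `key` positions, wrapping around the alphabet."""
    idx = ALPHABET.index(symbol)
    return ALPHABET[(idx + key) % len(ALPHABET)]

key = 5
encrypted = shift_symbol('Y', key)          # index 24 + 5 = 29, and 29 % 28 = 1
decrypted = shift_symbol(encrypted, -key)   # a negative key undoes the shift
print(encrypted, decrypted)                 # B Y
```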

First, we define the permutation function used by the cipher. 

In [None]:
def generalized_caesar_perm(idx, key):
    return (idx + key) % 28 # 26 letters + 2 extra symbols

To create an instance of the cipher (mappings), we need to choose a key.

In [None]:
key = 9
encryption_mapping, decryption_mapping = construct_mappings("ABCDEFGHIJKLMNOPQRSTUVWXYZ .", generalized_caesar_perm, key)

We encrypt and decrypt in the same way as with the cipher without a key.

In [None]:
print(txt_process_with_mapping("HELLO", encryption_mapping))
print(txt_process_with_mapping("QNUUX", decryption_mapping))

### Attacking the cipher

As you may already know, even the generalized Caesar cipher is vulnerable to various attacks, which we will explore in the next few tasks.

**Task 5:** We have captured an enemy ciphertext and thanks to information from our spies, we know that the corresponding plaintext was one of the two options written below. Can you figure out the original plaintext? What is the key? Verify your findings.

In [None]:
plaintext1 = 'ATTACK NOW   '
plaintext2 = 'DO NOT ATTACK'

ciphertext = 'VMMVXDTGHPTTT'

key = 0 # PUT YOUR KEY GUESS HERE
encryption_mapping, decryption_mapping = construct_mappings("ABCDEFGHIJKLMNOPQRSTUVWXYZ .", generalized_caesar_perm, key)
print(txt_process_with_mapping(ciphertext, decryption_mapping))

**Task 6:** Our enemy has improved its operations and we only have limited intel from the spies about the next ciphertext. However, they have discovered that the enemy now terminates its messages with a commonly used symbol. Can you find the key and decrypt the following ciphertext?

In [None]:
ciphertext = 'KBBKMUIKBIXYYXJ'

key = 0 # PUT YOUR KEY GUESS HERE
encryption_mapping, decryption_mapping = construct_mappings("ABCDEFGHIJKLMNOPQRSTUVWXYZ .", generalized_caesar_perm, key)
print(txt_process_with_mapping(ciphertext, decryption_mapping))

**Task 7:** We no longer have any information about the content or format of the enemy's messages. Can you still find the original message?

Hint: What is the simplest method?

In [None]:
ciphertext = 'GQXXKP ZQKGUDTKI EBKNBEDQR BOQKMDDMOW'

#implement your approach here

In [None]:
key = 0 # PUT YOUR KEY GUESS HERE
encryption_mapping, decryption_mapping = construct_mappings("ABCDEFGHIJKLMNOPQRSTUVWXYZ .", generalized_caesar_perm, key)
print(txt_process_with_mapping(ciphertext, decryption_mapping))

### Statistical analysis

While a brute-force attack can be used to break the generalized Caesar cipher, there are cleverer approaches as well. One of them is frequency analysis of letters in written English. Observe the frequency of symbols in the randomly selected text below.

In [None]:
from collections import Counter
import pandas as pd

text = "ONE MORNING WHEN GREGOR SAMSA WOKE FROM TROUBLED DREAMS HE FOUND HIMSELF TRANSFORMED IN HIS BED INTO A HORRIBLE VERMIN. HE LAY ON HIS ARMOURLIKE BACK AND IF HE LIFTED HIS HEAD A LITTLE HE COULD SEE HIS BROWN BELLY SLIGHTLY DOMED AND DIVIDED BY ARCHES INTO STIFF SECTIONS. THE BEDDING WAS HARDLY ABLE TO COVER IT AND SEEMED READY TO SLIDE OFF ANY MOMENT. HIS MANY LEGS PITIFULLY THIN COMPARED WITH THE SIZE OF THE REST OF HIM WAVED ABOUT HELPLESSLY AS HE LOOKED. WHATS HAPPENED TO ME HE THOUGHT. IT WASNT A DREAM. HIS ROOM A PROPER HUMAN ROOM ALTHOUGH A LITTLE TOO SMALL LAY PEACEFULLY BETWEEN ITS FOUR FAMILIAR WALLS. A COLLECTION OF TEXTILE SAMPLES LAY SPREAD OUT ON THE TABLE  SAMSA WAS A TRAVELLING SALESMAN  AND ABOVE IT THERE HUNG A PICTURE THAT HE HAD RECENTLY CUT OUT OF AN ILLUSTRATED MAGAZINE AND HOUSED IN A NICE GILDED FRAME. IT SHOWED A LADY FITTED OUT WITH A FUR HAT AND FUR BOA WHO SAT UPRIGHT RAISING A HEAVY FUR MUFF THAT COVERED THE WHOLE OF HER LOWER ARM TOWARDS THE VIEWER. GREGOR THEN TURNED TO LOOK OUT THE WINDOW AT THE DULL WEATHER. DROPS OF RAIN COULD BE HEARD HITTING THE PANE WHICH MADE HIM FEEL QUITE SAD."

df = pd.DataFrame.from_dict(Counter(text), orient='index', columns=['LETTER FREQUENCY'])
df.sort_index().plot.bar()
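
The same counts can also be read off without a plot. The sketch below uses a short made-up sample sentence (an assumption for illustration, not the text above) and the standard `Counter.most_common` method to print relative frequencies.

```python
from collections import Counter

sample = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"
counts = Counter(sample)

# Print the three most frequent symbols with their relative frequency.
for symbol, count in counts.most_common(3):
    print(repr(symbol), round(count / len(sample), 3))
```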

**Task 8:** Now, try to decrypt the following text by observing the frequencies of letters in the ciphertext.

In [None]:
ciphertext = "PWCGIJWA.GQNGQG TMMXGIGTQ..TMGJQ.GTWVOMZGIVLGNWZOM.GITTG.PQ GVWV MV MGPMG.PWAOP.GJA.G.PI.GCI G WUM.PQVOGPMGCI GAVIJTMG.WGLWGJMKIA MGPMGCI GA MLG.WG TMMXQVOGWVGPQ GZQOP.GIVLGQVGPQ GXZM MV.G .I.MGKWATLV.GOM.GQV.WG.PI.GXW Q.QWVHGPWCMBMZGPIZLGPMG.PZMCGPQU MTNGWV.WGPQ GZQOP.GPMGITCIE GZWTTMLGJIKSG.WGCPMZMGPMGCI HGPMGUA .GPIBMG.ZQMLGQ.GIGPAVLZMLG.QUM G PA.GPQ GMEM G WG.PI.GPMGCWATLV.GPIBMG.WGTWWSGI.G.PMGNTWAVLMZQVOGTMO GIVLGWVTEG .WXXMLGCPMVGPMGJMOIVG.WGNMMTGIGUQTLGLATTGXIQVG.PMZMG.PI.GPMGPILGVMBMZGNMT.GJMNWZMHGWPGOWLGPMG.PWAOP.GCPI.GIG .ZMVAWA GKIZMMZGQ.GQ G.PI.GQBMGKPW MVG.ZIBMTTQVOGLIEGQVGIVLGLIEGWA.HGLWQVOGJA QVM  GTQSMG.PQ G.ISM GUAKPGUWZMGMNNWZ.G.PIVGLWQVOGEWAZGWCVGJA QVM  GI.GPWUMGIVLGWVG.WXGWNG.PI.G.PMZM G.PMGKAZ MGWNG.ZIBMTTQVOGCWZZQM GIJWA.GUISQVOG.ZIQVGKWVVMK.QWVFGJILGIVLGQZZMOATIZGNWWLGKWV.IK.GCQ.PGLQNNMZMV.GXMWXTMGITTG.PMG.QUMG WG.PI.GEWAGKIVGVMBMZGOM.G.WGSVWCGIVEWVMGWZGJMKWUMGNZQMVLTEGCQ.PG.PMUHGQ.GKIVGITTGOWG.WGPMTTGPMGNMT.GIG TQOP.GQ.KPGAXGWVGPQ GJMTTEGXA PMLGPQU MTNG TWCTEGAXGWVGDQ GJIKSG.WCIZL G.PMGPMILJWIZLG WG.PI.GPMGKWATLGTQN.GPQ GPMILGJM..MZGNWAVLGCPMZMG.PMGQ.KPGCI GIVLG ICG.PI.GQ.GCI GKWBMZMLGCQ.PGTW. GWNGTQ..TMGCPQ.MG XW. GCPQKPGPMGLQLV.GSVWCGCPI.G.WGUISMGWNGIVLGCPMVGPMG.ZQMLG.WGNMMTG.PMGXTIKMGCQ.PGWVMGWNGPQ GTMO GPMGLZMCGQ.GYAQKSTEGJIKSGJMKIA MGI G WWVGI GPMG.WAKPMLGQ.GPMGCI GWBMZKWUMGJEGIGKWTLG PALLMZHGPMG TQLGJIKSGQV.WGPQ GNWZUMZGXW Q.QWVH"

df = pd.DataFrame.from_dict(Counter(ciphertext), orient='index', columns=['LETTER FREQUENCY'])
df.sort_index().plot.bar()

In [None]:
key = 0 # PUT YOUR KEY GUESS HERE
encryption_mapping, decryption_mapping = construct_mappings("ABCDEFGHIJKLMNOPQRSTUVWXYZ .", generalized_caesar_perm, key)
print(txt_process_with_mapping(ciphertext, decryption_mapping))

**BONUS TASK [do non-bonus tasks first]:** Try to automate the brute-force attack using frequency analysis: the function should return one or more most likely plaintexts for the provided ciphertext, together with the key used.

We provide you with two ciphertexts to test your method.

Hint: You can take advantage of the previously used `Counter` class. Also beware of the fact that in usual encodings, the symbol ' ' does not follow the letter 'Z'. 
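
To illustrate the second part of the hint: in ASCII, the space and the dot have code points far below the letters, so `ord` alone does not give positions in the 28-symbol alphabet. A short sketch of the difference, where `ALPHABET` is an illustrative name for the alphabet string used throughout:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ."

# Code points: the letters are contiguous, the two extra symbols are not.
print(ord('Z'), ord(' '), ord('.'))   # 90 32 46

# Positions in the cipher alphabet: space and dot follow 'Z'.
print(ALPHABET.index('Z'), ALPHABET.index(' '), ALPHABET.index('.'))   # 25 26 27
```

The `fixed_ord` helper in the cell below achieves the same ordering by placing the two extra symbols right after `ord('Z')`.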

In [None]:
ciphertext1 = "IGCWVLMZNATG MZMVQ.EGPI G.ISMVGXW  M  QWVGWNGUEGMV.QZMG WATGTQSMG.PM MG CMM.GUWZVQVO GWNG XZQVOGCPQKPGQGMVRWEGCQ.PGUEGCPWTMGPMIZ.HGQGIUGITWVMGIVLGNMMTG.PMGKPIZUGWNGMDQ .MVKMGQVG.PQ G XW.GCPQKPGCI GKZMI.MLGNWZG.PMGJTQ  GWNG WAT GTQSMGUQVMHGQGIUG WGPIXXEGUEGLMIZGNZQMVLG WGIJ WZJMLGQVG.PMGMDYAQ Q.MG MV MGWNGUMZMG.ZIVYAQTGMDQ .MVKMG.PI.GQGVMOTMK.GUEG.ITMV. HGQG PWATLGJMGQVKIXIJTMGWNGLZICQVOGIG QVOTMG .ZWSMGI.G.PMGXZM MV.GUWUMV.GIVLGEM.GQGNMMTG.PI.GQGVMBMZGCI GIGOZMI.MZGIZ.Q .G.PIVGVWCHGCPMVGCPQTMG.PMGTWBMTEGBITTMEG.MMU GCQ.PGBIXWAZGIZWAVLGUMGIVLG.PMGUMZQLQIVG AVG .ZQSM G.PMGAXXMZG AZNIKMGWNG.PMGQUXMVM.ZIJTMGNWTQIOMGWNGUEG.ZMM GIVLGJA.GIGNMCG .ZIEGOTMIU G .MITGQV.WG.PMGQVVMZG IVK.AIZEH"

ciphertext2 = "MBMZEGTM..MZGCPMZMBMZG.PMZMGQ GVMML GMVKZEX.QWVG.WGJMG MKAZMH"

# FOR REFERENCE (both ciphertexts decrypt with a shift of 8 under generalized_caesar_perm)
key1 = 8
key2 = 8

def fixed_ord(char):
    if char == ' ':
        return ord('Z') + 1
    elif char == '.':
        return ord('Z') + 2
    else:
        return ord(char)

def find_original_text(ciphertext):
    # here implement your solution
    return 'Replace this string by the corresponding plaintext'

print("CIPHERTEXT 1:")
print(find_original_text(ciphertext1))
print("CIPHERTEXT 2:")
print(find_original_text(ciphertext2))

### Caesar with keystream
The previous attack is based on the statistics of letters in the ciphertext (CT). The attack is possible because the same mapping is applied to all letters, i.e. the frequencies do not change (they are only shifted). To prevent this attack, we can change the mapping for each letter by changing the corresponding key. Hence we need to define a stream of keys (a keystream), one key for each letter.

In [None]:
def generalized_caesar_perm(idx, key):
    return (idx + key) % 28

def txt_process_with_changing_mapping(message, keystream, perm):
    ct_list = []
    for i in range(len(message)):
        char = message[i]
        key = keystream[i]
        mapping, inverse = construct_mappings("ABCDEFGHIJKLMNOPQRSTUVWXYZ .", perm=perm, key=key)
        ct_list.append(mapping[char])
    ct = "".join(ct_list)
    return ct
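
To see why a changing key hides letter frequencies, the sketch below compares a fixed key with a keystream on a deliberately skewed plaintext. It uses its own small helper `shift` and hand-picked keystream values; all of these are illustrative assumptions, independent of the functions above.

```python
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ."

def shift(symbol, key):
    """Shift one symbol by `key` positions in the 28-symbol alphabet."""
    return ALPHABET[(ALPHABET.index(symbol) + key) % len(ALPHABET)]

plaintext = "AAAAAABBBC"   # very uneven letter frequencies

# Fixed key: every letter is shifted by the same amount.
fixed = "".join(shift(c, 5) for c in plaintext)

# Keystream: a different key for every position.
stream_keys = [3, 17, 8, 25, 0, 11, 6, 20, 14, 2]
varying = "".join(shift(c, k) for c, k in zip(plaintext, stream_keys))

print(fixed, sorted(Counter(fixed).values()))      # same profile as the plaintext
print(varying, sorted(Counter(varying).values()))  # profile no longer visible
```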

Execute the following cell, which uses the original encryption `txt_process_with_mapping` with a fixed mapping defined by `key = 21`. In Task 9 you will use `txt_process_with_changing_mapping`, which uses a keystream (a stream of keys) to define the mappings, one for each character. With `txt_process_with_changing_mapping` you can obtain the same result as with `txt_process_with_mapping`; it suffices to define an appropriate keystream.

In [None]:
key = 21 
PT = 'I AM OF TWO MINDS ABOUT THIS. ON THE ONE HAND ITS PROBABLY PREMATURE TO SWITCH TO ANY PARTICULAR POSTQUANTUM ALGORITHMS. THE MATHEMATICS OF CRYPTANALYSIS FOR THESE LATTICE AND OTHER SYSTEMS IS STILL RAPIDLY EVOLVING AND WE ARE LIKELY TO BREAK MORE OF THEM AND LEARN A LOT IN THE PROCESS OVER THE COMING FEW YEARS. BUT IF YOU ARE GOING TO MAKE THE SWITCH THIS IS AN EXCELLENT CHOICE. AND APPLES ABILITY TO DO THIS SO EFFICIENTLY SPEAKS WELL ABOUT ITS ALGORITHMIC AGILITY WHICH IS PROBABLY MORE IMPORTANT THAN ITS PARTICULAR CRYPTOGRAPHIC DESIGN. AND IT IS PROBABLY ABOUT THE RIGHT TIME TO WORRY ABOUT AND DEFEND AGAINST ATTACKERS WHO ARE STORING ENCRYPTED MESSAGES IN HOPES OF BREAKING THEM LATER ON FUTURE QUANTUM COMPUTERS.'
encryption_mapping, decryption_mapping = construct_mappings("ABCDEFGHIJKLMNOPQRSTUVWXYZ .", generalized_caesar_perm, key)
CT_mapping = txt_process_with_mapping(PT, encryption_mapping)
CT_mapping

**Task 9:** Define only the keystream so that the resulting ciphertext is the same (`BTVFTH TMPHTFBGY...`) as the one obtained by `txt_process_with_mapping` with `key = 21` in the cell above.

In [None]:
keystream = list(range(1000)) # replace by appropriate list (think about the length of the list, and list values)
CT_changing_mapping = txt_process_with_changing_mapping(PT, keystream, perm=generalized_caesar_perm)
if CT_mapping == CT_changing_mapping: 
    print('Correctly defined keystream')
else: 
    print('Incorrectly defined keystream')

**Task 10:** A keystream randomly generated with `secrets.token_bytes` is used to encrypt the same PT twice. Execute the following cell to see that the frequencies of letters in the CT change between the two runs and that we cannot distinguish letters based on their frequency.

In [None]:
import secrets
keystream = list(secrets.token_bytes(len(PT)))
CT = txt_process_with_changing_mapping(PT, keystream, perm=generalized_caesar_perm)
df = pd.DataFrame.from_dict(Counter(CT), orient='index', columns=['LETTER FREQUENCY'])
df.sort_index().plot.bar()

keystream = list(secrets.token_bytes(len(PT)))
CT = txt_process_with_changing_mapping(PT, keystream, perm=generalized_caesar_perm)
df = pd.DataFrame.from_dict(Counter(CT), orient='index', columns=['LETTER FREQUENCY'])
df.sort_index().plot.bar()

## <span style="color:red">Vernam cipher</span>

### Vernam vs Caesar keystream
The Vernam cipher works in principle like the Caesar cipher with a keystream above. The difference is that it uses the bits 0 and 1 instead of the alphabet A...Z. The following example shows how the message `msg` is processed.

In [None]:
def vernam_perm(idx, key):
    return (idx + key) % 2

def bits_process_with_changing_mapping(message, keystream, perm):
    ct_list = []
    for i in range(len(message)):
        bit = message[i]
        key = keystream[i]
        mapping, inverse = construct_mappings([0, 1], perm=perm, key=key)
        ct_list.append(mapping[bit])
    return ct_list

msg = [0, 1, 1, 1, 0, 0, 0, 1]
keystream = [0, 1, 0, 0, 1, 0, 1, 0]

print(bits_process_with_changing_mapping(msg, keystream, perm=vernam_perm))
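
On single bits, the shift modulo 2 used by `vernam_perm` coincides with XOR, and applying the same key bit a second time restores the original bit. A quick exhaustive check over all four bit pairs, written as a standalone sketch that does not use the helper functions above:

```python
# Compare "shift modulo 2" with XOR for every combination of message bit and key bit.
table = []
for m in (0, 1):
    for k in (0, 1):
        c = (m + k) % 2                  # shift of bit m by key bit k
        table.append((m, k, c, m ^ k))   # the last two columns should agree
        assert (c + k) % 2 == m          # shifting by k again restores m

for row in table:
    print(row)
```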

**Task 11**: Verify that the encryption process is equivalent to the XOR of the message bits and the keystream bits. Define two integers `msg_int` and `keystream_int` that correspond to the bits of the message `msg` and the keystream `keystream`. You should see that the XOR (operator `^`) of `msg_int` and `keystream_int` corresponds to the result above.

In [None]:
def byte_to_bits(byte_val):
    return list(reversed([(byte_val >> i) & 1 for i in range(8)]))

msg_int = 1 # replace by appropriate integer
keystream_int = 2 # replace by appropriate integer
byte_to_bits(msg_int ^ keystream_int)

### Vernam cipher
Now we will describe the Vernam cipher operating on bytes. In Python, we will use the `bytes` type for that.

**Task 12**: What is the result of bitwise XOR: $01110001_2 \oplus 01001010_2$?

Set the two variables `a`, `b` accordingly and compare your answer to the question with `c`. Beware, `bytes.fromhex()` expects a hexadecimal string.
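
Since the operands are given in binary, they first need to be converted to hexadecimal. One possible way, shown here on an unrelated example value, uses the built-ins `int` and `format`:

```python
bits = '10100101'                          # example value, not one from the task
hex_string = format(int(bits, 2), '02x')   # binary string -> two hex digits
value = bytes.fromhex(hex_string)          # hex string -> bytes object

print(hex_string)    # a5
print(list(value))   # [165]
```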

In [None]:
def XOR(array1: bytes, array2: bytes) -> bytes:
    l = min(len(array1), len(array2))
    xored = bytes(a ^ b for (a, b) in zip(array1, array2))
    if len(array1) > l:
        xored += array1[l:]
    else:
        xored += array2[l:]
    return xored

a = bytes.fromhex('00')
b = bytes.fromhex('00')
c = XOR(a, b)
c.hex()

**Task 13**: Using `c` and `a` compute `b_computed` and compare with the original `b`. Use the `XOR` function.

Hint: Given `c` and `a` with $c = a \oplus b$, `b` can be computed using $\oplus$.

In [None]:
b_computed = None
print(b_computed == b) #verification

To generate random binary data (e.g. keys), we will be using the `secrets` library. 

In [None]:
import secrets

key = secrets.token_bytes(32)
print(key)

**Task 14**: Fix the returned value of the `Vernam` function.

In [None]:
def Vernam(text: bytes, key: bytes) -> bytes:
    assert len(text) <= len(key), f'The key is shorter than the encrypted/decrypted text.'
    return bytes() #TODO fix the returned value

Check that the decrypted plaintext corresponds to the original message.

In [None]:
msg = b'At the first God made the heaven and the earth. And the earth was waste and without form and it was dark on the face of the deep and the Spirit of God was moving on the face of the waters.'
key = secrets.token_bytes(len(msg))
ct = Vernam(msg, key)
pt = Vernam(ct, key)
print(pt)

## <span style="color:red">Stream ciphers</span>

### ChaCha20
Modern stream ciphers like ChaCha20 encrypt plaintext by XORing it with a "keystream" generated from a key and a nonce (a random bitstring used only once). Importantly, the key is of fixed size, usually much shorter than the plaintext.
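
The general structure can be sketched with a toy keystream generator. The generator below hashes the key, the nonce, and a block counter with SHA-256; it is an illustrative stand-in chosen for brevity, it is not ChaCha20, and it is not meant for real use. The names `toy_keystream` and `toy_stream_xor` are hypothetical.

```python
import hashlib
import secrets

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Expand a short key and a nonce into `length` keystream bytes (toy construction)."""
    stream = b''
    counter = 0
    while len(stream) < length:
        # One 32-byte block of keystream per counter value.
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, 'big')).digest()
        counter += 1
    return stream[:length]

def toy_stream_xor(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the generated keystream."""
    keystream = toy_keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))

key = secrets.token_bytes(16)    # short, fixed-size key
nonce = secrets.token_bytes(8)   # must not repeat under the same key
message = b'A message that is much longer than the sixteen-byte key.'

ciphertext = toy_stream_xor(message, key, nonce)
print(toy_stream_xor(ciphertext, key, nonce) == message)   # True
```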

In [None]:
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms

def chacha20_encrypt(plaintext: bytes, key: bytes, nonce: bytes) -> bytes:
    algorithm = algorithms.ChaCha20(key, nonce)
    encryptor = Cipher(algorithm, mode=None).encryptor()
    ct = encryptor.update(plaintext) + encryptor.finalize()
    return ct

def chacha20_decrypt(ciphertext: bytes, key: bytes, nonce: bytes) -> bytes:
    algorithm = algorithms.ChaCha20(key, nonce)
    decryptor = Cipher(algorithm, mode=None).decryptor()
    pt = decryptor.update(ciphertext) + decryptor.finalize()
    return pt

msg = b'Far far away, behind the word mountains, far from the countries.'

**Task 15**: Replace the zeros with the correct key/nonce sizes to encrypt the message `msg` using ChaCha20. See the [ChaCha20 documentation](https://cryptography.io/en/latest/hazmat/primitives/symmetric-encryption/#cryptography.hazmat.primitives.ciphers.algorithms.ChaCha20).

In [None]:
key = secrets.token_bytes(0)
nonce = secrets.token_bytes(0)
ct = chacha20_encrypt(msg, key, nonce)
pt = chacha20_decrypt(ct, key, nonce)

**Task 16**: Change the plaintext `pt` so that by encrypting it, you obtain the first 64 bytes of the keystream. Remember how the ciphertext is computed from the plaintext and the keystream.

In [None]:
pt = bytes.fromhex('11'*64) # TODO change the plaintext
keystream = chacha20_encrypt(pt, key, nonce)
print(keystream)

Verify that the ciphertext `ct` can be obtained directly as the XOR of `msg` and `keystream`.

In [None]:
ct2 = XOR(msg, keystream)
ct2 == ct

**Task 17**: Use two nonces which differ in a single bit and compare the corresponding keystreams. How much do they differ?

In [None]:
def bytestream_to_bitstring(byte_stream):
    return ''.join([''.join(map(str, byte_to_bits(byte_val))) for byte_val in byte_stream ]) 

pt = bytes.fromhex('00'*10)
nonce1 = bytes.fromhex('00'*15+'00') 
nonce2 = bytes.fromhex('00'*15+'00') # TODO change the nonce
keystream1 = chacha20_encrypt(pt, key, nonce1)
keystream2 = chacha20_encrypt(pt, key, nonce2)
print(bytestream_to_bitstring(keystream1))
print(bytestream_to_bitstring(keystream2))

**Task 18 (BONUS)**: Imagine yourself in the role of an attacker. You managed to discover that `known_ct` is a ciphertext obtained by encrypting `known_pt` using ChaCha20. Then, you intercepted another ciphertext which may have been encrypted using the same key/nonce pair. Can you decrypt it?

In [None]:
known_pt = b'Attack at dawn!'
known_ct = b'\xfdy=\x98\x89\xa7\xb9Rj>\xe9?\x15#\xb5'

intercepted_ct = b'\xf8h/\x9c\x84\xa8\xb9Rj>\xe9+\x11&\xb5'

## <span style="color:red">Bonus: Block ciphers</span>

Today, we spoke about various types of ciphers:
- first, we discussed the (generalized) **Caesar cipher**, which encrypts plaintext consisting of letters by shifting each letter by a number of positions determined by the key. We have shown that this cipher is easy to break.
- then we introduced the **Vernam cipher**, which works with bits instead of letters and uses the XOR operation. This cipher is perfectly secure, but suffers from major limitations that make it impractical to use: the key needs to be as long as the plaintext and can only be used once.
- finally, we discussed **stream ciphers**, which try to fix the problems of the Vernam cipher. They use so-called generators, functions that generate a keystream from a key. This way, we can have a short key (e.g. 128 bits) from which we generate a keystream of any length, which is then used to encrypt the plaintext using the XOR operation. The generators also take a second input, the initialisation vector (the nonce in ChaCha20), which ensures that the generated keystream is different each time (recall the second issue of the Vernam cipher).

However, this is not the only way to design ciphers. It is still possible to create ciphers that use the same key repeatedly. These ciphers are called **block ciphers**. Block ciphers require the data to be split into **blocks** of the same predefined size. The same key is then used to encrypt every block of the data.

So how can block ciphers be secure? The solution is to use a more complex encryption function. One concept used by block ciphers is the **s-box** (substitution box).

## Simple block cipher

Let us define a simple block cipher with block size of 3 bits that will use the following sbox:

| Input  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
|--------|-----|-----|-----|-----|-----|-----|-----|-----|
| Output | 110 | 101 | 001 | 000 | 011 | 010 | 111 | 100 |

The encryption function is defined using XOR and sbox as follows:

$E(input) = sbox(input \oplus key)$

And the corresponding decryption function is:

$D(input) = inverse\_sbox(input) \oplus key$
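
As a worked example with the block $010$ and the key $101$, which also appear in the code below, a single block is encrypted and then decrypted as follows:

$E(010) = sbox(010 \oplus 101) = sbox(111) = 100$

$D(100) = inverse\_sbox(100) \oplus 101 = 111 \oplus 101 = 010$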

In [None]:
inputs = [
    "000",
    "001",
    "010",
    "011",
    "100",
    "101",
    "110",
    "111"
]
sbox_outputs = [
    "110",
    "101",
    "001",
    "000",
    "011",
    "010",
    "111",
    "100"
]
sbox, inverse_sbox = construct_mappings(inputs, outputs=sbox_outputs)

def encrypt_block(block, key):
    return sbox[f"{(int(block, 2) ^ int(key, 2)):03b}"] # sbox(block XOR key)

def decrypt_block(block, key):
    return f"{(int(inverse_sbox[block], 2) ^ int(key, 2)):03b}" # inverse_sbox(block) XOR key

def process_with_block_cipher(message, key, cipher_operation):
    blocks = [message[i:i+3] for i in range(0, len(message), 3)] # split message into blocks
    res_blocks = [cipher_operation(block, key) for block in blocks]
    res = "".join(res_blocks)
    return res

plaintext = "010111001"
ciphertext = process_with_block_cipher(plaintext, "101", encrypt_block)
decrypted_plaintext = process_with_block_cipher(ciphertext, "101", decrypt_block)

print(f"Plaintext: {plaintext}")
print(f"Ciphertext: {ciphertext}")
print(f"Decrypted plaintext: {decrypted_plaintext}")

As you might have guessed, this example block cipher is too simple to be secure.
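
One way to see the weakness: with a 3-bit key there are only 8 keys to try, so a single known plaintext/ciphertext block pair is enough to recover the key by exhaustive search. A minimal sketch, with the s-box table copied from above and the names `SBOX` and `toy_encrypt_block` chosen for illustration:

```python
SBOX = {
    "000": "110", "001": "101", "010": "001", "011": "000",
    "100": "011", "101": "010", "110": "111", "111": "100",
}

def toy_encrypt_block(block, key):
    """Compute sbox(block XOR key) on 3-bit strings."""
    return SBOX[f"{int(block, 2) ^ int(key, 2):03b}"]

known_plain, known_cipher = "010", "100"   # one observed pair under an unknown key

# Try all 2**3 = 8 possible keys and keep those consistent with the pair.
candidates = [
    f"{k:03b}" for k in range(8)
    if toy_encrypt_block(known_plain, f"{k:03b}") == known_cipher
]
print(candidates)   # ['101']
```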

Real block ciphers use much more complex functions and more importantly, **larger block size and key size** (e.g. 128 bits).

We will discuss them in more detail in the next seminar.