# IT Security - Sheet 1 "Historic Ciphers"

**Total achievable points: 20**

**Released: 17.10.2024**

**Submission Deadline: 24.10.2024 23:59**

---
Groupnumber: 128

Names and matriculation numbers of **ALL** team members: Samuel Rode (445160), Nils Maasch (445796), Pau Azpeita Bergos (443428), Gereon Geuchen (445328), Ben-Jay Huckebrink (445219) 

Format: John Doe (999999)

---

**Important Information**

The assignments must be submitted by groups of 5 students. Please use the RWTHmoodle exercise room to register for a submission group using the [Submission Groups Section](https://moodle.rwth-aachen.de/course/view.php?id=44012#section-1). Even if you are registered in RWTHmoodle to a submission group, **please include the groupnumber as well as the name and matriculation number of every group member in this notebook**. Register in RWTHmoodle the submission group before you hand in the assignment. The registration of submission groups is available until the first assignment is due (24.10.2024 23:59). Please do not change or leave the submission group after the solution for this group was handed in.

Enter your solutions for the tasks in the respective cells of this notebook. These cells are either marked by "YOUR ANSWER HERE" or `#YOUR CODE HERE`. Cells marked with `###PLAYGROUND` can be used to test your implementation and generate output (see example for the first tasks). They will be ignored during grading. **Do not change any other cells or add new ones.**

Please **do not import any further Python packages** except the default Python ones and the ones that are explicitly given by us.

## Content of this Assignment

In the lecture, you learned about security goals and attacks threatening one or more of those security goals.
Likewise, you learned about vulnerabilities that may arise at different layers and about various possible attackers.
Furthermore, you were introduced to some examples of symmetric ciphers (including historic ones).

In this exercise, you'll build upon this knowledge and try to break some historic ciphers on your own!

## 1. Basics (6 points)

We will start with a little bit of important basic knowledge. Note that these are only some of the basics you need to know to excel in the exciting field of IT-security. For each question, try to answer the tasks **as precisely as possible** but still include all the **necessary** reasoning to support your claims.

### Task 1.1 (3 points)

**Explain** what CIA stands for, **give examples** of attacks against each of the security goals and **reason** why each of those attacks breaches the respective security goal.

CIA stands for confidentiality, integrity, and availability. Confidentiality ensures that only authorized users can access the data, integrity ensures that the data is not altered by unauthorized users, and availability ensures that the data is available when needed.
Example for confidentiality: Eavesdropping, i.e., an attacker intercepts the communication between two parties. This breaches confidentiality as the attacker can read the data that was supposed to be confidential.
Example for integrity: Modification, i.e., an attacker alters the data during transmission. This breaches integrity, as the attacker is obviously not authorized to change the data.
Example for availability: Denial of Service, i.e., an attacker floods the server with (fake) requests. This makes the server (temporarily) unavailable, so legitimate users cannot access the service either.

### Task 1.2 (2 points)

**Explain** why cipher negotiation exists and **name one additional problem** which can be introduced by it.

New attacks on ciphers cannot be ruled out, so any single algorithm may eventually be broken. The typical solution is therefore to support multiple algorithms (some of them mandatory) and let the communication partners negotiate which one to use. If an algorithm is broken, the system can be configured to use one of the other supported algorithms.
The additional problem is that the integrity of the algorithm negotiation messages has to be protected. If the negotiation is insecure, an attacker can intercept the negotiation messages and pretend that the requesting system only supports a weak (broken) algorithm, forcing both sides to use it (a downgrade attack).
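
The negotiation problem described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the cipher names are placeholders, and `negotiate` and `strip_offer` are hypothetical helpers, not part of any real protocol. The server picks the strongest cipher the client offers, and an attacker who can modify the unprotected offer removes everything except the broken cipher.

```python
# Hypothetical server preference order, strongest first (placeholder names).
SERVER_PREFERENCE = ["strong-cipher", "legacy-cipher", "broken-cipher"]


def negotiate(client_offer: list) -> str:
    """Return the server's most preferred cipher that the client also offers."""
    for cipher in SERVER_PREFERENCE:
        if cipher in client_offer:
            return cipher
    raise ValueError("no common cipher")


def strip_offer(client_offer: list) -> list:
    """Attacker on the path: drop every offered cipher except the broken one."""
    return [cipher for cipher in client_offer if cipher == "broken-cipher"]


offer = ["strong-cipher", "broken-cipher"]
print(negotiate(offer))               # untouched offer
print(negotiate(strip_offer(offer)))  # offer modified in transit
```

With an integrity-protected offer, the modification in the second call would be detected instead of silently accepted.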

### Task 1.3 (1 point)

Assume a system that has no vulnerabilities regarding the cryptographic primitives and the protocols used. The implementations of all components are bug-free, and there are no vulnerabilities. Also, assume there is no vulnerability in the interplay between those components.

**Explain** whether there is still a potential vulnerability that might break the security of the system when used.

If the system is used by humans, there is always a potential vulnerability. Humans can be tricked into revealing their passwords, which is a common social engineering attack. 

## 2. Monoalphabetic Cipher (5 points)

Now, we will play around with some ciphers and see that, often, it is not that hard to break them. We already know that the key space of the Caesar cipher is too small to be considered sufficiently secure nowadays. One can easily bruteforce the key - with **recognizable** plaintext at least.
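
As a rough sketch of how small the Caesar key space is, the following code simply tries all 26 shifts. The helper names `caesar_shift` and `caesar_candidates` are hypothetical and not part of this assignment, and the sketch assumes lowercase text in which all non-letters are left untouched.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar_shift(text: str, shift: int) -> str:
    """Shift every lowercase letter by `shift` positions; keep other characters."""
    result = []
    for char in text:
        if char in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(char) + shift) % 26])
        else:
            result.append(char)
    return ''.join(result)


def caesar_candidates(cipher_text: str) -> list:
    """Return all 26 candidate decryptions as (shift, plaintext) pairs."""
    return [(shift, caesar_shift(cipher_text, -shift)) for shift in range(26)]


secret = caesar_shift("attack at dawn", 3)
for shift, candidate in caesar_candidates(secret):
    print(shift, candidate)
```

A human (or a check for recognizable plaintext) only has to look through 26 candidate lines to find the one that reads as English.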

Another approach was introduced in the lecture, namely a monoalphabetic substitution cipher. Here, we use a substitution table and replace each letter in the plaintext with the corresponding letter in the table. This effectively increases the key space, but there remains a problem. We can still make use of another property of languages: letters have a common frequency in their natural language. Hence, we can map back the ciphertext letters to the plaintext letters if we have plaintext that "makes sense". 

In the following, you will use that knowledge to break such a monoalphabetic substitution cipher.

With a monoalphabetic cipher, a letter in the plaintext is mapped to another letter in the alphabet.
You can think of it as a permutation of letters, where the key describes the permutation. Thus, the
key space has a size of k!, where k is the size of the alphabet (26! for the lowercase English alphabet).
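
To put the size of this key space into perspective, the following small calculation compares it with the Caesar cipher, assuming the 26-letter lowercase English alphabet.

```python
import math

caesar_keys = 26                        # one key per possible shift
substitution_keys = math.factorial(26)  # one key per permutation of the alphabet

print(caesar_keys)
print(substitution_keys)             # 403291461126605635584000000
print(math.log2(substitution_keys))  # roughly 88 bits of key material
```

Despite this large key space, the cipher still falls to frequency analysis, because the attack never has to enumerate the keys.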

We use Python's dictionary datatype to store such a key. One example of a given key in the format would be: ```key = {'a': 'x', 'b': 'd', 'c': 't', 'd': 'f', ... , 'z': 'm'} ```

You can now assume the following counts of each letter were computed on the plaintext for the given ciphertext:

In [2]:
mono_counts = {
    'a': 27, 'b': 5, 'c': 25, 'd': 23, 'e': 63, 'f': 9, 'g': 13, 
    'h': 24, 'i': 34, 'k': 1, 'l': 15, 'm': 16, 'n': 38,
    'o': 49, 'p': 17, 'r': 36, 's': 32, 't': 53, 'u': 18,
    'v': 6, 'w': 7, 'y': 8
}

In [3]:
cipher_1 = (
    'royiy liy ln ynqlcsgwlrbnm syqgibru cluwale ysc ciaraqaw eaqgjynr lne ln lgroynrbqlrban oyleyi lo ' +
    'ciaraqaw eaqgjynr rolr qakyi roy clqdyr paijlr lne mynyilw bssgys iymliebnm roy iyscyqrbky ciaraqaws ' +
    'roy ynqiucrban lwmaibroj eaqgjynr syr soavn an roy wypr bs roy syr ap eaqgjynrs eysqibxbnm oav klibags ' +
    'ynqiucrban lwmaibrojs liy gsye pai ysc roy qajxbnye lwmaibroj eaqgjynr syr soavn bn roy jbeewy bs roy syr ap ' +
    'eaqgjynrs eysqibxbnm oav klibags qajxbnye jaey lwmaibrojs liy gsye ra ciakbey xaro ynqiucrban lne bnrymibru ' +
    'ciaryqrban pai ysc  roy pawwavbnm sribnm bs ra ynsgiy ebppyiynr qagnrs cyi wyrryi kvvuujcggoq'
)

### Task 2.1 (2 points)

Implement the function ```get_counts(text)``` that counts how often each letter occurs in a text. 

Your function should return a dictionary of the format ```{letter: count}```, i.e., the keys are the letters and the values are the corresponding counts.

In [36]:
def get_counts(text: str) -> dict:
    # YOUR CODE HERE
    text = text.lower()
    counts = {}
    for char in text:
        # count every character except whitespace
        if char == ' ':
            continue
        counts[char] = counts.get(char, 0) + 1

    # also add a count of 0 for every letter that does not occur in the text
    for letter in 'abcdefghijklmnopqrstuvwxyz':
        counts.setdefault(letter, 0)

    return counts

In [5]:
### PLAYGROUND
# You can use this cell to test out your implementation. Everything in this cell will be ignored during grading.
#print(get_counts('hello world  sssssss'))
print(get_counts(cipher_1))

{'r': 53, 'o': 24, 'y': 63, 'i': 36, 'l': 27, 'n': 38, 'q': 25, 'c': 17, 's': 32, 'g': 18, 'w': 15, 'b': 34, 'm': 13, 'u': 8, 'a': 49, 'e': 23, 'j': 16, 'k': 6, 'd': 1, 'p': 9, 'v': 7, 'x': 5}


In [6]:
# This test just checks the output format of your solution

oft_result = get_counts(cipher_1)
assert len(oft_result) >= 22
for letter in cipher_1:
    assert (letter in oft_result or letter == ' ')

In [7]:
# Even though this cell seems empty, it contains automatic tests. Please do not remove this cell and just ignore it.

### Task 2.2 (3 points)

Implement the function ```mono_decrypt(cipher_text, counts_english)``` that decrypts a ciphertext with the help of a given dictionary of letter counts. 

In addition to the decrypted plaintext, the decryption key should be returned. For your key, only consider letters that appear in the ciphertext. In this example, your key will have fewer entries than 26. 

The key has to be a dictionary with the format ```{cipher_letter: plain_letter}``` as described above. Your function should return a tuple of the form ```(plaintext, recovered_key)```.

In [31]:
def mono_decrypt(cipher_text: str, counts: dict) -> tuple:
    # YOUR CODE HERE
    key = {}
    # count the ciphertext letters and sort them by descending frequency,
    # dropping letters that do not occur in the ciphertext
    cipher_counts = get_counts(cipher_text)
    sorted_cipher = sorted(cipher_counts.items(), key=lambda item: item[1], reverse=True)
    sorted_cipher = [item for item in sorted_cipher if item[1] != 0]
    # sort the reference counts passed in by the caller the same way
    # (use the parameter, not the global mono_counts)
    sorted_plain = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # map the i-th most frequent cipher letter to the i-th most frequent plaintext letter
    for (cipher_letter, _), (plain_letter, _) in zip(sorted_cipher, sorted_plain):
        key[cipher_letter] = plain_letter
    # decrypt the text; characters without a key entry (e.g. spaces) are kept as-is
    plain_text = ''
    for char in cipher_text:
        plain_text += key.get(char, char)
    return plain_text, key
  
    

In [32]:
### PLAYGROUND
# You can use this cell to test out your implementation. Everything in this cell will be ignored during grading.

print(mono_decrypt(cipher_1, mono_counts))

('there are an encapsulating security payload esp protocol document and an authentication header ah protocol document that cover the packet format and general issues regarding the respective protocols the encryption algorithm document set shown on the left is the set of documents describing how various encryption algorithms are used for esp the combined algorithm document set shown in the middle is the set of documents describing how various combined mode algorithms are used to provide both encryption and integrity protection for esp  the following string is to ensure different counts per letter vwwyympuuhc', {'y': 'e', 'r': 't', 'a': 'o', 'n': 'n', 'i': 'r', 'b': 'i', 's': 's', 'l': 'a', 'q': 'c', 'o': 'h', 'e': 'd', 'g': 'u', 'c': 'p', 'j': 'm', 'w': 'l', 'm': 'g', 'p': 'f', 'u': 'y', 'v': 'w', 'k': 'v', 'x': 'b', 'd': 'k'})


In [10]:
# This test just checks the output format of your solution

oft_result = mono_decrypt(cipher_1, mono_counts)

# Check tuple
assert type(oft_result) == tuple
assert len(oft_result) == 2

# Check plaintext
assert type(oft_result[0]) == str

# Check key
for letter in cipher_1:
    assert (letter in oft_result[1] or letter == ' ')

In [11]:
# Even though this cell seems empty, it contains automatic tests. Please do not remove this cell and just ignore it.

## 3. Vigenère Cipher (8 points)

We saw during the last task that monoalphabetic substitution is also not yet a really secure cipher. Let's try another approach: polyalphabetic substitution ciphers! Here, we replace each letter in the plaintext with a different letter again but this time, this depends, e.g., on the position of the letter in the plaintext. Through this, we change the one-to-one substitution approach to a one-to-many substitution. 

You probably already guessed: in the next task, you are going to break an example of a polyalphabetic substitution cipher.

The Vigenère cipher is an example of a polyalphabetic substitution cipher. Each letter in the plaintext will be substituted depending on the key letter it is paired with. For your implementation, interpret the letters 'a' to 'z' as the numbers 0 to 25, such that you can calculate in a finite group Z mod 26.

For example, the letter `d` = 3 encrypted with the key `y` = 24 is equal to 3 + 24 = 27 = 1 mod 26 = `b`.
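
The arithmetic above can be sketched as a small encryption routine. `vigenere_encrypt` is a hypothetical helper that is not required by this sheet; the sketch assumes the text contains only the letters a to z and spaces, and that spaces are copied through without consuming a key letter.

```python
def vigenere_encrypt(key: str, text: str) -> str:
    """Add the key letters to the text letters modulo 26 (letters a-z and spaces only)."""
    key = key.lower()
    encrypted = []
    key_index = 0  # advances only when a letter is encrypted
    for char in text.lower():
        if char == ' ':
            encrypted.append(char)
            continue
        shift = ord(key[key_index % len(key)]) - ord('a')
        encrypted.append(chr((ord(char) - ord('a') + shift) % 26 + ord('a')))
        key_index += 1
    return ''.join(encrypted)


print(vigenere_encrypt("y", "d"))                 # b
print(vigenere_encrypt("lemon", "attackatdawn"))  # lxfopvefrnhr
```

Decryption is the same loop with the shift subtracted instead of added.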

### Task 3.1 (3 Points)

Implement the function ```vigenere_decrypt(key, text)``` that decrypts a text with a given key according to the Vigenère cipher. First, you need to expand the key to the length of the text, such that you can pairwise subtract the cipher and key letters to obtain the decryption. Do **not** decrypt white spaces. Your function should return the decrypted text as a string.

Hint: You can use the built-in functions ```ord()``` *and* ```chr()``` to convert a character to its ASCII code and an integer to its character representation, respectively.

**Make sure that your function can decrypt arbitrary text.** If you want to test your function, you can use [CyberChef](https://cyberchef.org/#recipe=Vigen%C3%A8re_Decode('')) to generate ciphertext-plaintext pairs.

In [37]:
def vigenere_decrypt(key: str, text: str) -> str:
    key = key.lower()
    text = text.lower()
    decrypted = []
    key_index = 0  # position in the (cyclically repeated) key
    for char in text:
        if char == ' ':
            # whitespace is copied through and does not consume a key letter
            decrypted.append(' ')
            continue
        key_char = key[key_index % len(key)]
        # subtract the key letter modulo 26; 65 is ord('A'), so the output is uppercase
        decrypted.append(chr(((ord(char) - ord(key_char)) % 26) + 65))
        key_index += 1
    return ''.join(decrypted)

            

In [38]:
### PLAYGROUND
# You can use this cell to test out your implementation. Everything in this cell will be ignored during grading.
plain = "attacking tonight"
cipher = "ovnlqbpvt hznzouz"
key = "oculorhinolaringology"
print(vigenere_decrypt(key,cipher))
print(vigenere_decrypt("lemon", "LXFOPVEFRNHR"))

ATTACKING TONIGHT
ATTACKATDAWN


In [14]:
# Even though this cell seems empty, it contains automatic tests. Please do not remove this cell and just ignore it.

### Task 3.2 (2 Points)

Implement the function ```recognizable(text, counts_english)```. 

It should recognize a potential plaintext as an English text if the below-given letter counts match the letter counts in the computed plaintext. 

Return ```True``` or ```False``` depending on whether the text was a recognizable text or not. You can use the function ```get_counts()``` of task 2.1 (with the new counts!).

In [25]:
vigenere_ciphertext = "ty uxmxbx zmis qbj pnffl bb mulej lz bxm fkx beys xfyeizdb n tuxkn yeywb bj puna c bxm mfm kocjkx beche wfjy fp mtjy xrluocxb c xmipb csqi mfm jvyx"
vigenere_counts = {'a': 10, 'b': 0, 'c': 2, 'd': 4, 'e': 16, 'f': 2, 'g': 1, 'h': 10, 'i': 11, 'j': 0, 'k': 2, 'l': 1, 'm': 2, 'n': 6, 'o': 7, 'p': 5, 'q': 0, 'r': 5, 's': 16, 't': 6, 'u': 3, 'v': 0, 'w': 7, 'x': 0, 'y': 1, 'z': 0}

In [39]:
def recognizable(text: str, counts_english: dict) -> bool:
    # get counts of text
    counts = get_counts(text)
    
    # check if the counts are the same
    for i in counts:
        if i in counts_english:
            if counts[i] != counts_english[i]:
                return False
        else:
            return False
    return True

In [40]:
### PLAYGROUND
# You can use this cell to test out your implementation. Everything in this cell will be ignored during grading.
# Test for the recognizable function
test_text = "this is a test text to check if the function works correctly"
test_counts = get_counts(test_text)
print(recognizable(test_text, test_counts))  # Expected output: True

# Test with non-matching counts
non_matching_counts = {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1, 'f': 1, 'g': 1, 'h': 1, 'i': 1, 'j': 1, 'k': 1, 'l': 1, 'm': 1, 'n': 1, 'o': 1, 'p': 1, 'q': 1, 'r': 1, 's': 1, 't': 1, 'u': 1, 'v': 1, 'w': 1, 'x': 1, 'y': 1, 'z': 1}
print(recognizable(test_text, non_matching_counts))  # Expected output: False

True
False


In [18]:
# Even though this cell seems empty, it contains automatic tests. Please do not remove this cell and just ignore it.

### Task 3.3 (3 Points)

Implement the function ```brute_force_vigenere(text, counts_english, max_key_length)```. It should find the plaintext by trying every possible key up to a specified length. You can use the below function `get_all_keys` to iterate over all possible keys of a given length. The given counts help you recognize a decryption result as an English text. 

In addition, you have to retrieve the encryption key that was used to encrypt the plaintext. Hence, your function should return a tuple of the form ```(plaintext, encryption_key)```. The returned encryption key can either be a string or a tuple consisting of the individual chars of the key.
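
To get a feeling for the cost of this brute-force search, the number of candidate keys can be computed directly. `keys_up_to` is a hypothetical helper that is not part of the assignment; it assumes a 26-letter alphabet and that all key lengths from 1 to `max_key_length` are tried.

```python
def keys_up_to(max_key_length: int, alphabet_size: int = 26) -> int:
    """Number of keys of length 1..max_key_length over the given alphabet."""
    return sum(alphabet_size ** length for length in range(1, max_key_length + 1))


for length in range(1, 6):
    print(length, keys_up_to(length))
```

For a maximum key length of 5 this already amounts to 12356630 trial decryptions, and each additional key letter multiplies the work by roughly 26.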

In [19]:
import itertools
def get_all_keys(key_length: int):
    letters = [chr(ascii_index) for ascii_index in range(ord("a"), ord("z") + 1)]
    return itertools.product(letters, repeat=key_length)

In [41]:
def brute_force_vigenere(text: str, counts: dict, max_key_length: int) -> tuple:
    # YOUR CODE HERE
    for i in range(1, max_key_length + 1):
        for key in get_all_keys(i):
            key = ''.join(key)
            decrypted = vigenere_decrypt(key, text)
            if recognizable(decrypted, counts):
                return decrypted, key
    return '', ''

In [42]:
### PLAYGROUND
# You can use this cell to test out your implementation. Everything in this cell will be ignored during grading.
"""
cipher = "RIJVS"
plain = "HELLO"
key = "KEY"
print(brute_force_vigenere(cipher, get_counts(plain), 3))
print(vigenere_decrypt("eys", cipher))
"""
plain = "ATTACKATDAWN"
cipher = "LXFOPVEFRNHR"
key = "LEMON"
#print(recognizable("ATAACKAADAWN", get_counts(plain)))
print(brute_force_vigenere(cipher, get_counts(plain), 6))
#print(vigenere_decrypt("lemon", cipher))
#print(vigenere_decrypt("fygih", cipher))



('ATTACKATDAWN', 'lemon')


In [44]:
# This test just checks the output format of your solution

oft_result = brute_force_vigenere(vigenere_ciphertext, vigenere_counts, 5)
print(oft_result)

# Check tuple
assert type(oft_result) == tuple
assert len(oft_result) == 2

# Check plaintext
assert type(oft_result[0]) == str

# Check key
if type(oft_result[1]) == tuple:
    for c in oft_result[1]:
        assert type(c) == str
        assert len(c) == 1
else:
    assert type(oft_result[1]) == str

('WE PASSED UPON THE STAIR WE SPOKE OF WAS AND WHEN ALTHOUGH I WASNT THERE HE SAID I WAS HIS FRIEND WHICH CAME AS SOME SURPRISE I SPOKE INTO HIS EYES', 'xuf')


In [None]:
# Even though this cell seems empty, it contains automatic tests. Please do not remove this cell and just ignore it.

In [None]:
# Even though this cell seems empty, it contains automatic tests. Please do not remove this cell and just ignore it.

## 5. Example Exam Questions (0.5 points)

This task contains questions you could find during an exam. We will not grade the correctness of this task, but we will discuss it during the exercise session. You will be awarded 0.5 points, if you work on this task and every subtask has some written answer. Please be aware that also task 1 can occur during an exam, but task 1 will be graded for correctness.

### 5.1

Briefly **describe or define** what a **chosen-ciphertext attack** on an encryption scheme is and **list** what the attacker knows and can obtain.

A chosen-ciphertext attack (CCA) is an attack model for cryptanalysis in which the attacker can choose arbitrary ciphertexts to be decrypted and has access to the corresponding plaintexts. The goal of the attacker is to gather information that reduces the security of the encryption scheme.

In a chosen-ciphertext attack, the attacker knows:
- The encryption algorithm.
- The ciphertexts they have chosen.
- The corresponding plaintexts obtained from the decryption of the chosen ciphertexts.

The attacker can obtain:
- Information about the secret key.
- Insights into the structure and weaknesses of the encryption algorithm.
- Potentially the ability to decrypt other ciphertexts without knowing the secret key.
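
As an illustration of the points listed above, here is a minimal sketch of a chosen-ciphertext attack on the Vigenère cipher from task 3. The setup is assumed for illustration only: `decryption_oracle` is a hypothetical stand-in for a system that decrypts attacker-chosen ciphertexts, and `SECRET_KEY` is a made-up key that the attacker never reads directly. Submitting a ciphertext of only `a` (the number 0) yields the negated key modulo 26, from which the key follows immediately.

```python
SECRET_KEY = "lemon"  # made-up key, known only to the oracle in this scenario


def decryption_oracle(cipher_text: str) -> str:
    """Decrypt a ciphertext of lowercase letters with the hidden key."""
    plain = []
    for position, char in enumerate(cipher_text):
        shift = ord(SECRET_KEY[position % len(SECRET_KEY)]) - ord('a')
        plain.append(chr((ord(char) - ord('a') - shift) % 26 + ord('a')))
    return ''.join(plain)


def recover_key_stream(oracle, length: int) -> str:
    """Query the oracle with 'aaa...' and negate the response modulo 26."""
    response = oracle('a' * length)
    return ''.join(chr((-(ord(char) - ord('a'))) % 26 + ord('a')) for char in response)


print(recover_key_stream(decryption_oracle, 10))  # lemonlemon
```

A single chosen ciphertext is enough here, which shows how much weaker the cipher is under this attacker model than under a ciphertext-only attack.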

### 5.2

**Explain** the difference between a **block cipher** and a **stream cipher**.

One of the core differences between block & stream ciphers is that stream ciphers can be used to encrypt **messages of arbitrary length**, while _block ciphers_ can (per se) only deal with **messages that _exactly_ match the required block length** of the cipher.\
-> This means that _block ciphers_ **require** the use of a _mode of encryption_ to process longer messages, while _stream ciphers_ **do not!**
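
The length restriction can be made concrete with a toy sketch. Both parts are assumptions for illustration and neither is a secure cipher: `BLOCK_SIZE` is a hypothetical block length, `split_into_blocks` only shows that a raw block cipher needs input in whole blocks, and `xor_with_keystream` uses a short repeating keystream where a real stream cipher would generate a pseudorandom one from the key.

```python
import itertools

BLOCK_SIZE = 8  # hypothetical block length in bytes


def split_into_blocks(message: bytes) -> list:
    """A raw block cipher only accepts input that is a whole number of blocks."""
    if len(message) % BLOCK_SIZE != 0:
        raise ValueError("message length is not a multiple of the block size")
    return [message[i:i + BLOCK_SIZE] for i in range(0, len(message), BLOCK_SIZE)]


def xor_with_keystream(message: bytes, keystream) -> bytes:
    """Stream-cipher style: XOR each message byte with the next keystream byte."""
    return bytes(m ^ k for m, k in zip(message, keystream))


toy_key = b"\x13\x37"  # toy keystream material, insecure because it repeats
cipher_text = xor_with_keystream(b"hello", itertools.cycle(toy_key))
print(xor_with_keystream(cipher_text, itertools.cycle(toy_key)))  # b'hello'
print(len(split_into_blocks(b"16 bytes exactly")))                # 2
```

The five-byte message passes through the keystream XOR unchanged in length, whereas `split_into_blocks(b"hello")` raises a `ValueError` until the message is padded or a suitable mode is used.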

### 5.3

Briefly **explain two different motives** of cyber attackers.

Two possible motives for attackers are:
1. **Monetary gain**: The attacker/_hacker-for-hire_ wants to make money from the attack, e.g. by stealing confidential information like credit card data and then making fraudulent transactions with it
2. **Pentesting**: _Not all attacks are bad_; some companies hire _pentesters_ who are _asked_ (& explicitly _allowed_) to attack the company's IT systems to help reveal potential vulnerabilities _before any bad actor_ can exploit them

## 4. Feedback (0.5 points)

You made it through. Since we want to know how it went and how we might improve the exercises, we include the following task. Here, you can write constructive feedback! You even get 0.5 points for it if you write anything. But don't worry, we do not grade the content itself!

In our view, the exercise gave a very good recap of the first two chapters & touched upon every bigger aspect within those chapters.

However, in the coding exercises of tasks 2 & 3, it was a tad annoying that task 2.2 asked us to return a key containing _only those_ letters that are _actually present_ in the ciphertext. This led us to initially implement the `get_counts` function _the same way_, returning only the letters _actually present_ in the text. This then led to problems in task 3.2 when iterating over the dictionaries (as the dictionary _you provided_ also lists letters _not present_ in the text with a count of 0) & a bit of confusion while debugging to fix this issue.\
To fix this, a hint in task 2.1 like "Implement this function so that even non-present letters appear in the returned dictionary (with a count of 0)" would work & have avoided the complications mentioned above.