# Notebook 11.2: Breaking the Code 🕵️‍♀️

> "It is a capital mistake to theorize before one has data." (Arthur Conan Doyle, *A Scandal in Bohemia*)

Welcome back, Agent! In the last notebook, you learned how to encrypt and decrypt messages using the Caesar Cipher. But what happens when you intercept a message and you *don't* know the key? In this notebook, you'll put on your detective hat and learn how to break the code!

### Learning Objectives

By the end of this notebook, you will be able to:

*   Understand and implement a **brute-force attack** to break a Caesar cipher.
*   Understand the concept of **frequency analysis** as a code-breaking technique.
*   Use dictionaries to count the frequency of letters in a text.
*   Combine loops, dictionaries, and conditional logic to analyze encrypted messages.

### Prerequisites/Review

*   Concepts from [Notebook 10: Dictionaries](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/10-dictionaries.ipynb), including how to create and use dictionaries.
*   Concepts from [Notebook 11.a: The Caesar Cipher](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/11.a-the-caesar-cipher.ipynb), including the `caesar_cipher` function.

### Estimated Time
This notebook is designed to be completed in approximately **30-60 minutes**.

[Return to Table of Contents](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/table-of-contents.ipynb)


## 🕵️‍♀️ Brute-Force Attack: Trying Every Possibility

Imagine you have a locked box, and you don't know the combination. One way to open it is to try every single possible combination until you find the right one. This is exactly what a **brute-force attack** is in cryptography!

For the Caesar cipher, there are only 25 possible keys (shifts from 1 to 25, since a shift of 0 or 26 would result in the original message). This is a small enough number that a computer can try every single one very quickly.
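The reason a shift of 0 or 26 changes nothing is the wrap-around arithmetic: shifts are taken modulo 26. The short sketch below illustrates this with a hypothetical helper, `shift_letters`, written only for this demonstration (it is not the `caesar_cipher` function from the previous notebook).

```python
def shift_letters(text, shift):
    """Shift each letter forward by `shift` places, wrapping around the alphabet."""
    shifted = []
    for char in text:
        if char.isalpha():
            start = ord('a') if char.islower() else ord('A')
            # % 26 makes a shift of 26 land back on the same letter
            shifted.append(chr(start + (ord(char) - start + shift) % 26))
        else:
            shifted.append(char)
    return "".join(shifted)

message = "HELLO"
print(shift_letters(message, 0))    # same as the original
print(shift_letters(message, 26))   # a full trip around the alphabet
print(shift_letters(message, 29))   # behaves like a shift of 3

# Shifts 0..25 all give different results, and only shift 0 is the original,
# which leaves 25 keys worth trying.
distinct_results = {shift_letters(message, s) for s in range(26)}
print(len(distinct_results))
```

Any shift larger than 25 simply repeats one of the shifts from 0 to 25, which is why the brute-force loop never needs to go further.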

Let's recall our `caesar_cipher` function from the previous notebook. We'll use it here to decrypt messages with different keys.

In [None]:
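def caesar_cipher(text, shift, mode='decrypt'):
    """Encrypt or decrypt `text` with a Caesar shift, leaving non-letters unchanged."""
    if mode not in ('encrypt', 'decrypt'):
        raise ValueError("mode must be 'encrypt' or 'decrypt'")
    result = ""
    for char in text:
        if char.isalpha():
            # Work relative to 'a' or 'A' so the letter's case is preserved
            start = ord('a') if char.islower() else ord('A')
            offset = ord(char) - start
            if mode == 'encrypt':
                shifted_offset = (offset + shift) % 26
            else:
                shifted_offset = (offset - shift) % 26  # % 26 wraps around the alphabet
            result += chr(start + shifted_offset)
        else:
            result += char
    return result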

# Test the function (optional, but good practice)
encrypted_message = caesar_cipher("HELLO", 3, 'encrypt')
decrypted_message = caesar_cipher(encrypted_message, 3, 'decrypt')
print(f"Original: HELLO, Encrypted: {encrypted_message}, Decrypted: {decrypted_message}")

### How to Brute-Force a Caesar Cipher

Let's say we intercept the following encrypted message:

```
"khoor zruog"
```

We don't know the key! But we can try every possible shift from 1 to 25 and see which one makes sense.

In [None]:
encrypted_text = "khoor zruog"

print(f"Encrypted Message: {encrypted_text}")

for key in range(1, 26):
    decrypted_text = caesar_cipher(encrypted_text, key, 'decrypt')
    print(f"Trying key {key:2}: {decrypted_text}")

Did you see it? One of the decrypted messages should look like a readable English phrase! This is how a brute-force attack works: you try everything until something makes sense.
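Reading 25 candidates by eye works fine, but the "does it make sense?" step can be roughly automated too. The sketch below is one possible approach, not the only one: it scores each candidate by counting how many of its words appear in a tiny, hand-picked list of common English words. The names `COMMON_WORDS`, `shift_text`, `count_common_words`, and `guess_key`, as well as the sample message, are all made up for this illustration.

```python
# A tiny, hand-picked word list (an assumption for this sketch; a real tool
# would use a much larger dictionary).
COMMON_WORDS = {"the", "and", "is", "in", "to", "of", "a", "it", "you", "that"}

def shift_text(text, shift):
    """Shift letters by `shift` places (negative values shift backward)."""
    result = ""
    for char in text:
        if char.isalpha():
            start = ord('a') if char.islower() else ord('A')
            result += chr(start + (ord(char) - start + shift) % 26)
        else:
            result += char
    return result

def count_common_words(text):
    """Count how many words of `text` are in COMMON_WORDS."""
    return sum(1 for word in text.lower().split() if word.strip(".,!?") in COMMON_WORDS)

def guess_key(ciphertext):
    """Try keys 1..25 and return the (key, plaintext) pair with the most common words."""
    best_key, best_text, best_score = 0, ciphertext, -1
    for key in range(1, 26):
        candidate = shift_text(ciphertext, -key)  # decrypt by shifting backward
        score = count_common_words(candidate)
        if score > best_score:
            best_key, best_text, best_score = key, candidate, score
    return best_key, best_text

# Made-up example: encrypt a sentence with key 5, then recover the key.
ciphertext = shift_text("the code is in the book", 5)
key, plaintext = guess_key(ciphertext)
print(f"Ciphertext: {ciphertext}")
print(f"Best key: {key}, message: {plaintext}")
```

This only works when the message contains words from the list, so it is a helper for narrowing down candidates rather than a replacement for reading the output yourself.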

### 🎯 Mini-Challenge: Brute-Force the Secret Message!

Your mission, should you choose to accept it, is to decrypt the following secret message using a brute-force attack. Find the original message and the key that was used to encrypt it.

```
"wklv lv d vhfuhw phvvdjh"
```

<details>
  <summary>Hint: Remember the range of possible keys.</summary>
  Try keys from 1 to 25. The correct key will reveal a readable message.
</details>

<details>
  <summary>Hint: Reuse the `caesar_cipher` function.</summary>
  You already have a function that can decrypt. Just call it inside a loop with different `shift` values.
</details>

In [None]:
secret_message = "wklv lv d vhfuhw phvvdjh"

# YOUR CODE HERE


<details>
  <summary>Click to see a possible solution</summary>

  ```python
  secret_message = "wklv lv d vhfuhw phvvdjh"

  print(f"Encrypted Message: {secret_message}\n")

  for key in range(1, 26):
      decrypted_text = caesar_cipher(secret_message, key, 'decrypt')
      print(f"Trying key {key:2}: {decrypted_text}")

  # The correct key is 3, which decrypts to "this is a secret message"
  ```
</details>

## 📊 Frequency Analysis: The Codebreaker's Secret Weapon

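While brute force works for simple ciphers like Caesar, what if there were far more possible keys? A general substitution cipher, where each letter can be replaced by any other letter, has 26! (roughly 4 × 10^26) possible keys, far too many to try one by one, even for a computer. This is where **frequency analysis** comes in.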

Frequency analysis is a technique that uses the known frequency of letters in a language to help decipher a coded message. In English, for example, the letter 'E' is the most common, followed by 'T', 'A', 'O', 'I', 'N', 'S', 'H', 'R'. The least common letters are 'Z', 'Q', 'J', 'X'.

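If you analyze an encrypted message and find that the most frequent letter in the ciphertext is 'X', a good first guess is that 'X' stands for 'E' in the original message. This gives you a clue about the shift key! The guess is more reliable for long messages; in a short one, another common letter can easily come out on top.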
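The arithmetic behind that clue is small enough to sketch directly. Assuming a Caesar cipher, the shift is the distance from the guessed plaintext letter to the ciphertext letter, wrapped around the alphabet. The helper name `shift_from_pair` is made up for this example.

```python
def shift_from_pair(cipher_letter, plain_letter):
    """Return the Caesar shift that turns `plain_letter` into `cipher_letter`."""
    return (ord(cipher_letter.upper()) - ord(plain_letter.upper())) % 26

# If 'X' in the ciphertext stands for 'E', the shift would be 19.
print(shift_from_pair("X", "E"))

# The % 26 handles wrap-around: 'B' comes before 'E' in the alphabet,
# so 'E' must have been shifted 23 places forward to reach 'B'.
print(shift_from_pair("B", "E"))
```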

Let's write a function to count the frequency of each letter in a given text.

In [None]:
def calculate_letter_frequencies(text):
    frequencies = {}
    total_letters = 0
    for char in text:
        if char.isalpha():
            char = char.upper() # Convert to uppercase for consistent counting
            frequencies[char] = frequencies.get(char, 0) + 1
            total_letters += 1

    # Convert counts to percentages
    for char in frequencies:
        frequencies[char] = (frequencies[char] / total_letters) * 100

    # Sort by frequency (descending)
    sorted_frequencies = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    return sorted_frequencies

# Example usage:
sample_text = "This is a sample text to analyze for letter frequencies."
sample_frequencies = calculate_letter_frequencies(sample_text)
print("Sample Text Frequencies:", sample_frequencies)
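The same counting step can also be done with `collections.Counter` from Python's standard library, which is a dictionary specialized for counting. This is an alternative sketch rather than a replacement for the function above; the name `letter_counts` is made up for this example, and it returns raw counts instead of percentages.

```python
from collections import Counter

def letter_counts(text):
    """Count each letter in `text`, ignoring case and non-letter characters."""
    return Counter(char.upper() for char in text if char.isalpha())

counts = letter_counts("This is a sample text to analyze for letter frequencies.")

# most_common(n) returns the n highest counts as (letter, count) pairs
print(counts.most_common(3))

# A Counter works like a dictionary, and missing letters count as 0
print(counts["E"], counts["Q"], counts["B"])
```

Writing the loop by hand, as in `calculate_letter_frequencies`, shows exactly how the dictionary is built; `Counter` is a convenient shortcut once that idea is familiar.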

### English Letter Frequencies (Approximate)

Here are the approximate frequencies of letters in the English language. We can use this as a reference.

| Letter | Frequency (%) |
|---|---|
| E | 12.70 |
| T | 9.06 |
| A | 8.17 |
| O | 7.51 |
| I | 6.97 |
| N | 6.75 |
| S | 6.33 |
| H | 6.09 |
| R | 5.99 |
| D | 4.25 |
| L | 4.03 |
| C | 2.78 |
| U | 2.76 |
| M | 2.41 |
| W | 2.36 |
| F | 2.23 |
| G | 2.02 |
| Y | 1.97 |
| P | 1.93 |
| B | 1.49 |
| V | 0.98 |
| K | 0.77 |
| J | 0.15 |
| X | 0.15 |
| Q | 0.10 |
| Z | 0.07 |
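These percentages are averages over large amounts of text, so it helps to see what they predict for messages of different lengths. The sketch below uses only the top five rows of the table; the names `ENGLISH_FREQ_TOP` and `expected_counts` are made up for this example.

```python
# Top five rows of the reference table (percent of all letters, on average)
ENGLISH_FREQ_TOP = {"E": 12.70, "T": 9.06, "A": 8.17, "O": 7.51, "I": 6.97}

def expected_counts(num_letters):
    """Return how many times each letter is expected in a message of `num_letters` letters."""
    return {
        letter: round(percent / 100 * num_letters, 1)
        for letter, percent in ENGLISH_FREQ_TOP.items()
    }

for length in (20, 200):
    print(f"{length:3} letters: {expected_counts(length)}")
```

In a 20-letter message, 'E' is expected only two or three times, barely ahead of 'T' and 'A', so a different letter can easily end up on top. In a 200-letter message the gap is much wider, which is why frequency analysis becomes more dependable as the text gets longer.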

### Applying Frequency Analysis

Let's take an encrypted message and calculate its letter frequencies. Then, we can compare them to the English frequencies to guess the shift.

In [None]:
encrypted_message_fa = "L fdph, L vdz, L frqtxhuhg."
encrypted_frequencies = calculate_letter_frequencies(encrypted_message_fa)

print(f"Encrypted Message: {encrypted_message_fa}")
print("Encrypted Frequencies:", encrypted_frequencies)

# The most frequent letter in English is 'E', so a common first guess is that
# the most frequent letter in the encrypted message corresponds to 'E'.
# In 'L fdph, L vdz, L frqtxhuhg.', 'L' and 'H' tie for first place
# (3 occurrences each out of 19 letters), so we have two candidate shifts:
#   'L' -> 'E': (ord('L') - ord('E')) % 26 = (76 - 69) % 26 = 7
#   'H' -> 'E': (ord('H') - ord('E')) % 26 = (72 - 69) % 26 = 3
# Let's try decrypting with both and see which one is readable.

for guessed_shift in (7, 3):
    decrypted_fa_message = caesar_cipher(encrypted_message_fa, guessed_shift, 'decrypt')
    print(f"Decrypted with guessed shift {guessed_shift}: {decrypted_fa_message}")

The most frequent ciphertext letter is only a starting guess. Here 'L' and 'H' tie, and with just 19 letters either one could stand for 'E'. Guessing that 'L' is 'E' gives a shift of 7, which produces gibberish. Guessing that 'H' is 'E' gives a shift of 3, which reveals a meaningful message: "I came, I saw, I conquered." The letter 'L' turned out to be the plaintext 'I', which appears three times in this short sentence. The longer the message, the more likely it is that the top ciphertext letter really is 'E'.
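Because the top ciphertext letter is only a guess at 'E', it can help to list the shift implied by each of the few most frequent letters and look at all the resulting decryptions. The sketch below is one way to do that; `decrypt_with_shift` and `candidates_from_top_letters` are names made up for this example, and the choice of three candidates is arbitrary.

```python
from collections import Counter

def decrypt_with_shift(text, shift):
    """Undo a Caesar shift, keeping case and leaving non-letters unchanged."""
    result = ""
    for char in text:
        if char.isalpha():
            start = ord('a') if char.islower() else ord('A')
            result += chr(start + (ord(char) - start - shift) % 26)
        else:
            result += char
    return result

def candidates_from_top_letters(ciphertext, top_n=3, assumed_plain="E"):
    """For each of the top_n ciphertext letters, assume it stands for
    `assumed_plain` and return (letter, shift, decrypted_text) tuples."""
    counts = Counter(char.upper() for char in ciphertext if char.isalpha())
    candidates = []
    for letter, _ in counts.most_common(top_n):
        shift = (ord(letter) - ord(assumed_plain)) % 26
        candidates.append((letter, shift, decrypt_with_shift(ciphertext, shift)))
    return candidates

for letter, shift, text in candidates_from_top_letters("L fdph, L vdz, L frqtxhuhg."):
    print(f"'{letter}' -> 'E' (shift {shift:2}): {text}")
```

This narrows 25 possible keys down to a handful, and a quick read of the output picks out the one that is real English.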

### 🎯 Mini-Challenge: Break the Code with Frequency Analysis!

You've intercepted another message. Use frequency analysis to determine the most likely shift and decrypt the message.

```
"wkh qljkw lv brxqj"
```

<details>
  <summary>Hint: Calculate the frequencies first.</summary>
  Use the `calculate_letter_frequencies` function on the encrypted message.
</details>

<details>
  <summary>Hint: Compare the most frequent letter.</summary>
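  Identify the most frequent letter (or letters, if several tie) in your encrypted message and assume it corresponds to 'E' (the most common letter in English). Calculate the shift based on this assumption. If the result isn't readable, repeat the guess with the next most common English letters, such as 'T' or 'A'. Short messages often don't follow the averages.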
</details>

In [None]:
secret_message_fa = "wkh qljkw lv brxqj"

# YOUR CODE HERE


<details>
  <summary>Click to see a possible solution</summary>

  ```python
  secret_message_fa = "wkh qljkw lv brxqj"

  # Calculate frequencies of the encrypted message
  encrypted_freq = calculate_letter_frequencies(secret_message_fa)
  print("Encrypted Frequencies:", encrypted_freq)

  # In this message, 'W', 'K', 'Q', 'L', and 'J' tie for most frequent
  # (2 occurrences each out of 15 letters), so there is no single winner.
  # With so little text, the top letter may stand for any common English
  # letter, not just 'E'. Let's match the first one listed, 'W', against the
  # three most common English letters and try each resulting shift:
  #   'W' -> 'E': (ord('W') - ord('E')) % 26 = (87 - 69) % 26 = 18
  #   'W' -> 'T': (ord('W') - ord('T')) % 26 = (87 - 84) % 26 = 3
  #   'W' -> 'A': (ord('W') - ord('A')) % 26 = (87 - 65) % 26 = 22

  for plain_letter in "ETA":
      guessed_shift_fa = (ord('W') - ord(plain_letter)) % 26
      decrypted_fa_message_challenge = caesar_cipher(secret_message_fa, guessed_shift_fa, 'decrypt')
      print(f"'W' -> '{plain_letter}' (shift {guessed_shift_fa:2}): {decrypted_fa_message_challenge}")

  # Only shift 3 ('W' -> 'T') gives readable English: the night is young
  ```
</details>

## 👋 Conclusion

Congratulations, Agent! You've successfully completed your training in breaking the Caesar cipher. You've learned two powerful techniques:

*   **Brute-Force Attack:** Trying every possible key until the message makes sense. This works well for ciphers with a small number of possible keys.
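*   **Frequency Analysis:** Using the statistical properties of language (like how often certain letters appear) to deduce the key. This is a classic code-breaking technique that is effective against simple substitution ciphers such as the Caesar cipher, especially when the message is long enough for the letter counts to be meaningful.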

These methods are fundamental in the world of cryptography and cryptanalysis. While modern encryption methods are far more complex, the underlying principles of analyzing patterns and trying possibilities remain key to understanding how codes are made and broken.

### 🤔 Reflection Question:

Consider the brute-force and frequency analysis methods. In what scenarios would one be preferred over the other? Can you think of any situations where neither would work well?

### Key Takeaways

*   **Brute-Force:** Effective for ciphers with a limited number of keys. Automates trying all possible keys.
*   **Frequency Analysis:** Exploits the predictable patterns of letter distribution in a language to infer the encryption key.
*   **Dictionaries:** Useful for counting occurrences of items, such as letters in a text, to perform frequency analysis.
*   **Python's flexibility:** Loops, dictionaries, and string manipulation are powerful tools for cryptographic tasks.

### Next Up: Notebook 12.a: Functions, Sequences, and Plots 📈

In our next series of notebooks, we'll shift gears from cryptography to explore the fascinating world where programming meets mathematics and data! We'll start with [Notebook 12.a: Functions, Sequences, and Plots](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/12.a-functions-sequences-and-plots.ipynb) focusing on how we can use Python to understand and visualize mathematical functions and sequences. Get ready to plot some data!

[Return to Table of Contents](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/table-of-contents.ipynb)