
# Caesar Cipher key finder
----------

First, we need a way to encrypt an example message.

## Encrypt the message

In [1]:
import string

alphabet = list(string.ascii_lowercase)

def encode(text, rot):
    out = []
    for c in list(text):
        if c.islower() and c in alphabet:
            # The character is lowercase
            pos_in_alphabet = alphabet.index(c)
            pos_shifted = (pos_in_alphabet + rot) % 26
            out.append(alphabet[pos_shifted])
        elif c.lower() in alphabet:
            # The character is uppercase
            pos_in_alphabet = alphabet.index(c.lower())
            pos_shifted = (pos_in_alphabet + rot) % 26
            out.append(alphabet[pos_shifted].upper())
        else:
            # Other characters
            out.append(c)
    return "".join(out)
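
As a quick check of the rotation arithmetic, the sketch below uses a hypothetical helper `shift_text` (not part of the notebook, and written with `str.translate` rather than the loop above) to show that rotating by `k` and then by `(26 - k) % 26` returns the original message. It assumes that only ASCII letters are shifted.

```python
import string

def shift_text(text, rot):
    # Hypothetical helper: rotate ASCII letters by `rot`, keep everything else
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    rot %= 26
    table = str.maketrans(
        lower + upper,
        lower[rot:] + lower[:rot] + upper[rot:] + upper[:rot],
    )
    return text.translate(table)

message = "Hello, World!"
encrypted = shift_text(message, 3)
print(encrypted)                      # Khoor, Zruog!
print(shift_text(encrypted, 26 - 3))  # Hello, World!
```

This is why a decoding rotation of `r` corresponds to an encryption key of `(26 - r) % 26`.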

Now we can encrypt any text with a rotation between 0 and 25:

In [2]:
from ipywidgets import widgets, interactive

output_area = widgets.Textarea(layout={'height':''}, placeholder="Output")
input_text = widgets.Textarea(layout={'height':'150px'}, placeholder="Any text you like")
input_rotation = widgets.IntSlider(min=0, max=25, step=1, value=0)
def text_encoder(text, rot):
    output_area.value = encode(text, rot)

input_area = interactive(text_encoder, text=input_text, rot=input_rotation)
widgets.HBox([input_area, output_area])


## Finding the key using cross-entropy

Given two discrete distributions $p$ and $q$ over the same set $X$, the **cross-entropy** of $q$ relative to $p$ is defined as:

$H(p, q) = -\sum_{x \in X} p(x) \log q(x)$

In [3]:
import numpy as np

def cross_entropy(p, q):
    return -np.sum(p*np.log(q, where=q>0, out=np.zeros(q.size)))
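
To see why cross-entropy is useful for comparing distributions, here is a small pure-Python sketch. The function name `cross_entropy_plain` and the toy distributions are illustrative assumptions, not part of the notebook. For a fixed $p$, the value $H(p, q)$ is smallest when $q = p$, and it grows as $q$ puts less mass where $p$ puts more.

```python
import math

def cross_entropy_plain(p, q):
    # H(p, q) = -sum_x p(x) * log(q(x)); terms with q(x) == 0 are skipped,
    # mirroring the `where=q>0` choice in the NumPy version above
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if qx > 0)

p = [0.7, 0.2, 0.1]
q_close = [0.6, 0.3, 0.1]   # similar shape to p
q_far = [0.1, 0.2, 0.7]     # mass in the wrong places

print(cross_entropy_plain(p, p))        # entropy of p, the lower bound
print(cross_entropy_plain(p, q_close))  # slightly larger
print(cross_entropy_plain(p, q_far))    # much larger
```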

These are the relative frequencies of the 26 letters in English text. For example, a letter picked at random from an English text is the letter *a* with a probability of 8.167%.

In [4]:
frequencies_english = np.array([0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
    0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406,
    0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056,
    0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074])

This particular distribution will be our $q$.

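The distribution $p$ is built the same way as $q$, except that it holds the letter frequencies of one specific **text** rather than of English in general. For each of the 26 possible shifts we compute $p$ for the shifted text and keep the shift with the lowest cross-entropy $H(p, q)$, since that is the shift whose letter frequencies look most like English.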
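
A minimal sketch of how such a distribution $p$ can be computed for a single text, using `collections.Counter` from the standard library. The helper name `letter_frequencies` and the sample string are assumptions made for illustration.

```python
import string
from collections import Counter

def letter_frequencies(text):
    # Count only ASCII letters, ignoring case, then normalise to sum to 1
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        # No letters at all: return an all-zero vector instead of dividing by 0
        return [0.0] * 26
    return [counts[c] / total for c in string.ascii_lowercase]

p = letter_frequencies("Hello, World!")
print(p[string.ascii_lowercase.index("l")])  # 0.3 (3 of the 10 letters are "l")
```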

In [5]:
def count_occurrencies(text):
    # Count how many times each of the 26 letters appears, ignoring case
    occ_in_text = [0] * 26

    for c in text:
        if c.lower() in alphabet:
            occ_in_text[alphabet.index(c.lower())] += 1
    return occ_in_text

def decode(text):
    # Try all 26 shifts and keep the one whose letter frequencies
    # have the lowest cross-entropy against the English frequencies
    entropies = []
    for k in range(26):
        curr_text = encode(text, k)
        occ = np.array(count_occurrencies(curr_text))
        tot_occ = np.sum(occ)
        if tot_occ == 0:
            # No letters in the text: every shift is equivalent
            return 0
        frequencies_in_text = occ / tot_occ
        entropies.append(cross_entropy(frequencies_in_text, frequencies_english))
    # Index of the smallest cross-entropy, i.e. the shift that decodes the text
    return int(np.argmin(entropies))
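
Putting the pieces together, the following self-contained sketch encrypts a sentence with key 3 and recovers the key by trying all 26 shifts. It is a pure-Python restatement of the idea above rather than the notebook's own code: the names `ENGLISH`, `shift_text`, `score`, and `find_key` are hypothetical, and the frequency table is the one given earlier.

```python
import math
import string

# Same English letter frequencies as above, a..z
ENGLISH = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
           0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406,
           0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056,
           0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074]

def shift_text(text, rot):
    # Rotate ASCII letters by `rot` positions, leave other characters alone
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    rot %= 26
    table = str.maketrans(lower + upper,
                          lower[rot:] + lower[:rot] + upper[rot:] + upper[:rot])
    return text.translate(table)

def score(text):
    # Cross-entropy of English relative to the letter frequencies of `text`
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    if not letters:
        return 0.0
    total = len(letters)
    return -sum(letters.count(c) / total * math.log(q)
                for c, q in zip(string.ascii_lowercase, ENGLISH))

def find_key(ciphertext):
    # Shift that makes the text look most like English, converted to the key
    best_shift = min(range(26), key=lambda k: score(shift_text(ciphertext, k)))
    return (26 - best_shift) % 26

plaintext = ("It was the best of times, it was the worst of times, "
             "it was the age of wisdom, it was the age of foolishness")
ciphertext = shift_text(plaintext, 3)
key = find_key(ciphertext)
print(key)                               # 3
print(shift_text(ciphertext, 26 - key))  # the original sentence
```

The search returns the rotation that *decodes* the text, so the encryption key is obtained as `(26 - best_shift) % 26`; the modulo keeps an unencrypted text at key 0 instead of 26.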

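For an English text of *moderate length* (at least 5 characters, as a rough lower bound) the algorithm should usually find the key. Shorter or unusual texts may not contain enough letters for their frequencies to resemble those of English.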

In [6]:
output_area2 = widgets.Textarea(layout={'height':'150px'}, placeholder="Output")
def text_decoder(text):
    if(text):
        prediction = decode(text)
        output_area2.value = encode(text, prediction)
        print(f"Predicted key: {(26 - prediction) % 26}")
    else:
        output_area2.value = ""

input_text2 = interactive(text_decoder, text=widgets.Textarea(layout={'height':'150px'}, placeholder="Enter here the encrypted message"))

widgets.HBox([input_text2, output_area2])
