In [1]:
from frequency_analysis import FrequencyAnalysis
fa = FrequencyAnalysis()

In [2]:
"""
Call score_string() with the candidate string as its argument.
The score is a log probability, so it is always negative. The closer it is to zero, the more likely the string is English.
Scores above -12 are good contenders.
"""

print(fa.score_string('now is the winter of our discontent'))
print(fa.score_string('fdhui feh wjkg dsh klfd'))

-9.073
-18.413
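
The internals of score_string() are not shown here. As a rough illustration of what a log-probability score can look like, the sketch below averages per-letter log probabilities under a single-letter model estimated from a short sample sentence. The name toy_score, the sample text, the add-one smoothing, and the choice of a mean are all assumptions made for illustration. It will not reproduce the numbers above.

```python
import math
import string
from collections import Counter

# Hypothetical training text; a real scorer would use a much larger corpus.
SAMPLE = ("it was the best of times it was the worst of times "
          "it was the age of wisdom it was the age of foolishness")

# Letter counts with add-one smoothing so unseen letters keep a nonzero probability
counts = Counter(c for c in SAMPLE if c in string.ascii_lowercase)
total = sum(counts.values()) + 26
LOG_P = {c: math.log((counts[c] + 1) / total) for c in string.ascii_lowercase}

def toy_score(s):
    """Mean log probability per letter; non-letters are ignored."""
    letters = [c for c in s if c in LOG_P]
    if not letters:
        return float('-inf')
    return sum(LOG_P[c] for c in letters) / len(letters)

print(toy_score('the'))
print(toy_score('zqx'))
```

Every letter has probability below one, so the score is always negative, and strings built from common letters land closer to zero than strings built from rare ones.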


In [3]:
"""
As a demo, suppose you have found this: dcrtjedcpbxscxvwisgtpgn

And the puzzle suggests a rotation cipher with a three-letter key. The strategy is to try all 26^3 = 17,576 possible keys.

First, set up some code for the rotation.
"""

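def rot_char(c, n):
    # Rotate the lowercase letter c forward by n places, wrapping after 'z'

    # Keep unknown characters unchanged
    if c == '?':
        return '?'

    # Modular arithmetic handles any shift size, including negative shifts
    index = (ord(c) - ord('a') + n) % 26
    return chr(ord('a') + index)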

def rot_string(s,n):
    cipher = ''
    for c in s:
        cipher += rot_char(c,n)
    return cipher

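def multi_rot_string(s,n):
    # n is a list of shifts, applied in turn and repeated along the string
    cipher = ''
    num_shifts = len(n)
    for i, c in enumerate(s):
        # A '?' placeholder still uses up a position in the key cycle
        shift = n[i % num_shifts]
        cipher += rot_char(c, shift)
    return cipher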
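
As a quick check of the multi-shift idea, the sketch below encrypts a short string with a three-shift key and then undoes it with the inverse key. The helper shift_letters is a condensed, hypothetical stand-in for multi_rot_string so that the block runs on its own. The example text and key are arbitrary.

```python
def shift_letters(s, shifts):
    """Condensed stand-in for multi_rot_string: cycle through shifts, leave '?' alone."""
    out = []
    for i, c in enumerate(s):
        if c == '?':
            out.append(c)
        else:
            k = shifts[i % len(shifts)]
            out.append(chr((ord(c) - ord('a') + k) % 26 + ord('a')))
    return ''.join(out)

key = [1, 2, 3]                              # arbitrary example key
inverse_key = [(26 - k) % 26 for k in key]   # shifts that undo the key

plain = 'hello?world'
cipher = shift_letters(plain, key)
recovered = shift_letters(cipher, inverse_key)

print(cipher)      # igomq?xqumf
print(recovered)   # hello?world
```

Because each shift is taken modulo 26, shifting by k and then by 26 - k returns the original letter, which is why a brute-force search over forward shifts alone covers every possible key.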

In [4]:
"""
Now build a list of all possible contenders, one for each of the 26^3 keys.
"""

s_list = []
for i in range(26*26*26):
    s = multi_rot_string('dcrtjedcpbxscxvwisgtpgn', [(i//(26*26))%26, (i//26)%26, i%26])
    s_list.append(s)
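
The loop above treats i as a three-digit number in base 26 and reads off one shift per digit. A minimal sketch of an equivalent way to enumerate the same keys uses itertools.product from the standard library. The variable names here are illustrative only.

```python
from itertools import product

# Keys from the integer decomposition used in the loop above
keys_from_index = [
    ((i // (26 * 26)) % 26, (i // 26) % 26, i % 26)
    for i in range(26 ** 3)
]

# The same keys, in the same order, from a Cartesian product
keys_from_product = list(product(range(26), repeat=3))

print(len(keys_from_product))
print(keys_from_index == keys_from_product)
```

Either form works. The product version avoids the index arithmetic and extends to longer keys by changing repeat.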


In [5]:
"""
Now rank them by how likely each one is to be English. The true answer is ahead by a long way.
"""

fa.rank_strings(s_list, num_to_output=5)

onceuponamidnightdreary -10.9
ongeutonemihnikhthreery -12.7
plcfspplangdoggirdscasw -13.0
ofcempofamadnaghldrwarq -13.5
onfeusondmignijhtgredry -13.6


In [6]:
"""
Note that it still finds the answer when the ciphertext contains a few errors, even if they introduce unlikely letters.

Try again with three random errors:

dcrtjedcpbxscxvwisgtpgn -> dcrbjedcpbxsaxvwisgvpgn
"""

s_list = []
for i in range(26*26*26):
    s = multi_rot_string('dcrbjedcpbxsaxvwisgvpgn', [(i//(26*26))%26, (i//26)%26, i%26])
    s_list.append(s)

fa.rank_strings(s_list, num_to_output=5)


oncmuponamidlightdrgary -12.4
ongmutonemihlikhthrgery -13.2
phgnotphenchmckinhsaess -13.4
phanonphyncbmceinbsayss -13.5
fundbafuldpocpryaoinlif -13.5
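
To see exactly where the corrupted ciphertext differs from the original, a short comparison like the following sketch can help. The variable names are illustrative.

```python
original = 'dcrtjedcpbxscxvwisgtpgn'
corrupted = 'dcrbjedcpbxsaxvwisgvpgn'

# Collect (position, original letter, corrupted letter) wherever the strings differ
diffs = [
    (i, a, b)
    for i, (a, b) in enumerate(zip(original, corrupted))
    if a != b
]

for i, a, b in diffs:
    print(i, a, '->', b)
```

The three differences fall at positions 3, 12 and 19 (counting from zero), which line up with the three wrong letters in the top-ranked result.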


In [7]:
"""
It also handles unknown characters. In this case, use a '?' wherever you don't know what the letter is, and rank with rank_strings_missing().

Here's an example where we're missing four letters:
"""

s_list = []
for i in range(26*26*26):
    s = multi_rot_string('dc?tjedcp?x?cxvwis?tpgn', [(i//(26*26))%26, (i//26)%26, i%26])
    s_list.append(s)

fa.rank_strings_missing(s_list, num_to_output=5)

on?eupona?i?nightd?eary -8.6
on?eutone?i?nikhth?eery -9.7
by?rfabyl?t?atrueo?plej -10.0
oy?eftoye?t?ntkheh?perj -10.1
ol?estole?g?ngkhrh?cerw -10.1
