# Breaking Rectangular Transposition

G-test: Does some observed distribution actually come from a theoretical/expectected distribution?
- Demographics
- Ecology
- Codebreaking (Rect Trans!)

Q: Basic Freq Analysis will NOT work for Rect Transposition! Why?
- Transposition cipher shuffles letters around based on position
- Transposition cipher just moves letters around --> relative frequencies are retained but DO NOT HELP (ciphertext match plaintext)

## Strategy

Period = |keyword|
- must be divisor of |ciphertext|

    $\text{Period} \in \{\text{Unique factors of |ciphertext|}\}$

> Guess: period from factorlist of |ciphertext|

For each `i != j`, look at the `ith` and `jth` columsn in rectangle
- Do these pairs match what's expected of plaintext English?

$$
G^{(i, )} = 2\sum_{\alpha\beta} O_{\alpha\beta}^{(i, j)} ln(\frac{O_{\alpha\beta}^{(i, j)}}{E_{\alpha\beta}})
$$
- $\sum$: Over all pairs of letters in English (26*26)
- $O$: # of times (i, j) pair appears in column
- $E$: # of times (i, j) pair appears in sample plaintext

In [21]:
from functools import reduce
from math import log, exp
import numpy as np


def factor_list(n):
    return set(reduce(list.__add__,
                ([i, n//i] for i in range(1, int(n**0.5)+1) if n % i == 0)))

In [7]:
cipher = "IDRNUEGHWTEHLOOUFDLARLAKDSADONLUDENASDYSHITENUATMUTNFHOAEERYNWETHLHCOEHUSUDPNOPGSRSIEYVLLENOITWEHHAESVNIEBHDEAAEPSNGSNAIELNOORNOSHCEAKBOTRUHSGAIHLNUAGDRYRLYERTATRCOAUFONCATYNRLDTEAHNTFGDONMULYEFSHATESDSAEHTSFHOEEVNEDIGRNNEOWWIIHNTWVEOIEFHMTNEACLYHLHOEOSOUHFSEUNRKOITWOHNTOIWWUABTSHWTTIIHFRELSGITEMSOPEFHBTDULIISNAEGONEFSUISFNAFRBELLGOEEOPRMEVDDAPMSIYIRTSINAISYEUFRFEALFBHOTERLFEIEANWSGEURLNEIVDENBAYYHOTAFLTAFHAPESLBUALRCEEABPUEOSCEISTIETMNWETINHTWIHHCTEHDMNUILSALUCYEERSIEEVTVNHEEETRSTNSNERAUATALMGIFEOTSEHDSETOAELEOTRRLRBEIOIOKLPEUODENHSTECNBEREOEFPMUOEENHMTHEEORAUENSEDHSTLIPEMDLNSAECPFAUETRAFEOTSOHDMEUANPIHOTENABEKLLWLSANUOTPAHVCEEATYNKEIELDWNOIPWUOSENFWAKRNSAEEGSDUADPNFOAENIWHTWUERNTFKODSYEAECEDRETTSIHWTAUTNEEDPRSRSIEFOOSNWOLHUIIHCCOACMNEPRTAEOOANLRHYTSSNAENTOMIPOERRROELPAYHNTHTTEOEATRFADEMRHOTEFERVLEULRPEPOOINHUTEMTBTEIPRASLTENOIREEYVLDYIAHFTEEEHDOIRUDOSNPIGPOOFFFVTEEHHITELARWSECAIINSNSAEKSNIISNAIGNCEIKFNOTGEHHAENRAUTDNEERDEEDMRRAIESNSOEOFHUTWGTHHNIHOCDGAIOFNOTGMHIAEAGNTICINOOTUDOLRRUETOITANTUHOGEFHSTIULMBAEHTWIWSTAUIASPOETTDKHNWIWHTAATSTHIOASUTRNEVNEEMIDENHCTEOTMNTPAILFOOTNOHHUEFSOUERSEIHSTAAWTMSEYLRALYOISLNEULNBOOCURGLIRDLAPEPHWTTIHHSAEYDWFOIACENASHTTWCODRPEUODANESMNIODPDEEIRFWSOADRETCLOALFKBCUATPNHOSENAUFTSAIRCOYTCCNLOOUINSTTAWHEHLBINEODYBDUTORTEEHCAEORNMIABNTOSIEOVRFMYIPSALNTELUAORCBETJISHCWVHAEHPTEOHOWRFESTUAHCFETFUIGSNLSILTATENHSAYILTSFHOOIPWSIELEROSMNANGOSCRIEADNTOSIOBYNERDUDOHETIPSTAPWIOSBSRLIEECFETLHETADETMRAFEIFDNEETRAARNREGMNETTFHOREATPLIUACFROTSCHSEEFNOTEEHDTESALOIEFHPTUITRCUEOLWSDEUBCFIIFTETONIMDFOPYREOPRASHNTANOLIIAHTTISEACPCAFIYOTRRORSUOFLWRIPEMOSINSAADCNGTNUITPNHODIIESEARIIMNDYESHREOHTTEOCPEIROPTUIISRNBAKFBOKLCAAUNLRDAITRDANHTTILYNAUURFNDFELLRUTESHBTEYLDELWAIGNNZDAEGWDONDWBTIUSTAHHEUDRDNEEMVTOEHRLRLIIHNTAGFNEOBPRUOEENHRTDEOEMDLEALNNIVDEETDRGIAEMTSFHOAERYGGSDEETADHNAEHSGTTYRLTESEENMADSVTEAHTCNAAYNEEDELKWIOIDWNVSEENERHLTIESNSSTIMHIASONGNFLOIOMNOROPOWEPSDOYTMSOAEFSLUOORJSNFOOEMWEETKISSPPORROITREERDROUIKSCHHRAEEDENBOOEFNOMBOYMNOPCOAINNBSNOIOYODHMBTAUENYAYARHDSPEASLIESNDUCORETLSMAIETNEEGLTAHTROEEWVRELHDAAYTLREHECEAIDENMSAITDPATANFROTTOHCUEYNRATTLTEEORRMFWHMHIIIHNCWISITYLLIDRMOTPTUAENUNTRADEAAHTDITMFEONDHOTEOARHNTRAESPLOARNYELTPSHMGEEAEVVNIECDNEFEOUROSVTAIAGNTOTIRHWIESTRPEOOEFKTAUECIBDLOLYLNIOESFSNAETMIADSLEODRRCWIHHROPEPDSEHSNIADMNOAEFEANSRSTEIDORTSEEEMAESSIBHAETNSDDNEIIEHSDYOLPNOESNRRAFILWEDINVTAIHFEOAWMTEPTGTNBIEYHCTRHEFEEUNSLMSFYOISCEOOTSMYLELEATVAIIFOOHNAIMLSIAYTDTWSHANEANMNEIWRHHCAIHLTILDSNMAMUHOCARWSEDSIIASTATWPHAPENAETRRHATETTAWHWETINITHSHURQEEHSWITLCALHDOEMWREOONOOFRMIHSTEOAINTIADANRCODCLIGYNYOEEBRDOTFTHIHWTWAIHLSILTSCNIOEDRDERAEYVGSNUISLRUANMOSM"

c_len = len(cipher)
f'possible factors: {[f for f in factor_list(c_len) if f <= 26]}'

'possible factors: [1, 2, 3, 5, 6, 9, 10, 15, 18]'

In [18]:
period = 5

def get_columns(text, col_size):
    return np.array([[cipher[5*i:5*i+5]] for i in range(c_len//col_size)])

cipher_cols = get_columns(cipher, period)

In [32]:
# Taking a look at a smaller set of rectangles from the input ciphertext
sample = cipher_cols[:5, :]
i, j = 1, 4

[[w[0][i], w[0][j]] for w in sample]

# Hmm, derivation is taking too long
# I'll revisit this later when I have time

[['D', 'U'], ['G', 'T'], ['H', 'O'], ['F', 'A'], ['L', 'D']]

## Observations

Does this G-box look like we guessed the period right?

For each row (except one), there is one # that is much less than every other
- $G^(i, j)$ being much smaller than the other numbers suggests that `i` and `j` are consecutive
- The freq. dist. of pairs that occur in the `[C/p, 2]` rectangle is close to the frequency dist of pairs occurring in the English plaintext

Example:

```
row 1: 1,5 are consecutive
row 2: 2,4 are consecutive
row 3: all big --> very last position in the permuatation
row 4: 4,1 are consecutive
row 5: 5,3 are consecutive

             v start here and work backwards
[2, 4, 1, 5, 3]
```



In [36]:
[f for f in factor_list(10000) if f <=26]

[1, 2, 4, 5, 8, 10, 16, 20, 25]

By hand: 
```
10000
2 (5000)
2 2 (2500)
2 2 5 (500)
2 2 5 5 (100)
2 2 2 5 5 (50)
2 2 2 5 5 5 (10)
2 2 2 2 5 5 5 (2)
2 2 2 2 2 5 5 5

2, 4, 8, 16,
5, 10, 20, 25
```