## Mono-alphabetic Substitution

Table of Contents
* [Introduction](#introduction)
* [Decipher Strategies](#strategies)
* [Examples (with solving steps)](#examples)
* [Practice problems](#practice)
* [Python Cipher/Decipher (generate your own samples)](#python)  


<a id='introduction'></a>

### Introduction

Ciphers with mono-alphabetic substitution use a cipher table where each letter in plain text is replaced by a different letter in cipher text; the letters have one-to-one correspondence. 

For example, here is a cipher table generated randomly,

```
Plain:   ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher:  JMNBOEZRGWXPQCYKSTFDALVIUH
```

and the plaintext "Codebuster is fun!" is encrypted letter by letter, with 'C'->'N', 'o'->'y', and so on:

```
Plaintext:   Codebuster is fun!
Ciphertext:  Nybomafdot gf eac!
```
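You can check the substitution with Python's built-in `str.maketrans` and `str.translate`. This is a minimal sketch. `encrypt` here is a standalone illustration, not the routine of the same name in MonoAlphabeticCipher.py. It keeps each letter's case and passes spaces and punctuation through unchanged.

```python
import string

# Cipher table from the example above: plaintext letter -> ciphertext letter
PLAIN = string.ascii_uppercase
CIPHER = "JMNBOEZRGWXPQCYKSTFDALVIUH"

def encrypt(text, plain=PLAIN, cipher=CIPHER):
    """Substitute every letter (upper and lower case); other characters are unchanged."""
    table = str.maketrans(plain + plain.lower(), cipher + cipher.lower())
    return text.translate(table)

print(encrypt("Codebuster is fun!"))  # Nybomafdot gf eac!
```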

With a decipher table, which is the inverse of the cipher table, 

```
Cipher:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
Plain:   UDNTFSIZXAPVBCELMHQRYWJKOG
```

the decryption can be easily done, following the same alphabetic substitution procedure, with 'N'->'C', 'y'->'o', and so on. 
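Inverting the table just means reading each pair the other way around. The sketch below builds the decipher row from the cipher row above and uses it to decrypt. It is standalone, and the names `DECIPHER` and `decrypt` are hypothetical.

```python
import string

PLAIN = string.ascii_uppercase
CIPHER = "JMNBOEZRGWXPQCYKSTFDALVIUH"

# For each ciphertext letter, look up the plaintext letter that encrypts to it.
inverse = {c: p for p, c in zip(PLAIN, CIPHER)}
DECIPHER = "".join(inverse[c] for c in PLAIN)
print(DECIPHER)  # UDNTFSIZXAPVBCELMHQRYWJKOG

def decrypt(text):
    """Apply the decipher table, keeping case and non-letters."""
    table = str.maketrans(PLAIN + PLAIN.lower(), DECIPHER + DECIPHER.lower())
    return text.translate(table)

print(decrypt("Nybomafdot gf eac!"))  # Codebuster is fun!
```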

In most cases, of course, the decipher table is not provided; finding it is your task.

Mono-alphabetic ciphers may keep the spaces between words (Aristocrats) or have them removed (Patristocrats). They may use K1, K2, or random alphabets as defined by the ACA (American Cryptogram Association).

Fully random alphabets are difficult to decipher, so some hints or keywords are usually provided. In K1 and K2 alphabets, a keyword (with duplicate letters removed) is written into the plaintext alphabet (K1) or the ciphertext alphabet (K2), followed by the remaining letters in alphabetical order. This keyed alphabet is then lined up against the normal alphabet at an arbitrary offset, similar to the shift in a Caesar cipher.


<pre>
K1: Plaintext alphabet contains keyword; Ciphertext alphabet normal.
Pt: <b>p o u l t r y</b> a b c d e f g h i j k m n q s v w x z
CT: R S T U V W X Y Z A B C D E F G H I J K L M N O P Q

K2: Pt alphabet normal; CT alphabet contains keyword.
Pt: a b c d e f g h i j k l m n o p q r s t u v w x y z
CT: V W X Z <b>K E Y B O A R D</b> C F G H I J L M N P Q S T U
</pre>
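The K1 and K2 rows above can be rebuilt from a keyword and a shift. This is a minimal sketch that assumes the rules described in this section: keyword first with duplicates dropped, then the rest of the alphabet, then a shift. `keyed_alphabet`, `k1_table`, and `k2_table` are hypothetical helpers, not the routines in MonoAlphabeticCipher.py. The shift values 17 and 22 are read off the two tables above.

```python
import string

ALPHABET = string.ascii_uppercase

def keyed_alphabet(keyword):
    """Keyword letters first (duplicates dropped), then the unused letters in A-Z order."""
    letters = []
    for ch in keyword.upper() + ALPHABET:
        if ch in ALPHABET and ch not in letters:
            letters.append(ch)
    return "".join(letters)

def k1_table(keyword, shift):
    """K1: keyed plaintext alphabet over a normal ciphertext alphabet.
    Plaintext keyed[i] encrypts to the normal letter at position (i + shift) % 26."""
    keyed = keyed_alphabet(keyword)
    return {keyed[i]: ALPHABET[(i + shift) % 26] for i in range(26)}

def k2_table(keyword, shift):
    """K2: normal plaintext alphabet over a keyed ciphertext alphabet.
    Plaintext letter i encrypts to keyed[(i + shift) % 26]."""
    keyed = keyed_alphabet(keyword)
    return {ALPHABET[i]: keyed[(i + shift) % 26] for i in range(26)}

# In K1, Pt 'p' sits over CT 'R' (index 17 of the normal alphabet);
# in K2, Pt 'a' sits over CT 'V' (index 22 of the keyed alphabet "KEYBOARD...").
k1 = k1_table("poultry", 17)
k2 = k2_table("keyboard", 22)
print("".join(k1[p] for p in keyed_alphabet("poultry")))  # RSTUVWXYZABCDEFGHIJKLMNOPQ
print("".join(k2[p] for p in ALPHABET))                   # VWXZKEYBOARDCFGHIJLMNPQSTU
```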

Problem types in Codebusters competitions may include the following (see each year's rules manual):

1. Aristocrats with a hint - messages with spaces included
2. Aristocrats - messages with spaces included, but without a hint
3. Aristocrats - messages with spaces and hints, but including spelling/grammar errors
4. Aristocrats - messages with spaces and spelling/grammar errors, but no hints
5. Patristocrats with a hint - messages with spaces removed, and with a hint
6. Patristocrats - messages with spaces removed, but without a hint
7. Xenocrypts - plaintext in a foreign language, most commonly Spanish

<a id='strategies'></a>

### Codebusters strategies

There are some common strategies for mono-alphabetic substitution ciphers:

* [Identify The Smallest Words First](#small_words)
* [Look for contractions](#contractions)
* [Identify Common Pairs Of Letters](#repeated_pairs)
* [Look for repeated blocks](#repeated_blocks)
* [Look for other patterns](#other_patterns)
* [Use the letter frequency table](#letter_frequency)

These strategies help you identify common patterns and letters. Each letter you decipher gives you more hints for figuring out the rest.

In general, you need a lot of practice to become familiar with the different strategies, memorize common patterns, and develop your own approach.

<a id='small_words'></a>

#### Identify The Smallest Words First
Look for single-letter words first; these are almost always 'A' or 'I'.

The most common two-letter words are 
```
of, to, in, it, is, 
be, as, at, so, we, 
he, by, or, on, do, 
if, me, my, up, an, 
go, no, us, am.
```
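Note the vowel positions in these common two-letter words: 'a', 'i', and 'u' appear only as the first letter (exceptions such as 'hi' are rare), 'e' appears only as the last letter, and 'o' can appear in either position.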

The most common three-letter words are 'the' and 'and'.
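Pulling out the short words is easy to automate. This is a minimal sketch with a hypothetical helper, `short_words`. It skips words with an apostrophe, since those are contractions (see the next section). The ciphertext is the K1 example A-1 from the Examples section. Its lone single-letter word F should then be 'A' or 'I'.

```python
import re
from collections import Counter

def short_words(ciphertext, max_len=3):
    """Group ciphertext words of 1..max_len letters by length, with counts.
    Words containing an apostrophe (contractions) are skipped."""
    tokens = re.findall(r"[A-Z']+", ciphertext.upper())
    words = [t for t in tokens if "'" not in t]
    return {n: Counter(w for w in words if len(w) == n) for n in range(1, max_len + 1)}

a1 = ("OB ISZDPH *GQG EFBE KZE NZUZPJ SQQO ZE EQ EOFNN AKFA BQT YFP'A EKQTA "
      "FA AKD YFA VZAKQTA JDAAZPJ F OQTAKITN QI KFZS.")
result = short_words(a1)
for n, counts in result.items():
    print(n, dict(counts))
```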

<a id='contractions'></a>

#### Look for contractions
An apostrophe in the ciphertext is an easy place to start: the letters after it usually form one of the contraction endings in the table below.

|Endings |	Examples|
|:--|:----|
|'T|	Won't, Don't, Isn't, Aren't, Weren't, Shouldn't, Couldn't, Didn't, Can't |
|'S|	He's, She's, It's, Who's, There's, That's |
|'D|	I'd, He'd, She'd, We'd, They'd, You'd |
|'M|	I'm |
|'RE|	You're, They're, We're |
|'VE|	They've, You've, We've |
|'LL|	I'll, He'll, She'll, We'll, They'll, It'll, Who'll |
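The table can be applied mechanically. The letters after the apostrophe must match an ending both in length and in which letters repeat. This is a minimal sketch with hypothetical helper names. The first call uses the contraction YFP'A from the K1 example A-1 below, and the other two cipher words are made up for illustration.

```python
def letter_pattern(s):
    """Letters numbered by first appearance: 'LL' -> (0, 0), 'RE' -> (0, 1)."""
    first = {}
    return tuple(first.setdefault(ch, len(first)) for ch in s)

# Contraction endings from the table above
ENDINGS = ["T", "S", "D", "M", "RE", "VE", "LL"]

def contraction_endings(cipher_word):
    """Endings whose length and letter-repeat pattern match the part after the apostrophe."""
    if "'" not in cipher_word:
        return []
    tail = cipher_word.upper().rsplit("'", 1)[1]
    return [e for e in ENDINGS if letter_pattern(e) == letter_pattern(tail)]

print(contraction_endings("YFP'A"))  # ['T', 'S', 'D', 'M']
print(contraction_endings("NZ'QQ"))  # ['LL']
print(contraction_endings("KZ'DC"))  # ['RE', 'VE']
```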

<a id='repeated_pairs'></a>

#### Identify Common Pairs Of Letters
In English, the most common doubled letters are (in frequency order) ll, ee, ss, oo, tt, ff, and mm. If the ciphertext contains a doubled letter, one of these is a good first guess.

Other doubled letters that also appear frequently include
```
pp: apply appeal appear suppose happy cappuccino sloppy
gg: egg aggressive goggle juggle
rr: array hurry arrive ferry arrange arrival current starring correct
...

```

Commonly used three-letter words with repeating letters:
```
all, too, off, see, bee, add, fee, zoo, ill, egg, inn, odd, ...
```

Commonly used four-letter words with repeating letters: 
```
ball been beep beer beet bell book boom boot bull butt 
call cell cook cool coon
doll doom door
fall feed feel fell feet food fool foot full fuss
gall gull 
hall heed heel hell hill hood hoof hoop hoot hull
jeep
keel keen keep knee
less
mall mood
need
pall pass peed peek peel pill poll poof pool poor
reed reef reel roll rood room root
sass seed seek seem seen seep seer sell sill soon soot
tall teed teem teen tell toll tool
wall watt weed week weep well
```
More words with repeating letters can be found at [yourdictionary.com](https://grammar.yourdictionary.com/word-lists/words-with-double-letters.html). 


Other cases include the doubled final consonant in verb forms, for example,

```
dropped, dropping, 
planned, planning, 
swimming, 
... 
```
Here, the ending 'ed' or 'ing' could also provide additional hints. 
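Finding doubled letters in a ciphertext takes only a few lines. This is a minimal sketch with a hypothetical helper name. It only counts pairs inside a word, because in an Aristocrat two identical letters separated by a space are not a double. The sample is again the K1 example A-1 from the Examples section.

```python
import re
from collections import Counter

def doubled_letters(ciphertext):
    """Count adjacent identical letters inside words, e.g. 'SQQO' -> 'QQ'."""
    counts = Counter()
    for word in re.findall(r"[A-Z]+", ciphertext.upper()):
        for a, b in zip(word, word[1:]):
            if a == b:
                counts[a + b] += 1
    return counts

a1 = ("OB ISZDPH *GQG EFBE KZE NZUZPJ SQQO ZE EQ EOFNN AKFA BQT YFP'A EKQTA "
      "FA AKD YFA VZAKQTA JDAAZPJ F OQTAKITN QI KFZS.")
print(doubled_letters(a1))
```

Each pair found (here QQ, NN, and AA) is a candidate for ll, ee, ss, oo, and so on.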


<a id='repeated_blocks'></a>

#### Look for repeated blocks 

A repeated three-letter block is often 'THE', the most common English word, used almost twice as often as the second most common, 'BE'.


Most used digraphs, in order of frequency, are
```
th er on an re he in ed nd ha at en es of or nt ea ti to it st io le is ou ar as de rt ve
```
 

Most used trigraphs, in order of frequency, are listed below. Some of them, such as 'edt' and 'sth', span word boundaries:
```
the and tha ent ion tio for nde has nce edt tis oft sth men
```
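Repeated blocks can be counted automatically. This is a minimal sketch with a hypothetical helper name. It removes spaces and punctuation first, so blocks that span word boundaries are counted too, which also makes it usable on Patristocrats. The sample "DRO NJD JCB DRO RJD" is "the cat and the hat" encrypted with the cipher table from the Introduction, so the repeated block DRO is 'THE'.

```python
from collections import Counter

def ngram_counts(ciphertext, n=3, min_count=2):
    """Letter n-grams that occur at least min_count times (spaces/punctuation removed)."""
    letters = "".join(ch for ch in ciphertext.upper() if ch.isalpha())
    grams = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    return {g: c for g, c in grams.items() if c >= min_count}

sample = "DRO NJD JCB DRO RJD"
print(ngram_counts(sample))       # {'DRO': 2}
print(ngram_counts(sample, n=2))  # repeated digraphs
```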

<a id='other_patterns'></a>

#### Look for other patterns
Common word patterns include

```
axx- all or too
xaxb- away, even, ever
xa xb- it is (at the beginning of questions, is it)
xax... at the beginning of a word is probably eve(ry)
...xxa at the end of a word is possibly lly
axbycxy- science
abxcx- there (most common) where or these
xabx- that (most common) high or dead
axbcx- which
xyaxby- people
xayybxx- success (abxxyyzyz- succeeded).
```
Keep in mind that less common words may share these patterns. For example, xyaxby could also be "indian."
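Pattern matching against a word list is easy to script. This is a minimal sketch with hypothetical helper names. It writes a pattern by numbering letters in order of first appearance (A, B, C, ...) instead of the x/y versus a/b/c notation used above, but the idea is the same. The short word lists are for illustration only; a real search would use a full dictionary.

```python
def word_pattern(word):
    """Letters numbered by first appearance, written as A, B, C, ...:
    'people' -> 'ABCADB', 'success' -> 'ABCCDAA'."""
    first = {}
    return "".join(chr(ord("A") + first.setdefault(ch, len(first))) for ch in word.upper())

def pattern_matches(cipher_word, wordlist):
    """Dictionary words with the same letter pattern as the cipher word."""
    target = word_pattern(cipher_word)
    return [w for w in wordlist if word_pattern(w) == target]

print(word_pattern("people"), word_pattern("indian"))  # ABCADB ABCADB
print(pattern_matches("QDEEWQQ", ["success", "address", "balloon", "people"]))
print(pattern_matches("XYZWZ", ["there", "where", "these", "which", "about"]))
```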


<a id='letter_frequency'></a>

#### Use the letter frequency table 

Most used English letters: 
```
E T A O I N S H R D L U
```

Most used initial letters: 
```
T O A W B C D S F M R H I Y E G L N P U J K
```

Most used final letters: 	
```
E S T D N R Y F L O G H A K M P U W
```

If possible, make a frequency table for the ciphertext letters, with a row for the plaintext letters (the decipher table) that you fill in as you solve.

In [this example](Example1.ipynb), we count the letter frequencies and make a table:

|Ct  | A| B| C| D| E| F| G| H| I| J| K| L| M| N| O| P| Q| R| S| T| U| V| W| X| Y| Z|
|----|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
|Freq| 0| 0| 3| 8| 6| 5| 1| 3| 0| 1| 0| 5| 9|13| 8| 1| 7| 1| 7| 2| 2| 5| 7| 0| 1| 0|
|Pt  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |


Since 'E' is the most common English letter, it is very likely one of the high-count ciphertext letters N, M, D, or O. Indeed, ciphertext 'M' deciphers to 'E'.
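The counts in the table can be reproduced with `collections.Counter`. This sketch uses only the standard library. The ciphertext is the Aristocrat example from the Examples section below.

```python
import string
from collections import Counter

ciphertext = ("DV QWTM LM GR QHMO NFD WSS VJMC QHMO NL QNDMC WOE NL VSEMC WSS "
              "FHND FNLM N QWD UNOENOP LYDMSU WOE N N ENEOF TOVQ N QWD SVDF")

freq = Counter(ch for ch in ciphertext if ch.isalpha())
# One count per ciphertext letter, zeros included, in A-Z order (the "Freq" row)
print(" ".join(f"{ch}:{freq[ch]}" for ch in string.ascii_uppercase))
# The highest-count letters are the first candidates for plaintext E, T, A, O, ...
print(freq.most_common(4))
```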


<a id='examples'></a>
### Examples 

#### K1: Plaintext alphabet contains keyword; Ciphertext alphabet normal.

1. From [The Cryptogram by ACA](https://www.cryptogram.org/wp-content/themes/wp-opulus-child/images/SampleCryptogram.pdf)
```
A-1. City living. K1 [90] PRIME
OB ISZDPH *GQG EFBE KZE NZUZPJ SQQO ZE EQ EOFNN AKFA BQT YFP'A EKQTA
FA AKD YFA VZAKQTA JDAAZPJ F OQTAKITN QI KFZS.
```
see [solving steps](K1Example.ipynb)


2. From toebes.com
```
MQKAI FXLA MVRUI DRBQ BQI DXAUN.
RB'M K MFVHXU XO OARIWNMQRY KWN YIKJI.
```
see [solving steps](https://toebes.com/Ciphers/Solving%20a%20K1%20Alphabet.htm)

#### K2: Plaintext alphabet normal; CT alphabet contains keyword.

1. From The Cryptogram by ACA
```
A-2. Identity crisis? K2 [96] (XVEWL) WABBIT
JAVGX GZJTAT *HZPRBZJ, QAABJI RPZQC EZJ RKNNGA KLL IZPMZIA RPSEF,
NBEFQ BR SN, QCKSRBJI, "CAX GZTX! XKS TPKNNAT XKSP NSPQA!"
```
see solving steps (TBD)


#### Aristocrats

1. Messages with spaces included, but without a hint. From [Prob 1 Country-wide SO Practice - 11-07-2020](https://scilympiad.com/data/org/sopractice/public/CodeBustersC.Key.pdf)
```
DV QWTM LM GR QHMO NFD WSS VJMC QHMO NL QNDMC WOE NL VSEMC WSS FHND FNLM N QWD UNOENOP LYDMSU WOE N N ENEOF TOVQ N QWD SVDF                                                  
```
see [solving steps](Example1.ipynb)


2. more examples (TBD)


<a id='practice'></a>

### More problems for practice

* [Practice sets from scilympiad.com](https://scilympiad.com/sopractice/Docs/UsefulDocs)
* [Solve a cipher by ACA](https://www.cryptogram.org/resource-area/solve-a-cipher/)

You could also use the Python routines below to generate more samples.

<a id='python'></a>

### Python routines

Several Python routines are provided for different tasks. Check the [script](MonoAlphabeticCipher.py) for available routines and implementation details.

```python
# import the package
from MonoAlphabeticCipher import *
```

Importing the script also prints a sample K1 cipher table:

```
Keyword is updated to remove duplicate letters FAIRLDY
Plain:   WXZFAIRLDYBCEGHJKMNOPQSTUV
Cipher:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
```


```python
### K1 Cipher generator
K1CipherDict = createK1CipherDict(keyword="fairlady")
printCipherTable(K1CipherDict)
```

```
Keyword is updated to remove duplicate letters FAIRLDY
Plain:   PQSTUVWXZFAIRLDYBCEGHJKMNO
Cipher:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
```


```python
### Fully random cipher generator
MyCipherDict = createCipherDict()
printCipherTable(MyCipherDict)
```

```
Plain:   ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher:  PIKVUSBWENMZALHDJQYXOGCTRF
```


```python
### encrypt the plain text from the above generated cipher table

plaintext = "Many of life's failures are people who did not realize how close they were to success when they gave up."
K1Ciphertext = encrypt(plaintext, K1CipherDict)
print("Ciphertext with K1 Cipher Dict: ", K1Ciphertext.upper())
MyCiphertext = encrypt(plaintext, MyCipherDict)
print("Ciphertext with the random Cipher Dict: ", MyCiphertext.upper())
```

```
Ciphertext with K1 Cipher Dict:  XKYP ZJ NLJS'C JKLNEMSC KMS ASZANS GUZ OLO YZD MSKNLIS UZG RNZCS DUSP GSMS DZ CERRSCC GUSY DUSP TKFS EA.
Ciphertext with the random Cipher Dict:  APLR HS ZESU'Y SPEZOQUY PQU DUHDZU CWH VEV LHX QUPZEFU WHC KZHYU XWUR CUQU XH YOKKUYY CWUL XWUR BPGU OD.
```


```python
### if you know the cipher table, you can decipher

# create a decipher table from the cipher table
K1DecipherDict = inverseCipherDict(K1CipherDict)
# use it to decrypt
K1Decipheredtext = decrypt(K1Ciphertext, K1DecipherDict)
# print the result
print("The deciphered text with K1: ", K1Decipheredtext)
```

```
The deciphered text with K1:  Many of life's failures are people who did not realize how close they were to success when they gave up.
```


#### Brute-force decipher

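Even without the decipher table, a computer can crack a mono-alphabetic substitution cipher fairly easily, though not by literally iterating over every possible cipher table: there are 26! (about 4 × 10^26) of them. Practical solvers search more cleverly. For example, they can match ciphertext word patterns against a dictionary. Alternatively, they can start from a frequency-based guess and keep swapping letters as long as the deciphered text produces more vocabulary words, allowing some percentage of misspellings. (TBD)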
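As an illustration of the dictionary approach, here is a minimal backtracking sketch. It is not the method in MonoAlphabeticCipher.py, and `pattern` and `solve` are hypothetical names. Each cipher word gets a dictionary word with the same letter pattern, and any choice that would break the one-to-one letter mapping is rejected. The tiny word list is made up for the demo. A real solver would need a large dictionary and tolerance for words it does not contain, such as names and misspellings. The ciphertext is "people see success" encrypted with the cipher table from the Introduction.

```python
import math

def pattern(word):
    """Letters numbered by first appearance, e.g. 'people' -> (0, 1, 2, 0, 3, 1)."""
    first = {}
    return tuple(first.setdefault(ch, len(first)) for ch in word.upper())

def solve(cipher_words, wordlist):
    """Give every cipher word a same-pattern dictionary word, keeping the implied
    cipher->plain letter mapping one-to-one. Returns the mapping dict, or None."""
    # Longest words first: they have the fewest pattern matches, so they prune early.
    order = sorted(set(cipher_words), key=lambda w: (-len(w), w))
    candidates = {cw: [w for w in wordlist if pattern(w) == pattern(cw)] for cw in order}

    def extend(i, mapping):
        if i == len(order):
            return mapping
        cw = order[i]
        for word in candidates[cw]:
            trial = dict(mapping)
            consistent = True
            for c, p in zip(cw, word):
                if trial.get(c, p) != p:  # cipher letter already decoded differently
                    consistent = False
                    break
                trial[c] = p
            # two cipher letters may not decode to the same plaintext letter
            if consistent and len(set(trial.values())) == len(trial):
                found = extend(i + 1, trial)
                if found is not None:
                    return found
        return None  # no candidate fits: backtrack

    return extend(0, {})

print(math.factorial(26))  # number of possible cipher tables
words = ["KOYKPO", "FOO", "FANNOFF"]
key = solve(words, ["people", "indian", "see", "too", "all", "success", "address", "balloon"])
print(" ".join("".join(key[c] for c in w) for w in words))  # people see success
```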

