<a href="https://colab.research.google.com/github/john94501/aoa-python/blob/main/Section_4a.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Academy of Alameda Python 4a

## Ciphers

Modern cryptography, the art of encrypting and decrypting messages and other data, is well beyond the scope of this course. Suffice it to say that modern computers are capable of things the code breakers of WWII could not even imagine.

Code breaking, however, is something we can explore, using simpler ciphers than even the Enigma machine. Some of these ciphers go back several thousand years.

### The Caesar Cipher

One of the simplest ciphers to understand was used by Julius Caesar, the Roman general and statesman, to encrypt messages sent to his army. It was almost certainly used well before that time, but his use of it made it famous, and so it is named after him.

The Caesar Cipher is a substitution cipher. It has been widely used for masking text, even relatively recently. Some of the early Internet applications (before the World Wide Web was even thought of) used a variation of it called ROT-13.

The idea is simple. Each letter of the alphabet is replaced by the letter a fixed number of positions later. That number is called the shift. Caesar, apparently, used a shift of 3, so A became D, B became E, C became F and so on. The last three letters wrap around, so X becomes A, Y becomes B and Z becomes C.
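To see the shift described above laid out in full, here is a small sketch that builds the shifted alphabet with string slicing. The variable names (`alphabet`, `shift`, `shifted`) are just for illustration, and this is not the approach the exercise later in this section asks for.

```python
import string

# Illustration only: build the shifted alphabet using slicing
alphabet = string.ascii_uppercase    # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
shift = 3                            # the shift Caesar is said to have used

# Letters from position `shift` onward, followed by the ones that wrap around
shifted = alphabet[shift:] + alphabet[:shift]

print(alphabet)
print(shifted)
```

Reading down the two printed lines, each letter in the top row is replaced by the letter directly below it.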

Let's write a Python function to encrypt a string using a given shift value. Before we start, we will need to know a few more things.

### Character Encoding

In the world of computers, the characters you type are all encoded as numbers. The common ones are encoded by integers in the range 32 to 126. That includes the upper case letters (A to Z), the lower case letters (a to z), the numeric digits (0 to 9), and all the punctuation and symbols on the keyboard.

There are two functions in Python that let us switch between the numeric encoding (that we can use arithmetic operators on) and the characters themselves (that we can build into strings):

`ord()` turns a character from a string into a number

`chr()` turns one of those numbers back to a character

The good news is that, while 'A' is not 1, the letters A to Z are encoded in order, so we can still use simple addition to apply the shift value.
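A quick way to convince yourself of this is to try `ord()` and `chr()` on a few letters. The variable names below are only for illustration.

```python
# ord() gives the number behind a character; chr() goes the other way
code_for_a = ord('A')
code_for_z = ord('Z')
print(code_for_a)    # 65
print(code_for_z)    # 90

# Because A to Z are in order, adding 3 to the code for 'A' lands on 'D'
shifted_letter = chr(ord('A') + 3)
print(shifted_letter)    # D
```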

### Accessing Characters In a String

If you recall, strings can be sliced using the `[]` notation. They can also be iterated over using for / in - try it:

In [None]:
for letter in "Here is a string":
  print(letter)

### Comparing Characters

We can also compare one character to another. For example, we can check whether our shifted letter is after 'Z' and needs to be wrapped back to the start of the alphabet, like this. Try it out with different letters in the first line:

In [None]:
letter = ord('X') + 3    # Encode 'X' as a number and add a shift of 3
if chr(letter) > 'Z':    # Have we gone past the end of the alphabet?
  letter = letter - 26   # Back to the start of the alphabet
print(chr(letter))

A
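As an alternative to comparing against 'Z', the wrap-around can also be done with the remainder operator `%`. This is only a sketch of that idea for a single upper-case letter; the helper name `shift_letter` is made up for this example.

```python
def shift_letter(letter, shift):
  """Shift one upper-case letter, wrapping past 'Z' using the % operator."""
  position = ord(letter) - ord('A')         # 0 for 'A' up to 25 for 'Z'
  new_position = (position + shift) % 26    # the remainder keeps it in 0..25
  return chr(new_position + ord('A'))

print(shift_letter('X', 3))   # A
print(shift_letter('B', 3))   # E
```

Both approaches give the same answer for shifts between 0 and 25, so use whichever one you find easier to read.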


### Putting it all Together

Now write a Python function to encrypt some text, and try it with the block below once you think you have it working. I've added some comments as hints.

In [None]:

# Caesar Cipher Encrypt

def caesar_encrypt(plaintext, shift):

  # Always a good idea to validate our input
  # Check the shift is not less than 0 or greater than 25
  # If it is, print a message and return None

  # Create an empty string to build the result in
  encrypted = ""

  # Only work in upper case: convert the plain text to upper case.
  # Hint: the .upper() method converts a string to upper case

  # Loop through each character in plaintext

    # If this character is not A-Z, just leave it untouched

    # Otherwise:

      # Convert the character to a number & add the shift

      # If the new character is after 'Z', subtract 26

      # Append to 'encrypted'

  return encrypted

In [None]:
encrypted = caesar_encrypt("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 3)
print(encrypted)




## Challenges

1. See if you can create something that can decrypt this cipher. 

2. Can you see why ROT-13 was a popular variant of this cipher? (Hint: what is special about a shift of 13?)

In [None]:
# Caesar Cipher Decrypt

## Substitution Ciphers

The Caesar cipher is a special case of a more general group of ciphers known as substitution ciphers. In a substitution cipher, as you may have guessed, each letter of the plaintext alphabet is substituted with another letter. In the Caesar cipher, the substitutions are made by just shifting the alphabet, but more complex ones can be made using random substitutions.

For example,

```
Plaintext:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
Substitute: QWERTYUIOPASDFGHJKLZXCVBNM
```
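To read the table above in code, you can find a letter's position in the plaintext alphabet and pick out the character at the same position in the substitution string. This is a minimal sketch for a single letter, with variable names chosen just for this example.

```python
plain = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
subs = "QWERTYUIOPASDFGHJKLZXCVBNM"

letter = "C"
index = plain.index(letter)    # position of the letter in the plaintext alphabet
substitute = subs[index]       # the character at the same position in subs
print(letter, "->", substitute)   # C -> E
```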

See if you can write a function that will encrypt a message using this substitution.

In [1]:
# Substitution Cipher Encrypt

def substitution_encrypt(plaintext):

  subs="QWERTYUIOPASDFGHJKLZXCVBNM"

  # Create an empty string to build the result in
  encrypted = ""

  # Only work in upper case: convert the plain text to upper case.

  # Loop through each character in plaintext

    # If this character is not A-Z, just leave it untouched

    # Otherwise:

      # Convert the character to an index into the substitution string
      # Hint: you can subtract ord('A') to get back to 0 to 25

      # Check it is in bounds (just in case)


  return encrypted

Test it here:

In [3]:
print(substitution_encrypt("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
print(substitution_encrypt("The quick brown fox jumped over the lazy dog."))


QWERTYUIOPASDFGHJKLZXCVBNM
ZIT JXOEA WKGVF YGB PXDHTR GCTK ZIT SQMN RGU.


### Challenge

Create a function that can decrypt these messages.


In [None]:
# Substitution Cipher Decrypt

def substitution_decrypt(encrypted):

  subs="QWERTYUIOPASDFGHJKLZXCVBNM"

  # Build a reverse substitution string
  # Hint: Need to use a list here as we can't assign
  #       to individual characters in a string
  reverse = [" "] * 26
    
  # Create an empty string to build the result in
  plaintext = ""

  # Only work in upper case: convert the encrypted text to upper case.

  # Loop through each character in encrypted string

    # If this character is not A-Z, just leave it untouched

    # Otherwise:

      # Convert the character to an index into the substitution string

      # Check it is in bounds (just in case)

  return plaintext

## Deciphering

Creating our own ciphers and deciphering them is not too complex, but suppose you didn't know the shift number, or the substitutions, or even what technique was used to encrypt the messages you were listening to on the radio? That was the challenge that faced the code-breakers at Bletchley Park back in the Second World War. They designed & built their computers to help decipher messages they did not have the key to.

### Breaking Simple Ciphers

If you know the language the text is written in, and you suspect a substitution cipher, then you can use some properties of the language to make some educated guesses about some of the letters. In English, we know the most common letters are E, T, A, I, O, N, S, H, and R.
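One way to compare an encrypted message against these common letters is to count how often each letter appears. The sketch below uses `Counter` from Python's standard `collections` module; the function name `letter_counts` and the short sample string are assumptions made for this example.

```python
from collections import Counter

def letter_counts(text):
  """Count how often each letter A-Z appears, ignoring everything else."""
  letters = [ch for ch in text.upper() if 'A' <= ch <= 'Z']
  return Counter(letters)

# Short made-up sample; in practice you would pass in the whole message
sample = "JDQJ KQU JDS ZDQVVSGES"
for letter, count in letter_counts(sample).most_common(3):
  print(letter, count)
```

The letters that come out on top are good candidates for E, T or A, although a short message may not follow the usual pattern exactly.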

But we can go further than that and look at the most commonly found letters at the start and ends of words. At the beginning of words, the most common letters are T, A, O, D, and W. And the most common endings are E, S, D, and T.
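The same counting idea can be applied to just the first and last letters of each word. This is a rough sketch: the function name `first_and_last_counts` is made up here, and it treats anything separated by spaces as a word, so a hyphenated word counts as one word.

```python
from collections import Counter

def first_and_last_counts(text):
  """Return two Counters: letters that start words and letters that end them."""
  firsts = Counter()
  lasts = Counter()
  for word in text.upper().split():
    # Keep only A-Z so punctuation such as "," or "?" is not counted
    letters = [ch for ch in word if 'A' <= ch <= 'Z']
    if letters:
      firsts[letters[0]] += 1
      lasts[letters[-1]] += 1
  return firsts, lasts

firsts, lasts = first_and_last_counts("THE CAT, THE DOG.")
print(firsts.most_common(1))   # [('T', 2)]
print(lasts.most_common(1))    # [('E', 2)]
```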

Here's a block of encrypted text. Let's see if we can break the cipher using this information:

> ZNSQJCGE BMN BKG ZCYDSNU QGW WSZCYDSNCGE JDST CU GBJ JBB ZBTYVSO, AMJ UMYYBUS LBM WCWG'J FGBK JDS UDCXJ GMTASN, BN JDS UMAUJCJMJCBGU, BN SISG KDQJ JSZDGCHMS KQU MUSW JB SGZNLYJ JDS TSUUQESU LBM KSNS VCUJSGCGE JB BG JDS NQWCB? JDQJ KQU JDS ZDQVVSGES JDQJ XQZSW JDS ZBWS-ANSQFSNU CG AVSJZDVSL DQVV AQZF CG JDS USZBGW KBNVW KQN. JDSL WSUCEGSW & AMCVJ JDSCN ZBTYMJSNU JB DSVY WSZCYDSN TSUUQESU JDSL WCW GBJ DQIS JDS FSL JB.

In addition to the clues in the letter frequency, can you see any other things in that block of text that might help you guess letters?

In [4]:
encrypted="ZNSQJCGE BMN BKG ZCYDSNU QGW WSZCYDSNCGE JDST CU GBJ JBB ZBTYVSO, AMJ UMYYBUS LBM WCWG'J FGBK JDS UDCXJ GMTASN, BN JDS UMAUJCJMJCBGU, BN SISG KDQJ JSZDGCHMS KQU MUSW JB SGZNLYJ JDS TSUUQESU LBM KSNS VCUJSGCGE JB BG JDS NQWCB? JDQJ KQU JDS ZDQVVSGES JDQJ XQZSW JDS ZBWS-ANSQFSNU CG AVSJZDVSL DQVV AQZF CG JDS USZBGW KBNVW KQN. JDSL WSUCEGSW & AMCVJ JDSCN ZBTYMJSNU JB DSVY WSZCYDSN TSUUQESU JDSL WCW GBJ DQIS JDS FSL JB."
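Once you have a few guesses, it helps to see what the message looks like with only those letters filled in. The sketch below is one possible helper, not the only way to do it: the name `apply_guesses` is made up, and the guesses shown are just a hypothetical starting point based on the idea that a very common three-letter word might be THE.

```python
def apply_guesses(ciphertext, guesses):
  """Replace guessed letters and show '_' for letters not yet guessed."""
  result = ""
  for ch in ciphertext.upper():
    if 'A' <= ch <= 'Z':
      # Use the guess if we have one, otherwise show a blank
      result += guesses.get(ch, "_")
    else:
      # Spaces and punctuation are left untouched
      result += ch
  return result

# Hypothetical starting guesses: encrypted letter -> plaintext letter
guesses = {"J": "T", "D": "H", "S": "E"}
print(apply_guesses("JDS UDCXJ GMTASN", guesses))   # THE _H__T ____E_
```

As you add more entries to `guesses`, the blanks fill in and the remaining letters get easier to work out. If a guess produces nonsense, remove it and try another.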