In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("final-project-hill.ipynb")

# Final Project: Hill cipher

This choice for the final project will focus on the [Hill cipher](https://en.wikipedia.org/wiki/Hill_cipher).

From Wikipedia:
>In classical cryptography, the Hill cipher is a polygraphic substitution cipher based on linear algebra. Invented by Lester S. Hill in 1929, it was the first polygraphic cipher in which it was practical (though barely) to operate on more than three symbols at once.

You will investigate how to create the key and implement the encryption / decryption algorithms in Python.

## Import your toolkit

You are provided the following functions to help complete this assignment:
* `text_clean(<string>, <LETTERS>)`
* `text_block(<string>, <blocksize>)`
* `multiplicative_inverse(<integer>, <modulus>)` which returns `False` if no inverse exists

Run the cell below to import these functions.

In [None]:
from finaltoolkit import text_clean, text_block, multiplicative_inverse

## The Encryption key

The Hill cipher uses a square [matrix](https://en.wikipedia.org/wiki/Matrix_(mathematics)) as the key, typically with dimensions of either 2 x 2 or 3 x 3, but the process can be generalized to an $n \times n$ matrix.

The `numpy` package lets you perform many common matrix operations in Python. Run the cell below to load `numpy`. You will soon learn how to use this package to perform matrix operations.

In [None]:
import numpy as np

### Question 1: Creating a square matrix using `numpy`

`numpy` uses a new data type called an `array` to work with matrices. There are a few different ways you can create an array object.

Each of the following code examples will create the array that represents the matrix $A$ shown below:

$$A = \begin{bmatrix}
   1 & 2 & 3 \\
   4 & 5 & 6 \\
   7 & 8 & 9
\end{bmatrix}
$$

The first way is to pass a list of lists to `np.array()` where each of the sublists represents a row of the matrix.

In [None]:
np.array([[1,2,3],[4,5,6],[7,8,9]])

The second way is to create a single row containing the values in the order you want them to appear, from left to right and top to bottom. Then, use the `reshape((<rows>, <columns>))` method to arrange those values into the correct dimensions.

In [None]:
np.array([1,2,3,4,5,6,7,8,9]).reshape((3,3))
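As a quick check that the two approaches are interchangeable, the sketch below builds a non-square matrix both ways and compares the results. The values and the variable names `from_rows` and `from_flat` are illustrative only and are not part of the assignment.

```python
import numpy as np

# Build the same 2 x 3 matrix in two different ways.
from_rows = np.array([[10, 20, 30], [40, 50, 60]])               # list of rows
from_flat = np.array([10, 20, 30, 40, 50, 60]).reshape((2, 3))   # flat values, then reshape

# array_equal is True only when shapes and all entries match.
same = np.array_equal(from_rows, from_flat)
print(from_flat.shape)
print(same)
```

Note that `reshape` fills the new array row by row, which is why the flat list must be written left to right, top to bottom.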

Assign the following variables as `numpy` arrays that match their mathematical definitions:

$$A = \begin{bmatrix}
   6 & 0 \\
   14 & 22 
\end{bmatrix}
$$

$$B = \begin{bmatrix}
   2 & 5 & 7 & 1 & 2 \\
   4 & 2 & 8 & 6 & 3 
\end{bmatrix}
$$

In [None]:
A = ...
A

In [None]:
B = ...
B

In [None]:
grader.check("q1")

### Question 2: Validating keys

Matrices are only valid for use as a key in the Hill cipher if that matrix has an inverse in a given modulus. Not all matrices will!

Those matrices with an inverse in a given modulus meet the following criteria:
* They are square (they have the same number of rows and columns)
  * You can ask `numpy` to tell you the shape of a matrix by calling `A.shape`. It will return a tuple that contains the row and column sizes. You can access them separately by using an index (e.g. `A.shape[0]` is the row size, `A.shape[1]` is the column size).
* They have a [determinant](https://en.wikipedia.org/wiki/Determinant) that is non-zero 
  * The determinant can be found using `numpy` with the `np.linalg.det(A)` function
* They have a determinant that has a multiplicative inverse in the given modulus (length of alphabet)
  * The multiplicative inverse can be found using the `multiplicative_inverse` function provided in the toolkit

**Note:** The determinant of a matrix that contains only integers is always an integer. However, because `numpy` computes the determinant with floating-point arithmetic, the result is sometimes a float that is very close to, but not exactly, an integer. For example:

In [None]:
np.linalg.det(A)

Use the `round` function to ensure that any weird computation artifacts are handled by rounding the float to an integer.

In [None]:
round(np.linalg.det(A))
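The criteria above can be traced step by step on concrete matrices. The sketch below is only an illustration: it assumes a modulus of 26 and uses `math.gcd` as a stand-in for the toolkit's `multiplicative_inverse`, relying on the fact that an integer has a multiplicative inverse modulo $m$ exactly when it shares no common factor with $m$. The names `M_good` and `M_bad` are hypothetical.

```python
import math
import numpy as np

modulus = 26  # assumed alphabet length

M_good = np.array([[3, 6], [1, 3]])   # determinant 3
M_bad = np.array([[2, 4], [6, 8]])    # determinant -8, shares the factor 2 with 26

# Check squareness first: np.linalg.det raises an error on non-square input.
is_square = M_good.shape[0] == M_good.shape[1]

# Round the floating-point determinants to exact integers.
det_good = int(round(np.linalg.det(M_good)))
det_bad = int(round(np.linalg.det(M_bad)))

# Stand-in for multiplicative_inverse: an inverse exists only when gcd == 1.
good_has_inverse = det_good != 0 and math.gcd(det_good, modulus) == 1
bad_has_inverse = det_bad != 0 and math.gcd(det_bad, modulus) == 1

print(is_square, det_good, good_has_inverse)
print(det_bad, bad_has_inverse)
```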

Write a function `valid_key` that accepts a matrix (numpy array) and returns the boolean value `True` if the key is valid and `False` if the key is invalid.

Examples
```python
>>> valid_key(np.array([[3,6],[1,3]]))
True

>>> valid_key(np.array([[3,6,1],[1,3,2]]))
False

>>> valid_key(np.array([[6, 24, 1],[13, 16, 10], [20, 17, 15]]))
True
```

In [None]:
def valid_key(A, LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    
    ...

In [None]:
grader.check("q2")

### Question 3: Creating a key

Typically a key is created by first choosing a keyword or phrase that is turned into numerical values and stored into the matrix. For example, the keyword `PLAY` becomes the array:

$$\begin{bmatrix}
   P & L \\
   A & Y 
\end{bmatrix}=\begin{bmatrix}
   15 & 11 \\
   0 & 24 
\end{bmatrix}$$
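The letter-to-number conversion shown above can be reproduced with a few lines of Python. This is a minimal sketch for the single keyword `PLAY`, assuming the standard 26-letter alphabet; the variable names are illustrative.

```python
import numpy as np

LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
keyword = 'PLAY'

# Each letter becomes its position in LETTERS (A = 0, B = 1, ...).
flat = [LETTERS.index(ch) for ch in keyword]

# Four values fill a 2 x 2 matrix, row by row.
key = np.array(flat).reshape((2, 2))
print(key)
```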

Write the function `keygen` which:
* takes in a keyword with a length that's a perfect square (e.g. 4, 9, 16, 25, ...) and cleans it using the alphabet defined in `LETTERS`
* uses keyword length to determine the size of the square matrix that should be created to be used with the Hill cipher

You can assume that all cleaned keywords will be perfect squares in length, and therefore correctly fill a square matrix. You will check if the generated key is valid later on in your encryption program and should not test for key validity in this function.

**Hint**: Try creating a "flat" version of the key (essentially a single row array with all the correct numbers in order) and then use the `reshape` method to turn it into the correctly sized square array.

Examples:

```python
>>> keygen('HELPABCD', LETTERS='ABCD')
array([[0, 1],
       [2, 3]])

>>> keygen('HELP')
array([[ 7,  4],
       [11, 15]])

>>> keygen('GYBNQKURP')
array([[ 6, 24,  1],
       [13, 16, 10],
       [20, 17, 15]])
```

In [None]:
def keygen(keyword, LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    
    ...

In [None]:
grader.check("q3")

## The Hill Cipher

To encrypt a message with the Hill cipher, break the message into blocks, each containing `n` characters, where `n` is also the size of your square key matrix. Write each block of characters as a single-column matrix with `n` entries. Then, multiply the key by this matrix, and reduce modulo 26. The result will be an `n` x 1 matrix that represents the ciphertext version of the block. Repeat until the entire message is encrypted.

### Special Text Preparation
The Hill cipher requires that all messages complete "full" matrices so the key matrix can be multiplied with every block of the message. If your last block of the message doesn't have enough characters to create a full `n` x 1 matrix, pad the message with the character `X` until it's long enough to create a full block as a matrix.
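The number of padding characters needed can be computed directly from the message length. The sketch below shows the arithmetic only; `pad_length` is a hypothetical helper name, and the message is assumed to be already cleaned.

```python
def pad_length(message_length, block_size):
    """Return how many padding characters are needed to fill the final block."""
    # (-length) % size is 0 when the message already fills its last block.
    return (-message_length) % block_size

cleaned = 'ETPHONEHOME'   # 11 characters
padded = cleaned + 'X' * pad_length(len(cleaned), 2)
print(padded)
```

Using the negated length avoids a special case for messages that are already a multiple of the block size.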


**Example:**

Create a 2 x 2 key:

$$K = \begin{bmatrix}
   3 & 6 \\
   1 & 3 
\end{bmatrix}
$$

Using the plaintext message `ET PHONE HOME`, create 2 x 1 matrices that numerically represent the message:

$$\begin{bmatrix}
   E  \\
   T 
\end{bmatrix}, \begin{bmatrix}
   P  \\
   H 
\end{bmatrix}, \begin{bmatrix}
   O  \\
   N 
\end{bmatrix}, \begin{bmatrix}
   E  \\
   H 
\end{bmatrix}, \begin{bmatrix}
   O  \\
   M 
\end{bmatrix}, \begin{bmatrix}
   E  \\
   X 
\end{bmatrix}
$$

$$\begin{bmatrix}
   4  \\
   19 
\end{bmatrix}, \begin{bmatrix}
   15  \\
   7 
\end{bmatrix}, \begin{bmatrix}
   14  \\
   13
\end{bmatrix}, \begin{bmatrix}
   4  \\
   7 
\end{bmatrix}, \begin{bmatrix}
   14  \\
   12 
\end{bmatrix}, \begin{bmatrix}
   4  \\
   23 
\end{bmatrix}
$$

**Note:** an `X` was added to the end of the message so it would completely fill the last 2x1 matrix.

Then, multiply the key by each plaintext matrix to create the ciphertext. The first result is shown below:

$$\begin{bmatrix}
   3 & 6 \\
   1 & 3 
\end{bmatrix} \begin{bmatrix}
   4  \\
   19 
\end{bmatrix}= \begin{bmatrix}
   3 \cdot 4 + 6 \cdot 19 \\
   1 \cdot 4 + 3 \cdot 19
\end{bmatrix} = \begin{bmatrix}
   126 \\
   61 
\end{bmatrix} \equiv \begin{bmatrix}
   22 \\
   9 
\end{bmatrix} \pmod{26} = \begin{bmatrix}
   W \\
   J 
\end{bmatrix}
$$

The full ciphertext would be: `WJ JK QB CZ KY UV`

To decrypt the message, repeat the process using the inverse of the key matrix.
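The worked example can be checked numerically. The sketch below encrypts every block at once by placing the blocks side by side as the columns of one matrix, then decrypts with the inverse key. It is an illustration under stated assumptions, not the required solution: `apply_key` is a hypothetical helper, the message is assumed to be cleaned and padded already, and `K_inv` is the inverse of `K` modulo 26 (derived later in this notebook).

```python
import numpy as np

LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
K = np.array([[3, 6], [1, 3]])
K_inv = np.array([[1, 24], [17, 1]])   # inverse of K modulo 26

def apply_key(key, text):
    """Multiply every block of text by key, reducing modulo len(LETTERS)."""
    n = key.shape[0]
    numbers = np.array([LETTERS.index(ch) for ch in text])
    blocks = numbers.reshape(-1, n).T            # each column is one block
    result = np.dot(key, blocks) % len(LETTERS)
    # Read the result column by column to restore the block order.
    return ''.join(LETTERS[i] for i in result.T.flatten())

ciphertext = apply_key(K, 'ETPHONEHOMEX')
recovered = apply_key(K_inv, ciphertext)
print(ciphertext)
print(recovered)
```

The same helper handles both directions because decryption is just encryption with a different key matrix.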

### Question 4: Cleaning the message

When encrypting a message, the Hill Cipher requires that each block of characters fill the `n` x 1 matrix. 

Write a function `hill_textclean` that:
* first cleans the provided message in the usual way (comparing against LETTERS and only keeping the characters that are already in LETTERS) and then
* checks if the provided `message` is long enough to fill the final block. If `message` would not create a full final block, add the correct number of `X`s to the end of the message so that it does.

This function should work for blocks of any size, not just 2 or 3.

Examples:

```python
>>> hill_textclean('ABDFGHABBABBDBBBCBBDF', 4, LETTERS='ABCX')
'ABABBABBBBBCBBXX'

>>> hill_textclean('ET PHONE HOME', 2)
'ETPHONEHOMEX'

>>> hill_textclean('et phone home', 5)
'ETPHONEHOMEXXXX'
```

In [None]:
def hill_textclean(message, block_size, LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    
    ...

In [None]:
grader.check("q4")

### Inverse matrices

The multiplicative inverse of a matrix $K$ can easily be computed using `numpy` with the command `np.linalg.inv(K)`. However, `numpy` does not ensure that the resulting matrix contains only integer values.

For example:

In [None]:
K = np.array([[3, 6], [1,3]])
np.linalg.inv(K)

Since each entry in our keys and messages must be an integer (ultimately, the results of the calculations need to be integers that represent characters), we'll need another way to compute the inverse that ensures the final calculation yields integer values. Let's dig into how an inverse matrix is created to learn how this works.

The inverse of a square matrix, $A$, is computed by calculating the determinant of the matrix ($\det{A}$) and multiplying its reciprocal by the adjugate matrix ($\text{adj}(A)$). If the input matrix is composed of integers, then the adjugate matrix is guaranteed to contain only integers as well.

$$A^{-1} = \frac{1}{\det{A}} \text{adj}(A) = (\det{A})^{-1} \text{adj}(A) $$

We've seen that `numpy` can compute the determinant with the `np.linalg.det()` function. Unfortunately, `numpy` doesn't have a way to directly compute the adjugate matrix. However, we can "trick" it into computing one. Rearranging the equation from above:

$$\text{adj}(A) = \det(A) A^{-1}$$
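For a 2 x 2 matrix the adjugate has a simple closed form, which makes the rearranged equation easy to check by hand. The symbols $a, b, c, d$ below are just labels for the four entries:

$$\text{adj}\begin{bmatrix}
   a & b \\
   c & d
\end{bmatrix} = \begin{bmatrix}
   d & -b \\
   -c & a
\end{bmatrix}, \qquad \text{adj}\begin{bmatrix}
   3 & 6 \\
   1 & 3
\end{bmatrix} = \begin{bmatrix}
   3 & -6 \\
   -1 & 3
\end{bmatrix}$$

For this matrix the determinant is $3 \cdot 3 - 6 \cdot 1 = 3$, so the ordinary inverse is $\frac{1}{3}$ times the adjugate, and multiplying that inverse by the determinant recovers the integer adjugate matrix.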



In [None]:
K = np.array([[3, 6], [1,3]])
K_adj = np.linalg.det(K) * np.linalg.inv(K)
K_adj

**Note:** Each entry in the output ends with a `.`, which indicates that the computation produced floats. To convert the matrix to integers, first call `.round()` and then `.astype(int)`. This matters because the final numbers will be used as indices into `LETTERS` to retrieve the corresponding characters, and indices must be integers, not floats. Converting to integers now ensures that all subsequent calculations remain integers as well.

See below:

In [None]:
K_adj = K_adj.round().astype(int)
K_adj

To finish the process, you'll need to compute the determinant, convert it to an integer using the `round` function, and then find its multiplicative inverse in the correct modulus.

In [None]:
determinant = round(np.linalg.det(K))
det_inv = multiplicative_inverse( determinant, 26 )
det_inv

Lastly, multiply the adjugate matrix, `K_adj`, by the multiplicative inverse of the determinant, `det_inv`, and reduce by the appropriate modulus.

In [None]:
K_inv = (det_inv * K_adj) % 26
K_inv

We can confirm that `K` and `K_inv` are inverses of each other by finding their dot product and verifying that it's equivalent to the identity matrix, $I$, which contains the number 0 everywhere except on the diagonal from the top left corner to the bottom right corner whose elements are 1. The size of the identity matrix is the same as the key. So, for our example:

$$
I = \begin{bmatrix}
   1 & 0 \\
   0 & 1 
\end{bmatrix}
$$

In [None]:
np.dot(K, K_inv) % 26
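The same check works for larger keys, and `np.array_equal` can do the comparison against the identity matrix for you. The sketch below uses the 3 x 3 key and inverse that appear in this notebook's examples; the variable names `K3` and `K3_inv` are illustrative.

```python
import numpy as np

K3 = np.array([[6, 24, 1], [13, 16, 10], [20, 17, 15]])
K3_inv = np.array([[8, 5, 10], [21, 8, 21], [21, 12, 8]])   # inverse of K3 modulo 26

product = np.dot(K3, K3_inv) % 26

# np.eye builds an identity matrix of the requested size.
is_identity = np.array_equal(product, np.eye(3, dtype=int))
print(product)
print(is_identity)
```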

### Question 5: Compute the "Hill Inverse"

Write the function `hill_inverse` that:
* computes the inverse of a provided square matrix, `K`, in the provided modulus `n`
* returns the inverse matrix from the function

You can assume that only valid keys will be passed to this function.

Examples:

```python
>>> hill_inverse(np.array([[3, 6], [1,3]]), 26)
array([[ 1, 24],
       [17,  1]])

>>> hill_inverse(np.array([[3, 6], [1,3]]), 13)
array([[ 1, 11],
       [ 4,  1]])
       
>>> hill_inverse(np.array([[6, 24, 1], [13, 16, 10], [20, 17, 15]]), 26)
array([[ 8,  5, 10],
       [21,  8, 21],
       [21, 12,  8]])
```

In [None]:
def hill_inverse(K, n):
    
    ...

In [None]:
grader.check("q5")

### Using `numpy` to perform multiplication

Multiplying two matrices together is also known as taking the **dot product** between the rows of the first matrix and the columns of the second. You can learn more about matrix multiplication by hand by watching [this video](https://www.youtube.com/watch?v=sYlOjyPyX3g&t=496s). Fortunately, `numpy` can perform the operation very quickly using the `np.dot()` function. You can then use the `%` operator to reduce each element of the resulting matrix by the modulus.

In [None]:
K = np.array([[3, 6], [1,3]])
block = np.array([[4,19]]).reshape(2,1)

ciphertext_numerical = np.dot(K,block) % 26
ciphertext_numerical

To access each element in the array individually, you can use indexing (just like with lists of lists) to specify exactly which element to retrieve. For example, to access `22` you should specify the location of row 0, column 0:

In [None]:
ciphertext_numerical[0][0]

And to retrieve 9, you should specify row 1, column 0:

In [None]:
ciphertext_numerical[1][0]
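Once individual entries can be retrieved, they can be used as indices into the alphabet to recover letters. The sketch below assumes the standard 26-letter alphabet and rebuilds the 2 x 1 result from above by hand so that it runs on its own. It also shows `flatten()`, which visits every entry in order without nested indexing.

```python
import numpy as np

LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
ciphertext_numerical = np.array([[22], [9]])   # the 2 x 1 result computed above

# [0, 0] is equivalent to [0][0] for numpy arrays.
first_letter = LETTERS[ciphertext_numerical[0, 0]]

# flatten() turns the column into a one-dimensional sequence of entries.
block_text = ''.join(LETTERS[i] for i in ciphertext_numerical.flatten())
print(first_letter, block_text)
```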

### Question 6: Encryption / Decryption

At this point you should be able to write a function `hill` that implements the Hill cipher.

The function should:
* accept a keyword (`str`), block_size (`int`), and message (`str`)
* accept an optional boolean `decrypt` that, when set to `True`, decrypts the message and, when set to `False`, encrypts the message
* accept an optional string `LETTERS` that can be used to specify the alphabet to use when cleaning and encrypting/decrypting the message
* return a decrypted plaintext as a lowercase string, and return an encrypted ciphertext as an uppercase string blocked into character groups of length 5

Examples:
```python
>>> hill('DGBD', 2, 'ETPHONEHOME', LETTERS='ABCDEFGHIJKLMNOQRSTUVWXYZ')
'VIFZN AFZKZ'

>>> hill('DGBD', 2, 'WJJKQ BCZKY UV', decrypt=True)
'etphonehomex'

>>> hill('GYBNQKURP', 3, 'ACT')
'POH'
```
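The output formatting in these examples can be sketched separately from the cipher itself. The helper below is a hypothetical stand-in written for illustration; the toolkit's `text_block` presumably performs the same grouping, but its exact behavior is not shown in this notebook.

```python
def group_text(text, size=5):
    """Split text into space-separated groups of `size` characters."""
    return ' '.join(text[i:i + size] for i in range(0, len(text), size))

# Ciphertext is reported in uppercase groups of 5.
grouped = group_text('WJJKQBCZKYUV')

# Plaintext is reported in lowercase with no grouping.
plain = 'ETPHONEHOMEX'.lower()
print(grouped)
print(plain)
```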

In [None]:
def hill(keyword, block_size, message, decrypt=False, LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    
    ...

In [None]:
grader.check("q6")

## Ciphertext analysis

There is a file included with this notebook that contains the first chapter of Pride and Prejudice. These responses will only be graded for accuracy once you've submitted your assignment, so make sure you are certain of your answers before submitting!

Run the cell below to load the chapter into a string named `plaintext`.

In [None]:
with open('pride-prejudice-chapter-01.txt') as f:
    plaintext = f.read()

<!-- BEGIN QUESTION -->

### Question 7: Creating a bar chart

Encrypt the plaintext using the keyword `PRIDEABCD` and a block size of `3`, then create a bar chart that shows the single-character frequency of each of the 26 English characters used in the message.
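One way to prepare data for such a chart is to count each letter with `collections.Counter`. The sketch below uses a short stand-in string rather than the real ciphertext, and the variable names are illustrative.

```python
from collections import Counter

LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
sample = 'WJJKQBCZKYUV'   # stand-in text, not the real ciphertext

counts = Counter(sample)

# A Counter returns 0 for letters that never appear, so every letter gets a bar.
frequencies = [counts[ch] for ch in LETTERS]
print(frequencies)
```

Assuming `matplotlib` is available in your environment, the lists `list(LETTERS)` and `frequencies` can then be passed to `matplotlib.pyplot.bar` as the bar labels and heights.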

In [None]:
ciphertext = ...


<!-- END QUESTION -->

### Question 8: Index of Coincidence

Write a function `index_of_coincidence` to compute the index of coincidence of the ciphertext. You should compute the "exact" version of the IoC, not the quicker approximation.

**Hint:** Reference Lesson 17 if you need a refresher!
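As a reminder of the arithmetic, one common form of the exact index of coincidence is $\sum_i n_i(n_i - 1) / (N(N - 1))$, where $n_i$ is the count of the $i$-th letter and $N$ is the total number of letters. The sketch below assumes that unnormalized definition and already-cleaned text; your lesson notes may scale the result differently, so treat this as an illustration rather than the graded answer. The name `exact_ioc` is hypothetical.

```python
from collections import Counter

def exact_ioc(text):
    """Probability that two letters drawn without replacement are the same."""
    counts = Counter(text)
    total = sum(counts.values())
    if total < 2:
        return 0.0   # undefined for fewer than two letters; 0.0 chosen here
    matching_pairs = sum(n * (n - 1) for n in counts.values())
    return matching_pairs / (total * (total - 1))

# 'AABB': (2*1 + 2*1) / (4*3) = 4 / 12
print(exact_ioc('AABB'))
```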

In [None]:
def index_of_coincidence(message, LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    
    ...

In [None]:
grader.check("q8")

<!-- BEGIN QUESTION -->

### Question 9: Classifying the cipher

The Hill cipher is what's known as a polygraphic cipher, which we have not yet covered in this course. It's not a monoalphabetic substitution cipher or a polyalphabetic substitution cipher. However, which of those does it seem to behave like? Describe the reasoning for your answer below, including specific details from your index of coincidence calculation and bar chart to back up your response.

_Type your answer here, replacing this text._

<!-- END QUESTION -->



## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit.

In [None]:
grader.export(pdf=False, force_save=True)