**DeapSECURE module 5: Cryptography for Privacy-Preserving Computation (part A: Data Protection)**

# Session 2: AES Encryption and Decryption

Welcome to the DeapSECURE online training program!
This is a Jupyter notebook for the hands-on learning activities of the
["Cryptography" module](https://deapsecure.gitlab.io/deapsecure-lesson05-crypt/).
*The contents of this notebook are new; they will be integrated into that lesson module at a later time.*
Please visit the [DeapSECURE](https://deapsecure.gitlab.io/) website to learn more about our training program.

In this session, we will learn how to encrypt and decrypt data with the AES cipher in Python, and how difficult it is to crack an encryption key.

<a id="TOC"></a>
**Quick Links** (sections of this notebook):

* 1 [Setup](#sec-Setup)
* 2 [A Simplistic Cipher - AES Library](#sec-AES_module)
* 3 [Using PyCryptodome Library](#sec-Pycryptodome)
* 4 [Encryption in Real World (Advanced)](#sec-Encryption_real)
* 5 [Encryption Key Cracking](#sec-Cracking)

<a id="sec-Setup"></a>
## 1. Setup Instructions

If you are opening this notebook from Wahab cluster's OnDemand interface, you're all set.

If you see this notebook elsewhere and want to perform the exercises on Wahab cluster, please follow the steps outlined in our setup procedure.

1. Make sure you have activated your HPC service.
2. Point your web browser to https://ondemand.wahab.hpc.odu.edu/ and sign in with your MIDAS ID and password.
3. Create a new Jupyter session with the following parameters: Python version **3.7**, Python suite `tensorflow 2.6 + pytorch 1.10`, Number of Cores **1**, Number of GPU **0**, Partition `main`, and Number of Hours at least **4**. (See <a href="https://wiki.hpc.odu.edu/en/ood-jupyter" target="_blank">ODU HPC wiki</a> for more detailed help.)
4. From the JupyterLab launcher, start a new Terminal session. Then issue the following commands to get the necessary files:

       mkdir -p ~/CItraining/module-crypt
       cp -pr /shared/DeapSECURE/module-crypt/. ~/CItraining/module-crypt

The file name of this notebook is `Crypt-session-2.ipynb`.

### 1.1 Reminder

* Throughout this notebook, `#TODO` is used as a placeholder where you need to fill in something appropriate.
* To run the code in a cell, press `Shift+Enter`.
* Use `ls` to view the contents of a directory.

Before we go into encryption and decryption, we define the `encode_int`, `decode_int` and `leftpad16` helper functions as we did in session 1:

In [None]:
def encode_int(C, minlength=16):
    """Encodes an arbitrarily long integer into a `bytes` object.
    The minimum length is by default 16 bytes (128 bits)."""
    C_hex = hex(C)[2:]
    if len(C_hex) % 2:
        C_hex = '0' + C_hex
    C_bytes = bytes.fromhex(C_hex)
    if len(C_bytes) < minlength:
        # pad the left side with NULLs
        C_bytes = C_bytes.rjust(minlength, b'\x00')
    return C_bytes

In [None]:
def decode_int(B):
    """Decodes a `bytes` object into a long integer.
    This is the inverse of the `encode_int` function."""
    return int(B.hex(), 16)

In [None]:
def leftpad16(B):
    """Pad a bytes array from the left with NULL chars so that
    the length is a multiple of 16 bytes."""
    padlength = len(B) % 16
    if padlength > 0:
        return (b'\x00' * (16 - padlength)) + B
    else:
        return B
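
As a quick sanity check of these helpers, the same conversions can be sketched with Python's built-in `int.to_bytes` and `int.from_bytes`. The names `int_to_block` and `block_to_int` below are hypothetical stand-ins (they are not part of the hands-on package); the sketch only illustrates the round trip between a text message, its bytes, and a long integer.

```python
def int_to_block(value, minlength=16):
    """Big-endian bytes of a non-negative integer, left-padded with NULLs."""
    nbytes = max(minlength, (value.bit_length() + 7) // 8)
    return value.to_bytes(nbytes, "big")


def block_to_int(block):
    """Interpret a bytes object as one big-endian integer."""
    return int.from_bytes(block, "big")


message = b"IdeaFusion"
as_int = block_to_int(message)     # bytes -> long integer
as_block = int_to_block(as_int)    # long integer -> 16-byte block

print(hex(as_int))
print(as_block)
print(len(as_block))
```

Note that the NULL padding ends up on the left side of the block, in front of the message.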

<a id="sec-AES_module"></a>
## 2. A Simplistic Cipher: `aes.py`

The `aes` module implements a very basic AES cipher that can only encrypt and decrypt exactly sixteen bytes (no more, no less).
The module is written in pure Python.
It is short and simple to read, which makes it well suited for educational purposes.
You are encouraged to read this module, located in your hands-on package:

    ~/CItraining/AES/aes.py

*(This module has been made available to you by loading the DeapSECURE module earlier.)*

First, we load the `aes` library:

In [None]:
import aes

Let's do some encryption and decryption using this module to illustrate the workings of AES.

### 2.1 Define a Secret Key 
(*Ssssh, don't share this with anyone, okay?!*)

In [None]:
# The master key (a secret) must fit in 128 bits (16 bytes):
secret_key = 0x5e413c

# Initializing "E", the object that can perform the encrypting / decrypting:
E = aes.AES(secret_key)

### 2.2 Define a Plaintext to Encrypt

In [None]:
# You can change the plaintext to any string that fits in 16 bytes.
# That is, this plaintext string must be at most 16 (ASCII) letters long:
plaintext_string = 'IdeaFusion'
plaintext = plaintext_string.encode() # utf-8 encoding

# construct a long integer out of the bytes
# because aes.py expects the data in that format for encryption and decryption:
plaintext_int = decode_int(plaintext)

print('The plaintext in bytes is:      ', plaintext)
print('The plaintext in decimal is:    ', plaintext_int)
print('The plaintext in hexadecimal is:', hex(plaintext_int))

**QUESTION**:

* Does the conversion from `b'IdeaFusion'` to a hex string `49646561467573696f6e` count as encryption?

### 2.3 Encrypt the Plaintext


The encrypt and decrypt functions in `aes.py` expect the data (plaintext and ciphertext) in the form of long (128-bit) integers.

In [None]:
ciphertext_int = E.encrypt(plaintext_int)

---
> **SIDEBAR**:
In a Jupyter notebook, help can be obtained by using `FUNCTION_NAME?` syntax.

In [None]:
E.encrypt?

In the example above, the documentation did not indicate the datatype of the `plaintext` argument.
However, the [test code](https://github.com/bozhu/AES-Python/blob/master/test.py) shows that the expected input and output are long integers.

---

**QUESTIONS:** What does the ciphertext look like?

* How does it look in decimal (long integer) form?
* How does it look in hexadecimal form?
* Does it look like a readable string of bytes?

In [None]:
"""Now print the ciphertext and try to make sense of it:""";

#TODO

**EXERCISE**: Convert the ciphertext into `bytes` and see if it is meaningful (tips: use the `encode_int` function).

In [None]:
"""Convert the ciphertext into bytes and print it:""";

#TODO

**QUESTIONS**:

* How long is the original message?
* How long is the encrypted message?
* Are they of the same length? Why, or why not?

### 2.4 Decrypt the Ciphertext with the Correct Key

Let's try to recover the original text:

In [None]:
decrypted_int = E.decrypt(ciphertext_int)
print("Decrypted text in hexadecimal:", hex(decrypted_int))

In [None]:
print("Decrypted text in bytes:", encode_int(decrypted_int))

**EXERCISE**: How do we remove the leading NULL bytes?

### 2.5 Decrypt the Ciphertext with a Wrong Key

**EXERCISE**: Create another object named `E2` with a wrong key, e.g. `0x27F6A123`.
Try recovering the message using that key--did it work?

In [None]:
"""Use this cell to work on the exercise above""";



Compare the last "decrypted" plaintext against the original plaintext.
Do they match?
Try decrypting again with a few different keys and check the result.

*HINT*: It is more convenient at this point if you create a Python function
to perform the necessary sequence to decrypt and print the resulting message.

---

> **THINK ABOUT IT!** What can we do to recover the original message if we lose the encryption key?
(We will come back to this shortly.)

---


### 2.6 Encrypt the Plaintext with a Different Key

**EXERCISE**:
Please try another experiment.
Re-encrypt the plaintext message using a few keys that are very similar to the original one, and compare the ciphertexts.
Can you learn something from the new ciphertexts?

In [None]:
print("Original secret_key =", hex(secret_key))

In [None]:
key1 = secret_key + 1
key2 = secret_key + 2
key3 = secret_key + 3

print("Try encrypting with at least the following keys:")
print(hex(key1))
print(hex(key2))
print(hex(key3))

Try encrypting with the keys printed above.
Can you learn something from the new ciphertexts?
Do they bear any similarity to the original ciphertext (`ciphertext_bytes` or `hex(ciphertext_int)`)?

### 2.7 Performance Matters

How fast is the `aes.py`-based encryption procedure?
The following IPython magic runs the `E.encrypt` statement 1000 times in a loop, which is then repeated 10 times to get the statistics:

In [None]:
%timeit -n 1000 -r 10 E.encrypt(plaintext_int)

During our development phase on Wahab cluster, it took ~0.2 seconds to run 1000 encryptions. How about decryption?

In [None]:
%timeit -n 1000 -r 10 E.decrypt(ciphertext_int)

It may be slightly longer, but not much longer.

**QUESTIONS**:

Given the timing above, estimate how long it will take to crack:

* an 8-bit key ($2^8$ = 256 combinations)?

* a 16-bit key (how many combinations)?

* a 32-bit key?

* a 128-bit key?

Assume we have to try all possible key combinations.
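
A minimal sketch of this kind of estimate is shown below. It assumes the development-phase figure quoted above (about 0.2 seconds per 1000 encryptions), which may differ on your machine; the function name `brute_force_seconds` is hypothetical. Only the 8-bit case is worked out here, and the other key sizes are left for you.

```python
def brute_force_seconds(key_bits, seconds_per_trial):
    """Worst-case time (in seconds) to try every key of the given size."""
    return (2 ** key_bits) * seconds_per_trial


# Assumption: ~0.2 s per 1000 encryptions, i.e. 0.0002 s per trial
seconds_per_trial = 0.2 / 1000

t8 = brute_force_seconds(8, seconds_per_trial)
print(f"8-bit key: about {t8:.3f} s")
```

Replace `seconds_per_trial` with the timing you measured to get an estimate for your own session.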

<a id="sec-Pycryptodome"></a>
## 3. Using PyCryptodome Library

*(This exercise will be left for your own exploration)*

Let's try using a different library to encrypt data. We use [PyCryptodome](https://pypi.org/project/pycryptodome/), which is a production-grade implementation of cryptographic algorithms.

> ### Install PyCryptodome
>
> If this is your first time using PyCryptodome, please install the `PyCryptodome` python package by uncommenting the following line and running it:

In [None]:
#! pip install PyCryptodome

In [None]:
from Crypto.Cipher import AES

*DO NOT be confused with the `aes.AES` earlier!*

Here, `AES` actually refers to the `Crypto.Cipher.AES` submodule,
whereas `aes.AES` refers to the simplistic class named `AES` inside the `aes` module.

The calling convention for the `PyCryptodome` module is different from that of `aes`:
`PyCryptodome` functions expect keys, input data and output data to be of `bytes` datatype instead of long integers.
Once you understand this, using `PyCryptodome` is quite easy:

In [None]:
# Generate the same secret key in the form of a 16-byte `bytes` object:
secret_bkey = encode_int(secret_key)
print("secret_key in hexadecimal (128-bit):", secret_bkey.hex())
print("secret_key in bytes: ", secret_bkey)

# Create a new cipher object
EE = AES.new(secret_bkey, AES.MODE_ECB)

> !!!**IMPORTANT SECURITY NOTE**!!!
>
> The AES object above was created with the *ECB* mode.
> This is OK only for experimenting with encryption here, but you do NOT want to use ECB for your real encryption work: identical plaintext blocks always produce identical ciphertext blocks, so snoopers can see patterns in your data and learn about the message without knowing the key.
> A [graphical illustration on Wikipedia](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Electronic_codebook_(ECB)) shows the point.
> With ECB, the outline of the "Tux" penguin can still be recognized in the encrypted picture, so not all information is hidden:
>
> ![Penguin picture encrypted with ECB](fig/Tux_ecb.jpg)
>
> (Credit: Wikipedia user "Lunkwill".
> Derived from the Tux picture by Larry Ewing, <lewing@isc.tamu.edu>, and The GIMP project.)
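
The pattern leakage described in the note can be illustrated without any crypto library. The sketch below is NOT AES: it uses a truncated HMAC-SHA256 as a hypothetical stand-in for a keyed block transformation (it cannot even be decrypted), and the names `toy_block_transform` and `ecb_like` are made up for this illustration. The only point is that when every 16-byte block is processed independently with the same key, equal input blocks give equal output blocks.

```python
import hashlib
import hmac

BLOCK = 16


def toy_block_transform(key, block):
    """Keyed, deterministic stand-in for a block cipher (NOT AES, not invertible)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ecb_like(key, data):
    """Transform each 16-byte block independently, as ECB mode does."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return [toy_block_transform(key, b) for b in blocks]


key = b"k" * 16
data = b"A" * 16 + b"B" * 16 + b"A" * 16   # blocks 0 and 2 are identical

out = ecb_like(key, data)
for i, block in enumerate(out):
    print(i, block.hex())
print("blocks 0 and 2 equal:", out[0] == out[2])
```

An observer who sees the output learns that the first and third blocks of the message are the same, without knowing the key.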


### 3.1 Encryption

In [None]:
ciphertext_bytes3 = EE.encrypt(leftpad16(b'IdeaFusion'))
ciphertext_bytes3

Compare this with the `ciphertext_bytes` earlier:

In [None]:
"""Print ciphertext_bytes and ciphertext_bytes3""";

#TODO

### 3.2 Decryption

In [None]:
decrypted_bytes3 = EE.decrypt(ciphertext_bytes3)

In [None]:
print(decrypted_bytes3)

### 3.3 Performance of PyCryptodome

Now, how fast is the `PyCryptodome`-based crypto procedure?

Encryption:

In [None]:
%timeit -n 1000 -r 10 EE.encrypt(leftpad16(b'IdeaFusion'))

Decryption:

In [None]:
%timeit -n 1000 -r 10 EE.decrypt(ciphertext_bytes3)

**QUESTION**: 

* PyCryptodome is at least 30x faster than `aes.py`. Why?

To understand this, you must look at the source code.
AES encryption and decryption operations are compute-intensive, and they work at the level of individual bits and bytes.
`aes.py` is written in pure Python, therefore its performance is very low.
PyCryptodome, on the other hand, has the performance-sensitive encryption and decryption procedures written in C, so they can run at close to the peak performance of the machine.

Still, `aes.py` is small and is guaranteed to work as long as you abide by its restrictions.
In a learning environment like this one, or in an extremely constrained environment where performance is not critical, a simplistic implementation can be helpful.

PyCryptodome can encrypt longer strings with ease, as long as the message is padded correctly to a multiple of 16 bytes:

In [None]:
plain_bytes4 = leftpad16(b'The master key (a secret) must be less than 128 bits (16 bytes)')
print(len(plain_bytes4))
plain_bytes4

In [None]:
cipher_bytes4 = EE.encrypt(plain_bytes4)
cipher_bytes4

In [None]:
EE.decrypt(cipher_bytes4)
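
`leftpad16` is convenient for this notebook, but NULL padding is ambiguous when a message itself begins with NULL bytes. A commonly used alternative is PKCS#7 padding, which appends N bytes, each of value N. The sketch below is a minimal pure-Python illustration; the helper names `pkcs7_pad` and `pkcs7_unpad` are hypothetical and not part of the hands-on package. (PyCryptodome ships its own `pad` and `unpad` functions in `Crypto.Util.Padding`.)

```python
def pkcs7_pad(data, block_size=16):
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n


def pkcs7_unpad(padded, block_size=16):
    """Remove PKCS#7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size:
        raise ValueError("invalid padded length")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


message = b"IdeaFusion"
padded = pkcs7_pad(message)
print(padded)
print(pkcs7_unpad(padded))
```

With this scheme a message that is already a multiple of 16 bytes gets one full extra block of padding, so the padding can always be removed unambiguously.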

<a id="sec-Encryption_real"></a>
## 4. Encryption in Real World (Advanced)

In the real world, data encryption is a serious business.
Some strong advice:

1. Stay with well-established algorithms and procedures.
   They have a proven track record of security.

2. Do not try to roll your own encryption procedure unless you know exactly what you are doing.
   For sure, this is not something that a novice should do,
   because we can easily weaken encryption to the point that it can easily be broken.
   Remember that hackers have long arms and can figure things out even when we think we are secure enough.

3. Make sure that you are using up-to-date crypto tools and algorithms.
   Avoid using an algorithm or tool that is already known to have weaknesses.
   For example: Avoid using obsolete encryption schemes (such as DES and 3DES), because they have known weaknesses and are no longer considered secure.
   Another example: `PyCryptodome` is an up-to-date fork of the original crypto library called `PyCrypto`.
   The former is still being actively updated, whereas its parent has not been updated for six years (as of year 2020).

In Python, there is another highly respected crypto library named [cryptography](https://pypi.org/project/cryptography/).
Do look into that project as well to meet your data protection needs.

### 4.1 AES in Real World

Earlier we said that ECB is not the correct way to use AES in real-world applications, because it makes it easy for attackers to at least "see" the long-range structure of the encrypted data.
ECB is the "plain vanilla" AES without additional measures to safeguard the data against tampering or learning.
Advanced block cipher modes such as EAX, GCM, CCM, and OCB augment the "vanilla" AES with authentication and block-chaining techniques.
We will demonstrate *EAX* below; for the rest, please see the recommended readings at the end of this section.

In [None]:
EAX = AES.new(secret_bkey, AES.MODE_EAX)

In [None]:
EAX_nonce = EAX.nonce
print("EAX.nonce = ", EAX_nonce)

Note: A *nonce* is a number (randomly generated here) that is used only once in the encryption process.
Every new EAX cipher object will use a different nonce.

In [None]:
crypt_EAX = EAX.encrypt(leftpad16(b'The master key (a secret) must be less than 128 bits (16 bytes)'))
crypt_EAX

In [None]:
# Decryptor object
# (we need a separate object because an EAX cipher object is not stateless)
DD_EAX = AES.new(secret_bkey, AES.MODE_EAX, nonce = EAX_nonce)

In [None]:
DD_EAX.decrypt(crypt_EAX)

**EXERCISE**:
Repeat the encryption and decryption procedures above several times.
Do you notice any difference compared to the previous ones?

In [None]:
EE_EAX2 = AES.new(secret_bkey, AES.MODE_EAX)

In [None]:
print("EE_EAX2.nonce = ", EE_EAX2.nonce)

In [None]:
crypt_EAX2 = EE_EAX2.encrypt(leftpad16(b'The master key (a secret) must be less than 128 bits (16 bytes)'))
crypt_EAX2

In [None]:
DD_EAX2 = AES.new(secret_bkey, AES.MODE_EAX, nonce=EE_EAX2.nonce)
DD_EAX2.decrypt(crypt_EAX2)

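**IMPORTANT:** The EAX ciphertext differs every time, even for the same key and message, because each new cipher object draws a different `nonce`. The same `nonce` must be supplied again to decrypt the message.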
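
The role of the nonce can be illustrated with a toy construction that needs only the standard library. This is NOT EAX and must not be used for real encryption: it is a hypothetical stand-in that derives a keystream from the key and the nonce with HMAC-SHA256 and XORs it with the message. The names `toy_keystream` and `toy_xor` are made up for this sketch.

```python
import hashlib
import hmac
import secrets


def toy_keystream(key, nonce, length):
    """Derive `length` pseudo-random bytes from the key and nonce (toy construction)."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hmac.new(key, nonce + counter.to_bytes(4, "big"),
                           hashlib.sha256).digest()
        counter += 1
    return stream[:length]


def toy_xor(key, nonce, data):
    """Encrypt or decrypt by XOR with the keystream (same operation both ways)."""
    stream = toy_keystream(key, nonce, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


key = b"\x00" * 13 + bytes.fromhex("5e413c")
message = b"IdeaFusion"

nonce1 = secrets.token_bytes(16)
nonce2 = secrets.token_bytes(16)
c1 = toy_xor(key, nonce1, message)
c2 = toy_xor(key, nonce2, message)

print(c1.hex())
print(c2.hex())
print(toy_xor(key, nonce1, c1))   # the same nonce recovers the message
```

The two ciphertexts differ although the key and the message are identical, and decryption only works when the matching nonce is supplied.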

### 4.2 Recommended Readings

These are good references for people who want to practice encryption in the real world.

* https://pycryptodome.readthedocs.io/en/latest/src/examples.html#encrypt-data-with-aes \
An example of good practice for encrypting and decrypting data with AES.

* https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation \
Some explanation of different block cipher modes, including schematic explanation of these modes.

* https://blog.cryptographyengineering.com/2012/05/19/how-to-choose-authenticated-encryption/ \
Reasons to use authenticated encryption.

<a id="sec-Cracking"></a>
## 5. 3ncrypt1on K3y Cr4cking !!

Now comes the exciting part!
Let's play hacker and crack a given ciphertext.
How do we crack an encrypted message if we don't know anything about the key?
The only sure way to recover the message is to try all the possible values of the key, decrypt the message, then figure if the decrypted message makes sense.

For a fully unknown 128-bit key, we will have to try $2^{128}$ combinations (over $3 \times 10^{38}$).
That is truly daunting!

In [None]:
# total combination of 128-bit key

"%g" % (2**128)

### How Difficult Is This Problem?

**QUESTION:**
Suppose a high-speed cracker can go through all combinations of a 24-bit key in 5 minutes.
Then how many hours are required to crack an AES-128 key with N unknown bits?

* 32 bits = 256 &times; 5 minutes = about 21 hours
* 48 bits = 65536 &times; 21 hours = about 1.38 million hours = over 57,000 days!!

*Always do a back-of-envelope calculation like this before you launch something computationally expensive!*
You get the point... a full 128-bit key cracking is clearly out of reach even for today's mega-supercomputers.
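
The back-of-envelope calculation above can be sketched in a few lines of Python. The function name `hours_to_crack` is hypothetical; the reference rate (all 24-bit keys in 5 minutes) is the assumption stated in the question. Note that the text rounds the 32-bit result down to 21 hours before multiplying, so its 48-bit figure is slightly smaller than the unrounded value computed here.

```python
def hours_to_crack(unknown_bits, ref_bits=24, ref_minutes=5.0):
    """Scale the reference rate (2**ref_bits keys in ref_minutes) to another key size."""
    minutes = ref_minutes * 2 ** (unknown_bits - ref_bits)
    return minutes / 60


for bits in (32, 48, 128):
    hours = hours_to_crack(bits)
    print(f"{bits:3d} bits: {hours:.3g} hours = {hours / 24:.3g} days")
```

Every additional unknown bit doubles the time, which is why the estimate grows so quickly between 32 and 128 bits.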

### OUR CRACKING CHALLENGE

Here is the formal definition of our AES-128 cracking challenge.

#### Reduced Key Space

For the reason elaborated above, we significantly reduce the number of unknown bits in the AES-128 key space.
We prepare some challenges with increasing levels of difficulty:

| key bits  |  Starting key (hex)                | Ending key (hex)                   | Total combinations     |
|-----------|------------------------------------|------------------------------------|------------------------|
|       8   | `00000000000000000000000000000000` | `000000000000000000000000000000ff` | 256                    |
|      16   | `00000000000000000000000000000000` | `0000000000000000000000000000ffff` | 65,536                 |
|      20   | `00000000000000000000000000000000` | `000000000000000000000000000fffff` | 1,048,576              |
|      24   | `00000000000000000000000000000000` | `00000000000000000000000000ffffff` | 16,777,216             |
|      32   | `00000000000000000000000000000000` | `000000000000000000000000ffffffff` | 4,294,967,296          |

The "starting key" and "ending key" above define the smallest and largest numerical values to try for the AES-128 key.
By 32 bits, there are already nearly 4.3 billion combinations to try, so HPC is needed to solve the challenge in a reasonable time.
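
One possible way to enumerate such a reduced key space is sketched below. The generator name `candidate_keys` is hypothetical, and the 16-byte big-endian layout is an assumption chosen to match the hexadecimal keys in the table above.

```python
def candidate_keys(key_bits):
    """Yield every key in the reduced key space as a 16-byte `bytes` object."""
    for k in range(2 ** key_bits):
        yield k.to_bytes(16, "big")


keys8 = list(candidate_keys(8))
print(len(keys8))
print(keys8[0].hex())
print(keys8[-1].hex())
```

For the larger key spaces, iterate over the generator directly instead of building a list, so that the keys never have to be held in memory all at once.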

#### Plaintext Message

The secret plaintext messages are known to contain *only* letters (A-Z, a-z).
No whitespace, no numbers, no symbols.
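
This property gives a simple test for whether a decrypted block could be the secret message. The sketch below uses a hypothetical helper named `looks_like_message`, and it assumes that messages shorter than 16 bytes were padded on the left with NULL bytes (as `leftpad16` does); that padding convention is an assumption, not something stated by the challenge.

```python
def looks_like_message(candidate):
    """True if the block holds only ASCII letters after removing leading NULL padding."""
    text = candidate.lstrip(b"\x00")
    # bytes.isalpha() is False for an empty sequence and for any non-letter byte
    return text.isalpha()


print(looks_like_message(b"\x00" * 5 + b"IHaveNoIdea"))
print(looks_like_message(bytes.fromhex("f163060b1e6c68753c637854b838609e")))
print(looks_like_message(b"\x00" * 16))
```

A block that decrypts to random-looking bytes will almost never pass this check, so it can be used to reject wrong keys quickly.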

#### Verifying Your Cracker

Here are some ciphertexts with cracked messages to verify that your _cr4cker w0rks (!!)_

| key bits | ciphertext (hex)                   | plaintext                    |
|----------|------------------------------------|------------------------------|
|        8 | `f163060b1e6c68753c637854b838609e` | `IHaveNoIdea`                |
|        8 | `d3bbd04550497f943f6bf4c9e2291993` | `CongratsYouDidIt`           |
|       16 | `c7c88a08cd82ef5e8afb051e9cebe122` | `AESBetterThanDES`           |

### Rules of the Game

* The goal of the game is to crack the encryption key and the secret message in the shortest wall-clock time possible. The solution that recovers the correct plaintexts in the shortest time wins.

* You must write your own program. You cannot use ready-made programs such as Johnny, John the Ripper, hashcat, or the like.

* You can program the cracker in any programming language.

* You may use any optimizing tools, parallelization libraries (e.g. MPI) or language constructs, etc.

* You can call any AES library, including but not limited to `aes.py` and PyCryptodome. You can write your own AES function, if you like.
