There are two types of symmetric ciphers: stream ciphers and block ciphers.

- Stream ciphers operate on data streams, i.e. one byte at a time. 
- Block ciphers operate on blocks of data, typically 16 bytes at a time. 

The most common block cipher, and the one you should use unless you have a very good reason to choose another, is AES, which is specified in FIPS PUB 197.

AES is a specific subset of the Rijndael cipher. AES uses a block size of 128 bits (16 bytes), so the length of the data must be a multiple of the block size, and shorter data has to be padded out to fit. For example, the input ABCDABCDABCDABCD ABCDABCDABCDABCD (two full 16-byte blocks) needs no padding. However, ABCDABCDABCDABCD ABCDABCDABCD (28 bytes) needs an additional 4 bytes of padding.

> A common padding scheme is to use 0x80 as the first byte of padding, with 0x00 bytes filling out the rest of the padding. With padding, the previous example would look like: ABCDABCDABCDABCD ABCDABCDABCD\x80\x00\x00\x00.

Let's define our padding and unpadding as a pair of functions:

In [1]:
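def pad_data(data):
    """Pad data to a multiple of 16 bytes using 0x80 followed by 0x00 bytes."""
    # Return the data unchanged if no padding is required.
    # Note: this makes the scheme ambiguous for block-aligned data that
    # already ends in \x80 or \x00 bytes.
    if len(data) % 16 == 0:
        return data
    # Subtract one byte for the leading 0x80. If 0 bytes of zero
    # padding are required, only a single \x80 is appended.
    padding_required = 15 - (len(data) % 16)
    data = '%s\x80' % data
    data = '%s%s' % (data, '\x00' * padding_required)
    return data


def unpad_data(data):
    """Strip the padding added by pad_data."""
    if not data:
        return data

    data = data.rstrip('\x00')
    # endswith() is safe even if the data consisted only of \x00 bytes.
    if data.endswith('\x80'):
        return data[:-1]
    else:
        return data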
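
As a quick check of the scheme, here is a minimal sketch of the same padding written against Python 3 `bytes`. That is an assumption made for this illustration, since the tutorial's own code targets Python 2 strings; the names `pad_bytes` and `unpad_bytes` are hypothetical and not part of the tutorial's code. It reproduces the 28-byte example from the text.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pad_bytes(data: bytes) -> bytes:
    """Append 0x80 then 0x00 bytes until the length is a multiple of 16."""
    if len(data) % BLOCK_SIZE == 0:
        return data  # already block-aligned, nothing to add
    zeros = BLOCK_SIZE - 1 - (len(data) % BLOCK_SIZE)
    return data + b"\x80" + b"\x00" * zeros


def unpad_bytes(data: bytes) -> bytes:
    """Remove trailing 0x00 bytes and, if present, the 0x80 marker."""
    stripped = data.rstrip(b"\x00")
    return stripped[:-1] if stripped.endswith(b"\x80") else stripped


message = b"ABCD" * 7          # 28 bytes: one full block plus 12 bytes
padded = pad_bytes(message)    # padded out to two full blocks
print(len(padded), padded[-4:])
print(unpad_bytes(padded) == message)
```

The last four bytes of `padded` are the `\x80\x00\x00\x00` sequence described in the note above.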

Encryption with a block cipher requires selecting a block mode. By far the most common modes are cipher block chaining (CBC) mode and counter (CTR) mode (see _Cryptography Engineering_).

Other modes include cipher feedback (CFB) and the extremely insecure electronic codebook (ECB). CBC mode is the standard and is well-vetted, so I will stick to that in this tutorial. Cipher block chaining works by XORing each block of plaintext with the previous block of ciphertext before encrypting it. You might recognise that the first block has nothing to be XOR’d with; enter the **initialisation vector**. This comprises a number of randomly-generated bytes of data the same size as the cipher’s block size. The initialisation vector should be random enough that it cannot be predicted.
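
To make the chaining concrete, here is a minimal sketch in Python 3. It is not AES: the block cipher is a toy four-round Feistel construction built from SHA-256 purely so the example runs with the standard library alone, and every name in it (`toy_encrypt_block`, `cbc_encrypt`, and so on) is hypothetical. It only illustrates how CBC chains blocks and why ECB leaks repeated blocks.

```python
import hashlib
import os

BLOCK = 16  # bytes, the same block size as AES
HALF = BLOCK // 2


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Round function of the toy Feistel cipher (illustration only)."""
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt every block independently."""
    return b"".join(toy_encrypt_block(key, plaintext[i:i + BLOCK])
                    for i in range(0, len(plaintext), BLOCK))


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """XOR each plaintext block with the previous ciphertext block first."""
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)


key = b"0123456789ABCDEF"
iv = os.urandom(BLOCK)
plaintext = b"ABCDABCDABCDABCD" * 2  # two identical blocks

ecb = ecb_encrypt(key, plaintext)
cbc = cbc_encrypt(key, iv, plaintext)
print(ecb[:BLOCK] == ecb[BLOCK:])  # ECB: equal plaintext blocks stay equal
print(cbc[:BLOCK] == cbc[BLOCK:])  # CBC: the chaining hides the repetition
print(cbc_decrypt(key, iv, cbc) == plaintext)
```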

One of the most critical components of encryption is properly generating random data. Fortunately, most of this is handled by the PyCrypto library’s _Crypto.Random.OSRNG_ module. You should know that the more entropy sources that are available (such as network traffic and disk activity), the faster the system can generate cryptographically-secure random data. I’ve written a function that can generate a nonce suitable for use as an initialisation vector. This will work on a Linux machine; the comments note how easy it is to adapt it to a Windows machine. This function requires PyCrypto 2.1.0 or higher.

In [2]:
import Crypto.Cipher.AES as AES
# On Windows, import Crypto.Random.OSRNG.nt instead.
import Crypto.Random.OSRNG.posix as RNG


def generate_nonce():
    """Generate a random number used once."""
    return RNG.new().read(AES.block_size)



> I will note here that the Python random module is completely unsuitable for cryptography (as it is completely deterministic). You shouldn’t use it for cryptographic code.
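
For comparison, a minimal sketch of the same nonce generation using only the Python 3 standard library. This assumes Python 3.6 or later, where the `secrets` module is available; the function name `generate_nonce_stdlib` is hypothetical and is not part of the tutorial's code.

```python
import secrets

BLOCK_SIZE = 16  # AES block size in bytes


def generate_nonce_stdlib(size: int = BLOCK_SIZE) -> bytes:
    """Return `size` bytes from the operating system's secure random source."""
    return secrets.token_bytes(size)


nonce = generate_nonce_stdlib()
print(len(nonce), nonce.hex())
```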

Symmetric ciphers are so named because the same key is shared by every party that encrypts or decrypts the data. There are three key sizes for AES: 128-bit, 192-bit, and 256-bit, i.e. 16-byte, 24-byte, and 32-byte keys. We will use the largest: we just need to generate 32 random bytes (and make sure we keep track of them) and use that as the key:

In [3]:
KEYSIZE = 32


def generate_key():
    """Generate a random 32-byte (256-bit) AES key."""
    return RNG.new().read(KEYSIZE)

We can use this key to encrypt and decrypt data. To encrypt, we need the initialisation vector (i.e. a nonce), the key, and the data. However, **the IV isn’t a secret**. When we encrypt, we’ll prepend the IV to our encrypted data and make that part of the output. We can (and should) generate a completely random IV for each new message.

In [4]:
import Crypto.Cipher.AES as AES


def encrypt(data, key):
    """
    Encrypt data using AES in CBC mode. The IV is prepended to the
    ciphertext.
    """
    data = pad_data(data)
    ivec = generate_nonce()
    aes = AES.new(key, AES.MODE_CBC, ivec)
    ctxt = aes.encrypt(data)
    return ivec + ctxt


def decrypt(ciphertext, key):
    """
    Decrypt a ciphertext encrypted with AES in CBC mode; assumes the IV
    has been prepended to the ciphertext.
    """
    if len(ciphertext) <= AES.block_size:
        raise Exception("Invalid ciphertext.")
    ivec = ciphertext[:AES.block_size]
    ciphertext = ciphertext[AES.block_size:]
    aes = AES.new(key, AES.MODE_CBC, ivec)
    data = aes.decrypt(ciphertext)
    return unpad_data(data)

However, this is only part of the equation for securing messages: AES only gives us confidentiality. Remember how we had a few other criteria? We still need to add **integrity and authenticity** to our process. 

Readers with some experience might immediately think of hashing algorithms, like MD5 (which should be avoided since it is broken =.=) and SHA. The problem with these is that they are malleable: anyone who changes the message can just as easily recompute the digest, and there is no indication it’s been changed.

What we need instead is a hash function that uses a key to generate the digest; the one we’ll use is called HMAC (which is a general framework rather than a single algorithm). We do not want to reuse the key used to encrypt the message; we should have a **new, freshly generated key** that is the same size as the digest’s output size (although in many cases, this will be overkill).

In order to encrypt properly, then, we need to modify our code a bit. The first thing you need to know is that HMAC is based on a particular hash function. Since we’re using AES-256, we’ll use SHA-512 (remember the birthday attack~). We say our message tags are computed using HMAC-SHA-512. This produces a 64-byte digest. Let’s add a few new constants in, and update the KEYSIZE variable:

In [5]:
__aes_keylen = 32
__tag_keylen = 64
KEYSIZE = __aes_keylen + __tag_keylen
__TAG_LEN = __tag_keylen

In [6]:
import Crypto.Hash.HMAC as HMAC
import Crypto.Hash.SHA512 as SHA512
 
def new_tag(ciphertext, key):
     """Compute a new message tag using HMAC-SHA-512."""
     return HMAC.new(key, msg=ciphertext, digestmod=SHA512).digest()
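
The difference between a plain digest and a keyed tag can be sketched with the Python 3 standard library `hmac` and `hashlib` modules. This is an assumption made for illustration, since the tutorial itself uses PyCrypto's HMAC; `tag_message` is a hypothetical name.

```python
import hashlib
import hmac
import os


def tag_message(message: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA-512 tag over the message."""
    return hmac.new(key, message, hashlib.sha512).digest()


tag_key = os.urandom(64)  # a fresh key, separate from the encryption key
message = b"attack at dawn"
tag = tag_message(message, tag_key)
print(len(tag))  # 64 bytes, the SHA-512 digest size

# A plain hash needs no key, so anyone who tampers with the message
# can simply recompute it.
forged = b"attack at dusk"
forged_digest = hashlib.sha512(forged).digest()

# The original tag does not match the forged message, and producing a
# matching one would require the tag key.
print(tag_message(forged, tag_key) == tag)
```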

Here’s our updated encrypt function:

In [7]:
def encrypt(data, key):
    """
    Encrypt data using AES in CBC mode. The IV is prepended to the
    ciphertext.
    """
    data = pad_data(data)
    ivec = generate_nonce()
    aes = AES.new(key[:__aes_keylen], AES.MODE_CBC, ivec)
    ctxt = aes.encrypt(data)
    tag = new_tag(ivec + ctxt, key[__aes_keylen:]) 
    return ivec + ctxt + tag

Decryption has a snag: what we want to do is check to see if the message tag matches what we think it should be. However, the Python == operator stops matching on the first character it finds that doesn’t match. This opens a verification based on the == operator to a **timing attack**. We’ll use the streql package (i.e. pip install streql) to perform a constant-time comparison of the tags.

In [8]:
import streql


def verify_tag(ciphertext, key):
    """Verify the tag on a ciphertext."""
    # __TAG_LEN is the HMAC-SHA-512 tag length defined earlier.
    tag_start = len(ciphertext) - __TAG_LEN
    data = ciphertext[:tag_start]
    tag = ciphertext[tag_start:]
    actual_tag = new_tag(data, key)
    # Constant-time comparison to avoid a timing attack.
    return streql.equals(actual_tag, tag)
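
If installing streql is not an option, the Python standard library offers `hmac.compare_digest` for the same purpose. The following is a minimal Python 3 sketch under that assumption; `verify_tag_stdlib` and `TAG_LEN` are hypothetical names used only here.

```python
import hashlib
import hmac

TAG_LEN = 64  # HMAC-SHA-512 tag length in bytes


def verify_tag_stdlib(blob: bytes, key: bytes) -> bool:
    """Check the trailing HMAC-SHA-512 tag of `blob` with compare_digest."""
    if len(blob) < TAG_LEN:
        return False
    data, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha512).digest()
    # compare_digest is designed not to leak where the first mismatch occurs.
    return hmac.compare_digest(expected, tag)


key = b"k" * 64
body = b"iv-and-ciphertext-bytes"
blob = body + hmac.new(key, body, hashlib.sha512).digest()
tampered = b"X" + blob[1:]

print(verify_tag_stdlib(blob, key))
print(verify_tag_stdlib(tampered, key))
```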

In [9]:
def decrypt(ciphertext, key):
    """
    Decrypt a ciphertext encrypted with AES in CBC mode; assumes the IV
    has been prepended and the message tag appended to the ciphertext.
    Returns the tuple (plaintext, ok).
    """
    # The message must hold more than just an IV and a tag.
    if len(ciphertext) <= AES.block_size + __TAG_LEN:
        return None, False
    tag_start = len(ciphertext) - __TAG_LEN
    ivec = ciphertext[:AES.block_size]
    data = ciphertext[AES.block_size:tag_start]
    # Check the tag before decrypting anything.
    if not verify_tag(ciphertext, key[__aes_keylen:]):
        return None, False
    aes = AES.new(key[:__aes_keylen], AES.MODE_CBC, ivec)
    data = aes.decrypt(data)
    return unpad_data(data), True
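
The slicing in the function above relies on a fixed layout: the combined key is the AES key followed by the tag key, and the message is the IV, then the ciphertext, then the tag. A minimal Python 3 sketch of just that bookkeeping follows; the helper names `split_key` and `split_message` are hypothetical, and no encryption takes place.

```python
AES_KEYLEN = 32  # AES-256 key length in bytes
IV_LEN = 16      # AES block size in bytes
TAG_LEN = 64     # HMAC-SHA-512 tag length in bytes


def split_key(key: bytes):
    """Split the combined key into (aes_key, tag_key)."""
    return key[:AES_KEYLEN], key[AES_KEYLEN:]


def split_message(blob: bytes):
    """Split IV || ciphertext || tag into its three parts."""
    if len(blob) <= IV_LEN + TAG_LEN:
        raise ValueError("message too short")
    return blob[:IV_LEN], blob[IV_LEN:-TAG_LEN], blob[-TAG_LEN:]


aes_key, tag_key = split_key(b"a" * 32 + b"t" * 64)
iv, ctxt, tag = split_message(b"I" * 16 + b"C" * 32 + b"T" * 64)
print(len(aes_key), len(tag_key))
print(len(iv), len(ctxt), len(tag))
```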

We could also generate a key using a passphrase; to do so, you should use a key derivation algorithm, such as PBKDF2. A function to derive a key from a passphrase will also need to store the salt that goes with the passphrase. PBKDF2 takes three arguments: _the passphrase, the salt, and the number of iterations to run through_. The recommended minimum number of iterations at the time of writing is 16384; treat it as a lower bound, since the appropriate count rises as hardware gets faster.

> What is a salt? A salt is a randomly generated value used to make sure the outputs of two runs of PBKDF2 differ even for the same passphrase. Generally, this should be a minimum of 16 bytes (128 bits).

Here are two functions to generate a random salt and generate a secret key from PBKDF2:

In [10]:
import pbkdf2 

def generate_salt(salt_len):  
    """Generate a salt for use with PBKDF2."""
    return RNG.new().read(salt_len)

def password_key(passphrase, salt=None):
    """Generate a key from a passphrase. Returns the tuple (salt, key)."""
    if salt is None:
        salt = generate_salt(16)
    passkey = pbkdf2.PBKDF2(passphrase, salt, iterations=16384).read(KEYSIZE)
    return salt, passkey

Keep in mind that the salt, while a public and non-secret value, must be present to recover the key. To generate a new key, pass _None_ as the salt value, and a random salt will be generated. To recover the same key from the passphrase, the salt must be provided (and it must be the same salt that was generated along with the original key). As an example, the salt could be provided as the first len(salt) bytes of the ciphertext.
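
The role of the salt can be sketched with `hashlib.pbkdf2_hmac` from the Python 3 standard library. Two assumptions are worth flagging: this is not the `pbkdf2` package used above, and the sketch picks SHA-512 as the underlying hash, which may differ from that package's default. `derive_key` is a hypothetical name.

```python
import hashlib
import os
from typing import Optional

KEYSIZE = 96        # 32-byte AES key plus 64-byte HMAC key
ITERATIONS = 16384  # the iteration count used in the text


def derive_key(passphrase: bytes, salt: Optional[bytes] = None):
    """Derive KEYSIZE bytes from a passphrase; returns (salt, key)."""
    if salt is None:
        salt = os.urandom(16)  # fresh 16-byte salt for a new key
    key = hashlib.pbkdf2_hmac("sha512", passphrase, salt, ITERATIONS,
                              dklen=KEYSIZE)
    return salt, key


salt, key = derive_key(b"correct horse battery staple")
_, same_key = derive_key(b"correct horse battery staple", salt)
_, other_key = derive_key(b"correct horse battery staple")

print(key == same_key)   # same passphrase and salt recover the key
print(key == other_key)  # a fresh salt yields an unrelated key
```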

That should cover the basics of block cipher encryption. We’ve gone over key generation, padding, and encryption/decryption.

### ASCII-Armouring

I’m going to take a quick detour and talk about ASCII armouring. If you’ve played with the crypto functions above, you’ll notice they produce an annoying dump of binary data that can be a hassle to deal with. One common technique for making the data a little bit easier to deal with is to encode it with base64. There are a few ways to incorporate this into Python.

#### Absolute Base64 Encoding

The easiest way is to just base64 encode everything in the encrypt function. Everything that goes into the decrypt function should be in base64 - if it’s not, the base64 module will throw an error: you could catch this and then try to decode it as binary data.
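
A minimal sketch of that approach using the Python 3 `base64` module (an assumption, since the tutorial's code is Python 2); `armour` and `dearmour` are hypothetical names.

```python
import base64
import binascii


def armour(blob: bytes) -> bytes:
    """Encode binary data as base64 text."""
    return base64.b64encode(blob)


def dearmour(data: bytes) -> bytes:
    """Decode base64, falling back to treating the input as raw binary."""
    try:
        return base64.b64decode(data, validate=True)
    except binascii.Error:
        return data


blob = b"\x00\xff\x10binary"
print(armour(blob))
print(dearmour(armour(blob)) == blob)
print(dearmour(blob) == blob)  # not valid base64, so returned unchanged
```

The fallback is a guess: binary data that happens to look like valid base64 would be decoded by mistake.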

#### A Simple Header

A slightly more complex option, and the one I adopt in this tutorial, is to use a \x00 as the first byte of the ciphertext for binary data, and to use \x41 (an ASCII “A”) for ASCII encoded data. This will increase the complexity of the encryption and decryption functions slightly. We’ll also pack the initialisation vector at the beginning of the output. The functions now take a passphrase in place of a raw key and derive the key with password_key, so I will have to rearrange the arguments a bit. My modified functions look like this now:

In [20]:
def encrypt(data, keyin, armour=False):
    """
    Encrypt data using AES in CBC mode with a key derived from the
    passphrase keyin. The output is laid out as
    header || salt || IV || ciphertext || tag.
    """
    salt, key = password_key(keyin)
    data = pad_data(data)
    ivec = generate_nonce()
    aes = AES.new(key[:__aes_keylen], AES.MODE_CBC, ivec)
    ctxt = aes.encrypt(data)
    tag = new_tag(ivec + ctxt, key[__aes_keylen:])
    # The salt is stored with the message; without it the key cannot
    # be derived again when decrypting.
    if armour:
        return '\x41' + (salt + ivec + ctxt + tag).encode('base64')
    else:
        return '\x00' + salt + ivec + ctxt + tag


def decrypt(ciphertext, keyin):
    """
    Decrypt a message produced by encrypt. Returns the tuple
    (plaintext, ok).
    """
    salt_len = 16  # length of the salt generated by password_key
    if ciphertext[:1] == '\x41':
        ciphertext = ciphertext[1:].decode('base64')
    else:
        ciphertext = ciphertext[1:]
    if len(ciphertext) <= salt_len + AES.block_size + __TAG_LEN:
        return None, False
    # Recover the salt and derive the same key used for encryption.
    salt, ciphertext = ciphertext[:salt_len], ciphertext[salt_len:]
    salt, key = password_key(keyin, salt)
    tag_start = len(ciphertext) - __TAG_LEN
    ivec = ciphertext[:AES.block_size]
    data = ciphertext[AES.block_size:tag_start]
    if not verify_tag(ciphertext, key[__aes_keylen:]):
        return None, False
    aes = AES.new(key[:__aes_keylen], AES.MODE_CBC, ivec)
    data = aes.decrypt(data)
    return unpad_data(data), True
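
The header byte on its own can be sketched independently of the encryption. The following Python 3 example is an illustration only; `wrap` and `unwrap` are hypothetical names, and the payload stands in for the encrypted bytes.

```python
import base64

BINARY = b"\x00"    # header for raw binary output
ARMOURED = b"\x41"  # header for base64 output, an ASCII "A"


def wrap(payload: bytes, armour: bool = False) -> bytes:
    """Prefix the payload with a one-byte header describing its encoding."""
    if armour:
        return ARMOURED + base64.b64encode(payload)
    return BINARY + payload


def unwrap(blob: bytes) -> bytes:
    """Inspect the header byte and undo the matching encoding."""
    header, body = blob[:1], blob[1:]
    if header == ARMOURED:
        return base64.b64decode(body)
    if header == BINARY:
        return body
    raise ValueError("unknown header byte")


payload = b"\x01\x02\xfe\xff"
print(wrap(payload, armour=True))
print(unwrap(wrap(payload, armour=True)) == payload)
print(unwrap(wrap(payload)) == payload)
```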

There are more complex ways to do it (and you'll see them with the public keys later) that involve putting the base64 into a container of sorts that contains additional information about the key.

In [21]:
key = 'star'
plaintext = 'AG is god'
ciphertext = encrypt(plaintext, key)

In [23]:
print decrypt(ciphertext, key)


In [11]:
import sys
import os
import math
from collections import Counter

%matplotlib inline
import matplotlib.pyplot as plt
from PIL import Image
from Crypto.Cipher import AES

In [12]:
IV_SIZE = 16
BLOCK_SIZE = 16

In [None]:
def fn_entropy(s):
    p, lns = Counter(s), float(len(s))
    return -sum( count/lns * math.log(count/lns, 2) for count in p.values())
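
The function computes the Shannon entropy $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the relative frequency of each symbol. Here is a minimal Python 3 sketch of the same calculation on two extreme inputs; it assumes `bytes` input, and `shannon_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


# A single repeated symbol carries no information.
print(shannon_entropy(b"aaaa"))
# Every byte value exactly once: the maximum of 8 bits per byte.
print(shannon_entropy(bytes(range(256))))
```

Well-encrypted data should score close to the maximum, which is what the image experiments below measure.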

In [None]:
def encrypt_image(cipher_mode):
    """Encrypt an image file with AES and display it next to the original."""

    # Convert to RGB so the raw bytes match the "RGB" decoder used below.
    input_image = Image.open(os.path.join(os.getcwd(), 'yuwen.jpg')).convert('RGB')
    print input_image
    plt.figure(figsize=(16,6))
    plt.subplot(121)
    plt.imshow(input_image)

    # Key must be one of 16/24/32 bytes in length.
    # A fixed key is acceptable for this demonstration only.
    key = "0123456789ABCDEF"
    modes = {'ECB': AES.MODE_ECB, 'CBC': AES.MODE_CBC,
             'CFB': AES.MODE_CFB, 'OFB': AES.MODE_OFB}
    if cipher_mode not in modes:
        raise ValueError("Unsupported cipher mode: " + cipher_mode)
    mode = modes[cipher_mode]
    iv = os.urandom(IV_SIZE)

    aes = AES.new(key, mode, iv)

    image_string = input_image.tobytes()
    # The input string must be padded to the input block size.
    image_padding_length = BLOCK_SIZE - len(image_string) % BLOCK_SIZE
    image_string += image_padding_length * "~"

    # generate the encrypted image string
    encrypted = aes.encrypt(image_string)

    # create an image from the encrypted string
    encrypted_img = Image.frombuffer("RGB", input_image.size, encrypted, 'raw', "RGB", 0, 1)
    
    plt.subplot(122)
    plt.imshow(encrypted_img)
    plt.show()
    
    # create and save the output image
    # encrypted_img.save(output_filename, 'PNG')

    print("Encrypted using AES in " + cipher_mode + " mode!")
    print 'Entropy on original:  ', fn_entropy(image_string)
    print 'Entropy on encryption: ', fn_entropy(encrypted_img.tobytes())

In [None]:
def encrypt_text(text, cipher_mode):
    """Encrypt a repeated block of text and print the ciphertext as hex."""

    # Key must be one of 16/24/32 bytes in length.
    key = "0123456789ABCDEF"
    modes = {'ECB': AES.MODE_ECB, 'CBC': AES.MODE_CBC,
             'CFB': AES.MODE_CFB, 'OFB': AES.MODE_OFB}
    if cipher_mode not in modes:
        raise ValueError("Unsupported cipher mode: " + cipher_mode)
    mode = modes[cipher_mode]
    iv = os.urandom(IV_SIZE)

    aes = AES.new(key, mode, iv)

    # Padding
    # The input string must be padded to the input block size.
    text_padding_length = BLOCK_SIZE - len(text) % BLOCK_SIZE
    text += text_padding_length * "~"
    # Repeat the padded text so the input holds identical blocks; ECB
    # encrypts them to identical ciphertext blocks, the chained modes do not.
    text += text

    text_enc = aes.encrypt(text)

    print("Encrypted using AES in " + cipher_mode + " mode!")
    print 'Input text:     ', text
    print 'Encrypted text: ', text_enc.encode('hex')

In [None]:
encrypt_text('Hello World','ECB')

In [None]:
encrypt_text('Hello World','CBC')

In [None]:
encrypt_image('ECB')

In [None]:
encrypt_image('CBC')

In [None]:
encrypt_image('CFB')

In [None]:
encrypt_image('OFB')