# **An Introduction to Cryptography Engineering with age**

---



**Student ID:** *put your student ID here!*

*Table of contents:*

- [Setup code](#setup-code-cell)
- [Introduction](#intro)
- [Scrypt: generating the wrap key](#wrap-key)
- [ChaCha20-Poly1305: decrypting the file key](#file-key)
- [HKDF: generating the payload key](#payload-key)
- [HMAC: authenticating the header](#header-hmac)
- [Decrypting the payload](#payload-decryption)
- [Putting it all together](#e2e-demo)

In [None]:
# @title **Setup** <a name="setup-code-cell"></a>
# @markdown **Make sure to run this cell before working on the rest of this notebook!**
# @markdown
# @markdown This cell will install dependencies that you need to run the rest of
# @markdown this notebook. It also defines `b64encode` and `b64decode` functions
# @markdown that you will need to encode and decode data using
# @markdown [Base64](https://en.wikipedia.org/wiki/Base64).

print("Installing dependencies...")

!pip install -q 'cryptography==37.0.2' 'rich[jupyter]'

import base64 as _base64
from rich import print as rich_print

def b64encode(data: bytes) -> bytes:
    """Base64-encode a byte string, stripping the "=" padding."""
    data = _base64.b64encode(data)
    return data.replace(b"=", b"")


def b64decode(data: bytes) -> bytes:
    """Base64-decode a byte string without padding."""
    data += b"=" * (-len(data) % 4)
    return _base64.b64decode(data)

### Test cases generated from various age files

import dataclasses
from typing import Tuple

@dataclasses.dataclass
class AgeFileTestCase:
    password: bytes
    scrypt_salt: bytes
    scrypt_work_factor: int
    wrap_key: bytes

    file_key: bytes
    file_key_encrypted: bytes

    header_mac: bytes
    
    payload_key: bytes
    payload: bytes

    def generate_header(self) -> bytes:
        header = b"age-encryption.org/v1\n"
        header += b"-> scrypt " + self.scrypt_salt + b" " + str(self.scrypt_work_factor).encode("utf-8") + b"\n"
        header += self.file_key_encrypted + b"\n"
        header += b"---"
        return header

    def test_wrap_key_args(self) -> Tuple[bytes, bytes, int, bytes]:
        return (self.password, self.scrypt_salt, self.scrypt_work_factor, self.wrap_key)

    def __repr__(self) -> str:
        fields = dataclasses.asdict(self)
        fields = "\n".join(f"  {key:<20s} = {val!r}" for (key, val) in fields.items())
        return f"{self.__class__.__name__}:\n" + fields


TEST_CASES = []

###########################################################
# TEST CASE
#
# Generated with: rage -e -p -o /dev/stdout <(echo "hello, world" | tee input.txt) | tee output.age | strings

test_case = AgeFileTestCase(
    password = b"teach-wet-adjust-lucky-stand-order-north-release-avocado-text",
    scrypt_salt = b"AyKPpy4hAnp306wrAoIpeA",
    scrypt_work_factor = 19,
    wrap_key = b"C\xef\xab\x81\x86\xc9\xd6\xad\x98\x1e\x81J\x18\xe6F|=\x98\x15\xe2\xb6~K\xf0\xef\x9c\xfa\xf0\xbb\xdd:D",
    file_key_encrypted = b"hQqSsgO3IcqsRySXWWsFgYfoozZ3ezpYayw7KqmaNy4",
    file_key = b"\x1f\x07e/\xe7\x14\xf5@Pa\x1f5\xb38s#",
    header_mac = b"6puD87ToLSviw+26C02WXaJSukkd2i9x3093F+6g+D8",
    payload_key = b'\x15`\x8c\xa0\xfe\x88\xc7kh\xe1\xe6\xb4\xa0\xcb\xfd\xf8\xf6`\xf6\x8b\x12\xa9\xbc\xc8q$\x10\xab\x11q\x04\xf5',
    payload = b"tF\x14\x00\x06\x9f\xed\xd3G\x07\xbf\x00\xcf\x9d<\x99\xf8\x7fA\xea\x82l\x9f\xa3\xb44\x8bq$\x08\x8e\xb5\x1fl\x94)\xfb\x86@\xee\xa8\xeeU\xad\xd2",
)
TEST_CASES.append(test_case)

###########################################################
# TEST CASE
#
# Generated with: rage -e -p -o /dev/stdout <(echo "age-encryption.org" | tee input.txt) | tee output.age | strings

test_case = AgeFileTestCase(
    password = b"owner-better-crawl-morning-engine-burden-pottery-fitness-three-else",
    scrypt_salt = b"QsX8Fvngakq5II6tR2edKQ",
    scrypt_work_factor = 18,
    wrap_key = b"\xa7\x1dR\xb1BI\xb6~V\x0e[I\x97#\x8b\x1d&\xc8\xa2\xb8\x93\"3\xb6\xc8\x89:\x8a\x04\xb2\xd2b",
    file_key_encrypted = b"Ycr2xcHYC5GNtqklqYJDsCzda4pQDOoyW3UJwqmZ6pg",
    file_key = b"w9\xc1\xbd\xce&\x1e\xee\x83pE\xd6dh{\x9d",
    header_mac = b"RzzVioP4Lq9yOcmge+8QVIEONAaqt87NY2CMIpmvM1c",
    payload_key = b'\xbbS\xde\xf9\xc7\xd1{\xa8\xe8O\x80\xa7^(,<\x08\xc7%CK\x93\xbd|w\t1\xe1\x15\x83\x9e5',
    payload = b"\xf7\xd1A\xa3,Y\xb5\xdd\xd8\r\xc2\xcfD\xef\xd5\x01\x84\x1fX*i\x92\x05\xdc\xa3\x9fz\xd0\xee\xfe\xfe\x1b9\x15\x9c\x93\x85\xf5Dh\xed\xbfQ\xd3\x06<\x9d\xc5\xa9\xb1\x9a",
)
TEST_CASES.append(test_case)

###########################################################
# TEST CASE
#
# Generated with: rage -e -p -o /dev/stdout <(echo "cryptography" | tee input.txt) | tee output.age | strings

test_case = AgeFileTestCase(
    password = b"riot-alcohol-forum-fire-silent-trust-surface-enough-offer-viable",
    scrypt_salt = b"EV2eaSAcbT+wI2aub+KDLA",
    scrypt_work_factor = 18,
    wrap_key = b"\xd8$\xac0\x03;\x8c71\xb1\\\xf3\xf2?\xad\x02(P\x88\xb0\x1c\x13<}u.-k\xb6;)\xd2",
    file_key_encrypted = b"/CoZrZZ+2wmAhYghZKEAemf9Y12fOwsjtSIkCf2nA2k",
    file_key = b"cU\xa3\x16;Z<\x00Y\x1a\x11\xd3g\\\xf0}",
    header_mac = b"HPQQmI4+UMWasD7mnpR5+i5EdG9nReNLWuTED/ASoA0",
    payload_key = b"'7\x95\xcc\xd5sT\xad\xcd_J\xb7S;\xff\xc9\xa7\xb3\xbc\xd3\xaa\x15\xe0aU\xc3\x08q\xd4c\x88g",
    payload = b"^U\xe0\x91\x17f\xd4^\x05\x7f\x1b\xe3\x85G\x95\xc6^\xacwRe\xf5\x08\xb4\xd7BgzP\x01\xe66\x91\xe7\x94R$\x1b,[\xf2;\xbd\x90\x13",
)
TEST_CASES.append(test_case)

###########################################################

# Utilities for reading and parsing age files

from dataclasses import dataclass

@dataclass
class AgeFile:
    """Class used to represent the contents of an age file.

    :param bytes header: the full header of the file (up to and including the
        final ``---``), which should be validated against ``header_hmac``
        using HMAC-SHA-256.
    :param bytes scrypt_salt: the Base64-encoded salt used by scrypt when
        generating the wrap key.
    :param int scrypt_work_factor: the base-2 log of the work factor used by
        scrypt when generating the wrap key.
    :param bytes encrypted_file_key: the Base64-encoded file key, encrypted
        with ChaCha20-Poly1305 under the wrap key from the scrypt stanza.
    :param bytes header_hmac: the Base64-encoded HMAC of the file header,
        which should be validated to ensure that the header has not been
        tampered with.
    :param bytes payload_key_salt: the salt used by HKDF-SHA-256 for the
        payload key (the first 16 bytes of the payload).
    :param bytes payload: the rest of the age file payload.
    """

    header: bytes
    scrypt_salt: bytes
    scrypt_work_factor: int
    encrypted_file_key: bytes
    header_hmac: bytes
    payload_key_salt: bytes
    payload: bytes

    @property
    def size(self) -> int:
        return len(self.payload)

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.size} bytes)"


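def parse_age_file(path) -> AgeFile:
    """Read and parse an age file with a single scrypt recipient stanza.

    Assumes the header is exactly four lines: the version line, the scrypt
    stanza's argument line, the stanza body (the encrypted file key), and the
    MAC line.
    """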
    with open(path, "rb") as f:
        header = [f.readline().rstrip() for _ in range(4)]
        payload = f.read()

    assert header[0] == b"age-encryption.org/v1"
    assert header[1].startswith(b"-> scrypt")
    assert header[3].startswith(b"--- ")

    _, _, scrypt_salt, scrypt_work_factor = header[1].split(b" ")
    scrypt_work_factor = int(scrypt_work_factor)
    encrypted_file_key = header[2]
    _, header_hmac = header[3].split(b" ")
    header = b"\n".join(header[:3]) + b"\n---"

    payload_key_salt = payload[:16]
    payload = payload[16:]

    return AgeFile(
        header,
        scrypt_salt,
        scrypt_work_factor,
        encrypted_file_key,
        header_hmac,
        payload_key_salt,
        payload
    )

###########################################################

print("Running setup script...")

script = """
#!/bin/bash

set -euo pipefail

DOWNLOAD_URL="https://github.com/FiloSottile/age/releases/download/v1.0.0/age-v1.0.0-linux-amd64.tar.gz"

if ! which age >/dev/null; then
    pushd $(mktemp -d) >/dev/null
    wget --quiet "$DOWNLOAD_URL" -O age.tar.gz
    tar xf age.tar.gz

    mv age/age /usr/local/bin/age
    mv age/age-keygen /usr/local/bin/age-keygen
    popd >/dev/null
fi

if [ ! -d ./age-notebook ]; then
    git clone --depth 1 https://github.com/kernelmethod/age-notebook.git
fi
"""

with open("/tmp/setup.sh", "w") as f:
    f.write(script)

!bash /tmp/setup.sh

## **General hints before you start**

**_Before you start writing code, run the ["setup" code cell](#setup-code-cell) above!_**

This homework is all about learning how different cryptographic ideas work with one another. You shouldn't need to write large amounts of code: all of the functions in the instructor's solution have $\le 5$ lines of code except for `decrypt_payload`, which is about 20 lines.

You will use Python's [`cryptography` package](https://cryptography.io/en/latest/) for this assignment. When in doubt, consult its documentation at https://cryptography.io, or come visit us during office hours. For this assignment, you'll use the following classes from `cryptography`:

- `cryptography.hazmat.primitives.kdf.scrypt.Scrypt` ([docs](https://cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/#cryptography.hazmat.primitives.kdf.scrypt.Scrypt))
- `cryptography.hazmat.primitives.ciphers.aead.ChaCha20Poly1305` ([docs](https://cryptography.io/en/latest/hazmat/primitives/aead/#cryptography.hazmat.primitives.ciphers.aead.ChaCha20Poly1305))
- `cryptography.hazmat.primitives.hmac.HMAC` ([docs](https://cryptography.io/en/latest/hazmat/primitives/mac/hmac/#cryptography.hazmat.primitives.hmac.HMAC))
- `cryptography.hazmat.primitives.hashes.SHA256` ([docs](https://cryptography.io/en/latest/hazmat/primitives/cryptographic-hashes/#cryptography.hazmat.primitives.hashes.SHA256))
- `cryptography.hazmat.primitives.kdf.hkdf.HKDF` ([docs](https://cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/#cryptography.hazmat.primitives.kdf.hkdf.HKDF))

Some other tips:

**Is this your first time using Google Colab or Jupyter?** If you've never used Jupyter notebooks before (or Google Colab, which hosts Jupyter for you), check out [this intro](https://colab.research.google.com/notebooks/intro.ipynb).

**Base64:** [Base64](https://en.wikipedia.org/wiki/Base64) is a data encoding format that allows you to represent arbitrary binary data as text. For instance, the Base64 encoding of the byte string

```python
b"\x05\x8f\xe2\x1a\x05}\xb5\xae"
```

is

```python
b"BY/iGgV9ta4="
```

Base64 is convenient whenever binary data has to live inside a text format. age uses it to store binary values such as salts, encrypted keys, and MACs in its text header.

For this assignment, you will need to Base64-decode some data, i.e. convert it from its Base64 representation back into bytes. For this, you will want to use the `b64decode` function I've prepared for you, which is defined in the [setup cell](#setup-code-cell) above. You can call `b64decode` by passing in the string you want to convert from Base64 to bytes. For instance:

```python
>>> b64decode(b"BY/iGgV9ta4=")
b"\x05\x8f\xe2\x1a\x05}\xb5\xae"
```
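The `b64encode`/`b64decode` helpers in the setup cell exist because age writes Base64 *without* the trailing `=` padding (the salt `AyKPpy4hAnp306wrAoIpeA` in the test cases is one example). The standalone sketch below copies their logic using only the standard library, so you can see that both the padded and unpadded forms decode to the same bytes.

```python
import base64


def b64encode(data: bytes) -> bytes:
    """Standard Base64 with the trailing '=' padding removed, as age stores it."""
    return base64.b64encode(data).replace(b"=", b"")


def b64decode(data: bytes) -> bytes:
    """Restore any missing '=' padding, then decode."""
    return base64.b64decode(data + b"=" * (-len(data) % 4))


raw = b"\x05\x8f\xe2\x1a\x05}\xb5\xae"
print(b64encode(raw))                       # b'BY/iGgV9ta4'
print(b64decode(b"BY/iGgV9ta4") == raw)     # True (unpadded input)
print(b64decode(b"BY/iGgV9ta4=") == raw)    # True (padded input)
```

Since `-len(data) % 4` is the number of missing padding characters, already-padded input is left unchanged.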

---
## **Intro: an overview of the age file format** <a name="intro"></a>

For this assignment, we're going to look at the [age file format](https://c2sp.org/age). Your task will be to write Python code that will allow you to read the contents of an age file.

### **Introduction to age**

age (pronounced like the Italian word ["aghe"](https://translate.google.com/?sl=it&text=aghe)) is a format for storing encrypted files. Here is what an age file looks like:

*[Figure: the structure of an age file (`age_file.drawio.svg`)]*

An age file consists of two pieces: the **header** and the **payload**. The **header** contains a lot of useful information about the file, including

- What version of age it uses (right now there's only one version, `v1`)
- **_Stanzas_** specifying different **_recipient types_**. These stanzas tell you who is able to decrypt the file and what method they can use to decrypt it.
- Finally, it ends with an _**HMAC**_ (Hash-based Message Authentication Code) that verifies that an attacker hasn't messed with the header contents.

The **payload** contains the actual file contents, which are encrypted with [ChaCha20-Poly1305](https://en.wikipedia.org/wiki/ChaCha20-Poly1305).

### **How do you read an age file?**

In general, there are multiple ways to decrypt an age file, depending on which stanzas it contains. To streamline this assignment, we're only going to look at one kind of stanza: the scrypt stanza.

The simplified process for reading an age file is going to work as follows:

* First, we will take the file password and use the scrypt key derivation function to generate a **wrap key**.
* Next, we will use the **wrap key** to decrypt the **file key** using ChaCha20-Poly1305.
* The **file key** will be used to generate a **payload key** using HKDF-SHA-256.
* Finally, the **payload key** will be used to decrypt the contents of the file, once again using ChaCha20-Poly1305. <a name="cite_ref-keys-explained"></a>[<sup>[keys-explained]</sup>](#cite_note-keys-explained)

*[Figure: how the age keys are derived from one another (`age_keys.drawio.svg`)]*

---

## **Generating the wrap key: Key derivation functions (KDFs)** <a name="wrap-key"></a>

Our first step in reading the contents of an age file is to generate a *wrap key*. The wrap key is a 32-byte value generated from a secret password using a [*key derivation function (KDF)*](https://en.wikipedia.org/wiki/Key_derivation_function).


### **What's a KDF?**

A *key derivation function* is an algorithm that takes a secret value and turns it into one or more keys for encryption. In our case this secret value is a password, but it can also be another encryption key (as we'll see later when we talk about HKDF-SHA-256) or some other random data.

A *password-based* KDF is a KDF designed specifically for generating an encryption key from a password. It is deliberately expensive to compute: it costs CPU time and, for some password-based KDFs such as scrypt, a large amount of memory as well. This slows down an attacker who tries to guess the password by computing the KDF for one candidate password after another.

Password-based KDFs almost always generate keys from a password (which is hidden from an attacker) and a random [*salt*](https://en.wikipedia.org/wiki/Salt_(cryptography)) (which is made public). The use of a salt makes it much more difficult for an attacker to build a [rainbow table](https://en.wikipedia.org/wiki/Rainbow_table) (a table of pre-computed encryption keys generated from KDFs), since they have to store a key in the table for every possible password-salt pair.

The password-based KDF we'll be using to generate the wrap key in this assignment is [scrypt](https://en.wikipedia.org/wiki/Scrypt).
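A minimal sketch of why salts help, using Python's built-in `hashlib.scrypt` rather than `cryptography`: the same (hypothetical) password yields the same key for a given salt, but a completely different key under a fresh random salt, so keys precomputed for one salt are useless for another. The cost parameter here is kept far below age's values so the demo runs quickly.

```python
import hashlib
import os

password = b"correct horse battery staple"  # hypothetical password


def derive(password: bytes, salt: bytes) -> bytes:
    # Deliberately small cost parameter (n = 2**10) so the demo runs quickly;
    # age uses much larger values (2**18 in the example stanza below).
    return hashlib.scrypt(password, salt=salt, n=2**10, r=8, p=1, dklen=32)


salt_a = os.urandom(16)
salt_b = os.urandom(16)

key_a = derive(password, salt_a)
key_b = derive(password, salt_b)

# Same password and salt -> same key; a different salt -> a different key.
print(derive(password, salt_a) == key_a)  # True
print(key_a == key_b)                     # False
```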

### **Wait, isn't scrypt a password hashing algorithm?**

You may also be familiar with scrypt as a password hashing algorithm. In fact, password-based KDFs can often be used as password hashing algorithms, and vice versa. scrypt isn't the only example: [bcrypt](https://en.wikipedia.org/wiki/Bcrypt), [PBKDF2](https://en.wikipedia.org/wiki/PBKDF2), and [Argon2](https://en.wikipedia.org/wiki/Argon2) are all password-based KDFs that are also used for password storage.

### **Scrypt recipient stanza**

One of the recipient types specified by the age file format is the [scrypt recipient type](https://github.com/C2SP/C2SP/blob/main/age.md#recipient-stanza). Here's an example of an scrypt recipient stanza:

```
-> scrypt 8BjF/AOxcwvC4x5KPFb0cQ 18
Zx0/vhZtBYnn2a2Qdrk8b6YqvK6jJbeYgAy4VUndIAc
```

*[Figure: the parts of an scrypt recipient stanza (`scrypt_stanza.drawio.svg`)]*

Breaking this down, we have:

- _**Stanza type:**_ this says that this is an scrypt recipient type.
- _**Salt:**_ this is a Base64-encoded random salt.
- _**Work factor:**_ this is the base-2 log of the CPU/memory cost parameter used by scrypt (in this example the work factor is 18, so the cost parameter is $2^{18}$). A higher work factor makes it tougher for an attacker to crack the password, but it also makes it more expensive to compute the wrap key.
- _**Encrypted file key:**_ this is the file key after being encrypted with the wrap key and then Base64-encoded.

**The first step of reading an age file is generating a 32-byte wrap key using scrypt.** The salt given to scrypt is

```python
b"age-encryption.org/v1" + base64_decoded_salt
```

and the cost parameter is $n = 2^{\text{work factor}}$.
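Concretely, for the example stanza above (salt `8BjF/AOxcwvC4x5KPFb0cQ`, work factor 18), the sketch below assembles the scrypt salt and cost parameter with the standard library only; `b64decode_unpadded` is a local stand-in for the notebook's `b64decode`. It also estimates scrypt's memory use with the usual approximation of $128 \cdot r \cdot n$ bytes, which shows why a work factor of 18 is expensive.

```python
import base64


def b64decode_unpadded(data: bytes) -> bytes:
    # age omits Base64 padding, so restore it before decoding.
    return base64.b64decode(data + b"=" * (-len(data) % 4))


# Values from the example stanza "-> scrypt 8BjF/AOxcwvC4x5KPFb0cQ 18".
encoded_salt = b"8BjF/AOxcwvC4x5KPFb0cQ"
work_factor = 18

label = b"age-encryption.org/v1"
scrypt_salt = label + b64decode_unpadded(encoded_salt)
n = 2 ** work_factor

# scrypt's main buffer holds n blocks of 128 * r bytes each (r = 8 here).
r = 8
memory_bytes = 128 * r * n

print(len(scrypt_salt))       # 37 (21-byte label + 16-byte random salt)
print(n)                      # 262144
print(memory_bytes // 2**20)  # 256 (MiB)
```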

**Task:** In the following cell, implement the `generate_wrap_key` function to compute the wrap key using the [`Scrypt` class](https://cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/#cryptography.hazmat.primitives.kdf.scrypt.Scrypt) from Python's `cryptography` package. It takes the password (as a byte string), the salt (as a Base64-encoded byte string), and the work factor as inputs, and returns the wrap key.

**Hints:**

- [Documentation for the `Scrypt` class](https://cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/#cryptography.hazmat.primitives.kdf.scrypt.Scrypt)
- Make sure to use the `b64decode` function (defined in the ["setup" section](#setup-code-cell) of the notebook) to decode the Base64-encoded `salt` into raw bytes.
- Don't forget to prepend `b"age-encryption.org/v1"` to the salt before you give it to `Scrypt`.
- You will want to pass in `r = 8` and `p = 1` as the block size and parallelization parameters to `Scrypt`.

In [None]:
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt


def generate_wrap_key(password: bytes, salt: bytes, work_factor: int) -> bytes:
    """Compute the wrap key for an scrypt recipient stanza.
    
    :param bytes password: the password that the wrap key is derived from.
    :param bytes salt: the scrypt salt stored as a Base64-encoded byte string.
    :param int work_factor: the base-two logarithm of the scrypt work factor.
    :return: the stanza's wrap key, which is used to decrypt the file key.
    """

    # TODO: your code here!

In [None]:
# @title **Wrap key tests**
# @markdown If you implemented `generate_wrap_key` correctly, the tests in this
# @markdown code cell should pass.

def test_wrap_key(password, salt, work_factor, expected_wrap_key) -> None:
    wrap_key = generate_wrap_key(password, salt, work_factor)
    assert len(wrap_key) == 32, "Wrap key length must be 32 bytes"
    assert wrap_key == expected_wrap_key, (
        "Computed wrap key did not match actual wrap key.\n"
        f"  computed = {wrap_key!r}\n"
        f"  actual   = {expected_wrap_key!r}\n"
        "Parameters:\n"
        f"  password    = {password!r}\n"
        f"  salt        = {salt!r}\n"
        f"  work_factor = {work_factor!r}\n"
    )

for (i, test_case) in enumerate(TEST_CASES):
    test_wrap_key(*test_case.test_wrap_key_args())
    rich_print(f"[green]Passed test case {i}")

###########################################################
# ADDITIONAL TEST CASES
#
# Generated with: rage -e -p -o /dev/stdout <(echo "work factor = 14") | strings

password = b"fury-reunion-suggest-topic-zero-keen-neither-rate-post-banner"
salt = b"5NI1z65wZbvbvSZ3yj3bpg"
work_factor = 14

expected_wrap_key = b";p\xf8\xfc\xd1\xf0d\xc20\xda\xec\x99)\xb8;<\x05\xcd\xa0V\xc0\xad\xf1B\xe2\x1f\x9dm`&jY"
test_wrap_key(password, salt, work_factor, expected_wrap_key)
rich_print("[green]Passed additional test case 1")

###########################################################

rich_print("[bold green]All tests passed!")

---

## **Decrypting the file key: ChaCha20-Poly1305** <a name="file-key"></a>

Now that you have the wrap key, it's time to decrypt the file key!


### **Authenticated encryption**

An [**authenticated encryption with additional data (AEAD) algorithm**](https://en.wikipedia.org/wiki/Authenticated_encryption) is an algorithm that takes three inputs:

- **A plaintext:** some data that you want to encrypt
- **A key:** a secret string of bytes
- **(Optional) A header:** the header is the "additional data" in the term "authenticated encryption with additional data". This is some data that you don't want to encrypt, but for which you still want to guarantee integrity (i.e. you want to ensure that an attacker cannot modify it)

and returns two outputs:

- **A ciphertext:** the encrypted version of the plaintext.
- **An [authentication tag](https://en.wikipedia.org/wiki/Message_authentication_code):** some extra bytes at the end of the encrypted data that are used to detect whether an attacker has modified the ciphertext or the header.

The decryption algorithm takes the key, the ciphertext, the tag, and the unencrypted header, and checks whether an attacker has modified the ciphertext or header. If not, it decrypts the ciphertext and returns the original plaintext; if so, it rejects the message. An AEAD algorithm usually has two parts:

- a *cipher* (which encrypts the data and guarantees confidentiality), and
- a *message authentication code (MAC)* (which guarantees integrity).

[**ChaCha20-Poly1305**](https://en.wikipedia.org/wiki/ChaCha20-Poly1305) is an AEAD algorithm that combines the [ChaCha20](https://cr.yp.to/chacha.html) stream cipher with the [Poly1305](https://cr.yp.to/mac.html) MAC. It is an increasingly popular AEAD algorithm which is used in things like

- OpenSSH (a widely used implementation of SSH, the protocol for accessing a server remotely)
- Version 1.3 of the Transport Layer Security (TLS) protocol
- [WireGuard](https://www.wireguard.com/), an encrypted VPN (virtual private network) tunneling protocol
- [QUIC](https://www.rfc-editor.org/rfc/rfc9000.html), a newer transport protocol, standardized in 2021 and used by HTTP/3, that serves as an alternative to TCP (the Transmission Control Protocol).

**For this assignment, you can treat ChaCha20-Poly1305 as a black box:** you don't need to know anything about how it works internally, just how to use it.

### **Using ChaCha20-Poly1305**

ChaCha20-Poly1305 takes four inputs: the three AEAD inputs mentioned earlier (a plaintext, a key, and optionally a header) and a 12-byte [nonce](https://en.wikipedia.org/wiki/Cryptographic_nonce), which acts like a message ID. For each message you want to encrypt, you select a nonce (usually either randomly or using a counter) and encrypt the message with that `(key, nonce)` pair; the same pair must never be used to encrypt two different messages.<a name="cite_ref-nonces"></a>[<sup>[nonces]</sup>](#cite_note-nonces)
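The sketch below shows all four inputs in action with `cryptography`'s `ChaCha20Poly1305` class, using a random key, a random 12-byte nonce, and a made-up plaintext and header (all hypothetical values, not part of age). It also shows that the output carries a 16-byte tag at the end, and that changing the header makes decryption fail with `InvalidTag`.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

key = ChaCha20Poly1305.generate_key()  # random 32-byte key
aead = ChaCha20Poly1305(key)

nonce = os.urandom(12)        # 12-byte nonce; never reuse it with this key
plaintext = b"attack at dawn"
header = b"message #1"        # authenticated, but not encrypted

ciphertext = aead.encrypt(nonce, plaintext, header)
# The output is the encrypted plaintext with the 16-byte Poly1305 tag appended.
print(len(ciphertext) - len(plaintext))  # 16

print(aead.decrypt(nonce, ciphertext, header) == plaintext)  # True

# Changing the header (or any ciphertext byte) makes decryption fail.
try:
    aead.decrypt(nonce, ciphertext, b"message #2")
    print("decrypted (unexpected!)")
except InvalidTag:
    print("tampering detected")
```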

### **Using ChaCha20-Poly1305 to decrypt the file key**

Coming back to age: now that we've computed the wrap key, we can decrypt the file key, which is stored in the "body" of the scrypt recipient stanza. The file key is a 16-byte value encrypted using ChaCha20-Poly1305, with a nonce equal to $0$ and no header. You can use the [`ChaCha20Poly1305` class](https://cryptography.io/en/latest/hazmat/primitives/aead/#cryptography.hazmat.primitives.ciphers.aead.ChaCha20Poly1305) from the `cryptography` package to decrypt the body.

**Task:** implement the `decrypt_file_key` function below using the `ChaCha20Poly1305` class, taking the wrap key and the body of the scrypt stanza as inputs.

**Hints:**

- [Documentation for the ChaCha20Poly1305 class](https://cryptography.io/en/latest/hazmat/primitives/aead/#cryptography.hazmat.primitives.ciphers.aead.ChaCha20Poly1305)
- The body is once again Base64-encoded, so make sure to decode it first!
- The nonce should be a byte string of length 12 where all of the bytes are equal to $0$.
- The header should be an empty byte string.
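To see where the 16-byte file key sits inside the stanza body, the sketch below decodes the body of the example stanza from earlier using only the standard library. The 43 Base64 characters decode to 32 bytes: the 16-byte encrypted file key followed by the 16-byte Poly1305 tag. It only inspects sizes; it does not decrypt anything.

```python
import base64

# Body line from the example scrypt stanza shown earlier.
body = b"Zx0/vhZtBYnn2a2Qdrk8b6YqvK6jJbeYgAy4VUndIAc"
raw = base64.b64decode(body + b"=" * (-len(body) % 4))  # restore padding first

# Split for illustration only: ChaCha20Poly1305.decrypt expects the two
# pieces joined, exactly as they are stored.
encrypted_file_key, tag = raw[:16], raw[16:]

nonce = bytes(12)  # twelve zero bytes, the same as b"\x00" * 12

print(len(body), len(raw))                 # 43 32
print(len(encrypted_file_key), len(tag))   # 16 16
```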

In [None]:
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305


def decrypt_file_key(
    wrap_key: bytes,
    body: bytes
) -> bytes:
    """Decrypt the file key from an scrypt recipient stanza.
    
    :param bytes wrap_key: the wrap key corresponding to the scrypt recipient
        stanza.
    :param bytes body: the body of the scrypt recipient stanza. This is a Base64-
        encoded byte string that contains the file key (encrypted using
        ChaCha20-Poly1305 with the wrap key.)
    :return: a 16-byte file key for the age file.
    """

    # TODO: your code here!

In [None]:
# @title **File key tests**
# @markdown If you implemented `decrypt_file_key` correctly, the tests in this
# @markdown code cell should pass.

from cryptography.exceptions import InvalidTag

for (i, test_case) in enumerate(TEST_CASES):
    wrap_key = test_case.wrap_key
    encrypted_file_key = test_case.file_key_encrypted

    try:
        file_key = decrypt_file_key(wrap_key, encrypted_file_key)
        assert len(file_key) == 16, "File key must be 16 bytes long"
        assert file_key == test_case.file_key, (
            "Decrypted file key did not match actual file key.\n"
            f"  decrypted file key = {file_key}\n"
            f"  actual file key    = {test_case.file_key}\n"
            f"{test_case!r}"
        )
    except InvalidTag as ex:
        rich_print(f"[bold red]InvalidTag exception raised in test case {i}")
        rich_print(
            "[red]This indicates that the message authentication code was not "
            "verified as authentic by ChaCha20-Poly1305. Make sure that you "
            "are performing decryption correctly."
        )
        raise ex

    rich_print(f"[green]Passed test case {i}")

rich_print("[bold green]All tests passed!")

---

## **Generating the payload key: KDFs redux** <a name="payload-key"></a>

Before we can decrypt the payload, we have to convert the *file key* into a *payload key*. For this, we'll use HKDF-SHA-256.

### **HMAC-based KDFs**

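The file key is a 128-bit (16-byte) value, but ChaCha20-Poly1305 requires a 256-bit (32-byte) key. We also need more than one key from it: one to encrypt the payload and one to authenticate the header. A second KDF, HKDF-SHA-256, takes care of both.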

_**HKDF**_, the _**HMAC-based key derivation function**_, is a KDF that uses HMAC to derive keys from a secret that already has plenty of entropy, such as another key. Unlike scrypt (which we used earlier to generate the wrap key), HKDF is cheap to compute, so you wouldn't want to use it to generate a key from a password: it would do nothing to slow down an attacker guessing passwords. Instead, HKDF is great for lengthening keys, turning one key into many, and deriving keys in [key exchange protocols](https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange).

HKDF takes a `salt` and an `info` parameter. The exact meaning of these parameters is a little subtle; there's a [great blog post](https://soatok.blog/2021/11/17/understanding-hkdf/) explaining the difference between them <a name="cite_ref-RFC5869"></a>[<sup>[RFC 5869]</sup>](#cite_note-RFC5869). One thing to know is that `info` is the parameter that allows you to turn one key into multiple. In this part we will generate the payload key by passing in `info = b"payload"`, and in the next part we will create a key to authenticate the file header using `info = b"header"`.
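To make the roles of `salt` and `info` concrete, here is a minimal from-scratch HKDF-SHA-256 written with Python's standard `hmac` and `hashlib` modules, following RFC 5869's extract-then-expand structure. It is for intuition only; the task below still asks you to use the `cryptography` `HKDF` class. The first check uses RFC 5869's published Test Case 1; the `info` labels and input key in the last lines are hypothetical.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """Minimal HKDF-SHA-256 (RFC 5869): extract, then expand."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF-SHA-256 can output at most 8160 bytes")

    # Extract: mix the salt and input keying material (IKM) into a
    # pseudorandom key (PRK). RFC 5869 treats an empty salt as HASH_LEN zero bytes.
    prk = hmac.new(salt or bytes(HASH_LEN), ikm, hashlib.sha256).digest()

    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i) for i = 1, 2, ...,
    # concatenated and truncated to `length` bytes.
    okm, block = b"", b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# RFC 5869, Appendix A.1 (Test Case 1).
okm = hkdf_sha256(
    ikm=bytes([0x0B]) * 22,
    salt=bytes(range(0x00, 0x0D)),
    info=bytes(range(0xF0, 0xFA)),
    length=42,
)
assert okm.hex() == (
    "3cb25f25faacd57a90434f64d0362f2a"
    "2d2d0a90cf1a5a4c5db02d56ecc4c5bf"
    "34007208d5b887185865"
)

# Same input key and salt, different (hypothetical) `info` labels:
# the two derived 32-byte keys are unrelated.
input_key = bytes(range(16))  # placeholder 16-byte key, for illustration only
encryption_key = hkdf_sha256(input_key, b"", b"encryption", 32)
authentication_key = hkdf_sha256(input_key, b"", b"authentication", 32)
print(encryption_key != authentication_key)  # True
```

Changing only `info` is enough to get independent keys, which is exactly how one file key can yield both a payload key and a header key.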

### **HKDF in age**

[age uses HKDF-SHA-256](https://github.com/C2SP/C2SP/blob/main/age.md#payload) (HKDF based on HMAC-SHA-256, which we will discuss next) to lengthen the 16-byte file key into a 32-byte key that can be used by ChaCha20-Poly1305. It passes in the first 16 bytes of the payload as the salt, and the byte string `b"payload"` as the info parameter.

**Task:** use HKDF-SHA-256 to generate the payload key from the file key.

**Hints:**

- [Documentation for the `HKDF` class](https://cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/#cryptography.hazmat.primitives.kdf.hkdf.HKDF)

In [None]:
from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


def generate_payload_key(file_key: bytes, salt: bytes) -> bytes:
    """Generate the age payload key from its file key using HKDF-SHA-256.

    :param bytes file_key: a 16-byte file key.
    :param bytes salt: a salt for HKDF, read from the first 16 bytes of the age
        file payload.
    :return: the 32-byte payload key.
    """

    # TODO: your code here!

In [None]:
# @title **Payload key tests**
# @markdown If you implemented `generate_payload_key` correctly, the tests in
# @markdown this code cell should pass.

for (i, test_case) in enumerate(TEST_CASES):
    file_key = test_case.file_key
    salt = test_case.payload[:16]

    payload_key = generate_payload_key(file_key, salt)
    assert len(payload_key) == 32, "The payload key must be 32 bytes long."
    assert payload_key == test_case.payload_key, (
        "Computed payload key did not match actual payload key.\n"
        f"  computed payload key = {payload_key}\n"
        f"  actual payload key   = {test_case.payload_key}\n"
        f"{test_case!r}"
    )
    rich_print(f"[green]Passed test case {i}")

rich_print("[bold green]All tests passed!")

---

## **HMAC: authenticating the header** <a name="header-hmac"></a>

There's one thing we missed: what if an attacker decided to mess with the header of the age file? Before we decrypt the payload, we'll want to verify the integrity of the file header.

### **Going back to the header**

Here's an example of the full header of an age file:

```
age-encryption.org/v1
-> scrypt 8BjF/AOxcwvC4x5KPFb0cQ 18
Zx0/vhZtBYnn2a2Qdrk8b6YqvK6jJbeYgAy4VUndIAc
--- WmgPk/80rjupQqVNhLn47/4MRimwxusk4A/NMbz5qjM
```

Following this header is the binary data containing the encrypted file contents.

To check the header's integrity, age uses a message authentication code (MAC), in this case *HMAC-SHA-256*.
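One detail matters when you compute the MAC yourself: per the age spec, the HMAC covers the header up to and including the `---` marker, but not the space after it or the Base64-encoded MAC. The sketch below separates those pieces for the example header using only the standard library; `parse_age_file` and `AgeFileTestCase.generate_header` in the setup cell build the same authenticated byte string.

```python
import base64

# The example header from above, as it appears in the file.
full_header = (
    b"age-encryption.org/v1\n"
    b"-> scrypt 8BjF/AOxcwvC4x5KPFb0cQ 18\n"
    b"Zx0/vhZtBYnn2a2Qdrk8b6YqvK6jJbeYgAy4VUndIAc\n"
    b"--- WmgPk/80rjupQqVNhLn47/4MRimwxusk4A/NMbz5qjM\n"
)

# Split off the final "--- <MAC>" line.
before_mac, mac_line = full_header.rstrip(b"\n").rsplit(b"\n", 1)
assert mac_line.startswith(b"--- ")

# The MAC covers every byte up to and including "---", but not the space
# after it or the MAC itself.
authenticated_bytes = before_mac + b"\n---"

encoded_mac = mac_line[len(b"--- "):]
mac = base64.b64decode(encoded_mac + b"=" * (-len(encoded_mac) % 4))

print(authenticated_bytes.endswith(b"\n---"))  # True
print(len(mac))                                # 32 (the size of a SHA-256 output)
```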

*[Figure: the full age header (`full_age_header.drawio.svg`)]*

### **HMAC**

[*Hash-based message authentication codes (HMACs)*](https://en.wikipedia.org/wiki/HMAC) are a way of turning cryptographic hash functions into MACs. Like other message authentication codes, HMAC takes two inputs:

- a secret key, and
- a message that should be authenticated.

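HMAC generates a short byte string, often called a *tag*, that acts like a "signature" for that key and message. Unlike a digital signature, the same secret key is needed both to create the tag and to verify it. Without the key, it is infeasible for an attacker to forge a valid tag for a message whose tag they haven't already seen.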
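To see what HMAC computes and how a verifier checks it, here is a minimal sketch using Python's standard `hmac` and `hashlib` modules instead of `cryptography` (the task below still uses the `HMAC` class). The key and message are the published test vector from RFC 4231 (test case 2), so the tag can be checked against that RFC. The `verify` helper is hypothetical; it compares tags with `hmac.compare_digest`, which is designed to avoid the timing leaks of an ordinary `==` comparison.

```python
import hashlib
import hmac

key = b"Jefe"
message = b"what do ya want for nothing?"

tag = hmac.new(key, message, hashlib.sha256).digest()
print(tag.hex())
# 5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Hypothetical helper: recompute the tag and compare it to the given one."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids short-circuiting on the first differing byte,
    # so timing doesn't reveal how much of a forged tag was correct.
    return hmac.compare_digest(expected, tag)


print(verify(key, message, tag))           # True
print(verify(key, message + b"!", tag))    # False: message changed
print(verify(b"wrong key", message, tag))  # False: wrong key
```

`cryptography`'s `HMAC.verify` also compares securely, but it raises an exception on a mismatch instead of returning `False`.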

### **Verifying the header with HMAC-SHA-256**

[age includes a header MAC](https://github.com/C2SP/C2SP/blob/main/age.md#header-mac) created with HMAC-SHA-256 (HMAC based on the SHA-256 hash function) and then encoded with Base64. It uses this HMAC to verify that the file header hasn't been tampered with. To compute HMAC-SHA-256 we're going to need the [`HMAC` class](https://cryptography.io/en/latest/hazmat/primitives/mac/hmac/#cryptography.hazmat.primitives.hmac.HMAC) and the [`SHA256` class](https://cryptography.io/en/latest/hazmat/primitives/cryptographic-hashes/?highlight=sha256#cryptography.hazmat.primitives.hashes.SHA256). Make sure to check out the documentation for `HMAC` to learn how it works.

We're also going to need to generate a key that we'll use to compute the MAC. For that we'll use HKDF-SHA-256 again, like we did in the last problem. If you got [`HKDF`](https://cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/#cryptography.hazmat.primitives.kdf.hkdf.HKDF) working previously, you can copy your code to the cell below almost exactly. There are two differences in how we use `HKDF` for this problem, though:

- The salt we use is an empty byte string, `b""`, and
- The `info` parameter is now the byte string `b"header"`.

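**Task:** Implement the `generate_hmac_key` and `is_valid_header` functions below. `generate_hmac_key` should derive the HMAC key from the file key, and `is_valid_header` should take the file header, that HMAC key, and the Base64-encoded MAC from the header, and return whether the MAC is valid.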

**Hints:**

- [Documentation for the `HMAC` class](https://cryptography.io/en/latest/hazmat/primitives/mac/hmac/#cryptography.hazmat.primitives.hmac.HMAC)
- Once again, remember to Base64-decode the MAC given to `is_valid_header`.
- `generate_hmac_key` should be almost exactly the same as `generate_payload_key`, except you should use different `salt` and `info` parameters.

In [None]:
from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.hmac import HMAC
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def generate_hmac_key(file_key: bytes) -> bytes:
    """Generate the key used for the age header HMAC using HKDF-SHA-256.

    :param bytes file_key: the file key.
    :return: returns a 32-byte byte string, computed using HKDF-SHA-256 with an
        empty salt and the info parameter b"header".
    """

    # TODO: your code here!


def is_valid_header(header: bytes, hmac_key: bytes, mac: bytes) -> bool:
    """Determine whether or not the age file header is valid by checking
    its MAC.

    :param bytes header: the file header, as a byte string.
    :param bytes hmac_key: the key used for HMAC, computed using the
        generate_hmac_key function.
    :param bytes mac: the MAC stored in the header, stored as a Base64-encoded
        byte string.
    :return: return True if the input MAC matches the true MAC.
    """    

    # TODO: your code here!

In [None]:
# @title **Header MAC test cases**
# @markdown If you implemented the `generate_hmac_key` and `is_valid_header`
# @markdown functions correctly, the tests in this code cell should pass.

from secrets import token_bytes

for (i, test_case) in enumerate(TEST_CASES):
    header = test_case.generate_header()
    header_mac = test_case.header_mac

    hmac_key = generate_hmac_key(test_case.file_key)
    assert len(hmac_key) == 32, "HMAC key must be 32 bytes long"

    valid = is_valid_header(header, hmac_key, header_mac)
    assert valid, "Header was not authenticated"

    rich_print(f"[green]Passed test case {i}")

# Pass in some headers that are explicitly invalid

for (i, test_case) in enumerate(TEST_CASES):
    header = test_case.generate_header()
    header_mac = b64encode(token_bytes(32))
    hmac_key = generate_hmac_key(test_case.file_key)

    valid = is_valid_header(header, hmac_key, header_mac)
    assert not valid, "Bad header was incorrectly authenticated"

    rich_print(f"[green]Passed erroneous test case {i}")

---

## **Decrypting the payload** <a name="payload-decryption"></a>

Final step! We've created the payload key and verified that the age file header hasn't been tampered with. Now we can finally read the contents of the age file.

### **The payload format**

To read the file, we decrypt the payload using the payload key. We don't need any new tools from `cryptography` for this step; we'll just use the [`ChaCha20Poly1305` class](https://cryptography.io/en/latest/hazmat/primitives/aead/#cryptography.hazmat.primitives.ciphers.aead.ChaCha20Poly1305) again.

[When an age file is created](https://github.com/C2SP/C2SP/blob/main/age.md#payload), the payload is split into chunks of 64 KiB (KiB = "kibibyte" = $1024$ bytes, so the chunk size is $64 \cdot 1024 = 65{,}536$ bytes); note that the last chunk may be shorter than 65,536 bytes. These chunks are then encrypted using ChaCha20-Poly1305. The 12-byte nonce given to ChaCha20-Poly1305 is split into two pieces:

- The first 11 bytes are a big-endian counter, which is equal to $0$ for the first chunk, $1$ for the second chunk, $2$ for the third chunk, and so on.
- The last byte is equal to $0$ for every chunk except the last, where it's equal to $1$.
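The nonce layout above can be sketched in a few lines of plain Python. `chunk_nonce` and `chunk_nonces` are hypothetical helper names, and the sketch assumes, as the age specification states, that an empty payload is still encrypted as one empty final chunk. For a 70,000-byte file this gives two chunks: 65,536 bytes followed by 4,464 bytes.

```python
from typing import List

PLAINTEXT_CHUNK_SIZE = 64 * 1024  # 65,536 bytes of plaintext per chunk


def chunk_nonce(counter: int, is_last: bool) -> bytes:
    """12-byte nonce: 11-byte big-endian counter followed by a 1-byte last-chunk flag."""
    return counter.to_bytes(length=11, byteorder="big") + (b"\x01" if is_last else b"\x00")


def chunk_nonces(plaintext_len: int) -> List[bytes]:
    """Nonces for every chunk of a plaintext of the given length."""
    # Ceiling division; an empty payload still has one (empty) final chunk.
    num_chunks = max(1, -(-plaintext_len // PLAINTEXT_CHUNK_SIZE))
    return [chunk_nonce(i, i == num_chunks - 1) for i in range(num_chunks)]


for nonce in chunk_nonces(70_000):
    print(nonce.hex())
# 000000000000000000000000
# 000000000000000000000101
```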

To decrypt the contents of an age file, iterate over the payload in chunks and decrypt each chunk with ChaCha20-Poly1305, using the correct nonce for that chunk.

In practice, this chunking makes it easier to read data from an arbitrary point in a very large payload: instead of having to decrypt the entire payload in order to check its integrity, you only have to decrypt the chunk containing the data you want.
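To illustrate the random-access point, the sketch below encrypts a made-up 200,000-byte plaintext in age's chunked format with a hypothetical `encrypt_chunked` helper (this notebook never asks you to write an encryptor), then decrypts only the third chunk by computing its byte offset. Like the rest of this notebook, it passes empty associated data. The other chunks are never touched, yet the third chunk is still authenticated by its own Poly1305 tag.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

PLAINTEXT_CHUNK_SIZE = 64 * 1024
TAG_SIZE = 16  # Poly1305 tag appended to every encrypted chunk
ENCRYPTED_CHUNK_SIZE = PLAINTEXT_CHUNK_SIZE + TAG_SIZE


def chunk_nonce(counter: int, is_last: bool) -> bytes:
    return counter.to_bytes(length=11, byteorder="big") + (b"\x01" if is_last else b"\x00")


def encrypt_chunked(key: bytes, plaintext: bytes) -> bytes:
    """Hypothetical helper: encrypt plaintext in 64 KiB chunks, age-style."""
    chunks = [
        plaintext[i : i + PLAINTEXT_CHUNK_SIZE]
        for i in range(0, len(plaintext), PLAINTEXT_CHUNK_SIZE)
    ] or [b""]  # an empty payload is a single empty final chunk
    aead = ChaCha20Poly1305(key)
    return b"".join(
        aead.encrypt(chunk_nonce(i, i == len(chunks) - 1), chunk, b"")
        for i, chunk in enumerate(chunks)
    )


def decrypt_chunk_at(key: bytes, ciphertext: bytes, index: int) -> bytes:
    """Decrypt (and authenticate) only chunk `index`, without touching the others."""
    start = index * ENCRYPTED_CHUNK_SIZE
    chunk = ciphertext[start : start + ENCRYPTED_CHUNK_SIZE]
    # The chunk is the final one if nothing follows it in the ciphertext.
    is_last = start + ENCRYPTED_CHUNK_SIZE >= len(ciphertext)
    return ChaCha20Poly1305(key).decrypt(chunk_nonce(index, is_last), chunk, b"")


key = ChaCha20Poly1305.generate_key()
plaintext = os.urandom(200_000)  # 4 chunks: 3 full + 1 of 3,392 bytes
ciphertext = encrypt_chunked(key, plaintext)

third = decrypt_chunk_at(key, ciphertext, 2)
print(third == plaintext[2 * PLAINTEXT_CHUNK_SIZE : 3 * PLAINTEXT_CHUNK_SIZE])  # True
```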

*[Figure: payload encryption diagram (`payload_encryption.drawio.svg`)]*

**Task:** Implement the `decrypt_payload` function below by using ChaCha20-Poly1305 to decrypt the payload as described above. Your function should accept two inputs: the payload key and the payload (a long byte string). If you need to, refer back to the [documentation of the `ChaCha20Poly1305` class](https://cryptography.io/en/latest/hazmat/primitives/aead/#cryptography.hazmat.primitives.ciphers.aead.ChaCha20Poly1305) so that you know how to decrypt each chunk.

**Hints:**

- To implement this function, iterate over the payload in chunks of 65,536 + 16 bytes (the last 16 bytes of each chunk are the Poly1305 MAC that ChaCha20-Poly1305 appends to the ciphertext). Keep a counter of which chunk you're on, and use

  ```python
  counter.to_bytes(length=11, byteorder="big")
  ```

  to convert it into an 11-byte big-endian byte string.
- In Python, you can create a single byte equal to zero with `b"\x00"`, and a single byte equal to one with `b"\x01"`.
- We don't attach any additional/associated data to our chunks, so you can pass in `b""` as the `associated_data` parameter to `ChaCha20Poly1305`'s `decrypt` method.

In [None]:
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

# NOTE: we set the chunk size to 65,536 + 16 because the last 16 bytes of each
# chunk are always a Poly1305 tag.
CHUNK_SIZE: int = 64 * 1024 + 16

def decrypt_payload(payload_key: bytes, payload: bytes) -> bytes:
    """Decrypt the payload of an age file.

    :param bytes payload_key: the key for the ChaCha20-Poly1305 cipher used to
        encrypt the payload.
    :param bytes payload: the payload of the age file.
    :return: the decrypted payload as a byte string.
    """

    # TODO: your code here!

In [None]:
# @title **Payload decryption tests**
# @markdown If you implemented `decrypt_payload` correctly, the tests in this
# @markdown code cell should pass.

from cryptography.exceptions import InvalidTag

for (i, test_case) in enumerate(TEST_CASES):
    payload_key = test_case.payload_key
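    # Skip the 16-byte nonce at the start of the age payload; it is the HKDF
    # salt for the payload key, not part of the ciphertext.
    payload = test_case.payload[16:]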

    try:
        file_contents = decrypt_payload(payload_key, payload)
        rich_print(f"[green]Passed test case {i}")
        rich_print(f"  [green]file {i} contents: [bold]{file_contents}")
    except InvalidTag as ex:
        rich_print(f"[bold red]InvalidTag exception raised in test case {i}")
        rich_print(
            "[red]This indicates that the message authentication code was not "
            "verified as authentic by ChaCha20-Poly1305. Make sure that you "
            "are performing decryption correctly."
        )
        raise ex

rich_print("[blue]Running test case with 70 KB file...")
parsed_file = parse_age_file("age-notebook/tests/large-test/output.age")
payload_key = b"\xc8\xb4Z\x02\xc6q\xcfV\xdc\x8f\xc2\x15\x83\xe2Ci\xc0\xf8\xac^X\x1e\xc8\x1c\x1a\x9e\x9b\xc9\x9er\xc0{"

try:
    file_contents = decrypt_payload(payload_key, parsed_file.payload)
    assert len(file_contents) == 70000, (
        "Output generated for 70 KB file test had incorrect length (expected "
        f"len = 70000, got len = {len(file_contents)})"
    )
    assert file_contents == b"\x00" * 70000, (
        "Output generated for 70 KB file test was incorrect"
    )
except InvalidTag as ex:
    rich_print("[bold red]InvalidTag exception raised in large file test case")
    rich_print(
        "[red]This indicates that the message authentication code was not "
        "verified as authentic by ChaCha20-Poly1305. Make sure that you "
        "are performing decryption correctly."
    )
    raise ex

rich_print()
rich_print("[bold green]All tests passed!")

---

## **Putting it all together** <a name="e2e-demo"></a>

Now we've written all of the functions we need to read an age file! In this last step, we're going to put them all together.



### **Reading an age file**

Let's go back to the beginning and review the steps required to read an age file:

1. Extract the stanzas and encrypted payload from the file. In our simplified version of age, the file just has a single scrypt stanza.
2. Take a password as input, and use scrypt to generate a wrap key (`generate_wrap_key`)
3. Use the wrap key to decrypt the body of the scrypt stanza with ChaCha20-Poly1305, getting a file key (`decrypt_file_key`)
4. Use the file key to generate the header key (`generate_hmac_key`) and check the validity of the header using HMAC-SHA-256 (`is_valid_header`)
5. Use HKDF-SHA-256 to generate the payload key from the file key (`generate_payload_key`)
6. Use ChaCha20-Poly1305 to decrypt the payload (`decrypt_payload`)
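Steps 2 to 5 form a small key hierarchy. The sketch below walks through it end to end under stated assumptions: it follows the public age specification (scrypt salt prefixed with the label `age-encryption.org/v1/scrypt`, r = 8, p = 1, an all-zero 12-byte nonce for the stanza body, and HKDF `info` values `b"header"` and `b"payload"`), uses a deliberately tiny work factor so it runs quickly, and invents its own password, salts, and file key rather than parsing a real file. The helper names are hypothetical, and the details may differ from the functions you wrote earlier in this notebook, so treat it as an overview rather than a substitute for `read_age_file`.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt

SCRYPT_LABEL = b"age-encryption.org/v1/scrypt"
ZERO_NONCE = bytes(12)


def scrypt_wrap_key(password: bytes, salt: bytes, log2_n: int) -> bytes:
    kdf = Scrypt(salt=SCRYPT_LABEL + salt, length=32, n=2**log2_n, r=8, p=1)
    return kdf.derive(password)


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes) -> bytes:
    return HKDF(algorithm=SHA256(), length=32, salt=salt, info=info).derive(ikm)


# --- Writer side (made-up inputs) ---
password = b"correct horse battery staple"
scrypt_salt = os.urandom(16)
log2_n = 10  # far below what age uses; chosen only so the sketch runs fast
file_key = os.urandom(16)
stanza_body = ChaCha20Poly1305(scrypt_wrap_key(password, scrypt_salt, log2_n)).encrypt(
    ZERO_NONCE, file_key, None
)
payload_nonce = os.urandom(16)

# --- Reader side: steps 2 to 5 ---
wrap_key = scrypt_wrap_key(password, scrypt_salt, log2_n)                      # step 2
recovered = ChaCha20Poly1305(wrap_key).decrypt(ZERO_NONCE, stanza_body, None)  # step 3
header_key = hkdf_sha256(recovered, salt=b"", info=b"header")                  # step 4
payload_key = hkdf_sha256(recovered, salt=payload_nonce, info=b"payload")      # step 5

print(recovered == file_key)  # True
print(len(stanza_body))       # 32: 16-byte file key + 16-byte Poly1305 tag
```

A wrong password yields a different wrap key, so step 3 fails with `InvalidTag` before any later key is derived.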

Now comes the final part: using all of the functions you've implemented in this assignment, take the contents of an age file and decrypt its payload. Your function should take the password and an `AgeFile` instance as input. `AgeFile` is a class roughly defined as follows (see the "setup" code cell for the precise definition):

```python
class AgeFile:
    """Class used to represent the contents of an age file.

    :param bytes header: the full header of the file, which should be validated
        against the ``header_hmac`` using HMAC-SHA-256.
    :param bytes scrypt_salt: the salt used by Scrypt when generating the wrap
        key.
    :param int scrypt_work_factor: the base-2 log of the work factor used by
        Scrypt when generating the wrap key.
    :param bytes encrypted_file_key: the file key, encrypted with
        ChaCha20-Poly1305 under the wrap key derived from the Scrypt stanza.
    :param bytes header_hmac: the HMAC of the file header, which should be
        validated to ensure that the header has not been tampered with.
    :param bytes payload_key_salt: the salt used by HKDF-SHA-256 for the payload
        key.
    :param bytes payload: the age file payload.
    """

    header: bytes
    scrypt_salt: bytes
    scrypt_work_factor: int
    encrypted_file_key: bytes
    header_hmac: bytes
    payload_key_salt: bytes
    payload: bytes
```


You will need to use each of the functions you've written for this assignment:

- `generate_wrap_key`
- `decrypt_file_key`
- `generate_hmac_key`
- `is_valid_header`
- `generate_payload_key`
- `decrypt_payload`

**Task:** implement `read_age_file` in the cell below. It should take the password and an instance of the `AgeFile` class as input, and it should

- raise an exception if the header has been tampered with, and otherwise
- return the decrypted payload.

In [None]:
def read_age_file(password: bytes, age_file: AgeFile) -> bytes:
    """Read the contents of an age file.

    :param bytes password: the password for the age file.
    :param AgeFile age_file: an ``AgeFile`` instance containing the various
        components of the age file.
    :return: the decrypted payload.
    :raises Exception: if the header is invalid.
    """

    # TODO: your code here!

In [None]:
# @title **Age file reading tests**
# @markdown If you implemented `read_age_file` correctly, the tests in this
# @markdown code cell should all pass.

from pathlib import Path

tests = Path.cwd() / "age-notebook" / "tests"

def run_test_case(name: str) -> bytes:
    output_path = tests / name / "output.age"
    rich_print(f"[blue]Reading {str(output_path)}")
    age_file = parse_age_file(output_path)
    with open(tests / name / "passphrase.txt", "rb") as f:
        passphrase = f.readline().rstrip()

    return read_age_file(passphrase, age_file)


for name in ("test1", "test2", "test3", "large-test"):
    with open(tests / name / "input", "rb") as f:
        input_data = f.read()
    contents = run_test_case(name)
    assert contents == input_data, f"Failed test {name}"
    rich_print(f"[green]Passed test [bold]{name}")

# Run a test for a file with a bad HMAC
raised_error: bool = False
try:
    run_test_case("erroneous-test")
except Exception:
    raised_error = True

assert raised_error, (
    "File with bad HMAC did not raise exception"
)
rich_print("[green]Passed test [bold]erroneous-test")
rich_print("[blue]Running final test case")
output = run_test_case("final-test")
rich_print(f"[green]Decrypted content: [bold]{output}")

---

## Footnotes

### <a name="cite_note-keys-explained"></a>**_Keys explained:_** [(^)](#cite_ref-keys-explained)

You might be wondering why there are so many keys involved in this process. Why couldn't we just stop after generating the wrap key in the first step?

The answer is unfortunately a complicated bit of cryptography engineering that can't be easily explained without a deeper cryptographic background. Suffice it to say, the wrap key $\to$ file key conversion is needed to implement a core feature of age that allows you to use asymmetric cryptography to decrypt the file. And it turns out that we need one key to decrypt the file and another key to authenticate the header, hence an additional file key $\to$ payload key conversion.
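One way to see why the file key layer helps: the payload is encrypted once under the file key, and only the small file key has to be wrapped again for each way of opening the file. The toy sketch below wraps one file key for two hypothetical recipients using random 32-byte stand-in wrap keys. Real age derives wrap keys from scrypt or, for asymmetric recipients, from an X25519 key exchange, neither of which is modeled here. The all-zero nonce is acceptable only because each wrap key encrypts exactly one message.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

ZERO_NONCE = bytes(12)

file_key = os.urandom(16)  # one key protects the (possibly huge) payload

# Stand-in wrap keys for two hypothetical recipients.
wrap_keys = {"alice": os.urandom(32), "bob": os.urandom(32)}

# Wrap the same file key once per recipient: 32 bytes each,
# instead of re-encrypting the whole payload per recipient.
stanzas = {
    name: ChaCha20Poly1305(key).encrypt(ZERO_NONCE, file_key, None)
    for name, key in wrap_keys.items()
}

# Each recipient unwraps with only their own wrap key.
unwrapped = {
    name: ChaCha20Poly1305(wrap_keys[name]).decrypt(ZERO_NONCE, body, None)
    for name, body in stanzas.items()
}
print(all(k == file_key for k in unwrapped.values()))  # True
```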

### <a name="cite_note-nonces"></a> **_The dangers of reusing nonces:_** [(^)](#cite_ref-nonces)

**It is _extremely_ important that you never use the same (key, nonce) pair to encrypt two different messages!** An attacker who intercepts two messages encrypted under the same (key, nonce) pair learns the XOR of the plaintexts, which is often enough to recover both messages. Recall the discussion on [one-time pads](https://en.wikipedia.org/wiki/One-time_pad): if you encrypt two messages $m_1$ and $m_2$ with the same one-time pad $P$ to get ciphertexts $C_1$ and $C_2$, an attacker can learn the XOR of the messages by calculating

$$
C_1 \oplus C_2 = (P \oplus m_1) \oplus (P \oplus m_2) = m_1 \oplus m_2
$$

(where $\oplus$ denotes XOR). ChaCha20 works by turning a (key, nonce) pair into a pseudorandom keystream of up to roughly 256 GiB, which is XORed with the message just like a one-time pad. So if you reuse a key and nonce to encrypt two different messages, you are effectively using the same one-time pad twice. For a real-world example of where nonce reuse had severe consequences, you can look at the [2010 hack of Sony's PlayStation 3](https://archive.org/details/console-hacking-2010) (there, a nonce was reused in ECDSA signatures rather than in a cipher), which gave hackers access to the keys required to run any software on the console.
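The cancellation $C_1 \oplus C_2 = m_1 \oplus m_2$ can be checked directly with the same `ChaCha20Poly1305` class used in this notebook. The sketch assumes two equal-length made-up messages and strips the 16-byte Poly1305 tag, leaving just the ChaCha20 ciphertext (plaintext XOR keystream). It deliberately commits the mistake described above, so never copy this pattern into real code.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

TAG_SIZE = 16


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = ChaCha20Poly1305.generate_key()
nonce = os.urandom(12)
aead = ChaCha20Poly1305(key)

m1 = b"attack at dawn!!"
m2 = b"retreat at dusk!"

# DON'T do this: the same (key, nonce) pair encrypts two messages.
c1 = aead.encrypt(nonce, m1, None)[:-TAG_SIZE]  # drop the Poly1305 tag
c2 = aead.encrypt(nonce, m2, None)[:-TAG_SIZE]

# The keystream cancels out, leaving m1 XOR m2 without knowing the key...
print(xor(c1, c2) == xor(m1, m2))  # True
# ...so anyone who knows (or guesses) m1 can recover m2.
print(xor(xor(c1, c2), m1))  # b'retreat at dusk!'
```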

### <a name="cite_note-RFC5869"></a>**_RFC 5869:_** [(^)](#cite_ref-RFC5869)

If you're feeling up for a challenge, you can also try reading [RFC 5869](https://datatracker.ietf.org/doc/html/rfc5869), which specifies how HKDF works.
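For comparison while reading the RFC, here is a sketch of HKDF-Extract and HKDF-Expand written with Python's standard `hmac` and `hashlib` modules and checked against `cryptography`'s `HKDF`. The function names are hypothetical; it is meant for study only, and real code should use the library.

```python
import hashlib
import hmac
import os

from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 HKDF-Extract: PRK = HMAC-Hash(salt, IKM)."""
    if not salt:
        # With no salt, RFC 5869 uses HashLen zero bytes (an empty HMAC key
        # gives the same result, since HMAC zero-pads short keys).
        salt = bytes(HASH_LEN)
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF-Expand: T(i) = HMAC-Hash(PRK, T(i-1) | info | i)."""
    if length > 255 * HASH_LEN:
        raise ValueError("length too large for HKDF-SHA-256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)


ikm = os.urandom(16)
salt = os.urandom(16)
ours = hkdf_sha256(ikm, salt, b"payload")
theirs = HKDF(algorithm=SHA256(), length=32, salt=salt, info=b"payload").derive(ikm)
print(ours == theirs)  # True
```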