In [1]:
import base64
from hashlib import shake_256
import hmac, hashlib, secrets

## Key Methods

This notebook collects helper methods used both to explain the key exchange algorithm and to implement it. It contains conversions between the following types:

1. $\mathbb{Z}$: Integers

2. $\mathbb{Y}:$ Python `bytes` object

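3. $\mathbb{B}:$ Strings that use only the ASCII characters of the Base64 alphabet. The helpers below call `base64.b64encode`, which uses the standard alphabet (`+` and `/`) rather than the URL-safe one (`-` and `_`).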

4. $\mathbb{S}:$ Strings

It also contains methods to generate keys, as well as an XOR function and functions for `HMAC`.

### Conversion Functions

Having functions to convert between different types is very important in this process, as this key exchange algorithm constantly changes the types of the keys and the encrypted messages. There are three pairs of conversion functions, each translating to or from bytes.

First is the function to convert an integer to a Python `bytes` object, which isn't very readable as you'll see, but many cryptographic functions take a `bytes` object as their input, so it's quite a useful conversion to have. The reverse is also important, to make the bytes mean something to humans. In math notation:
\begin{align}
\mathbb{Z} \rightarrow \mathbb{Y}\\
\mathbb{Y} \rightarrow \mathbb{Z}\\
\end{align}

In [None]:
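def int_to_bytes(n: int) -> bytes:
    # Smallest big-endian encoding of a non-negative int; note that 0 maps to b''.
    return n.to_bytes((n.bit_length() + 7) // 8, byteorder='big')

def bytes_to_int(b: bytes) -> int:
    # Interpret the bytes as one big-endian unsigned integer.
    return int.from_bytes(b, byteorder='big')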

In the code above, `bit_length` returns the number of bits needed to represent the integer it is called on. We then add seven and floor-divide by eight, which converts the length in bits to a length in bytes, rounded up to the nearest whole byte. The `'big'` argument makes the result a **big-endian byte array**: the most significant byte comes first, which is standard in most cryptographic settings.

To give an example, big-endian is the same order in which we write decimal numbers. In the number 2054, we write the 2 first, since it represents two thousand, the largest place value. All of our everyday numbers are written in big-endian order, so this should feel familiar, even if it's not immediately obvious when looking at a byte string.
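As a quick check of the byte-length formula and of byte order, the sketch below uses only built-in `int` methods. The sample values are chosen purely for illustration.

```python
# How many bytes does (bit_length + 7) // 8 allocate for a few sample integers?
samples = [1, 255, 256, 2054]
byte_lengths = {n: (n.bit_length() + 7) // 8 for n in samples}
print(byte_lengths)

# 2054 is 0x0806, so the most significant byte (0x08) comes first in big-endian order.
big = (2054).to_bytes(2, byteorder='big')
little = (2054).to_bytes(2, byteorder='little')
print(big.hex(), little.hex())

# Reading the bytes back with the same byte order recovers the integer.
restored = int.from_bytes(big, byteorder='big')
print(restored)
```

Note how 255 still fits in one byte while 256 needs two, which is exactly the rounding-up behavior described above.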

Within this byte representation, however, not all of the bytes correspond to printable ASCII characters, which can cause issues when transmitting data over email. Additionally, the printed form of a `bytes` object is full of escape sequences and gets long quickly, so we convert the bytes to Base64. This encoding carries exactly the same data using only ASCII characters; it is about a third longer than the raw bytes, but more compact than hexadecimal. The `decode` method then turns the Base64-encoded `bytes` into a regular `str`, which can be sent over email. The reverse is also useful for the receiver, who can convert the message back into bytes to decode it. In math notation:
\begin{align}
\mathbb{Y} \rightarrow \mathbb{B}\\
\mathbb{B} \rightarrow \mathbb{Y}\\
\end{align}

In [None]:
def bytes_to_Base64(n: bytes) -> str:
    return base64.b64encode(n).decode()

def Base64_to_bytes(n: str) -> bytes:
    return base64.b64decode(n)
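A short round trip illustrates the size trade-off and the difference between the two Base64 alphabets. This is a sketch that calls the standard library's `base64` module directly, and the sample bytes are arbitrary.

```python
import base64

raw = bytes([0x00, 0xFF, 0x80, 0x0A])   # includes bytes that are not printable ASCII

standard = base64.b64encode(raw).decode()          # alphabet uses '+' and '/'
url_safe = base64.urlsafe_b64encode(raw).decode()  # alphabet uses '-' and '_'
print(standard, url_safe)

# Every 3 input bytes become 4 output characters, padded with '=' to a multiple of 4.
print(len(raw), len(standard))

# Decoding with the matching function returns the original bytes.
round_trip_standard = base64.b64decode(standard)
round_trip_url_safe = base64.urlsafe_b64decode(url_safe)
print(round_trip_standard == raw, round_trip_url_safe == raw)
```

If the encoded text will ever appear in a URL or filename, the URL-safe variant is the safer choice; whichever is used, the sender and receiver have to agree on it.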

Converting from regular `str` objects to bytes is necessary as well, since that is what allows us to encode our messages in bytes and manipulate them with cryptographic functions:
\begin{align}
\mathbb{S} \rightarrow \mathbb{Y}\\
\mathbb{Y} \rightarrow \mathbb{S}\\
\end{align}

In [None]:
def str_to_bytes(s: str) -> bytes:
    return s.encode('utf-8')

def bytes_to_str(b: bytes) -> str:
    return b.decode('utf-8')

These functions are thin wrappers around Python's `encode` and `decode` methods, but their names and signatures make the direction of each conversion explicit. The argument `'utf-8'` specifies the encoding used to map between characters and bytes. `'utf-8'` is the standard encoding for strings these days, although you will sometimes see `'ascii'` as well.
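To see why the choice of encoding matters, here is a small sketch using only built-in `str` and `bytes` methods. The sample text is an arbitrary string containing two non-ASCII characters.

```python
text = "na\u00efve caf\u00e9"   # "naïve café", written with escapes to avoid ambiguity

utf8_bytes = text.encode('utf-8')
# Non-ASCII characters take more than one byte in UTF-8, so the lengths differ.
print(len(text), len(utf8_bytes))

# Decoding with the same encoding restores the original string.
restored = utf8_bytes.decode('utf-8')
print(restored == text)

# 'ascii' cannot represent characters outside the 128-character ASCII range.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```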

Also included here is a function to write a very large integer in scientific notation. Python's `int` type has arbitrary precision, so it can represent absolutely massive numbers. Python's `float`, however, is a 64-bit double that tops out near $1.8 \times 10^{308}$, so converting a larger `int` to `float` raises an `OverflowError`. This function instead slices the decimal digits and uses a Python f-string to build the scientific notation. The mantissa is truncated rather than rounded, and despite its name the function returns the string instead of printing it:

In [None]:
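def print_large_int_sci(x: int, digits: int = 5) -> str:
    # Assumes x is a positive int; works on decimal digits, so no float conversion is needed.
    s = str(x)
    exponent = len(s) - 1       # power of ten of the leading digit
    mantissa = s[:digits]       # leading digits, truncated (not rounded)
    return f"{mantissa[0]}.{mantissa[1:]}e+{exponent}"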
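The sketch below shows the failure that motivates this helper and the digit-slicing idea behind it. `sci_from_digits` is a hypothetical stand-in defined here so the cell runs on its own; it assumes a positive integer.

```python
big = 2 ** 2000   # far beyond the largest float, which is about 1.8e308

# Converting directly to float fails for an int this large.
try:
    float(big)
    converted = True
except OverflowError:
    converted = False
print(converted)

def sci_from_digits(x: int, digits: int = 5) -> str:
    # Hypothetical stand-in that mirrors the digit-slicing idea; assumes x > 0.
    s = str(x)
    return f"{s[0]}.{s[1:digits]}e+{len(s) - 1}"

small_result = sci_from_digits(2 ** 100)   # 2**100 = 1267650600228229401496703205376
big_result = sci_from_digits(big)
print(small_result, big_result)
```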

### Other Necessary Cryptographic Functions

Now that we have ways to manipulate our data into different data types, we also need some functions that actually perform the operations necessary for this key exchange algorithm. 

First is the `generate_keystream` function, which derives a variable-length `bytes` object from the `bytes` representation of a secret (optionally prefixed with a nonce), so that the keystream can be made as long as the message we want to send:

In [None]:
def generate_keystream(secret_bytes: bytes, length: int, nonce: bytes = b"") -> bytes:
    return shake_256(nonce + secret_bytes).digest(length)

To generate our keystream, we need a hash function that maps the bytes of our secret to a seemingly random byte string. We do this with the `shake_256` hash function, which we can denote $H$. This function is convenient because it has a variable output length, so it can cover messages of different lengths. The `digest` method lets us choose the number of bytes in the output keystream. We first encode our message in bytes, then request a keystream of the same length.

A hash function is used here so that our final keys look indistinguishable from random, and so that any structure that could be gleaned from the input is removed. Our function also accepts a nonce, which patches some security holes in the algorithm; you can read more about them in the potential issues section of the `key_exchange_basic` notebook.
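The following sketch illustrates the variable output length and the effect of the nonce, calling `hashlib.shake_256` directly. The secret and nonce values are placeholders, not real key material.

```python
from hashlib import shake_256

secret = b"example shared secret"     # placeholder value, not a real key
nonce_a, nonce_b = b"nonce-1", b"nonce-2"

# The output length is chosen at digest time, so one secret can cover any message length.
shorter = shake_256(nonce_a + secret).digest(16)
longer = shake_256(nonce_a + secret).digest(64)
print(len(shorter), len(longer))

# SHAKE output is a stream: a shorter digest is a prefix of a longer one.
is_prefix = longer[:16] == shorter
print(is_prefix)

# Changing the nonce yields a different keystream for the same secret.
other = shake_256(nonce_b + secret).digest(16)
print(other != shorter)
```

The prefix property is one reason a fresh nonce matters: without it, two messages encrypted under the same secret would share the same leading keystream bytes.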

In this algorithm, we also need to define an XOR function, symbolized by $\oplus$. We define it on the `Base64` representation of a message (as bytes) and a `bytes` object, generally a `keystream`, such that it applies XOR bit by bit across both objects and outputs a new object, also of the `bytes` type:
\begin{align}
\mathbb{B} \oplus \mathbb{Y} \rightarrow \mathbb{Y} \\
\end{align}


In [None]:
def xor_bytes(b64_bytes: bytes, keystream: bytes) -> bytes:
    return bytes(m ^ k for m, k in zip(b64_bytes, keystream))

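The `zip` function above loops over the two `bytes` objects simultaneously. Because `zip` stops at the shorter input, the keystream must be at least as long as the message, or the output is silently truncated.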
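A small sketch shows the two properties that matter here: XOR with the same keystream is its own inverse, and `zip` cuts the output to the shorter input. The helper `xor` restates the same zip-based XOR so that this cell runs on its own, and the secret is a placeholder.

```python
from hashlib import shake_256

def xor(a: bytes, b: bytes) -> bytes:
    # Same zip-based XOR as in the cell above, restated so this cell is self-contained.
    return bytes(x ^ y for x, y in zip(a, b))

message = b"aGVsbG8="                                  # Base64 text of b"hello", as bytes
keystream = shake_256(b"placeholder secret").digest(len(message))

ciphertext = xor(message, keystream)
recovered = xor(ciphertext, keystream)   # XOR with the same keystream undoes itself
print(recovered == message)

# zip stops at the shorter input, so a short keystream silently truncates the output.
truncated = xor(message, keystream[:3])
print(len(message), len(truncated))
```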

### HMAC

Lastly are our `MAC` functions, which are essential for detecting whether our messages were tampered with in transit. A more in-depth explanation of their necessity is in the potential issues section of the `key_exchange_basic` notebook.

This first function, `create_mac`, creates a MAC from three ingredients: our `keystream`, which serves as the key; our message as a `bytes` object with the nonce added to it, call it `message_blob`; and a hash function, which in this case is `sha256`:

$$
\begin{align*}
\text{HMAC}_{K}(M)
&= H\left((K' \oplus \text{opad}) \,\|\, \right. \\
&\quad \left. H((K' \oplus \text{ipad}) \,\|\, M)\right)
\end{align*}
$$

Where:

- $K$: the key, here our `keystream`
- $M$: the message, here `message_blob`
- $H$: cryptographic hash function (e.g., SHA-256)
- $K'$: the key $K$, hashed if it is longer than the block size and then zero-padded to the block size (64 bytes for SHA-256)
- $\text{opad}$: outer padding (`0x5c` repeated to block size)
- $\text{ipad}$: inner padding (`0x36` repeated to block size)
- $\oplus$: byte-wise XOR
- $\|$: byte concatenation

HMAC ensures both **message integrity** and **authenticity**.

What this HMAC function does:

1. **Prepare the key**:
   - If $K$ is longer than the block size, replace it with its hash $H(K)$
   - Pad the result with zero bytes up to the block size to get $K'$

2. **Inner hash**:
   - $\text{inner} = H((K' \oplus \text{ipad}) \,\|\, M)$

3. **Outer hash**:
   - $\text{HMAC}_K(M) = H((K' \oplus \text{opad}) \,\|\, \text{inner})$

This construction prevents length-extension attacks and is secure even if $H$ itself has certain weaknesses.
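The three steps above can be sketched by hand and checked against the standard library. `hmac_sha256_manual` is a hypothetical name used only for this illustration, and the key and message are placeholders; in practice `hmac.new` should be used instead.

```python
import hashlib
import hmac

def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    block_size = 64                                   # SHA-256 block size in bytes
    # Step 1: prepare the key (hash if too long, then zero-pad to the block size).
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()
    key_padded = key.ljust(block_size, b"\x00")
    inner_key = bytes(k ^ 0x36 for k in key_padded)   # K' xor ipad
    outer_key = bytes(k ^ 0x5C for k in key_padded)   # K' xor opad
    # Step 2: inner hash over (K' xor ipad) || M.
    inner = hashlib.sha256(inner_key + message).digest()
    # Step 3: outer hash over (K' xor opad) || inner.
    return hashlib.sha256(outer_key + inner).digest()

key = b"placeholder key"            # illustrative values only
message = b"nonce" + b"message"

manual = hmac_sha256_manual(key, message)
reference = hmac.new(key, message, hashlib.sha256).digest()
print(manual == reference)
```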


In [None]:
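def create_mac(key_bytes: bytes, blob: bytes) -> str:
    # HMAC-SHA256 of blob under key_bytes, returned as a hex string.
    return hmac.new(key_bytes, blob, hashlib.sha256).hexdigest()

def verify_mac(secret_bytes: bytes, data: bytes, mac: str) -> bool:
    # compare_digest runs in constant time, which avoids leaking where the tags differ.
    return hmac.compare_digest(create_mac(secret_bytes, data), mac)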

The `hexdigest` method returns our `HMAC` as a 64-character hexadecimal string, since SHA-256 produces 32 bytes.

The `verify_mac` function recomputes the `HMAC` on the local machine and compares it to the one received from the sender to make sure they match.
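As an illustration of what verification catches, the sketch below tags a message and then checks both the intact and a modified version. `tag` is a hypothetical helper equivalent to an HMAC-SHA256 hex digest, and the key and messages are placeholders.

```python
import hashlib
import hmac

def tag(key: bytes, data: bytes) -> str:
    # Hypothetical helper: HMAC-SHA256 of data under key, as a hex string.
    return hmac.new(key, data, hashlib.sha256).hexdigest()

key = b"placeholder keystream"
blob = b"nonce" + b"pay Bob 10"

sent_tag = tag(key, blob)
print(len(sent_tag))                       # number of hex characters in the tag

# Receiver side: recompute the tag over what arrived and compare in constant time.
intact = hmac.compare_digest(tag(key, blob), sent_tag)
tampered = hmac.compare_digest(tag(key, b"nonce" + b"pay Bob 99"), sent_tag)
print(intact, tampered)
```

Changing even one byte of the message produces a different tag, so the comparison fails for the modified message.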

All of these functions are essential in creating a readable, concise description of our key exchange algorithm. 