## Computational Theory Assessment

In [1]:
import numpy as np

## Problem 1: Binary Words and Operations

**Brief:** Implement the following functions in Python. Use numpy to ensure that all variables and values are treated as 32-bit integers. These functions are defined in the Secure Hash Standard (see page 10). Document each function with a clear docstring, explain its purpose and behaviour in Markdown, and test it with appropriate examples to verify correctness.

### Problem 1 Introduction:

The Secure Hash Standard (FIPS 180-4) defines a family of cryptographic hash algorithms, including **SHA-224, SHA-256, SHA-384,** and **SHA-512**. These algorithms use carefully designed bitwise operations to achieve diffusion and non-linearity, which are critical for cryptographic security.
This section of the Problems.ipynb document describes the core bitwise operations used in the SHA-256 cryptographic hash algorithm, as specified in FIPS PUB 180-4 (Secure Hash Standard). The functions in Problem 1 implement the operations that form the building blocks of SHA-256's compression function and message schedule expansion.

**SHA-256 Core Functions - Comprehensive Documentation** 

SHA-256 achieves cryptographic security through:

* Diffusion: Small input changes cause large output changes
* Non-linearity: Complex relationships between inputs and outputs
* Avalanche effect: One bit change affects approximately 50% of output bits

The functions in this section implement the following primitives as defined in **Section 4 of the Secure Hash Standard**:

* u_32: Type conversion which ensures correct 32-bit arithmetic
* Ch, Maj, Parity: Logical functions which provide non-linear mixing
* Sigma functions: Rotation and shift operations which provide diffusion

#### **u_32(x)**
**Purpose:**
Enforces the 32-bit unsigned integer arithmetic required by the SHA-256 specification: it keeps the result of every arithmetic and bitwise operation within 32-bit unsigned integer bounds. SHA-256 performs addition modulo 2^32, so using unsigned 32-bit integers throughout is what produces consistent, reproducible hash values across platforms and implementations.

**Why This Is Essential:** Python’s integers are unbounded by default, so sums can grow past 32 bits and produce incorrect SHA-256 hashes. SHA-256 operates on 32-bit words and adds them modulo 2^32, so every value must be forced into a 32-bit type. 

This function ensures:

* Correct overflow behavior: Values wrap around at 2^32−1 (4,294,967,295)
* Proper masking: All values stay within 32-bit bounds
* Platform consistency: Same results on all systems
* Specification compliance: Matches FIPS 180-4 requirements exactly

**Usage in SHA-256**

This function is the foundation of all SHA-256 operations:

* Applied to inputs before any bitwise operation
* Applied to intermediate results during computation
* Applied to final outputs to ensure correct format
* Must be used consistently throughout the algorithm

Below, the u_32(x) function implements this logic by using the NumPy library's uint32 type to convert a value to an unsigned 32-bit integer. It’s used here to ensure that integer values are:
* exactly 32 bits wide
* non-negative

As described in NumPy's official documentation, uint32 stores exactly 32 bits, so it can represent values from 0 to 2^32 − 1 (4,294,967,295). Forcing a number to fit into 32 bits means keeping only its lowest 32 bits, which is the same as modulo 2^32 arithmetic: for any integer x, x mod 2^32 is the remainder after dividing x by 2^32, and the result is always in the range 0 ≤ x mod 2^32 < 2^32.

Specific cases:
* If x is a negative value, modulo 2^32 wrapping applies: −1 wraps to 2^32 − 1 (4,294,967,295)
* If x is larger than 2^32 − 1, the higher-order bits are discarded: 2^32 + 10 wraps to 10

How NumPy treats a Python integer outside this range depends on the version: older releases wrap it silently, whereas NumPy 2.x raises an OverflowError. Masking the value with `& 0xFFFFFFFF` before the conversion gives the modulo 2^32 result on any version.
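The wrapping rule described above can be checked with plain Python integers. The sketch below is illustrative only: `wrap32` and `MASK32` are hypothetical names that are not part of the notebook, and the code uses an explicit mask rather than NumPy so that it behaves the same on any NumPy version.

```python
MASK32 = 0xFFFFFFFF  # 2**32 - 1, i.e. thirty-two 1 bits (illustrative constant)


def wrap32(x):
    """Reduce an integer modulo 2**32 by keeping only its lowest 32 bits."""
    return x & MASK32


# Masking and the modulo operator agree for Python integers, including negatives
for value in (-1, 0, 10, 2**32 - 1, 2**32, 2**32 + 10):
    assert wrap32(value) == value % 2**32

print(wrap32(-1))          # 4294967295
print(wrap32(2**32 + 10))  # 10
```

Both printed values match the specific cases listed above.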


In [2]:
def u_32(x):
    """
    Purpose: convert a value to an unsigned 32-bit integer using NumPy's uint32 data type

    This function acts as a helper function to ensure that all SHA operations use
    unsigned 32-bit integers in order to produce correct hash values, as stated in
    the requirements of SHA-256's 32-bit modular arithmetic.
    This function must be applied consistently to prevent overflow and ensure 
    correct hash computation. 
    
    How the function works: The conversion wraps values (x) by applying modulo 2^32 and 
    retaining only the lowest 32 bits. Negative values and values larger than 2^32 - 1
    are automatically wrapped.

    Parameters
    ----------
    x : int
        Input integer value to be converted to an unsigned 32-bit integer
        
    Returns
    -------
    np.uint32
        Value of x when converted to an unsigned 32-bit integer with modulo 2^32 wrapping
    """

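    # keep only the lowest 32 bits (x mod 2^32), then convert to uint32;
    # masking first avoids the OverflowError that NumPy 2.x raises for out-of-range Python integers
    return np.uint32(int(x) & 0xFFFFFFFF)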

#### **Parity(x, y, z)**

**Purpose** 

Parity is a simple bitwise mixing function used in SHA-1 (SHA-256 does not use it). It computes x ⊕ y ⊕ z with the XOR operator and treats all three inputs symmetrically: each output bit is 1 if an odd number of the inputs have that bit set. In other words, Parity reports whether an odd or even number of bits are set at each bit position across three 32-bit inputs. Because it is built from XOR alone, it is a linear function.

For each bit position:

* Output is 1 if an odd number (1 or 3) of inputs have 1
* Output is 0 if an even number (0 or 2) of inputs have 1

**Mathematical Definition:** Parity(x, y, z) = x ⊕ y ⊕ z

**Usage Context**

This Parity operation is used within the SHA-1 algorithm, in rounds 20-39 and 60-79, to mix the working variables during the hash computation. Parity is not used in SHA-256. It is included here for:

* Completeness: Understanding the SHA family
* Comparison: Contrast with SHA-256's Ch and Maj functions
* Educational value: Demonstrates a simpler, purely linear form of mixing

SHA-1 uses Ch and Maj alongside Parity in its other rounds, whereas SHA-256 uses only the non-linear **Ch** and **Maj** functions (defined later in this section) in every round.

**Implementation**

The Parity operation is implemented in this section (Problem 1) as the function Parity(x, y, z). It uses the bitwise XOR operator, as defined in the official Python documentation, to determine whether an odd or even number of bits are set at each bit position across three 32-bit inputs (x, y, z). Parity(x, y, z) applies the XOR operation twice:

* Step 1: x ⊕ y → Produces 1 where x and y differ
* Step 2: (x ⊕ y) ⊕ z → Produces 1 where the result differs from z

Each bit of the final result tells you the parity of the corresponding bits of x, y, and z. It creates a parity check: the final bit is 1 if an odd number of input bits are 1.
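The two XOR steps can be traced on small values. This is a minimal sketch with plain Python integers; `parity_ref` is a hypothetical helper name used only for this illustration, not the notebook's `Parity` function.

```python
def parity_ref(x, y, z):
    """Bitwise parity of three words, limited to 32 bits (illustrative helper)."""
    return (x ^ y ^ z) & 0xFFFFFFFF


x, y, z = 0b1111, 0b1100, 0b1010

step1 = x ^ y        # 0b0011: 1 where x and y differ
result = step1 ^ z   # 0b1001: 1 where an odd number of the inputs are 1

# Cross-check every bit position by counting the 1s directly
for bit in range(32):
    ones = ((x >> bit) & 1) + ((y >> bit) & 1) + ((z >> bit) & 1)
    assert ((result >> bit) & 1) == (ones % 2)

print(format(parity_ref(x, y, z), "04b"))  # 1001
```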

**Relationship to Other Functions**

* Simpler than Ch: Treats all inputs symmetrically (no selector)
* Simpler than Maj: No majority voting, just XOR
* Pure XOR: Linear, so on its own it adds no non-linearity
* Weaker mixing: SHA-256 relies on the non-linear Ch and Maj instead

In [3]:
def Parity(x, y, z):
    """
    Purpose: compute the Parity function used in SHA-1.
    
    Implements the XOR parity operation as defined in FIPS 180-4 for SHA-1.
    This function performs bitwise XOR on three inputs to determine parity:
    returns 1 when an odd number of corresponding bits are set, 0 otherwise.
    
    The operation proceeds as follows:
    1. XOR compares corresponding bits of x and y:
       - Output 1 if bits differ (one 1, one 0)
       - Output 0 if bits match (both 0 or both 1)
    This result represents partial parity for x and y:
       - Bit is 0 → even number of 1s (0 or 2)
       - Bit is 1 → odd number of 1s (1)
    
    2. XOR then compares this result with z:
       - Output 1 if bits differ
       - Output 0 if bits match
    
    3. Final result shows parity across all three inputs:
       - Bit is 1 → odd number of 1s (1 or 3)
       - Bit is 0 → even number of 1s (0 or 2)

    Parameters
    ----------
    x, y, z : int
        32-bit integer values
        
    Returns
    -------
    np.uint32 
        Result of x ⊕ y ⊕ z
    
    """
    # returns the unsigned 32-bit integer result of the XOR of x, y, and z, which are converted to unsigned 32-bit integers using u_32() helper function
    return u_32(x) ^ u_32(y) ^ u_32(z)

#### **Ch(x, y, z)** — Choose Function

**Definition (FIPS 180-4):** Ch(x, y, z) = (x ∧ y) ⊕ (¬x ∧ z)

**Purpose**
Ch implements SHA-256's conditional selection function where x acts as a selector to "choose" bits from either y or z. This creates complex, non-linear relationships essential for cryptographic security.

**Implementation**

The Ch(x, y, z) function implements the Choose (Ch) function as defined in the Secure Hash Standard FIPS 180-4. It uses x as a selector to choose each output bit from either y or z:

* if a bit in x is 1, then the corresponding bit from y is chosen
* if a bit in x is 0, then the corresponding bit from z is chosen.

The implementation computes x AND y, then (NOT x) AND z, and combines the two results with XOR.

* First, AND compares each pair of corresponding bits and outputs 1 only when both bits are 1; otherwise it outputs 0. Therefore, where x has 1s, x AND y outputs the corresponding bits of y, and where x has 0s, it outputs 0s.
* Next, NOT flips all the bits in x (1→0, 0→1). When AND is applied to (NOT x) and z, the positions where x originally had 0s (now 1s) output the corresponding bits of z, and the positions where x originally had 1s (now 0s) output 0s.
* Finally, XOR combines the results of x AND y and (NOT x) AND z. XOR outputs 1 where its two inputs differ. Because NOT was applied to x in one term but not in the other, the two results are mutually exclusive: there is no bit position where both can be 1. In this situation XOR behaves exactly like OR and simply merges the two selections. XOR is used because that is how FIPS 180-4 writes the definition.

The output depends on all three inputs in a non-linear way: changing a bit of x switches the corresponding output bit between y and z. This non-linearity contributes to the avalanche effect that SHA-256 relies on for cryptographic security.
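The selection behaviour is easiest to see with a selector whose upper half is all 1s and whose lower half is all 0s. The sketch below uses plain Python integers; `ch_ref` and `MASK32` are hypothetical names chosen for this example and are separate from the notebook's `Ch` function.

```python
MASK32 = 0xFFFFFFFF  # keeps results within 32 bits (illustrative constant)


def ch_ref(x, y, z):
    """Ch(x, y, z) = (x AND y) XOR (NOT x AND z) on 32-bit words."""
    return ((x & y) ^ (~x & MASK32 & z)) & MASK32


x = 0xFFFF0000  # selector: upper 16 bits are 1, lower 16 bits are 0
y = 0x12345678
z = 0x9ABCDEF0

from_y = x & y             # 0x12340000: bits of y where x is 1
from_z = ~x & MASK32 & z   # 0x0000DEF0: bits of z where x is 0

# The two terms never overlap, so XOR and OR give the same result here
assert (from_y & from_z) == 0
assert (from_y ^ from_z) == (from_y | from_z)

print(hex(ch_ref(x, y, z)))  # 0x1234def0
```

The result takes its upper half from y and its lower half from z, as the selector dictates.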

**Usage in SHA-256** 

Ch is used in the main compression function, in all 64 rounds (0-63), where it is applied to the working variables e, f, and g.

Purpose in algorithm: T1 = h + Σ₁(e) + Ch(e, f, g) + K[t] + W[t]

Here e, f, and g are working variables: Ch selects between f and g based on e, which creates complex dependencies between the state variables.

**Cryptographic Properties**

* Non-linearity: Output depends on all three inputs in complex ways
* Avalanche effect: Small changes in x can completely change the final output (switches between y and z)
* Asymmetry: x has special role as selector (unlike Maj where all inputs are equal)
* Differential resistance: Prevents attackers from predicting how bit changes propagate

**Relationship to Other Functions**

* Different from Maj: Ch treats inputs asymmetrically (x is selector)
* Different from Parity: Ch is more complex, provides stronger mixing
* Complements Maj: SHA-256 uses both for different purposes
* Works with Σ₁: Applied together in compression function

In [4]:
def Ch(x, y, z):
    
    """
    Purpose: compute the Choose (Ch) function used in SHA-256.
    
    Implements the conditional selection operation from FIPS 180-4 Section 4.1.2.
    The Ch function uses x as a bitwise selector to choose bits from either y or z,
    creating non-linear mixing essential for SHA-256's security.
    
    Operation Mechanics
    -------------------
    For each of the 32 bit positions:
    - If bit in x is 1 → select corresponding bit from y
    - If bit in x is 0 → select corresponding bit from z
    
    Operation Proceedings
    --------------------
    1. Compute (x ∧ y): Extracts bits from y where x = 1
       - AND compares each bit in x and y
       - Outputs 1 only when both are 1
       - Result: y's bits where x = 1, zeros elsewhere
    
    2. Compute ¬x: Flip all bits in x (1→0, 0→1)
    
    3. Compute (¬x ∧ z): Extracts bits from z where x = 0
       - AND compares each bit in ¬x and z
       - Result: z's bits where x was 0, zeros elsewhere
    
    4. Combine with XOR: (x ∧ y) ⊕ (¬x ∧ z)
       - The two parts are mutually exclusive (no overlap)
       - XOR acts like OR, combining both selections
       - Result: Perfect bit-by-bit selection

    Parameters
    ----------
    x, y, z : int  
      32-bit integer values
        
    Returns
    -------
    np.uint32 
      Result of (x ∧ y) ⊕ (¬x ∧ z) converted into an unsigned 32-bit integer 
    
    """

    # the result of x AND y is combined (using XOR) with the result of (NOT) x AND z
    # the result is converted to an unsigned 32-bit integer value and returned
    return u_32((u_32(x) & u_32(y)) ^ (~u_32(x) & u_32(z)))

#### **Maj(x, y, z)** — Majority Function

**Purpose** 

Maj (the majority function), as defined in the Secure Hash Standard, is a bitwise Boolean function that forms part of the SHA-256 compression function, where it is used in each round to update the hash state. It mixes bits in a non-linear and balanced way by outputting the majority bit of the three input words at each bit position, which improves resistance to cryptanalysis. The function treats all inputs equally, changing one input changes the output only at positions where the other two inputs disagree, and the output is not a simple linear combination of the inputs, which makes the hash harder to analyse and invert. 

**Mathematical Definition:** Maj(x, y, z) = (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z)

**Implementation**

The Maj(x, y, z) function implements SHA-256's majority voting function Maj where each output bit represents the majority value (0 or 1) among the three input bits at that position:

* If 2 or more inputs have 1 → output is 1
* If 2 or more inputs have 0 → output is 0

In other words, the function compares the corresponding bits of x, y, and z and outputs the bit value that appears most often, hence majority vote. To achieve this, the Maj(x, y, z) function (implemented below) follows these steps:

* Step 1: the bitwise AND operation is applied to the pairs (x, y), (x, z), and (y, z) respectively. For each pair, the corresponding bits are compared: if both bits are 1, the output is 1; otherwise the output is 0. This is where two inputs "vote together" for 1.
* Step 2: the three results are combined using XOR: (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z). At each bit position, the output is 1 if an odd number of the three results are 1, and 0 otherwise. 

XOR outputs 1 when there is an odd number of 1s among its inputs, which lines up with the majority:

* Majority is 1 (2+ ones) → an odd number of pairs (1 or 3) are both 1 → XOR outputs 1 
* Majority is 0 (2+ zeros) → no pair is both 1 → XOR outputs 0 
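Since Maj works independently on each bit position, the claim can be checked exhaustively over the eight possible combinations of single bits. The following sketch assumes plain Python integers; `maj_ref` is a hypothetical name used only here.

```python
from itertools import product


def maj_ref(x, y, z):
    """Maj(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z)."""
    return (x & y) ^ (x & z) ^ (y & z)


for x, y, z in product((0, 1), repeat=3):
    pairs_both_one = (x & y) + (x & z) + (y & z)   # 0, 1 or 3 pairs
    majority = 1 if x + y + z >= 2 else 0

    # The XOR of the three AND terms equals the majority bit in every case
    assert maj_ref(x, y, z) == majority
    assert pairs_both_one % 2 == majority
    print(x, y, z, "->", maj_ref(x, y, z))
```

The loop prints the full truth table and raises an AssertionError if any row disagrees with a direct majority count.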

**Usage in SHA-256**

Maj is used in the main compression function, in all 64 rounds, where it is applied to the working variables a, b, and c.

Purpose in algorithm: T2 = Σ₀(a) + Maj(a, b, c)

Here a, b, and c are working variables that Maj combines through majority voting; the result feeds into the updated state for the next round.

* Ch: Applied to variables (e, f, g)
* Maj: Applied to variables (a, b, c)
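To show how T1 and T2 fit together, here is a sketch of a single compression round in plain Python. It is not the notebook's implementation: all function names (`rotr_ref`, `big_sigma0`, `big_sigma1`, `ch_ref`, `maj_ref`, `round_step`) are hypothetical, and the inputs are the initial hash value and first round constant from FIPS 180-4 together with the first message word of the padded message "abc".

```python
MASK32 = 0xFFFFFFFF


def rotr_ref(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def big_sigma0(x):
    return rotr_ref(x, 2) ^ rotr_ref(x, 13) ^ rotr_ref(x, 22)


def big_sigma1(x):
    return rotr_ref(x, 6) ^ rotr_ref(x, 11) ^ rotr_ref(x, 25)


def ch_ref(x, y, z):
    return (x & y) ^ (~x & MASK32 & z)


def maj_ref(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)


def round_step(state, k_t, w_t):
    """Apply one SHA-256 round to the working variables (a..h)."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch_ref(e, f, g) + k_t + w_t) & MASK32
    t2 = (big_sigma0(a) + maj_ref(a, b, c)) & MASK32
    # New a is T1 + T2, new e is d + T1; the other variables shift along
    return ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)


# Initial hash value H(0) and round constant K[0] from FIPS 180-4
h0 = (0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19)
k0 = 0x428A2F98
w0 = 0x61626380  # first word of the padded message "abc"

state = round_step(h0, k0, w0)
print(" ".join(format(word, "08x") for word in state))
# 5d6aebcd 6a09e667 bb67ae85 3c6ef372 fa2a4622 510e527f 9b05688c 1f83d9ab
```

Ch only influences the new values of a and e through T1, while Maj only influences the new a through T2, which is why the two functions are applied to different halves of the state.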

**Cryptographic Properties**

* Symmetric: All three inputs treated equally (no special roles)
* Balanced: Output evenly distributed (50% ones, 50% zeros)
* Non-linear: Relationship between inputs and outputs is complex
* Smooth transitions: Changing one input affects the output only where the other two inputs disagree
* Democratic: Consensus-based, no single input dominates

**Relationship to Other Functions**

* Different from Ch: Maj treats all inputs equally, meaning it is symmetric, while Ch is asymmetric
* Different from Parity: Maj performs majority voting rather than a plain XOR, which on its own is linear

SHA-256 uses both Ch and Maj for complete mixing because they provide different types of non-linearity: 
* Ch: Asymmetric, selector-based (x controls output)
* Maj: Symmetric, voting-based (all equal)


Works with Σ₀: as discussed later in this section, Maj and Sigma0 are applied together in the compression function to compute T2. Using both helps mix the working variables thoroughly, leaving fewer patterns or symmetries for an attacker to exploit.

In [5]:
def Maj(x, y, z):
    """
    Purpose: compute the Majority (Maj) function used in SHA-256.
    
    Maj operates on three 32-bit inputs (x, y, z) and outputs a single 32-bit value,
    which is the result of combining bitwise AND and XOR operations to determine the “majority” 
    value among the three inputs.
    - If 2 or more inputs have 1 → output 1 (majority votes 1)
    - If 2 or more inputs have 0 → output 0 (majority votes 0)
    - The bit value appearing most frequently wins
    
    Parameters
    ----------
    x, y, z : int 
        32-bit integer values
        
    Returns
    -------
    np.uint32
        Result of (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z)
        
    """

    # returns the result of the XOR operation which combines the results of the AND operation performed on the unsigned 32-bit integers:
    # (x and y), (x and z), and (y and z)
    return u_32((u_32(x) & u_32(y)) ^ (u_32(x) & u_32(z)) ^ (u_32(y) & u_32(z)))

#### **rotr(x, n)** — Right Rotation

**Purpose:** perform a right rotation on a 32-bit value. rotr(x, n) rotates bits instead of shifting in zeros, which is a core operation in SHA-256’s mixing steps. It differs from a logical shift: instead of dropping the bits that move out of range, the rotation wraps them around and adds them back at the other end of the value. This preserves all bit information while spreading bits to new positions (diffusion). Rotations are a core building block of the **Σ** and **σ** functions (implemented later in this section), and the helper function rotr(x, n) (shown below) provides them.

**Why rotations matter:**
* They move every bit to a new position, so XOR-ing several rotations of the same word mixes distant bits together
* They ensure every output bit depends on multiple positions of the input
* They preserve all bits (unlike shifts, which lose information)

**Implementation**

The function rotr(x, n) implements the rotation as follows: **(x ≫ n) | (x ≪ (32 − n))**

rotr(x, n) implements the operation ROTR^n(x), as defined in the Secure Hash Standard: a circular shift in which all bits move to the right and the bits that fall off the right end are added back at the left end. The function achieves this as follows:

* The right shift (>>) moves the upper 32 − n bits of x down by n positions, while the left shift (<<) moves the lowest n bits of x up to the top of the word. The two results are combined using the bitwise OR operator "|". 
* As a result, every bit shifted off the right end reappears at the left end of the 32-bit value, so no bit information is lost.
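The difference between rotating and shifting can be seen on a couple of small inputs. The sketch below uses plain Python integers with an explicit 32-bit mask; `rotr_ref` and `MASK32` are hypothetical names, not the notebook's `rotr`.

```python
MASK32 = 0xFFFFFFFF


def rotr_ref(x, n):
    """Rotate the 32-bit word x right by n positions (0 <= n <= 32)."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


print(hex(rotr_ref(0x12345678, 8)))  # 0x78123456: the low byte wraps to the top
print(hex(rotr_ref(0x00000001, 1)))  # 0x80000000: the lowest bit wraps to the top
print(hex(0x00000001 >> 1))          # 0x0: a plain shift loses the bit

# Rotating by 1 thirty-two times returns the original word
word = 0x12345678
for _ in range(32):
    word = rotr_ref(word, 1)
assert word == 0x12345678
```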

In [6]:
def rotr(x, n):
    """
    Purpose: perform a right rotation on a 32-bit value.
    
    This function implements ROTR^n(x), a circular shift where all bits are shifted to 
    the right and added back onto the left end when they go out of range, by combining
    the right shift (>>) with a left shift (<<) operation using the OR bitwise operation "|". 

    Parameters
    ----------
    x : int 
        32-bit integer value to rotate
    n : int 
        Number of positions to rotate right (0 ≤ n ≤ 32)
        
    Returns
    -------
    np.uint32
        The value of x rotated right by n bits, as an unsigned 32-bit integer

    """

    # convert x to an unsigned 32-bit integer using u_32() helper function
    x = u_32(x)

    # return the 32-bit integer value of the circular right rotation of x by n bits
    return u_32((x >> n) | (x << u_32(32 - n)))

### **Sigma Functions**

The Secure Hash Standard defines four Sigma/sigma functions for SHA-256. They introduce strong diffusion, so that small changes in input values result in widespread changes in the internal state of the hash function, which is essential for cryptographic security. By combining multiple rotated versions of the same word, the functions ensure that each output bit depends on multiple input bits. The four functions operate on 32-bit words using combinations of bitwise rotations (ROTR) and logical shifts (SHR), as specified in NIST FIPS 180-4.
The sigma functions are divided into two categories:

* **Uppercase Σ (Σ₀, Σ₁):** Used in the compression function to mix the working variables.
* **Lowercase σ (σ₀, σ₁):** Used in the message schedule expansion to spread message bits across rounds.

All sigma functions operate on unsigned 32-bit integers and must be evaluated using modulo 2^32 arithmetic, as required by the Secure Hash Standard.

#### **Sigma0(x)** — Uppercase **Σ₀**

Definition (FIPS 180-4): **Σ₀(x) = ROTR^2(x) ⊕ ROTR^13(x) ⊕ ROTR^22(x)**

**Purpose:** Σ₀ is used in the SHA-256 compression function to mix the bits of the working variable a. It is applied in every round of the compression function. Its purpose is to provide strong diffusion during each round, ensuring that changes to the internal state propagate quickly and unpredictably.

**Implementation** 

The Sigma0(x) function implements the Σ₀ operation as defined in the Secure Hash Standard FIPS 180-4. It applies three different rotations to the input x using ROTR^n(x), where n is the number of bit positions by which x is rotated, and then combines the results using XOR:

* Step 1: The first rotation, ROTR^2, rotates the bits in x by 2: each bit moves 2 positions to the right, and the bits that fall off the right end wrap around to the left end, giving a circular right rotation in which no bits are lost.
* Step 2: The second rotation, ROTR^13, rotates the bits in x by 13 in the same way.
* Step 3: The third rotation, ROTR^22, rotates the bits in x by 22 in the same way.
* Step 4: The results of the three rotations are combined using XOR. At each bit position, the output is 1 if an odd number (1 or 3) of the rotated values have a 1 there, and 0 otherwise.

Combining three different rotations provides strong bit diffusion in the working variable updates. FIPS 180-4 specifies the rotation amounts (2, 13, 22) without stating a rationale for them; using three distinct amounts means each output bit depends on three different input bits.
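The four steps can be followed on a concrete word. This sketch uses plain Python integers and takes the first word of the SHA-256 initial hash value as input; `rotr_ref` and `MASK32` are hypothetical names introduced for the illustration.

```python
MASK32 = 0xFFFFFFFF


def rotr_ref(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


x = 0x6A09E667  # first word of the SHA-256 initial hash value

r2 = rotr_ref(x, 2)     # Step 1
r13 = rotr_ref(x, 13)   # Step 2
r22 = rotr_ref(x, 22)   # Step 3
result = r2 ^ r13 ^ r22  # Step 4

# Print the bit patterns so the wrap-around of each rotation is visible
for label, value in (("x", x), ("ROTR^2", r2), ("ROTR^13", r13),
                     ("ROTR^22", r22), ("XOR", result)):
    print(f"{label:<8} {value:032b}")

print(hex(result))  # 0xce20b47e
```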

**Usage in SHA-256**

The Secure Hash Standard specifies Σ₀ as part of the computation of the temporary value: **T2 = Σ₀(a) + Maj(a, b, c)**

**Cryptographic Properties**

**Relationship to Other Functions**


In [7]:
def Sigma0(x):
    """
    Purpose: compute the Sigma0 (Σ₀) function for SHA-256 compression
    
    This function applies three different rotations to the input variable x using the helper 
    function rotr(x, n). Sigma0 takes the input, rotates it right by three different amounts 
    (2, 13, and 22), then applies XOR to combine all three rotated versions.
       
    Parameters
    ----------
    x : int
        32-bit integer value
        
    Returns
    -------
    np.uint32: 
        Result of ROTR^2(x) ⊕ ROTR^13(x) ⊕ ROTR^22(x)

    """
    
    # convert the input value x to an unsigned 32-bit value to ensure it is handled correctly 
    x = u_32(x)
    
    # returns the result of performing 3 rotations on input x using helper method rotr()
    # rotations of 2, 13, and 22 are applied to x using rotr()
    # and then the three results are combined using XOR to produce the final 32-bit value
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)

#### **Sigma1(x)** — Uppercase **Σ₁**
Definition (FIPS 180-4): **Σ₁(x) = ROTR^6(x) ⊕ ROTR^11(x) ⊕ ROTR^25(x)**

**Purpose:**

Σ₁ is used in the SHA-256 compression function to mix the bits of the working variable e in each round of SHA-256's main compression loop. Like Σ₀, it is applied in every round, and it complements Σ₀ to provide diffusion during each round of compression, ensuring that changes to the internal state propagate quickly and unpredictably.

**Implementation**

The function implementing Sigma1 follows the same logic as Sigma0: three different rotations are applied to the input x using the helper function rotr(x, n), where n is the number of bit positions by which x is rotated, and the results are combined using XOR:

* Step 1: The first rotation, ROTR^6, rotates the bits in x by 6
* Step 2: The second rotation, ROTR^11, rotates the bits in x by 11
* Step 3: The third rotation, ROTR^25, rotates the bits in x by 25
* Step 4: The results of the three rotations are combined using XOR. At each bit position, the output is 1 if an odd number (1 or 3) of the rotated values have a 1 there, and 0 otherwise. 

The rotation amounts (6, 11, 25) differ from those of Sigma0, so the two functions produce different diffusion patterns. The amounts are fixed by FIPS 180-4, which does not document how they were chosen.
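One way to see the diffusion is to flip a single input bit and count how many output bits change. In this sketch (plain Python, hypothetical names `rotr_ref` and `big_sigma1`), the input is the fifth word of the SHA-256 initial hash value, which is the starting value of e.

```python
MASK32 = 0xFFFFFFFF


def rotr_ref(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def big_sigma1(x):
    """Sigma1(x) = ROTR^6(x) XOR ROTR^11(x) XOR ROTR^25(x)."""
    return rotr_ref(x, 6) ^ rotr_ref(x, 11) ^ rotr_ref(x, 25)


e = 0x510E527F
print(hex(big_sigma1(e)))  # 0x3587272b

# Flip the lowest bit of e and compare the two outputs
flipped = e ^ 1
diff = big_sigma1(e) ^ big_sigma1(flipped)
print(bin(diff).count("1"))  # 3
```

Each rotation carries the flipped bit to a different position, so one changed input bit changes exactly three output bits of Σ₁ before the result is added into T1.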

**Usage in SHA-256** 

It provides diffusion during the update of the internal state and is part of the computation of: T1 = h + Σ₁(e) + Ch(e, f, g) + K[t] + W[t]

This ensures that the evolution of the state depends on multiple rotated views of e.


In [8]:
def Sigma1(x):
    """
    Purpose: compute the Sigma1 (Σ₁) function for SHA-256 compression
    
    This function applies three different rotations to the input variable x using the helper 
    function rotr(x, n). Sigma1 takes the input, rotates it right by three different amounts 
    (6, 11, and 25), then applies XOR to combine all three rotated versions.
    
    Parameters
    ----------
    x : int 
        32-bit integer value
        
    Returns
    -------
    np.uint32: 
        Result of ROTR⁶(x) ⊕ ROTR¹¹(x) ⊕ ROTR²⁵(x)
        
    """

    # convert the input value x to an unsigned 32-bit value to ensure it is handled correctly 
    x = u_32(x)

    # return the result of performing 3 rotations of 6, 11, and 25, on input x using helper method rotr() and then extracting a final value with XOR
    return u_32(rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25))

#### **sigma0(x)** — Lowercase **σ₀**
Small mixing function using rotations and a shift: **σ₀(x) = ROTR^7(x) ⊕ ROTR^18(x) ⊕ SHR^3(x)**, where SHR^3(x) is the logical right shift x ≫ 3.

Used when expanding the message schedule array.

**Purpose:** Mix input words to create additional message schedule values and spread message bits before compression

**Note:** uses a logical right shift in addition to rotations.

Works in conjunction with sigma1 in the message schedule equation: W[t] = σ₁(W[t-2]) + W[t-7] + σ₀(W[t-15]) + W[t-16]
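The message schedule equation can be sketched as a short loop. The code below is an illustration in plain Python rather than the notebook's implementation; `rotr_ref`, `small_sigma0`, `small_sigma1`, and `expand_schedule` are hypothetical names, and the input block is the padded one-block message "abc".

```python
MASK32 = 0xFFFFFFFF


def rotr_ref(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x):
    return rotr_ref(x, 7) ^ rotr_ref(x, 18) ^ (x >> 3)


def small_sigma1(x):
    return rotr_ref(x, 17) ^ rotr_ref(x, 19) ^ (x >> 10)


def expand_schedule(block_words):
    """Extend 16 message words to the 64-word schedule W[0..63]."""
    w = list(block_words)
    for t in range(16, 64):
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK32)
    return w


# Padded message "abc": 0x61626380, zeros, then the bit length 24 (0x18)
block = [0x61626380] + [0] * 14 + [0x18]
w = expand_schedule(block)

print(format(w[16], "08x"), format(w[17], "08x"))  # 61626380 000f0000
```

W[16] equals W[0] here because the other three terms are zero, and W[17] is σ₁ applied to the length word 0x18.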

In [9]:
def sigma0(x):
    """
    Function's purpose: compute the sigma0 (σ₀) function for SHA-256 message schedule.
    
    This function implements the sigma0 function as defined in the Secure Hash Standard
    FIPS 180-4, which is used to generate words 16-63 of the message schedule from the
    original 16 words of the message block. sigma0 applies two different rotations to 
    the input variable x using ROTR^n(x), where n is the number of bit positions by 
    which x is rotated, and then uses the right shift bitwise operation (x >> n) to 
    shift the bits of x to the right by n positions. Then, the three results of these 
    operations are combined using XOR.
        - The first rotation performed on x is ROTR^7, which rotates the bits in x by 7: 
        each bit in x is moved to the right by 7, and the bits that go out of range on 
        the right side of x are wrapped around and added back onto the left side, 
        giving a circular right rotation where no bits are lost.
        - The second rotation used on x is ROTR^18, which rotates the bits in x by 18
        and wraps the bits that go out of range onto the left side of x to preserve 
        all bits. 
        - The third operation performed on x is a right shift of 3, which shifts all bits
        in x to the right by 3. The bits that go out of range are lost, and three 0s 
        are added at the left side of x to keep the value 32 bits wide. 
        - The final result is obtained by using XOR to combine the results of the 
        three operations. At each bit position, the output is 1 if an odd number of 
        1s (1 or 3) is present across the three values, and 0 if the number of 1s 
        is even. 
    sigma0 is set apart from both uppercase Sigma operations by a critical distinction:
    the third operation is a right SHIFT (>>), not a rotation. This means the top 3 bits of 
    the shifted term are 0, which breaks the symmetry that a pure combination of 
    rotations would have.
    
    Parameters
    ----------
    x : int
        32-bit integer value
        
    Returns
    -------
    np.uint32: Result of ROTR⁷(x) ⊕ ROTR¹⁸(x) ⊕ SHR³(x)
    
    """

    # convert the input value x to an unsigned 32-bit value to ensure it is handled correctly 
    x = u_32(x)
    
    # return the result of performing the following operations on input x:
    # 2 rotations of 7, and then 18, using helper method rotr() and a right shift of 3 
    # the final 32-bit value result is found by using XOR to combine the three results from the operations
    return u_32(rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3))

#### **sigma1(x)** — Lowercase **σ₁**
Small mixing function using rotations and shifts: **σ₁(x) = ROTR^17(x) ⊕ ROTR^19(x) ⊕ (x ≫ 10)**

**Purpose:** Also part of the message schedule expansion. Helps propagate entropy from earlier message words into later ones.
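
The difference between a rotation and a shift is easiest to see on a small value. The following is a minimal sketch using plain Python integers; `rotr32` and `MASK` are hypothetical stand-ins for illustration only and are not the notebook's own `rotr()` and `u_32()` helpers.

```python
MASK = 0xFFFFFFFF  # assumption: plain Python ints masked to 32 bits


def rotr32(x, n):
    """Hypothetical stand-in for rotr(): rotate x right by n within 32 bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


x = 0x00000007                  # only the three lowest bits are set
print(f"{rotr32(x, 3):08x}")    # e0000000: the bits wrap around to the top
print(f"{x >> 3:08x}")          # 00000000: the bits are shifted out and lost
```

A rotation keeps every bit of the word, whereas a shift discards the bits that fall off the right-hand side and fills the left with zeros.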

In [10]:
def sigma1(x):
    """
    Function's purpose: compute the sigma1 (σ₁) function for SHA-256 message schedule

    This function implements the sigma1 function as defined in the Secure Hash Standard
    FIPS 180-4, which complements sigma0 in generating the extended 64-word message schedule.
    sigma1 applies two different rotations to the input variable x using ROTR^n(x), where 
    n represents the number of bit positions by which x is rotated, and then uses 
    the right shift bitwise operation (x >> n) to shift the bits of x to the right, where 
    n represents the number of bit positions by which x is shifted. Then, the three results of these 
    operations are combined using XOR.
        - The first rotation performed on x is ROTR^17 which rotates the bits in x by 17: 
        meaning each bit in x is shifted to the right by 17, and the bits that go out of 
        range on the right side of the x value are wrapped around and added back onto the 
        left side of the x value, therefore ensuring a circular right rotation where no 
        bits are lost.
        - The second rotation used on x is ROTR^19, which rotates the bits in x by 19
        and ensures that the bits that go out of range are wrapped onto the beginning
        of x to preserve all bits. 
        - The third operation performed on x is a right shift of 10, which shifts all bits
        in x to the right by 10, causing the bits that go out of range to be
        lost, while ten 0s are added on to the start/left side of x to pad the value and preserve 
        its 32-bit state. 
        - The final result is achieved by using XOR to combine the result of each of the 
        three operations performed on x. This operation produces an output by comparing
        the corresponding bits in all three values and outputs 1 if there is an odd number 
        of 1s present (1 or 3) at this bit across the three results and outputs 0 if 
        there is an even number of 1s. 
    The combination of rotations and shifts ensures that each bit of the original message 
    influences many bits in the final hash.
    Like sigma0, sigma1 is set apart from both uppercase Sigma operations by a critical distinction:
    The final operation is a right SHIFT (>>), not a rotation. This means the top 10 bits of 
    the shifted term are always 0, which introduces asymmetry that enhances the avalanche effect 
    and contributes to diffusion in the message schedule.
    
    Parameters
    ----------
    int x : 32-bit integer value
        
    Returns
    -------
    np.uint32: Result of ROTR¹⁷(x) ⊕ ROTR¹⁹(x) ⊕ SHR¹⁰(x)
    
    """
    # convert the input value x to an unsigned 32-bit value to ensure it is handled correctly 
    x = u_32(x)
    # return the result of performing the following operations on input x:
    # 2 rotations of 17, and then 19, using helper method rotr() and a right shift of 10
    # the final 32-bit value result is found by using XOR to combine the three results from the operations
    return u_32(rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10))

### **How these functions work together**

Ch, Maj, Sigma0, and Sigma1 combine in each of the 64 rounds of the compression function:

Simplified SHA-256 round: 

* T1 = h + Σ₁(e) + Ch(e, f, g) + K[t] + W[t]
* T2 = Σ₀(a) + Maj(a, b, c)

Update working variables

h = g

g = f

f = e

e = d + T1

d = c

c = b

b = a

a = T1 + T2
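
The round described above can be sketched in a few lines. This is a standalone illustration that uses plain Python integers masked to 32 bits rather than the notebook's `np.uint32` helpers. The names `rotr32`, `ch`, `maj`, `big_sigma0`, `big_sigma1` and `sha256_round` are hypothetical and chosen so they do not clash with the functions defined in this notebook.

```python
MASK = 0xFFFFFFFF  # assumption: plain ints, every result reduced mod 2^32


def rotr32(x, n):
    """Rotate the 32-bit word x right by n positions."""
    return ((x >> n) | (x << (32 - n))) & MASK


def ch(x, y, z):
    return ((x & y) ^ (~x & z)) & MASK


def maj(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)


def big_sigma0(x):
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22)


def big_sigma1(x):
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25)


def sha256_round(state, k_t, w_t):
    """Hypothetical helper: apply one round to the tuple (a, b, ..., h)."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k_t + w_t) & MASK
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    # new a = T1 + T2, new e = d + T1, every other variable moves down one place
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


# Initial hash value H(0) from FIPS 180-4, Section 5.3.3
H0 = (0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19)

# Round 0 for the one-block message "abc": K[0], and W[0] = b"abc" followed by 0x80
state = sha256_round(H0, 0x428a2f98, 0x61626380)
print([f"{v:08x}" for v in state])
```

After this first round the new value of a is 5d6aebcd and the new value of e is fa2a4622, while the remaining six variables are simply the old values moved down one position.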

### **Problem 1 Tests**

#### ***Testing u_32() helper function***

In [11]:
print("=== Testing unsigned_32 ===")
value = u_32(np.uint32(0xffffffff) + np.uint32(1))
print("Input: 0xFFFFFFFF + 1")
print("32-bit wrapped output:", u_32(value))
print("Expected: 0x00000000\n")

=== Testing unsigned_32 ===
Input: 0xFFFFFFFF + 1
32-bit wrapped output: 0
Expected: 0x00000000





#### ***Testing Parity() function***

In [12]:
print("\n=== Testing Parity ===")
print("Input: 0b1010, 0b0101, 0b1100")
print("Output:", bin(Parity(np.uint32(0b1010), np.uint32(0b0101), np.uint32(0b1100))))
print("Expected: 0b1010 ^ 0b0101 ^ 0b1100 =", bin(0b1010 ^ 0b0101 ^ 0b1100))


=== Testing Parity ===
Input: 0b1010, 0b0101, 0b1100
Output: 0b11
Expected: 0b1010 ^ 0b0101 ^ 0b1100 = 0b11


#### ***Testing Ch() function***

In [13]:
print("\n=== Testing Ch ===")
print("If x bit = 1 → choose y bit")
print("If x bit = 0 → choose z bit\n")
print("Example:")
print("x = 0xFFFFFFFF, y = 0x12345678, z = 0x87654321")
print("Output:", hex(Ch(np.uint32(0xffffffff), np.uint32(0x12345678), np.uint32(0x87654321))))
print("Expected: y → 0x12345678")


=== Testing Ch ===
If x bit = 1 → choose y bit
If x bit = 0 → choose z bit

Example:
x = 0xFFFFFFFF, y = 0x12345678, z = 0x87654321
Output: 0x12345678
Expected: y → 0x12345678


#### ***Testing Maj() function***

In [14]:
print("\n=== Testing Maj ===")
print("Example bits: x=1, y=1, z=0 → majority is 1")
print("Example bits: x=0, y=0, z=1 → majority is 0\n")
print("Input: 0b1010, 0b1100, 0b1000")
print("Output:", bin(Maj(np.uint32(0b1010), np.uint32(0b1100), np.uint32(0b1000))))
print("Expected majority:", bin(0b1000))


=== Testing Maj ===
Example bits: x=1, y=1, z=0 → majority is 1
Example bits: x=0, y=0, z=1 → majority is 0

Input: 0b1010, 0b1100, 0b1000
Output: 0b1000
Expected majority: 0b1000


#### ***Testing rotr() helper function***

In [15]:
print("\n=== Testing rotr ===")
print("Input: x = 0b0001, n = 1")
print("Output:", bin(rotr(np.uint32(0b0001), 1)))
print("Expected: 0x80000000 bit pattern")
print("\nInput: x = 0x12345678, n = 4")
print("Output:", hex(rotr(np.uint32(0x12345678), 4)))
print("Expected:", hex(0x81234567))


=== Testing rotr ===
Input: x = 0b0001, n = 1
Output: 0b10000000000000000000000000000000
Expected: 0x80000000 bit pattern

Input: x = 0x12345678, n = 4
Output: 0x81234567
Expected: 0x81234567


#### ***Testing Sigma0() function***

In [16]:
print("\n=== Testing Sigma0 ===")
x = np.uint32(0x6a09e667)
print("Input: x =", hex(x))
print("Output:", hex(Sigma0(x)))
print("This mixes the word using 3 rotations.")


=== Testing Sigma0 ===
Input: x = 0x6a09e667
Output: 0xce20b47e
This mixes the word using 3 rotations.


#### ***Testing Sigma1() function***

In [17]:
print("\n=== Testing Sigma1 ===")
x = np.uint32(0xbb67ae85)
print("Input: x =", hex(x))
print("Output:", hex(Sigma1(x)))
print("Used every round inside SHA-256 compression.")


=== Testing Sigma1 ===
Input: x = 0xbb67ae85
Output: 0x758db092
Used every round inside SHA-256 compression.


#### ***Testing sigma0() function***

In [18]:
print("\n=== Testing sigma0 ===")
x = u_32(0x428a2f98)
print("Input:", hex(x))
print("Output:", hex(sigma0(x)))
print("This is used in the message schedule.")


=== Testing sigma0 ===
Input: 0x428a2f98
Output: 0xb332410e
This is used in the message schedule.


#### ***Testing sigma1() function***

In [19]:
print("\n=== Testing sigma1 ===")
x = np.uint32(0x71374491)
print("Input:", hex(x))
print("Output:", hex(sigma1(x)))
print("Also used in message schedule expansion.")


=== Testing sigma1 ===
Input: 0x71374491
Output: 0x4ac6db6c
Also used in message schedule expansion.


## Problem 2: Fractional Parts of Cube Roots
**Brief:** Use numpy to calculate the constants listed at the bottom of page 11 of the Secure Hash Standard, following the steps below. These are the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers.
1. Write a function called primes(n) that generates the first n prime numbers.
2. Use the function to calculate the cube root of the first 64 primes.
3. For each cube root, extract the first thirty-two bits of the fractional part.
4. Display the result in hexadecimal.
5. Test the results against what is in the Secure Hash Standard.



### Problem 2 Introduction:

Problem 1 covered the logical functions used in the SHA-256 compression function. Problem 2 covers another ingredient of that function: the 64 round constants (K₀ through K₆₃) defined in the Secure Hash Standard. SHA-256 operates on 32-bit words, and a different constant is added in each of the 64 rounds so that no two rounds behave identically, which removes symmetries that an attacker could otherwise exploit. 

Cryptographic constants must be verifiable, must look unpredictable so that they introduce no structural weaknesses, and must be free from suspicion of hidden design. Cube roots of primes meet these requirements because they are a trustworthy source of “random-looking” values that cannot be secretly manipulated.
* Primes are neutral: They are well-known, fixed numbers with no hidden structure. Anyone can verify which primes are used.
* Cube roots look random: Taking the cube root of a prime produces an irrational number with no repeating pattern, so its bits don’t follow predictable rules.
* Fractional parts mix well: The decimal part of these cube roots appears random, making it ideal for spreading small input changes across many bits.

Since the process is public and deterministic, it is verifiable that the constants weren’t selected in order to hide weaknesses.

The 64 constants (K0–K63) are derived using the formula: **Kᵢ = ⌊frac(∛pᵢ) × 2³²⌋** 

Where: pᵢ is the i-th prime number (p₀=2, p₁=3, p₂=5, ..., p₆₃=311), ∛ denotes the cube root, frac(·) denotes the fractional part, and ⌊·⌋ rounds down to the nearest integer

From the result of ∛(pᵢ), the fractional part is extracted (the digits after the decimal point), and from that fractional part the first 32 bits are extracted and written in hexadecimal. This process is repeated for each of the first 64 prime numbers, giving us the 64 constants K0–K63.
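
As a worked example, the derivation can be followed by hand for the first prime, p₀ = 2. This is a minimal sketch using only Python's standard `math` module instead of NumPy; the variable names are illustrative and not part of the notebook's functions.

```python
import math

p = 2                            # first prime, p0
root = p ** (1 / 3)              # cube root of 2, roughly 1.2599
frac, whole = math.modf(root)    # split into (fractional part, integer part)
k0 = math.floor(frac * 2**32)    # keep the first 32 bits of the fractional part
print(k0, f"{k0:08x}")           # 1116352408 428a2f98
```

The result, 428a2f98, is the first constant listed in the Secure Hash Standard.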

To implement this logic and calculate these constants, we have implemented four functions that break the process down into simple steps. These four functions are:

* primes(n): generates the first n prime numbers (n = 64 for SHA-256)
* cube_root(primes): calculates the cube roots of those prime numbers
* frac_32(n): extracts the first 32 bits of the fractional part of each cube root
* hex_conversion(constants): converts the resulting 32-bit constants into hexadecimal strings

#### **primes(n)** Function:

**Purpose:** This function implements the first step in the process for creating SHA-256's 64 round constants by generating the first 64 prime numbers (2 through 311).

**Implementation**

To implement this logic, this function uses a trial division algorithm with the following steps:

* Step 1: Start with the number 2
* Step 2: Assume the current number is prime
* Step 3: Test if the current number is divisible by any integer from 2 up to (but not including) the number itself
* Step 4: If no divisors are found, the candidate is prime and is added to the list of primes. If any divisor is found, the candidate is not prime and is discarded. 
* Step 5: Increment the number, i.e. move to the next number incrementally 
* Step 6: Repeat the process until all n (in this case 64) primes have been generated
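
Trial division does not have to test every integer below the candidate: if a number has a divisor larger than its square root, it must also have one smaller than its square root. The sketch below shows that variant as an assumption-labelled alternative. `first_primes` is a hypothetical name and is not the notebook's `primes(n)`, which tests all divisors up to num - 1.

```python
import math


def first_primes(n):
    """Hypothetical variant of primes(n): trial division up to sqrt(candidate)."""
    found = []
    candidate = 2
    while len(found) < n:
        limit = math.isqrt(candidate)  # largest integer whose square is <= candidate
        # candidate is prime if no d in 2..limit divides it
        if all(candidate % d != 0 for d in range(2, limit + 1)):
            found.append(candidate)
        candidate += 1
    return found


print(first_primes(10))       # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(first_primes(64)[-1])   # 311
```

For only 64 primes the saving is negligible, so the simpler loop is a reasonable choice here.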

**Why Primes for SHA-256**

Prime numbers are used because they:
* Are mathematically fundamental with no hidden structure
* Have no positive divisors other than 1 and themselves (by definition)
* Provide transparent, verifiable constant generation
* Create "nothing-up-my-sleeve" numbers (no backdoors)
* Ensure constants appear random but are reproducible

**Relation to other functions**

The primes(n) function generates the first 64 prime numbers, which are then passed to the cube_root(primes) function to compute their cube roots. It therefore provides the first step in the constant generation process. 

In [20]:
def primes(n):
    """
    Purpose: generate the first n prime numbers
    
    This function implements a trial division algorithm to find prime numbers sequentially.

    Parameters
    ----------
    n : int
        Number of prime numbers to generate (SHA-256 uses n=64)
        
    Returns
    -------
    primes : list of int
        First n prime numbers in ascending order
        For n=64: [2, 3, ..., 311]

    """

    # list to store prime numbers
    primes = []

    # start from the first prime
    num = 2

    # start loop that runs until all n prime numbers have been found
    while len(primes) < n:
        # flag to keep primality of numbers
        is_prime = True
        # test divisibility of current num by all integers from 2 up to (num - 1)
        for i in range(2,num):
            # If num is divisible by i, it is not a prime number
            if num % i == 0:
                # change the flag to mark num as a non-prime number
                is_prime = False
                # exit loop early once number is found to be non-prime
                break
        # check flag to see if num is prime
        if is_prime:
            # add the prime number to the list
            primes.append(num)
        # move on to the next integer to test
        num += 1
    # return the list of the first n prime numbers
    return primes

#### **cube_root(primes)** Function:

**Purpose:** This function implements the second step in the process for creating SHA-256's 64 round constants by calculating the cube roots (∛x) of the first 64 prime numbers (2 through 311), which were generated using the previous function **primes(n)**.

**Implementation**

The cube root of a number x is the value y such that y³ = x, written ∛x.

To implement this logic, the cube_root(primes) function uses NumPy's cbrt function to calculate the cube root of each value in the "primes" list. NumPy's cbrt function is vectorised, meaning it applies cbrt(x) = x^(1/3) to every element in the primes list independently, so no explicit Python loop is required. cbrt() accepts integers, floats, and negative values, and produces a NumPy array of floating-point values.

The cube_root(primes) function implements this logic using the following steps: 

* Step 1: Take in a list "primes" of n prime numbers, in this case the first 64 prime numbers
* Step 2: Create an empty list "cube_roots" intended to store the calculated cube root values
* Step 3: Compute the cube roots of each prime in "primes" by calling cbrt(), and store the resulting array in "cube_roots"
* Step 4: Return the NumPy array "cube_roots" containing the cube roots of all values in the input list "primes"

**Why Cube Roots for SHA-256**

SHA-256 uses cube roots of primes for the K constants because they are irrational numbers, which have infinite, non-repeating expansions, making their fractional parts appear random while remaining reproducible. SHA-256's initial hash values are derived from the square roots of the first eight primes, so using cube roots for the K constants keeps the two sets of constants distinct.

**Relation to other functions**

The cube_root(primes) function calculates the cube roots of the first 64 prime numbers, provided by the primes(n) function. The first 32 bits of the fractional part of these cube roots are then extracted by the frac_32(n) function, which is the next step in the constant generation process. 

In [21]:
def cube_root(primes):
    """
    Purpose: Compute the cube roots of a list of prime numbers
    
    Calculates ∛p for each prime p using NumPy's cbrt() function. 
    
    Parameters
    ----------
    primes : list of int
        List of prime numbers (typically first 64 primes)
        
    Returns
    -------
    numpy.ndarray of float64
        Cube root of each prime

    """

    # create list to store cube_roots
    cube_roots = [] 
    # calculate cube roots of values in "primes" list using NumPy's cbrt()
    cube_roots = np.cbrt(primes)
    # return the NumPy array of cube roots
    return cube_roots

#### **frac_32() Function:**

**Why in SHA-256:** Using the fractional part ensures a uniform, non-repeating bit pattern for cryptographic mixing.

**Purpose:** This function implements the third and final step in the process for generating SHA-256's 64 round constants by extracting the first 32 bits of the fractional part of the cube roots of the first 64 prime numbers calculated by the cube_root(primes) function. Essentially, this function uses the functions implemented previously (primes(n), cube_root(primes)) to generate the 64 K constants, each of which is used in one round of SHA-256's compression function.

**Implementation**

The function frac_32(n) uses np.modf() to split each cube root into two parts, returned in the order (fractional, integer). Next, it multiplies each fractional part by 2^32, which moves the first 32 fractional bits into the integer part of the product, and applies NumPy's floor() function to round each value down to the nearest integer. Rounding down ensures no rounding up occurs, as this could change bit patterns and produce incorrect constants. Finally the result is cast to an unsigned 32-bit integer using NumPy's uint32 data type.

The function frac_32(n) implements this logic using the following steps:

* Step 1: Call the primes(n) function to generate the first n prime numbers (n = 64 for SHA-256)
* Step 2: Call the cube_root(primes) function to calculate the cube roots of those prime numbers
* Step 3: Split each cube root into its fractional part and integer part using NumPy's modf() function. The fractional parts are stored in "frac" while the integer parts are discarded using a placeholder variable "_". 
* Step 4: Scale the fractional parts by multiplying "frac" by 2^32, so that the first 32 bits of each fractional part sit to the left of the binary point
* Step 5: Remove any remaining fractional bits from the scaled values using NumPy's floor() function
* Step 6: Convert the results to 32-bit unsigned integers by using NumPy's uint32 data type
* Step 7: Return a NumPy array containing the 64 constants derived from the fractional parts of the cube roots of the first 64 prime numbers, stored as unsigned 32-bit integers
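
Because the steps above rely on floating-point cube roots, it can be reassuring to cross-check a few constants with exact integer arithmetic. The sketch below is one possible check, not the notebook's method. It relies on the identity ⌊∛p × 2³²⌋ = ⌊∛(p × 2⁹⁶)⌋, and `icbrt` and `k_constant` are hypothetical helper names.

```python
def icbrt(n):
    """Integer cube root: the largest r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    # invariant: lo**3 <= n < hi**3
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo


def k_constant(p):
    """First 32 fractional bits of the cube root of p, using integers only."""
    # icbrt(p * 2^96) equals floor(cbrt(p) * 2^32); the low 32 bits drop the
    # integer part of the cube root and keep the scaled fractional part
    return icbrt(p << 96) & 0xFFFFFFFF


for p in (2, 3, 311):
    print(p, f"{k_constant(p):08x}")
```

For p = 2, 3 and 311 this gives 428a2f98, 71374491 and c67178f2, which are the first, second and last constants in the standard.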

**Usage in SHA-256**

The 64 K constants, as defined in SHA-256, are added to the working variables in each round of the SHA-256 compression function:

In round t (0 ≤ t ≤ 63): **T1 = h + Σ₁(e) + Ch(e, f, g) + K[t] + W[t]**

Each round uses a different constant K[t] to prevent patterns/symmetries in the algorithm, ensure each round has unique behavior, and maximise diffusion across all 64 rounds, which overall enhances security.

**Relation to other functions**

The frac_32(n) function generates the 64 K constants by calling the cube_root(primes) function on the first 64 prime numbers, which are provided by the primes(n) function.

In [22]:
def frac_32(n):
    """
    Purpose: Generate the first n K constants of SHA-256's compression function

    This function extracts the first 32 bits of the fractional parts of the
    cube roots of the first n prime numbers, and converts them to unsigned
    32-bit integers. 

    Parameters
    ----------
    n : int
        Number of constants to generate (SHA-256 uses n=64)
    
    Returns
    -------
    numpy.ndarray of np.uint32
        Array of n K constants as 32-bit unsigned integers

    """
    # get the first n prime numbers 
    p = primes(n)
    # get the cube roots of those prime numbers
    cube_roots = cube_root(p)
    # extract the fractional part of the cube roots using NumPy's modf()
    frac, _ = np.modf(cube_roots)
    # scale each fractional part by 2^32 and round down with NumPy's floor(),
    # so the integer part of the product holds the first 32 fractional bits
    # convert these to unsigned 32-bit integers using NumPy's uint32
    constants = np.floor(frac * (2**32)).astype(np.uint32)
    # return the constants
    return constants 

#### **hex_conversion(constants) Function:**

**Purpose:** Helper function that converts the 32-bit integer K constants generated by the functions implemented above (primes(n), cube_root(primes), frac_32(n)) into 8-character hexadecimal strings. This allows us to verify the correctness of the generated constants by comparing them to the official K constants specified in the Secure Hash Standard, which are listed in hexadecimal format. 

**Implementation**

This function converts unsigned 32-bit integers into zero-padded hexadecimal strings for verification against the Secure Hash Standard K constant values.
The hex_conversion() function implements this using the following steps:

* Step 1: Read in the NumPy array of 32-bit integer constants generated using the frac_32(n) function
* Step 2: Loop over the constants array and convert each uint32 to hex using Python's hex() function
* Step 3: Remove the '0x' prefix which is generated when converting to hex, using string slicing [2:]
* Step 4: Zero-pad to 8 characters using zfill(8), because hex() omits leading zeros, so any constant below 0x10000000 would otherwise come out shorter than 8 characters
* Step 5: Return the list of constants as formatted hex strings
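
The effect of the zero-padding step is easiest to see on constants that begin with a zero digit. The following sketch uses plain Python integers, which is an assumption made for illustration since the notebook passes `np.uint32` values. It compares the steps above with Python's built-in format specification, which produces the same string in one call.

```python
values = [0x428a2f98, 0x0fc19dc6, 0x06ca6351]   # three K constants from the standard

for v in values:
    unpadded = hex(v)[2:]            # hex() drops leading zeros
    via_hex = unpadded.zfill(8)      # the steps above: strip '0x', then pad
    via_format = format(v, "08x")    # format spec: 8 hex digits, zero-padded
    print(unpadded, via_hex, via_format)
```

For 0x0fc19dc6 the unpadded string is only seven characters long ("fc19dc6"), which is why the padding step is needed before comparing against the published values.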

In [23]:
def hex_conversion(constants):
    """
    Purpose: Convert 32-bit constants to hexadecimal string representations

    This function transforms numpy uint32 array into zero-padded hexadecimal strings
    using hex conversion (hex()), slicing ([2:]), and zero-padding (zfill(8))

    Parameters
    ----------
    constants : numpy array of 32-bit unsigned integer constants
        
    Returns
    -------
    list of str
        Hexadecimal strings, each 8 characters long (no '0x' prefix)

    """

    # create array for storing converted constants
    hex_constants = []
    # loop over each constant in constants array
    for c in constants:
        # convert each constant to a zero-padded, 8-character hexadecimal string
        hex_constants.append(hex(c)[2:].zfill(8))
    # return the array of constants as hex strings
    return hex_constants

#### The official 64 K constants as defined in the Secure Hash Standard

In [24]:
K = np.array([
    "428a2f98", "71374491", "b5c0fbcf", "e9b5dba5", "3956c25b", "59f111f1", "923f82a4", "ab1c5ed5",
    "d807aa98", "12835b01", "243185be", "550c7dc3", "72be5d74", "80deb1fe", "9bdc06a7", "c19bf174",
    "e49b69c1", "efbe4786", "0fc19dc6", "240ca1cc", "2de92c6f", "4a7484aa", "5cb0a9dc", "76f988da",
    "983e5152", "a831c66d", "b00327c8", "bf597fc7", "c6e00bf3", "d5a79147", "06ca6351", "14292967",
    "27b70a85", "2e1b2138", "4d2c6dfc", "53380d13", "650a7354", "766a0abb", "81c2c92e", "92722c85",
    "a2bfe8a1", "a81a664b", "c24b8b70", "c76c51a3", "d192e819", "d6990624", "f40e3585", "106aa070",
    "19a4c116", "1e376c08", "2748774c", "34b0bcb5", "391c0cb3", "4ed8aa4a", "5b9cca4f", "682e6ff3",
    "748f82ee", "78a5636f", "84c87814", "8cc70208", "90befffa", "a4506ceb", "bef9a3f7", "c67178f2"
], dtype=str)

### **Problem 2 Testing**

#### ***Testing primes(n)***

In [25]:
print("\n==================== TEST 1: PRIME NUMBER GENERATION ====================")
print("Purpose: Verify that the first 64 prime numbers are generated correctly.")

prime_nums = primes(64)

print("\nGenerated primes:")
print(prime_nums)
print(f"Number of primes generated: {len(prime_nums)} (expected: 64)")


Purpose: Verify that the first 64 prime numbers are generated correctly.

Generated primes:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311]
Number of primes generated: 64 (expected: 64)


#### ***Testing cube_root(primes)***

In [26]:
print("\n==================== TEST 2: CUBE ROOT CALCULATION ====================")
print("Purpose: Compute cube roots of the first 64 primes as required by FIPS 180-4.")

cubes = cube_root(prime_nums)

print("\nCube roots of primes:")
print(cubes)


Purpose: Compute cube roots of the first 64 primes as required by FIPS 180-4.

Cube roots of primes:
[1.25992105 1.44224957 1.70997595 1.91293118 2.22398009 2.35133469
 2.57128159 2.66840165 2.84386698 3.07231683 3.14138065 3.33222185
 3.44821724 3.50339806 3.60882608 3.75628575 3.89299642 3.93649718
 4.0615481  4.14081775 4.1793392  4.29084043 4.36207067 4.4647451
 4.59470089 4.65700951 4.68754815 4.7474594  4.77685618 4.83458813
 5.0265257  5.07875308 5.15513674 5.18010147 5.30145919 5.32507402
 5.39469071 5.46255557 5.50687845 5.57205466 5.63574079 5.65665283
 5.75896522 5.77899657 5.81864787 5.83827246 5.95334181 6.06412699
 6.1001702  6.11803317 6.15344949 6.20582179 6.22308425 6.30799355
 6.35786118 6.40695858 6.45531481 6.47127363 6.51868392 6.54991162
 6.56541443 6.6418522  6.74599671 6.77516895]


#### ***Testing frac_32()***

In [27]:
print("\n==================== TEST 3: FRACTIONAL PART EXTRACTION/CONSTANTS GENERATION ====================")
print("Purpose: Separate fractional and integer parts of cube roots.")
print("Only the fractional part is used to derive SHA-256 constants.")

constants = frac_32(64)

print("\n32-bit integer K constants generated using the first 64 prime numbers:")
print(constants)


Purpose: Separate fractional and integer parts of cube roots.
Only the fractional part is used to derive SHA-256 constants.

32-bit integer K constants generated using the first 64 prime numbers:
[1116352408 1899447441 3049323471 3921009573  961987163 1508970993
 2453635748 2870763221 3624381080  310598401  607225278 1426881987
 1925078388 2162078206 2614888103 3248222580 3835390401 4022224774
  264347078  604807628  770255983 1249150122 1555081692 1996064986
 2554220882 2821834349 2952996808 3210313671 3336571891 3584528711
  113926993  338241895  666307205  773529912 1294757372 1396182291
 1695183700 1986661051 2177026350 2456956037 2730485921 2820302411
 3259730800 3345764771 3516065817 3600352804 4094571909  275423344
  430227734  506948616  659060556  883997877  958139571 1322822218
 1537002063 1747873779 1955562222 2024104815 2227730452 2361852424
 2428436474 2756734187 3204031479 3329325298]


#### ***Testing hex_conversion***

In [28]:
print("\n==================== TEST 4: HEXADECIMAL CONVERSION ====================")
print("Purpose: Convert constants to hexadecimal format for comparison with FIPS values.")

hex_constants = hex_conversion(constants)

print("\nGenerated hex constants:")
print(hex_constants)

print("\nThe generated constants and official constants are equal: ", np.array_equal(K, hex_constants))


Purpose: Convert constants to hexadecimal format for comparison with FIPS values.

Generated hex constants:
['428a2f98', '71374491', 'b5c0fbcf', 'e9b5dba5', '3956c25b', '59f111f1', '923f82a4', 'ab1c5ed5', 'd807aa98', '12835b01', '243185be', '550c7dc3', '72be5d74', '80deb1fe', '9bdc06a7', 'c19bf174', 'e49b69c1', 'efbe4786', '0fc19dc6', '240ca1cc', '2de92c6f', '4a7484aa', '5cb0a9dc', '76f988da', '983e5152', 'a831c66d', 'b00327c8', 'bf597fc7', 'c6e00bf3', 'd5a79147', '06ca6351', '14292967', '27b70a85', '2e1b2138', '4d2c6dfc', '53380d13', '650a7354', '766a0abb', '81c2c92e', '92722c85', 'a2bfe8a1', 'a81a664b', 'c24b8b70', 'c76c51a3', 'd192e819', 'd6990624', 'f40e3585', '106aa070', '19a4c116', '1e376c08', '2748774c', '34b0bcb5', '391c0cb3', '4ed8aa4a', '5b9cca4f', '682e6ff3', '748f82ee', '78a5636f', '84c87814', '8cc70208', '90befffa', 'a4506ceb', 'bef9a3f7', 'c67178f2']

The generated constants and official constants are equal:  True


## Problem 3: Padding

**Brief:** Write a generator function block_parse(msg) that processes messages according to section 5.1.1 and 5.2.1 of the Secure Hash Standard. The function should accept a bytes object called msg. At each iteration, it should yield the next 512-bit block of msg as a bytes object. Ensure that the final block (or final two blocks) include(s) the required padding of msg as specified in the standard. Test the generator with messages of different lengths to confirm proper padding and block output.

###  Problem 3 Introduction

Before the SHA-256 compression function can process a message, the message must be prepared so that it has the right size and format. This section details the first two aspects of this preprocessing: padding and parsing. 

To give some context, the SHA-256 compression function processes input data in fixed-size chunks of 512 bits called blocks, but messages arrive in arbitrary lengths. Therefore every message needs to be converted into a format that the compression function can read: blocks of 512 bits. To achieve this, there are two main steps to follow:

* Padding the message: extend the message to a multiple of 512 bits by appending a single 1 bit, k zero bits, and the 64-bit message length (Section 5.1.1)
* Parsing the message: divide the padded message into fixed-size (512 bits) blocks (Section 5.2.1)

**Importance** 

These two stages are essential parts of the compression function process, and are needed for the following reasons: 

* Needed for processing: Without this, short messages couldn't be processed and long messages couldn't be divided uniformly, rendering the compression function ineffective. Padding ensures every block is exactly 512 bits.
* Ensures unambiguous message representation: Two different messages never produce the same padded result. 
* Encodes the message length: The original length of the message is embedded in the padded data, so messages of different lengths cannot pad to the same sequence of blocks. 
* Remains deterministic: The same message will always receive the same padding and therefore produce the same hash. This ensures that all SHA-256 implementations are consistent and reproducible. 

**Padding:** Make the message length a multiple of 512 bits by applying the following process to a message that is l bits long:

* Add a "1" bit to the end of the message
* Add k zero bits, where k is the smallest non-negative number such that l + 1 + k ≡ 448 (mod 512), i.e. (l + 1 + k) leaves a remainder of 448 when divided by 512
* Add a 64-bit big-endian representation of l (the original message length in bits)
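
The padding rule can be checked with a few lines of arithmetic. The sketch below assumes the message is a whole number of bytes; `zero_bits_needed` and `pad` are hypothetical helper names used only for this illustration and are not the functions implemented later in this section.

```python
def zero_bits_needed(l):
    """Hypothetical helper: smallest k >= 0 with l + 1 + k = 448 (mod 512)."""
    return (448 - (l + 1)) % 512


def pad(msg):
    """Hypothetical helper: return msg followed by its SHA-256 padding."""
    l = len(msg) * 8                  # message length in bits
    k = zero_bits_needed(l)
    # the byte 0x80 is the '1' bit followed by seven of the k zero bits
    zeros = bytes((k - 7) // 8)
    return msg + b"\x80" + zeros + l.to_bytes(8, "big")


print(zero_bits_needed(24))        # 423 zero bits for the 24-bit message "abc"
print(len(pad(b"abc")))            # 64 bytes: one 512-bit block
print(len(pad(b"a" * 56)))         # 128 bytes: the padding spills into a second block
```

A 55-byte message is the longest that still fits in a single block, because the 0x80 byte and the 8-byte length then fill the block exactly.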

**Parsing:** Once padded, the message is broken down into n chunks (blocks) of 512 bits. Each 512-bit block is then subdivided into sixteen 32-bit words as: ***16 × 32 = 512***, meaning that each block i contains words: M₀⁽ⁱ⁾, M₁⁽ⁱ⁾, M₂⁽ⁱ⁾, ... M₁₅⁽ⁱ⁾.
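
Parsing can be sketched in the same way. The example below splits an already padded message into 64-byte blocks and each block into sixteen big-endian 32-bit words; `split_blocks` and `block_words` are hypothetical names, and the padded message for "abc" is written out by hand for the example.

```python
def split_blocks(padded):
    """Hypothetical helper: yield each 64-byte (512-bit) block of a padded message."""
    for i in range(0, len(padded), 64):
        yield padded[i:i + 64]


def block_words(block):
    """Hypothetical helper: split one block into sixteen big-endian 32-bit words."""
    return [int.from_bytes(block[j:j + 4], "big") for j in range(0, 64, 4)]


# "abc" + the '1' bit (0x80) + 52 zero bytes + the 64-bit length (24 bits)
padded = b"abc" + b"\x80" + bytes(52) + (24).to_bytes(8, "big")

for block in split_blocks(padded):
    words = block_words(block)
    print(len(words), f"{words[0]:08x}", f"{words[15]:08x}")   # 16 61626380 00000018
```

The first word holds the three message bytes followed by 0x80, and the last word holds the low 32 bits of the message length.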

To achieve padding and parsing, this section implements four functions which help in this process: 
 
* file_parse(filepath): Reads a file in binary mode and returns its contents as bytes
* calculate_padding_length(message_length_bytes): Calculates the padding needed for a message
* create_padding_block(partial_block, message_length_bits): Apply the calculated padding to the message
* block_parse(msg): Parse the padded message by dividing it into n blocks of exactly 512 bits each

#### **file_parse(filepath) function:**

**Purpose:** This function reads in a file in binary mode and returns its contents as a bytes object. This ensures that the message is in the right format for the rest of the preprocessing stage to work correctly. 
Binary mode is used because it reads the exact bytes without interpretation, which avoids text-mode issues such as encoding errors and altered line endings, and means the function is not limited to text files. The benefits of binary mode are as follows:

* All data is preserved exactly as it is stored 
* Reading is byte-for-byte, with no encoding conversions
* Works with any file type

Essentially using binary mode means a file’s contents are read exactly as stored, without modification, which ensures reproducible hash values.

**Implementation**

The file_parse(filepath) function implements the logic described above by taking in a filepath as input, and opening the file inside a 'with' statement. The file object acts as a context manager, which ensures resources are managed safely and automatically, i.e. there is no need to close the file explicitly because it is closed when the 'with' block ends. The file is read in binary mode by using 'rb' where r = read and b = binary. Once everything is read, the bytes contained in the file are returned. The file_parse(filepath) function uses the following steps to implement this logic:

* Step 1: Read in the filepath
* Step 2: Open the file using a 'with' statement
* Step 3: Read the file in binary mode
* Step 4: Read all bytes and store them
* Step 5: Return the stored bytes 

**Usage in SHA-256**

SHA-256 operates on raw bytes, not text, therefore all messages must be converted to/read as bytes. This makes the file conversion function file_parse(filepath) an essential step in the preprocessing stage.

**Relation to other functions**

This function prepares the content so it can be properly padded/parsed by the other functions implemented in this section.

In [29]:
def file_parse(filepath):
    """
    Purpose: Read a file in binary mode and return its contents as bytes

    Parameters
    ----------
    filepath : str
        Path to the file to read
        
    Returns
    -------
    bytes
        Complete file contents as a bytes object
    """

    # Open the file using context manager 'with'
    with open(filepath, 'rb') as f: # Read the file in binary mode

        # Read all bytes from the file
        msg = f.read()
    # Automatically close the file after reading

    # Return all bytes read from file
    return msg

#### **create_padded_blocks(partial_block, message_length_bits, BLOCK_SIZE) function**

**Purpose** 

This function implements the SHA-256 message padding stage as specified in Section 5.1.1 of the Secure Hash Standard (FIPS 180-4). Its responsibility is to take the remaining bytes of a message that do not form a complete 512-bit block and generate the final padded block(s) required for SHA-256 processing. This function is implemented as a Python generator, meaning it yields padded blocks one at a time. This allows seamless integration with the message parsing stage and produces a continuous stream of valid SHA-256 input blocks.

The function ensures that the padding follows the SHA-256 specification exactly so that:

* A single '1' bit (represented as a byte 0x80) is appended immediately after the message
* The message is padded with k 0 bits where k is the smallest non-negative solution to (l + 1 + k) ≡ 448 (mod 512)
* The original message length (in bits) is appended as a 64-bit big-endian integer
* The final output consists of exactly one or two 512-bit (64-byte) blocks
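
The number of zero bits k in the rule above can be computed directly with modular arithmetic. The following is a minimal sketch; the helper name `zero_padding_bits` is hypothetical and not part of this notebook's implementation.

```python
def zero_padding_bits(message_length_bits):
    """Smallest non-negative k with (l + 1 + k) congruent to 448 (mod 512)."""
    return (448 - (message_length_bits + 1)) % 512

# 'abc' is 24 bits long: 24 + 1 + 423 = 448
print(zero_padding_bits(24))    # 423

# A 55-byte message (440 bits) needs only the seven zero bits that complete the 0x80 byte
print(zero_padding_bits(440))   # 7

# A 56-byte message (448 bits) forces the padding into a second block
print(zero_padding_bits(448))   # 511
```

In each case l + 1 + k + 64 is a multiple of 512, which is why the padded message always ends on a block boundary.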

**Implementation**

The create_padded_blocks() function receives the final partial block in need of padding, the original message length in bits, and the block size (64 bytes) from the block_parse() function. Based on how much space is available in the final block, it applies one of three padding scenarios defined by the SHA-256 standard:

* ***Scenario 1: Sufficient Space (≥9 bytes available)***

This scenario occurs when the partial block has at least 9 bytes of space remaining: 1 byte for 0x80 + 8 bytes for the length encoding

When it applies: Partial block is 55 bytes or less

What happens: All padding fits in a single 64-byte block

* ***Scenario 2: Tight Space (1-8 bytes available)***

This scenario occurs when the partial block has between 1 and 8 bytes of space remaining. There's room for the 0x80 byte but not enough room for the 8-byte length encoding.

When it applies: Partial block is between 56 and 63 bytes

What happens: Padding requires two 64-byte blocks

* ***Scenario 3: Perfect Fit / No Space (0 bytes available)***

This scenario covers the case where the message length is exactly a multiple of 64 bytes, so no message bytes are left over and the padding needs a block of its own.

When it applies: Message length is exactly 64, 128, 192, ... bytes (multiples of 64), i.e. the partial block extracted by block_parse() is empty 

What happens: A complete padding-only block is created

Note: This also covers the empty message case (0 bytes), where the entire message consists of just this one padding block. In the implementation below, an empty partial block leaves all 64 bytes free, so the first branch (≥9 bytes available) builds this padding-only block; the final else branch produces the identical block and acts as a safeguard for a block with no free space.
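
The scenarios above determine how many 64-byte blocks a message occupies after padding. A minimal sketch of that count follows, assuming a hypothetical helper `padded_block_count` that is not part of this notebook's implementation.

```python
def padded_block_count(message_length_bytes):
    """Total number of 64-byte blocks after SHA-256 padding (illustrative helper)."""
    full_blocks, remainder = divmod(message_length_bytes, 64)
    # 0x80 plus the 8-byte length need 9 bytes, so up to 55 leftover bytes
    # fit in one final block; 56 to 63 leftover bytes need a second block
    final_blocks = 1 if remainder <= 55 else 2
    return full_blocks + final_blocks

for n in (0, 55, 56, 63, 64, 200):
    print(n, padded_block_count(n))
```

For these lengths the counts are 1, 1, 2, 2, 2 and 4 blocks respectively.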

**Usage in SHA-256**

SHA-256 requires that:

* The total message length (after padding) is a multiple of 512 bits
* The final 64 bits encode the original message length in bits
* At least one bit of padding (the '1' bit) is always added

This function guarantees that these conditions are met. It ensures that the final padded block(s):

* Are exactly 64 bytes (512 bits) long
* Preserve the original message length in the final 8 bytes
* Contain the mandatory 0x80 padding byte
* Are valid inputs for the SHA-256 compression function

Because the function yields blocks one at a time, SHA-256 can process each padded block immediately without storing all blocks in memory. This is particularly important for: 

* Large files that would consume excessive memory if all blocks were stored
* Streaming applications where data arrives incrementally
* Memory-constrained environments

**Relation to other functions**

This function is called by block_parse(msg) after all full 64-byte blocks have been yielded. It receives input from block_parse(msg): 

* partial_block: The remaining bytes after all complete blocks (0-63 bytes)
* message_length_bits: Original message length in bits


It then applies one of three padding scenarios based on space available and yields an output to the caller of block_parse(msg) (via yield from in block_parse()): 

* One or two 64-byte padded blocks

***Integration:*** When used with yield from, its output is indistinguishable from blocks yielded directly by block_parse(), resulting in a single continuous stream of 512-bit blocks ready for SHA-256 processing.
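
The delegation described here can be illustrated with two toy generators. The names `outer` and `helper` and the string values are placeholders chosen for this sketch; they stand in for block_parse() and create_padded_blocks() and do not process real blocks.

```python
def helper():
    """Stand-in for the generator that produces the final padded block(s)."""
    yield "padded-0"
    yield "padded-1"

def outer():
    """Stand-in for the generator that yields full blocks, then delegates."""
    yield "full-0"
    yield "full-1"
    # Every value produced by helper() is passed straight through to the caller
    yield from helper()

print(list(outer()))   # ['full-0', 'full-1', 'padded-0', 'padded-1']
```

The caller iterates over one flat sequence and cannot tell which generator produced each value.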

In [30]:
def create_padded_blocks(partial_block, message_length_bits, BLOCK_SIZE):
    """
    Purpose: Create final padded block(s) for SHA-256.
    
    Parameters
    ----------
    partial_block : bytes
        Remaining message bytes (0-63 bytes)
    message_length_bits : int
        Original message length in bits
    BLOCK_SIZE : int
        Block size in bytes (64 for SHA-256)
        
    Yields
    ------
    bytes
        One or two 64-byte padded blocks
    """

    remaining_space = BLOCK_SIZE - len(partial_block)
    
    # Scenario 1: At least 9 bytes available (1 for 0x80 + 8 for length)
    if remaining_space >= 9:
        padding_zeros = remaining_space - 1 - 8
        yield (partial_block + 
               bytes([0x80]) + 
               bytes([0x00] * padding_zeros) + 
               message_length_bits.to_bytes(8, byteorder='big'))
    
    # Scenario 2: Between 1 and 8 bytes available
    elif remaining_space >= 1:
        # First block: partial + 0x80 + zeros to fill
        yield partial_block + bytes([0x80]) + bytes([0x00] * (remaining_space - 1))
        # Second block: zeros + length
        yield bytes([0x00] * 56) + message_length_bits.to_bytes(8, byteorder='big')
    
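    # Scenario 3: No space left in the block (safeguard: an empty partial block
    # has 64 bytes free and is handled by Scenario 1 with identical output)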
    else:
        yield (bytes([0x80]) + 
               bytes([0x00] * 55) + 
               message_length_bits.to_bytes(8, byteorder='big'))

#### **block_parse(msg) function**

**Purpose:** 

This function implements the message parsing stage as described in Section 5.2.1 of the Secure Hash Standard. It is responsible for splitting the input message into complete 512-bit (64-byte) blocks and applying SHA-256 padding to the remaining bytes using the create_padded_blocks() helper function. Each block is yielded one at a time, ready for the SHA-256 compression function. The function is implemented as a Python generator, meaning it uses Python's 'yield' statement to produce values on demand instead of returning a full list after all processing is complete, which keeps memory use low for large messages. This mirrors how cryptographic hash functions operate on streams of data, processing it block by block.
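
The on-demand behaviour of a generator can be seen in a small sketch. The helper `chunks` is hypothetical; unlike the function documented here, it applies no padding and simply yields the trailing partial chunk as it is.

```python
def chunks(data, size):
    """Yield successive slices of `data`, each at most `size` bytes (illustrative helper)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]

gen = chunks(b"abcdefghij", 4)

# Nothing is sliced until a value is requested
print(next(gen))   # b'abcd'
print(next(gen))   # b'efgh'

# The remaining values are produced only when the generator is consumed
print(list(gen))   # [b'ij']
```

Each call to `next()` resumes the function just after its last `yield`, so only one slice exists at a time.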

**Implementation** 

To correctly split the input message into 64-byte blocks, the block_parse(msg) function uses a while loop to iterate through the message in increments of 64 bytes. Before the loop, the following variables are initialised: 

* BLOCK_SIZE: initialised to the block size processed by SHA-256: 64 bytes. 
* message_length_bits: the total message length in bits, which is needed later for the padding.
* A pointer variable 'position': which tracks the current position in the message. Essentially, it tracks how many bytes of the message have been processed so far, and increases in increments of 64 (BLOCK_SIZE). 

The while loop yields full blocks only. It does this by using a condition which ensures that only blocks that are exactly 64 bytes long are yielded. This is done by adding together the 'position' pointer (bytes) and the expected size of the next block 'BLOCK_SIZE' (64 bytes), and checking the result against the total length of the message (bytes). By doing this, the loop checks if the next block is a partial block. 

* If the loop condition is satisfied, then the 64 bytes of the message at the current position are yielded to the caller, and the position is incremented by 64 bytes (BLOCK_SIZE)

* If the loop condition is not satisfied, i.e. position + BLOCK_SIZE exceeds the message length, then fewer than 64 bytes remain and the next block would be a partial block. In this case, the loop ends and the partial block is not yielded. 

Once the while loop ends, the remaining bytes of the message (from 'position' to the end) are stored in the variable partial_block. Next, processing is delegated to another generator for padding: the create_padded_blocks() function. Python's 'yield from' statement is used to iterate over all values produced by create_padded_blocks() and yield each padded block as if it were yielded directly by block_parse(). This means the caller sees both the complete blocks and the padded blocks as a single continuous sequence.

* Step 1: Initialise the BLOCK_SIZE to 64 bytes and message_length_bits to the total message length calculated in bits (required for SHA-256 padding)
* Step 2: Initialise the position pointer to 0 
* Step 3: Enter the parsing loop by checking whether position + BLOCK_SIZE is less than or equal to the total message length (in bytes), which guarantees that the next slice will be exactly 64 bytes long.
* Step 4: Yield a complete 64-byte block of the message using python's yield. At this point, the function pauses execution and hands the block to the caller, and resumes when the next block is requested.
* Step 5: Increment the position pointer by 64 bytes in preparation for the next iteration.
* Step 6: Exit the loop when a partial block is detected: fewer than 64 bytes remain. The partial block is not yielded at this stage.
* Step 7: Extract and store any remaining bytes in partial_block. This may be empty (if the message length is a multiple of 64).
* Step 8: Delegate padding to a helper generator by passing the remaining bytes and message length to create_padded_blocks(). Python’s yield from statement is used to yield all padded blocks produced by the helper generator create_padded_blocks() as part of the same output sequence.


**Usage in SHA-256** 

SHA-256 processes messages in 512-bit blocks, one at a time. This function ensures that all full blocks are passed directly to the compression function without modification, and that the final block (or blocks) include the correct padding and length encoding. The total output is always a valid sequence of 512-bit blocks.

**Relation to other functions** 

This function handles block parsing and passes the remaining bytes (the partial block) to create_padded_blocks(), which applies the SHA-256 padding rules and yields the final block or blocks. Together, these functions form a clear separation of responsibilities:
* Parsing full blocks
* Padding the final partial block
* Producing a single continuous stream of valid SHA-256 input blocks

In [31]:
def block_parse(msg):
    """
    Purpose: Parse message into 512-bit blocks with SHA-256 padding

    Generator function that processes a message according to FIPS 180-4
    Sections 5.1.1 (padding) and 5.2.1 (parsing). Yields complete 64-byte
    blocks, with the final block(s) including necessary padding.
    
    Parameters
    ----------
    msg : bytes
        Message to process (can be empty)
        
    Yields
    ------
    bytes
        64-byte (512-bit) blocks, including final padded block(s)
    """
    BLOCK_SIZE = 64  # 512 bits = 64 bytes
    message_length_bits = len(msg) * 8
    position = 0
    
    # Yield all complete 64-byte blocks
    while position + BLOCK_SIZE <= len(msg):
        yield msg[position:position + BLOCK_SIZE]
        position += BLOCK_SIZE
    
    # Get remaining bytes (partial block or empty if msg was multiple of 64)
    partial_block = msg[position:]
    
    # Yield final padded block(s)
    yield from create_padded_blocks(partial_block, message_length_bits, BLOCK_SIZE)

In [32]:
def block_parse_2(msg):
    """Alternative single-function version of block_parse with the padding logic inlined."""
    no_bytes=64
    # Initialize bit counter
    no_bits = len(msg) * 8
    position = 0

    # split msg in 512-bit (64 byte) blocks & yield them
    while position < len(msg):
        block = msg[position:position+no_bytes]
        # check that the block is not a partial block - stops last block from being yielded 
        if len(block) == no_bytes:
            position += no_bytes
            yield block
        else:
            break

    # after the loop, the block variable may still hold the last full block rather than the leftover bytes
    # get the remaining bytes in the partial block:
    block = msg[position:]  # This will be empty if position == len(msg)
    
    # if the msg bytes object is empty, the last (and only) block starts from an empty bytes object
    # (msg[position:] is already empty in that case, so this check is only an explicit safeguard)
    if no_bits == 0:
        block = bytes()

    # Scenario 1: are there at least 9 bytes available?
    if (64 - len(block)) >= 9:
       yield (block
              + bytes([0x80])
              + bytes([0x00] * (64 - len(block) - 1 - 8))
              + no_bits.to_bytes(8, byteorder='big'))
       
    # Scenario 2: there are between 8 and 1 bytes available, inclusive.
    elif (64 - len(block)) >= 1:
       yield block + bytes([0x80]) + bytes([0x00] * (64 - len(block) - 1))
       yield bytes([0x00] * 56) + no_bits.to_bytes(8, byteorder='big')
       
    # Scenario 3: There were no bytes available in the last block.
    else:
       yield bytes([0x80] + ([0x00] * 55)) + no_bits.to_bytes(8, byteorder='big')

In [33]:
def test_padding_scenarios():
    """Test all three padding scenarios with various message lengths."""
    print("=" * 70)
    print("SHA-256 PADDING AND PARSING TESTS")
    print("=" * 70)
    print()
    
    test_cases = [
        (b'', "Empty message"),
        (b'Hello', "Short message (5 bytes)"),
        (b'x' * 55, "Exactly 55 bytes (perfect fit)"),
        (b'x' * 56, "Exactly 56 bytes (boundary case)"),
        (b'x' * 63, "Exactly 63 bytes (1 byte space)"),
        (b'x' * 64, "Exactly 64 bytes (no space)"),
        (b'x' * 100, "Multiple blocks (100 bytes)"),
        (b'x' * 150, "Large message (150 bytes)"),
    ]
    
    for msg, description in test_cases:
        print(f"Test: {description}")
        print("-" * 70)
        
        blocks = list(block_parse(msg))
        
        print(f"Message length: {len(msg)} bytes ({len(msg) * 8} bits)")
        print(f"Number of blocks: {len(blocks)}")
        
        # Verify all blocks are 64 bytes
        all_correct_size = all(len(block) == 64 for block in blocks)
        print(f"All blocks 64 bytes: {'✓' if all_correct_size else '✗'}")
        
        # Check last block
        last_block = blocks[-1]
        
        # Find padding byte (0x80). When the padding spans two blocks
        # (56-63 leftover bytes), 0x80 sits in the previous block and the
        # last block holds only zeros and the length, so the search can fail.
        padding_index = None
        try:
            padding_index = last_block.index(0x80)
            print(f"Padding byte (0x80) at position: {padding_index}")
        except ValueError:
            print("Warning: 0x80 not found in last block")
        
        # Check encoded length (last 8 bytes)
        encoded_length = int.from_bytes(last_block[-8:], byteorder='big')
        expected_length = len(msg) * 8
        length_correct = encoded_length == expected_length
        
        print(f"Encoded length: {encoded_length} bits")
        print(f"Expected length: {expected_length} bits")
        print(f"Length encoding correct: {'✓' if length_correct else '✗'}")
        
        # Display block contents (abbreviated)
        for i, block in enumerate(blocks):
            if i == len(blocks) - 1:  # Last block
                print(f"\nBlock {i} (final, with padding):")
                # Only a single-block result carries the message in the final block
                if len(blocks) == 1:
                    msg_part = block[:len(msg)] if len(msg) > 0 else b''
                    print(f"  Message portion: {msg_part[:20]}{'...' if len(msg_part) > 20 else ''}")
                # Skip this line when 0x80 was not found in the last block
                if padding_index is not None:
                    print(f"  Padding byte (0x80): {hex(block[padding_index])}")
                print(f"  Last 8 bytes (length): {block[-8:].hex()}")
            else:
                print(f"\nBlock {i}: {block[:20].hex()}... (64 bytes total)")
        
        print()
        print()


def verify_block_parse():
    """Comprehensive verification of block_parse function."""
    print("=" * 70)
    print("COMPREHENSIVE VERIFICATION")
    print("=" * 70)
    print()
    
    # Test 1: Empty message produces one block
    blocks = list(block_parse(b''))
    assert len(blocks) == 1, "Empty message should produce 1 block"
    assert len(blocks[0]) == 64, "Block should be 64 bytes"
    assert blocks[0][0] == 0x80, "First byte should be 0x80"
    print("✓ Test 1: Empty message - PASSED")
    
    # Test 2: Short message
    blocks = list(block_parse(b'abc'))
    assert len(blocks) == 1, "Short message should produce 1 block"
    assert blocks[0][:3] == b'abc', "Message should be at start"
    assert blocks[0][3] == 0x80, "Padding byte after message"
    length = int.from_bytes(blocks[0][-8:], byteorder='big')
    assert length == 24, "Length should be 24 bits"
    print("✓ Test 2: Short message - PASSED")
    
    # Test 3: Message exactly 55 bytes
    blocks = list(block_parse(b'x' * 55))
    assert len(blocks) == 1, "55-byte message should fit in 1 block"
    assert blocks[0][55] == 0x80, "Padding byte at position 55"
    print("✓ Test 3: 55-byte message - PASSED")
    
    # Test 4: Message exactly 56 bytes (boundary)
    blocks = list(block_parse(b'x' * 56))
    assert len(blocks) == 2, "56-byte message should need 2 blocks"
    assert blocks[0][56] == 0x80, "Padding byte at position 56 in block 1"
    print("✓ Test 4: 56-byte message - PASSED")
    
    # Test 5: Message exactly 64 bytes
    blocks = list(block_parse(b'x' * 64))
    assert len(blocks) == 2, "64-byte message should need 2 blocks"
    assert blocks[1][0] == 0x80, "Second block should start with 0x80"
    print("✓ Test 5: 64-byte message - PASSED")
    
    # Test 6: Large message
    blocks = list(block_parse(b'x' * 200))
    assert len(blocks) == 4, "200-byte message should produce 4 blocks"
    assert all(len(b) == 64 for b in blocks), "All blocks should be 64 bytes"
    print("✓ Test 6: Large message - PASSED")
    
    print()
    print("=" * 70)
    print("ALL VERIFICATION TESTS PASSED ✓")
    print("=" * 70)


if __name__ == "__main__":
    # Run padding scenario tests
    test_padding_scenarios()
    
    # Run comprehensive verification
    verify_block_parse()

SHA-256 PADDING AND PARSING TESTS

Test: Empty message
----------------------------------------------------------------------
Message length: 0 bytes (0 bits)
Number of blocks: 1
All blocks 64 bytes: ✓
Padding byte (0x80) at position: 0
Encoded length: 0 bits
Expected length: 0 bits
Length encoding correct: ✓

Block 0 (final, with padding):
  Message portion: b''
  Padding byte (0x80): 0x80
  Last 8 bytes (length): 0000000000000000


Test: Short message (5 bytes)
----------------------------------------------------------------------
Message length: 5 bytes (40 bits)
Number of blocks: 1
All blocks 64 bytes: ✓
Padding byte (0x80) at position: 5
Encoded length: 40 bits
Expected length: 40 bits
Length encoding correct: ✓

Block 0 (final, with padding):
  Message portion: b'Hello'
  Padding byte (0x80): 0x80
  Last 8 bytes (length): 0000000000000028


Test: Exactly 55 bytes (perfect fit)
----------------------------------------------------------------------
Message length: 55 bytes (440 bit

## Problem 4: Hashes

**Brief:** Write a function hash(current, block) that calculates the next hash value given the current hash value and the next message block according to section 6.2.2 SHA-256 Hash Computation on page 22 of the Secure Hash Standard.
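
A reference digest for checking the result can be obtained from Python's standard `hashlib` module. This is only a cross-check against a known implementation, not part of the required solution.

```python
import hashlib

# SHA-256 digest of the three-byte message 'abc', as a hex string
reference = hashlib.sha256(b"abc").hexdigest()
print(reference)
```

Applying hash(current, block) to every padded block of the same message, starting from the initial hash value, should reproduce this digest.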

### Introduction Problem 4:

In [34]:
import warnings
warnings.filterwarnings('ignore', category=RuntimeWarning)

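def hash(current, block):
    """
    Purpose: Compute the next hash value from the current hash value and one
    512-bit message block (FIPS 180-4, Section 6.2.2).

    Note: this name shadows Python's built-in hash(); it is kept because the
    brief asks for a function called hash(current, block).

    Parameters
    ----------
    current : numpy.ndarray
        Current hash value as eight 32-bit unsigned words
    block : bytes
        One 64-byte (512-bit) message block

    Returns
    -------
    numpy.ndarray
        Next hash value as eight 32-bit unsigned words
    """
    # Read block as an array of 32-bit unsigned ints in big endian.
    block = np.frombuffer(block, dtype='>u4') 

    # Assign room for a 64 element 32-bit int array.
    W = np.zeros(64, dtype=np.uint32)

    # The first 16 elements of W come from the message block.
    for t in range(16):
        W[t] = block[t]
    # The remaining 48 elements are computed from earlier elements of W (message schedule).
    for t in range(16, 64):
        W[t] = sigma1(W[t-2]) + W[t-7] + sigma0(W[t-15]) + W[t-16]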
        
    # assign current hash values to 8 variables a->h
    a = current[0]
    b = current[1]
    c = current[2]
    d = current[3]
    e = current[4]
    f = current[5]
    g = current[6]
    h = current[7]

    # main loop: T1 & T2 are calculated and the working variables are shifted down one place (g -> h, f -> g, ..., a -> b)
    # this mixes each message schedule word W[t] and round constant K[t] into the working variables
    for t in range(64):
        T1 = h + Sigma1(e) + Ch(e, f, g) + K[t] + W[t]
        T2 = Sigma0(a) + Maj(a, b, c)
        h = g
        g = f
        f = e
        e = d + T1
        d = c
        c = b
        b = a
        a = T1 + T2

    # calculate new hash using hashed variables a->h
    H = np.array([
        a + current[0], b + current[1], c + current[2], d + current[3],
        e + current[4], f + current[5], g + current[6], h + current[7],
    ], dtype=np.uint32)

    return H

# initial hash value H(0): eight 32-bit words from the Secure Hash Standard (Section 5.3.3)
H = np.array([
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
], dtype=np.uint32)

# get message from file using file_parse() (defined earlier in this notebook)
msg = file_parse('abc.txt')

# keep a copy of the initial hash value so each message starts from the same state
H_initial = H.copy()

# get padded blocks from block_parse() & loop through them
for block in block_parse(msg):
    # for each block, apply the hash
    H = hash(H, block) 

# test: reset to the initial hash value first, otherwise the state left over
# from hashing the file would carry into this message
msg = b"abc"
H = H_initial.copy()
for block in block_parse(msg):
    H = hash(H, block) 

# print the result in hex (more readable format)
result = ''.join(f'{x:08x}' for x in H)
print("Result: ", result)
print("Expected:   ", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
print("Match:", result == "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")

## Problem 5: Passwords