## Task 1: Binary Representations

Task 1 implements four functions that show how a **32-bit number** can be manipulated at the bit level.
These operations are common in cryptography and low-level programming, and all four are defined in the Secure Hash Standard (FIPS 180-4).


### Rotating Bits Left (`rotl`)

The `rotl(x, n)` function rotates the bits of `x` to the left by **n** places.
Bits shifted past the most significant end of the 32-bit word wrap around and re-enter on the right.
The mask **0xFFFFFFFF** keeps the result within 32 bits, which is needed because Python integers are not fixed-width. This matters because cryptographic algorithms such as SHA-256 operate on 32-bit words.
See [Rotating bits of a number](https://www.geeksforgeeks.org/python3-program-to-rotate-bits-of-a-number/).
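
As a quick illustration of the wrap-around, the sketch below rotates two sample values. The helper name `rotl32` and the constant `MASK32` are hypothetical and chosen only for this example; it assumes the input already fits in 32 bits.

```python
MASK32 = 0xFFFFFFFF  # keeps results inside 32 bits

def rotl32(x, n):
    """Rotate a 32-bit value left by n places (hypothetical helper)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

# The top bit of 0x80000001 wraps around to the bottom.
print(hex(rotl32(0x80000001, 1)))   # 0x3
# Rotating by 8 moves the top byte to the bottom.
print(hex(rotl32(0x12345678, 8)))   # 0x34567812
```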

In [1]:
def rotl(x, n=1):
    """
    Rotates the bits in a 32-bit integer to the left by n places.
    
    Parameters:
    x : The input 32-bit integer.
    n : Number of positions to rotate left.

    Returns:
    int: The result after performing the left rotation.
    """
    n = n % 32                     # Ensure n is within the valid bit range (0–31)
    shifted = (x << n) & 0xFFFFFFFF  # Left shift and mask to 32-bit
    wrapped = x >> (32 - n)       # Get bits that will wrap around
    return shifted | wrapped      # Combine the shifted and wrapped bits

### Rotating Bits Right (`rotr`)

The `rotr(x, n)` function is the mirror image of **rotl**.
It rotates the bits of `x` to the right by **n** places.
Bits shifted past the least significant end wrap around and re-enter on the left.
Rotations are common in cryptography because they are cheap to compute and move bits across the word without losing any of them.
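
Because no bits are lost, a right rotation undoes a left rotation by the same amount. The following sketch checks this on one sample value; `rotl32` and `rotr32` are hypothetical stand-ins written for this example, and they assume 32-bit inputs.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def rotr32(x, n):
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

x = 0xDEADBEEF
print(hex(rotr32(x, 8)))               # 0xefdeadbe
print(rotr32(rotl32(x, 5), 5) == x)    # True: the rotations cancel out
print(rotr32(x, 4) == rotl32(x, 28))   # True: right by 4 equals left by 28
```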

In [2]:
def rotr(x, n=1):
    """
    Rotates the bits in a 32-bit integer to the right by n places.
    
    Parameters:
    x : The input 32-bit integer.
    n : Number of positions to rotate right.

    Returns:
    int: The result after performing the right rotation.
    """
    n = n % 32
    shifted = x >> n
    wrapped = (x << (32 - n)) & 0xFFFFFFFF
    return shifted | wrapped

### Bitwise Choice (`ch`)

The `ch(x, y, z)` function uses each bit of `x` as a selector between `y` and `z`.
Where a bit of `x` is 1 the result takes the corresponding bit of `y`; otherwise it takes the bit of `z`.
SHA-256 uses this function in every round of its compression function to mix its working variables.
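
A small 4-bit example makes the selection visible. The name `ch_bits` is hypothetical; the second line checks an equivalent formulation, `z ^ (x & (y ^ z))`, which is sometimes used because it avoids the NOT.

```python
def ch_bits(x, y, z):
    """Pick bits from y where x is 1, and from z where x is 0."""
    return (x & y) | (~x & z)

x, y, z = 0b1100, 0b1010, 0b0101
# x = 1100 -> the two high bits come from y (10), the two low bits from z (01)
print(format(ch_bits(x, y, z), "04b"))        # 1001
print(ch_bits(x, y, z) == z ^ (x & (y ^ z)))  # True
```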

In [3]:
def ch(x, y, z):
    """
    Chooses bits from y where x has bits set to 1, and from z where x has bits set to 0.
    Returns:
    int: Resulting 32-bit integer after bitwise choice
    """
    return (x & y) | (~x & z)  # If x bit is 1 -> choose from y, else from z

### Bitwise Majority (`maj`)

The `maj(x, y, z)` function compares `x`, `y` and `z` bit by bit and keeps, at each position, the value that appears most often.
An output bit is 1 when at least two of the three input bits are 1, and 0 otherwise.
SHA-256 uses it alongside `ch` in each round of its compression function.
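
The majority rule can be checked exhaustively on single bits, since there are only eight input combinations. This is a minimal sketch; `maj_bits` is a hypothetical name for the same expression used in the cell that follows.

```python
from itertools import product

def maj_bits(x, y, z):
    """Bitwise majority of three values."""
    return (x & y) | (x & z) | (y & z)

# Print the full truth table and compare with "at least two inputs are 1".
for a, b, c in product((0, 1), repeat=3):
    expected = 1 if a + b + c >= 2 else 0
    assert maj_bits(a, b, c) == expected
    print(a, b, c, "->", maj_bits(a, b, c))
```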

In [4]:
def maj(x, y, z):
    """
    Majority votes of bits in x, y, and z.

    Parameters:
    x (int): 32-bit integer
    y (int): 32-bit integer
    z (int): 32-bit integer

    Returns:
    int: Resulting 32-bit integer after bitwise majority
    """
    return (x & y) | (x & z) | (y & z)  # A bit is 1 if at least two of x, y, z have 1s

## Why These Functions Matter

These operations help a hash function achieve diffusion, confusion and the avalanche effect, which are central to its cryptographic strength.

The functions `ch` and `maj` are specifically designed for use in cryptographic hash functions like SHA-256. For more details on their purpose and implementation, see [this explanation on Cryptography Stack Exchange](https://crypto.stackexchange.com/questions/5358/what-does-maj-and-ch-mean-in-sha-256-algorithm).

### Testing the Bitwise Functions

To test the functions we define three **32-bit integers** as binary literals:
- `x = 0b10110011100011110000111100001111` → An arbitrarily chosen **32-bit integer**.
- `y = 0b11001100110011001100110011001100` → A repeating `1100` pattern.
- `z = 0b00001111000011110000111100001111` → A repeating `00001111` pattern.

These values show that the functions handle bits correctly at different positions.

In [5]:
# Define 32-bit example values for testing
x = 0b10110011100011110000111100001111  # 32-bit integer
y = 0b11001100110011001100110011001100
z = 0b00001111000011110000111100001111

# Testing the functions
if __name__ == "__main__":
    print(f"Original x: {bin(x)}")
    print(f"rotl(x, 4): {bin(rotl(x, 4))}")
    print(f"rotr(x, 4): {bin(rotr(x, 4))}")
    print(f"ch(x, y, z): {bin(ch(x, y, z))}")
    print(f"maj(x, y, z): {bin(maj(x, y, z))}")

Original x: 0b10110011100011110000111100001111
rotl(x, 4): 0b111000111100001111000011111011
rotr(x, 4): 0b11111011001110001111000011110000
ch(x, y, z): 0b10001100100011000000110000001100
maj(x, y, z): 0b10001111100011110000111100001111


## Task 2: Hash Function

### Applications and Properties of Hash Functions

Hash functions are fundamental to many computer science applications:

- **Cryptography**: Secure hashes like SHA-256 protect passwords and digital signatures. 
- **Caching**: Web browsers and content delivery networks use hashes to identify cached content. 
- **Data Retrieval**: Hash tables provide O(1) lookup time for dictionaries and databases. 
- **Data Integrity**: File checksums ensure data hasn't been corrupted or tampered with.

The hash function in this task demonstrates these concepts at a very basic level, showing how even a simple arithmetic operation can distribute values across a fixed range. As seen in [The Art of Computer Programming, Volume 3: Sorting and Searching](https://www-cs-faculty.stanford.edu/~knuth/taocp.html), "the choice of hash function has a significant impact on performance in practice, with multiplicative methods often providing good distribution properties."

In Task 2 we convert a string `s` into a numeric `hash` value.  
The running total `hashval` starts at `0`, and the function then loops through each character.  
At each step the running total is multiplied by 31 and the `ASCII` value of the character is added.  
The final total is reduced modulo 101, so the result always lies between 0 and 100.  

### What is a Hash Function

A hash function takes input data of any size and transforms it into a fixed-size numerical output, commonly known as a hash. A good hash function is fast to compute and maps even similar inputs to different outputs.

The implementation here is based on converting a C-style hash function to Python, as discussed in [this tutorial on migrating from C to Python](https://almarefa.net/blog/tutorial-migrating-from-c-to-python).

In [6]:
def hash_function(s: str) -> int:
    """
    Parameters:
    s (str): The input string.

    Returns:
    int: Hash value mod 101.
    """
    hashval = 0
    for char in s:
        hashval = ord(char) + 31 * hashval
    return hashval % 101

### Testing the Hash Function

A list of words is used to check the variety of outputs.  
`hash_function()` is applied to each word and the result is printed.

In [7]:
# Testing the function
test_strings = ["john", "smith", "computational", "theory"]
for string in test_strings:
    print(f"Hash of '{string}': {hash_function(string)}")

Hash of 'john': 97
Hash of 'smith': 19
Hash of 'computational': 42
Hash of 'theory': 77


**Why Use 31 and 101?**

`31` is a prime number that is used a lot as a hash multiplier: it spreads values well, and `31 * h` can be computed as `(h << 5) - h`, a shift and a subtraction.
[Why does Java's `hashCode()` use 31 as a multiplier?](https://stackoverflow.com/questions/299304/why-does-javas-hashcode-in-string-use-31-as-a-multiplier) 
`101` is a prime number used as the modulus here; it limits the result to the range 0 to 100, and a prime modulus helps spread values evenly across that small range.
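
The shift-and-subtract form of multiplying by 31 can be checked directly. This short sketch uses the intermediate totals from the "john" example as sample values.

```python
# 31 * h equals (h << 5) - h because h << 5 is 32 * h.
for h in (0, 1, 106, 3397, 105411):
    assert 31 * h == (h << 5) - h

print((106 << 5) - 106)  # 3286, the same as 31 * 106
```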

## How It Works
### Let’s walk through "john":

| Step | Character | ASCII | Hash Calculation                  | Running Total |
|------|-----------|--------|-----------------------------------|----------------|
| 1    | 'j'       | 106    | 106 + 31 × 0                      | 106            |
| 2    | 'o'       | 111    | 111 + 31 × 106 = 111 + 3286       | 3397           |
| 3    | 'h'       | 104    | 104 + 31 × 3397 = 104 + 105307    | 105411         |
| 4    | 'n'       | 110    | 110 + 31 × 105411 = 110 + 3267741 | 3267851        |
| —    | —         | —      | 3267851 % 101                     | **97**         |
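
The table above can be reproduced with a small tracing helper. `trace_hash` is a hypothetical name introduced for this sketch; it performs the same arithmetic as `hash_function` but prints each running total.

```python
def trace_hash(s, multiplier=31, modulus=101):
    """Print each step of the running total, then return the final hash."""
    total = 0
    for step, char in enumerate(s, start=1):
        total = ord(char) + multiplier * total
        print(f"{step}: {char!r} (code {ord(char)}) -> {total}")
    return total % modulus

print(trace_hash("john"))  # 97
```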

## Task 3: SHA256

In Task 3 we read the input file in binary mode to get its raw bytes, which are stored in a variable.
The original message length in bits is stored as well.

- The file is opened in **binary mode** (`rb`).
- The data is read and stored in `data`.
- The original message length is calculated in **bits**.

## What Is SHA-256 Padding?
SHA-256 padding makes sure that the message fits the block size following these rules:

1. Append a single `1` bit.
2. Append `0` bits until the total length is congruent to 448 mod 512.
3. Append the original message length as a `64-bit` big-endian integer.

After these steps the total padded message length is a multiple of `512 bits`.
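
The rules above can be sketched as a single function. This is only an illustration under the assumption that the message is a whole number of bytes; `sha256_pad` is a hypothetical name, and the zero-byte count is computed in closed form rather than with a loop.

```python
def sha256_pad(message: bytes) -> bytes:
    """Return the message with SHA-256 padding applied (illustrative sketch)."""
    bit_length = len(message) * 8
    # After the 0x80 byte, add zeros so that exactly 8 bytes remain in the block.
    zero_bytes = (55 - len(message)) % 64
    return message + b"\x80" + b"\x00" * zero_bytes + bit_length.to_bytes(8, "big")

# Messages of 56 bytes or more spill into a second 64-byte block.
for sample in (b"", b"abc", b"a" * 55, b"a" * 56, b"a" * 64):
    padded = sha256_pad(sample)
    print(len(sample), "->", len(padded), "bytes")
```

For these inputs the padded lengths are 64, 64, 64, 128 and 128 bytes.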

In [8]:
# Define file path
file_path = "test.txt"

try:
    with open(file_path, "rb") as f:
        data = f.read()  # Read file contents as bytes
except FileNotFoundError:
    print(f"Error: File '{file_path}' not found.")
    raise  # Stop here: the following cells need `data`

# Store the original length in bits so it can be used in later cells
original_bit_length = len(data) * 8

print("Step 1: File read successfully!")


Step 1: File read successfully!


SHA-256 padding starts by appending a single `1` bit after the message.
Because the message is a whole number of bytes, this is done by appending the byte `0x80`, which is `10000000` in binary.

- `b'\x80'` is appended to the data, which starts the padded message.
- This ensures padding starts with a `1` followed by zeros.

SHA-256 is a **cryptographic hash function** specified in the [NIST Secure Hash Standard](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf), which also defines the padding scheme used here.

In [9]:
# Append the '1' bit (0x80 in hex) to the message
padded_data = data + b'\x80'

print("Step 2: 1-bit appended.")

Step 2: 1-bit appended.


We then add zero bytes until the message length in bits is congruent to **448 mod 512**.

- A `while` loop continuously appends **zero bytes (`0x00`)**.
- It stops when the message length **mod 512** equals **448**.
- This makes sure that there is enough space for the final 64-bit length.

This step follows the SHA-256 padding rules as described in [FIPS 180-4](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf).

In [10]:
# Add zero bytes until the length is 448 mod 512
while (len(padded_data) * 8) % 512 != 448:
    padded_data += b'\x00'  

print("Step 3: Padding with zeros done.")

Step 3: Padding with zeros done.


The original message length is then appended as a **64-bit big-endian integer**.  
This lets the SHA-256 algorithm know how long the original message was before padding.

- We take the `original_bit_length` and **convert it to a 64-bit number**.
- This number is stored in **big-endian format** (`to_bytes(8, 'big')`).
- These final 8 bytes complete the last 512-bit block, so the padded message can be processed block by block.
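
To see what the big-endian encoding looks like, the sketch below encodes the bit length of a 3-byte message and decodes it again. The sample message `b"abc"` is an assumption made for illustration.

```python
bit_length = len(b"abc") * 8              # 24 bits
encoded = bit_length.to_bytes(8, "big")   # most significant byte first

print(encoded.hex(" "))                   # 00 00 00 00 00 00 00 18
print(int.from_bytes(encoded, "big"))     # 24
```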

In [11]:
# Convert the original message length into an 8-byte (64-bit) big-endian integer
padded_data += original_bit_length.to_bytes(8, 'big')  

print("Step 4: Original length appended.")

Step 4: Original length appended.


The final message is then printed in **hex format**.
This shows that the padding is applied correctly.

- We print each byte of the padded message in **hexadecimal format**.

In [12]:
# Print the final padded message in hexadecimal format
print("SHA-256 Padding (Hex):")
print(" ".join(f"{byte:02x}" for byte in padded_data))

SHA-256 Padding (Hex):
61 62 63 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 18


## Task 4: Prime Numbers

### The Importance of Prime Numbers in Computing

Prime numbers serve as fundamental building blocks in numerous computing applications:

- **Cryptography**: RSA encryption depends on the difficulty of factoring the product of two large prime numbers
- **Random Number Generation**: Prime numbers assist in the production of high quality random sequences
- **Hash Functions**: A lot of hash algorithms use prime numbers as multipliers
- **Error Correction**: Prime-based codes help detect and recover from transmission faults

The algorithms implemented show two fundamental approaches to prime generation:
- **Trial Division**: Straightforward and intuitive, but computationally costly
- **Sieve of Eratosthenes**: More complex conceptually, but more efficient

These algorithms illustrate a common trade-off in computer science between simplicity and performance, with the more complex approach paying off for larger datasets. As noted in [Introduction to Algorithms](https://mitpress.mit.edu/books/introduction-algorithms-third-edition), "the sieve of Eratosthenes is a practical way to find all small primes."

### Method 1: Trial Division (Brute Force)

The **Trial Division method** tests each candidate number in turn to find out whether it is prime.

- Start with **2**, as it is the first prime number.
- Each candidate is checked to see if it is divisible by any of the primes found so far.
- If no earlier prime divides it evenly, it is added to the list.

- This is a brute-force approach that is slow for large values: this implementation has a **time complexity of roughly \(O(n^2)\)** in the number of primes \(n\), because each new prime is tested against every prime before it.

According to [PrimePages](https://primes.utm.edu/glossary/page.php?sort=TrialDivision), trial division is the oldest primality test, but the O(n²) cost of this approach makes it impractical for large prime sets.
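
A common refinement, shown here only as a sketch and not as the method used in this notebook, is to stop testing once the divisor exceeds the square root of the candidate. The name `trial_division_sqrt` is hypothetical.

```python
def trial_division_sqrt(n):
    """First n primes; stops testing once prime * prime exceeds the candidate."""
    primes = []
    candidate = 2
    while len(primes) < n:
        is_prime = True
        for prime in primes:
            if prime * prime > candidate:
                break  # no divisor up to sqrt(candidate), so it is prime
            if candidate % prime == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
        candidate += 1
    return primes

print(trial_division_sqrt(10))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The early exit works because any composite number has a prime factor no larger than its square root.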

In [13]:
def trial_division_primes(n):
    """
    Finds the first 'n' prime numbers using the Trial Division method.
    """
    primes = []
    num = 2  # Start from the smallest prime number

    while len(primes) < n:
        is_prime = True
        for prime in primes:  
            if num % prime == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(num)
        num += 1  

    return primes

We now run the function which generates the first **1,000 prime numbers**.

- The function goes through each number and checks for primes.
- The first `10` primes found are then printed for verification.

In [14]:
trial_division_result = trial_division_primes(1000)

print(trial_division_result[:10])

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


**Advantages:**
- Simple and easy to understand.
- Works well with small values.

**Disadvantages:**
- Very slow for working with large values.
- Requires checking divisibility for every number.

### Method 2: Sieve of Eratosthenes (Efficient)

The **Sieve of Eratosthenes** is a faster way of finding primes: instead of testing each number, it marks the non-primes in a list.

- A boolean list is created where each index represents a number.
- Starting at 2, all multiples of each prime are marked as non-prime.
- The sieve runs up to an estimated upper limit, and the first 1,000 unmarked numbers are taken as the primes.

- It has a **time complexity of \(O(n \log \log n)\)**, where \(n\) is the upper limit, which makes it much faster than Trial Division.

The implementation follows the algorithm described in [GeeksforGeeks' explanation of the Sieve of Eratosthenes](https://www.geeksforgeeks.org/sieve-of-eratosthenes/), which efficiently generates prime numbers by systematically eliminating multiples.
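
Since the sieve needs an upper limit before it starts, one option is to derive the limit from a known bound on the n-th prime: for n ≥ 6, the n-th prime is smaller than n(ln n + ln ln n). The sketch below uses that bound; `nth_prime_upper_bound` and `first_n_primes` are hypothetical names, and this is an alternative to the estimate used in the notebook's own cell, not a description of it.

```python
from math import ceil, log

def nth_prime_upper_bound(n):
    """Upper bound on the n-th prime (hypothetical helper)."""
    if n < 6:
        return 13  # covers the first five primes: 2, 3, 5, 7, 11
    return ceil(n * (log(n) + log(log(n))))

def first_n_primes(n):
    """First n primes via a sieve sized from the bound above."""
    limit = nth_prime_upper_bound(n) + 1
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for multiple in range(i * i, limit, i):
                sieve[multiple] = False
    return [num for num, is_prime in enumerate(sieve) if is_prime][:n]

print(first_n_primes(1000)[-1])  # 7919, the 1000th prime
```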

In [15]:
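def sieve_of_eratosthenes(n):
    """
    Finds the first 'n' prime numbers using the Sieve of Eratosthenes.
    """
    limit = 10 * n  # Estimated upper limit (sufficient for n = 1000; not guaranteed for much larger n)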
    sieve = [True] * limit  # True means "assumed prime"
    sieve[0] = sieve[1] = False  # 0 and 1 are not primes

    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:  # If i is prime
            for multiple in range(i * i, limit, i):
                sieve[multiple] = False  # Mark multiples as non-prime

    primes = [num for num, is_prime in enumerate(sieve) if is_prime][:n]
    return primes

We now run the function to generate the first `1,000 prime numbers`.

- We run the Sieve of Eratosthenes function.
- The first `10` primes are printed for verification.

In [16]:
# Generate first 1000 primes using Sieve of Eratosthenes
sieve_result = sieve_of_eratosthenes(1000)

# Print first 10 primes for verification
print(sieve_result[:10])

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


**Advantages:**
- A lot faster than Trial Division.
- Large values work well.

**Disadvantages:**
- Requires extra memory to store the boolean array.

## Trial Division vs. Sieve of Eratosthenes

| Feature              | Trial Division        | Sieve of Eratosthenes   |
|----------------------|------------------------|--------------------------|
| Time Complexity      | O(n²)                  | O(n log log n)           |
| Memory Usage         | Low                    | Higher (uses boolean list) |
| Best For             | Small lists or teaching| Fast prime generation    |
| Code Complexity      | Simple                 | Moderate                 |

- **Trial Division** is simpler but less efficient for large values.
- **Sieve of Eratosthenes** is a lot faster but does require extra memory.

## Task 5: Roots

## Compute the first 100 prime numbers

We need the first 100 prime numbers before we can calculate their square roots.

- **Sieve of Eratosthenes** is used to generate the first 100 prime numbers, as it is the more efficient of the two methods from Task 4.
- This avoids listing the primes by hand.

SHA-256 derives its initial hash values from the first 32 bits of the fractional parts of the square roots of the first 8 primes, which gives constants that look random but can be reproduced by anyone, as noted in [Wikipedia – SHA-2 Constants](https://en.wikipedia.org/wiki/SHA-2#Pseudorandom_constants).

In [17]:
from math import sqrt

def sieve_of_eratosthenes(n):
    """
    Finds the first 'n' prime numbers using the Sieve of Eratosthenes.
    """
    limit = 1000  # Fixed upper limit: there are 168 primes below 1000, enough for n = 100
    sieve = [True] * limit  # True means "assumed prime"
    sieve[0] = sieve[1] = False  # 0 and 1 are not primes

    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:  # If i is prime
            for multiple in range(i * i, limit, i):
                sieve[multiple] = False  # Mark multiples as non-prime

    primes = [num for num, is_prime in enumerate(sieve) if is_prime][:n]
    return primes

#  Generate first 100 prime numbers
first_100_primes = sieve_of_eratosthenes(100)
print(first_100_primes[:10])  # Print first 10 primes for verification


[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


## Compute square roots & extract fractional part

- We calculate the square root of each prime.
- We isolate the fractional part.
- The fractional part is then converted to binary.

In [18]:
def get_fractional_part(number):
    """
    Extracts the fractional part of a number.
    """
    return number - int(number)  # Remove integer part

# Example: Extract fractional part of sqrt(2)
example_sqrt = sqrt(2)
fractional_part = get_fractional_part(example_sqrt)
print(f"Example: sqrt(2) = {example_sqrt}, Fractional Part = {fractional_part}")

Example: sqrt(2) = 1.4142135623730951, Fractional Part = 0.41421356237309515


## Converting the fractional part to 32-bit binary

- We multiply the fractional part by 2 repeatedly.
- The integer part of each multiplication then gives us `1s` and `0s`.
- We extract `32 bits` from this conversion.
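
The repeated doubling described above is equivalent to multiplying the fraction by 2³² and keeping the integer part. The sketch below shows that shortcut; `fractional_bits` is a hypothetical name, and the result is still limited by floating-point precision.

```python
from math import sqrt

def fractional_bits(fraction, bits=32):
    """Return the first `bits` binary digits of a fraction in [0, 1)."""
    # Scaling by a power of two is exact for floats, so truncating the
    # product gives the same digits as doubling one bit at a time.
    return format(int(fraction * 2**bits), f"0{bits}b")

frac = sqrt(2) - 1
print(fractional_bits(frac))  # 01101010000010011110011001100111
```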

In [19]:
def fractional_to_binary(fraction, bits=32):
    """
    Converts a fractional part to a 32-bit binary representation.
    """
    binary_str = ""
    for _ in range(bits):
        fraction *= 2
        if fraction >= 1:
            binary_str += "1"
            fraction -= 1
        else:
            binary_str += "0"
    return binary_str

binary_fractional_part = fractional_to_binary(fractional_part)
print(f"32-bit binary of sqrt(2) fractional part: {binary_fractional_part}")

32-bit binary of sqrt(2) fractional part: 01101010000010011110011001100111


## Computing and displaying results for the first 100 Primes

- For each prime number we compute the `square root`.
- Extract the fractional part, then convert it to binary.
- Then print out the first `10` results for verification.

In [20]:
# Compute the first 32 bits of the fractional part of sqrt(primes)
results = {}

for prime in first_100_primes:
    sqrt_value = sqrt(prime)  # Compute square root
    fractional_part = get_fractional_part(sqrt_value)  # Extract fractional part
    binary_representation = fractional_to_binary(fractional_part)  # Convert to binary
    results[prime] = binary_representation

# Print first 10 results
for prime, binary in list(results.items())[:10]:
    print(f"Prime: {prime}, Binary (32 bits): {binary}")

Prime: 2, Binary (32 bits): 01101010000010011110011001100111
Prime: 3, Binary (32 bits): 10111011011001111010111010000101
Prime: 5, Binary (32 bits): 00111100011011101111001101110010
Prime: 7, Binary (32 bits): 10100101010011111111010100111010
Prime: 11, Binary (32 bits): 01010001000011100101001001111111
Prime: 13, Binary (32 bits): 10011011000001010110100010001100
Prime: 17, Binary (32 bits): 00011111100000111101100110101011
Prime: 19, Binary (32 bits): 01011011111000001100110100011001
Prime: 23, Binary (32 bits): 11001011101110111001110101011101
Prime: 29, Binary (32 bits): 01100010100110100010100100101010
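
The first eight results can be cross-checked against the SHA-256 initial hash values listed in FIPS 180-4. The sketch below does this with exact integer arithmetic instead of floats; `sqrt_fraction_bits` is a hypothetical helper, and `math.isqrt` requires Python 3.8 or later.

```python
from math import isqrt

# SHA-256 initial hash values H0..H7 as listed in FIPS 180-4
SHA256_INITIAL = [
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
]

def sqrt_fraction_bits(p):
    """First 32 fractional bits of sqrt(p), using exact integer arithmetic."""
    # isqrt(p * 2**64) == floor(sqrt(p) * 2**32); the low 32 bits are the fraction.
    return isqrt(p << 64) & 0xFFFFFFFF

for prime, expected in zip([2, 3, 5, 7, 11, 13, 17, 19], SHA256_INITIAL):
    value = sqrt_fraction_bits(prime)
    print(prime, f"{value:08x}", value == expected)
```

Each line should end in `True`, and the hexadecimal values correspond to the binary strings printed above.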


## Task 6: Proof of Work

Find words in the English language with the greatest number of leading 0 bits in their SHA256 hash digest.

- Demonstrates the core principle behind cryptocurrency mining
- Shows how miners search for hash values with specific patterns
- Known as a "partial hash collision" in blockchain technologies

This concept is fundamental to systems like Bitcoin, as described in [Nakamoto's original Bitcoin whitepaper](https://bitcoin.org/bitcoin.pdf).
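
For comparison with the dictionary search in this task, the sketch below shows a nonce-based search, which is the general shape of proof of work. It is heavily simplified and is not how Bitcoin encodes blocks; the names `mine` and `leading_zero_bits`, the input string and the difficulty value are all assumptions made for illustration.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def mine(data: str, difficulty: int) -> int:
    """Return the first nonce whose hash has at least `difficulty` leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}{nonce}".encode("utf-8")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1

nonce = mine("hello", difficulty=12)
print(nonce, hashlib.sha256(f"hello{nonce}".encode("utf-8")).hexdigest())
```

Each extra bit of difficulty roughly doubles the expected number of attempts, which is why the search in this task becomes harder as the zero count grows.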

### SHA-256 Hash Computation

Functions to compute SHA-256 hashes and count leading zero bits.

- SHA-256 produces a fixed-size 256-bit hash value
- It is computationally infeasible to find two inputs with the same output
- Well suited to proof-of-work systems

As specified in [NIST's Secure Hash Standard](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf).

In [21]:
import hashlib
import urllib.request
import requests
from typing import List, Tuple

def get_sha256_hash(word: str) -> str:
    """Calculate the SHA256 hash of a given word."""
    hash_obj = hashlib.sha256(word.encode('utf-8'))
    return hash_obj.hexdigest()

def count_leading_zero_bits(hash_hex: str) -> int:
    """Count the number of leading zero bits in a hash hexadecimal string."""
    # Convert hex to binary string (without '0b' prefix)
    binary = bin(int(hash_hex, 16))[2:].zfill(256)
    
    # Count leading zeros
    for i, bit in enumerate(binary):
        if bit == '1':
            return i
    return len(binary)  # All zeros (extremely unlikely)
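
The same count can be obtained without building a binary string, by subtracting the integer's bit length from the total width. This is a sketch of an alternative, with the hypothetical name `count_leading_zero_bits_fast`.

```python
import hashlib

def count_leading_zero_bits_fast(hash_hex: str) -> int:
    """Leading zero bits of a hex digest, via int.bit_length()."""
    total_bits = len(hash_hex) * 4   # each hex digit encodes 4 bits
    return total_bits - int(hash_hex, 16).bit_length()

print(count_leading_zero_bits_fast("00" * 32))          # 256
print(count_leading_zero_bits_fast("0f" + "00" * 31))   # 4
print(count_leading_zero_bits_fast(hashlib.sha256(b"abc").hexdigest()))
```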

### Loading English Dictionary

Load a comprehensive dictionary of English words with multiple fallback sources.

- Requires searching through many potential inputs
- Finds words that produce hash values with specific properties
- Broader dictionary means more thorough search

In [22]:
def load_word_list() -> List[str]:
    """Load a list of English words from a standard dictionary source."""
    try:
        # Try GitHub English words repository (large dictionary)
        url = "https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"
        print("Downloading dictionary from GitHub...")
        with urllib.request.urlopen(url) as response:
            words = response.read().decode('utf-8').splitlines()
        print(f"Downloaded {len(words)} words successfully")
        return [word.strip() for word in words if word.strip()]
    except Exception as e:
        print(f"Failed to download dictionary: {e}")
        # Fall back to system dictionary
        try:
            with open("/usr/share/dict/words") as f:
                words = [line.strip() for line in f if line.strip().isalpha()]
            print(f"Using system dictionary with {len(words)} words")
            return words
        except FileNotFoundError:
            # Final fallback to a modest list
            basic_words = [
                "apple", "banana", "orange", "proof", "of", "work", "zero", "hash", 
                "challenge", "computer", "algorithm", "binary", "digital", "cryptography", 
                "blockchain", "theory", "logic", "password", "secure", "random", "number", 
                "prime", "factor", "integer", "function", "program", "system", "network",
                "database", "memory", "process", "thread", "kernel", "module", "class",
                "object", "string", "array", "list", "queue", "stack", "tree", "graph"
            ]
            print(f"Using fallback word list with {len(basic_words)} words")
            return basic_words

### Dictionary Verification

Verify results are legitimate English words through multiple dictionary sources.

- Ensures inputs are real words, not random character sequences
- Maintains integrity of results
- Shows finding specific hash patterns with meaningful inputs is harder

In [23]:
def verify_word(word: str) -> Tuple[bool, str]:
    """
    Verify if a word exists in an English dictionary.
    
    Returns:
        Tuple[bool, str]: (is_valid, source_of_verification)
    """
    # First check: the downloaded word list, loaded once and cached in a global set
    global words_set
    if 'words_set' not in globals():
        words = load_word_list()
        words_set = set(words)
    if word in words_set:
        return True, "Dictionary source"
    # Second check: Merriam-Webster dictionary
    try:
        url = f"https://www.merriam-webster.com/dictionary/{word}"
        response = requests.get(url, timeout=10)  # timeout avoids hanging on a slow server
        if "The word you've entered isn't in the dictionary" not in response.text:
            return True, "Merriam-Webster"
    except Exception as e:
        print(f"Merriam-Webster check failed: {e}")
    
    # Third check: Dictionary API
    try:
        response = requests.get(f"https://api.dictionaryapi.dev/api/v2/entries/en/{word}", timeout=10)
        if response.status_code == 200:
            return True, "Dictionary API"
    except Exception as e:
        print(f"Dictionary API check failed: {e}")
    
    # Final check: system dictionary
    try:
        with open("/usr/share/dict/words") as f:
            dictionary = {line.strip().lower() for line in f if line.strip().isalpha()}
        if word.lower() in dictionary:
            return True, "System dictionary"
    except FileNotFoundError:
        pass
        
    return False, "Not found in any dictionary"

### Finding Words with Most Leading Zeros

Core algorithm to find words with maximum leading zero bits in SHA-256 hash.

- Simulates cryptocurrency "mining" process
- Finds hash values with specific patterns
- Uses English words as inputs instead of block headers with nonces

In [24]:
def find_words_with_most_leading_zeros(words: List[str], top_n: int = 10) -> List[Tuple[str, int, str]]:
    """
    Find words with the most leading zero bits in their SHA256 hash.
    
    Args:
        words: List of words to check
        top_n: Number of top results to return
    
    Returns:
        List of tuples containing (word, zero_count, hash_hex)
    """
    print(f"Processing {len(words)} words to find maximum leading zeros...")
    word_counts = []
    
    # Process each word
    for i, word in enumerate(words):
        if i % 10000 == 0 and i > 0:
            print(f"Processed {i}/{len(words)} words...")
            
        if word.isalpha():  # Only process alphabetic words
            hash_hex = get_sha256_hash(word)
            zeros = count_leading_zero_bits(hash_hex)
            word_counts.append((word, zeros, hash_hex))

    # Sort by number of leading zeros (descending)
    word_counts.sort(key=lambda x: x[1], reverse=True)
    return word_counts[:top_n]

### Displaying the Results

Helper that prints each top word with its leading-zero count, hash and dictionary verification. The full run that follows:

- Processes dictionary in manageable chunks
- Tracks and displays progress
- Illustrates computational effort required for finding patterns in hash functions

In [25]:
def display_results(results: List[Tuple[str, int, str]]):
    """Display the top words with their leading zero counts and hashes."""
    print("\nWords with most leading zero bits in SHA256 hash:")
    print("================================================")

    for i, (word, zeros, hash_hex) in enumerate(results):
        is_valid, source = verify_word(word)
        
        print(f"\n{i+1}. Word: {word}")
        print(f"   Leading zero bits: {zeros}")
        print(f"   SHA256 hash: {hash_hex}")

        # Display the first few bytes in binary to show the leading zeros
        binary = bin(int(hash_hex[:16], 16))[2:].zfill(64)  # First 16 hex chars (64 bits)
        print(f"   Binary (first 64 bits): {binary}")
        print(f"   Dictionary verification: {source if is_valid else 'Not verified'}")
        
        if is_valid:
            print(f"   Can be verified at: https://www.merriam-webster.com/dictionary/{word}")

### Running the Complete Solution

Now we'll execute the full algorithm to find English words with the most leading zero bits in their SHA-256 hash.

This process illustrates the computational effort required to find specific patterns in cryptographic hash functions, which forms the basis of blockchain mining difficulty adjustment mechanisms.

In [26]:
def find_proof_of_work_words():
    """Main function to find English words with most leading zeros in SHA256 hash."""
    # Load dictionary
    words = load_word_list()
    
    # Process full dictionary - don't limit it
    test_words = words  # Use ALL words, not just first 10,000
    print(f"Processing full dictionary with {len(test_words)} words...")
    
    # Process in chunks to show progress
    chunk_size = 50000
    max_zeros = 0
    best_words = []
    
    # Process each chunk
    for i in range(0, len(test_words), chunk_size):
        chunk = test_words[i:i+chunk_size]
        print(f"Processing words {i} to {min(i+chunk_size, len(test_words))}...")
        
        # Find best words in this chunk
        for word in chunk:
            if word.isalpha():
                hash_hex = get_sha256_hash(word)
                zeros = count_leading_zero_bits(hash_hex)
                
                if zeros > max_zeros:
                    max_zeros = zeros
                    best_words = [(word, zeros, hash_hex)]
                    print(f"New best: {word} with {zeros} leading zeros")
                elif zeros == max_zeros:
                    best_words.append((word, zeros, hash_hex))
                    print(f"Found another with {zeros} zeros: {word}")
    
    # Display simple results immediately
    print("\nResults - Words with most leading zeros:")
    print("========================================")
    for i, (word, zeros, hash_hex) in enumerate(best_words):
        is_valid, source = verify_word(word)
        print(f"{i+1}. Word: {word}")
        print(f"   Leading zero bits: {zeros}")
        print(f"   SHA256 hash: {hash_hex}")
        print(f"   Verification: {source if is_valid else 'Not verified'}")
        print(f"   Binary: {bin(int(hash_hex[:16], 16))[2:].zfill(64)}")
        print()
    
    # Display conclusion
    print("\nTask 6 Conclusions:")
    print("===================")
    print("Finding words with the most leading zero bits in their SHA256 hash")
    print("demonstrates the core principle behind proof-of-work systems.")
    print(f"The maximum leading zeros found was {max_zeros}.")

# Run the function
find_proof_of_work_words()

Downloading dictionary from GitHub...
Downloaded 370105 words successfully
Processing full dictionary with 370105 words...
Processing words 0 to 50000...
Found another with 0 zeros: a
Found another with 0 zeros: aa
Found another with 0 zeros: aaa
New best: aah with 3 leading zeros
New best: aahs with 4 leading zeros
New best: ababua with 5 leading zeros
New best: abac with 6 leading zeros
Found another with 6 zeros: abadia
Found another with 6 zeros: abarambo
Found another with 6 zeros: abasers
New best: abbe with 7 leading zeros
New best: abdicate with 8 leading zeros
New best: abohm with 9 leading zeros
New best: absinthine with 10 leading zeros
Found another with 10 zeros: acronymically
New best: agriculturists with 11 leading zeros
Found another with 11 zeros: airbrasive
New best: alpestral with 14 leading zeros
Processing words 50000 to 100000...
Found another with 14 zeros: courteously
New best: duppa with 15 leading zeros
Processing words 100000 to 150000...
New best: goaltender
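
As a rough sanity check on the run above, each SHA-256 digest can be modelled as 256 independent uniform bits. This is a modelling assumption, not something the notebook states. Under it, a given word has at least `k` leading zero bits with probability `2**-k`, so the best result over a dictionary of `N` words should land near `log2(N)`. The helper names `leading_zero_bits` and `expected_matches` below are hypothetical and are not the notebook's own `count_leading_zero_bits`.

```python
import hashlib
import math

def leading_zero_bits(data: bytes) -> int:
    """Count the leading zero bits of the SHA-256 digest of data."""
    digest = hashlib.sha256(data).digest()
    value = int.from_bytes(digest, "big")
    # A 256-bit value with bit_length b has 256 - b leading zero bits.
    return 256 - value.bit_length()

def expected_matches(num_candidates: int, k: int) -> float:
    """Expected number of candidates with at least k leading zero bits,
    assuming each digest bit is an independent fair coin flip."""
    return num_candidates / 2 ** k

n_words = 370105  # dictionary size reported in the output above

print(f"log2(N) = {math.log2(n_words):.2f}")
print(f"Expected words with >= 15 zero bits: {expected_matches(n_words, 15):.1f}")  # roughly 11.3
print(f"Leading zero bits for b'abc': {leading_zero_bits(b'abc')}")
```

Under this model, a dictionary of this size would be expected to top out somewhere in the high teens of leading zero bits, and each extra zero bit halves the number of qualifying words. That doubling of search effort per bit is the property proof-of-work schemes rely on.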

## Task 7: Turing Machines

### The Significance of Turing Machines in Computing Theory

- **Universal Computation**: By the Church-Turing thesis, every algorithm can be computed by a Turing Machine
- **Computational Limits**: Turing Machines help define what is and is not computable
- **Halting Problem**: Turing proved that some problems are undecidable, which shows that there are boundaries to computation
- **Foundation for Modern Computing**: Modern computers follow the same basic principles

The binary increment operation implemented in this task illustrates how even complex calculations can be broken down into a series of simple state transitions and symbol manipulations. As explained in the [Stanford Encyclopedia of Philosophy entry on Turing Machines](https://plato.stanford.edu/entries/turing-machine/), Turing's model serves as "the foundation for the definition of computability and computational complexity theory."

In this task we created a simple Turing Machine which adds `1` to a binary number.

This is done by:

- **Tape Input:** A binary number is represented as a string (e.g. `100111`).  
- **LSB Handling:** Addition begins at the least significant bit.  
- **Carry Logic:** If the bit is 1, write 0 and move left; if it is 0, write 1 and halt.
- **Blank Tape Handling:** If the blank symbol (`B`) is reached, every bit was 1, so write 1 in the blank cell at the front.


The implementation follows the Turing Machine described in [GeeksforGeeks' article on incrementing binary numbers](https://www.geeksforgeeks.org/construct-turing-machine-for-incrementing-binary-number-by-1/), which shows the state transitions required for binary addition.

### State Transition Table

| Current State | Read Symbol | Write Symbol | Move | Next State |
|---------------|-------------|--------------|------|------------|
| `q0`          | `1`         | `0`          | L    | `q0`       |
| `q0`          | `0`         | `1`          | R    | `HALT`     |
| `q0`          | `B`         | `1`          | R    | `HALT`     |

**Explanation**:
- `q0` is the only working state; it performs the carry propagation.
- When a `1` is read, the machine writes `0` and moves left, carrying the 1 onward.
- When a `0` is read, the carry ends: the machine writes `1` and halts.
- When the blank symbol (`B`) is read, every bit of the number was `1`, so the machine writes a new `1` at the start and halts.

This implementation follows Turing Machine principles described in [TutorialsPoint's guide on binary incrementation with Turing Machines](https://www.tutorialspoint.com/design-a-tm-that-increments-a-binary-number-by-1), demonstrating the formal state-machine approach to computation.

In [27]:
def turing_add_one(tape_input):
    # Convert input string to list (tape), and add blank symbol at the beginning
    tape = ['B'] + list(tape_input)
    head = len(tape) - 1  # Start at right-most symbol (LSB)
    state = 'q0'

    while state != 'HALT':
        symbol = tape[head]

        if state == 'q0':
            if symbol == '1':
                tape[head] = '0'
                head -= 1
            elif symbol == '0':
                tape[head] = '1'
                state = 'HALT'
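            elif symbol == 'B':
                tape[head] = '1'
                state = 'HALT'
            else:
                # Any symbol outside {0, 1, B} has no transition; fail instead of looping forever
                raise ValueError(f"Unexpected tape symbol: {symbol!r}")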

    # Remove leading blank if not needed
    result = ''.join(tape).lstrip('B')
    return result

# Test case
input_tape = '100111'
output_tape = turing_add_one(input_tape)

print(f"Input:  {input_tape}")
print(f"Output: {output_tape}")

Input:  100111
Output: 101000
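
The state transition table above can also be encoded directly as data, so that the simulator loop contains no symbol-specific branches. The following is a minimal sketch of that idea, not the notebook's implementation; the names `TRANSITIONS` and `run_increment` are hypothetical. Head moves are stored as -1 (left) and +1 (right) to mirror the `L` and `R` columns of the table.

```python
# Transition table: (state, read symbol) -> (write symbol, head move, next state)
TRANSITIONS = {
    ("q0", "1"): ("0", -1, "q0"),    # carry continues: write 0, move left
    ("q0", "0"): ("1", +1, "HALT"),  # carry absorbed: write 1, halt
    ("q0", "B"): ("1", +1, "HALT"),  # ran off the left end: write a new leading 1
}

def run_increment(tape_input):
    """Add 1 to a binary string by stepping through TRANSITIONS."""
    tape = ["B"] + list(tape_input)  # one blank cell to the left of the number
    head = len(tape) - 1             # start on the least significant bit
    state = "q0"

    while state != "HALT":
        key = (state, tape[head])
        if key not in TRANSITIONS:
            raise ValueError(f"No transition defined for {key}")
        write, move, state = TRANSITIONS[key]
        tape[head] = write
        head += move

    return "".join(tape).lstrip("B")

# Cross-check against ordinary integer arithmetic for small values.
for value in range(64):
    bits = format(value, "b")
    assert run_increment(bits) == format(value + 1, "b")

print(run_increment("100111"))
```

Keeping the rules in a dictionary means a different machine can be tried by editing the table alone, which is closer to the formal definition of a Turing Machine than hard-coded `if` branches.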


## Task 8: Computational Complexity

### Bubble Sort with Comparison Counting
The function performs a standard bubble sort and increments a counter each time it compares two elements.

### Why Bubble Sort?
Bubble sort is very simple but inefficient for large datasets. According to [Simplilearn's tutorial on bubble sort](https://www.simplilearn.com/tutorials/data-structure-tutorial/bubble-sort-algorithm), the algorithm clearly demonstrates how the initial arrangement of data affects computational efficiency, making it well suited to teaching complexity analysis. It can show worst-case versus best-case behavior and helps in understanding how input order affects the work done. Note that the implementation below has no early-exit check, so it always makes the same number of comparisons; input order changes only the number of swaps.

In [28]:
def bubble_sort_with_count(arr):
    """
    Sorts the list using Bubble Sort and counts the comparisons made.
    """
    n = len(arr)
    count = 0
    a = arr.copy()  # To avoid changing the original permutation
    for i in range(n):
        for j in range(0, n - i - 1):
            count += 1  # Each comparison is counted
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return count
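
Because the loop bounds in `bubble_sort_with_count` do not depend on the data, the comparison count can be worked out by hand. Pass `i` runs the inner loop `n - i - 1` times, so the total is:

$$
\sum_{i=0}^{n-1} (n - i - 1) = (n-1) + (n-2) + \dots + 1 + 0 = \frac{n(n-1)}{2}
$$

For $n = 5$ this gives $\frac{5 \cdot 4}{2} = 10$ comparisons for every input, whatever its initial order.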

### Generate All Permutations

Created all `120` permutations of `[1, 2, 3, 4, 5]` using Python's `itertools.permutations`.
Ran the bubble sort function on every permutation and recorded the outcome.
Each result is stored as a tuple `(permutation, comparison_count)` in a list called `results`.

In [29]:
from itertools import permutations

L = [1, 2, 3, 4, 5]
all_perms = list(permutations(L))

results = []

for perm in all_perms:
    comparison_count = bubble_sort_with_count(list(perm))
    results.append((perm, comparison_count))

### Calculate Key Statistics from the Results

Minimum comparisons → Best-case scenario.

Maximum comparisons → Worst-case scenario.

Average comparisons → How Bubble Sort performs across all `120` permutations.

In [30]:
from statistics import mean

comparison_values = [count for _, count in results]
min_comparisons = min(comparison_values)
max_comparisons = max(comparison_values)
avg_comparisons = mean(comparison_values)

print(f"\n----Summary Statistics----")
print(f"Total permutations: {len(results)}")
print(f"Minimum comparisons (best case): {min_comparisons}")
print(f"Maximum comparisons (worst case): {max_comparisons}")
print(f"Average comparisons: {avg_comparisons:.2f}")


----Summary Statistics----
Total permutations: 120
Minimum comparisons (best case): 10
Maximum comparisons (worst case): 10
Average comparisons: 10.00
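
The minimum, maximum and average are all 10 because the loops in `bubble_sort_with_count` always run to completion. To make the comparison count depend on input order, a common variant stops as soon as a full pass makes no swaps. The sketch below shows that variant under the hypothetical name `bubble_sort_early_exit`; it is an illustration, not the function used to produce the statistics above.

```python
from itertools import permutations

def bubble_sort_early_exit(arr):
    """Bubble sort that stops after the first pass with no swaps.

    Returns the number of comparisons made.
    """
    a = list(arr)  # work on a copy so the caller's data is untouched
    n = len(a)
    comparisons = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - i - 1):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break  # already sorted: no further passes needed
    return comparisons

counts = [bubble_sort_early_exit(p) for p in permutations([1, 2, 3, 4, 5])]

print(f"Sorted input:   {bubble_sort_early_exit([1, 2, 3, 4, 5])} comparisons")  # 4
print(f"Reversed input: {bubble_sort_early_exit([5, 4, 3, 2, 1])} comparisons")  # 10
print(f"Min / max over all permutations: {min(counts)} / {max(counts)}")
```

With the early exit, an already sorted list needs only one pass of `n - 1` comparisons, while a reversed list still needs all `n(n-1)/2`, so best and worst cases become distinguishable.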


### Identify Best and Worst Case Inputs
Displaying the specific permutations that resulted in:

The minimum number of comparisons.

The maximum number of comparisons.

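This helps show which input orders cause the most or least work for Bubble Sort. Because the minimum and maximum are both 10 for this implementation, every one of the 120 permutations appears in both lists.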

In [31]:
print("\n🔝 Best-case Permutations (min comparisons):")
for perm, count in results:
    if count == min_comparisons:
        print(f"{perm} -> {count} comparisons")

print("\n🔻 Worst-case Permutations (max comparisons):")
for perm, count in results:
    if count == max_comparisons:
        print(f"{perm} -> {count} comparisons")



🔝 Best-case Permutations (min comparisons):
(1, 2, 3, 4, 5) -> 10 comparisons
(1, 2, 3, 5, 4) -> 10 comparisons
(1, 2, 4, 3, 5) -> 10 comparisons
(1, 2, 4, 5, 3) -> 10 comparisons
(1, 2, 5, 3, 4) -> 10 comparisons
(1, 2, 5, 4, 3) -> 10 comparisons
(1, 3, 2, 4, 5) -> 10 comparisons
(1, 3, 2, 5, 4) -> 10 comparisons
(1, 3, 4, 2, 5) -> 10 comparisons
(1, 3, 4, 5, 2) -> 10 comparisons
(1, 3, 5, 2, 4) -> 10 comparisons
(1, 3, 5, 4, 2) -> 10 comparisons
(1, 4, 2, 3, 5) -> 10 comparisons
(1, 4, 2, 5, 3) -> 10 comparisons
(1, 4, 3, 2, 5) -> 10 comparisons
(1, 4, 3, 5, 2) -> 10 comparisons
(1, 4, 5, 2, 3) -> 10 comparisons
(1, 4, 5, 3, 2) -> 10 comparisons
(1, 5, 2, 3, 4) -> 10 comparisons
(1, 5, 2, 4, 3) -> 10 comparisons
(1, 5, 3, 2, 4) -> 10 comparisons
(1, 5, 3, 4, 2) -> 10 comparisons
(1, 5, 4, 2, 3) -> 10 comparisons
(1, 5, 4, 3, 2) -> 10 comparisons
(2, 1, 3, 4, 5) -> 10 comparisons
(2, 1, 3, 5, 4) -> 10 comparisons
(2, 1, 4, 3, 5) -> 10 comparisons
(2, 1, 4, 5, 3) -> 10 comparisons
(2,