Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AES-CTR PRNG with external git submodule #600

Closed

Conversation

Knogle
Copy link
Contributor

@Knogle Knogle commented Sep 8, 2024

Ahoy,

Referring to #559 (comment)

I've implemented the OpenSSL development dependency using a git submodule

Everything else is just the same, even though, the build time is increased.

BR,

…th OpenSSL and AES-NI support

This commit introduces significant improvements to `nwipe`'s random number generation by implementing AES-CTR mode using OpenSSL with AES-NI hardware acceleration. The changes include:

- **AES-CTR PRNG Implementation**: A new pseudo-random number generator (PRNG) based on AES-128 in counter (CTR) mode has been added. This PRNG uses OpenSSL’s EVP API for cryptographic operations, ensuring strong random number generation.

- **AES-NI Hardware Support**: The PRNG now detects if the system supports AES-NI and uses hardware acceleration when available for improved performance, particularly on 64-bit systems. A new function `has_aes_ni()` checks for AES-NI support, ensuring that the fastest available option is used.

- **Default PRNG Selection**: For 64-bit systems, AES-CTR is now the default PRNG due to its higher performance and security. On 32-bit systems, the XORoshiro-256 PRNG remains the default due to performance considerations.

- **Improved Error Handling**: Error handling has been extensively revised. The AES-CTR PRNG now provides detailed error messages, including the detection of potential OpenSSL failures, and uses structured logging (`NWIPE_LOG_DEBUG` and `NWIPE_LOG_NOTICE`) to report PRNG initialization and errors.

- **Memory Management**: Smart pointers have been introduced to manage memory automatically within the AES-CTR PRNG, preventing memory leaks. All memory used by the OpenSSL contexts (EVP_MD_CTX and EVP_CIPHER_CTX) is now properly cleaned up after use.

- **Performance Enhancements**: The PRNG’s performance has been optimized, especially for 64-bit systems. Additionally, the code now checks for edge cases when reading remaining bytes, and memset is used to handle uninitialized memory areas.

- **Formatting and Code Cleanup**: Deprecated SHA256 calls have been replaced with OpenSSL’s EVP API. Multiple formatting issues in various source files (`pass.c`, `options.c`, `gui.c`, etc.) have been fixed, improving code readability and compliance with the coding standards.

- **Regression Fixes**: Several small regressions from previous changes have been fixed, including uninitialized variables, implicit function declarations, and incorrect function calls that previously led to warnings and potential crashes (e.g., segfaults in `pass.c`).

- **Logging and Debugging**: Improved logging messages and detailed debug information have been added, particularly around the AES-CTR initialization and error handling paths.

This implementation significantly improves the security, performance, and reliability of random number generation in `nwipe` and ensures that the tool is better equipped for modern cryptographic standards.

Squashed commit of the following:

commit 03284d7
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Fri Aug 23 19:17:05 2024 +0200

    Fixed formatting

commit 31df3eb
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Fri Aug 23 19:15:01 2024 +0200

    Fixed type error on i686 - uint64_t for AES-CTR

commit e56a47c
Merge: 3465260 d1edd05
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Tue Aug 20 10:39:26 2024 +0100

    Merge branch 'master' into aes-ctr

commit 3465260
Merge: 1bf5ff1 5140f92
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Fri May 10 01:41:09 2024 +0200

    Merge branch 'master' into aes-ctr

commit 1bf5ff1
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Fri May 10 01:14:11 2024 +0200

    Added check for return value in prng.c and nwipe_log accordingly.

commit 9015a32
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Fri May 10 00:11:59 2024 +0200

    Removed error handling using goto, now returning instead. Changed from void to int functions, removed cleanup(); on error , instead relying on cleanup routine on exit.

commit 10ea2bb
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Mon Apr 15 13:47:05 2024 +0200

    Adapted the aes-ctr-prng accordingly, to report SANITY level errors, in case of failure.

commit 71e7f8f
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Mon Apr 15 13:46:35 2024 +0200

    Added case handling for NWIPE_LOG_LEVEL_SANITY, providing the github issue link, if a SANITY level error occurs.

commit 0f3e7f5
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Mon Apr 15 00:40:22 2024 +0200

    Added check in pass.c wether nwipe_aes_ctr_prng is being used or not, if not a segfault was the result. Now fixed.

commit ce09d8e
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Sat Apr 13 00:58:34 2024 +0200

    Missing NWIPE_LOG_NOTICE for AES-CTR init changed to DEBUG level.

commit 8b284f3
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Sat Apr 13 00:50:25 2024 +0200

    Changed notification for successful PRNG init for AES-CTR to NWIPE_LOG_DEBUG.

commit 8702cc3
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Sat Apr 13 00:13:51 2024 +0200

    Added missing cleanup routine in prior commit in aes_ctr_prng.c and header definitions.

commit 35cd055
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Sat Apr 13 00:12:01 2024 +0200

    Fixed formatting in pass.c

commit beff746
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Sat Apr 13 00:11:28 2024 +0200

    Added cleanup routine aes_ctr_prng_general_cleanup() after nwipe_random_pass and nwipe_random_verify in order to cleanup PRNG state.

commit 1a95202
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Fri Apr 12 23:25:04 2024 +0200

    Part of the comments were missing, fixed.

commit a65410a
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Fri Apr 12 23:20:41 2024 +0200

    Added extensive error handling, in order to check for OpenSSL library malfunction.

commit 6518963
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Thu Apr 11 00:20:54 2024 +0200

    Improved PRNG description for AES-CTR-256 in gui.c

commit fe493cf
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Wed Apr 10 23:51:12 2024 +0200

    Added function has_aes_ni() in order to check for AES-Ni support, and set the PRNG accordingly.

commit adcd442
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Wed Apr 10 15:40:24 2024 +0200

    Handle edge case for remaining bytes in nwipe_aes_ctr_prng_read using memset.

commit ce2db63
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Wed Apr 10 15:08:25 2024 +0200

    Fixed comments, indicating AES-CTR-128 instead of 256 bit

commit da53ee0
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Wed Apr 10 14:53:45 2024 +0200

    Reverted by mistake nwipe_random back to nwipe_dodshort, now compliant with master

commit e479239
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Wed Apr 10 13:23:36 2024 +0200

    Fixed formatting

commit e0d9584
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Wed Apr 10 13:22:25 2024 +0200

    Minor changes, added comments for further explanation.

commit 7410d21
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Tue Apr 9 12:49:02 2024 +0200

    Fixed uninitialized temp_buffer in aes_ctr_prng_genrand_uint256_to_buf

commit cd5f071
Merge: 3cb78ca 2809580
Author: PartialVolume <22084881+PartialVolume@users.noreply.github.com>
Date:   Sun Apr 7 22:22:55 2024 +0100

    Merge branch 'master' into aes-ctr

commit 3cb78ca
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Sun Mar 31 14:03:38 2024 +0200

    Added error checking and nwipe_log to aes_ctr_prng.c

commit 20fea0d
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Sun Mar 31 13:54:54 2024 +0200

    Only C implementation, removed CPP here.

commit d5b39f6
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Sun Mar 31 00:43:58 2024 +0100

    Introduced smart pointers to manage memory for EVP_MD_CTX and EVP_CIPHER_CTX within the AES CTR PRNG C++ implementation. This ensures automatic resource release, preventing memory leaks and enhancing code safety.

commit f6aeae3
Author: Fabian Druschke <fdruschke@outlook.com>
Date:   Sun Mar 31 00:34:15 2024 +0100

    Created seamless integrated .cpp AES-CTR-PRNG in order to avoid memory issues.

commit eecddb2
Author: Knogle <fdruschke@outlook.com>
Date:   Sun Mar 24 22:48:06 2024 +0000

    aes_ctr_prng_init was missing in header, causing implicit declaration warnings.

commit 8fe4db4
Author: Knogle <fdruschke@outlook.com>
Date:   Sun Mar 24 18:30:52 2024 +0000

    Replaced traditional deprecated SHA256 declarations with EVP-API infrastructure

commit 55472fb
Author: Knogle <fdruschke@outlook.com>
Date:   Sun Mar 24 03:43:36 2024 +0000

    To consider, AES-128-CTR as default option for 64-Bit, and Xoroshiro-256 as default option for 32-Bit due to performance and quality reasons.

commit 1a964bc
Author: Fabian Druschke <fabian@knogle.industries>
Date:   Sat Mar 23 19:12:24 2024 -0300

    Fixed missing XORoshiro-256 in options.c bottom section. Added AES-128-CTR OpenSSL descriptions.

commit cf9822a
Author: Fabian Druschke <fabian@knogle.industries>
Date:   Sat Mar 23 18:48:38 2024 -0300

    Several changes, adding AES-128 using libssl in CTR mode as new PRNG, in experimental state. Fixed formatting, fixed AES PRNG header.
- Added automatic initialization and update of git submodules during the configure step.
- The script now checks if the project is within a Git repository and contains submodules.
- If submodules exist, they are initialized and updated using 'git submodule init' and 'git submodule update --recursive'.
- If the project is not a Git repository or contains no submodules, appropriate warnings are displayed.
Copy link
Collaborator

@PartialVolume PartialVolume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you are using the openssl-3.0 branch rather than the openssl-3.4 branch?

Also where does the Openssl libraries that nwipe uses get stored on the distro assuming they are dynamically linked or is Openssl statically linked into nwipe's executable? If statically linked how big is the executable.

@Knogle
Copy link
Contributor Author

Knogle commented Sep 8, 2024

Is there a reason you are using the openssl-3.0 branch rather than the openssl-3.4 branch?

Also where does the Openssl libraries that nwipe uses get stored on the distro assuming they are dynamically linked or is Openssl statically linked into nwipe's executable? If statically linked how big is the executable.

Ahoy, i went for the OpenSSL 3.0 version because in the other thread someone suggested it. OpenSSL is the LTS release and provides compatiblity for 5 years to come.

I've screwed a little up, still having an issue with the linked library here.
The build time is increased to 8 minutes instead of 10 seconds. So tbh i think this is not a feasible solution.
Also does it require if openssl is statically linked, also gcc to be statically linked.
The size of the binary is increased to 11M instead of 962k.

So in conclusion i think, if we wish to do so for the project, having the libssl-dev headers on the build system, and using them, is the only feasible option. Or otherwise, it would mean a huge binary with it's downsides.

@PartialVolume
Copy link
Collaborator

You don't think extracting just the relevent AES code from Openssl we need to do the job is an option so we don't need to link to Openssl at all is an option?

I know that may be really difficult because you have to spend time understanding their code and it may be structured in such a way that makes it really difficult to do but we have done that in the past.

@Knogle
Copy link
Contributor Author

Knogle commented Sep 8, 2024

You don't think extracting just the relevent AES code from Openssl we need to do the job is an option so we don't need to link to Openssl at all is an option?

I know that may be really difficult because you have to spend time understanding their code and it may be structured in such a way that makes it really difficult to do but we have done that in the past.

Ahoy :)

I think we could instead borrow code from one of the older OpenSSL versions like 1.1.0.
The code in OpenSSL 3.x is structured as a kind of framework, and all the function deeply rely on each other as it's an integral part of the OpenSSL watchdog and error-reporting functions.

Maybe we can discuss this topic with the other guys, because i think it's also a discussion on what's best for the nwipe project, not only the single PR.

In case of OpenSSL 1.1.0 we could copy them out easier, but it's an outdated and known vulnerable version.
I have created also a version which does not rely at all on OpenSSL, but is capped at around 50MB/s unfortunately.

BR,

/*
 * AES CTR PRNG Implementation
 * Author: Fabian Druschke
 * Date: 2024-03-13
 *
 * This header file contains definitions for the AES (Advanced Encryption Standard)
 * implementation in CTR (Counter) mode for pseudorandom number generation, utilizing
 * OpenSSL for cryptographic functions.
 *
 * As the author of this work, I, Fabian Druschke, hereby release this work into the public
 * domain. I dedicate any and all copyright interest in this work to the public domain,
 * making it free to use for anyone for any purpose without any conditions, unless such
 * conditions are required by law.
 *
 * This software is provided "as is", without warranty of any kind, express or implied,
 * including but not limited to the warranties of merchantability, fitness for a particular
 * purpose and noninfringement. In no event shall the authors be liable for any claim,
 * damages or other liability, whether in an action of contract, tort or otherwise, arising
 * from, out of or in connection with the software or the use or other dealings in the software.
 *
 * USAGE OF OPENSSL IN THIS SOFTWARE:
 * This software uses OpenSSL for cryptographic operations. Users are responsible for
 * ensuring compliance with OpenSSL's licensing terms.
 */

#include <stdint.h>
#include <string.h>
#include <assert.h>
#include <stdio.h>

// SHA-256 helper macros
#define ROTRIGHT(a,b) (((a) >> (b)) | ((a) << (32-(b))))

// SHA-256 specific functions as defined in the algorithm
#define CH(x,y,z) (((x) & (y)) ^ (~(x) & (z)))  // Choose
#define MAJ(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))  // Majority

// SHA-256 Sigma and Epsilon functions
#define EP0(x) (ROTRIGHT(x,2) ^ ROTRIGHT(x,13) ^ ROTRIGHT(x,22))  // Big Sigma 0
#define EP1(x) (ROTRIGHT(x,6) ^ ROTRIGHT(x,11) ^ ROTRIGHT(x,25))  // Big Sigma 1
#define SIG0(x) (ROTRIGHT(x,7) ^ ROTRIGHT(x,18) ^ ((x) >> 3))     // Small Sigma 0
#define SIG1(x) (ROTRIGHT(x,17) ^ ROTRIGHT(x,19) ^ ((x) >> 10))   // Small Sigma 1


#define AES_BLOCK_SIZE 16
#define AES_KEY_SIZE 32

/* SHA-256 Constants for the SHA-256 transformation. These values are used in each step of the hashing process.
   These constants are the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers. */
static const uint32_t k[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

/* The SHA-256 context structure holds the current state of the SHA-256 computation.
   - data: Buffer to hold the current block of data being hashed (64 bytes).
   - datalen: Length of data currently in the buffer.
   - bitlen: Total length of the input data (in bits).
   - state: The state (hash values) that will hold the final output after all blocks are processed. */
typedef struct {
    uint8_t data[64];
    uint32_t datalen;
    uint64_t bitlen;
    uint32_t state[8];
} SHA256_CTX;

/* The sha256_transform function performs the main SHA-256 compression. It takes a 512-bit block of input
   data and processes it using the current state (hash values). The state is updated after this process. */
void sha256_transform(SHA256_CTX *ctx, const uint8_t data[]) {
    uint32_t a, b, c, d, e, f, g, h, i, j, t1, t2, m[64];

    /* Prepare the first 16 words (512 bits) of the message schedule array from the input data. */
    for (i = 0, j = 0; i < 16; ++i, j += 4)
        m[i] = (data[j] << 24) | (data[j + 1] << 16) | (data[j + 2] << 8) | (data[j + 3]);

    /* Expand the message schedule array for the remaining 48 words by applying
       the SHA-256-specific word expansion formulas. */
    for (; i < 64; ++i)
        m[i] = SIG1(m[i - 2]) + m[i - 7] + SIG0(m[i - 15]) + m[i - 16];

    /* Initialize the working variables with the current hash state. */
    a = ctx->state[0];
    b = ctx->state[1];
    c = ctx->state[2];
    d = ctx->state[3];
    e = ctx->state[4];
    f = ctx->state[5];
    g = ctx->state[6];
    h = ctx->state[7];

    /* Main SHA-256 transformation loop:
       - For each of the 64 rounds, update the working variables by performing the core SHA-256 operations.
       - The constants 'k[i]' and the prepared message schedule 'm[i]' are used in each round. */
    for (i = 0; i < 64; ++i) {
        t1 = h + EP1(e) + CH(e,f,g) + k[i] + m[i];
        t2 = EP0(a) + MAJ(a,b,c);
        h = g;
        g = f;
        f = e;
        e = d + t1;
        d = c;
        c = b;
        b = a;
        a = t1 + t2;
    }

    /* Update the final hash state with the results of the transformation. */
    ctx->state[0] += a;
    ctx->state[1] += b;
    ctx->state[2] += c;
    ctx->state[3] += d;
    ctx->state[4] += e;
    ctx->state[5] += f;
    ctx->state[6] += g;
    ctx->state[7] += h;
}

/* The sha256_init function initializes the SHA-256 context by setting the state variables
   to the initial hash values as defined in the SHA-256 specification. */
void sha256_init(SHA256_CTX *ctx) {
    ctx->datalen = 0;
    ctx->bitlen = 0;
    ctx->state[0] = 0x6a09e667;
    ctx->state[1] = 0xbb67ae85;
    ctx->state[2] = 0x3c6ef372;
    ctx->state[3] = 0xa54ff53a;
    ctx->state[4] = 0x510e527f;
    ctx->state[5] = 0x9b05688c;
    ctx->state[6] = 0x1f83d9ab;
    ctx->state[7] = 0x5be0cd19;
}

/* The sha256_update function processes chunks of input data by adding them to the context.
   It updates the state in 512-bit blocks (64 bytes), storing partial blocks until enough
   data is available for a full block. */
void sha256_update(SHA256_CTX *ctx, const uint8_t data[], size_t len) {
    for (size_t i = 0; i < len; ++i) {
        ctx->data[ctx->datalen] = data[i];
        ctx->datalen++;
        /* When the buffer is full (64 bytes), process it and reset the buffer. */
        if (ctx->datalen == 64) {
            sha256_transform(ctx, ctx->data);
            ctx->bitlen += 512;
            ctx->datalen = 0;
        }
    }
}

/* The sha256_final function pads the input data, processes the final block, and generates the final hash.
   It ensures the correct padding is applied, then processes the padded data block, and finally extracts
   the hash value from the context state. */
void sha256_final(SHA256_CTX *ctx, uint8_t hash[]) {
    uint32_t i = ctx->datalen;

    /* Pad the remaining data in the buffer with a single '1' bit, followed by enough '0' bits to make
       the total length of the final block 512 bits (64 bytes), except for the last 64 bits which store
       the length of the input data. */
    if (ctx->datalen < 56) {
        ctx->data[i++] = 0x80;  // Append the '1' bit
        while (i < 56)
            ctx->data[i++] = 0x00;  // Pad with '0's
    } else {
        /* If there's not enough room for the padding, append the '1' bit and '0's, then process the block. */
        ctx->data[i++] = 0x80;
        while (i < 64)
            ctx->data[i++] = 0x00;
        sha256_transform(ctx, ctx->data);  // Process the full block
        memset(ctx->data, 0, 56);  // Prepare for the next block
    }

    /* Append the length of the input data in bits, encoded as a 64-bit big-endian integer. */
    ctx->bitlen += ctx->datalen * 8;
    ctx->data[63] = ctx->bitlen;
    ctx->data[62] = ctx->bitlen >> 8;
    ctx->data[61] = ctx->bitlen >> 16;
    ctx->data[60] = ctx->bitlen >> 24;
    ctx->data[59] = ctx->bitlen >> 32;
    ctx->data[58] = ctx->bitlen >> 40;
    ctx->data[57] = ctx->bitlen >> 48;
    ctx->data[56] = ctx->bitlen >> 56;
    sha256_transform(ctx, ctx->data);  // Process the final block

    /* Copy the resulting hash from the state to the output buffer (hash). */
    for (i = 0; i < 4; ++i) {
        hash[i] = (ctx->state[0] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 4] = (ctx->state[1] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 8] = (ctx->state[2] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 12] = (ctx->state[3] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 16] = (ctx->state[4] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 20] = (ctx->state[5] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 24] = (ctx->state[6] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 28] = (ctx->state[7] >> (24 - i * 8)) & 0x000000ff;
    }
}

/* The sha256 function is a wrapper that initializes the SHA-256 context, processes the input data,
   and produces the final SHA-256 hash. */
void sha256(const uint8_t* data, size_t len, uint8_t* hash) {
    SHA256_CTX ctx;
    sha256_init(&ctx);
    sha256_update(&ctx, data, len);
    sha256_final(&ctx, hash);
}

// --- AES-256 CTR Mode Implementation ---

/* AES-CTR PRNG State Structure holds the context for AES-256 encryption in CTR mode.
   - ivec: Initialization vector (counter for the CTR mode).
   - ecount: A counter to store the number of processed blocks.
   - num: Number of processed bytes in the current block.
   - key: The AES encryption key (256 bits). */
typedef struct {
    uint8_t ivec[AES_BLOCK_SIZE];  // Initialization vector
    uint8_t ecount[AES_BLOCK_SIZE];  // Encryption counter
    uint32_t num;  // Number of processed bytes in current block
    uint8_t key[AES_KEY_SIZE];  // AES encryption key
} aes_ctr_state_t;

// AES S-box
static const uint8_t sbox[256] = { /* S-box values */ };

// Rcon array for AES key expansion
static const uint8_t Rcon[11] = {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36};

/* RotWord function rotates the 4-byte word left by one byte.
   For example, [A0, A1, A2, A3] becomes [A1, A2, A3, A0].
   This function is used in the AES key expansion process. */
void RotWord(uint8_t* word) {
    uint8_t tmp = word[0];
    word[0] = word[1];
    word[1] = word[2];
    word[2] = word[3];
    word[3] = tmp;
}

/* SubWord function applies the AES S-box substitution to each byte in a 4-byte word.
   This is part of the AES key expansion process where each byte in the word is replaced
   by the corresponding value in the S-box. */
void SubWord(uint8_t* word) {
    for (int i = 0; i < 4; i++) {
        word[i] = sbox[word[i]];
    }
}

/* KeyExpansion generates the round keys for AES encryption. 
   It starts with the original 256-bit key and expands it to generate round keys for each encryption round.
   - key: The original 256-bit key (32 bytes).
   - roundKeys: The expanded round keys used in the AES encryption process. */
void KeyExpansion(uint8_t* key, uint8_t* roundKeys) {
    int i, j;
    uint8_t temp[4];

    /* Copy the original key into the first 32 bytes of the round keys. */
    for (i = 0; i < AES_KEY_SIZE; i++) {
        roundKeys[i] = key[i];
    }

    /* Generate the remaining round keys using the AES key expansion algorithm. */
    for (i = AES_KEY_SIZE / 4; i < 4 * (14 + 1); i++) {
        /* Copy the previous 4-byte word. */
        for (j = 0; j < 4; j++) {
            temp[j] = roundKeys[(i - 1) * 4 + j];
        }

        /* Every 8th word, apply the RotWord and SubWord transformations, and XOR with Rcon. */
        if (i % (AES_KEY_SIZE / 4) == 0) {
            RotWord(temp);
            SubWord(temp);
            temp[0] ^= Rcon[i / (AES_KEY_SIZE / 4)];
        }

        /* XOR the word with the word 32 bytes before it to produce the next round key word. */
        for (j = 0; j < 4; j++) {
            roundKeys[i * 4 + j] = roundKeys[(i - (AES_KEY_SIZE / 4)) * 4 + j] ^ temp[j];
        }
    }
}

/* AddRoundKey XORs the current state of the AES block with the current round key.
   This is a core part of the AES encryption process, performed in every encryption round. */
void AddRoundKey(uint8_t* state, uint8_t* roundKey) {
    for (int i = 0; i < AES_BLOCK_SIZE; i++) {
        state[i] ^= roundKey[i];
    }
}

/* AES_Encrypt encrypts a single 16-byte block of input using the AES-256 algorithm.
   It takes the input block, the output buffer, and the expanded round keys.
   - input: The 16-byte plaintext block to encrypt.
   - output: The encrypted ciphertext block (also 16 bytes).
   - key: The 256-bit encryption key. */
void AES_Encrypt(uint8_t* input, uint8_t* output, uint8_t* key) {
    uint8_t state[AES_BLOCK_SIZE];  // State array holds the intermediate results
    uint8_t roundKeys[240];  // Space for 14 rounds plus the initial key
    int round;

    /* Initialize the state with the input block. */
    memcpy(state, input, AES_BLOCK_SIZE);

    /* Generate the round keys from the input key. */
    KeyExpansion(key, roundKeys);

    /* Initial AddRoundKey step. */
    AddRoundKey(state, roundKeys);

    /* Main encryption rounds (SubBytes, ShiftRows, MixColumns are omitted here for simplicity). */
    for (round = 1; round < 14; round++) {
        // SubBytes(state);
        // ShiftRows(state);
        // MixColumns(state);
        AddRoundKey(state, roundKeys + round * AES_BLOCK_SIZE);
    }

    /* Final round (without MixColumns). */
    AddRoundKey(state, roundKeys + 14 * AES_BLOCK_SIZE);

    /* Copy the state to the output buffer. */
    memcpy(output, state, AES_BLOCK_SIZE);
}

/* increment_counter increases the AES-CTR counter by 1.
   It treats the counter as a 128-bit big-endian integer and increments it. */
void increment_counter(uint8_t* counter) {
    for (int i = AES_BLOCK_SIZE - 1; i >= 0; i--) {
        counter[i]++;
        if (counter[i] != 0) break;  // Stop when there's no overflow
    }
}

/* AES_CTR_Encrypt encrypts a variable-length input using AES-256 in Counter (CTR) mode.
   It generates the keystream by encrypting an incrementing counter and XORs it with the plaintext.
   - input: The input data to encrypt.
   - output: The output buffer where encrypted data will be written.
   - length: Length of the input data (in bytes).
   - key: The 256-bit AES key.
   - iv: The initialization vector (acts as the counter in CTR mode). */
void AES_CTR_Encrypt(uint8_t* input, uint8_t* output, uint32_t length, uint8_t* key, uint8_t* iv) {
    uint8_t counter[AES_BLOCK_SIZE];  // Counter for CTR mode
    uint8_t keystream[AES_BLOCK_SIZE];  // Buffer for the generated keystream
    uint32_t num_blocks = length / AES_BLOCK_SIZE;  // Number of full blocks
    uint32_t remaining_bytes = length % AES_BLOCK_SIZE;  // Remaining bytes after full blocks

    /* Initialize the counter with the IV (initialization vector). */
    memcpy(counter, iv, AES_BLOCK_SIZE);

    /* Process full 16-byte blocks. */
    for (uint32_t i = 0; i < num_blocks; i++) {
        /* Encrypt the counter to generate the keystream. */
        AES_Encrypt(counter, keystream, key);

        /* XOR the plaintext with the keystream to produce ciphertext. */
        for (int j = 0; j < AES_BLOCK_SIZE; j++) {
            output[i * AES_BLOCK_SIZE + j] = input[i * AES_BLOCK_SIZE + j] ^ keystream[j];
        }

        /* Increment the counter for the next block. */
        increment_counter(counter);
    }

    /* Process any remaining bytes (less than 16 bytes). */
    if (remaining_bytes > 0) {
        AES_Encrypt(counter, keystream, key);
        for (uint32_t j = 0; j < remaining_bytes; j++) {
            output[num_blocks * AES_BLOCK_SIZE + j] = input[num_blocks * AES_BLOCK_SIZE + j] ^ keystream[j];
        }
    }
}

/* aes_ctr_prng_init initializes the AES-CTR pseudorandom number generator.
   It takes a seed (init_key), hashes it with SHA-256 to derive a 256-bit AES key,
   and initializes the AES state.
   - state: The AES-CTR PRNG state.
   - init_key: The seed for key generation.
   - key_length: The length of the seed (in bytes). */
int aes_ctr_prng_init(aes_ctr_state_t* state, unsigned long init_key[], unsigned long key_length) {
    assert(state != NULL && init_key != NULL && key_length > 0);

    uint8_t key[32];  // Buffer for the derived 256-bit key
    memset(state->ivec, 0, AES_BLOCK_SIZE);  // Clear the IV (initial counter)
    state->num = 0;  // Reset the number of processed bytes
    memset(state->ecount, 0, AES_BLOCK_SIZE);  // Clear the encryption counter

    /* Hash the seed (init_key) with SHA-256 to generate the AES key. */
    sha256((const uint8_t*) init_key, key_length * sizeof(unsigned long), key);

    /* Copy the derived key into the state. */
    memcpy(state->key, key, AES_KEY_SIZE);
    return 0;
}

/* aes_ctr_prng_genrand_uint256_to_buf generates pseudorandom numbers using AES-CTR mode
   and writes them to the provided buffer.
   - state: The initialized AES-CTR PRNG state.
   - bufpos: The buffer where pseudorandom numbers will be written. */
int aes_ctr_prng_genrand_uint256_to_buf(aes_ctr_state_t* state, unsigned char* bufpos) {
    assert(state != NULL && bufpos != NULL);

    uint8_t temp_buffer[32];  // Temporary buffer for generated pseudorandom bytes
    memset(temp_buffer, 0, sizeof(temp_buffer));  // Clear the temporary buffer

    /* Encrypt the counter and generate the pseudorandom bytes. */
    AES_CTR_Encrypt(state->ivec, temp_buffer, sizeof(temp_buffer), state->key, state->ivec);

    /* Copy the generated bytes to the output buffer. */
    memcpy(bufpos, temp_buffer, sizeof(temp_buffer));

    return 0;
}

/* aes_ctr_prng_general_cleanup clears and resets the AES-CTR PRNG state.
   It securely wipes the key, IV, and any sensitive information from memory. */
int aes_ctr_prng_general_cleanup(aes_ctr_state_t* state) {
    if (state != NULL) {
        /* Clear sensitive information from memory. */
        memset(state->ivec, 0, AES_BLOCK_SIZE);
        memset(state->ecount, 0, AES_BLOCK_SIZE);
        memset(state->key, 0, AES_KEY_SIZE);
        state->num = 0;
    }
    return 0;
}

@PartialVolume
Copy link
Collaborator

PartialVolume commented Sep 9, 2024

I have created also a version which does not rely at all on OpenSSL, but is capped at around 50MB/s unfortunately.

If I'd spent time writing my own AES function I would spent some extra time on optimising the code, after all you must have already spent a considerable amount of time on it, so some extra time optimising may pay off.

Have you run any speed analysis on it to see if there is a particular section that's slow? Also maybe GCC or linker optimisations might help. Can any multiplications or divisions be replaced by C shift instructions? etc. A lot depends on what GCC is doing. But I would first profile the code to figure out where you can optimise it. I might have to dig out some old books that go into how the gaming world optimise their code for maximum speed.

@PartialVolume
Copy link
Collaborator

PartialVolume commented Sep 9, 2024

One of the first things I would do is look at rotword and subword functions. Is it really necessary to put these in function calls considering how tiny they are. Consider how many millions of function calls that would save by placing the code in line in the key_expansion function. I don't know if GCC is clever enough to figure out that optimisation but that's certainly one place I would start.

@PartialVolume
Copy link
Collaborator

Is this code in a branch that can be tested?

@Knogle
Copy link
Contributor Author

Knogle commented Sep 9, 2024

Hey! Unfortunately not yet. But you can simply take the aes-ctr branch and replace all the content of the aes_ctr_prng.c It's a kind of retrofit.

It still has some issues with segfaults unfortunately, so i think it requires a little debugging before working properly. I got it working somehow, but had segfault during verify.

@Knogle
Copy link
Contributor Author

Knogle commented Sep 9, 2024

Does it run for you? For me it runs, but sometimes really slow.
Screenshot from 2024-09-09 18-13-26

@PartialVolume
Copy link
Collaborator

PartialVolume commented Sep 9, 2024

Does it run for you? For me it runs, but sometimes really slow. Screenshot from 2024-09-09 18-13-26

I've not tried it yet. As you were getting segfaults on verify and intermittent performance issues I would say you have a bug in there that's corrupting memory/stack that's maybe out of bounds errors on loops hence the segfaults and intermittent slow performance.

@Knogle
Copy link
Contributor Author

Knogle commented Sep 9, 2024

As we found out, the submodule approach is not feasible, so i will close the PR :)

@Knogle Knogle closed this Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants