Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AES-GCM] cipher.AEAD is no-longer safe for concurrent use #187

Closed
lyoung-confluent opened this issue May 10, 2024 · 5 comments · Fixed by golang-fips/openssl#146 or #201
Closed
Labels
bug Something isn't working

Comments

@lyoung-confluent
Copy link

In the Go 1.21 branch, the cipher.AEAD returned for AES GCM was safe for concurrent use, matching the (undocumented) behavior from Go's stdlib and boringcrypto implementation (golang/go#25882). This was because the aesGCM struct cached the raw AES key, creating a new EVP_CIPHER_CTX (which is not thread-safe) for each seal/open operation. Now, the pointer to EVP_CIPHER_CTX is cached on the cipherGCM struct and re-used for every seal/open operation.

This means if an application caches a returned cipher.AEAD and then calls it concurrently, eventually it will crash. I have observed this crash in a real-world application, specifically Hashicorp vault caches a returned cipher.AEAD and makes use of it across multiple goroutines, eventually resulting in errors like:

decryption failed: cipher: message authentication failed

The following application can reliably reproduce the crash on Go 1.22:

package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"log"
)

func main() {
	var key [32]byte
	if _, err := rand.Read(key[:]); err != nil {
		panic(err)
	}
	log.Printf("key: %#v", key)

	block, err := aes.NewCipher(key[:])
	if err != nil {
		panic(err)
	}

	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}

	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce[:]); err != nil {
		panic(err)
	}
	log.Printf("nonce: %#v", nonce)

	ciphertext := gcm.Seal(nil, nonce, []byte("hunter2"), nil)
	log.Printf("ciphertext: %#v", ciphertext)

	for parallel := 0; parallel < 8; parallel++ {
		go func() {
			for {
				if _, err := gcm.Open(nil, nonce, ciphertext, nil); err != nil {
					panic(err)
				}
			}
		}()
	}

	<-make(chan struct{})
}
@lyoung-confluent
Copy link
Author

@derekparker this regression appears to have been introduced as part of the openssl v2 migration in #131

@lyoung-confluent
Copy link
Author

lyoung-confluent commented May 10, 2024

I have locally verified that the mutex based fix I suggested in golang-fips/openssl#145 successfully prevents the crash from occurring, though I am unsure of the performance implications of the change.

@derekparker derekparker added the bug Something isn't working label May 10, 2024
@derekparker
Copy link
Contributor

Thanks for the bug report and the fix! We will review and get that merged and tagged.

@dagood
Copy link

dagood commented May 23, 2024

Has anyone formatted the reproducer as a Go standard library test yet? I'm thinking it would make sense to include it in our Go fork patches.

derekparker added a commit to golang-fips/openssl that referenced this issue May 24, 2024
Fixes golang-fips/go#187

Co-authored-by: Derek Parker <deparker@redhat.com>
@xnox
Copy link

xnox commented Jun 3, 2024

Here is an improved test case which doesn't continue running forever, as improved by @smoser

// Test case from https://github.com/golang-fips/go/issues/187
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"flag"
	"log"
	"os"
	"time"
)

func main() {
	numThreads := flag.Int("threads", 8, "number of threads")
	timeout := flag.Int("timeout", 10,
		"time to run for (considered success if no panics before then")
	flag.Parse()

	log.Printf("Starting parallel GCM decrypt with %d threads for %ds", *numThreads, *timeout)

	var key [32]byte
	if _, err := rand.Read(key[:]); err != nil {
		panic(err)
	}
	block, err := aes.NewCipher(key[:])
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce[:]); err != nil {
		panic(err)
	}
	ciphertext := gcm.Seal(nil, nonce, []byte("hunter2"), nil)

	for parallel := 0; parallel < *numThreads; parallel++ {
		go func() {
			for {
				if _, err := gcm.Open(nil, nonce, ciphertext, nil); err != nil {
					panic(err)
				}
			}
		}()
	}
	<-time.After(time.Duration(*timeout) * time.Second)
	log.Printf("Ran successfully for %d seconds. Exiting 0.", *timeout)
	os.Exit(0)
}

This seems to reproduce with broken toolchains, and pass with exit 0 on fixed toolchains.

This could be wrapped into a racy unit test to be caught by go test -race or simply have timeouts on the unit test case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
4 participants