overlord/devicestate: switch to ssh-keygen for device key generation #3130

Merged
merged 36 commits into from Apr 13, 2017

Conversation

Projects
None yet
9 participants
Contributor

vosst commented Apr 3, 2017

Switch to ssh-keygen for device key generation.

Please note that placement of key files in /tmp is temporary and we are awaiting advice on where to put the files.

vosst added some commits Apr 3, 2017

overlord/devicestate/handlers.go
+ deviceAPIBase = deviceAPIBaseURL()
+ requestIDURL = deviceAPIBase + "request-id"
+ serialRequestURL = deviceAPIBase + "devices"
+ sshKeyFile = "/tmp/snapd.key.tmp"
@pedronis

pedronis Apr 3, 2017

Contributor

we need to think where to best put these, though we know ssh-keygen will make sure they are root only accessible

overlord/devicestate/handlers.go
)
+func sshGenerateKey() (*rsa.PrivateKey, error) {
@pedronis

pedronis Apr 3, 2017

Contributor

an approach we use when we have things that invoke external commands is having:

sshGenerateKeyImpl this function
var sshGenerateKey = sshGenerateKeyImpl
MockSshGenarateKey

so most tests would Mock this function with something that doesn't fork/exec,
and a couple would test the real thing. It's worth only if the whole devicestate suite is much slower than on master

@vosst

vosst Apr 3, 2017

Contributor

I did a couple of test runs and both master and this branch take mostly the same time, differences are in the range of milliseconds.

@chipaca

chipaca Apr 4, 2017

Member

FWIW we also have testutil.MockCommand

overlord/devicestate/handlers.go
+
+ _, err := cmd.CombinedOutput()
+ if err != nil {
+ return nil, err
@pedronis

pedronis Apr 3, 2017

Contributor

we probably want to use osutil.OutputErr here:

out, err := cmd.CombinedOutput()
if err != nil {
return nil, osutil.OutputErr(out, err)
}

@vosst

vosst Apr 3, 2017

Contributor

Thanks, fixed.

overlord/devicestate/handlers.go
+ }
+
+ blk, _ := pem.Decode(d)
+ if err != nil {
@pedronis

pedronis Apr 3, 2017

Contributor

this is a left over from before noticing that Decode doesn't return an error actually

@vosst

vosst Apr 3, 2017

Contributor

Thanks, fixed.

Contributor

tyhicks commented Apr 3, 2017

Hello - I was asked to take a look at this and was told that the change was for first-boot performance reasons on rpi2 and others.

I'm not a fan of this change since ssh-keygen is an external program and has a configuration file that we're not in control of. I feel like we should enable khwrng since most of these ARM chips have hardware random number generators and then take a better look into why Go's RSA keygen routine is taking so long.

I hear that @vosst is working on gathering some performance numbers between Go and OpenSSH keygen routines.

Contributor

vosst commented Apr 3, 2017

@tyhicks Thanks for your remarks. We are putting this PR up for review early as we treat it as a fallback right now. We already did the work of enabling hwrng and have it participate in the kernel's entropy gathering process. I'm deep diving into Go's rsa key generation algorithm and compare it to the one used by ssh-keygen.

Contributor

ogra1 commented Apr 3, 2017

@tyhicks even with the khwrng enabled the key generation in Go takes minutes while ssh-keygen takes seconds ... apparently there are some deep changes in Go needed to fix this as i understand. the ssh-keygen switch is only thought as an interim solution til Go itself can be fixed.

Contributor

ogra1 commented Apr 3, 2017

testable images with this change included are at http://people.canonical.com/~ogra/snappy/all-snaps/ssh-keygen-test/

Is there any reason why 25519 keys aren't being generated instead? IIRC those are just 32 bytes of randomness with a few bits set specifically to ensure that the resulting key is on the curve. No primality testing, no probabilistic generation, etc.

Thanks

vosst added some commits Apr 3, 2017

Contributor

mwhudson commented Apr 3, 2017

then take a better look into why Go's RSA keygen routine is taking so long

My understanding is that it's just less sophisticated. IIRC (and this maybe wrong), the performance of generating RSA keys is dominated by the performance of running the Miller-Rabin primality test which is dominated by the performance of Montgomery multiplication. openssl has hyper-optimized assembly montgomery multiplication routines, Go doesn't.

Contributor

vosst commented Apr 4, 2017

@mwhudson yup, I'm taking cpu profiles right now, and Go's BigInt operations that are repeatedly used in the primality tests seem to be the bottleneck here. I will follow up with more data.

See for example golang/go#15833.

Simple profiling is here: https://gist.github.com/vosst/cb3d16e1f1596baf6ccb989d6b1897a7
Results for x86_64 (Core i7): https://www.NoFile.io/f/2ExcAuaQpgO

Contributor

ogra1 commented Apr 4, 2017

http://paste.ubuntu.com/24312862/ has a set of captured data from 16 installs in a row (freshly flashed SD card each) with an image using this patch, the key generation with this takes between 10 and 20 seconds (compared to 5-15 minutes using the Go routine)

Contributor

zyga commented Apr 4, 2017

There's a failure in one of the tests. That test is pretty racy and I saw it fail many times recently:

FAIL: overlord_test.go:360: overlordSuite.TestEnsureLoopPrune

overlord_test.go:384:
    c.Assert(st.Change(chg1.ID()), Equals, chg1)
... obtained *state.Change = (*state.Change)(nil)
... expected *state.Change = &state.Change{state:(*state.State)(0xc820369dc0), id:"1", kind:"abort", summary:"...", status:0, clean:false, data:state.customData{}, taskIDs:[]string{"1"}, lanes:0, ready:(chan struct {})(0xc82020a6c0), spawnTime:time.Time{sec:63626902304, nsec:782578764, loc:(*time.Location)(0x110fa80)}, readyTime:time.Time{sec:63626902304, nsec:884419528, loc:(*time.Location)(0x110fa80)}}
Contributor

zyga commented Apr 4, 2017

There's some formatting errors as well:

Formatting wrong in following files:
/home/travis/gopath/src/github.com/snapcore/snapd/overlord/devicestate/handlers.go

I suspect you want to try gofmt -s (simplify)

Contributor

vosst commented Apr 4, 2017

Applied gofmt -s (did not know about -s).

zyga and others added some commits Apr 5, 2017

overlord/devicestate/handlers.go
- out, err := cmd.CombinedOutput()
- if err != nil {
- return nil, osutil.OutputErr(out, err)
+ switch C.rsa_generate_key(C.uint64_t(keyLength), C.CString(sshKeyFile), C.CString(sshPublicKeyFile)) {
@chipaca

chipaca Apr 7, 2017

Member

does rsa_generate_key take ownership of the strings? otherwise you're leaking them

@vosst

vosst Apr 7, 2017

Contributor

Good catch, forgot to push the respective commit.

overlord/devicestate/handlers.go
+ os.Remove(sshKeyFile)
+ os.Remove(sshPublicKeyFile)
+
+ switch C.rsa_generate_key(C.uint64_t(keyLength), C.CString(sshKeyFile), C.CString(sshPublicKeyFile)) {
@pedronis

pedronis Apr 7, 2017

Contributor

why still using files on disk now?

@vosst

vosst Apr 7, 2017

Contributor

Mainly because building up an rsa.PrivateKey is non-trivial and I'm using the parsing functions to get the job done. I haven't looked too deeply if I could leverage the openssl PEM encoding functions to place the encoded key into memory.

overlord/devicestate/handlers.go
+ case C.RSA_KEY_GENERATION_IO_FAILURE:
+ return nil, errors.New("Failed to generate RSA key: Could not persist keys.")
+ case C.RSA_KEY_GENERATION_SUCCESS:
+ break
@zyga

zyga Apr 7, 2017

Contributor

Is this exhaustive? Do we need a default "unknown error" type case?

@vosst

vosst Apr 7, 2017

Contributor

I modelled the enumeration as exhaustive as possible, aiming to capture all possible error conditions. For that, I left out the generic error code.

overlord/devicestate/rsa_generate_key.c
+#include <stdint.h>
+#include <stdio.h>
+
+RSAKeyGenerationResult rsa_generate_key(uint64_t bits, const char *private_key_file, const char *public_key_file) {
@zyga

zyga Apr 7, 2017

Contributor

How about calling this snapd_... so that various snapd C functions have a common prefix?

@vosst

vosst Apr 7, 2017

Contributor

Done.

packaging/ubuntu-16.04/changelog
- -- Thomas Voß <thomas.voss@canonical.com> Mon, 03 Apr 2017 16:49:34 +0200
-
-snapd (2.23.6) xenial; urgency=medium
+napd (2.23.6) xenial; urgency=medium
@zyga

zyga Apr 7, 2017

Contributor

I suspect this is undesired

@vosst

vosst Apr 7, 2017

Contributor

Yes, fixed.

overlord/devicestate/rsa_generate_key.h
+#include <stdint.h>
+
+typedef enum {
+ RSA_KEY_GENERATION_SUCCESS,
@zyga

zyga Apr 7, 2017

Contributor

Ditto for prefixing those with some snapd specific prefix.

@vosst

vosst Apr 7, 2017

Contributor

Done.

overlord/devicestate/rsa_generate_key.c
+ goto free_all;
+ }
+
+ bp_public = BIO_new_file(public_key_file, "w+");
@pedronis

pedronis Apr 7, 2017

Contributor

seems there is something like BIO_new_mem_buf we could use instead

@vosst

vosst Apr 7, 2017

Contributor

Indeed, I adjusted the code to use it and avoid any sort of out-of-band file operations.

Contributor

ogra1 commented Apr 7, 2017

thomas asked me to do another round of tests after switching to direct openssl calls from shelling out to ssh-keygen ... http://paste.ubuntu.com/24334125/ has the results of 15 installs from the "Generating device key" step to "Mark system seeded". numbers do not seem much different from the ssh-keygen case, in average the key generation is around 10sec with 4sec being the fastest and about 30sec being the slowest.

vosst added some commits Apr 7, 2017

@vosst vosst changed the title from overlord/devicestate: switch to keygen for device key generation to overlord/devicestate: switch to lib(open)ssl for device key generation Apr 10, 2017

overlord/devicestate/handlers.go
)
+func rsaGenerateKey() (*rsa.PrivateKey, error) {
@pedronis

pedronis Apr 10, 2017

Contributor

I would call this one generateRSAKey(keyLength int) ...
and at this point probably better to split this out also to a generate_rsa_key.go file

@vosst

vosst Apr 10, 2017

Contributor

Done.

overlord/devicestate/handlers.go
+
+ switch C.snapd_rsa_generate_key(C.uint64_t(keyLength), &privateKey) {
+ case C.SNAPD_RSA_KEY_GENERATION_SEED_FAILURE:
+ return nil, errors.New("Failed to generate RSA key: RNG not seeded.")
@pedronis

pedronis Apr 10, 2017

Contributor

s/Failed to/cannot/, no final dot, same for all errors here

@vosst

vosst Apr 10, 2017

Contributor

Done.

overlord/devicestate/handlers.go
+ case C.SNAPD_RSA_KEY_GENERATION_SEED_FAILURE:
+ return nil, errors.New("Failed to generate RSA key: RNG not seeded.")
+ case C.SNAPD_RSA_KEY_GENERATION_ALLOCATION_FAILURE:
+ return nil, errors.New("Failed to generate RSA key: Could not allocate memory.")
@pedronis

pedronis Apr 10, 2017

Contributor

should be lowercase except for RSA etc, "...: could not"

@vosst

vosst Apr 10, 2017

Contributor

Done.

overlord/devicestate/handlers.go
+ defer C.free(unsafe.Pointer(privateKey.memory))
+ blk, _ := pem.Decode(C.GoBytes(unsafe.Pointer(privateKey.memory), C.int(privateKey.size)))
+ if blk == nil {
+ return nil, errors.New("Failed to decode PEM block")
@pedronis

pedronis Apr 10, 2017

Contributor

cannot

@vosst

vosst Apr 10, 2017

Contributor

Done.

overlord/devicestate/rsa_generate_key.c
+ BIO_clear_flags(bp_private, BIO_FLAGS_MEM_RDONLY);
+
+ if (PEM_write_bio_RSAPrivateKey(bp_private, rsa, empty_cipher, empty_passphrase, empty_passphrase_len, NULL, NULL) != 1) {
+ result = SNAPD_RSA_KEY_GENERATION_IO_FAILURE;
@pedronis

pedronis Apr 10, 2017

Contributor

this is more of a serialization/memory error now than I/O, no?

@vosst

vosst Apr 10, 2017

Contributor

We cannot really tell. Might fail due to an issue in allocating memory by the underlying BIO memory writer.

overlord/devicestate/rsa_generate_key.c
+ // allocation functions and just free'ing it might not be correct.
+ BUF_MEM *buf_mem = NULL;
+ BIO_get_mem_ptr(bp_private, &buf_mem);
+ private_key->memory = malloc(buf_mem->length);
@pedronis

pedronis Apr 10, 2017

Contributor

we should check that ->memory is not NULL ?

@vosst

vosst Apr 10, 2017

Contributor

Done.

overlord/devicestate/rsa_generate_key.c
+ }
+
+ BIO_set_mem_buf(bp_private, BUF_MEM_new(), BIO_CLOSE);
+ BIO_clear_flags(bp_private, BIO_FLAGS_MEM_RDONLY);
@pedronis

pedronis Apr 10, 2017

Contributor

what's the purpose of this?

@vosst

vosst Apr 10, 2017

Contributor

It's obsolete now, removed.

vosst added some commits Apr 10, 2017

thanks, a few more small comments

overlord/devicestate/export_test.go
oldKeyLength := keyLength
+ if n < 1024 {
+ n = 1024
@pedronis

pedronis Apr 10, 2017

Contributor

this applies also to the lib version?

overlord/devicestate/generate_rsa_key.go
+ "unsafe"
+)
+
+func generateRSAKey(keyLength uint64) (*rsa.PrivateKey, error) {
@pedronis

pedronis Apr 10, 2017

Contributor

can keyLength be just an int?

overlord/devicestate/generate_rsa_key.go
+ case C.SNAPD_RSA_KEY_GENERATION_KEY_GENERATION_FAILURE:
+ return nil, errors.New("cannot generate RSA key")
+ case C.SNAPD_RSA_KEY_GENERATION_IO_FAILURE:
+ return nil, errors.New("cannot generate RSA key: could not persist keys")
@pedronis

pedronis Apr 10, 2017

Contributor

s/IO/MARSHAL/ s/persist/marshal/

otherwise it sounds this is about disk which is not the case anymore

@vosst

vosst Apr 10, 2017

Contributor

Done.

overlord/devicestate/handlers.go
@@ -85,7 +82,7 @@ func (m *DeviceManager) doGenerateDeviceKey(t *state.Task, _ *tomb.Tomb) error {
return nil
}
- keyPair, err := rsa.GenerateKey(rand.Reader, keyLength)
+ keyPair, err := generateRSAKey(keyLength)
@pedronis

pedronis Apr 10, 2017

Contributor

this is something we should have done before, we can unlock around this:
st.Unlock()
keyPari, err := ...
st.Lock()

@vosst

vosst Apr 10, 2017

Contributor

Done.

overlord/devicestate/rsa_generate_key.c
+ BIO_set_mem_buf(bp_private, BUF_MEM_new(), BIO_CLOSE);
+
+ if (PEM_write_bio_RSAPrivateKey(bp_private, rsa, empty_cipher, empty_passphrase, empty_passphrase_len, NULL, NULL) != 1) {
+ result = SNAPD_RSA_KEY_GENERATION_IO_FAILURE;
@pedronis

pedronis Apr 10, 2017

Contributor

s/IO/MARSHAL/

@vosst

vosst Apr 10, 2017

Contributor

Done.

vosst added some commits Apr 10, 2017

@vosst vosst changed the title from overlord/devicestate: switch to lib(open)ssl for device key generation to overlord/devicestate: switch to ssh-keygen for device key generation Apr 10, 2017

overlord/devicestate/generate_rsa_key.go
+var (
+ sshKeyFile = "snapd.key.tmp"
+ sshPublicKeyFile = sshKeyFile + ".pub"
+)
@tyhicks

tyhicks Apr 10, 2017

Contributor

What directory are we in when creating this? Should an absolute path be specified instead of blindly going with the current working directory?

@vosst

vosst Apr 10, 2017

Contributor

I will happily rely on security's guidance here.

@tyhicks

tyhicks Apr 10, 2017

Contributor

I don't have a specific suggestion as I'm not sure what directory snapd typically uses for this sort of thing. Ideally, it should only be readable/writable by root.

@niemeyer any suggestions?

@tyhicks

tyhicks Apr 10, 2017

Contributor

/run/snapd/ feels like the right place but I noticed that it isn't always created. For example, a test system of mine with no snaps installed does not have /run/snapd/ but it is created once a snap is installed.

I see it is referenced in dirs/dirs.go with the SnapRunNsDir variable. It is probably best that someone more active in snapd development advises you on if it is appropriate to use for your case.

@ogra1

ogra1 Apr 10, 2017

Contributor

this is moot, this function is used during the firstboot process of Ubuntu Core IoT images to generate the device key ... i dont think when using snapd on a desktop this function is used at all, you can be sure that /run/snapd exists and snapd is running at that point.

vosst added some commits Apr 11, 2017

PR changed again

overlord/devicestate/generate_rsa_key.go
+
+ os.Remove(sshKeyFile)
+ os.Remove(sshPublicKeyFile)
+
@pedronis

pedronis Apr 11, 2017

Contributor

what ogra said is not fully true, we will use/might use this also on classic at some point, it's safer to create that dir here as well if it doesn't exist

@vosst

vosst Apr 11, 2017

Contributor

Done.

overlord/devicestate/generate_rsa_key.go
- return nil, errors.New("cannot generate RSA key: could not marshal key")
- case C.SNAPD_RSA_KEY_GENERATION_SUCCESS:
- break
+ cmd := exec.Command("ssh-keygen", "-t", "rsa", "-b", strconv.FormatInt(int64(keyLength), 10), "-N", "", "-f", sshKeyFile)
@zyga

zyga Apr 11, 2017

Contributor

Do we properly depend on this in our packaging?

@vosst

vosst Apr 11, 2017

Contributor

Both runtime and build time dependency list openssh-client (for Ubuntu at least).

Contributor

zyga commented Apr 11, 2017

@morphis FYI: potential new dependency for packaging

vosst added some commits Apr 11, 2017

A few trivial comments below.

Please get a review by @tyhicks on this one as well.

overlord/devicestate/export_test.go
@@ -28,6 +28,9 @@ import (
func MockKeyLength(n int) (restore func()) {
oldKeyLength := keyLength
+ if n < 1024 {
+ n = 1024
@niemeyer

niemeyer Apr 11, 2017

Contributor

That's misleading. If we can't take < 1024, let's panic instead and force the test to read correctly.

@vosst

vosst Apr 12, 2017

Contributor

Done.

overlord/devicestate/generate_rsa_key.go
@@ -0,0 +1,77 @@
+// -*- Mode: Go; indent-tabs-mode: t -*-
@niemeyer

niemeyer Apr 11, 2017

Contributor

This file would be better as crypto.go or similar, so it implies a better scope than this one function.

@vosst

vosst Apr 12, 2017

Contributor

Done.

overlord/devicestate/generate_rsa_key.go
+)
+
+func generateRSAKey(keyLength int) (*rsa.PrivateKey, error) {
+ sshKeyFile := filepath.Join(dirs.SnapRunDir, "snapd.key.tmp")
@niemeyer

niemeyer Apr 11, 2017

Contributor

It would be better to have this as a dynamically named temporary file instead of a fixed one. Otherwise we can get hard to debug issues because of concurrently running logic.

@vosst

vosst Apr 12, 2017

Contributor

Done.

overlord/devicestate/generate_rsa_key.go
+ os.Remove(sshKeyFile)
+ os.Remove(sshPublicKeyFile)
+
+ cmd := exec.Command("ssh-keygen", "-t", "rsa", "-b", strconv.FormatInt(int64(keyLength), 10), "-N", "", "-f", sshKeyFile)
@niemeyer

niemeyer Apr 11, 2017

Contributor

This is strconv.Itoa.

@vosst

vosst Apr 12, 2017

Contributor

Done.

overlord/devicestate/generate_rsa_key.go
+
+ blk, _ := pem.Decode(d)
+ if blk == nil {
+ return nil, errors.New("cannot decode PEM block")
@niemeyer

niemeyer Apr 11, 2017

Contributor

Is the second parameter there an error?

If so, might be useful to have the hint of what was broken:

if err != nil {
        return nil, fmt.Errorf("cannot decode PEM block: %v", err)
}
@vosst

vosst Apr 11, 2017

Contributor

The second parameter is the remainder of the input after decoding one block.

vosst added some commits Apr 11, 2017

A couple of trivials, but LGTM anyway.

Thanks for all the work on this fix!

overlord/devicestate/crypto.go
+
+ defer func() {
+ os.RemoveAll(tempDir)
+ }()
@niemeyer

niemeyer Apr 12, 2017

Contributor

This may now be defer os.RemoveAll(tempDir) directly.

@vosst

vosst Apr 12, 2017

Contributor

Done.

overlord/devicestate/devicestate_test.go
@@ -71,7 +72,10 @@ type deviceMgrSuite struct {
restoreOnClassic func()
}
-var _ = Suite(&deviceMgrSuite{})
+var (
+ _ = Suite(&deviceMgrSuite{})
@niemeyer

niemeyer Apr 12, 2017

Contributor

This reads slightly odd. Having them in their own var lines might be nicer in this particular case.

@vosst

vosst Apr 12, 2017

Contributor

Done.

vosst added some commits Apr 12, 2017

zyga approved these changes Apr 13, 2017

Looks good, I'll review remaining discussion and merge if all feedback is addressed.

@@ -134,7 +135,8 @@ func SetRootDir(rootdir string) {
SnapMetaDir = filepath.Join(rootdir, snappyDir, "meta")
SnapBlobDir = filepath.Join(rootdir, snappyDir, "snaps")
SnapDesktopFilesDir = filepath.Join(rootdir, snappyDir, "desktop", "applications")
- SnapRunNsDir = filepath.Join(rootdir, "/run/snapd/ns")
+ SnapRunDir = filepath.Join(rootdir, "/run/snapd")
+ SnapRunNsDir = filepath.Join(SnapRunDir, "/ns")
@zyga

zyga Apr 13, 2017

Contributor

Thank you for handling this detail :-)

@@ -85,7 +82,9 @@ func (m *DeviceManager) doGenerateDeviceKey(t *state.Task, _ *tomb.Tomb) error {
return nil
}
- keyPair, err := rsa.GenerateKey(rand.Reader, keyLength)
+ st.Unlock()
@zyga

zyga Apr 13, 2017

Contributor

Why do we unlock the state here? To run an external program?

@@ -58,6 +59,7 @@ Depends: adduser,
apparmor (>= 2.10.95-0ubuntu2.2),
ca-certificates,
gnupg1 | gnupg,
+ openssh-client,
squashfs-tools,
@zyga

zyga Apr 13, 2017

Contributor

@morphis ^^ FYI

@zyga zyga merged commit b8e07b5 into snapcore:master Apr 13, 2017

6 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
xenial-amd64 autopkgtest finished (success)
Details
xenial-i386 autopkgtest finished (success)
Details
xenial-ppc64el autopkgtest finished (success)
Details
yakkety-amd64 autopkgtest finished (success)
Details
zesty-amd64 autopkgtest finished (success)
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment