- 1. Introduction
- 2. Overview
- 3. Glossary
- 4. KBMAPI
- 5. KBMAPI Endpoints
- 6. Recovery Configuration(s)
- 6.1. Recovery configurations FSM
- 6.2. End-points
- 6.3. AddRecoveryConfig (POST /recovery_configs)
- 6.4. WatchRecoveryConfigTransition (GET /recovery_configs/:uuid/watch?transition=\<name\>)
- 6.5. ListRecoveryConfigs (GET /recovery_configs)
- 6.6. ShowRecoveryConfig (GET /recovery_configs/:uuid)
- 6.7. UpdateRecoveryConfig (PUT /recovery_configs/:uuid?action=stage|unstage|activate|deactivate|reactivate)
- 6.8. Inventory: Recovery Configs associated with PIV tokens
- 6.9. Inventory Update
- 7. kbmctl
- 8. kbmd
RFD 77 provides a high level overview of how cryptography will be used to provide authentication for Triton services as well as protecting the data residing on compute nodes. This RFD is focused on the details of the mechanisms that will be used to protect data at rest (also known as EDAR—encrypted data at rest). Specific functionality to provide secured credential storage and authentication to instances (within kbmd, KBMAPI, as well as other parts of the Triton stack) will be addressed in a future RFD.
To protect the contents of machines running Triton, we will utilize the new encryption support in ZFS. During the initial setup of an encrypted machine, a randomly-generated key is created and used to create the encrypted zpool. An uninitialized PIV token (which must be present for the encrypted setup to proceed) is also initialized during the setup process. The PIV token initialization performs several steps:
1. Several private keys are created by the PIV token (which it will not reveal).
2. A randomly generated secret PIN is set on the PIV token. This PIN is required by the PIV token before certain keys generated in step 1 can be used.
Information about each initialized PIV token (including the randomly generated PIN) is saved in a trusted datacenter service (KBMAPI). The zpool key is then saved in an ECDH box. This ECDH box requires the PIV token to "open" the box and provide the key. For the PIV token to open the ECDH box, it must be given the PIN created during initialization. A service on each machine (kbmd) handles all the functions necessary for creating the encrypted zpool, unlocking the encrypted zpool during boot, as well as recovering from a lost or damaged PIV token.
This work introduces several new concepts and related vocabulary to the Triton ecosystem. To facilitate effective communication, key terms are defined below. Most of the terms have in-depth descriptions later in this document.
ebox: A container (i.e. data structure) for encrypted data and one or more sets of keys called configurations that may be used to decrypt the encrypted data.
ebox template: An ebox with no encrypted data that is used to describe the configuration sections of an ebox. A recovery configuration is commonly held and/or distributed in an ebox template.
PIV token: A hardware device, such as a smart card or YubiKey, that meets the Personal Identity Verification standard set forth in [FIPS 201](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.201-2.pdf).
primary token: The PIV token that is typically physically attached to a computer, such as by being inserted into a USB slot. It is typically used in combination with a PIN or password to access secrets stored in one or more eboxes.
recovery: The process by which the protected content of an ebox is accessed using a recovery configuration. This is a manual operation involving M challenges and responses as a result of a primary token failure.
recovery configuration: Two integers, M and N (M ≤ N), and a set of N public keys, M of which may be used to perform recovery in the event that a primary token is not available. Each public key corresponds to a recovery PIV token.
recovery PIV token: A PIV token that is used during recovery.
recovery registry: If a machine has many eboxes on it, a recovery registry may be used to keep track of the location of all eboxes and to allow a single set of M challenges and responses to perform recovery on all eboxes. A recovery registry is not present on a system that encrypts only one zpool (e.g. a system that has no zone soft tokens).
recovery token: A 32-byte random number used by a compute node to authenticate the compute node during recovery. Do not confuse with recovery PIV token.
wrapping key: Each ZFS dataset is encrypted with one or more keys, depending on the size of the dataset. Those keys are protected by a wrapping key, which is the key that is provided to `zfs load-key` and similar interfaces. The wrapping key may be changed with `zfs change-key`.
YubiKey: A cryptographic token produced by [Yubico](https://www.yubico.com/). This document is only concerned with those that implement the PIV standard and as such are considered PIV tokens.
As mentioned in Overview, a trusted node service will be needed in a datacenter to store the pins and recovery data for all of the PIV tokens in the datacenter. This service is the Key Backup and Management API (KBMAPI).
KBMAPI will be a fairly simple and minimal REST service. API endpoints provide the means for adding new PIV tokens, removing PIV tokens, recovering PIV tokens (i.e. replacing a PIV token), creating new recovery tokens for a PIV token, and providing the PIN of a PIV token.
When a PIV token is added, the KBMAPI service generates a recovery token (a 32 byte blob of random data) that will be stored on the CN. The recovery token is limited to 32 bytes due to limitations in the Shamir secret sharing code used during recovery. As described in XXX, normally the PIV token’s 9E key is used to authenticate requests. The recovery token acts as a second authentication token used only when replacing a PIV token.
When PIV tokens are deleted or reinitialized, the old PIV token data should be kept in a KBMAPI-maintained history. This history maintains the PIV token data for an amount of time defined by the `KBMAPI_HISTORY_DURATION` SAPI variable. The default shall be 15 days. The purpose is to provide a time-limited backup against accidental PIV token deletion.
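As a rough illustration of the retention rule, the sketch below decides whether a history entry is old enough to purge. The function name `history_entry_expired` and the `removed_at` argument are hypothetical and used only for illustration; only `KBMAPI_HISTORY_DURATION` and its 15-day default come from this document.

```python
from datetime import datetime, timedelta, timezone

# Default taken from the text above; a real deployment would read it from SAPI.
KBMAPI_HISTORY_DURATION_DAYS = 15


def history_entry_expired(removed_at, now, duration_days=KBMAPI_HISTORY_DURATION_DAYS):
    """Return True once a history entry has outlived the retention window.

    `removed_at` is the time the PIV token was deleted or reinitialized
    (a hypothetical field name used only in this sketch).
    """
    return now - removed_at > timedelta(days=duration_days)


removed = datetime(2019, 3, 1, 5, 6, 7, tzinfo=timezone.utc)
# 14 days after removal the entry is still inside the window.
print(history_entry_expired(removed, removed + timedelta(days=14)))
# 16 days after removal it is eligible for purging.
print(history_entry_expired(removed, removed + timedelta(days=16)))
```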
Some PIV tokens have extensions that allow for attestation — that is a method to show that a given key was created by the device and was not imported onto the PIV token. For YubiKeys, this is done by creating a special x509 certificate as detailed here.
If an operator wishes to require attestation, they must set the `KBMAPI_REQUIRE_ATTESTATION` SAPI parameter to `true`. In addition, the `KBMAPI_ATTESTATION_CA` SAPI parameter must be set to the CA certificate used for attestation.
Additionally, an operator may wish to limit the PIV tokens that are allowed to be used with KBMAPI to a known set of PIV tokens. To do so, an operator sets the SAPI parameter `KBMAPI_REQUIRE_TOKEN_PRELOAD` to `true`. A command line tool (working name `kbmctl`) is then used by the operator to load the range of serial numbers into KBMAPI. This is only supported for PIV tokens that support attestation (e.g. YubiKeys). In other words, enabling `KBMAPI_REQUIRE_TOKEN_PRELOAD` requires `KBMAPI_REQUIRE_ATTESTATION` to also be enabled (but not necessarily vice versa).
Note that both attestation and device serial numbers are non-standard PIV extensions. As such, support for either feature will require kbmd / piv-tool and potentially kbmapi to support a particular device’s implementation. Similarly, enabling the feature requires the use of PIV tokens that implement the corresponding feature (attestation or a static serial number). The initial scope will only include support for YubiKey attestation and serial numbers.
In both cases, enforcement of the policy occurs during the provisioning process (i.e. at the time of a CreatePivtoken call). Changes to either policy do not affect existing PIV tokens in KBMAPI.
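The relationship between the two policies can be sketched as follows. This is a minimal illustration, not KBMAPI code: the function names are hypothetical, and the request is modelled as a plain dictionary shaped like the CreatePivtoken payload described later in this document.

```python
def validate_policy(require_attestation, require_token_preload):
    """Reject a configuration that enables preload without attestation."""
    if require_token_preload and not require_attestation:
        raise ValueError(
            "KBMAPI_REQUIRE_TOKEN_PRELOAD requires KBMAPI_REQUIRE_ATTESTATION"
        )
    return True


def check_provisioning_request(request, require_attestation, require_token_preload):
    """Return a list of policy problems for a provisioning request.

    Only meant to run at provisioning time; existing PIV tokens are not
    re-checked when a policy changes.
    """
    problems = []
    if require_attestation and not request.get("attestation"):
        problems.append("attestation certificates are required")
    if require_token_preload and request.get("serial") is None:
        problems.append("serial number is required")
    return problems


validate_policy(require_attestation=True, require_token_preload=True)
print(check_provisioning_request({"guid": "97496DD1C8F053DE7450CD854D9C95B4"}, True, True))
```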
The PIV token data is stored persistently by the KBMAPI service. A moray bucket is used for this purpose. The JSON config of the bucket is:

    {
        "desc": "token data",
        "name": "tokens",
        "schema": {
            "index": {
                "guid": { "type": "string", "unique": true },
                "cn_uuid": { "type": "uuid", "unique": true }
            }
        }
    }

The PIV token object itself is represented using JSON similar to:

    {
        "model": "Yubico YubiKey 4",
        "serial": 5213681,
        "cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
        "guid": "97496DD1C8F053DE7450CD854D9C95B4",
        "pin": "123456",
        "recovery_tokens": [{
            "created": 123456789,
            "token": "jmzbhT2PXczgber9jyOSApRP337gkshM7EqK5gOhAcg="
        }, {
            "created": 2233445566,
            "token": "QmUgc3VyZSB0byBkcmluayB5b3VyIG92YWx0aW5l"
        }],
        "pubkeys": {
            "9e": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9d": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9a": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA..."
        },
        "attestation": {
            "9e": "-----BEGIN CERTIFICATE-----....",
            "9d": "-----BEGIN CERTIFICATE-----....",
            "9a": "-----BEGIN CERTIFICATE-----....."
        }
    }

Field | Required | Description |
---|---|---|
model | No | The model of the PIV token. |
serial | No | The serial number of the PIV token (if available). |
cn_uuid | Yes | The UUID of the compute node that contains this PIV token. |
guid | Yes | The GUID of the provisioned PIV token. |
pin | Yes | The pin of the provisioned PIV token. |
recovery_tokens | Yes | An array of recovery tokens. Used as an alternate authentication key when replacing a PIV token on a machine (usually due to loss or damage of the original PIV token). They also serve as proof to KBMAPI that a recovery operation was performed. When the recovery configuration is updated, a new recovery token is generated and added to the list. A history of previous tokens is kept to allow for propagation delays of new recovery configurations. The recovery token is a random binary value, displayed as well as sent over the wire as a base64 encoded string. |
pubkeys | Yes | A JSON object containing the public keys of the PIV token. |
pubkeys.9a | Yes | The public key used for authentication after the PIV token has been unlocked. |
pubkeys.9d | Yes | The public key used for encryption after the PIV token has been unlocked. |
pubkeys.9e | Yes | The public key used for authenticating the PIV token itself without a pin (e.g. used when requesting the pin of a PIV token). |
attestation | No | The attestation certificates for the corresponding pubkeys. |
Note that when provisioning a PIV token, if any of the optional fields are known (e.g. `attestation` or `serial`), they should be supplied during provisioning.
As a failsafe measure, when a PIV token is deleted, the entry from the PIV token bucket is saved into a history bucket. This bucket retains up to `KBMAPI_HISTORY_DURATION` days of PIV token data (see [kbmapi-history]).
The history bucket looks very similar to the PIV token bucket:

    {
        "desc": "token history",
        "name": "token_history",
        "schema": {
            "index": {
                "guid": { "type": "string" },
                "cn_uuid": { "type": "uuid" },
                "active_range": { "type": "daterange" }
            }
        }
    }

The major differences are that the index fields are not unique and that there is an additional `active_range` index. An accidentally deleted PIV token that’s restored might end up with multiple history entries, and a CN which has had a PIV token replacement will also have multiple history entries.
The moray entry in the history bucket also looks similar to, but not quite the same as, an entry in the PIV token bucket:

    {
        "active_range": "[2019-01-01 00:00:00, 2019-03-01 05:06:07]",
        "model": "Yubico YubiKey 4",
        "serial": 5213681,
        "cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
        "guid": "97496DD1C8F053DE7450CD854D9C95B4",
        "pin": "123456",
        "recovery_tokens": [{
            "created": 123456789,
            "token": "jmzbhT2PXczgber9jyOSApRP337gkshM7EqK5gOhAcg="
        }, {
            "created": 2233445566,
            "token": "QmUgc3VyZSB0byBkcmluayB5b3VyIG92YWx0aW5l"
        }],
        "pubkeys": {
            "9e": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9d": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9a": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA..."
        },
        "attestation": {
            "9e": "-----BEGIN CERTIFICATE-----....",
            "9d": "-----BEGIN CERTIFICATE-----....",
            "9a": "-----BEGIN CERTIFICATE-----....."
        },
        "comment": ""
    }

The major differences are the addition of the `active_range` and `comment` properties. The `active_range` property represents the (inclusive) start and end dates that the provisioned PIV token was in use.
The same provisioned PIV token may have multiple entries in the history table. For example, a PIV token that was accidentally deleted and then restored would have one entry for the deletion, and a second entry when the PIV token is retired (or reprovisioned).
The `comment` field is an optional field that contains free form text. It is intended to note the reason for the deletion.
To protect the PIV token data in Moray, we will rely on the headnode disk encryption.
QUESTION: Even though the HN PIV token will not use the GetPivtokenPin API call to obtain its pin, should we still go ahead and store the data for the HN PIV token in KBMAPI? We cannot do it when we initialize the HN PIV token because we are running the HN setup (thus there is no KBMAPI up and running), and we must use a different method to provide the PIN for a PIV token on a headnode.
To support an operator preloading unprovisioned PIV tokens, we track ranges of serial numbers that are allowed to be provisioned. We use a separate moray bucket for tracking these ranges of serial numbers:

    {
        "desc": "token serials",
        "name": "token_serial",
        "schema": {
            "index": {
                "ca_dn": { "type": "string" },
                "serial_range": { "type": "numrange" }
            }
        }
    }

The entries look similar to:

    {
        "serial_range": "[111111, 123456]",
        "allow": true,
        "ca_dn": "cn=my manf authority",
        "comment": "A useful comment here"
    }

Field | Description |
---|---|
serial_range | A range of serial numbers. This range is inclusive. |
allow | Set to true if this range is allowed, or false if this range is blacklisted. |
ca_dn | The distinguished name (DN) of the attestation CA for this PIV token. Used to disambiguate any potential duplicate serial numbers between vendors. |
comment | An operator supplied free form comment. |
The `kbmctl` command is used to manage this data.
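A lookup against these entries might be sketched as below. The helper names are hypothetical, and one behaviour is an assumption not stated in this document: when a serial number falls in both an allowed and a blacklisted range for the same CA, the blacklisted range wins.

```python
def parse_range(text):
    """Parse an inclusive range such as "[111111, 123456]" into two ints."""
    low, high = text.strip("[] ").split(",")
    return int(low), int(high)


def serial_allowed(serial, ca_dn, entries):
    """Return True if `serial` falls in an allowed range for `ca_dn`.

    Assumption for this sketch: a matching range with allow=false takes
    precedence over any allowed range.
    """
    allowed = False
    for entry in entries:
        if entry["ca_dn"] != ca_dn:
            continue  # ranges are only comparable within the same CA
        low, high = parse_range(entry["serial_range"])
        if low <= serial <= high:
            if not entry["allow"]:
                return False
            allowed = True
    return allowed


entries = [
    {"serial_range": "[111111, 123456]", "allow": True, "ca_dn": "cn=my manf authority"},
    {"serial_range": "[120000, 120010]", "allow": False, "ca_dn": "cn=my manf authority"},
]
print(serial_allowed(111111, "cn=my manf authority", entries))
print(serial_allowed(120005, "cn=my manf authority", entries))
```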
Given the critical nature of the PIV token data, we want to provide an audit trail of activity. While there is discussion of creating an AuditAPI at some point in the future, it currently does not look like it would be available to meet the current deadlines. Once available, we should look at the effort to migrate this functionality to AuditAPI.
In the meantime, we will provide the option of uploading the KBMAPI logs to a Manta installation using hermes or possibly the new log archiver service described in [RFD 163](../0163/README.md).
All response objects are `application/json` encoded HTTP bodies. In addition, all responses will have the following headers:
Header | Description |
---|---|
Date | When the response was sent (RFC 1123 format) |
Api-Version | The exact version of the KBMAPI server that processed the request |
Request-Id | A unique id for this request. |
If the response contains content, the following additional headers will be present:
Header | Description |
---|---|
Content-Length | How much content, in bytes |
Content-Type | The format of the response (currently always `application/json`) |
Content-MD5 | An MD5 checksum of the response |
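The content headers can be derived from the serialized body as in the sketch below. It assumes the `Content-MD5` value is the base64 encoding of the raw MD5 digest (the RFC 1864 convention), which matches the shape of the example values in this document; the function name `content_headers` is hypothetical.

```python
import base64
import hashlib
import json


def content_headers(payload):
    """Serialize `payload` and compute the content headers for it."""
    body = json.dumps(payload).encode("utf-8")
    digest = hashlib.md5(body).digest()  # 16 raw bytes
    headers = {
        "Content-Length": str(len(body)),
        "Content-Type": "application/json",
        # Assumption: base64 of the raw digest, as in RFC 1864.
        "Content-MD5": base64.b64encode(digest).decode("ascii"),
    }
    return body, headers


body, headers = content_headers({"code": "ResourceNotFound", "message": "no such token"})
print(headers)
```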
KBMAPI returns one of the following codes on an error:
Code | Description | Details |
---|---|---|
401 | Unauthorized | Either no Authorization header was sent, or the credentials used were invalid |
405 | Method Not Allowed | Method not supported for the given resource |
409 | Conflict | A parameter was missing or invalid |
500 | Internal Error | An unexpected error occurred |
If an error occurs, KBMAPI will return a standard JSON error response object in the body of the response:

    {
        "code": "CODE",
        "message": "human readable string"
    }

Where `code` is one of:
Code | Description |
---|---|
BadRequest | Bad HTTP was sent |
InternalError | Something went wrong in KBMAPI |
InvalidArgument | Bad arguments or a bad value for an argument |
InvalidCredentials | Authentication failed |
InvalidHeader | A bad HTTP header was sent |
InvalidVersion | A bad |
MissingParameter | A required parameter was missing |
ResourceNotFound | The resource was not found |
UnknownError | Something completely unexpected happened |
These are the proposed endpoints to meet the above requirements. They largely document the behavior of the existing KBMAPI prototype (though in a few places describe intended behavior not yet present in the prototype).
In each case, each request should include an `Accept-Version` header indicating the version of the API being requested. The initial value defined here shall be '1.0'.
XXX: This is largely based on the behavior of CloudAPI. Check what the behavior of CloudAPI is if no version is supplied.
Add a new initialized PIV token. Included in the request should be an `Authorization` header with a method of 'Signature' with the date header signed using the PIV token’s `9e` key. The payload is a JSON object with the following fields:
Field | Required | Description |
---|---|---|
guid | Yes | The GUID of the provisioned PIV token |
cn_uuid | Yes | The UUID of the CN that contains this PIV token |
pin | Yes | The pin for the PIV token generated during provisioning |
model | No | The model of the PIV token (if known) |
serial | No | The serial number of the PIV token (if known) |
pubkeys | Yes | The public keys of the PIV token generated during provisioning |
pubkeys.9a | Yes | The |
pubkeys.9d | Yes | The |
pubkeys.9e | Yes | The |
attestation | No | The attestation certificates corresponding to the |
Note: for the optional fields, they should be supplied with the request when known. Unfortunately, there is no simple way to enforce this optionality on the server side, so we must depend on the CN to supply the optional data when appropriate.
If the signature check fails, a 401 Unauthorized error + NotAuthorized code is returned.
If any of the required fields are missing, a 409 Conflict + InvalidArgument error is returned.
If the `guid` or `cn_uuid` fields contain a value already in use in the `tokens` bucket, a new entry is not created. Instead, the `9e` public key from the request is compared to the `9e` key in the stored PIV token data. If the keys match, and the signature check succeeds, then the `recovery_token` value of the existing entry is returned and a 200 response is returned. This allows the CN to retry a request in the event the response was lost.
If the `9e` key in the request does not match the `9e` key for the existing token in the `tokens` bucket, but either (or both) the `guid` or `cn_uuid` fields match an existing entry, a 409 Conflict + NotAuthorized error is returned. In such an instance, an operator must manually verify if the information in the PIV token bucket is out of date and manually delete it before the PIV token provisioning can proceed.
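The duplicate-handling rules above can be summarized as a small decision function. This is a sketch under stated assumptions: the function name `resolve_create` is hypothetical, the signature check is assumed to have already succeeded, and `existing` stands for whatever stored entry matched on `guid` or `cn_uuid` (or `None` if nothing matched).

```python
def resolve_create(request, existing):
    """Decide how CreatePivtoken responds once the signature check passed.

    Returns (http_status, error_code_or_None).
    """
    if existing is None:
        # Neither guid nor cn_uuid is in use: create a new entry.
        return 201, None
    if existing["pubkeys"]["9e"] == request["pubkeys"]["9e"]:
        # Same token retrying: return the existing recovery_token.
        return 200, None
    # guid or cn_uuid collides with a different token: operator must intervene.
    return 409, "NotAuthorized"


stored = {"guid": "97496DD1C8F053DE7450CD854D9C95B4", "pubkeys": {"9e": "key-A"}}
retry = {"guid": "97496DD1C8F053DE7450CD854D9C95B4", "pubkeys": {"9e": "key-A"}}
clash = {"guid": "97496DD1C8F053DE7450CD854D9C95B4", "pubkeys": {"9e": "key-B"}}
print(resolve_create(retry, None), resolve_create(retry, stored), resolve_create(clash, stored))
```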
If an operator has hardware with duplicate UUIDs, they must contact their hardware vendor to resolve the situation prior to attempting to provision the PIV token on the system with a duplicate UUID. While we have seen such instances in the past, they are now fairly rare. Our past experience has shown that attempting to work around this at the OS and Triton level is complicated and prone to breaking. Given what is at stake in terms of the data on the system, we feel it is an unacceptable risk to try to work around such a situation (instead of having the hardware vendor resolve it).
If the request does not generate any of the above errors, the request is
If the attestation section is supplied, the attestation certs must agree with the pubkeys supplied in the request. If they do not agree, or if `KBMAPI_REQUIRE_ATTESTATION` is true and no attestation certs are provided, a 409 Conflict + InvalidArgument error is returned.
If `KBMAPI_REQUIRE_TOKEN_PRELOAD` is `true`, the serial number of the PIV token as well as the attestation certificates of the PIV token in question must be present in the CreatePivtoken request. KBMAPI performs a search for a range of allowed serial numbers in the `token_serial` bucket whose attestation CA DN matches the attestation CA of the PIV token in the request. If the serial number is not part of an allowed range, a 409 Conflict + InvalidArgument error is returned.
In addition, a recovery_token is generated by KBMAPI and stored as part of the token object. This should be a random string of bytes generated by a random number generator suitable for cryptographic purposes.
Once the entry is updated or created in moray, a successful response is returned (201) and the generated recovery token is included in the response. The recovery token is encoded as base64.
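Generating such a token is straightforward with a cryptographically secure random source. The sketch below uses Python's `secrets` module purely for illustration; the function name `new_recovery_token` is hypothetical, and the 32-byte size comes from the recovery token description earlier in this document.

```python
import base64
import secrets

RECOVERY_TOKEN_BYTES = 32  # size stated in this document


def new_recovery_token():
    """Return a fresh recovery token, base64 encoded for the wire."""
    raw = secrets.token_bytes(RECOVERY_TOKEN_BYTES)  # CSPRNG-backed
    return base64.b64encode(raw).decode("ascii")


token = new_recovery_token()
# The decoded value is always 32 bytes long.
print(len(base64.b64decode(token)))
```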
Example request (with attestation):

    POST /pivtokens
    Host: kbmapi.mytriton.example.com
    Date: Thu, 13 Feb 2019 20:01:02 GMT
    Authorization: Signature <Base64(rsa(sha256($Date)))>
    Accept-Version: ~1
    Accept: application/json

    {
        "model": "Yubico YubiKey 4",
        "serial": 5213681,
        "cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
        "guid": "97496DD1C8F053DE7450CD854D9C95B4",
        "pin": "123456",
        "pubkeys": {
            "9e": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9d": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9a": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA..."
        },
        "attestation": {
            "9e": "-----BEGIN CERTIFICATE-----....",
            "9d": "-----BEGIN CERTIFICATE-----....",
            "9a": "-----BEGIN CERTIFICATE-----....."
        }
    }

An example response might look like:

    HTTP/1.1 201 Created
    Location: /pivtokens/97496DD1C8F053DE7450CD854D9C95B4
    Content-Type: application/json
    Content-Length: 12345
    Content-MD5: s5ROP0dBDWlf5X1drujDvg==
    Date: Fri, 15 Feb 2019 12:34:56 GMT
    Server: Joyent KBMAPI 1.0
    Api-Version: 1.0
    Request-Id: b4dd3618-78c2-4cf5-a20c-b822f6cd5fb2
    Response-Time: 42

    {
        "recovery_token": "jmzbhT2PXczgber9jyOSApRP337gkshM7EqK5gOhAcg="
    }

To make the request/response retry-able without generating and saving a new `recovery_token` each time (to prevent a single recovery configuration update from creating multiple `recovery_tokens` due to network/retry issues), any requests made after the initial PIV token creation to the same `Location` (i.e. `POST /pivtokens/:guid`) will result in the same PIV token object being retrieved.
This can also be used to generate new recovery tokens when a request is made a given time after `recovery_token` creation. This time interval will be configurable in SAPI through the variable `KBMAPI_RECOVERY_TOKEN_DURATION`. By default, this value will be set to 1 day.
When the `POST` request is received for an existing PIV token, KBMAPI will check the age of the newest member of `recovery_tokens` and, if it exceeds the aforementioned `KBMAPI_RECOVERY_TOKEN_DURATION` value, it will generate a new `recovery_token`.
In all of these cases, the status code will be `200 Ok` instead of the `201 Created` used for the initial PIV token creation.
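The age check on a repeated `POST` might look like the sketch below. It assumes `created` is a timestamp in seconds and that the duration is expressed in seconds; neither unit is specified in this document. The function name and the `make_token` callback are hypothetical.

```python
# Default of 1 day from the text above, expressed in seconds (assumed unit).
KBMAPI_RECOVERY_TOKEN_DURATION = 24 * 60 * 60


def handle_repeat_post(pivtoken, now, make_token,
                       max_age=KBMAPI_RECOVERY_TOKEN_DURATION):
    """Append a new recovery token if the newest one is too old.

    Returns the HTTP status for a POST against an existing PIV token.
    """
    newest = max(pivtoken["recovery_tokens"], key=lambda t: t["created"])
    if now - newest["created"] > max_age:
        pivtoken["recovery_tokens"].append(
            {"created": now, "token": make_token()}
        )
    return 200  # never 201 for an existing PIV token


tok = {"recovery_tokens": [{"created": 1000, "token": "old"}]}
handle_repeat_post(tok, 1000 + 3600, lambda: "unused")   # too recent, no change
handle_repeat_post(tok, 1000 + 90000, lambda: "fresh")   # older than a day
print([t["token"] for t in tok["recovery_tokens"]])
```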
Update the current fields of a PIV token. Currently, the only field that can be altered is the `cn_uuid` field (e.g. during a chassis swap). If the new `cn_uuid` field is already associated with an assigned PIV token, or if any of the remaining fields differ, the update fails.
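The update rule can be expressed as a simple comparison. In this sketch the function name, the `cn_uuids_in_use` argument, and the exact list of immutable fields (taken from the request payload fields shown in this document) are assumptions for illustration.

```python
# Fields that must match the stored entry; only cn_uuid may change.
IMMUTABLE_FIELDS = ("guid", "pin", "model", "serial", "pubkeys", "attestation")


def can_update(stored, requested, cn_uuids_in_use):
    """Return True if the requested update only changes cn_uuid to a free value."""
    new_cn = requested["cn_uuid"]
    if new_cn != stored["cn_uuid"] and new_cn in cn_uuids_in_use:
        return False  # target CN already has a PIV token assigned
    return all(requested.get(f) == stored.get(f) for f in IMMUTABLE_FIELDS)


stored = {"guid": "97496DD1C8F053DE7450CD854D9C95B4", "pin": "123456",
          "cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
          "pubkeys": {"9e": "key-A"}}
swap = dict(stored, cn_uuid="99556402-3daf-cda2-ca0c-f93e48f4c5ad")
print(can_update(stored, swap, {stored["cn_uuid"]}))
```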
This request is authenticated by signing the Date header with the PIV token’s 9e key (same as CreatePivtoken). However, it does not return the recovery token in its response.
Example request:

    PUT /pivtokens/97496DD1C8F053DE7450CD854D9C95B4
    Host: kbmapi.mytriton.example.com
    Date: Thu, 13 Feb 2019 20:01:02 GMT
    Authorization: Signature <Base64(rsa(sha256($Date)))>
    Accept-Version: ~1
    Accept: application/json

    {
        "model": "Yubico YubiKey 4",
        "serial": 5213681,
        "cn_uuid": "99556402-3daf-cda2-ca0c-f93e48f4c5ad",
        "guid": "97496DD1C8F053DE7450CD854D9C95B4",
        "pin": "123456",
        "pubkeys": {
            "9e": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9d": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9a": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA..."
        },
        "attestation": {
            "9e": "-----BEGIN CERTIFICATE-----....",
            "9d": "-----BEGIN CERTIFICATE-----....",
            "9a": "-----BEGIN CERTIFICATE-----....."
        }
    }

Example response:

    HTTP/1.1 200 OK
    Location: /pivtokens/97496DD1C8F053DE7450CD854D9C95B4
    Content-Type: application/json
    Content-Length: 1122
    Content-MD5: s5ROP0dBDWlf5X1drujDvg==
    Date: Sun, 17 Feb 2019 10:27:43 GMT
    Server: Joyent KBMAPI 1.0
    Api-Version: 1.0
    Request-Id: 7e2562ba-731b-c91b-d7c6-90f2fd2d36a0
    Response-Time: 23

When a PIV token is no longer available (lost, damaged, accidentally reinitialized, etc.), a recovery must be performed. This allows a new PIV token to replace the unavailable PIV token. When a recovery is required, an operator initiates the recovery process on the CN. This recovery process on the CN will decrypt the current `recovery_token` value for the lost PIV token that was created during the lost PIV token’s CreatePivtoken request or a subsequent CreatePivtoken request.
For some TBD amount of time, earlier `recovery_token` values may also be allowed to account for propagation delays when updating recovery configurations using changefeed. KBMAPI may also optionally periodically purge members of a PIV token’s `recovery_tokens` array that are sufficiently old to no longer be considered valid (even when accounting for propagation delays).
The CN submits a RecoverPivtoken request to replace the unavailable PIV token with a new PIV token. The `:guid` parameter is the guid of the unavailable PIV token. The data included in the request is identical to that of a CreatePivtoken request. The major difference is that instead of using a PIV token’s 9e key to sign the date field, the decrypted `recovery_token` value is used as the signing key (in conjunction with some HMAC mechanism).
Instead of HTTP Signature auth using the SSH key, an HMAC signature using the `recovery_token` as the key will be used. Note that the HTTP signature method requires that the resulting signature value be base64 encoded.
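A minimal sketch of producing and checking such a signature follows. Several details are assumptions rather than statements from this document: the signing string is taken to be `date: <value>` (the convention in the HTTP Signatures draft for `headers="date"`), the HMAC key is taken to be the base64-decoded recovery token, and the function names are hypothetical. The `hmac-sha512` algorithm and the overall `Authorization` header layout follow the example request in this document.

```python
import base64
import hashlib
import hmac


def recovery_signature(date_header, recovery_token_b64):
    """Return Base64(HMAC-SHA512(signing string)) keyed by the recovery token."""
    key = base64.b64decode(recovery_token_b64)            # assumed: raw token bytes
    signing_string = ("date: " + date_header).encode("ascii")  # assumed format
    digest = hmac.new(key, signing_string, hashlib.sha512).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(key_id, date_header, recovery_token_b64):
    """Build the Authorization header value for a recovery request."""
    sig = recovery_signature(date_header, recovery_token_b64)
    return ('Signature keyId="%s",algorithm="hmac-sha512",'
            'headers="date",signature="%s"' % (key_id, sig))


def verify(date_header, signature_b64, recovery_token_b64):
    """Server-side check using a constant-time comparison."""
    expected = recovery_signature(date_header, recovery_token_b64)
    return hmac.compare_digest(expected, signature_b64)


demo_token = base64.b64encode(bytes(range(32))).decode("ascii")  # not a real token
date = "Thu, 13 Feb 2019 20:01:02 GMT"
print(authorization_header("xxxx", date, demo_token))
```

Because earlier `recovery_token` values may remain valid for a while, a server following this approach would presumably try `verify` against each acceptable member of `recovery_tokens`.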
If the lost PIV token does not exist in KBMAPI, the request is rejected with a `404 Not Found` response.
If the request fails authentication, a `401 Unauthorized` error is returned.
If all the checks succeed, the information from the old PIV token (`:guid`) is moved to a history entry for that PIV token. Any subsequent requests to `/pivtokens/:guid` should either return a `404 Not found` reply or, in case we add some kind of `replaced_by: :new_guid` attribute to the archived PIV token, we could also return `301 Moved Permanently` with the new PIV token location.
The newly created PIV token will then be returned, together with the proper `Location` header (`/pivtokens/:new_guid`). In case of network/retry issues, additional attempts to retrieve the new PIV token information should be made through the `CreatePivtoken` end-point for the new PIV token, and these requests should be signed by the new PIV token 9e key, instead of using HMAC with the old PIV token `recovery_token`.
An example request:

    POST /pivtokens/97496DD1C8F053DE7450CD854D9C95B4/replace
    Host: kbmapi.mytriton.example.com
    Date: Thu, 13 Feb 2019 20:01:02 GMT
    Authorization: Signature keyId="xxxx",algorithm="hmac-sha512",headers="date",signature="<Base64(hmac-sha512($Date, $recovery_token))>"
    Accept-Version: ~1
    Accept: application/json

    {
        "model": "Yubico YubiKey 4",
        "serial": 6324923,
        "cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
        "guid": "75CA077A14C5E45037D7A0740D5602A5",
        "pin": "424242",
        "pubkeys": {
            "9e": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9d": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9a": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA..."
        },
        "attestation": {
            "9e": "-----BEGIN CERTIFICATE-----....",
            "9d": "-----BEGIN CERTIFICATE-----....",
            "9a": "-----BEGIN CERTIFICATE-----....."
        }
    }

And an example response:

    HTTP/1.1 201 Created
    Location: /pivtokens/75CA077A14C5E45037D7A0740D5602A5
    Content-Type: application/json
    Content-Length: 12345
    Content-MD5: s5ROP0dBDWlf5X1drujDvg==
    Date: Fri, 15 Feb 2019 12:54:56 GMT
    Server: Joyent KBMAPI 1.0
    Api-Version: 1.0
    Request-Id: 473bc7f4-05cf-4edb-9ef7-8b61cdd8e6b6
    Response-Time: 42

    {
        "model": "Yubico YubiKey 4",
        "serial": 5213681,
        "cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
        "guid": "75CA077A14C5E45037D7A0740D5602A5",
        "pubkeys": {
            "9e": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9d": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9a": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA..."
        },
        "recovery_tokens": [{
            "created": 1563348710384,
            "token": "cefb9c2001b535b697d5a13ba6855098e8c58feb800705092db061343bb7daa10e52a97ed30f2cf1"
        }]
    }

Note that the location contains the guid of the new PIV token.
Gets all provisioned PIV tokens. The main requirement here is no sensitive information of a PIV token is returned in the output.
Filtering by at least the `cn_uuid` as well as windowing functions should be supported.
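One way to guarantee that no sensitive data leaks is to build the listing from an explicit allow-list of fields, as sketched below. The allow-list mirrors the fields shown in the example responses in this document; the function names and the `offset`/`limit` parameter names are assumptions for illustration.

```python
# Fields that appear in the public example responses; pin and
# recovery_tokens are deliberately absent.
PUBLIC_FIELDS = ("model", "serial", "cn_uuid", "guid", "pubkeys")


def public_view(token):
    """Copy only the allow-listed fields of a stored PIV token."""
    return {k: token[k] for k in PUBLIC_FIELDS if k in token}


def list_pivtokens(tokens, cn_uuid=None, offset=0, limit=100):
    """Filter by cn_uuid (if given), then apply a simple window."""
    matches = [t for t in tokens if cn_uuid is None or t["cn_uuid"] == cn_uuid]
    return [public_view(t) for t in matches[offset:offset + limit]]


stored_tokens = [
    {"guid": "97496DD1C8F053DE7450CD854D9C95B4", "pin": "123456",
     "cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
     "recovery_tokens": [{"created": 123456789, "token": "..."}]},
    {"guid": "75CA077A14C5E45037D7A0740D5602A5", "pin": "424242",
     "cn_uuid": "e9498ab2-d6d8-ca61-b908-fb9e2fea950a",
     "recovery_tokens": []},
]
print(list_pivtokens(stored_tokens, cn_uuid="e9498ab2-d6d8-ca61-b908-fb9e2fea950a"))
```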
An example request:

    GET /pivtokens
    Host: kbmapi.mytriton.example.com
    Date: Wed, 12 Feb 2019 02:04:45 GMT
    Accept-Version: ~1
    Accept: application/json

An example response:

    HTTP/1.1 200 Ok
    Location: /pivtokens
    Content-Type: application/json
    Content-Length: 11222333
    Content-MD5: s5ROP0dBDWlf5X1drujDvg==
    Date: Wed, 12 Feb 2019 02:04:45 GMT
    Server: Joyent KBMAPI 1.0
    Api-Version: 1.0
    Request-Id: af32dafe-b9ed-c2c1-b5e5-f5fefc40aba4
    Response-Time: 55

    [
        {
            "model": "Yubico YubiKey 4",
            "serial": 5213681,
            "cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
            "guid": "97496DD1C8F053DE7450CD854D9C95B4",
            "pubkeys": {
                "9e": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
                "9d": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
                "9a": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA..."
            }
        },
        {
            "model": "Yubico YubiKey 5",
            "serial": 12345123,
            "cn_uuid": "e9498ab2-d6d8-ca61-b908-fb9e2fea950a",
            "guid": "75CA077A14C5E45037D7A0740D5602A5",
            "pubkeys": {
                "9e": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
                "9d": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
                "9a": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA..."
            }
        },
        ....
    ]

Gets the public info for a specific PIV token. Only the public fields are returned.
Example request:

    GET /pivtokens/97496DD1C8F053DE7450CD854D9C95B4
    Host: kbmapi.mytriton.example.com
    Date: Wed, 12 Feb 2019 02:10:32 GMT
    Accept-Version: ~1
    Accept: application/json

Example response:

    HTTP/1.1 200 Ok
    Location: /pivtokens/97496DD1C8F053DE7450CD854D9C95B4
    Content-Type: application/json
    Content-Length: 12345
    Content-MD5: s5REP1dBDWlf5X1drujDvg==
    Date: Wed, 12 Feb 2019 02:10:35 GMT
    Server: Joyent KBMAPI 1.0
    Api-Version: 1.0
    Request-Id: de02d045-f8df-cf51-c424-a21a7984555b
    Response-Time: 55

    {
        "model": "Yubico YubiKey 4",
        "serial": 5213681,
        "cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
        "guid": "97496DD1C8F053DE7450CD854D9C95B4",
        "pubkeys": {
            "9e": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9d": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9a": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA..."
        }
    }

Like GetPivtoken, except it also includes the `pin`. The `recovery_token` field is not returned. This request must be authenticated using the 9E key of the token specified by `:guid` to be successful. An `Authorization` header should be included in the request, the value being the signature of the `Date` header (very similar to how CloudAPI authenticates users).
This call is used by the CN during boot to enable it to unlock the other keys on the PIV token.
An example request:

    GET /pivtokens/97496DD1C8F053DE7450CD854D9C95B4/pin
    Host: kbmapi.mytriton.example.com
    Date: Wed, 12 Feb 2019 02:11:32 GMT
    Accept-Version: ~1
    Accept: application/json
    Authorization: Signature <Base64(rsa(sha256($Date)))>

An example reply:

    HTTP/1.1 200 OK
    Location: /pivtokens/97496DD1C8F053DE7450CD854D9C95B4/pin
    Content-Type: application/json
    Content-Length: 2231
    Date: Thu, 13 Feb 2019 02:11:33 GMT
    Api-Version: 1.0
    Request-Id: 57e46450-ab5c-6c7e-93a5-d4e85cd0d6ef
    Response-Time: 1

    {
        "model": "Yubico YubiKey 4",
        "serial": 5213681,
        "cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
        "guid": "97496DD1C8F053DE7450CD854D9C95B4",
        "pin": "123456",
        "pubkeys": {
            "9e": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9d": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA...",
            "9a": "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYA..."
        },
        "attestation": {
            "9e": "-----BEGIN CERTIFICATE-----....",
            "9d": "-----BEGIN CERTIFICATE-----....",
            "9a": "-----BEGIN CERTIFICATE-----....."
        }
    }

Deletes information about a PIV token. This would be called during the decommission process of a CN. The request is authenticated using the 9e key of the PIV token.
Sample request:
DELETE /pivtokens/97496DD1C8F053DE7450CD854D9C95B4 HTTP/1.1
Host: kbmapi.mytriton.example.com
Accept: application/json
Authorization: Signature <Base64(rsa(sha256($Date)))>
Api-Version: ~1
Content-Length: 0
Sample response:
HTTP/1.1 204 No Content
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Accept, Accept-Version, Content-Length, Content-MD5, Content-Type, Date, Api-Version, Response-Time
Access-Control-Allow-Methods: GET, HEAD, POST, DELETE
Access-Control-Expose-Headers: Api-Version, Request-Id, Response-Time
Connection: Keep-Alive
Date: Thu, 21 Feb 2019 11:26:19 GMT
Server: Joyent KBMAPI 1.0.0
Api-Version: 1.0.0
Request-Id: f36b8a41-5841-6c05-a116-b517bf23d4ab
Response-Time: 997
Note: alternatively, an operator can manually run kbmctl to delete an entry.
We need to support the following features related to recovery config propagation:
1. A mechanism to ensure that we do not push recovery config X until recovery config X-1 has been successfully activated on all consumers.
2. An override mechanism that allows recovery config X to be pushed to consumers before earlier configs are known to be active.
3. A means to test the most recent recovery config before activation across the general population.
4. The ability to not activate a recovery configuration that has been staged.
Which was translated into:
1. KBMAPI must maintain an inventory of where each configuration is present and whether it is staged or active. This inventory needs to be robust in the face of down or rebooting nodes at any point during the staging and activation phases.
2. There should be a way to unstage a staged recovery configuration.
3. There should be a way to replace a staged recovery configuration.
4. There must be a way to unstage or replace a staged recovery configuration.
5. A mechanism for activating a staged configuration on a single compute node must exist.
Each configuration object contains a template, which is a base64 encoded string created by the command pivy-box template create -i <name> ….
Here is how a template is created using pivy-box interactive mode:
$ pivy-box tpl create -i backup
-- Editing template --
Select a configuration to edit:
Commands:
[+] add new configuration
[-] remove a configuration
[w] write and exit
Choice? +
Add what type of configuration?
[p] primary (single device)
[r] recovery (multi-device, N out of M)
Commands:
[x] cancel
Choice? r
-- Editing recovery config 1 --
Select a part to edit:
Commands:
[n] 0 parts required to recover data (change)
[+] add new part/device
[&] add new part based on local device
[-] remove a part
[x] finish and return
Choice? +
GUID (in hex)? E6FB45BDE5146C5B21FCB9409524B98C
Slot ID (hex)? [9D]
Key? ecdsa-sha2-nistp521 AAAAE2VjZHNhLXNoYTItbmlzdHA1MjEAAAAIbmlzdHA1MjEAAACFBADLQ8fNp4/+aAg7S/nWrUU6nl3bd3eajkk7LJu42qZWu8+b218MspLSzpwv3AMnwQDaIhM7kt/HhXfYgiQXd30zYAC/xZlz0TZP2XHMjJoVq4VbwZfqxXXAmySwtm6cDY7tWvFOHlQgF3SofE5Fd/6gupHy59+3dtLKwZMMU1ewcPm8sg== kbmapi test one token
-- Editing part 1 --
Read-only attributes:
GUID: E6FB45BDE5146C5B21FCB9409524B98C
Slot: 9D
Key: ecdsa-sha2-nistp521 AAAAE2VjZHNhLXNoYTItbmlzdHA1MjEAAAAIbmlzdHA1MjEAAACFBADLQ8fNp4/+aAg7S/nWrUU6nl3bd3eajkk7LJu42qZWu8+b218MspLSzpwv3AMnwQDaIhM7kt/HhXfYgiQXd30zYAC/xZlz0TZP2XHMjJoVq4VbwZfqxXXAmySwtm6cDY7tWvFOHlQgF3SofE5Fd/6gupHy59+3dtLKwZMMU1ewcPm8sg==
Select an attribute to change:
[n] Name: (null)
[c] Card Auth Key: (none set)
Commands:
[x] finish and return
...
This is the final result, after adding several keys to the recovery config:
$ pivy-box tpl show backup
-- template --
version: 1
configuration:
type: recovery
required: 2 parts
part:
guid: E6FB45BDE5146C5B21FCB9409524B98C
name: xk1
slot: 9D
key: ecdsa-sha2-nistp521 AAAAE2VjZHNhLXNoYTItbmlzdHA1MjEAAAAIbmlzdHA1MjEAAACFBADLQ8fNp4/+aAg7S/nWrUU6nl3bd3eajkk7LJu42qZWu8+b218MspLSzpwv3AMnwQDaIhM7kt/HhXfYgiQXd30zYAC/xZlz0TZP2XHMjJoVq4VbwZfqxXXAmySwtm6cDY7tWvFOHlQgF3SofE5Fd/6gupHy59+3dtLKwZMMU1ewcPm8sg==
part:
guid: 051CD9B2177EB12374C798BB3462793E
name: xk2
slot: 9D
key: ecdsa-sha2-nistp521 AAAAE2VjZHNhLXNoYTItbmlzdHA1MjEAAAAIbmlzdHA1MjEAAACFBAA6H1gT8uJBMc7mknW7Wi0M2/2x/65lKZy9DLM9x60pU6wt8KsBI2PKJoUY/7Jq6dyIRckVzNh15z78agjshPu9aQHiKVRn8lEbNTuAuCr6NbEx62yQbAamf85qpQMaUT47hjHhP5srMMGb7cjBTCO1rTsVOxYcIc7bmnLEy69nRmpxaA==
part:
guid: D19BE1E0660AECFF0A9AF617540AFFB7
name: xk3
slot: 9D
key: ecdsa-sha2-nistp521 AAAAE2VjZHNhLXNoYTItbmlzdHA1MjEAAAAIbmlzdHA1MjEAAACFBABrFyNJvVBr80bWBE9Df/b/GOnIypNxURgD0D64Nt7iT6oF163shFWLXJ04TPPSAgSX57/8e7lohol9pSczXMQaQQGaefYZKMfUvyeXpcNsu1m47axaq/HwKpwGGW0LgQ2VZQhWDQjDPP8Yr3s/krNXoV/ArwWJT7HwHocL5y7eN4TUcQ==
Here is how to get the values used by KBMAPI for a given template:
const crypto = require('crypto');
const fs = require('fs');
const input = fs.readFileSync('/path/to/.ebox/tpl/name');
// This is the template:
input.toString();
// => '6wwBAQECAgMBCG5pc3RwNTIxQwIAy0PHzaeP/mgIO0v51q1FOp5d23d3mo5JOyybu\nNqmVrvPm9tfDLKS0s6cL9wDJ8EA2iITO5Lfx4V32IIkF3d9M2AEEOb7Rb3lFGxbIf\ny5QJUkuYwCA3hrMQABCG5pc3RwNTIxQwIAOh9YE/LiQTHO5pJ1u1otDNv9sf+uZSm\ncvQyzPcetKVOsLfCrASNjyiaFGP+yaunciEXJFczYdec+/GoI7IT7vWkEEAUc2bIX\nfrEjdMeYuzRieT4CA3hrMgABCG5pc3RwNTIxQwMAaxcjSb1Qa/NG1gRPQ3/2/xjpy\nMqTcVEYA9A+uDbe4k+qBdet7IRVi1ydOEzz0gIEl+e//Hu5aIaJfaUnM1zEGkEEEN\nGb4eBmCuz/Cpr2F1QK/7cCA3hrMwA=\n'
const hash = crypto.createHash('sha512');
hash.update(input.toString());
// hash.digest() can only be called once per Hash object, so keep the raw
// digest in a Buffer and derive both the hex string and the UUID from it.
var buf = hash.digest();
// And this is the hash value, used as identifier:
buf.toString('hex');
// => 'f85b894ed02cbb1c32ea0564ef55ee2438a86c5a4988ca257dd7c71953f349d9cf0472838099967d9ec4ca15603efad17f6ac6b3f434c9080f99d6f2041799d7'
// Instead of the hash (or together with), we can also generate a UUID
// using the following procedure:
// variant:
buf[8] = buf[8] & 0x3f | 0xa0;
// version:
buf[6] = buf[6] & 0x0f | 0x50;
var hex = buf.toString('hex', 0, 16);
var uuid = [
hex.substring(0, 8),
hex.substring(8, 12),
hex.substring(12, 16),
hex.substring(16, 20),
hex.substring(20, 32)
].join('-');
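The UUID derivation above can be wrapped into a reusable helper. This is a minimal sketch: the function names `uuidFromDigest` and `describeTemplate` are hypothetical, and the digest is computed once because a Node.js Hash object cannot be reused after `digest()` has been called. The bit masks are the ones used in the snippet above.

```javascript
const crypto = require('crypto');

// Build a UUID string from the first 16 bytes of a SHA-512 digest.
function uuidFromDigest(digest) {
    const buf = Buffer.from(digest.subarray(0, 16)); // copy, keep input intact
    buf[8] = (buf[8] & 0x3f) | 0xa0; // variant bits, as in the snippet above
    buf[6] = (buf[6] & 0x0f) | 0x50; // version nibble set to 5
    const hex = buf.toString('hex');
    return [
        hex.substring(0, 8),
        hex.substring(8, 12),
        hex.substring(12, 16),
        hex.substring(16, 20),
        hex.substring(20, 32)
    ].join('-');
}

// Compute both identifiers for a template string with a single digest() call.
function describeTemplate(template) {
    const digest = crypto.createHash('sha512').update(template).digest();
    return { hash: digest.toString('hex'), uuid: uuidFromDigest(digest) };
}

// The hash value shown above yields the UUID used in later examples.
const exampleDigest = Buffer.from(
    'f85b894ed02cbb1c32ea0564ef55ee2438a86c5a4988ca257dd7c71953f349d9' +
    'cf0472838099967d9ec4ca15603efad17f6ac6b3f434c9080f99d6f2041799d7', 'hex');
console.log(uuidFromDigest(exampleDigest));
// => 'f85b894e-d02c-5b1c-b2ea-0564ef55ee24'
```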
Recovery configurations will go through a Finite State Machine during their expected lifecycles. The following are the definitions of all the possible states for recovery configurations:
- new: This state describes the raw parameters for the recovery configuration (mostly template) before the HTTP request to create the recovery configuration record in KBMAPI has been made.
- created: The recovery configuration has been created in KBMAPI through the HTTP request to POST /recovery_configs. The recovery configuration now has a unique uuid, the attribute created has been added and, additionally, the process to stage this configuration through all the Compute Nodes using EDAR has been automatically started. (TBD: Shall this really be automatic or should we make it require an explicit HTTP request, just in case we want to just stage + activate on a single CN for testing before we proceed with every CN?)
- staged: The recovery configuration has been spread across all the CNs using EDAR (or at least to all the CNs using EDAR available at the moment we made the previous HTTP request). Confirmation has been received by KBMAPI that the "staging" process has been finished.
- active: The request to activate the configuration across all the CNs where it has been previously staged has been sent to KBMAPI. The transition from staged to active will take some time. We need to keep track of the transition until it finishes.
- expired: A given recovery configuration has been replaced by another one and we no longer care about it being deployed across the different CNs using EDAR. This state change for recovery configurations is a side effect of another configuration transitioning to active.
                                          +-----------+
                            +-------------| unstaging |--------------+
                            |             +-----------+              |
                            |                             unstage()  |
                            v                                        |
    +------+   POST    +---------+  stage()  +---------+        +--------+
    | new  | --------> | created | --------> | staging | -----> | staged |
    +------+           +---------+           +---------+        +--------+
                            ^                                       |   ^
            reactivate()    |                                       |   |
        +-------------------+                           activate()  |   |
        |                                                           |   |
   +---------+  expire()  +---------+        +-------------+        |   |
   | expired | <--------- | active  | <----- | activating  | <------+   |
   +---------+            +---------+        +-------------+            |
        |                     |                                         |
        | destroy()           | deactivate()    +--------------+        |
        v                     +---------------> | deactivating |--------+
   +---------+                                  +--------------+
   | removed |
   +---------+
While there is an expired state, a given recovery configuration can reach it only when another one has been activated. The only value in keeping an "expired" recovery configuration around is to let operators reuse the same configuration several times without having to remove previous records, given the requirement for UUID uniqueness and the way the UUID is generated from the template hash. Such a configuration needs to be re-staged to all the CNs again, exactly as if it were a new one.
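The state machine described above can be written down as a transition table. This is a minimal sketch based on the states and actions shown in the diagram; the `TRANSITIONS` table and the `applyAction` helper are hypothetical names, and `via` records the intermediate state (if any) that the record passes through while the transition is running.

```javascript
// For each settled state, the actions it accepts, the intermediate state the
// record goes through (null when there is none) and the resulting state.
const TRANSITIONS = {
    created: {
        stage: { via: 'staging', to: 'staged' }
    },
    staged: {
        unstage: { via: 'unstaging', to: 'created' },
        activate: { via: 'activating', to: 'active' }
    },
    active: {
        deactivate: { via: 'deactivating', to: 'staged' },
        // expire is a side effect of another configuration becoming active
        expire: { via: null, to: 'expired' }
    },
    expired: {
        reactivate: { via: null, to: 'created' },
        destroy: { via: null, to: 'removed' }
    }
};

// Look up the transition for (state, action) or fail loudly.
function applyAction(state, action) {
    const allowed = TRANSITIONS[state] || {};
    if (!Object.prototype.hasOwnProperty.call(allowed, action)) {
        throw new Error('action "' + action + '" is not valid in state "' + state + '"');
    }
    return allowed[action];
}

console.log(applyAction('staged', 'activate'));
// => { via: 'activating', to: 'active' }
```

Keeping the table as data makes it easy to reject requests such as activating a configuration that has not been staged yet.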
Requirements:
- We need to be able to recover from CNAPI being down either at the beginning or in the middle of a transition.
- We need to be able to recover from KBMAPI going down in the middle of a transition.
- We need to be able to provide information regarding a transition not only to the client which initiated the process with an HTTP request, but to any other client instance, either because a console session ended abruptly or just for convenience.
- We need to be able to "undo" transitions, that is, "unstage" a staging process that is in progress or "deactivate" an activation process that is in progress.
- We agree that it’s OK to begin these "undo" processes when the process we’re trying to roll back has reached an acceptable level of progress. For example, if we want to deactivate a recovery configuration whose activation is in progress, taking batches of 10 CNs at a time, and we have already processed 20 CNs and are in the middle of processing the next 10, it’ll be OK to wait until the activation of those 10 CNs has been completed before we stop the activation of any more CNs and begin the deactivation of the 30 CNs we are already done with.
- We may have more than one KBMAPI instance (HA-Ready) and each one of these instances may receive requests to report either progress on the transition or the current list of CNs with one or another recovery configuration active.
With all these requirements, we need to have a persistent cache which can be accessed not only by the process currently orchestrating the transition between two possible recovery configuration states, but by any other process or instance trying to provide information regarding such process or the consequences of it. We need to have a process which will orchestrate the transition, updating this persistent cache with progress as needed. This process will also lock the transition so there isn’t any other attempt to run it from more than one process at a time.
This persistent cache will store, for each transition, the following information:
- The recovery configuration this transition belongs to.
- List of CNs/PIV Tokens taking part in the transition process (probably just the CNs using EDAR which are running at the moment the transition has been started).
- List of CNs where the transition has been completed and, in case of failure, as much information as possible regarding such failures.
- List of taskid for each CN where the transition is in progress. These will match the taskid for cn-agent on each CN, which can be accessed through CNAPI using either GET /tasks/:task_id or GET /tasks/:task_id/wait.
- An indicator of whether or not the transition has been aborted.
- An indicator of whether or not the transition is running (possibly the unique identifier of the process orchestrating the transition).
KBMAPI should provide:
- A process to orchestrate (run) the transitions (possibly backed by a transient SMF service, which will come in handy in case the process exits).
- An end-point to watch transition progress.
We will have a moray bucket called kbmapi_recovery_configs with the following JSON config:
{
"desc": "Recovery configuration templates",
"name": "kbmapi_recovery_configs",
"schema": {
"index": {
"uuid": { "type": "uuid", "unique": true },
"hash": { "type": "string", "unique": true },
"template": { "type": "string" },
"state": { "type": "string" },
"created": {"type": "date"},
"staged": {"type": "date"},
"activated": {"type": "date"},
"expired": {"type": "date"}
}
}
}
Note the state field will include not only the final FSM states, but also the transitioning states, so possible values are: created, staging, unstaging, staged, activating, deactivating, active, expired and reactivating. There’s no transition associated with the expired state, because that happens as a result of another configuration becoming the active one.
We may want to keep a list of configurations for historical purposes.
The persistent transition cache will be stored into another moray bucket with the following structure:
{
"desc": "Recovery configuration transitions",
"name": "kbmapi_recovery_config_transitions",
"schema": {
"index": {
"recovery_config_uuid": { "type": "uuid" },
"name": { "type": "string" },
"targets" : {"type": ["uuid"] },
"completed" : {"type": ["uuid"] },
"wip": { "type": ["uuid"] },
"taskids": { "type": ["string"] },
"concurrency": { "type": "integer" },
"locked_by": { "type": "uuid" },
"aborted": {"type": "boolean"}
}
}
}
Where targets is the collection of CNs which need to be updated, completed is the list of those we’re already done with, wip are the ones we’re modifying right now and taskids are the taskid values provided by CNAPI for each one of the CNs included in wip, so we can check the progress of such tasks using CNAPI. locked_by should be the UUID of the process which is currently orchestrating the transition.
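How an orchestrator might walk through a transition record in batches can be sketched as follows. This is only an illustration under stated assumptions: `runOnCn` (which would create the CNAPI task and wait for it) and `save` (which would persist the record to moray) are hypothetical callbacks, and the `errors` array is an extra field not present in the bucket schema above. The abort flag is checked only between batches, so a batch that is already in progress is allowed to finish.

```javascript
// Process transition.targets in batches of transition.concurrency,
// keeping wip/completed up to date as the transition record requires.
async function runTransition(transition, runOnCn, save = async () => {}) {
    const done = new Set(transition.completed);
    const pending = transition.targets.filter((cn) => !done.has(cn));

    while (pending.length > 0 && !transition.aborted) {
        transition.wip = pending.splice(0, transition.concurrency);
        await save(transition);

        // allSettled: a failure on one CN must not fail the whole batch.
        const results = await Promise.allSettled(
            transition.wip.map((cn) => runOnCn(cn)));

        results.forEach((result, i) => {
            const cn = transition.wip[i];
            if (result.status === 'fulfilled') {
                transition.completed.push(cn);
            } else {
                const reason = result.reason;
                transition.errors.push({
                    cn: cn,
                    message: String((reason && reason.message) || reason)
                });
            }
        });
        transition.wip = [];
        await save(transition);
    }
    return transition;
}
```

A caller that wants to undo the transition would set `aborted` on the stored record and then start the reverse transition over the CNs listed in `completed`.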
We need to provide a way to check for stale processes leaving a transition locked. Periodically checking that such processes are still healthy would be ideal. Comparing moray’s mtime for the transition object against a default timeout would be a fine starting point.
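The stale-lock check suggested above could look like the following sketch. It assumes each record fetched from the bucket carries its modification time in milliseconds as `_mtime` next to the stored `value`; the helper name and the timeout value are hypothetical.

```javascript
// Assumption: 10 minutes without an update means the orchestrator is gone.
const DEFAULT_LOCK_TIMEOUT_MS = 10 * 60 * 1000;

// Return the locked transition records that have not been updated recently.
function findStaleTransitions(records, nowMs, timeoutMs = DEFAULT_LOCK_TIMEOUT_MS) {
    return records.filter((record) => {
        const locked = Boolean(record.value.locked_by);
        return locked && (nowMs - record._mtime) > timeoutMs;
    });
}

const now = Date.now();
const sample = [
    { _mtime: now - 60 * 1000, value: { name: 'stage', locked_by: 'proc-a' } },
    { _mtime: now - 3600 * 1000, value: { name: 'activate', locked_by: 'proc-b' } },
    { _mtime: now - 3600 * 1000, value: { name: 'unstage', locked_by: null } }
];
console.log(findStaleTransitions(sample, now).map((r) => r.value.name));
// => [ 'activate' ]
```

This only works if the orchestrating process updates the record regularly while it is alive, for example after every batch.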
KBMAPI needs end-points to support the following command:
kbmctl recovery <add|show|list|activate|deactivate|stage|unstage|remove>
The following end-point and routes will be created:
/recovery_configs:
- GET /recovery_configs (ListRecoveryConfigs)
- POST /recovery_configs (AddRecoveryConfig)
- GET /recovery_configs/:uuid (ShowRecoveryConfig)
- PUT /recovery_configs/:uuid?action=stage (StageRecoveryConfig)
- PUT /recovery_configs/:uuid?action=unstage (UnstageRecoveryConfig)
- PUT /recovery_configs/:uuid?action=activate (ActivateRecoveryConfig)
- PUT /recovery_configs/:uuid?action=deactivate (DeactivateRecoveryConfig)
- PUT /recovery_configs/:uuid?action=reactivate (ReactivateRecoveryConfig)
- GET /recovery_configs/:uuid/watch (WatchRecoveryConfigTransition)
- DELETE /recovery_configs/:uuid (RemoveRecoveryConfig)
Note that all the PUT requests will share the same URL and parameters.
Field | Required | Description
---|---|---
template | Yes | Base64 encoded recovery configuration template.
concurrency | No | Number of ComputeNodes to update concurrently (default 10).
force | No | Boolean, allow the addition of a new recovery config even if the latest one hasn’t been staged (default false).
stage | No | Boolean, automatically proceed with the staging of the recovery configuration across all nodes using EDAR w/o waiting for the HTTP request for

Field | Required | Description
---|---|---
uuid | Yes | The uuid of the recovery configuration to watch.
transition | Yes | The name of the transition to watch for the given config.

Watch the transition from one recovery config state to the next one in the FSM.
This end-point will provide details regarding the transition progress using a JSON Stream of CNs which are in progress or have already completed the transition, together with an eventual error message in case the transition failed for any of these CNs. When the transition has finished for all the CNs, a final END event will be sent and the connection will be closed.
The format of these Transition Progress Events is still TBD.
In case a configuration has already finished the given transition, the stream will be automatically closed right after the first response has been sent.
Get a list of recovery configurations. Note that both this and the ShowRecoveryConfig end-points will grab all the existing PIV tokens in KBMAPI and provide a counter of how many PIV tokens are using each config. Additionally, ShowRecoveryConfig will provide the uuids (hostnames too?) of the CNs using a given recovery configuration.
Field | Required | Description
---|---|---
uuid | Yes | The uuid of the recovery configuration to retrieve.

This returns a JSON object containing the selected recovery configuration. This is a JSON object like:
{
"uuid": "f85b894e-d02c-5b1c-b2ea-0564ef55ee24",
"template": "AAAewr22sdd...",
"hash": "0123456789abcdef",
"created": "ISO 8601 Date",
["activated": "ISO 8601 Date",]
["expired": "ISO 8601 Date",]
}
6.7. UpdateRecoveryConfig (PUT /recovery_configs/:uuid?action=stage|unstage|activate|deactivate|reactivate)
Field | Required | Description
---|---|---
uuid | Yes | The uuid of the recovery configuration to update.
action | Yes | The transition to apply to the recovery configuration.
concurrency | No | Number of ComputeNodes to update concurrently (default 10).
pivtoken | No | In case we want to stage this configuration just for a given pivtoken (on a given Compute Node)

Note that in case pivtoken guid is provided, the recovery configuration state will not change.
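The parameters of UpdateRecoveryConfig could be validated along these lines. This is a minimal sketch: `parseUpdateParams` and the `updatesConfigState` flag are hypothetical names, and the accepted actions and the default concurrency come from the table above.

```javascript
const ACTIONS = ['stage', 'unstage', 'activate', 'deactivate', 'reactivate'];
const DEFAULT_CONCURRENCY = 10;

// Validate the merged URL and query parameters of a PUT request.
function parseUpdateParams(params) {
    if (typeof params.uuid !== 'string' || params.uuid.length === 0) {
        throw new Error('uuid is required');
    }
    if (!ACTIONS.includes(params.action)) {
        throw new Error('action must be one of: ' + ACTIONS.join(', '));
    }
    const concurrency = params.concurrency === undefined ?
        DEFAULT_CONCURRENCY : Number(params.concurrency);
    if (!Number.isInteger(concurrency) || concurrency < 1) {
        throw new Error('concurrency must be a positive integer');
    }
    return {
        uuid: params.uuid,
        action: params.action,
        concurrency: concurrency,
        pivtoken: params.pivtoken,
        // A request scoped to one PIV token leaves the config state alone.
        updatesConfigState: params.pivtoken === undefined
    };
}

console.log(parseUpdateParams({
    uuid: 'f85b894e-d02c-5b1c-b2ea-0564ef55ee24',
    action: 'stage',
    pivtoken: '97496DD1C8F053DE7450CD854D9C95B4'
}).updatesConfigState);
// => false
```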
Field | Required | Description
---|---|---
uuid | Yes | The uuid of the recovery configuration to remove.

Only a recovery configuration that isn’t in use by any CN can be removed.
Note that we need at least one recovery config for everything to work properly. We’ll need to figure out a way to provide such configuration either during initial headnode setup or during initial kbmapi install …
At first pass we’ll assume that there are no encrypted CNs at all and that if we want to encrypt some, we’ll provide a mechanism to grab this config from the CN before we move ahead with the setup.
For now, we’ll just ensure that KBMAPI will reply with a hint regarding the need of adding a recovery configuration before we can add new PIV tokens.
There are different possible options to keep an up-to-date inventory of which recovery configuration is already staged and/or active on each CN with encrypted zpools (and therefore which recovery tokens associated with those recovery configurations have been generated for the PIV tokens associated with these CNs).
The list of PIV Tokens stored by KBMAPI can be used as a cache of which configurations are present on each CN using EDAR. Each one of these PIV tokens has one or more recovery tokens associated with a given recovery configuration.
For example, for a CN with UUID 15966912-8fad-41cd-bd82-abe6468354b5 which has been created when a recovery configuration with hash f85b894ed0… was active, we’ll initially have the following object with one associated recovery token:
{
"model": "Yubico YubiKey 4",
"serial": 5213681,
"cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
"guid": "97496DD1C8F053DE7450CD854D9C95B4",
"pin": "123456",
"recovery_tokens": [{
"created": 123456789,
"activated": 123456789,
"token": "jmzbhT2PXczgber9jyOSApRP337gkshM7EqK5gOhAcg...",
"config": "recovery config template ..."
}],
"pubkeys": {
"9e": "...",
"9d": "...",
"9a": "..."
},
"attestation": {
"9e": "....",
"9d": "....",
"9a": "...."
}
}
Note that in this initial case, the values for recovery_tokens[0].created and recovery_tokens[0].activated are the same, because this is the value we used for the initial CN setup.
If we need to generate another recovery token for this same PIV token, while the same configuration object is active, we’ll have the following modification to the PIV token’s recovery_tokens member:
{
"cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
"guid": "97496DD1C8F053DE7450CD854D9C95B4",
...,
"recovery_tokens": [{
"created": 123456789,
"activated": 123456789,
"expired": 134567890,
"token": "jmzbhT2PXczgber9jyOSApRP337gkshM7EqK5gOhAcg...",
"config": "recovery config template ..."
}, {
"created": 134567890,
"activated": 134567890,
"token": "ecf1fc337276047347c0fdb167fb241b89226f58c95d...",
"config": "another recovery config template ..."
}],
...
}
The moment the new recovery_token has been activated, the previous one will be expired.
Then, when we add a new recovery configuration, a new recovery token will be added to each PIV token in KBMAPI and this information will be stored on the CN too. We’ll call this latest recovery token "staged".
{
"cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
"guid": "97496DD1C8F053DE7450CD854D9C95B4",
...,
"recovery_tokens": [{
"created": 123456789,
"activated": 123456789,
"expired": 134567890,
"token": "jmzbhT2PXczgber9jyOSApRP337gkshM7EqK5gOhAcg...",
"config": "recovery config template ..."
}, {
"created": 134567890,
"activated": 134567890,
"token": "ecf1fc337276047347c0fdb167fb241b89226f58c95d...",
"config": "another recovery config template ..."
}, {
"created": 145678901,
"token": "aff4fbb14b3de5c7e9986...",
"config": "yet another recovery config template ..."
}],
...
}
Once we activate a recovery configuration already staged into all our active CNs using EDAR, each CN will update its local information accordingly and the KBMAPI’s PIV token object will look as follows:
{
"cn_uuid": "15966912-8fad-41cd-bd82-abe6468354b5",
"guid": "97496DD1C8F053DE7450CD854D9C95B4",
...,
"recovery_tokens": [{
"created": 134567890,
"activated": 134567890,
"expired": 145678911,
"token": "ecf1fc337276047347c0fdb167fb241b89226f58c95d...",
"config": "another recovery config template ..."
}, {
"created": 145678901,
"activated": 145678911,
"token": "aff4fbb14b3de5c7e9986...",
"config": "yet another recovery config template ..."
}],
...
}
Note there is no need to keep more than the recovery tokens associated with the currently active and staged configurations. Previous recovery tokens can be removed as part of the process of adding/activating a new one, since the information they provide is useless at this point and in the future.
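The change between the last two examples can be expressed as a small function. This is a sketch only: `activateStaged` is a hypothetical helper that treats a token without `activated` as staged and a token with `activated` but no `expired` as the currently active one, then drops anything older, which reproduces the final example above.

```javascript
// Activate the staged recovery token, expire the previously active one and
// drop tokens that were already expired.
function activateStaged(tokens, now) {
    const staged = tokens.find((t) => t.activated === undefined);
    if (!staged) {
        throw new Error('no staged recovery token');
    }
    const active = tokens.find((t) =>
        t.activated !== undefined && t.expired === undefined);

    const result = [];
    if (active) {
        result.push(Object.assign({}, active, { expired: now }));
    }
    result.push(Object.assign({}, staged, { activated: now }));
    return result;
}

const before = [
    { created: 123456789, activated: 123456789, expired: 134567890, token: 'jmzb...' },
    { created: 134567890, activated: 134567890, token: 'ecf1...' },
    { created: 145678901, token: 'aff4...' }
];
console.log(activateStaged(before, 145678911));
// => [ { created: 134567890, activated: 134567890, expired: 145678911, token: 'ecf1...' },
//      { created: 145678901, activated: 145678911, token: 'aff4...' } ]
```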
In order to provide reasonable search options for client applications trying to figure out which recovery configuration is active or staged on each Compute Node, storing the recovery tokens as an array within the PIV Tokens moray bucket is not the best approach. Instead, we’ll use a specific bucket where we’ll save each token’s properties and references to the PIV token that owns the recovery token, and the recovery configuration used for that token.
{
    "desc": "Recovery tokens",
    "name": "kbmapi_recovery_tokens",
    "schema": {
        "index": {
            "pivtoken_uuid": { "type": "uuid" },
            "configuration_uuid": { "type": "uuid" },
            "token": { "type": "string" },
            "created": { "type": "number" },
            "activated": { "type": "number" },
            "expired": { "type": "number" }
        }
    }
}
These recovery tokens will then be fetched from the PIV tokens model and loaded sorted by created value.
For new recovery config staging, the CNs will be interested in the recovery config hash and template, so those values should be provided together with the recovery token in order to avoid the need for another HTTP request.
For other actions like activate, cancel, remove … the recovery config uuid would do just fine (or the hash, since it can also be used to refer to the same resource).
TODO: Shall we use date type for all these dates instead of numbers? I dunno which was the original reason for using timestamps here.
During the add/activate new config phase, there are different possible ways to keep the inventory "up to date", meaning that PIV tokens stored in the KBMAPI DB cache should reflect the reality of what is already present on the CNs using EDAR.
Of these, the simplest one is to just wait for each addition/activation/removal (… whatever the KBMAPI task) to be completed. Using this approach there will be no need at all for a changefeed publisher or subscribers.
+--------+ Add recovery cfg task +-------+ run task +----------+
| KBMAPI | ----------------------> | CNAPI | ---------> | cn-agent |--+
+--------+ +-------+ +----------+ |
^ provide taskid to | ^ provide information |
| wait for completion | | about task progress |
+-------------------------------+ +-----------------------------+
Here, the "add recovery config" CN-Agent task consists of:
- Either we’ll send the recovery_token’s details when we call the POST /servers/:server_uuid/recovery_config end-point, or we’ll let the cn_agent know that it has to perform an HTTP request to POST /pivtokens/:guid authenticated with the 9e key of the Yubikey attached to the CN in order to retrieve such information. Let’s assume at first that the simplest path will be used and, in order to save the extra HTTP request for each one of the CN agents, we’ll provide the information on the original HTTP request to CNAPI. Params: recovery token, hash, PIV token guid, action (add|activate|…).
- The cn_agent will then store the values for the new recovery config and the new recovery token.
- The cn_agent will refresh local sysinfo to include the information about the new config hash.
- KBMAPI will wait for task completion.
Drawbacks/Advantages regarding using changefeed pub/sub:
- We need to block awaiting task completion while running the task from KBMAPI on multiple CNs. Given we want to run this task on a configurable number of CNs in parallel, we should provide some kind of TASK_TIMEOUT which will be fired, for example, when CNAPI "thinks" that a server is running, but either the server isn’t or the cn-agent instance there is down. Failure on a single node shouldn’t result in failure for all nodes, especially if it’s a known failure like "node is down" or "cn-agent is down". In these cases, we should still have the new recovery tokens created in KBMAPI, or some other flag, for later use by a CN which, for whatever reason, has been unable to complete the given recovery config task.
- When a node hasn’t been able to complete the requested task for whatever reason (node down, cn-agent down, task execution failure) we need to provide a mechanism for the node to automatically try to get the latest configuration during the next boot of cn-agent. In these cases, we can add a task to cn-agent’s init (similar to the current sysinfo or status report ones), where the agent will perform a check against the KBMAPI end-point for its own CN and verify that the local information is consistent with whatever is expected in KBMAPI and, in case it’s not, initiate a process similar to the one described above.
               HTTP Request /pivtokens/:cn_uuid/pin.
               This is an HTTP Signature signed request
+----------+   using 9e key from Yubikey.                 +--------+
| cn-agent | -------------------------------------------> | KBMAPI |<-+
+----------+ <------------------------------------------- +--------+  |
     |         PIV token including recovery tokens.                   |
     |                                                                ^
     v                                                                |
Compare local config and token                                        |
against received information.      | Once the task has been finished  ^
In case of differences, init a new | update PIV token in KBMAPI       |
"recovery config" related task.    |------->------>------>------->----+
Note this task will be executed only when cn-agent detects that it’s running at a server where EDAR is in use (encrypted zpool information, available from sysinfo).
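The comparison step that cn-agent performs at boot can be sketched as below. The shape of the local state (`{ active, staged }`, holding the locally stored recovery token strings) and the helper names are assumptions made for illustration; the remote object is the PIV token as returned by KBMAPI, with its `recovery_tokens` array.

```javascript
// A token with "activated" and without "expired" is the active one.
function findActive(tokens) {
    return tokens.find((t) => t.activated !== undefined && t.expired === undefined);
}

// A token that has never been activated is the staged one.
function findStaged(tokens) {
    return tokens.find((t) => t.activated === undefined);
}

function tokenValue(entry) {
    return entry ? entry.token : undefined;
}

// Return which parts of the local state differ from what KBMAPI expects.
// An empty array means no "recovery config" task needs to be started.
function recoveryDrift(local, pivtoken) {
    const remote = pivtoken.recovery_tokens || [];
    const drift = [];
    if (tokenValue(findActive(remote)) !== local.active) {
        drift.push('active');
    }
    if (tokenValue(findStaged(remote)) !== local.staged) {
        drift.push('staged');
    }
    return drift;
}

const remotePivtoken = {
    guid: '97496DD1C8F053DE7450CD854D9C95B4',
    recovery_tokens: [
        { created: 134567890, activated: 134567890, token: 'ecf1...' },
        { created: 145678901, token: 'aff4...' }
    ]
};
console.log(recoveryDrift({ active: 'ecf1...' }, remotePivtoken));
// => [ 'staged' ]
```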
- This approach has no issues with a possible flood of concurrent requests to either CNAPI or KBMAPI from the different cn-agents, since the tasks will run in batches of a configurable number of CNs and we’ll wait for completion, using a queue of known size.
- Changefeed, either using cn-agent or a custom kbm-agent, means having publishers and subscribers keeping connections and processes up for something which shouldn’t happen very frequently (recovery config modifications).
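The TASK_TIMEOUT mentioned in the first point of the list above could be implemented by racing the task against a timer. This is a generic sketch: `withTimeout` is a hypothetical helper, and the promise passed in stands for whatever call waits on the CNAPI task.

```javascript
// Reject if the given promise does not settle within ms milliseconds.
function withTimeout(promise, ms, label) {
    let timer;
    const timeout = new Promise((resolve, reject) => {
        timer = setTimeout(() => {
            reject(new Error(label + ' timed out after ' + ms + ' ms'));
        }, ms);
    });
    // Clear the timer either way so it does not keep the process alive.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: a task that never completes is reported as a per-CN failure.
const neverCompletes = new Promise(() => {});
withTimeout(neverCompletes, 50, 'task on cn-1').catch((err) => {
    console.log(err.message);
    // => 'task on cn-1 timed out after 50 ms'
});
```

Note that the timeout only stops KBMAPI from waiting; the task itself may still complete later on the CN, which is one reason the boot-time consistency check described above is needed.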
This is a command line tool that exists in the KBMAPI zone used to manage the KBMAPI data by an operator. In earlier revisions, this was called kbmadm, but that could cause confusion with kbmd’s kbmctl, so a different name was chosen.
Usage: kbmctl serials add -d CA_dn start [end]
Adds the range [start, end] (i.e. inclusive) that use CA_dn as their attestation CA to the list of PIV tokens that can be provisioned. If end is omitted, the range is treated as [start, start] (i.e. a single entry).
Usage: kbmctl serials delete -d CA_dn start [end]
Removes the serial number range [start, end] which use CA_dn as their attestation CA from the list of PIV tokens that can be provisioned. If end is omitted, the range is treated as [start, start] (i.e. a single entry).
Usage: kbmctl pivtoken delete guid
Deletes the PIV token with the given guid.
Usage: kbmctl restore [-f] [-c cn_uuid] guid [timestamp]
Restores the data for the PIV token with the given guid from the history table. If multiple entries for the same GUID are present, timestamp must be supplied to identify the entry to restore (the entry whose active range contains timestamp is chosen). Optionally, restore the PIV token to the given CN (if different from the history entry).
If the given CN already has a provisioned PIV token assigned to it, this fails unless the -f flag is provided.
- How is the recovery configuration provided?
kbmctl recovery add -f <ebox-template>
Where template is generated with pivy box.
- Interactive mode could exist that invokes pivy box, but not required.
- Makes a call to KBMAPI
- No special authentication, beyond having access to admin network.
kbmctl recovery add -n 10 - do 10 at a time
See which configurations are in use
$ kbmctl recovery list
HASH    INUSE  STATE
abcdef  7      old
123456  1      active
abc123  0      stage
See who is using those that are in use
$ kbmctl recovery list 123456
CN uuid1 or hostname...
Obvious KBMAPI endpoints
- will fail if not forced when not all compute nodes are on the active config or the stage configuration
kbmctl recovery activate [-f]
- Does pushing out all very quickly cause a cn-agent → cnapi storm that hurts cnapi or moray?
kbmd (read: kaboom-dee) has 3 big areas of responsibility:
Firstly, it’s responsible for the "recovery" process — when a server has lost its primary YubiKey/PIV token, it is responsible for providing the interface an administrator uses (either on the console or a pty) to recover encryption keys, set up a new YubiKey, and get the system back on track. Since this logically requires it to be able to set up new YubiKeys from scratch, it’s also involved in the initial setup process to keep all the responsibility for that together.
Secondly, it’s responsible for the "unlock" process at boot — determining whether the primary YubiKey is available, getting the PIN (from boot-time module or pool config for standalone, or spawning a client to talk to KBMAPI), and if those fail, deciding whether to enter "recovery".
Thirdly, it’s responsible for everything during normal runtime that’s required to make those two processes work. This mostly means keeping track of the encrypted data boxes on the machine and the "recovery registry" (getting to that in a sec). It also means operating a door server and accepting requests from a commandline admin tool, "kbmadm".
The name "kbmd" reflects this — "Key Backup and Management Daemon". (Definitely not a backronym so we can pronounce it "kaboom". Definitely not.)
Encrypted boxes on the system fundamentally come in two forms — there’s the boxes associated with the zpool (one set for the primary YubiKey and one set for recovery), and then there are boxes for each of the keys stored by the RFD77 soft-token (recall that the soft-token individually encrypts its keys even when zpool encryption is enabled, as part of the effort to make a "class break" that compromises all of the keys on the system in one single operation, as difficult as possible).
The boxes themselves are stored as a zfs property (rfd77:config). The
current size limitations of zfs properties should allow a single
property to store approximately 8 boxes' worth of data.
The soft-token keys have to be boxed individually to the primary PIV token (so that the primary PIV token can’t unlock all of them in a single operation), but they do not have to be boxed individually to the backup keys. In fact, it would be pretty inconvenient if they were, because we would have to do the challenge-response process at least N times for a machine with N zones on it.
So instead, the soft-token keys' backup comes in the form of a single large box (keyed only to the backup keys) which unlocks all of them. Every time we need to add or remove something from that box, we have to regenerate it from scratch using the individual boxes targeted to the primary YubiKey. So we keep a plaintext record next to it of the locations of all of the primary YubiKey boxes on disk. We call this whole structure together the "recovery registry".
This implies that the storage of these keys is somewhat managed by the system, and it is. When the soft-token wants to generate a new key, it has to coordinate with kbmd (via its door) to let it know the correct filesystem paths to find the primary boxes, and make sure the entries are added to the recovery registry and everything there is dealt with.
Since this happens when a new zone is provisioned, and an attacker is generally assumed to be able to provision things in the system, we don’t really want this to cause us to bring keys belonging to existing zones into RAM in a predictable controllable fashion. So the recovery registry is in fact split into two parts — the "old generation" and "new generation". When we add new keys we add them to the "new generation" and regenerate that only. Then, every 6-12 hours or so (completely at random) we combine the old and new generations together and regenerate the whole thing. This avoids an attacker being able to control the timing and nature of this operation easily (and it also means we don’t have to regenerate the whole registry every time we make a change — we basically bulk a bunch of changes up).
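The two-generation scheme can be sketched in Python as follows. This is only a model of the bookkeeping described above, not kbmd's implementation: the class and method names are hypothetical, the "regeneration" step is a placeholder that merely records what would be rebuilt, and the 6 to 12 hour window is taken from the text.

```python
import random


class RecoveryRegistry:
    """Toy model of the old/new generation split (names are hypothetical)."""

    def __init__(self):
        self.old_gen = set()    # paths of primary boxes already merged
        self.new_gen = set()    # paths added since the last merge
        self.regenerated = []   # log of (generation, paths) rebuilds
        self._rng = random.SystemRandom()

    def add_box(self, path):
        # New keys only ever touch the new generation, so provisioning a
        # zone does not pull existing zones' keys into memory.
        self.new_gen.add(path)
        self._regenerate("new", self.new_gen)

    def merge(self):
        # Periodic full rebuild: fold the new generation into the old one.
        self.old_gen |= self.new_gen
        self.new_gen = set()
        self._regenerate("all", self.old_gen)

    def next_merge_delay(self):
        # Seconds until the next merge, chosen at random in the 6-12 hour
        # window so that the timing is not attacker-controlled.
        return self._rng.uniform(6 * 3600, 12 * 3600)

    def _regenerate(self, which, paths):
        # Placeholder: the real operation would open each primary box and
        # re-box the keys to the backup keys.
        self.regenerated.append((which, sorted(paths)))
```

A daemon loop would sleep for `next_merge_delay()` seconds, call `merge()`, and repeat.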
kbmd is managed using the kbmadm command. This communicates with kbmd
via a private channel (currently a door) to send requests and receive
responses. The behavior and format of the data sent across the door between
kbmd and kbmadm is considered a private interface. Mixing versions of kbmd
and kbmadm is explicitly not supported; they should always be updated in
tandem. Since initial delivery of both programs is targeted to be a part
of the platform image, this shouldn’t impose any additional maintenance burden.
kbmadm create-zpool args…
Creates an encrypted zpool. args are the same arguments as zpool create.
This initializes an attached PIV token (must be present), registers the PIV
token with KBMAPI (receiving a recovery token in the process), creates a
random encryption key for the pool, creates an ebox with the zpool key and
recovery token (using the current recovery template for the recovery
configuration) and then runs zpool create.
kbmadm unlock dataset
Opens the ebox associated with the given dataset, loads the key for the dataset,
and if the dataset corresponds to the topmost dataset of a pool, mounts all
the normal datasets that are typically mounted during a zpool import. If
the dataset is the topmost dataset in a pool, and is the system zpool (as
denoted by the presence of /pool/.system_pool), the PIV token used to
unlock the dataset’s ebox is designated as the system PIV token.
Note: we currently only create eboxes for the topmost dataset in a pool, but since it would actually be more work to restrict the unlock to a topmost dataset, we leave the ability to unlock any dataset with an ebox for possible future use.
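The decisions made by the unlock subcommand can be summarized in a small Python sketch. The names `UnlockPlan` and `plan_unlock` are hypothetical, and the check for the system pool marker is passed in as a flag rather than read from disk; the only ZFS fact relied on is that the topmost dataset of a pool has no "/" in its name.

```python
from dataclasses import dataclass


@dataclass
class UnlockPlan:
    load_key: bool                # always done once the ebox is opened
    mount_pool_datasets: bool     # only for the topmost dataset of a pool
    designate_system_token: bool  # only for the system pool


def plan_unlock(dataset: str, has_system_marker: bool) -> UnlockPlan:
    """Decide what an unlock of `dataset` should do (illustrative only)."""
    # The topmost dataset shares the pool's name, e.g. "zones" rather
    # than "zones/var".
    is_topmost = "/" not in dataset
    return UnlockPlan(
        load_key=True,
        mount_pool_datasets=is_topmost,
        designate_system_token=is_topmost and has_system_marker,
    )
```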
kbmadm recover
Start a recovery of an ebox (see In depth: recovery below).
kbmadm update-recovery
Update the recovery configuration of an ebox. This is currently for testing purposes, but may be retained for use in standalone (non-Triton) installations.
A recovery instance is created when another program running as root with
full privileges connects to the kbmd door and sends a "begin recovery"
request (kbmadm recover). If kbmd decides it needs to initiate recovery on
the console (e.g. during boot), it forks a child to start kbmadm to do this and
places it on the console.
The "begin recovery" request is followed by a "conversation" similar to a PAM conversation: kbmd gives the client some text and instructions on what to ask the user and what options to allow them to reply with, the client replies with the user’s response, kbmd gives more questions to ask the user, and so on.
At the end of the conversation, kbmd does not reply to the final response until recovery is complete.
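The shape of this conversation can be sketched in Python. Everything here is a stand-in: `ScriptedDaemon` plays the role of kbmd and simply hands out pre-written prompts, the `Prompt` fields are hypothetical, and the real exchange happens over the private door interface whose format is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Prompt:
    text: str                  # what the client should ask the user
    options: Tuple[str, ...]   # allowed replies; empty means free-form


class ScriptedDaemon:
    """Stand-in for kbmd: hands out prompts one at a time."""

    def __init__(self, prompts: List[Prompt]):
        self._prompts = list(prompts)
        self.answers: List[str] = []

    def begin(self) -> Optional[Prompt]:
        return self._prompts[0] if self._prompts else None

    def respond(self, answer: str) -> Optional[Prompt]:
        # Record the answer and return the next prompt, or None when the
        # conversation is over (the real kbmd would only reply to the
        # final answer once recovery has completed).
        self.answers.append(answer)
        i = len(self.answers)
        return self._prompts[i] if i < len(self._prompts) else None


def converse(daemon: ScriptedDaemon, ask_user: Callable[[Prompt], str]) -> None:
    """Client side: relay each prompt to the user and send back the reply."""
    prompt = daemon.begin()
    while prompt is not None:
        answer = ask_user(prompt)
        # Re-ask until the reply is one of the options the daemon allows.
        while prompt.options and answer not in prompt.options:
            answer = ask_user(prompt)
        prompt = daemon.respond(answer)
```

The client holds no recovery logic of its own; it only presents what the daemon sends and relays the replies, which is what allows the same client to be used on the console or on a pty.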
kbmd does the following before replying to the final response:
-
A new token value is added to the rfd77:config zfs property on the primary zpool (i.e. zones).
-
New managed box files with the GUID of the new token are created.
-
The old primary token is removed from the rfd77:config zfs property on the primary zpool.
-
Old managed box files are cleaned up: any box for a GUID that is not in rfd77:config, or is otherwise not known, is deleted.
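The ordering of these final steps can be modeled with a short Python sketch. It works on plain in-memory values rather than zfs properties or files: `config_guids` stands in for the token list held in rfd77:config, `box_files` maps a (hypothetical) box file name to the GUID it belongs to, and the `<guid>.box` naming is invented for the example.

```python
def finish_recovery(config_guids, box_files, old_guid, new_guid):
    """Return the updated (guids, boxes) after the four final steps."""
    guids = list(config_guids)
    boxes = dict(box_files)

    # 1. Add the new token to the configuration.
    guids.append(new_guid)
    # 2. Create managed box files for the new token (hypothetical name).
    boxes[f"{new_guid}.box"] = new_guid
    # 3. Remove the old primary token from the configuration.
    guids.remove(old_guid)
    # 4. Delete any box whose GUID is no longer in the configuration.
    boxes = {name: guid for name, guid in boxes.items() if guid in guids}

    return guids, boxes
```

One consequence of this ordering is that the new token is recorded before the old one is removed, so the configuration presumably never passes through a state in which it names no token.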