Knox -- the high-level overview
Knox is a service for storing and rotating secrets, keys, and passwords used by other services.
The Problem Knox is Meant to Solve
Pinterest has a plethora of keys and secrets that do things like sign cookies, encrypt data, protect our network via TLS, access our AWS machines, communicate with third parties, and more. If one of these keys became compromised, rotating it (that is, changing the key) used to be a difficult process, generally involving a deploy and likely a code change. Keys and secrets within Pinterest were stored in git repositories. This meant they were copied all over the company's infrastructure and were present on many employees' laptops. There was no way to audit who had accessed the keys or who had access to them. Knox was built to solve these problems.
The goals of Knox are:
- Ease of use for developers to access/use confidential secrets, keys, and credentials
- Confidentiality for secrets, keys, and credentials
- Provide mechanisms for key rotation in case of compromise
- Create an audit log to keep track of which systems and users access confidential data
Ease of Use for Developers
One of the goals of the project is ease of use. This means a focus on building a very usable client side for developers to create, rotate, and use keys. Since there are a variety of ways keys are used across our company, we hope to provide basic interfaces for the most common of these and provide a general platform to allow for other use cases.
Confidentiality for Secrets, Keys, and Credentials
All secrets are stored using authenticated encryption, so that information in the database cannot be read, or altered without detection, by anyone who lacks the master key. The master key for this authenticated encryption should be protected by hardware, for example an HSM or a managed service such as AWS KMS. Keys are accessible only to the users and machines listed on the key-specific access control list.
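The access control rule described here can be sketched in a few lines. This is only an illustration, not Knox's actual data model: the `Principal` and `KeyACL` types, the "user"/"machine" kinds, and the example names are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Principal:
    """A caller identity. The 'user'/'machine' kinds are hypothetical labels."""
    kind: str
    name: str


@dataclass
class KeyACL:
    """Per-key access control list: only listed principals may read the key."""
    key_id: str
    allowed: set = field(default_factory=set)

    def can_read(self, principal: Principal) -> bool:
        # Deny by default: anything not explicitly listed is rejected.
        return principal in self.allowed


# Hypothetical key and principals, for illustration only.
acl = KeyACL(
    "session_signing",
    {Principal("user", "alice"), Principal("machine", "web-001")},
)
```

The check denies by default, so a principal gains access to a key only by being added to that key's list.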
Key Rotation
Each key consists of a set of versions. Every key has exactly one primary version at any point in time, and can also have any number of active and inactive versions. An active version can still be used, but the primary should be preferred. An inactive version should not be used; it is kept only for auditing and in case a rollback is needed. These three version states exist so that rotation can happen gradually over a period of time.
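One way to picture the three version states is as a small state model that always keeps exactly one primary. The sketch below is an assumption-laden illustration rather than Knox's implementation: the class and method names (`Key`, `add_version`, `promote`, `deactivate`) are hypothetical, as is the choice that new versions start out active.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PRIMARY = "primary"
    ACTIVE = "active"
    INACTIVE = "inactive"


@dataclass
class KeyVersion:
    version_id: int
    data: bytes
    status: Status


class Key:
    """A key is a set of versions with exactly one primary at any time."""

    def __init__(self, key_id, first_version):
        first_version.status = Status.PRIMARY
        self.key_id = key_id
        self.versions = {first_version.version_id: first_version}

    def primary(self):
        # Unpacking fails loudly if the one-primary invariant is ever broken.
        (p,) = [v for v in self.versions.values() if v.status is Status.PRIMARY]
        return p

    def add_version(self, version_id, data):
        # Assumption: new versions start as active, not primary.
        if version_id in self.versions:
            raise ValueError("version id already exists")
        self.versions[version_id] = KeyVersion(version_id, data, Status.ACTIVE)

    def promote(self, version_id):
        target = self.versions[version_id]
        if target.status is not Status.ACTIVE:
            raise ValueError("only an active version can become primary")
        self.primary().status = Status.ACTIVE  # old primary stays usable
        target.status = Status.PRIMARY

    def deactivate(self, version_id):
        target = self.versions[version_id]
        if target.status is Status.PRIMARY:
            raise ValueError("promote another version before deactivating")
        target.status = Status.INACTIVE

    def usable(self):
        """Primary first, then active versions; inactive ones are excluded."""
        active = [v for v in self.versions.values() if v.status is Status.ACTIVE]
        return [self.primary()] + active


key = Key("session_signing", KeyVersion(1, b"old", Status.PRIMARY))
key.add_version(2, b"new")
key.promote(2)
usable_during_rotation = [v.version_id for v in key.usable()]
key.deactivate(1)
usable_after_rotation = [v.version_id for v in key.usable()]
```

Demoting the old primary to active rather than inactive is what gives consumers a window in which both versions are accepted.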
For example, assume a service has two servers, A and B, and a signing key XYZ that it uses to sign sessions so that either server can verify them: a session signed on A should verify on B. If the key were simply swapped, every existing session would immediately become invalid. If the old key version remains marked as active, however, both servers keep accepting it while signing with the new primary, so clients can be migrated gradually to the new key without having their existing sessions terminated.
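The two-server example can be made concrete with a short sketch. It assumes sessions are signed with HMAC-SHA256, which the text does not specify; the key bytes and session contents are placeholders, and `sign` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac


def sign(session: bytes, key: bytes) -> bytes:
    """Sign with a single key version (the primary, in normal operation)."""
    return hmac.new(key, session, hashlib.sha256).digest()


def verify(session: bytes, tag: bytes, usable_keys) -> bool:
    """Accept a tag made by any usable (primary or active) key version."""
    return any(
        hmac.compare_digest(sign(session, key), tag) for key in usable_keys
    )


# Placeholder key material for two versions of the same key.
old_version = b"old-version-bytes"
new_version = b"new-version-bytes"

# Before rotation, server A signs a session with the old primary.
session = b"user=42"
tag = sign(session, old_version)

# After rotation, new_version is primary and old_version is still active,
# so server B keeps accepting the existing session.
valid_during_rotation = verify(session, tag, [new_version, old_version])

# Once old_version is marked inactive, the old session no longer verifies.
valid_after_deactivation = verify(session, tag, [new_version])
```

New sessions would be signed with the new primary throughout, so the old version can be made inactive once the existing sessions have expired or been reissued.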
Audit Log
Every request to Knox is logged. Since all requests require authentication data, the logs can be used to track which keys users and machines have accessed. Combining this with visibility tools like Elasticsearch, logsearch, and Kibana lets the security team respond in case of an incident or breach. These logs could also be fed into anomaly detection systems to detect incidents before we would otherwise know about them. Auditability does not compromise confidentiality, since key data is never stored in the logs. Each key version has an immutable, unique key ID and a creation date that identify it in the logs without leaking sensitive information.
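As a rough illustration of what such a log entry might carry, the sketch below builds records that name the caller and the key version but never include key material. The field names, the JSON-lines format, and the `who_accessed` helper are assumptions for the example, not Knox's actual log schema.

```python
import json
from datetime import datetime, timezone


def audit_record(principal, action, key_id, version_id, when=None):
    """Build one audit entry. Note that no key data is ever included."""
    when = when or datetime.now(timezone.utc)
    return {
        "time": when.isoformat(),
        "principal": principal,
        "action": action,
        "key_id": key_id,
        "version_id": version_id,
    }


def who_accessed(log_lines, key_id):
    """Return every principal that appears in the log for the given key."""
    seen = set()
    for line in log_lines:
        record = json.loads(line)
        if record["key_id"] == key_id:
            seen.add(record["principal"])
    return seen


# Hypothetical requests, serialized one JSON object per line.
log_lines = [
    json.dumps(audit_record("machine:web-001", "get", "session_signing", 2)),
    json.dumps(audit_record("user:alice", "get", "session_signing", 2)),
    json.dumps(audit_record("user:bob", "get", "tls_cert", 7)),
]
principals_for_key = who_accessed(log_lines, "session_signing")
```

During an incident, a query like `who_accessed` narrows down which users and machines touched a compromised key, using only identifiers that are safe to log.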
Architecture
Knox consists of several parts. At the lowest level, Knox depends on another service (such as MySQL) to store the data, and on a cryptor that provides methods for secure cryptography. Knox has a server component that can access this data store and cryptor. Knox also has a client layer that lives on machines across the fleet. This client layer caches keys on the local file system for use by services on those machines.
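The client-side caching idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the Knox client: `CachingClient`, the one-JSON-file-per-key layout, and the `fetch` callable (a stand-in for an authenticated request to the server) are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


class CachingClient:
    """Serve keys from a local cache directory, fetching only on a miss."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.fetch = fetch  # hypothetical stand-in for a server request

    def _path(self, key_id):
        return self.cache_dir / f"{key_id}.json"

    def refresh(self, key_id):
        """Fetch from the server and atomically replace the cached copy."""
        key = self.fetch(key_id)
        tmp = self._path(key_id).with_suffix(".tmp")
        # Create the file readable by its owner only, then swap it into place.
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as f:
            json.dump(key, f)
        os.replace(tmp, self._path(key_id))
        return key

    def get(self, key_id):
        path = self._path(key_id)
        if path.exists():
            return json.loads(path.read_text())
        return self.refresh(key_id)


calls = []


def fake_fetch(key_id):
    # Placeholder response; a real client would authenticate to the server.
    calls.append(key_id)
    return {"id": key_id, "versions": [{"id": 1, "status": "primary"}]}


with tempfile.TemporaryDirectory() as cache_dir:
    client = CachingClient(cache_dir, fake_fetch)
    first = client.get("session_signing")   # cache miss: goes to the server
    second = client.get("session_signing")  # cache hit: read from disk
```

A real client would also need to call something like `refresh` periodically so that rotated versions reach the machine; that scheduling is left out of this sketch.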