Add more detail security risks and mitigation strategies for container checkpoints #41667
Changes from all commits
60779e2
9821f90
79cd9f8
44884c0
cdce05a
c2eb6f8
ee014d5
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,8 +7,8 @@ weight: 10 | |
|
||
{{< feature-state for_k8s_version="v1.25" state="alpha" >}} | ||
|
||
Checkpointing a container is the functionality to create a stateful copy of a | ||
running container. Once you have a stateful copy of a container, you could | ||
Checkpointing a container is the functionality to create a stateful copy of | ||
a running container. Once you have a stateful copy of a container, you could | ||
move it to a different computer for debugging or similar purposes. | ||
|
||
If you move the checkpointed container data to a computer that's able to restore | ||
|
@@ -25,6 +25,87 @@ should create the checkpoint archive to be only accessible by the `root` user. I | |
is still important to remember if the checkpoint archive is transferred to another | ||
system all memory pages will be readable by the owner of the checkpoint archive. | ||
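The ownership and permission expectation described above can be checked with a short script. This is a minimal sketch and not part of the PR: the function name `archive_is_private` and the example path are hypothetical, and it assumes the archive should be owned by uid 0 with no group or other permission bits set.

```python
import os
import stat


def archive_is_private(path: str, expected_uid: int = 0) -> bool:
    """Return True if `path` is owned by `expected_uid` and grants no
    permissions to group or others (for example mode 0600)."""
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)
    # S_IRWXG and S_IRWXO together cover every group/other permission bit.
    group_or_other = mode & (stat.S_IRWXG | stat.S_IRWXO)
    return info.st_uid == expected_uid and group_or_other == 0


if __name__ == "__main__":
    # Hypothetical archive path; substitute the location used on your node.
    example = "/var/lib/kubelet/checkpoints/example.tar"
    if os.path.exists(example):
        print(archive_is_private(example))
```

A check like this only covers the live file. It says nothing about copies of the archive that may exist in disk backups or on other systems.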
|
||
## Security risks and mitigation strategies | ||
Review comment: It would be really nice to make this page a little more actionable. Maybe even convert it to a task under https://kubernetes.io/docs/tasks/administer-cluster/ and include example instructions showing exactly how each item can be achieved. |
Review comment: I am undecided about these changes. They definitely sound like good ideas, but they are also not very specific. I'm not sure how to describe it better. |
||
|
||
1. **Exposure of sensitive data**: When a container is checkpointed, all memory pages, | ||
including private data and encryption keys, are saved to the local disk. If the | ||
checkpoint archive is accessed by unauthorized users, it can lead to data exposure | ||
and potential security breaches. You can mitigate this through: | |
|
||
- Restricting access: Ensure that the checkpoint archive is accessible only | ||
Review comment: One note here: access control may not prevent access to the file if the whole disk is backed up somewhere and somebody has access to that backup. Worth mentioning here. |
Review comment: CRI-O creates the checkpoint archive with root:root 600. The containerd PR does the same. Kubernetes also creates the checkpoint directory with 700. |
||
by authorized users. Set appropriate file permissions and access controls | ||
to limit access to the archive. | ||
|
||
- Encryption: Encrypt the checkpoint archive to protect the data stored | ||
Review comment: This is not built in, correct? Maybe mention that there will be a window of time between when the checkpoint becomes available and when it is encrypted. If there is a technique that allows writing it encrypted, that would be best. |
Review comment: There is a PR open to encrypt the checkpoint before it is written to disk: checkpoint-restore/criu#2297. This will happen at the CRIU level and the data will never hit storage unencrypted. If it is merged in CRIU we need to add it to runc/crun and CRI-O/containerd. The support in the layers above CRIU will basically be calling the layer below with the right options. All the actual encryption work will happen in CRIU. |
||
within it. This adds an additional layer of security in case the archive | ||
falls into the wrong hands. | ||
|
||
2. **Transfer of checkpoint archives**: Moving checkpoint archives to another | ||
system introduces risks during the transfer process. If the archive is | ||
intercepted or tampered with, the sensitive data it contains may be compromised. | ||
Consider the following ways to protect checkpoint data in transit: | ||
|
||
- Secure file transfer: Use secure transfer protocols such as SSH or encrypted | ||
file transfer protocols (SFTP, SCP) to transfer the checkpoint archive between | ||
systems. This ensures that the data remains encrypted during transit. | |
|
||
- Verification mechanisms: Implement mechanisms to verify the integrity and | ||
authenticity of the checkpoint archive during transfer. Cryptographic checksums | ||
or digital signatures can be used to validate the archive's integrity, ensuring | ||
that it hasn't been modified or tampered with. | ||
|
||
3. **Access control and authorization**: Controlling access to the Kubelet Checkpoint API | ||
is crucial to prevent unauthorized checkpointing operations. Consider the following | ||
security practices: | ||
|
||
- Role-based access control (RBAC): Implement RBAC policies to restrict access to the | ||
Kubelet Checkpoint API. Only authorized users or service accounts should have the | ||
necessary permissions to initiate checkpoint operations. | ||
|
||
|
||
- Attribute-based access control (ABAC): ABAC allows access control decisions to be | ||
based on attributes associated with the user, request, or other relevant factors. | ||
Consider using ABAC policies to define fine-grained access rules for the Kubelet | ||
Checkpoint API. | ||
|
||
- Webhook authentication and authorization: Kubernetes supports webhook mechanisms for | ||
authentication and authorization. You can integrate external authentication and | ||
authorization systems by configuring webhooks to make access control decisions for the | ||
Kubelet Checkpoint API. | ||
|
||
- Network segmentation: Deploy the Kubernetes cluster in a network environment with proper | ||
segmentation and firewall rules. Limiting access to the Kubelet's API endpoints reduces | ||
the attack surface and protects against unauthorized access. | ||
|
||
4. **Secure storage of checkpoint archives**: Storing checkpoint archives securely is essential | ||
to prevent unauthorized access and tampering. Consider the following measures: | ||
|
||
- Secure storage location: Store checkpoint archives in a secure directory with restricted | ||
access permissions. The underlying CRI implementation should ensure that the checkpoint | ||
Review comment: Is this true for CRI-O today? Worth mentioning or providing a link. |
Review comment: Also, this may need to be added to the CRI API method comment so that every implementation knows it is needed. |
||
archive is only accessible by the root user. | ||
|
||
- Monitoring and auditing: Implement monitoring and auditing mechanisms to track access to | ||
the checkpoint archive storage location. This helps detect any unauthorized access attempts | ||
and provides an audit trail for accountability. | ||
|
||
5. **Secure deletion of checkpoint archives**: When checkpoint archives are no longer needed, | ||
securely delete them to prevent unauthorized recovery of sensitive data. Ensure that deletion | ||
processes comply with secure deletion standards and overwrite the data with random values to | ||
make it unrecoverable. | ||
|
||
By implementing these security measures, you can mitigate the risks associated with checkpointing | ||
Review comment: Is this note for item 5 or for every item before 5? Should it be after item 7? |
||
containers and protect sensitive data from unauthorized access or exposure. | ||
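The "cryptographic checksums" idea from item 2 can be sketched as follows. This is an illustrative example only, using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_expected` are hypothetical. It assumes the expected digest reaches the receiving system over a channel you already trust, because a plain checksum detects modification but does not by itself prove who produced the archive.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks so that
    large checkpoint archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with a digest recorded before transfer."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

In this sketch the sender would record `sha256_of_file(archive)` before the transfer and the receiver would call `matches_expected(archive, recorded_digest)` afterwards.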
|
||
6. **Integrity protection**: If the checkpoint includes sensitive data or data that requires protection against | |
unauthorized modifications, integrity protection measures should be implemented. This typically involves using | ||
Review comment: The implementation of authentication and integrity protection is part of the encryption scheme. It is not something that Kubernetes users should implement themselves. As Adrian mentioned above, we decided to implement this as a built-in mechanism in CRIU that would be available with different container runtimes (e.g., CRI-O, containerd). |
||
cryptographic mechanisms such as digital signatures or message authentication codes (MACs) to ensure the integrity of | ||
the checkpoint archive. These mechanisms verify that the checkpoint has not been tampered with during storage or | ||
transit. | ||
|
||
7. **Determine sensitivity**: Before proceeding with integrity protection measures, it is essential to evaluate the | |
sensitivity of the data within the container checkpoint. Confirm whether the checkpoint contains any sensitive or | ||
confidential information that needs to be protected. | ||
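To illustrate what the message authentication code (MAC) mentioned in item 6 does, here is a minimal sketch using Python's standard `hmac` module. It is not a description of how any container runtime or CRIU implements integrity protection. The helper names `compute_tag` and `verify_tag` are hypothetical, and the sketch assumes a secret key that is shared out of band between the system that creates the archive and the system that checks it.

```python
import hashlib
import hmac


def compute_tag(key: bytes, path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return a hex HMAC-SHA256 tag over the contents of `path`."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            mac.update(chunk)
    return mac.hexdigest()


def verify_tag(key: bytes, path: str, expected_tag: str) -> bool:
    """Return True only if the file still matches the tag for this key."""
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(compute_tag(key, path), expected_tag)
```

Unlike a plain checksum, the tag cannot be recomputed by someone who modifies the archive without knowing the key, which is why a MAC addresses tampering and not just accidental corruption.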
|
||
Review comment: Point 1 is already solved by the implementation. Point 4 is more or less the same. Point 5 is very vague. Not sure it is helpful. The integrity protection parts are a good idea, but I think that it should happen automatically and not be left to the user. |
||
|
||
## Operations {#operations} | ||
|
||
### `post` checkpoint the specified container {#post-checkpoint} | ||
|
@@ -68,7 +149,7 @@ POST /checkpoint/{namespace}/{pod}/{container} | |
- **timeout** (*in query*): integer | ||
|
||
Timeout in seconds to wait until the checkpoint creation is finished. | ||
If zero or no timeout is specfied the default {{<glossary_tooltip | ||
If zero or no timeout is specified the default {{<glossary_tooltip | ||
term_id="cri" text="CRI">}} timeout value will be used. Checkpoint | ||
creation time depends directly on the used memory of the container. | ||
The more memory a container uses the more time is required to create | ||
|
Review comment: What is the motivation to move the single `a` to the next line?
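The `POST /checkpoint/{namespace}/{pod}/{container}` operation and its `timeout` query parameter, shown in the diff above, can be sketched as a small request builder. This is a hedged illustration only: the function name `build_checkpoint_request` is hypothetical, port 10250 is assumed as the kubelet's HTTPS port, and the request is only constructed, not sent. Actually calling the kubelet would also require credentials that the kubelet accepts, which are not shown here.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_checkpoint_request(
    node: str,
    namespace: str,
    pod: str,
    container: str,
    timeout: Optional[int] = None,
    port: int = 10250,  # assumption: the kubelet's HTTPS port
) -> Request:
    """Build (but do not send) a POST request for the kubelet checkpoint API."""
    path = "/checkpoint/{}/{}/{}".format(
        quote(namespace, safe=""), quote(pod, safe=""), quote(container, safe="")
    )
    url = f"https://{node}:{port}{path}"
    if timeout is not None:
        # Timeout is given in seconds; per the text, zero or an omitted
        # value means the default CRI timeout is used.
        url += "?" + urlencode({"timeout": timeout})
    return Request(url, method="POST")
```

Because checkpoint creation time grows with the memory used by the container, a caller of this sketch would pass a larger `timeout` for containers with a large memory footprint.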