Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more detail security risks and mitigation strategies for container checkpoints #41667

Closed
wants to merge 7 commits into from

Conversation

Nitishupkr
Copy link

Fixes: #41638
Added more detail about security risks and mitigation strategies for checkpointing containers

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 17, 2023
@k8s-ci-robot k8s-ci-robot added language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 17, 2023
@netlify
Copy link

netlify bot commented Jun 17, 2023

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit ee014d5
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/648fe072b5d05d00086a10cb
😎 Deploy Preview https://deploy-preview-41667--kubernetes-io-main-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@Nitishupkr
Copy link
Author

I have made the changes please see it once @tengqm

@sftim
Copy link
Contributor

sftim commented Jun 18, 2023

/sig node
/sig security

Tech reviewers: please also triage the related issue, if reviewing.

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/security Categorizes an issue or PR as relevant to SIG Security. labels Jun 18, 2023
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about integrity protection?

This PR should get a technical review. It's OK not to add this advice (the feature is alpha right now), but if we do add it, the advice has to be right.


Do we know that the checkpoint contains sensitive data? If not, we might want to add an initial step to confirm that.

A checkpoint that only contains public (or can-be-public) data needs integrity checks but not confidentiality protection.

@sftim
Copy link
Contributor

sftim commented Jun 18, 2023

/retitle Add more detail security risks and mitigation strategies for container checkpoints

@k8s-ci-robot k8s-ci-robot changed the title docs: Added more detail about security risks and mitigation strategies for checkpointing containers Add more detail security risks and mitigation strategies for container checkpoints Jun 18, 2023
@Nitishupkr
Copy link
Author

Nitishupkr commented Jun 18, 2023

What about integrity protection?

This PR should get a technical review. It's OK not to add this advice (the feature is alpha right now), but if we do add it, the advice has to be right.

Do we know that the checkpoint contains sensitive data? If not, we might want to add an initial step to confirm that.

A checkpoint that only contains public (or can-be-public) data needs integrity checks but not confidentiality protection.

Integrity protection is indeed an important aspect to consider when dealing with checkpointing containers .I will add it in this.
Creating a checkpoint of a container involves considerations for both integrity protection and confidentiality, depending on the sensitivity of the data contained in the checkpoint. I will add all this in it .Thanks for the suggestion

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Jun 19, 2023
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Jun 20, 2023
@Nitishupkr
Copy link
Author

@sftim can u please see it .

@tengqm
Copy link
Contributor

tengqm commented Oct 10, 2023

/approve

@tengqm
Copy link
Contributor

tengqm commented Oct 10, 2023

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 10, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tengqm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2023
@divya-mohan0209
Copy link
Contributor

@kubernetes/sig-node-pr-reviews : PTAL!

@niranjandarshann
Copy link
Contributor

by authorized users. Set appropriate file permissions and access controls
to limit access to the archive.

- Encryption: Encrypt the checkpoint archive to protect the data stored
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not built-in, correct? Maybe mention that there will be time between the checkpoint is available and encrypted. If there a technique that will allow to write encrupted, this will be best

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a PR open to encrypt the checkpoint before being written to disk: checkpoint-restore/criu#2297

This will happen on the CRIU level and the data will never hit storage unencrypted. If it is merged in CRIU we need to add it to runc/crun and CRI-O/containerd. The support on the layers above CRIU will basically be calling the layer below with the right options. All the actual encryption work will happen in CRIU.

checkpoint archive is accessed by unauthorized users, it can lead to data exposure
and potential security breaches.You can mitigate this through:

- Restricting access: Ensure that the checkpoint archive is accessible only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note here - acces control may not prevent from an access to the file if the whole disk is being backed up somewhere and somebody has an access to this backup. Worth mentioning here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRI-O creates the checkpoint archive with root:root 600. The containerd PR does the same.

Kubernetes also creates the checkpoint directory with 700.

to prevent unauthorized access and tampering. Consider the following measures:

- Secure storage location: Store checkpoint archives in a secure directory with restricted
access permissions. The underlying CRI implementation should ensure that the checkpoint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it true for CRI-O today? Worth mentioning or providing a link

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this may need to be added to the CRI API method comment so every implementation will know this is needed

@@ -25,6 +25,87 @@ should create the checkpoint archive to be only accessible by the `root` user. I
is still important to remember if the checkpoint archive is transferred to another
system all memory pages will be readable by the owner of the checkpoint archive.

## Security risks and mitigation strategies
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be really nice to make this page a little more actionable. Maybe even convert to the task under the https://kubernetes.io/docs/tasks/administer-cluster/ and have an example instruction on how exactly each item can be achieved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am undecided about these changes. They definitely sound like good ideas, but are also not very specific. Not sure how to describe it better.

processes comply with secure deletion standards and overwrite the data with random values to
make it unrecoverable.

By implementing these security measures, you can mitigate the risks associated with checkpointing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this note for item 5 or for every item before 5? Should it be after item 7?

@haircommander
Copy link
Contributor

@adrianreber PTAL

@adrianreber
Copy link
Contributor

Also see checkpoint-restore/criu#2297

@adrianreber
Copy link
Contributor

What about integrity protection?
This PR should get a technical review. It's OK not to add this advice (the feature is alpha right now), but if we do add it, the advice has to be right.
Do we know that the checkpoint contains sensitive data? If not, we might want to add an initial step to confirm that.

Yes, a checkpoint might contain sensitive information. In the current state all memory pages are on disk. Depending on your container that might contain private keys, random numbers and secrets. The checkpoint is created in such a way by CRI-O that only root can access the file. A checkpoint does not, initially, open a new way to get this data, because being root means that all that data can be accessed anyway. The big difference and danger is, now it is all conveniently packed in an archive which can be transferred and once it is one another host it might be accessed by non-root users.

I think it is important to make it clear that checkpointing does not expose more data to non-root users than without checkpointing, but it makes it easier to distribute sensitive data.

A checkpoint that only contains public (or can-be-public) data needs integrity checks but not confidentiality protection.

Integrity protection is indeed an important aspect to consider when dealing with checkpointing containers .I will add it in this. Creating a checkpoint of a container involves considerations for both integrity protection and confidentiality, depending on the sensitivity of the data contained in the checkpoint. I will add all this in it .Thanks for the suggestion

In the checkpoint/restore stack (Kubernetes, CRI-O, runc/crun, CRIU) there is not much integrity checking happening right now. CRIU, for example, checks that all open files are the same size and that binaries have the same ELF buildid. But memory pages are not checked before being restored. For some users this is a feature to be able to easily manipulate any state of the process, but it definitely can lead to situations where unintentionally changed data is restored and might result in a crash or worse a process running with corrupted data.

So far, I am not aware of any complaints in upstream CRIU from users about missing integrity checks, but we actually recently started to discuss this as part of the encryption discussion. If we are encrypting the data we could also do integrity checks.

Checkpointing a container is the functionality to create a stateful copy of a
running container. Once you have a stateful copy of a container, you could
Checkpointing a container is the functionality to create a stateful copy of
a running container. Once you have a stateful copy of a container, you could
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation to move the single a to the next line?

7. **Determine sensitivity**:Before proceeding with integrity protection measures, it is essential to evaluate the
sensitivity of the data within the container checkpoint. Confirm whether the checkpoint contains any sensitive or
confidential information that needs to be protected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point 1 is already solved by the implementation. Point 4 is more or less the same.

Point 5 is very vague. Not sure it is helpful.

The integrity protection parts are a good idea, but I think that it should happen automatically and not be left to the user.

containers and protect sensitive data from unauthorized access or exposure.

6. **Integrity protection**:If the checkpoint includes sensitive data or data that requires protection against
unauthorized modifications, integrity protection measures should be implemented. This typically involves using
Copy link
Contributor

@rst0git rst0git Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of authentication and integrity protection is part of the encryption scheme. It is not something that Kubernetes users should implement themselves. As Adrian mentioned above, we decided to implement this as a built-in mechanism in CRIU that would be available with different container runtimes (e.g., CRI-O, containerd).

@kbhawkey
Copy link
Contributor

👋 @Nitishupkr .
Since this pull request has been approved and been open for awhile, do you plan to make more changes to address any suggested edits?
Thanks

@natalisucks
Copy link
Contributor

@Nitishupkr Hi there, I wanted to ping again about making some of the suggested changes and addressing feedback from our SIG Node reviewers. If you don't have the capacity to do so, it is perfectly fine to close this PR and come back to it when you do. Thanks!

@divya-mohan0209
Copy link
Contributor

Hello @Nitishupkr, good day!

I will be closing off the PR since despite multiple reminders, there has been no response. Kindly note that you can resume work on this PR when you have bandwidth by reopening it.

/close

SIG Node PR Triage automation moved this from Needs Reviewer to Done Jan 22, 2024
@k8s-ci-robot
Copy link
Contributor

@divya-mohan0209: Closed this PR.

In response to this:

Hello @Nitishupkr, good day!

I will be closing off the PR since despite multiple reminders, there has been no response. Kindly note that you can resume work on this PR when you have bandwidth by reopening it.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/security Categorizes an issue or PR as relevant to SIG Security. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Add more detail about security risks and mitigation strategies for checkpointing containers