
Conversation

@smarterclayton (Contributor) commented Apr 3, 2017

Cover our primary threats, our boundaries, and our philosophy for
defense.

@k8s-ci-robot added the cncf-cla: yes label Apr 3, 2017

@smarterclayton (author):

@kubernetes/sig-auth-misc fairly early, but would appreciate comments on structure, things you'd expect to see vs not, etc.

@timstclair left a comment:

Thanks for getting this started! I know it's WIP, but here are my initial thoughts.

A few high-level comments:

  • How do we prevent the list of threats from being a laundry list of every attack we can brainstorm against the system? Or is that what it should be? Ditto for defense.
  • List mitigations per threat, and/or defenses per boundary?
  • What level of detail do we want this to go into? On my todo list is doing a much more in-depth threat model of the node, and similarly I imagine you could go into much more depth for every component of the system.

/cc @destijl

**pods** on **nodes** which communicate with the **api-server** to retrieve the description and list of
pods. Pods may communicate with each other over the network, or contact virtual proxies known as
**services** from each node at a unique IP or DNS address. Traffic to pods from outside the cluster
may flow via an **ingress** router/proxy. Pods may have additional resources attached at runtime,


Is it a requirement for a "compliant" kubernetes cluster that Pods get a cluster private IP address?

@smarterclayton (author):

Probably not, definitely worth calling out here.
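
For readers skimming the excerpt above, a minimal Service sketch may help; all names are hypothetical, and note that whether pods must receive cluster-private IPs is exactly the open question in this thread:

```yaml
# Hypothetical Service: a stable virtual IP / DNS name in front of a
# set of pods selected by label, i.e. the "virtual proxies" above.
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: team-a
spec:
  selector:
    app: web          # matches pods labeled app=web
  ports:
  - port: 80          # port exposed on the service address
    targetPort: 8080  # port the pod containers listen on
```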


1. Users deploy applications onto the platform via the APIs related to application definition.
1. Some users are highly limited in the actions they may take, such as only creating a subset of available resources and being unable to edit.
2. Administrators manage security, allocate resources, and conduct maintenance on cluster components.


s/security/permissions/? Application security probably falls into the user domain.

Contributor:

Also debug cluster problems
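
Purely as an illustration of the "highly limited" users described in the excerpt, a minimal RBAC sketch; the namespace, names, and resource list are hypothetical, and the API version shown is the current RBAC group rather than what existed when this was written:

```yaml
# Hypothetical Role: a limited user may create and view a small subset
# of resources in one namespace, but never edit or delete them.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-a           # hypothetical tenant namespace
  name: create-only-deployer
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["create", "get", "list", "watch"]   # no update/patch/delete
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create", "get", "list", "watch"]
```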


## Security boundaries

Several important security boundaries exist in Kubernetes that reduce the possibility of interference or exploit by consumers.


Add consumers to the actors list above?

Member:

I assume consumers is the superset of all the actors above. To reduce confusion I'd drop it and just use:

Several important security boundaries exist in Kubernetes.

6. Between infrastructure monitoring infrastructure and the infrastructure components
7. Between different tenants, primarily modelled as different namespaces or nodes
8. Between the federation control plane and the cluster control plane


A few more, some of these might overlap with what you have above:

8. Between the federation control plane and the cluster control plane


## Multi-tenancy


/cc @davidopp

@smarterclayton (author):

The link to multi-tenant needs to exist and soon - David you or I need to turn the various half docs into real docs.

Contributor:

@smarterclayton I finally have time to get back to this now. Let's talk about what scope we want for an initial doc.

1. Escaping container and pod isolation to the node or to other containers on the node
2. Accessing APIs as a user without proper authorization or authentication
3. Consuming disproportionate resources on the system to deny other users access
4. Crashing or wedging the system so that no workloads can be processed

"system" could include nodes, critical system pods, master components - might be worth enumerating here, or defining "system" elsewhere.

4. Crashing or wedging the system so that no workloads can be processed
5. Encouraging controllers to act as confused deputies and perform actions at a higher level of trust
6. Accessing secret data entrusted to the system without appropriate permission for escalating access to the cluster or other systems
7. Using access to the cluster API to gain elevated permission on the nodes


goes both ways: also using elevated node permissions to gain elevated API permissions.

8. Using access to the federation API to gain elevated permission on the nodes
9. Disguising or concealing malicious actions to prevent forensic assessment after an incident
10. Reusing cluster resources (like services, namespaces, or secret names) after they have been deleted to masquerade as the previous user


  • MitM, both for cluster internal traffic, and external traffic
  • Threats against storage? Reusing volumes....

2. Accessing APIs as a user without proper authorization or authentication
3. Consuming disproportionate resources on the system to deny other users access
4. Crashing or wedging the system so that no workloads can be processed
5. Encouraging controllers to act as confused deputies and perform actions at a higher level of trust


Also addons, nodes, etc.

Contributor:

"Encouraging" is a bit anthropomorphic :)

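As a sketch of one possible mitigation for threat 3 (disproportionate resource consumption), assuming per-namespace tenancy; all names and numbers are hypothetical:

```yaml
# Hypothetical ResourceQuota: bounds what one tenant namespace can
# request so a single user cannot starve other users of the cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: team-a
spec:
  hard:
    pods: "20"            # max concurrent pods in the namespace
    requests.cpu: "8"     # total CPU that may be requested
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
```
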
9. Disguising or concealing malicious actions to prevent forensic assessment after an incident
10. Reusing cluster resources (like services, namespaces, or secret names) after they have been deleted to masquerade as the previous user

TODO: list threats that are not considered


  • Information disclosure through side-channel attacks
  • Threats to the underlying infrastructure (or should this be included?), e.g. hardware access
    • Do we want to include threats to the node from outside pods? e.g. through ssh
  • Threats to user credentials, e.g. stealing the admin cert
  • Application security (to some extent), e.g. XSS in your web app hosted on k8s
  • Threats to other systems on the same network (or should this be included?), e.g. cloud services, non-k8s VMs, etc.

Member:

User credential theft should be in. It's one of the most likely methods of compromise. Defences are detailed logging, alerting on unusual access patterns, fast revocation.
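
A sketch of the "detailed logging" defense using an apiserver audit policy; this assumes audit logging is enabled, and the API version shown is illustrative (the audit API has gone through several versions):

```yaml
# Hypothetical audit policy: record who touched Secrets (metadata only,
# so secret values never land in the log) and capture full bodies for
# mutating requests, supporting forensics and anomaly alerting.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""               # core API group
    resources: ["secrets"]
- level: RequestResponse
  verbs: ["create", "update", "patch", "delete"]
- level: Metadata           # everything else: metadata only
```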


Kubernetes should be capable of subdividing the cluster for use by distinct participants with their own security boundaries.

1. Single-tenant: A small number of administrators who are also users deploying applications, and are often infrastructure admins


Might be worth adding: single tenant, but with workloads that have different trust levels (e.g. DB handling PII and 3p software with a history of bad vulnerabilities)

@smarterclayton (author):

> How do we prevent the list of threats from being a laundry list of every attack we can brainstorm against the system? Or is that what it should be? Ditto for defense.

I'd say if we expect a contributor in the Kube community to think about a particular attack angle during normal development, or to ask about it in a review, it's reasonable to put it in this doc. We definitely can't list everything, but another reasonable goal is "do I need to understand this risk to be an effective Kubernetes operator?". Some threats (side-channel attacks against crypto?) generally don't fall within that angle, and could be covered by linking to reference material.

> List mitigations per threat, and/or defenses per boundary?

I think so, as long as it doesn't get exhaustive.

> What level of detail do we want this to go into? On my todo list is doing a much more in-depth threat model of the node, and similarly I imagine you could go into much more depth for every component of the system.

I think that could be a separate document, and probably is worthwhile to do regardless. I would expect node change reviewers to apply extra node considerations as well (but maybe not others).

Another possible rule of thumb - if we think we'll need to make a system design change to defend against it, we should have it in the threat model as a high level topic at minimum.

@destijl (Member) left a comment:

@timstclair perhaps we should just try the laundry-list approach and break it up if it becomes too unwieldy. I've done this before by settling on some categories and brainstorming a list within each of those. Being able to sort by residual risk I found really useful. So people can look at the list and just read the top 10 to get a picture of the current major concerns and not get bogged down with threats we think are well covered by existing defences. For this reason I think it's also important to group threats and defences together.

TODO: link to multi-tenancy doc


## Threats

Member:

When I've done this in the past I've used a table like this:

Category, Risk, Current Treatment, Likelihood, Impact, Residual Risk (after current treatment), Future Treatment Plan

which lets you sort by residual risk and have threats and defences co-located. Categories were things like:

Initial Execution, Data Exfil, Propagation, Persistence, Priv Esc, DoS etc.

I think you're going to need some sort of grouping here at least, I liked grouping the threats around attacker actions since those are the things you're trying to prevent.

MSFT has their own groupings, here's a somewhat old example:
https://msdn.microsoft.com/en-us/library/ff648641.aspx
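
To make the shape concrete, one purely hypothetical row in that format (the risk, ratings, and treatments are illustrative, not an actual assessment):

| Category | Risk | Current Treatment | Likelihood | Impact | Residual Risk | Future Treatment Plan |
| --- | --- | --- | --- | --- | --- | --- |
| Priv Esc | Container escape to the node via a kernel vulnerability | Restrictive seccomp/capability defaults | Medium | High | Medium | Evaluate stronger pod sandboxing |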



7. Using access to the cluster API to gain elevated permission on the nodes
8. Using access to the federation API to gain elevated permission on the nodes
9. Disguising or concealing malicious actions to prevent forensic assessment after an incident
10. Reusing cluster resources (like services, namespaces, or secret names) after they have been deleted to masquerade as the previous user

Member:

Agree with this, I proposed some categories above. Happy to contribute more threat brainstorming when we have some categories to structure around. Couple I thought of on first read: misconfigurations leading to unauthorized access, vulnerabilities in system containers.


@philips (Contributor) commented Apr 19, 2017:

Thanks for doing this Clayton. I won't pile on more feedback as there is plenty to address. But can you ping the comments once you push a new update?

@timothysc (Contributor):

/cc @jbeda



2. Infrastructure admins manage both the cluster and the underlying systems that run the cluster.
3. Each node offers resources to the cluster and accepts pods that have been scheduled on it.
4. Scheduler controllers assign pods to nodes based on workload definitions and must know details of both to effectively schedule.
5. Workload controllers create pods that have certain lifecycle rules.

Contributor:

And they enforce those lifecycle rules.

3. Each node offers resources to the cluster and accepts pods that have been scheduled on it.
4. Scheduler controllers assign pods to nodes based on workload definitions and must know details of both to effectively schedule.
5. Workload controllers create pods that have certain lifecycle rules.
6. Network providers use information about the topology of the cluster (physical and logical) to give pods IP addresses and control their access.

Contributor:

maybe expand a tiny bit on "access" -- presumably their network access, but maybe even more detail than that?
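
Expanding on "access" a little, a minimal NetworkPolicy sketch of the kind of pod-level network control a network provider might enforce; the labels and names are hypothetical, and the API group/version has varied across releases:

```yaml
# Hypothetical policy: only pods labeled role=frontend in the same
# namespace may open connections to the database pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-frontend
  namespace: team-a
spec:
  podSelector:
    matchLabels:
      role: db              # the pods being protected
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend    # the only permitted clients
```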



## Multi-tenancy

Kubernetes should be capable of subdividing the cluster for use by distinct participants with their own security boundaries.

Contributor:

I'm not sure I completely agree with this taxonomy. I certainly don't object to putting it in this doc, but I think we can probably do something a bit more complete later. For example, the distinction between users and applications is a bit blurred, and so is the "hardness" (you could have teams that are not "acting together" but still don't require hard multi-tenancy... actually your "multi-tenant" doesn't really speak to the "hardness") ... I think maybe there are multiple axes.

1. Single-tenant: A small number of administrators who are also users deploying applications, and are often infrastructure admins
2. Collaborative Multi-tenant: Two or more teams acting together on the same cluster without strong boundaries between their roles
3. Multi-tenant: Many users with limited permissions whose applications may need to interact or be isolated, and administrators to configure and enforce that tenancy
4. Organizational multi-tenant: Several large groups of users and applications (organizations) where the organization admin is granted a subset of cluster administrative permissions in order to subdivide responsibility.

Contributor:

@smarterclayton Do you have anything written up on the latest thoughts on using labels in access control?
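
For context, the most common concrete form of this subdivision is a namespace per tenant plus namespace-scoped bindings; a minimal sketch with hypothetical tenant and group names, assuming RBAC and the built-in "edit" ClusterRole:

```yaml
# Hypothetical tenant: the tenant's developer group gets edit rights
# only inside its own namespace, nowhere else in the cluster.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-b
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-b-editors
  namespace: tenant-b
subjects:
- kind: Group
  name: tenant-b-devs            # hypothetical auth group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                     # built-in namespaced-edit role
  apiGroup: rbac.authorization.k8s.io
```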



1. Single-tenant: A small number of administrators who are also users deploying applications, and are often infrastructure admins
2. Collaborative Multi-tenant: Two or more teams acting together on the same cluster without strong boundaries between their roles
3. Multi-tenant: Many users with limited permissions whose applications may need to interact or be isolated, and administrators to configure and enforce that tenancy

Contributor:

What level of isolation is this guaranteeing?

Kubernetes should be capable of subdividing the cluster for use by distinct participants with their own security boundaries.

1. Single-tenant: A small number of administrators who are also users deploying applications, and are often infrastructure admins
2. Collaborative Multi-tenant: Two or more teams acting together on the same cluster without strong boundaries between their roles

Contributor:

Could also be called "soft"; idk on terminology, if it matters.

@jessfraz (Contributor):

Is there a plan of action to combine #532 and #551 into one doc? It's a little confusing having the discussion sprawled across a few places :)

@k8s-github-robot added the size/M label Aug 15, 2017

@k8s-github-robot:
This PR hasn't been active in 151 days. Closing this PR. Please reopen if you would like to work towards merging this change, if/when the PR is ready for the next round of review.

cc @calebamiles @sarahnovotny @smarterclayton

You can add 'keep-open' label to prevent this from happening again, or add a comment to keep it open another 90 days

@k8s-ci-robot added the do-not-merge/work-in-progress label Oct 18, 2017

@smarterclayton (author):

Ugh, I wish there were more hours in the day. I'll queue this up for next sig-auth meeting.

@bgrant0607 (Member):

Nit: This will need to be moved into a sig-specific subdirectory, such as sig-auth.

@k8s-github-robot added the kind/design label Feb 6, 2018

@fejta-bot:
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Mar 14, 2018

@smarterclayton (author) commented Mar 27, 2018 via email

@k8s-ci-robot added the lifecycle/frozen label Mar 27, 2018

@gkarthiks:
Hi @smarterclayton Since we have a KEP process, can we please merge this soon or close this and create a KEP?

@mrbobbytables (Member):

As the design proposal process has been superseded by the KEP process -- I'm going to go ahead and close this out. If this is still relevant, please consider adapting it to a KEP.

/close

@k8s-ci-robot:

@mrbobbytables: Closed this PR.

In response to this:

> As the design proposal process has been superseded by the KEP process -- I'm going to go ahead and close this out. If this is still relevant, please consider adapting it to a KEP.

> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

danehans pushed a commit to danehans/community that referenced this pull request Jul 18, 2023
* Add release 1.9 managers

* Sort rm 1.9 list