WIP - Add a threat model to the proposals list #504
Conversation
Cover our primary threats, our boundaries, and our philosophy for defense.
Force-pushed from c15cbd7 to 1063fd6
@kubernetes/sig-auth-misc This is fairly early, but I would appreciate comments on structure: things you'd expect to see vs. not, etc.
Thanks for getting this started! I know it's WIP, but here are my initial thoughts.
A few high-level comments:
- How do we prevent the list of threats from being a laundry list of every attack we can brainstorm against the system? Or is that what it should be? Ditto for defense.
- List mitigations per threat, and/or defenses per boundary?
- What level of detail do we want this to go into? On my todo list is doing a much more in-depth threat model of the node, and similarly I imagine you could go into much more depth for every component of the system.
/cc @destijl
> **pods** on **nodes** which communicate with the **api-server** to retrieve the description and list of
> pods. Pods may communicate with each other over the network, or contact virtual proxies known as
> **services** from each node at a unique IP or DNS address. Traffic to pods from outside the cluster
> may flow via an **ingress** router/proxy. Pods may have additional resources attached at runtime,
Is it a requirement for a "compliant" kubernetes cluster that Pods get a cluster private IP address?
Probably not, definitely worth calling out here.
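For concreteness, here is a minimal sketch of the pod/service relationship the excerpt describes, as it is typically written in current Kubernetes (illustrative only, not part of the proposal; the names and image are hypothetical):

```yaml
# Hypothetical example: a pod reached through a service's virtual IP.
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  containers:
  - name: web
    image: nginx   # example image
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web       # traffic is routed to pods carrying this label
  ports:
  - port: 80
    targetPort: 80
```

By default the service gets a cluster-private virtual IP and DNS name; whether pods themselves must also get cluster-private IPs is exactly the open question raised above.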
> 1. Users deploy applications onto the platform via the APIs related to application definition.
>    1. Some users are highly limited in the actions they may take, such as only creating a subset of available resources and being unable to edit.
> 2. Administrators manage security, allocate resources, and conduct maintenance on cluster components.
s/security/permissions/? Application security probably falls into the user domain.
Also debug cluster problems
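As an aside on the "highly limited" users in the quoted list: in current Kubernetes that kind of restriction is typically expressed with RBAC. A minimal sketch, with hypothetical namespace, role, and user names (not taken from the proposal):

```yaml
# Hypothetical example: a user who may create and view deployments in
# one namespace but may not edit or delete them.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-a
  name: limited-deployer
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["create", "get", "list", "watch"]   # no update/patch/delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: team-a
  name: limited-deployer-binding
subjects:
- kind: User
  name: alice                                 # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: limited-deployer
  apiGroup: rbac.authorization.k8s.io
```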
> ## Security boundaries
>
> Several important security boundaries exist in Kubernetes that reduce the possibility of interference or exploit by consumers.
Add consumers to the actors list above?
I assume consumers is the superset of all the actors above. To reduce confusion I'd drop it and just use:
Several important security boundaries exist in Kubernetes.
> 6. Between infrastructure monitoring infrastructure and the infrastructure components
> 7. Between different tenants, primarily modelled as different namespaces or nodes
> 8. Between the federation control plane and the cluster control plane
A few more, some of these might overlap with what you have above:
- pods -> API
- nodes -> API (maybe worth breaking out from #3, "Port over SIG table/docs from kubernetes.wiki")
- pods -> pods (on different nodes; see the sketch after this list)
- node endpoints (apiserver -> node)
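For the pods -> pods boundary, a sketch of how it is commonly enforced in current Kubernetes: a NetworkPolicy restricting ingress to pods from the same namespace. The namespace and policy names are hypothetical, and the boundary only holds if the network provider actually enforces NetworkPolicy:

```yaml
# Hypothetical example: pods in tenant-a accept traffic only from
# other pods in tenant-a; cross-namespace pod traffic is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  namespace: tenant-a
  name: same-namespace-only
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}    # any pod, but only within this namespace
```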
> ## Multi-tenancy
/cc @davidopp
The link to the multi-tenancy doc needs to exist, and soon - David, you or I need to turn the various half docs into real docs.
@smarterclayton I finally have time to get back to this now. Let's talk about what scope we want for an initial doc.
> 1. Escaping container and pod isolation to the node or to other containers on the node
> 2. Accessing APIs as a user without proper authorization or authentication
> 3. Consuming disproportionate resources on the system to deny other users access
> 4. Crashing or wedging the system so that no workloads can be processed
"system" could include nodes, critical system pods, master components - might be worth enumerating here, or defining "system" elsewhere.
> 5. Encouraging controllers to act as confused deputies and perform actions at a higher level of trust
> 6. Accessing secret data entrusted to the system without appropriate permission for escalating access to the cluster or other systems
> 7. Using access to the cluster API to gain elevated permission on the nodes
goes both ways: also using elevated node permissions to gain elevated API permissions.
> 8. Using access to the federation API to gain elevated permission on the nodes
> 9. Disguising or concealing malicious actions to prevent forensic assessment after an incident
> 10. Reusing cluster resources (like services, namespaces, or secret names) after they have been deleted to masquerade as the previous user
- MitM, both for cluster internal traffic, and external traffic
- Threats against storage? Reusing volumes....
> 5. Encouraging controllers to act as confused deputies and perform actions at a higher level of trust
Also addons, nodes, etc.
"Encouraging" is a bit anthropomorphic :)
> TODO: list threats that are not considered
- Information disclosure through side-channel attacks
- Threats to the underlying infrastructure (or should this be included?), e.g. hardware access
- Do we want to include threats to the node from outside pods? e.g. through ssh
- Threats to user credentials, e.g. stealing the admin cert
- Application security (to some extent), e.g. XSS in your web app hosted on k8s
- Threats to other systems on the same network (or should this be included?), e.g. cloud services, non-k8s VMs, etc.
User credential theft should be in. It's one of the most likely methods of compromise. Defences are detailed logging, alerting on unusual access patterns, fast revocation.
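A minimal sketch of the "detailed logging" defence as an apiserver audit policy in current Kubernetes (the rule set is illustrative, not a recommendation from this proposal):

```yaml
# Hypothetical audit policy: record who touched secrets without ever
# logging their payloads, capture request bodies for writes, and keep
# metadata for everything else. Rules are matched first-match-wins.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata        # access to secrets is logged, contents are not
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
- level: Request         # request bodies for all mutations
  verbs: ["create", "update", "patch", "delete"]
- level: Metadata        # who/what/when for all other requests
```

A policy like this is loaded with the kube-apiserver --audit-policy-file flag; the alerting and fast-revocation pieces would sit on top of the resulting log.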
> Kubernetes should be capable of subdividing the cluster for use by distinct participants with their own security boundaries.
>
> 1. Single-tenant: A small number of administrators who are also users deploying applications, and are often infrastructure admins
Might be worth adding: single tenant, but with workloads that have different trust levels (e.g. DB handling PII and 3p software with a history of bad vulnerabilities)
I'd say if we expect a contributor in the Kube community to think about a particular attack angle during normal development, or to ask about it in a review, it's reasonable to put it in this doc. We definitely can't list everything, but another goal I'd say is reasonable: "do I need to understand this risk to be an effective Kubernetes operator?" Some threats (side-channel attacks against crypto?) are generally not within that angle, and could be covered by us linking to reference material, I think, if it doesn't get exhaustive.
I think that could be a separate document, and is probably worthwhile to do regardless. I would expect node change reviewers to apply extra node considerations as well (but maybe not others). Another possible rule of thumb: if we think we'll need to make a system design change to defend against it, we should have it in the threat model as a high-level topic at minimum.
@timstclair perhaps we should just try the laundry-list approach and break it up if it becomes too unwieldy. I've done this before by settling on some categories and brainstorming a list within each of those. Being able to sort by residual risk I found really useful. So people can look at the list and just read the top 10 to get a picture of the current major concerns and not get bogged down with threats we think are well covered by existing defences. For this reason I think it's also important to group threats and defences together.
> TODO: link to multi-tenancy doc
>
> ## Threats
When I've done this in the past I've used a table like this:
Category, Risk, Current Treatment, Likelihood, Impact, Residual Risk (after current treatment), Future Treatment Plan
which lets you sort by residual risk and have threats and defences co-located. Categories were things like:
Initial Execution, Data Exfil, Propagation, Persistence, Priv Esc, DoS etc.
I think you're going to need some sort of grouping here at least, I liked grouping the threats around attacker actions since those are the things you're trying to prevent.
MSFT has their own groupings, here's a somewhat old example:
https://msdn.microsoft.com/en-us/library/ff648641.aspx
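To make the layout concrete, a sketch with one hypothetical row (the values are illustrative, not an assessed Kubernetes risk):

| Category | Risk | Current Treatment | Likelihood | Impact | Residual Risk | Future Treatment Plan |
| --- | --- | --- | --- | --- | --- | --- |
| Priv Esc | Pod escapes container isolation to its node | Container isolation defaults, non-root images | Medium | High | Medium | Stronger node-level sandboxing |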
Agree with this, I proposed some categories above. Happy to contribute more threat brainstorming when we have some categories to structure around. Couple I thought of on first read: misconfigurations leading to unauthorized access, vulnerabilities in system containers.
Thanks for doing this Clayton. I won't pile on more feedback as there is plenty to address. But can you ping the comments once you push a new update?
/cc @jbeda |
> 2. Infrastructure admins manage both the cluster and the underlying systems that run the cluster.
> 3. Each node offers resources to the cluster and accepts pods that have been scheduled on it.
> 4. Scheduler controllers assign pods to nodes based on workload definitions and must know details of both to effectively schedule.
> 5. Workload controllers create pods that have certain lifecycle rules.
And they enforce those lifecycle rules.
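For illustration, the kind of lifecycle rule being described, written as a workload controller object in current Kubernetes (names and image are hypothetical):

```yaml
# Hypothetical example: the Deployment controller both creates the
# pods and enforces the rule - keep three replicas of this template
# running, replacing any pod that terminates.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3            # the rule the controller enforces
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx     # example image
```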
> 6. Network providers use information about the topology of the cluster (physical and logical) to give pods IP addresses and control their access.
maybe expand a tiny bit on "access" -- presumably their network access, but maybe even more detail than that?
> Kubernetes should be capable of subdividing the cluster for use by distinct participants with their own security boundaries.
I'm not sure I completely agree with this taxonomy. I certainly don't object to putting it in this doc, but I think we can probably do something a bit more complete later. For example, the distinction between users and applications is a bit blurred, and also the "hardness" (you could have teams that are not "acting together" but still don't require hard multi-tenancy... actually your "multi-tenant" doesn't really speak to the "hardness")... I think maybe there are multiple axes.
> 1. Single-tenant: A small number of administrators who are also users deploying applications, and are often infrastructure admins
> 2. Collaborative Multi-tenant: Two or more teams acting together on the same cluster without strong boundaries between their roles
> 3. Multi-tenant: Many users with limited permissions whose applications may need to interact or be isolated, and administrators to configure and enforce that tenancy
> 4. Organizational multi-tenant: Several large groups of users and applications (organizations) where the organization admin is granted a subset of cluster administrative permissions in order to subdivide responsibility.
@smarterclayton Do you have anything written up on the latest thoughts on using labels in access control?
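A minimal sketch of the subdivision primitive these tenancy modes build on in current Kubernetes: a namespace per tenant, plus a quota so one tenant cannot consume disproportionate resources (threat 3 above). The names and limits are hypothetical:

```yaml
# Hypothetical example: one tenant's namespace with a hard cap on the
# pods, CPU, and memory it may request from the cluster.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  namespace: tenant-a
  name: tenant-a-quota
spec:
  hard:
    pods: "50"
    requests.cpu: "20"
    requests.memory: 64Gi
```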
> 3. Multi-tenant: Many users with limited permissions whose applications may need to interact or be isolated, and administrators to configure and enforce that tenancy
What level of isolation is this guaranteeing?
> 2. Collaborative Multi-tenant: Two or more teams acting together on the same cluster without strong boundaries between their roles
could also be called "soft" idk on terminology... if it matters
This PR hasn't been active in 151 days. Closing this PR. Please reopen if you would like to work towards merging this change, if/when the PR is ready for the next round of review. cc @calebamiles @sarahnovotny @smarterclayton You can add the 'keep-open' label to prevent this from happening again, or add a comment to keep it open another 90 days.
Ugh, I wish there were more hours in the day. I'll queue this up for next sig-auth meeting.
Nit: This will need to be moved into a sig-specific subdirectory, such as sig-auth.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle frozen
…On Wed, Mar 14, 2018 at 12:14 PM, fejta-bot wrote:
> Issues go stale after 90d of inactivity.
> Mark the issue as fresh with /remove-lifecycle stale.
> Stale issues rot after an additional 30d of inactivity and eventually close.
> If this issue is safe to close now please do so with /close.
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /lifecycle stale
Hi @smarterclayton, since we have a KEP process, can we please merge this soon, or close this and create a KEP?
As the design proposal process has been superseded by the KEP process, I'm going to go ahead and close this out. If this is still relevant, please consider adapting it to a KEP.
/close
@mrbobbytables: Closed this PR.
In response to this:
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.