New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creating proposals for keeping transient namespace alive #464
Conversation
|
|
||
| ### Configuration Values | ||
|
|
||
| * `keep_namespace` or `BrokerConfig.KeepNamespace` - Parameter to always keep name space no matter what. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think that this should be in the openshift/clusterConfig section of the configuration.
| if !brokerConfig.keepNamespace || !((pod.Status.Phase == apiv1.PodFailed || pod.Status.Phase == apiv1.PodUnknown || err != nil) && brokerConfig.keepNamespaceOnError) { | ||
| ... Delete Namespace | ||
| } | ||
| ...Delete role bindings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the only thing(s) that should survive when we are keeping the namespace is 1) the namespace and 2) the pod? I imagine we would want to remove everything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are configured to keep the namespace, I assume we want to keep everything in it we created to aid in debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we concerned about leaking role related resources like SA's and RoleBindings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are removing the role bindings, the thinking is that you can then give other users access to the new namespace, they can debug the APB without getting elevated permissions into the target namespace.
How much security are we gaining by having an ephemeral namespace versus a locked down persistent namespace for APBs?
The above is the added security that we will get.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are removing the role bindings, the thinking is that you can then give other users
access to the new namespace, they can debug the APB without getting elevated
permissions into the target namespace.How much security are we gaining by having an ephemeral namespace versus a locked
down persistent namespace for APBs?The above is the added security that we will get.
I disagree. The namespace is not the source for the vulnerability because by itself it's not dangerous. The vulnerability is with the rolebindings for the namespace. If we manage the rolebindings on every APB namespace, then I don't think there's any harm in keeping them around for at least a period of time. This will allow folks to debug a failure in their APBs or to read the output logs from a successful run. I would much rather have a namespace deletion timer than having to check for APB pod failures in our code and require another config setting in the broker to enable persistent namespaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree that the watcher that will delete them after a certain time is a good idea. We have captured that, and I think is just too much to implement in a bug fix.
I have created a card on the backlog already:
https://trello.com/c/i5gUkUv4
|
How much security are we gaining by having an ephemeral namespace versus a locked down persistent namespace for APBs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only request is that the proposal be more explicit about, in the case that we are "keeping the namespace", what survives and what does not. I read the code change as suggesting that the broker would still remove service accounts and role bindings created as part of the sandbox but was not certain.
| The goal of this proposal is to give administrators options to keep namespaces and therefore APB pods after the execution. The use cases for this feature are demos and debugging APBs. | ||
|
|
||
| ## Problem Description | ||
| The [bug](https://bugzilla.redhat.com/show_bug.cgi?id=1497766) was created when we moved to creating transient namespaces during execution of the APB pod. What happens This bug creates issues for debugging APB's as well as issues with demos. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TYPO: "What happens This bug" -> This bug creates ...
Also what sort of problems are created? How about:
Bug 1497766 titled "APB Pods are deleted even when an error occurs" occurred when we moved to creating transient namespaces during the APB pod execution. This bug makes it difficult to debug APB development and creating demos because the pods do not live long enough to capture useful information.
| ... | ||
| ``` | ||
|
|
||
| **NOTE: Both will default to false in openshift-ansible, but will be set to true for keep_namespace_on_error** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand the above NOTE. both will be false, but set to true for namespace_on_error. HUH? 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was totally in my one head on this one, will make it more clear
| ### Configuration Values | ||
|
|
||
| * `keep_namespace` or `ClusterConfig.KeepNamespace` - Parameter to always keep name space no matter what. | ||
| * `keep_namespace_on_error` or `ClusterConfig.KeepNamespaceOnError` - parameter to keep name space around if an error occurs in the play book that is running. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we considered a simple TTL? Keep pods alive for x amount of time, then possibly have a reaper job to clean things up. The TTL, say 1 day, would allow customer support to debug a pod that had an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I implemented the original sandboxes, there was an urgency around: the sandbox MUST be deleted following an APB method execution. I think there was a security concern with keeping resources like ServiceAccounts and RoleBindings around longer than required. We could keep the pod around and delete the SA/RB, but that might violate the idea of the atomic "sandbox" (deleting some resources, but not others). Maybe it's okay because the Pod itself isn't considered part of it, but those are my concerns with a TTL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the TTL is a 3rd configuration option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this a great idea, but it strikes me as a feature rather than a bug fix.
I have added the card: https://trello.com/c/i5gUkUv4 to keep track of this idea
| if err != nil { | ||
| s.log.Errorf("Unable to retrieve pod - %v", err) | ||
| } | ||
| if !brokerConfig.keepNamespace || !((pod.Status.Phase == apiv1.PodFailed || pod.Status.Phase == apiv1.PodUnknown || err != nil) && brokerConfig.keepNamespaceOnError) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a nicer way to phrase this if statement? I have to read it several times to make ensure a particular state triggers it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abstract it behind a function so the logic can be tweaked without altering the if?
if shouldDeleteNamespace() { deleteNamespace(namespace) }
func shouldDeleteNamespace() {
if brokerConfig.KeepNamespace {
return false
}
// ... other conditions
}
| ## Work Items | ||
| - Add Code Above | ||
| - Add broker config values above during creation of Service Account. | ||
| - update the deployed template to set default values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the extra space before values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The work items sound sane, depends on the actual configuration options we decide on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing the extra space makes it not a list.
example
-Hello
vs
- Hello
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not see the extra space. my mistake sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor nits and discussion points. Core of it makes sense to me though.
| # Keeping transient pod name spaces alive | ||
|
|
||
| ## Introduction | ||
| The goal of this proposal is to give administrators options to keep namespaces and therefore APB pods after the execution. The use cases for this feature are demos and debugging APBs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also a use-case for production cluster operators; I don't think this is limited to demo's or debugging APBs. APB's could certainly fail under production scenarios, and having nothing to examine after something unexpected takes place is undesirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we help the production operator be able to debug things like this? would the workflow be if things fail, reconfigure the broker to keep the namespaces around, and retry?
| ### Configuration Values | ||
|
|
||
| * `keep_namespace` or `ClusterConfig.KeepNamespace` - Parameter to always keep name space no matter what. | ||
| * `keep_namespace_on_error` or `ClusterConfig.KeepNamespaceOnError` - parameter to keep name space around if an error occurs in the play book that is running. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the TTL is a 3rd configuration option?
| if err != nil { | ||
| s.log.Errorf("Unable to retrieve pod - %v", err) | ||
| } | ||
| if !brokerConfig.keepNamespace || !((pod.Status.Phase == apiv1.PodFailed || pod.Status.Phase == apiv1.PodUnknown || err != nil) && brokerConfig.keepNamespaceOnError) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abstract it behind a function so the logic can be tweaked without altering the if?
if shouldDeleteNamespace() { deleteNamespace(namespace) }
func shouldDeleteNamespace() {
if brokerConfig.KeepNamespace {
return false
}
// ... other conditions
}
| if !brokerConfig.keepNamespace || !((pod.Status.Phase == apiv1.PodFailed || pod.Status.Phase == apiv1.PodUnknown || err != nil) && brokerConfig.keepNamespaceOnError) { | ||
| ... Delete Namespace | ||
| } | ||
| ...Delete role bindings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we concerned about leaking role related resources like SA's and RoleBindings?
|
@eriknelson I like the abstraction behind shouldDeleteNamespace that will help. |
|
@eriknelson if we NEED to remove the namespaces immediately for security purposes, then I would assume the TTL won't really help. I was thinking the TTL would replace the other configs not be a 3rd. |
Updating based on review comments
updating use case section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Describe what this PR does and why we need it:
Proposal for bug