-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Broker bind output rework #124
Conversation
oh this is going to be fun trying to merge this with my changes 😢 your PR makes sense. The changes I had to make was to move getPodName out of extractCredentials call. because I need to get the pod name before the job finishes. |
Good work. I'll look at it probably not until later this week. |
actually doesn't look too bad for me. |
Testing this will require: fusor/apb-examples#25 |
@jmrodri rebased fusor/apb-examples#25. So this should be ok to test. Thx for the heads up. |
I'm apprehensive of merging this until we have a working broker with new service catalog. @rthallisey would you help me understand the benefits of this approach with oc exec and reading file opposed to parsing oc logs? @rthallisey what problems do you see this resolving that we may have faced with the oc logs usage? |
@jwmatthews When we send the credentials to stdout we're making the creds readable by anyone who can view that pod. If we move to exec, the pod will have to be visible and we can use the container environment to secure who can look at the credentials. That could be things like selinux, users, passwords, or tokens to read the credentials that we can't use to protect stdout. |
I'm a little undecided on this. I'm convinced it's more secure than our current implementation given @rthallisey's argument, so even if we merged as is, there's value to be gained. On the other hand, it's not quite one of the three options I remember discussing when credential exfiltration was originally discussed. IIRC we settled on 3 options:
This PR isn't really any of those options. I don't think that disqualifies it though; is it a 4th option that accomplishes everything we're looking for? Regardless of the answer, this is definitely better than what we're doing right now. |
@eriknelson Ya it's not one of the original three, but here's my thinking.
I interpreted this as reading the pod exit status from something like an I'm going to look in the API code and see how a pod can return an error code and maybe that's what was meant here.
Networking concerns.
This is nearly identical to exec. In the container we don't allow anyone but the broker to read from the directory that's attached to the volume. But, now we have to create and destroy resources on the broker side. Exec doesn't need that. |
In regard to this PR, I'd like to chat with some of the AOS folks to run this approach by them and get feedback before we merge. Plan to address feedback in next sprint, tracking in card: https://trello.com/c/PZJfBeph |
@rthallisey please rebase this when you can. I spoke to Clayton and confirmed he's good with this approach. |
8cdd532
to
6a41012
Compare
Depends on #242 |
pkg/apb/ext_creds.go
Outdated
@@ -14,78 +15,56 @@ var stillWaitingError = "status: still waiting to start" | |||
var timeoutFreq = 6 // Seconds | |||
var totalTimeout = 900 // 15min | |||
|
|||
// ExtractCredentials - Extract credentials from pod in a certain namespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep this line in. I am pretty sure this is why the build is failing.
6a41012
to
a537460
Compare
@jwmatthews sounds good. It hasn't been working for me for the past few days so I'll see if I can figure out the issue. |
606f1a0
to
8e25f1c
Compare
I think this is good to go. @eriknelson can you help me test this? I'm having trouble getting a working UI. I've tested this with the environment I have and the bind credentials make it into etcd. After that, the service-catalog injects those from etcd into a running pod. So I'm confident this should work since I don't touch that part of the bind. But I'd like to confirm this with an already working environment. |
e383185
to
c0c1c29
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a conversation about this with @jmrodri over lunch and the feeling is that the mechanism used for monitoring output is difficult to reason about and should be simplified. While not technically wrong; it's not easy to answer something as simple as "How long can an APB run for before it times out?" given all the retry logic between the broker, credential extraction script, and bind-init. Here's the proposed changes:
-
broker-bind-creds
Should drop any retry logic at all and be a reasonably dumb cat script. If it's there, cat it and rm, if not, exit 1. -
bind-init
is effectively "how long should the pod be kept alive waiting for the broker to collect credentials". That's good as is, we probably just want to hardcode a value of 5min there. -
Push most of the retry logic into the broker. Hard code retry count/interval and have the interval remain static. (Not a function of the retry count). Can be made configurable if desired in the future.
With these changes, we should be able to confidently say: The broker will attempt to extract credentials for N*interval units, and if the credentials haven't been written by that point in time, things have timed out.
Additional requested changes:
- monitorOutput shouldn't be a go routine, and should return output, err, where err is the bubbled oc exec cmd error. We want to make sure err gets logged somewhere in the stack.
- Destroy the APB sandbox immediately after Provision
Overall, I think the core of this accomplishes the original goal for the change, and I've seen it passing regression tests. I'd be comfortable merging this and it's apb-examples counterpart pending these changes.
@@ -67,6 +67,13 @@ func Provision( | |||
} | |||
|
|||
creds, err := ExtractCredentials(podName, instance.Context.Namespace, log) | |||
// We should not save credentials from an app that finds them and isn't | |||
// bindable | |||
if creds != nil && !instance.Spec.Bindable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
pkg/broker/broker.go
Outdated
@@ -490,6 +475,15 @@ func (a AnsibleBroker) Provision(instanceUUID uuid.UUID, req *ProvisionRequest, | |||
return nil, err | |||
} | |||
} | |||
|
|||
sm := apb.NewServiceAccountManager(a.log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As written, this will leak an APB sandbox if the Dao set errors out. Need to move the destruction above the SetExtractedCredentails stanza and immediately after the apb.Provision call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/apb/ext_creds.go
Outdated
var err error | ||
creds, err = buildExtractedCredentials(credOut) | ||
log.Debug("Calling monitorOutput on " + podname) | ||
go monitorOutput(namespace, podname, credOut, log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a reason to make this a goroutine. It spawns monitor output and then immediately blocks on a single value, so we don't gain anything. The http server is optimized to continue to accept requests concurrently, so a long monitorOutput isn't going to block anything incoming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 at this time and the way we do the work I don't see the need for a go routine to monitor the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/apb/ext_creds.go
Outdated
func buildExtractedCredentials(output []byte) (*ExtractedCredentials, error) { | ||
stillWaiting := strings.Contains(string(output), "ContainerCreating") || | ||
strings.Contains(string(output), "NotFound") | ||
func monitorOutput(namespace string, podname string, mon chan []byte, log *logging.Logger) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this isn't going to be run as a goroutine, should remove the channel from the signature and just return (string, error)
, string being the output from line 60.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/apb/ext_creds.go
Outdated
// instead of only getting the credentials | ||
|
||
for r := 1; r <= CredentialRetries; r++ { | ||
output, _ := RunCommand("oc", "exec", podname, GatherCredentialsCMD, "--namespace="+namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't drop the error, should return it as part of the new signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 this will totally bite us down the road. I can already foresee the exec erroring out for some odd reason and not having any idea what really happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the log if I capture err with log.Error(err.Error())
during a provision:
[2017-07-13T10:06:23.988-04:00] [ERROR] exit status 1
[2017-07-13T10:06:23.988-04:00] [WARNING] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] Retry attempt 1: Waiting for container to start
[2017-07-13T10:06:23.988-04:00] [WARNING] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] Retry attempt 1: exec into apb-5834b548-1611-4e5a-a4ee-a799f2fa3018 failed
[2017-07-13T10:06:25.614-04:00] [ERROR] exit status 1
[2017-07-13T10:06:25.614-04:00] [WARNING] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] Retry attempt 2: Waiting for container to start
[2017-07-13T10:06:25.614-04:00] [WARNING] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] Retry attempt 2: exec into apb-5834b548-1611-4e5a-a4ee-a799f2fa3018 failed
[2017-07-13T10:07:08.242-04:00] [ERROR] exit status 137
[2017-07-13T10:07:08.242-04:00] [WARNING] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] Retry attempt 3: exec into apb-5834b548-1611-4e5a-a4ee-a799f2fa3018 failed
[2017-07-13T10:07:11.669-04:00] [ERROR] exit status 1
[2017-07-13T10:07:11.669-04:00] [NOTICE] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] APB completed
[2017-07-13T10:07:11.669-04:00] [INFO] Destroying APB sandbox...
Since we combine stderr and stdout [1] into one stream, both are being pushed to output
. err
is the return value from the command similar to echo $?
. So it's probably not a good idea to capture it as anything but info. Also, I don't want to exit on error because the retry logic is in this function. What I can do is move the retry logic out of this function and return on error.
[1] -
ansible-service-broker/pkg/apb/hack.go
Line 12 in 9f8abd0
output, err := exec.Command(cmd, args...).CombinedOutput() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at moving out the retry logic and it wouldn't be as clean because some of the monitoring pieces would leak into extractCredentials.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better. See @eriknelson explanation of what should change to finish this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requesting changes
Change the broker bind output from trailing the logs of the provision or bind container to exec'ing into the container.
Deprovision was calling ExtractCredentials to get the output of oc logs into the broker logs. The broker was calling provision then ExtractCredentials. We don't need to do that since provision and bind already call it.
c0c1c29
to
b8ecc27
Compare
b8ecc27
to
ec205ec
Compare
@eriknelson @jmrodri both broker-bind-output patches have been updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason the build is failing
|
||
// Bind credential gathering constants | ||
// 5 seconds x 60 retries = 5 minute timout | ||
WaitTime = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add comments for exported constants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to provision images built with cooresponding apb-base changes rhscl-postresql-apb, mediawiki123-apb, and bind the two as expected. All requested changes addressed, I think this is ready to go.
Travis is failing due to a failing lint:
Going to merge this and immediately follow up with a fix for that. Thank you everyone! |
* Will automatically use proper relist with an APB push following a push and/or bootstrap (unless otherwise specified with --no-relist)
* Broker bind output rework Change the broker bind output from trailing the logs of the provision or bind container to exec'ing into the container. * Fix some things after the rebase * Add namespace to monitorOutput * Remove uneeded calls to ExtractCredentials Deprovision was calling ExtractCredentials to get the output of oc logs into the broker logs. The broker was calling provision then ExtractCredentials. We don't need to do that since provision and bind already call it. * Accept pod completion as success * Catch a few corner cases and improve logging * Remove go routine and add a few fixes
Change the broker bind output from trailing the logs of
the provision or bind container to exec'ing into the container.