Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broker bind output rework #124

Merged
merged 7 commits into from
Jul 13, 2017
Merged

Conversation

rthallisey
Copy link
Contributor

Change the broker bind output from trailing the logs of
the provision or bind container to exec'ing into the container.

@jmrodri
Copy link
Contributor

jmrodri commented May 24, 2017

oh this is going to be fun trying to merge this with my changes 😢 your PR makes sense. The changes I had to make was to move getPodName out of extractCredentials call. because I need to get the pod name before the job finishes.

@jmrodri
Copy link
Contributor

jmrodri commented May 24, 2017

Good work. I'll look at it probably not until later this week.

@jmrodri
Copy link
Contributor

jmrodri commented May 24, 2017

actually doesn't look too bad for me.

@rthallisey
Copy link
Contributor Author

Testing this will require: fusor/apb-examples#25

@jmrodri jmrodri added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 2, 2017
@rthallisey
Copy link
Contributor Author

rthallisey commented Jun 2, 2017

@jmrodri rebased fusor/apb-examples#25. So this should be ok to test. Thx for the heads up.

@jwmatthews
Copy link
Member

I'm apprehensive of merging this until we have a working broker with new service catalog.
My preference is to debug issues with broker<->service catalog, after those are resolved, test this PR and determine if we merge.
Main concern is introducing another variable into the mix when things are not working.

@rthallisey would you help me understand the benefits of this approach with oc exec and reading file opposed to parsing oc logs?

@rthallisey what problems do you see this resolving that we may have faced with the oc logs usage?

@rthallisey
Copy link
Contributor Author

@jwmatthews When we send the credentials to stdout we're making the creds readable by anyone who can view that pod. If we move to exec, the pod will have to be visible and we can use the container environment to secure who can look at the credentials. That could be things like selinux, users, passwords, or tokens to read the credentials that we can't use to protect stdout.

@jmrodri jmrodri added feature and removed sprint5 labels Jun 9, 2017
@eriknelson eriknelson added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 16, 2017
@eriknelson
Copy link
Contributor

I'm a little undecided on this. I'm convinced it's more secure than our current implementation given @rthallisey's argument, so even if we merged as is, there's value to be gained. On the other hand, it's not quite one of the three options I remember discussing when credential exfiltration was originally discussed. IIRC we settled on 3 options:

  1. Some kind of pod exit status that was yet to be implemented, but soon to be available?
  2. Broker opens a webhook, networking concerns here.
  3. Some kind of shared volume that the broker and pod can both read/write from.

This PR isn't really any of those options. I don't think that disqualifies it though; is it a 4th option that accomplishes everything we're looking for?

Regardless of the answer, this is definitely better than what we're doing right now.

@rthallisey
Copy link
Contributor Author

rthallisey commented Jun 21, 2017

@eriknelson Ya it's not one of the original three, but here's my thinking.

Some kind of pod exit status that was yet to be implemented, but soon to be available?

I interpreted this as reading the pod exit status from something like an oc describe. I didn't understand this one because this seems like it would be the same as what we're doing now.

I'm going to look in the API code and see how a pod can return an error code and maybe that's what was meant here.

Broker opens a webhook, networking concerns here.

Networking concerns.

Some kind of shared volume that the broker and pod can both read/write from.

This is nearly identical to exec. In the container we don't allow anyone but the broker to read from the directory that's attached to the volume. But, now we have to create and destroy resources on the broker side. Exec doesn't need that.

@jwmatthews
Copy link
Member

In regard to this PR, I'd like to chat with some of the AOS folks to run this approach by them and get feedback before we merge.

Plan to address feedback in next sprint, tracking in card: https://trello.com/c/PZJfBeph

@jwmatthews jwmatthews changed the title Broker bind output rework [DO NOT MERGE] Broker bind output rework Jun 23, 2017
@jwmatthews jwmatthews changed the title [DO NOT MERGE] Broker bind output rework Broker bind output rework Jun 27, 2017
@jwmatthews
Copy link
Member

@rthallisey please rebase this when you can.

I spoke to Clayton and confirmed he's good with this approach.

@rthallisey
Copy link
Contributor Author

Depends on #242

@@ -14,78 +15,56 @@ var stillWaitingError = "status: still waiting to start"
var timeoutFreq = 6 // Seconds
var totalTimeout = 900 // 15min

// ExtractCredentials - Extract credentials from pod in a certain namespace.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep this line in. I am pretty sure this is why the build is failing.

@rthallisey
Copy link
Contributor Author

@jwmatthews sounds good. It hasn't been working for me for the past few days so I'll see if I can figure out the issue.

@rthallisey
Copy link
Contributor Author

rthallisey commented Jul 6, 2017

I think this is good to go. @eriknelson can you help me test this? I'm having trouble getting a working UI. I've tested this with the environment I have and the bind credentials make it into etcd. After that, the service-catalog injects those from etcd into a running pod. So I'm confident this should work since I don't touch that part of the bind. But I'd like to confirm this with an already working environment.

@rthallisey rthallisey force-pushed the broker-bind-output branch 2 times, most recently from e383185 to c0c1c29 Compare July 6, 2017 18:57
Copy link
Contributor

@eriknelson eriknelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a conversation about this with @jmrodri over lunch and the feeling is that the mechanism used for monitoring output is difficult to reason about and should be simplified. While not technically wrong; it's not easy to answer something as simple as "How long can an APB run for before it times out?" given all the retry logic between the broker, credential extraction script, and bind-init. Here's the proposed changes:

  • broker-bind-creds Should drop any retry logic at all and be a reasonably dumb cat script. If it's there, cat it and rm, if not, exit 1.

  • bind-init is effectively "how long should the pod be kept alive waiting for the broker to collect credentials". That's good as is, we probably just want to hardcode a value of 5min there.

  • Push most of the retry logic into the broker. Hard code retry count/interval and have the interval remain static. (Not a function of the retry count). Can be made configurable if desired in the future.

With these changes, we should be able to confidently say: The broker will attempt to extract credentials for N*interval units, and if the credentials haven't been written by that point in time, things have timed out.

Additional requested changes:

  • monitorOutput shouldn't be a go routine, and should return output, err, where err is the bubbled oc exec cmd error. We want to make sure err gets logged somewhere in the stack.
  • Destroy the APB sandbox immediately after Provision

Overall, I think the core of this accomplishes the original goal for the change, and I've seen it passing regression tests. I'd be comfortable merging this and it's apb-examples counterpart pending these changes.

@@ -67,6 +67,13 @@ func Provision(
}

creds, err := ExtractCredentials(podName, instance.Context.Namespace, log)
// We should not save credentials from an app that finds them and isn't
// bindable
if creds != nil && !instance.Spec.Bindable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -490,6 +475,15 @@ func (a AnsibleBroker) Provision(instanceUUID uuid.UUID, req *ProvisionRequest,
return nil, err
}
}

sm := apb.NewServiceAccountManager(a.log)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, this will leak an APB sandbox if the Dao set errors out. Need to move the destruction above the SetExtractedCredentails stanza and immediately after the apb.Provision call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

var err error
creds, err = buildExtractedCredentials(credOut)
log.Debug("Calling monitorOutput on " + podname)
go monitorOutput(namespace, podname, credOut, log)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason to make this a goroutine. It spawns monitor output and then immediately blocks on a single value, so we don't gain anything. The http server is optimized to continue to accept requests concurrently, so a long monitorOutput isn't going to block anything incoming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 at this time and the way we do the work I don't see the need for a go routine to monitor the output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

func buildExtractedCredentials(output []byte) (*ExtractedCredentials, error) {
stillWaiting := strings.Contains(string(output), "ContainerCreating") ||
strings.Contains(string(output), "NotFound")
func monitorOutput(namespace string, podname string, mon chan []byte, log *logging.Logger) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't going to be run as a goroutine, should remove the channel from the signature and just return (string, error), string being the output from line 60.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// instead of only getting the credentials

for r := 1; r <= CredentialRetries; r++ {
output, _ := RunCommand("oc", "exec", podname, GatherCredentialsCMD, "--namespace="+namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't drop the error, should return it as part of the new signature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 this will totally bite us down the road. I can already foresee the exec erroring out for some odd reason and not having any idea what really happened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the log if I capture err with log.Error(err.Error()) during a provision:

[2017-07-13T10:06:23.988-04:00] [ERROR] exit status 1
[2017-07-13T10:06:23.988-04:00] [WARNING] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] Retry attempt 1: Waiting for container to start
[2017-07-13T10:06:23.988-04:00] [WARNING] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] Retry attempt 1: exec into apb-5834b548-1611-4e5a-a4ee-a799f2fa3018 failed
[2017-07-13T10:06:25.614-04:00] [ERROR] exit status 1
[2017-07-13T10:06:25.614-04:00] [WARNING] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] Retry attempt 2: Waiting for container to start
[2017-07-13T10:06:25.614-04:00] [WARNING] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] Retry attempt 2: exec into apb-5834b548-1611-4e5a-a4ee-a799f2fa3018 failed
[2017-07-13T10:07:08.242-04:00] [ERROR] exit status 137
[2017-07-13T10:07:08.242-04:00] [WARNING] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] Retry attempt 3: exec into apb-5834b548-1611-4e5a-a4ee-a799f2fa3018 failed
[2017-07-13T10:07:11.669-04:00] [ERROR] exit status 1
[2017-07-13T10:07:11.669-04:00] [NOTICE] [apb-5834b548-1611-4e5a-a4ee-a799f2fa3018] APB completed
[2017-07-13T10:07:11.669-04:00] [INFO] Destroying APB sandbox...

Since we combine stderr and stdout [1] into one stream, both are being pushed to output. err is the return value from the command similar to echo $?. So it's probably not a good idea to capture it as anything but info. Also, I don't want to exit on error because the retry logic is in this function. What I can do is move the retry logic out of this function and return on error.
[1] -

output, err := exec.Command(cmd, args...).CombinedOutput()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at moving out the retry logic and it wouldn't be as clean because some of the monitoring pieces would leak into extractCredentials.

jmrodri
jmrodri previously approved these changes Jul 12, 2017
Copy link
Contributor

@jmrodri jmrodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better. See @eriknelson explanation of what should change to finish this up.

@jmrodri jmrodri dismissed their stale review July 12, 2017 18:45

meant to be request changes.

Copy link
Contributor

@jmrodri jmrodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requesting changes

Ryan Hallisey added 6 commits July 13, 2017 09:39
Change the broker bind output from trailing the logs of
the provision or bind container to exec'ing into the container.
Deprovision was calling ExtractCredentials to get the output of
oc logs into the broker logs.

The broker was calling provision then ExtractCredentials. We don't
need to do that since provision and bind already call it.
@rthallisey
Copy link
Contributor Author

@eriknelson @jmrodri both broker-bind-output patches have been updated.

Copy link
Contributor

@shawn-hurley shawn-hurley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason the build is failing


// Bind credential gathering constants
// 5 seconds x 60 retries = 5 minute timout
WaitTime = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add comments for exported constants

@eriknelson eriknelson changed the title [DO NOT MERGE] Broker bind output rework Broker bind output rework Jul 13, 2017
Copy link
Contributor

@eriknelson eriknelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to provision images built with cooresponding apb-base changes rhscl-postresql-apb, mediawiki123-apb, and bind the two as expected. All requested changes addressed, I think this is ready to go.

fusor/apb-examples#25

@eriknelson
Copy link
Contributor

Travis is failing due to a failing lint:

=================================
              Lint               
=================================
/home/travis/gopath/src/github.com/openshift/ansible-service-broker/pkg/apb/types.go:104:2: comment on exported const WaitTime should be of the form "WaitTime ..."
/home/travis/gopath/src/github.com/openshift/ansible-service-broker/pkg/apb/types.go:107:2: exported const CredentialRetries should have comment (or a comment on this block) or be unexported
The command "./scripts/travis.sh lint" exited with 1.

Going to merge this and immediately follow up with a fix for that. Thank you everyone!

@eriknelson eriknelson merged commit 11792e5 into openshift:master Jul 13, 2017
@rthallisey rthallisey deleted the broker-bind-output branch July 27, 2017 13:13
@rthallisey
Copy link
Contributor Author

#104

shawn-hurley pushed a commit to shawn-hurley/ansible-service-broker that referenced this pull request Nov 2, 2017
* Will automatically use proper relist with an APB push following
a push and/or bootstrap (unless otherwise specified with --no-relist)
jianzhangbjz pushed a commit to jianzhangbjz/ansible-service-broker that referenced this pull request May 17, 2018
* Broker bind output rework

Change the broker bind output from trailing the logs of
the provision or bind container to exec'ing into the container.

* Fix some things after the rebase

* Add namespace to monitorOutput

* Remove uneeded calls to ExtractCredentials

Deprovision was calling ExtractCredentials to get the output of
oc logs into the broker logs.

The broker was calling provision then ExtractCredentials. We don't
need to do that since provision and bind already call it.

* Accept pod completion as success

* Catch a few corner cases and improve logging

* Remove go routine and add a few fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants