475 last operation description #619

maleck13 · 2018-01-08T17:12:23Z

Describe what this PR does and why we need it:
broker implementation for issue 475 last operation description
Note
This PR is based on top of the previous PR #610, so it is probably better to review once that one is closed out.
Changes proposed in this pull request

Add a new channel that sends JobState changes asynchronously, so the state is updated and ready for the last operation polling endpoint to read.
Add sets of tests for the various places that have changed.
Change the pod watcher (watch_pod.go) to use the watch API instead of polling.
Add StartSyncJob to unify the broker Job API.
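A minimal, dependency-free sketch of the idea behind the first change: progress updates flow over a channel to a subscriber goroutine that persists each one, so a last operation handler only ever reads the most recently stored state. Every name here (the JobState fields, stateStore standing in for the DAO, subscribe, runJob) is a simplified, hypothetical stand-in, not the broker's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// JobState is a hypothetical, simplified stand-in for the broker's JobState.
type JobState struct {
	Token       string
	State       string // "in progress", "succeeded" or "failed"
	Description string // human-readable progress text
}

// stateStore is a hypothetical in-memory stand-in for the DAO the subscriber writes to.
type stateStore struct {
	mu     sync.Mutex
	states map[string]JobState
}

func newStateStore() *stateStore {
	return &stateStore{states: make(map[string]JobState)}
}

// SetState overwrites the stored state for a token with the newest update.
func (s *stateStore) SetState(js JobState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.states[js.Token] = js
}

// GetState is what a last_operation handler would read.
func (s *stateStore) GetState(token string) (JobState, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	js, ok := s.states[token]
	return js, ok
}

// subscribe persists every update until the channel is closed, then signals done.
func subscribe(updates <-chan JobState, store *stateStore, done chan<- struct{}) {
	for js := range updates {
		store.SetState(js)
	}
	close(done)
}

// runJob simulates an action pushing progress updates to the subscriber goroutine
// and returns only once the subscriber has persisted all of them.
func runJob(store *stateStore, token string) {
	updates := make(chan JobState)
	done := make(chan struct{})
	go subscribe(updates, store, done)

	updates <- JobState{Token: token, State: "in progress", Description: "created postgres service"}
	updates <- JobState{Token: token, State: "in progress", Description: "created keycloak route"}
	updates <- JobState{Token: token, State: "succeeded", Description: "provision job completed"}
	close(updates)
	<-done
}

func main() {
	store := newStateStore()
	runJob(store, "token-1")
	js, _ := store.GetState("token-1")
	fmt.Println(js.State, "-", js.Description)
}
```

Because a polling request only reads the stored value, it never has to wait on the running job.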

Does this PR depend on another PR (Use this to track when PRs should be merged)
~~depends-on #610~~
depends-on #671

Which issue this PR fixes (This will close that issue when PR gets merged)
fixes #475

coveralls · 2018-01-08T17:54:02Z

Changes Unknown when pulling 0f253da on maleck13:475-last-operation-description into ** on openshift:master**.

maleck13 · 2018-01-09T08:51:58Z

@jmrodri I know you are working async bind at the moment. I think this PR should wait and be rebased on your work once it lands. Likely I will need to make some changes to support last operation updates in async bind operations also.

rthallisey · 2018-01-09T16:51:30Z

Related #542

coveralls · 2018-01-10T09:52:40Z

Changes Unknown when pulling 674aebb on maleck13:475-last-operation-description into ** on openshift:master**.

maleck13 · 2018-01-23T12:50:07Z

The broker code is in good shape, I think (at least ready for review). I still need help understanding how to test the module.

maleck13 · 2018-01-23T17:57:35Z

Test Case:

Build the broker image from this branch.
Point it at the maleck13 org.
Provision the keycloak APB.
In the namespace where the keycloak instance was provisioned, you should see the service instance; if you expand it, you can see the details of the last operation.
Another option is to watch the serviceinstance. You should see something like:

status: Provision request for ServiceInstance in-flight to Broker
status: The instance is being provisioned asynchronously
status: The instance is being provisioned asynchronously (created postgres service)
status: The instance is being provisioned asynchronously (created keycloak route)
status: The instance was provisioned successfully

Example code for the apb is here https://github.com/maleck13/keycloak-apb/blob/add-last-ops/roles/provision-keycloak-apb/tasks/provision-keycloak.yml#L11
Note that the above is only example code, as I had to add the last_operation.py file manually to test it. However, once PR ansibleplaybookbundle/ansible-asb-modules#9 is merged, I think it will be available to all new APBs.
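The parenthesised text in the status lines of this test case (for example "created postgres service") appears to be the description field the broker returns from its last operation endpoint. The Open Service Broker API defines that response body as a required `state` ("in progress", "succeeded" or "failed") plus an optional `description`. A small sketch of rendering it; `lastOperationBody` is a hypothetical helper, not a function from the broker.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LastOperationResponse models the response body of the Open Service Broker API
// endpoint GET /v2/service_instances/:instance_id/last_operation.
type LastOperationResponse struct {
	State       string `json:"state"`                 // "in progress", "succeeded" or "failed"
	Description string `json:"description,omitempty"` // optional, user-facing progress text
}

// lastOperationBody is a hypothetical helper that renders a stored state as JSON.
func lastOperationBody(state, description string) (string, error) {
	b, err := json.Marshal(LastOperationResponse{State: state, Description: description})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := lastOperationBody("in progress", "created keycloak route")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

With `omitempty`, a state without a description simply omits the field.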

rthallisey

Nice! Lots of new tests!

I'm going to look into pulling last_operation info during the $actions. It would be cool to see this running all the time in the gate.

rthallisey · 2018-01-23T21:40:33Z

pkg/apb/watch_pod.go

+		}
+		send(state)
+		podStatus := pod.Status
+		log.Debugf("pod [%s] in phase %s", podName, podStatus.Phase)
 		switch podStatus.Phase {
 		case apiv1.PodFailed:
 			if errorPullingImage(podStatus.ContainerStatuses) {


I think we want a w.Stop() here too before we return.

rthallisey · 2018-01-23T21:44:47Z

pkg/apb/watch_pod.go

-
-		time.Sleep(time.Duration(apbWatchInterval) * time.Second)
+		if podEvent.Type == watch.Deleted {
+			w.Stop()


Will we only get in here if the pod is deleted before either PodSucceeded or PodFailed? In other words, I think you can produce this when you delete the apb during execution. If that's the case, I think we should either return an error or log that the apb was deleted. What do you think?

Yes, I think you are correct. In most cases the pod would succeed or fail, so if it was somehow deleted, that would likely be an error and it should be reported.
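The two watch_pod.go review points above (stopping the watch before every return, and treating a Deleted event as an error) can be sketched as follows. To stay dependency-free, `Watcher` and `Event` are hypothetical stand-ins: the real `watch.Interface` in k8s.io/apimachinery also exposes `ResultChan()` and `Stop()`, but its events carry a `runtime.Object` rather than a bare phase string. `defer w.Stop()` is one way to cover every return path; the PR itself may call `Stop` explicitly in each branch.

```go
package main

import (
	"errors"
	"fmt"
)

// EventType mirrors the watch event type strings used by Kubernetes.
type EventType string

const (
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// Event is a hypothetical, pared-down watch event carrying only the pod phase.
type Event struct {
	Type  EventType
	Phase string // e.g. "Running", "Succeeded", "Failed"
}

// Watcher mirrors the two methods of watch.Interface used here.
type Watcher interface {
	ResultChan() <-chan Event
	Stop()
}

// ErrPodDeleted reports that the APB pod vanished before it finished.
var ErrPodDeleted = errors.New("apb pod was deleted before it completed")

// watchPod blocks until the pod succeeds, fails or is deleted. The deferred
// Stop runs on every return path, so no branch can leak the watch.
func watchPod(w Watcher) error {
	defer w.Stop()
	for ev := range w.ResultChan() {
		if ev.Type == Deleted {
			return ErrPodDeleted
		}
		switch ev.Phase {
		case "Succeeded":
			return nil
		case "Failed":
			return errors.New("apb pod failed")
		}
	}
	return errors.New("watch channel closed before the pod completed")
}

// fakeWatcher replays a fixed list of events so watchPod can run without a cluster.
type fakeWatcher struct {
	ch      chan Event
	stopped bool
}

func newFakeWatcher(events ...Event) *fakeWatcher {
	ch := make(chan Event, len(events))
	for _, e := range events {
		ch <- e
	}
	close(ch)
	return &fakeWatcher{ch: ch}
}

func (f *fakeWatcher) ResultChan() <-chan Event { return f.ch }
func (f *fakeWatcher) Stop()                    { f.stopped = true }

func main() {
	w := newFakeWatcher(Event{Type: Modified, Phase: "Running"}, Event{Type: Deleted})
	fmt.Println(watchPod(w), w.stopped)
}
```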

rthallisey · 2018-01-23T21:53:45Z

pkg/broker/binding_job.go


-	log.Debug("bindjob: returned from apb.Bind")
+	//read our status updates and send on updated JobMsgs for the subscriber to persist
+	for su := range stateUpdates {


If we're blocking on reading the bind status updates, is this really async? Do we have to block on this?

Previously, the bind function itself was synchronous and would block the job function. Now we have moved it into its own goroutine, but we still need to wait until the job is complete within the Run function. So we block on the status updates channel, as it is only closed once the bind job is complete.
The Job itself, though, is started in its own goroutine within the work engine, so it is still asynchronous:
https://github.com/openshift/ansible-service-broker/pull/619/files#diff-bcf626f58af278748816aca2379c1928R53
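The answer above can be illustrated with a small, hypothetical sketch: `run` blocks while ranging over the updates channel, but `startJob` (standing in for the work engine) launches `run` in its own goroutine and returns at once, so the overall job stays asynchronous. `bindAction`, `run` and `startJob` are illustrative names, and the caller reading the returned channel plays the subscriber's role.

```go
package main

import (
	"fmt"
	"time"
)

// JobState is a hypothetical, simplified progress update.
type JobState struct {
	State       string
	Description string
}

// bindAction simulates the bind: it reports progress and closes updates when done.
func bindAction(updates chan<- JobState) {
	defer close(updates)
	updates <- JobState{State: "in progress", Description: "bind started"}
	time.Sleep(10 * time.Millisecond) // stands in for the APB pod running
	updates <- JobState{State: "succeeded", Description: "bind finished"}
}

// run runs the action in its own goroutine and blocks, forwarding each update,
// until the action closes the channel.
func run(msgBuffer chan<- JobState) {
	updates := make(chan JobState)
	go bindAction(updates)
	for su := range updates {
		msgBuffer <- su
	}
	close(msgBuffer)
}

// startJob mimics the work engine: it starts run in a goroutine and returns
// immediately, which is why the caller is not blocked even though run is.
func startJob() <-chan JobState {
	msgBuffer := make(chan JobState)
	go run(msgBuffer)
	return msgBuffer
}

func main() {
	msgs := startJob() // returns without waiting for the bind
	for m := range msgs {
		fmt.Println(m.State, m.Description)
	}
}
```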

rthallisey · 2018-01-23T22:52:30Z

pkg/broker/unbinding_job.go

-
+	go func() {
+		defer func() {
+			metrics.UnbindingJobFinished()


There seems to be the same general workflow for all the actions. Maybe at some point we can break apart the Run function into multiple functions to make the API more granular. It's something we can tackle down the road.

Or potentially abstract it into a single run function, as they all effectively do the same thing.

maleck13 · 2018-01-24T09:43:28Z

@rthallisey I have addressed your requested changes and comments

rthallisey · 2018-01-24T17:29:46Z

@maleck13 can you rebase this to pull in some fixes to Travis?

eriknelson · 2018-02-07T02:55:26Z

Looks like this needs a rebase @maleck13. Trying to get through outstanding PRs this week, happy to come back to this when it's green.

rthallisey · 2018-02-12T17:14:56Z

@eriknelson can you have another look at this?

eriknelson · 2018-02-12T17:17:50Z

@maleck13 @rthallisey grabbing lunch and I will review. Thanks guys.

shawn-hurley · 2018-02-12T19:15:28Z

pkg/apb/watch_pod.go

 	log.Debugf(
 		"Watching pod [ %s ] in namespace [ %s ] for completion",
 		podName,
 		namespace,
 	)

-	k8scli, err := clients.Kubernetes()


Why don't we get the v1.PodInterface in this method?

I think getting it here is better because, as it stands, every caller has to get the client only to pass it into a function. Getting it inside this method encapsulates that behaviour, and if it errors we bubble up that error.

If the only reason we are doing this is for testing, then I think we need to do a better job of mocking out the core components rather than passing values into functions. I think we got bitten by doing that before.

If we want unit tests for a method like this, then at some point we will need to use an interface to allow us to mock the interactions with external dependencies.
Having a package-level function used internally as part of this method means we cannot mock it out.
Really, this is just a simple form of dependency injection. The package-level functions can make this tricky. I would rather have a unit test for the logic here than not have one. I took a similar approach to the provision function here:
https://github.com/maleck13/ansible-service-broker/blob/4f0301dac3dae580cb5f393baafaa70a2d859073/pkg/broker/broker.go#L377
There, however, the dependency is passed in via the constructor. In this case, passing it in via a constructor would be a bigger refactor, as it would likely mean changes all the way up the call chain.

I think we need to do a better job of mocking out the core components rather than passing values into functions

On that point, though, longer term it may be worth discussing changing some of the package functions to be struct methods and using the kubernetes.Interface in place of the *client.ClientSet concrete type.
If we use the kubernetes.Interface and refactor so it can be injected from the main method via constructors, we can build a very powerful set of unit tests: we can control the behavior of the tests by mocking out just the client, since that generally underpins all of the external communication. This becomes even more true if we move to CRDs instead of a separate etcd.

I can create a follow-up issue for this.

eriknelson · 2018-02-16T17:58:44Z

@maleck13 Thanks again for your contribution. I had a chance to sit down with @shurley this morning and talk through this one. Overall, it's looking pretty good, and it's absolutely a feature we need to bring in for 3.10. I have a couple of high-level requests for changes:

1. A statusUpdates channel has been introduced into the various apb.<action> signatures as a way for them to push status update events out to subscribers. Instead of callers creating the channels and passing them as arguments, I think the function parameters should remain the same as they are today, and the function should return a channel of values implementing a StatusMessage interface that lives in the apb pkg:

// apb/types.go
type StatusMessage interface {
  String() string
}

--

// Example provision.go
// Provision - will run the apb with the provision action.
func Provision(
        instance *ServiceInstance,
) (string, *ExtractedCredentials, <-chan StatusMessage, error) {
  // Create channel
  // Do work
  // Return previous args + channel
}

The reasoning is that in the near future, we're expecting to break the apb package out into its own vendorable library for other clients, and we would like the apb pkg (and the actions) to own the channel resource, not the caller. Additionally, it's reasonable that callers may not care about statusUpdates at all, so forcing a caller to deal with them at the function signature level is undesirable. With the proposed change, the caller can simply ignore the channel with a _ variable. The StatusMessage interface (as opposed to a JobState) also decouples the apb pkg from the broker pkg, which is important too.

2. There does not appear to be much use for the StartNewSyncJob variant. I can understand that there might be a desire for status updates to occur with sync requests as well, but in practice, the job is done with a completion message that overrides any incremental status updates before the token is even returned to the client. Ultimately, we're unable to imagine a use-case for sync updates. To reduce unnecessary complexity, let's revert the split and keep the existing approach, with a single async start job, and have the sync branches of the broker.go methods simply call the apb.<action>. They can ignore the returned channel from 1).

Happy to continue the discussion on these points; otherwise, I'll make sure we stay on top of updates so this isn't sitting in a PR for much longer.

EDIT: @shawn-hurley raised an important point re: the return channels that requires an amendment to this. Hashing that out and I will update this comment.

eriknelson

See above.

jmrodri

Based on some discussions.

eriknelson · 2018-02-16T19:33:28Z

Background: The issue with 1) is that the channel is not going to be returned to a caller until the function has finished, which suggests the need to wrap the work in a goroutine, which in turn requires a channel to allow the caller to synchronize the work.

So, this PR has opened a really productive dialogue around what, exactly, we want the public interface for libapb to look like. The conclusion is that we would like 2 suites of actions that allow folks to run APBs either asynchronously, or synchronously: Provision(...) <-chan StatusMessage and ProvisionSync(...). The former being our preferred, recommended way for starting long running work, and getting event-driven updates as to their status.
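A hypothetical sketch of the two suites described above, using the StatusMessage interface from the earlier comment. The async `Provision` returns its channel immediately because the work runs in a goroutine that closes the channel when it finishes (the issue raised in the background note); `ProvisionSync` runs the same steps with a no-op reporter, so nothing blocks on a channel nobody reads. `provisionSteps` and `statusMsg` are illustrative names, not the libapb API that later emerged.

```go
package main

import "fmt"

// StatusMessage mirrors the interface proposed earlier in this thread.
type StatusMessage interface {
	String() string
}

// statusMsg is a trivial StatusMessage implementation.
type statusMsg string

func (s statusMsg) String() string { return string(s) }

// provisionSteps is the shared work; it reports progress through report.
func provisionSteps(instanceID string, report func(StatusMessage)) error {
	report(statusMsg("created postgres service"))
	report(statusMsg("created keycloak route"))
	return nil
}

// Provision returns immediately. The work runs in a goroutine that closes the
// channel when done, so callers can range over it. If a caller stops reading,
// the goroutine blocks on its next send, so async callers must drain it.
func Provision(instanceID string) <-chan StatusMessage {
	ch := make(chan StatusMessage)
	go func() {
		defer close(ch)
		if err := provisionSteps(instanceID, func(m StatusMessage) { ch <- m }); err != nil {
			ch <- statusMsg("provision failed: " + err.Error())
			return
		}
		ch <- statusMsg("provision succeeded")
	}()
	return ch
}

// ProvisionSync blocks until the work finishes and discards progress updates.
func ProvisionSync(instanceID string) error {
	return provisionSteps(instanceID, func(StatusMessage) {})
}

func main() {
	for msg := range Provision("instance-1") {
		fmt.Println(msg)
	}
	if err := ProvisionSync("instance-2"); err != nil {
		fmt.Println("sync provision failed:", err)
	}
}
```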

This really isn't in the scope of this PR, so in an effort to bring in your work, it makes sense to merge this, log some issues, and fix them as a follow up.

We do still need a rebase.

eriknelson · 2018-02-16T19:37:01Z

@maleck13 please rebase and we're ready to bring this into master.

fix type in SubscriberDAO after code review

rebase and update the unbind and bind jobs to accept state updates add the sync and async job start to unbind and bind. Minor updates address review feedback add a w.Stop on pod failed and return error on pod deleted

maleck13 · 2018-02-17T22:23:31Z

@eriknelson Great, I'm excited to have this finally merged.

Thanks for your detailed comments. Adding some thoughts of my own here:

Background: The issue with 1) is that the channel is not going to be returned to a caller until the function has finished, which suggests the need to wrap the work in a goroutine, which in turn requires a channel to allow the caller to synchronize the work.

It's possible I am restating the problem.

I generally agree that it would be nice to return a channel from the Job. I played around with this option early on when working on the feature, but it presented me with some issues:

1. The caller in this case is the broker, which would mean the broker would need to read the status from the channel and write the state via the DAO; this seemed counter to the current design and the role of the subscribers.
2. If not option 1, then the broker.go methods would need access to the workmsg channel and would have to funnel the status messages into it so the subscribers could do their work. This also seemed counter to the current design. Perhaps it could be an option; however, as you call out, I think changing the Jobs to return a channel needs some more thought (perhaps a proposal)?

So I went with what seemed the least disruptive approach that kept the broker design the same.

There does not appear to be much use for the StartNewSyncJob variant. I can understand that there might be a desire for status updates to occur with sync requests as well, but in practice, the job is done with a completion message that overrides any incremental status updates before the token is even returned to the client. Ultimately, we're unable to imagine a use-case for sync updates. To reduce unnecessary complexity, let's revert the split and keep the existing approach, with a single async start job, and have the sync branches of the broker.go methods simply call the apb.<action>. They can ignore the returned channel from 1).

Interesting, I had actually hoped that change would reduce the complexity. The status updates are of no value here, you are correct, as there is no opportunity for them to be seen.
The goal of the change was to use the same approach for both sync and async jobs, and to make the work engine the only place where jobs were handled. It seemed that broker.go needed to know too much about the apb methods (it had to know that calling apb.Provision was a synchronous action). Adding a StartSyncJob method seemed to ensure that broker.go didn't know too much about the internals of the apb package and instead relied on the API provided.
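The reasoning above (broker.go going through the work engine for both sync and async jobs) might look roughly like this hypothetical sketch. `Work`, `WorkEngine`, `StartNewAsyncJob` and `StartSyncJob` are simplified, illustrative shapes, not the broker's real signatures. Both methods route messages to the same subscriber; the only difference is whether the caller waits for completion. In real use, concurrent async jobs would need a subscriber that is safe to call from several goroutines.

```go
package main

import "fmt"

// JobMsg is a hypothetical, simplified job message.
type JobMsg struct {
	Token string
	State string
}

// Work is a hypothetical job: Run pushes progress onto msgBuffer and returns when done.
type Work interface {
	Run(token string, msgBuffer chan<- JobMsg)
}

// WorkEngine delivers every job's messages to one subscriber function.
type WorkEngine struct {
	subscriber func(JobMsg)
}

// start runs the job and a forwarding loop; done closes after the last message is handled.
func (e *WorkEngine) start(token string, w Work) <-chan struct{} {
	msgBuffer := make(chan JobMsg)
	done := make(chan struct{})
	go func() {
		defer close(msgBuffer)
		w.Run(token, msgBuffer)
	}()
	go func() {
		for m := range msgBuffer {
			e.subscriber(m)
		}
		close(done)
	}()
	return done
}

// StartNewAsyncJob returns at once; the subscriber records progress in the background.
func (e *WorkEngine) StartNewAsyncJob(token string, w Work) { e.start(token, w) }

// StartSyncJob blocks until the job and the subscriber have both finished.
func (e *WorkEngine) StartSyncJob(token string, w Work) { <-e.start(token, w) }

// provisionJob is a stand-in job emitting two states.
type provisionJob struct{}

func (provisionJob) Run(token string, msgBuffer chan<- JobMsg) {
	msgBuffer <- JobMsg{Token: token, State: "in progress"}
	msgBuffer <- JobMsg{Token: token, State: "succeeded"}
}

func main() {
	var got []JobMsg
	e := &WorkEngine{subscriber: func(m JobMsg) { got = append(got, m) }}
	e.StartSyncJob("token-1", provisionJob{})
	fmt.Println(got)
}
```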

Also happy to chat further or jump on a call.

eriknelson · 2018-02-17T23:21:36Z

The caller in this case is the broker, which would mean the broker would need to read the status from the channel and write the state via the DAO, this seemed counter to the current design and role of the subscribers.

I actually initially thought we weren't using the subscribers for this, but I was incorrect, so I have to thank you for preserving that. Definitely an architecture we want to keep, especially with a revamped engine. I'm not sure I understand this statement, though; regardless of whether the caller creates the channel and passes it in, or the apb package returns the channel, isn't the caller (the broker job) going to have to monitor that channel for messages, wrap them in JobMessages, and pass them along to the Subscriber?

It seemed the broker.go needed to know too much about apb methods (it had to know that calling apb.Provision was a synchronous action).

I'm personally not too concerned about this. As we continue, the OSB domain is going to get pushed into some kind of a Broker Framework, and the APB domain is getting pushed into a libapb. Ultimately, this repo is going to turn into the glue between those two worlds, so IMO, that's exactly what this binary is going to be concerned with.

Anyway, we're of the opinion this PR adds quite a bit of value as-is, and most of the feedback is more appropriate for a follow up that's part of the work moving in the direction I described. I'm planning on posting a proposal with some code on top of this for review from the community, since I think I can better explain myself with actual code. Thanks again for your help!

openshift-ci-robot added the do-not-merge/work-in-progress and size/XXL labels Jan 8, 2018

rthallisey added the 3.10 | release-1.2 label Jan 9, 2018

maleck13 mentioned this pull request Jan 9, 2018

Proposal: using interfaces for DAO operations to allow simple unit tests #542

Closed

maleck13 force-pushed the 475-last-operation-description branch from 0f253da to 674aebb January 10, 2018 09:12

jmrodri self-requested a review January 12, 2018 04:21

maleck13 mentioned this pull request Jan 23, 2018

custom error messaging #675

Closed

maleck13 force-pushed the 475-last-operation-description branch from 674aebb to 3e8ca31 January 23, 2018 12:30

maleck13 changed the title ~~WIP 475 last operation description~~ 475 last operation description Jan 23, 2018

openshift-ci-robot removed the do-not-merge/work-in-progress label Jan 23, 2018

maleck13 force-pushed the 475-last-operation-description branch from 3e8ca31 to 0175a9a January 23, 2018 13:11

maleck13 force-pushed the 475-last-operation-description branch 2 times, most recently from 4507c4c to 384da98 January 23, 2018 18:05

rthallisey suggested changes Jan 23, 2018

rthallisey mentioned this pull request Jan 23, 2018

Bug 1536629 - Send job msg immediately as job starts. To set initial JobState #671

Merged

rthallisey approved these changes Jan 24, 2018

maleck13 force-pushed the 475-last-operation-description branch from 344cfa1 to a3291d9 January 24, 2018 18:22

maleck13 force-pushed the 475-last-operation-description branch 2 times, most recently from 61df7eb to 4f0301d February 10, 2018 22:20

shawn-hurley requested changes Feb 12, 2018

philipgough mentioned this pull request Feb 13, 2018

WIP: Force delete a service-catalog resource #751

Closed

eriknelson added the needs-rebase label Feb 16, 2018

eriknelson suggested changes Feb 16, 2018

shawn-hurley approved these changes Feb 16, 2018

jmrodri approved these changes Feb 16, 2018

eriknelson approved these changes Feb 16, 2018

maleck13 added 2 commits February 17, 2018 21:17

add test cases for provision subscriber and job

11225fe

fix type in SubscriberDAO after code review

initial pass at last operation description implementation

1cfb140

rebase and update the unbind and bind jobs to accept state updates add the sync and async job start to unbind and bind. Minor updates address review feedback add a w.Stop on pod failed and return error on pod deleted

maleck13 force-pushed the 475-last-operation-description branch from 4f0301d to 1cfb140 February 17, 2018 21:45

openshift-bot removed the needs-rebase label Feb 17, 2018

eriknelson merged commit b7e109d into openshift:master Feb 17, 2018

eriknelson mentioned this pull request Feb 17, 2018

Public apb action methods should be thoughtfully considered, and reworked. #771

Closed

maleck13 deleted the 475-last-operation-description branch February 18, 2018 09:01

eriknelson mentioned this pull request Feb 18, 2018

apb pkg public interface overhaul #773

Merged
