
Implement workqueue #63

Merged (2 commits) on Aug 22, 2017

Conversation

asymmetric
Contributor

@asymmetric asymmetric commented Aug 18, 2017

This PR adds a ServiceGroup workqueue.

Following the guidelines, all the event handlers (SG and Pod for now) do is retrieve the appropriate SG key and add a job to the queue.

The worker then consumes these jobs, and performs the necessary operations.

Closes #26, #51 and obsoletes #56.
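
In sketch form, the flow looks roughly like this (simplified; enqueueSG and syncServiceGroup are in the diff, processNextItem is an illustrative name for the worker):

// Event handlers only derive the SG key and enqueue it.
func (hc *HabitatController) enqueueSG(obj interface{}) {
    key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj)
    if err != nil {
        level.Error(hc.logger).Log("msg", err)
        return
    }
    hc.queue.Add(key)
}

// The worker pops keys off the queue and syncs them.
func (hc *HabitatController) processNextItem() bool {
    key, quit := hc.queue.Get()
    if quit {
        return false
    }
    defer hc.queue.Done(key)

    if err := hc.syncServiceGroup(key.(string)); err != nil {
        // Requeue with rate-limited backoff so transient failures are retried.
        hc.queue.AddRateLimited(key)
        return true
    }
    hc.queue.Forget(key)
    return true
}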

@asymmetric asymmetric requested a review from lilic August 18, 2017 14:38
@asymmetric
Contributor Author

@lilic are your E2E tests going to cover the functionality of the peer IP handling, e.g. that we correctly replace the IP of a dead pod?

@lilic
Contributor

lilic commented Aug 18, 2017

@asymmetric My idea was that we just need to test that a CM was updated after our Pod died; I'm not sure we need to care about whether it was the correct IP. I think that might be more of a unit-test concern, wdyt?

@lilic lilic mentioned this pull request Aug 18, 2017
Contributor

@lilic lilic left a comment

Before we can merge this I would say we should take care of:

  • Watching Deployments to avoid unnecessary calls.
  • Saving a store/cache of all the resources we do a Get/List call on. I think that only needs to be SGs, Pods and Deployments for now.


queue workqueue.RateLimitingInterface

store cache.Store
Contributor

Maybe we can describe this better and call it something like SGStore?

Contributor Author

For now it's the only store though :)

Contributor

Maybe, but it still stores just the SG cache, so we could still rename it to describe what it stores. It's up to you, I guess.

Contributor Author

I think I'll leave it like this, if you don't mind. I'll add some field comments to explain what each field does instead.
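
Something like this, say (a sketch, wording mine):

// queue holds the keys of ServiceGroups that need to be synced.
// Rate limiting gives us backoff when a sync keeps failing.
queue workqueue.RateLimitingInterface

// store is the informer-fed cache of ServiceGroups; reads go here
// instead of to the API server.
store cache.Store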

UpdateFunc: hc.onUpdate,
DeleteFunc: hc.onDelete,
AddFunc: hc.enqueueSG,
UpdateFunc: func(old, new interface{}) {
Contributor

I would suggest we move this to a separate func?

DeleteFunc: hc.onDelete,
AddFunc: hc.enqueueSG,
UpdateFunc: func(old, new interface{}) {
oldSG, ok1 := old.(*crv1.ServiceGroup)
Contributor

If we separate to a func, we can then also return early here and log the error. So:

if !ok1 {
...
}

Contributor Author

We can do that even without extracting to a named function though. I quite like this small anonymous function (and I stole it from here ;))
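
For reference, with the early returns folded into the anonymous function it would look roughly like this (the ResourceVersion comparison stands in for whatever check the actual diff performs):

UpdateFunc: func(old, new interface{}) {
    oldSG, ok1 := old.(*crv1.ServiceGroup)
    if !ok1 {
        level.Error(hc.logger).Log("msg", "unexpected object type", "obj", old)
        return
    }
    newSG, ok2 := new.(*crv1.ServiceGroup)
    if !ok2 {
        level.Error(hc.logger).Log("msg", "unexpected object type", "obj", new)
        return
    }
    // Only enqueue when something actually changed.
    if oldSG.ResourceVersion != newSG.ResourceVersion {
        hc.enqueueSG(newSG)
    }
},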

return true
}

func (hc *HabitatController) syncServiceGroup(key string) error {
Contributor

Maybe we can just call this sync? As you mention in the comment, we react to Pod events as well.

Contributor Author

This follows the example of other controllers, and I guess the idea is that this function syncs the resource this controller cares about, including all dependencies.

}

func (hc *HabitatController) syncServiceGroup(key string) error {
// we got woken up, so either:
Contributor

We can shorten this comment to something like (feel free to adjust the wording):
// We react to every watched resource event.

Contributor Author

That slipped in, wasn't planning on having it committed :)

But I guess if it's reworded it can be useful, as you say.

return err
}
if !exists {
// The SG was deleted.
Contributor

Wouldn't this be true if a Pod was deleted as well?

Contributor

Ignore the comment, the store naming confused me.

return hc.handleServiceGroupDeletion(key)
}

// it was either created or updated
Contributor

Maybe something like:

// Resource was either created or updated.
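
Putting the quoted fragments together, the sync path is shaped roughly like this (handleServiceGroupDeletion is from the diff; handleServiceGroupCreation is an illustrative name for the create/update branch):

func (hc *HabitatController) syncServiceGroup(key string) error {
    // We react to every watched resource event, so the key may refer
    // to an SG that has since been deleted.
    obj, exists, err := hc.store.GetByKey(key)
    if err != nil {
        return err
    }
    if !exists {
        // The SG was deleted.
        return hc.handleServiceGroupDeletion(key)
    }

    // Resource was either created or updated.
    sg, ok := obj.(*crv1.ServiceGroup)
    if !ok {
        return fmt.Errorf("unexpected object type in store: %v", obj)
    }
    return hc.handleServiceGroupCreation(sg)
}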

level.Error(hc.logger).Log("msg", err)
return
// Create Deployment, if it doesn't already exist.
d, dErr := hc.config.KubernetesClientset.AppsV1beta1Client.Deployments(apiv1.NamespaceDefault).Create(deployment)
Contributor

We should introduce watching for Deployments as well. After that we can just do a Get here against our cache/store instead of a Create against the API. This way we save calls to the API and use our cache better.

Right now, from what I can see, a Create call is being made every time we update/create our Pod or SG.
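
Roughly (assuming a hypothetical deploymentStore field fed by a Deployment informer):

// Check the local cache first; only talk to the API server on a miss.
_, exists, err := hc.deploymentStore.GetByKey(sg.Namespace + "/" + deployment.Name)
if err != nil {
    return err
}
if !exists {
    // Not in the cache, so fall back to creating it via the API.
    if _, err := hc.config.KubernetesClientset.AppsV1beta1Client.Deployments(sg.Namespace).Create(deployment); err != nil {
        return err
    }
}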

FieldSelector: fs.String(),
}

// TODO We can use the store for this query.
Contributor

I would suggest we do this now. From what I can see we already have access to the Pod store/cache, we just need to save it? Or is there anything else stopping us?

Contributor Author

I was thinking this and other things (like increasing the number of workers or using watchers for other types of resources) can be done as additional PRs later on, as there is already enough going on in this PR.

I see those changes as basically performance optimizations, and I think we can split them up from this PR, which is about core functionality.

Contributor

@lilic lilic Aug 21, 2017

I am fine with that (although I would prefer we do it in this PR), but then we shouldn't close the issue it relates to, but rather add a list of ACs to the issue. Otherwise we might forget about it.

Something like:

  • Introduce WorkQueue
  • Add Deployment watching
  • Store Pod cache

etc.
This way we can just pick up those tasks and keep track of them.

Contributor Author

If you mean #51, this PR does close that issue AFAICT. The features you mentioned are additional nice-to-haves IMO.

Or did you mean another issue?

Contributor Author

I'll add separate issues for the items you mentioned above.

return
}

new, ok2 := newObj.(*apiv1.Pod)
Contributor

new may not be a reserved word, but I would still like to avoid using it, since it shadows the builtin. WDYT?

Contributor Author

Didn't know about this.
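
For reference: new is a predeclared identifier (the builtin allocation function) rather than a keyword, so shadowing it compiles but is easy to misread. The rename is mechanical:

oldPod, ok1 := oldObj.(*apiv1.Pod)
newPod, ok2 := newObj.(*apiv1.Pod)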

@lilic
Contributor

lilic commented Aug 21, 2017

I tested this branch, and it seems like IP writing is not working the same as on master.

When I create two SGs one after another, the ConfigMaps contain the same IP.

oldSG, ok1 := oldObj.(*crv1.ServiceGroup)
newSG, ok2 := newObj.(*crv1.ServiceGroup)

if !ok1 {
Contributor

Maybe move it before newSG, ok2? That way we can return early and not create newSG.

@asymmetric asymmetric force-pushed the asymmetric/queue branch 2 times, most recently from e603102 to 6fc4f71 on August 21, 2017 10:08
@asymmetric
Contributor Author

Good catch! PTAL again.

Contributor

@lilic lilic left a comment

Tested again and it didn't seem to fix the problem I was having earlier. When creating two SGs, example1 and example2, both CMs contain the IP from the same SG (in my case example1).

Ignore me, forgot to pull in the latest changes. :)


newCM := newConfigMap(sg.Name, deploymentUID, leaderIP)

var cm *apiv1.ConfigMap
Contributor

Why is this declaration needed?

}))

running := metav1.ListOptions{
FieldSelector: fs.String(),
Contributor

I would just create fs and ls inline, as it's more readable.

Contributor Author

I find this style more readable, and unless there are guidelines going against it, I'd like to keep it.

It's also how we do it elsewhere.

return
// Create Deployment, if it doesn't already exist.
d, dErr := hc.config.KubernetesClientset.AppsV1beta1Client.Deployments(apiv1.NamespaceDefault).Create(deployment)
if dErr != nil {
Contributor

Why not use the standard err naming?

return dErr
}

d, dErr = hc.config.KubernetesClientset.AppsV1beta1Client.Deployments(sg.Namespace).Get(deployment.Name, metav1.GetOptions{})
Contributor

I am confused why we need to Get, as our Create already returns a deployment.

Contributor Author

Because if we're here, the Create has failed, and the deployment already exists.

Contributor

Hmm, I would suggest we restructure this, as right now, at first glance, it's not really clear what's happening.

I would actually reverse those and start by first checking whether our Deployment already exists, rather than trying to create it. If it doesn't exist, then create it.

Contributor Author

This way we save one call though. If the resource doesn't exist, we just create it, with no need to perform the check first.

I've seen the pattern in other controllers as well, so I'm more inclined to leave this as is.

Contributor Author

The performance argument will probably become irrelevant if/when we move to using a Deployments cache, so I think we can refactor this at that point.

But for now it seems like a good enough reason to me. I can add a comment to make it clearer though.

WDYT?
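
The commented version could read something like this (a sketch; apierrors here stands for k8s.io/apimachinery/pkg/api/errors):

// Optimistically Create the Deployment: in the common case it doesn't
// exist yet, and this saves the extra Get round-trip.
d, err := hc.config.KubernetesClientset.AppsV1beta1Client.Deployments(sg.Namespace).Create(deployment)
if err != nil {
    if !apierrors.IsAlreadyExists(err) {
        return err
    }
    // Create failed because the Deployment already exists, so fetch
    // the live object instead.
    d, err = hc.config.KubernetesClientset.AppsV1beta1Client.Deployments(sg.Namespace).Get(deployment.Name, metav1.GetOptions{})
    if err != nil {
        return err
    }
}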

@asymmetric
Contributor Author

Are there any more blockers? If not, can this get a green light?

Contributor

@lilic lilic left a comment

LGTM. We just shouldn't forget about the follow-up issues to this PR. Other than that, let's just clean up the commits and then 👍

@asymmetric
Contributor Author

I've created two issues: #65, #66.
