
Poor user experience when a channel provisioner is missing #779

Closed · bbrowning opened this issue Feb 4, 2019 · 11 comments
Labels: kind/bug
@bbrowning (Contributor)

I've helped users debug the scenario where a provisioner isn't installed a couple of times now, and it's not obvious that that's the problem unless you know what to look for.

Expected Behavior

Creating a Channel that references an unknown/uninstalled provisioner should give me some feedback that the provisioner is not installed: either immediate feedback at the time I create the Channel, via the webhook doing some checks, or feedback that surfaces as a Status on the Channel from something that monitors channels and marks those without matching provisioners.
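For instance, a status along these lines surfaced on the Channel would make the failure obvious from `kubectl get channel -o yaml`. This is only a sketch of the requested feedback: the `ProvisionerInstalled` condition type matches what was eventually implemented later in this thread, while the reason and message strings are illustrative:

```yaml
# Sketched status feedback; fields other than the condition type are illustrative.
status:
  conditions:
    - type: ProvisionerInstalled
      status: "False"
      reason: ProvisionerNotFound
      message: ClusterChannelProvisioner "in-memory-channel" is not installed
```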

Actual Behavior

The Channel is created successfully, ends up with no Status whatsoever, and when the Channel doesn't work you have to poke through other parts of the system to figure out what went wrong.

Steps to Reproduce the Problem

  1. Deploy Eventing from a release or from source, but don't deploy the in-memory channel provisioner.
  2. Create a Channel that uses the in-memory provisioner (see the sketch below).
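For step 2, a manifest roughly like the following reproduces the problem. The provisioner reference shape (apiVersion/kind) matches the v1alpha1 API discussed later in this thread; the provisioner name assumes the in-memory provisioner's default:

```yaml
apiVersion: eventing.knative.dev/v1alpha1
kind: Channel
metadata:
  name: test-channel
spec:
  provisioner:
    apiVersion: eventing.knative.dev/v1alpha1
    kind: ClusterChannelProvisioner
    name: in-memory-channel  # nothing rejects this even though it isn't installed
```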
@evankanderson (Member)

We should have a backup controller which adds some clue-ifying Conditions (like Ready=False and OwnedByProvisioner=False after the object has been live for 10+ minutes). We could also do this with a controller that looks for the correctly-named CCP and fills in the status if the CCP is not found; a rough version is sketched below.

/kind bug
/kind good-first-issue
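A sketch of that backup controller, written against controller-runtime for illustration (knative/eventing's actual controller wiring differs, and the status helper is hypothetical):

```go
package backup

import (
	"context"
	"time"

	apierrs "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	eventingv1alpha1 "github.com/knative/eventing/pkg/apis/eventing/v1alpha1"
)

// gracePeriod gives a real provisioner-owned controller time to claim the
// Channel before the backup controller marks it.
const gracePeriod = 10 * time.Minute

type backupReconciler struct {
	client.Client
}

func (r *backupReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var ch eventingv1alpha1.Channel
	if err := r.Get(ctx, req.NamespacedName, &ch); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// Assumes Spec.Provisioner is an optional ObjectReference.
	if ch.Spec.Provisioner == nil {
		return ctrl.Result{}, nil
	}

	// Too young: requeue until the object has been live for 10+ minutes.
	if age := time.Since(ch.CreationTimestamp.Time); age < gracePeriod {
		return ctrl.Result{RequeueAfter: gracePeriod - age}, nil
	}

	// Look for the correctly-named CCP (cluster-scoped, hence no namespace).
	var ccp eventingv1alpha1.ClusterChannelProvisioner
	err := r.Get(ctx, types.NamespacedName{Name: ch.Spec.Provisioner.Name}, &ccp)
	if apierrs.IsNotFound(err) {
		// Hypothetical helper standing in for the "clue-ifying Conditions"
		// (e.g. Ready=False, ProvisionerInstalled=False).
		ch.Status.MarkProvisionerNotInstalled(ch.Spec.Provisioner.Name)
		return ctrl.Result{}, r.Status().Update(ctx, &ch)
	}
	return ctrl.Result{}, err
}
```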

@knative-prow-robot added the kind/bug and kind/good-first-issue labels Feb 5, 2019
@evankanderson (Member)

/milestone 0.4.0

@knative-prow-robot (Contributor)

@evankanderson: The provided milestone is not valid for this repository. Milestones in this repository: [v0.4.0]

Use /milestone clear to clear the milestone.

In response to this:

/milestone 0.4.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@evankanderson (Member)

/milestone v0.4.0

@knative-prow-robot knative-prow-robot added this to the v0.4.0 milestone Feb 5, 2019
@vaikas (Contributor) commented Feb 7, 2019

Another option might be to have the webhook set the Status to a "Failure" condition that then gets cleared by an owning controller? Not sure if that's any simpler or whether it would avoid the potential races when multiple controllers own the same object.

@knative-prow-robot (Contributor)

@akashrv: GitHub didn't allow me to assign the following users: akashrv.

Note that only knative members and repo collaborators can be assigned.
For more information please see the contributor guide

In response to this:

/assign


@akashrv (Contributor) commented Feb 11, 2019

/assign

akashrv added a commit to akashrv/eventing that referenced this issue Feb 25, 2019
Summary of issue: if we create a Channel with a non-installed channel
provisioner, the channel gets created but events never flow through, and
there is no way for the user to know what is wrong in the eventing pipeline.
This fix:
1. Updates the validation admission webhook for Channel to fail
channel creation when the ChannelProvisioner doesn't exist. It doesn't
call the API server; instead it uses the cache from the informer on
ChannelProvisioners.
2. Adds a check to ensure that the Channel's provisioner is of kind
ClusterChannelProvisioner.
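For illustration, the two checks described in this commit message could look roughly like the following in the webhook. The lister type and import paths are sketched from the generated-client conventions, not quoted from the PR:

```go
package webhook

import (
	"fmt"

	apierrs "k8s.io/apimachinery/pkg/api/errors"

	eventingv1alpha1 "github.com/knative/eventing/pkg/apis/eventing/v1alpha1"
	listers "github.com/knative/eventing/pkg/client/listers/eventing/v1alpha1"
)

type channelValidator struct {
	// ccpLister reads from the shared informer's cache, so validation does
	// not add an API-server round trip on every Channel create.
	ccpLister listers.ClusterChannelProvisionerLister
}

func (v *channelValidator) validate(ch *eventingv1alpha1.Channel) error {
	p := ch.Spec.Provisioner
	if p == nil {
		return fmt.Errorf("channel %q has no provisioner", ch.Name)
	}
	// Check 2: the reference must be a ClusterChannelProvisioner.
	if p.Kind != "ClusterChannelProvisioner" {
		return fmt.Errorf("provisioner must be of kind ClusterChannelProvisioner, got %q", p.Kind)
	}
	// Check 1: the named provisioner must exist in the informer cache.
	if _, err := v.ccpLister.Get(p.Name); apierrs.IsNotFound(err) {
		return fmt.Errorf("ClusterChannelProvisioner %q is not installed", p.Name)
	} else if err != nil {
		return err
	}
	return nil
}
```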
akashrv added a commit to akashrv/eventing that referenced this issue Feb 26, 2019
This reverts commit 60665a2.
akashrv added a commit to akashrv/eventing that referenced this issue Feb 26, 2019
akashrv added a commit to akashrv/eventing that referenced this issue Feb 26, 2019
@akashrv (Contributor) commented Feb 27, 2019

After digging around this issue, here is a proposal:

  1. We will add a check in the channel validation webhook that the provisioner specified is of apiVersion: eventing.knative.dev/v1alpha1 and kind: ClusterChannelProvisioner.
  2. We will not check whether the provisioner is installed, and here is the reasoning:

It seems channel provisioners exist so that a "developer" can query and check what kinds of eventing provisioners are installed. (Another probable reason is to know whether the provisioned infra is ready or not; we will discuss that later.)
When the "operator" installs a provisioner's infra, such as the one in in-memory-channel.yaml, the yaml file creates the channel controllers, the supported provisioner(s), and anything else needed, such as the dispatcher service. I'll refer to all of this as the "provisioner infra".
Once the operator installs the provisioner infra, a developer can query the clusterchannelprovisioners, see that the infra is there, and start creating channels that point to it.

However, if for some reason the ClusterChannelProvisioner is deleted (not the infra, just the provisioner object), the pipelines continue to work, and that is what happens today, except for Kafka and in-memory***. I am assuming that in the future this could be enforced with different authorization on the artifacts created by the provisioner infra.

So the existence of the ClusterChannelProvisioner does not affect data-plane activities, and hence there is no need for any controller to watch the Channel and verify whether the specified provisioner is installed.

Q1. What about orphaned channels, for any reason, including an invalid provisioner?
For any Kubernetes object that is not being watched (or was never watched), the Status field doesn't exist. So any channel without a Status field is one that is not watched, and hence not ready for anything. No need to create new conditions, IMO.

Q2. What about Condition.Ready for an installed provisioner?
This needs more discussion. The bigger question is what Condition.Ready on a provisioner means. Does it mean that the provisioner infra is ready, or that the provisioner infra is installed and someone is watching the provisioner? In other words, is the control plane ready or the data plane ready? My opinion is that since the ClusterChannelProvisioner resource object is meant for a developer to know that the infra is installed, Installed = control-plane readiness and Condition.Ready = data-plane readiness. I would like to hear others' thoughts on this.
Once we decide what Ready means, we can decide whether there needs to be a controller for ClusterChannelProvisioners and, if so, what it should do.

Q3. What about the case where a developer creates a Channel with, say, the in-memory provisioner, but the provisioner was never installed by the operator?

  • In such a case we should have a webhook that disallows creation of such channels. There have been concerns about late binding; however, the provisioner is a control-plane component and the channel is a data-plane component. I don't believe late binding is necessary in this case, or that not having it would affect anyone. The provisioner gets installed when someone like an operator installs the provisioner infra using one single yaml from an official release, whereas channels are created by developers after provisioning.

This will help us simplify things: no unnecessary dependencies between objects, and no tricky things done through webhooks and/or multiple controllers watching the same object.

Thoughts/suggestions/concerns: please share. For Google folks, let's talk in person, and then we can summarize the minutes of the meeting here. (I just prefer in-person communication to writing. :))

***In Kafka and in-memory there is a clusterchannelprovisioner controller that creates a K8s Service for the dispatcher, while the rest of the infra is created by the yaml when the provisioner is installed.
3. We will remove this controller code and move the K8s Service creation into the yaml along with the rest of the artifacts.

@bbrowning (Contributor, Author)

So, as a user, when I create a Channel that specifies a provisioner that is not installed, how will I know I did something wrong? And how will I know how to fix it?

akashrv added a commit to akashrv/eventing that referenced this issue Mar 2, 2019
Creating a default channel controller that will watch every channel
and update its status in case the channel is not being watched by any controller.
This could happen if the end user creates a channel but doesn't install the
provisioner.
Issue#779: knative#779
@akashrv (Contributor) commented Mar 2, 2019

@bbrowning: please see the above PR.
It creates a default channel controller. This controller watches every Channel and sets its ProvisionerInstalled condition to false if there is no provisioner watching the channel.
The condition is set back to true by the individual provisioner controllers when they start watching the Channel.
There were concerns about multiple controllers trying to update the status of the same object. This is not a concern with the current implementation, and there are multiple places in Kubernetes where the same thing is done.
The API server's Update API supports optimistic concurrency based on the resource version. If the object changes on the server side between the controller client's read and its update, the update fails with a CONFLICT error, which is handled by re-queueing the request for reconcile.
I have verified this behavior with the current code on a GKE cluster.
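The PR handles a CONFLICT by requeueing the key for another reconcile; the equivalent inline pattern with client-go's retry helper looks roughly like this (type and condition-helper names are assumptions, not the PR's actual code):

```go
package defaultchannel

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"

	eventingv1alpha1 "github.com/knative/eventing/pkg/apis/eventing/v1alpha1"
)

type reconciler struct {
	client.Client
}

// markProvisionerNotInstalled sets ProvisionerInstalled=False, retrying on
// the CONFLICT that optimistic concurrency produces when another controller
// writes the Channel between our read and our update.
func (r *reconciler) markProvisionerNotInstalled(ctx context.Context, key types.NamespacedName) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		var ch eventingv1alpha1.Channel
		// Re-read on each attempt so the update carries a fresh resourceVersion.
		if err := r.Get(ctx, key, &ch); err != nil {
			return err
		}
		ch.Status.MarkProvisionerNotInstalled() // hypothetical condition helper
		// The API server rejects this write with a Conflict error if the
		// resourceVersion is stale; RetryOnConflict then re-runs the closure.
		return r.Status().Update(ctx, &ch)
	})
}
```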

@vaikas vaikas modified the milestones: v0.4.0, v0.5.0 Mar 6, 2019
knative-prow-robot pushed a commit that referenced this issue Mar 7, 2019
* Creating a default channel controller that will watch every channel
and update its status in case the channel is not being watched by any controller
This could happen if the end user creates a channel but doesn't install the
provisioner.
Issue#779: #779

* Removed commented lines

* Fixed some typos

* Changes based on comments on code review (pull request)

* Moved the code to set ChannelConditionProvisionerInstalled=True to Initialize()
function from MarkProvisioned()

* Changed all %v to %s when logging strings

* Updating some formatting issues called out in PR
@bbrowning (Contributor, Author)

Thanks for improving this experience!
