Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check to verify injector is operating correctly #3708

Open
wants to merge 2 commits into
base: master
from

Conversation

@tegioz
Copy link
Collaborator

tegioz commented Nov 13, 2019

A new check has been added to validate that the injector is operating correctly. It creates a pod using dry-run and checks if the pod was injected successfully.

CLI screenshots:

Screenshot 2019-11-13 at 18 06 35

Screenshot 2019-11-13 at 18 28 25

Dashboard screenshot:

Screenshot 2019-11-13 at 18 08 37

Mentioned in a comment in #3224

Signed-off-by: Sergio Castaño Arteaga tegioz@icloud.com

@tegioz tegioz self-assigned this Nov 13, 2019
@tegioz tegioz added this to In Progress in L5D-Dashboard-EVOL via automation Nov 13, 2019
@tegioz

This comment has been minimized.

Copy link
Collaborator Author

tegioz commented Nov 13, 2019

Hi @grampelberg

This PR should cover the second part of #3224.

@tegioz tegioz moved this from In Progress to Pending Reviewer Approval in L5D-Dashboard-EVOL Nov 13, 2019
Copy link
Member

adleong left a comment

This is cool! It looks like dry run is not enabled for my cluster:

‼ injector is operating correctly
    error creating pod to check injector: the dryRun alpha feature is disabled

Can we detect if dryRun is available and skip this check if it is not?

charts/linkerd2/templates/web-rbac.yaml Outdated Show resolved Hide resolved
@tegioz tegioz force-pushed the tegioz/check-injector branch from 40decec to 47888c3 Nov 14, 2019
@tegioz

This comment has been minimized.

Copy link
Collaborator Author

tegioz commented Nov 14, 2019

Hi @adleong

‼ injector is operating correctly
    error creating pod to check injector: the dryRun alpha feature is disabled

Can we detect if dryRun is available and skip this check if it is not?

I was thinking that maybe checks could have a requiredFeatures field that we could check when running the checks. That way we could skip them easily if the required features are not enabled. But I haven't found yet anything in the k8s api to check what feature gates are enabled in the cluster.

{
    description: "injector is operating correctly",
    requiredFeatures: []featuregate.Feature{
        features.DryRun,
    },
}

Another option would be to do something similar but based on the Kubernetes cluster version. The problem is this doesn't seem very reliable because users can disable feature gates and we wouldn't be covering when features are in alpha either.

Going back to the first solution, if we can't find a way to check for the feature gates enabled, we could build something ourselves to check for the ones the checks depend of. In this case it'd be just to check DryRun. I could try to just use it attempting to create a pod and based on the error returned track the feature as enabled or disabled. This functionality could be packaged as something like FeatureGateChecker.

The check is marked as a warning anyway, so it won't make the checks fail. And the error returned is informing the user why it didn't work, which can lead him to enable that feature gate (he wouldn't know if we just skip it). Also, the minimum k8s version supported is 1.12, in which DryRun is alpha. Starting 1.13 it should be beta and be enabled by default, so eventually this may not even be a problem at all.

@tegioz tegioz force-pushed the tegioz/check-injector branch from 47888c3 to 4752f26 Nov 14, 2019
@grampelberg

This comment has been minimized.

Copy link
Member

grampelberg commented Nov 14, 2019

Oh man, this is so awesome! There are quite a few users who will really appreciate this check.

This functionality could be packaged as something like FeatureGateChecker.

I think having this functionality will be pretty great. We're starting to run into features that are only enabled on modern clusters which we'd like to use and can't because we have to stick to the lowest common denominator right now. That said, I think we should tackle that as a separate PR and spend some time thinking about what we'd like to do. I'm also a little unconvinced it is possible, last time I played around with this, it required getting the api-server flags themselves and it wasn't possible to do in an automated fashion.

Starting 1.13 it should be beta and be enabled by default, so eventually this may not even be a problem at all.

I'm good bumping our required version up to 1.13 as 1.14 is the earliest version supported by k8s itself right now.

@alpeb
alpeb approved these changes Nov 14, 2019
Copy link
Member

alpeb left a comment

This looks great to me 👍 Plus I learnt about unstructured!

@alpeb

This comment has been minimized.

Copy link
Member

alpeb commented Nov 14, 2019

Also please rebase against master one more time to have those tests pass.

Copy link
Member

zaharidichev left a comment

This looks pretty awesome! TIOLI: You can add unit test to healthcheck_test.go although I am not really sure it will bring that much value.

L5D-Dashboard-EVOL automation moved this from Pending Reviewer Approval to Approved Nov 15, 2019
@tegioz tegioz force-pushed the tegioz/check-injector branch 2 times, most recently from 4df6f89 to 61e3502 Nov 15, 2019
Copy link
Member

adleong left a comment

I can see how this is certainly a very useful check that will be helpful to a lot of folks. However, I want to point out that we're crossing a doorway here because this is the first time (as far as I know) that the CLI will need write-permissions on the cluster. Even though we're only operating in dry-run mode, I think this is a big deal and violates our initial design choices to never have the CLI do any kind of write operation (or more specifically, any operation that requires write privileges).

I'm trying to be practical and acknowledge that A) no write operation is actually performed and B) this is a useful feature; but I just can't shake the feeling that this sets a precedent and starts us down a path that is different from the initial vision for the CLI.

Anyway, I think we should be careful here and make sure that the benefit we get justifies eroding the conceptual boundaries between the responsibilities of the components.

@grampelberg

This comment has been minimized.

Copy link
Member

grampelberg commented Nov 22, 2019

@adleong I thought we'd split this into two pieces - checking connectivity and actually running a test. I'm in agreement with you that we shouldn't be modifying cluster state from the CLI currently. Let's chat in person to get a better grasp on what's going on here.

@grampelberg grampelberg self-assigned this Dec 2, 2019
@tegioz tegioz moved this from Approved to In Progress in L5D-Dashboard-EVOL Dec 4, 2019
@tegioz tegioz force-pushed the tegioz/check-injector branch from 61e3502 to f18be4f Dec 4, 2019
Mentioned in a comment in #3224

Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>
@tegioz tegioz force-pushed the tegioz/check-injector branch from f18be4f to 1b2f284 Dec 4, 2019
Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>
@tegioz

This comment has been minimized.

Copy link
Collaborator Author

tegioz commented Dec 4, 2019

Hi @adleong

I've created a new checks category called LinkerdControlPlaneDryRunChecks and added the injector check to it as suggested in #3701.

At the moment it's enabled by default for the CLI, but users can opt-out using --dry-run=false. Happy to leave it disabled by default if that makes more sense. On the dashboard side, I've included this new category but we can remove it if that fits better.

If this approach works I can do something similar for #3701.

Thanks!

@grampelberg grampelberg assigned adleong and unassigned grampelberg Dec 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
L5D-Dashboard-EVOL
  
In Progress
5 participants
You can’t perform that action at this time.