Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for --identity-external-issuer=true #3552

Closed

Conversation

zaharidichev
Copy link
Member

@zaharidichev zaharidichev commented Oct 9, 2019

This PR adds support for using cert-manager as the source of issuer certs. When we specify
--identity-external-issuer=true the cert, key and trust anchors are loaded from the k8s secret that should already be present. As cert manager rolls certificates, the identity service monitors the path where the secret is mounted and automatically refreshes them.

Signed-off-by: zaharidichev zaharidichev@gmail.com

@zaharidichev zaharidichev changed the title WIP: Add support for --identity-issuer-mode flag to install cmd Add support for --identity-external-issuer=true Oct 14, 2019
@zaharidichev zaharidichev force-pushed the zd/cert-manager-cli-support branch 2 times, most recently from 049b28c to d0576a1 Compare October 14, 2019 17:19
@olix0r olix0r self-requested a review October 15, 2019 17:49
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zaharidichev, I started looking and everything makes sense to me so far 👍
So far I've only spotted one possible improvement described below. I'll follow up later with more feedback.

@@ -359,6 +362,12 @@ control plane.`,
os.Exit(1)
}

shouldExist := options.identityOptions.identityExternalIssuer
if err := errIfLinkerdIdentityissuerSecretMissing(shouldExist); err != nil && !options.ignoreCluster {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could wrap that condition with if !options.ignoreCluster && shouldExist and then have errIfLinkerdIdentityissuerSecretMissing() have no args and only check when shouldExist is true, to avoid the network calls when shouldExist is false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a matter of fact, I'm unable to run a naked linkerd install because it's forcing me to have the namespace already. This is probably also why the integration tests are failing.

@@ -458,6 +467,11 @@ func (options *installOptions) recordableFlagSet() *pflag.FlagSet {
flags.StringVarP(&options.controlPlaneVersion, "control-plane-version", "", options.controlPlaneVersion, "(Development) Tag to be used for the control plane component images")
flags.MarkHidden("control-plane-version")

flags.BoolVar(
&options.identityOptions.identityExternalIssuer, "identity-external-issuer", options.identityOptions.identityExternalIssuer,
"Whether to use an external identity issuer (false by default)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the standard followed by other commands is (default false)


secret, err := kubeAPI.CoreV1().Secrets(controlPlaneNamespace).Get(k8s.IdentityIssuerSecretName, metav1.GetOptions{})
if (secret == nil || kerrors.IsNotFound(err)) && shouldExist {
return fmt.Errorf("'linkerd-identity-issuer' needs to exist if --identity-external-issuer=true")
Copy link
Member

@alpeb alpeb Oct 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you can be a bit more explicit with the 'linkerd-identity-issuer' secret needs to exist if...?

@alpeb
Copy link
Member

alpeb commented Oct 15, 2019

Ref #3531

creds, err := tls.ValidateAndCreateCreds(externalIssuerData.issuerCrt, externalIssuerData.issuerKey)

if err != nil {
log.Fatalf("Failed to read CA from %s: %s", consts.IdentityIssuerSecretName, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, log.Fatalf isn't producing any output in the CLI so when there's an error the command just fails with an empty output. I would replace this (and the one below) with something like return nil, fmt.Errorf("Failed to read CA from %s: %s", consts.IdentityIssuerSecretName, err)

return nil, fmt.Errorf("failed to verify issuer credentials for '%s' with trust anchors: %s", fscw.expectedName, err)
}

log.Infof("Loaded issuer cert:\nCert: %s\nKey: %s", creds.EncodeCertificatePEM(), creds.EncodePrivateKeyPEM())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we did this elsewhere, but better not log the private key IMO

@ihcsim
Copy link
Contributor

ihcsim commented Oct 16, 2019

@zaharidichev Might wanna branch off the linkerd/linkerd2 repo, instead of a fork. There are a few things that CI doesn't run on forks, including Docker build and integration test, even though they show up with ✔️.

charts/linkerd2/templates/identity.yaml Outdated Show resolved Hide resolved
cli/cmd/install.go Outdated Show resolved Hide resolved
cli/cmd/install.go Outdated Show resolved Hide resolved
@@ -359,6 +362,12 @@ control plane.`,
os.Exit(1)
}

shouldExist := options.identityOptions.identityExternalIssuer
if err := errIfLinkerdIdentityissuerSecretMissing(shouldExist); err != nil && !options.ignoreCluster {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to add this check to newCmdInstallConfig which corresponds to the linkerd install config subcommand. You can learn more about Linkerd's multi-stage installation here.

The challenge is that recordableFlagSet can't be part of linkerd install config. So you will have to add --identity-external-issuer to allStageFlagSet. Once moved, in order for this new flag to be persisted in the linkerd-config config map, you will have to update the proto/config.proto pb definition, by adding a new field to IdentityContext, which is a nested field inside Global.

You can update the protobuf dependencies and auto-generated code with:

bin/protoc-go.sh

return nil, fmt.Errorf("error fetching external issuer config: %s", err)
}

_, err = kubeAPI.CoreV1().Namespaces().Get(controlPlaneNamespace, metav1.GetOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not needed, since errIfLinkerdIdentityissuerSecretMissing() already perform this check.

}

secret, err := kubeAPI.CoreV1().Secrets(controlPlaneNamespace).Get(k8s.IdentityIssuerSecretName, metav1.GetOptions{})
if secret == nil || kerrors.IsNotFound(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just simplify this to if err != nil to communicate the intent of returning err if it isn't nil, regardless of what the error is.


secret, err := kubeAPI.CoreV1().Secrets(controlPlaneNamespace).Get(k8s.IdentityIssuerSecretName, metav1.GetOptions{})
if secret == nil || kerrors.IsNotFound(err) {
return nil, fmt.Errorf("'linkerd-identity-issuer' needs to exist if --identity-external-issuer=true")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I prefer that we don't bury the original err, but to include it in the return value. Why not just:

return nil, err

IIRC if secret doesn't exist, won't err just be something like:

Error from server (NotFound): secrets "foo" not found

At least, that's the behaviour of kubectl.

@@ -246,7 +246,7 @@ func (options *upgradeOptions) validateAndBuild(stage string, k kubernetes.Inter
}
configs.GetGlobal().IdentityContext = toIdentityContext(identity)
} else {
identity, err = fetchIdentityValues(k, idctx)
identity, err = fetchIdentityValues(k, idctx, options.identityOptions.identityExternalIssuer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, options.identityOptions.identityExternalIssuer is reading the value from the CLI. We need to retrieve the value from the linkerd-config config map. You should be able to retrieve it from the configs.GetInstall().GetFlags() slice.

(Note that this will change if we change --identity-external-issuer to a global field in the protobuf definition as suggested above, in order to support linkerd install config.)

@@ -369,7 +370,15 @@ func fetchIdentityValues(k kubernetes.Interface, idctx *pb.IdentityContext) (*ch
}, nil
}

func fetchIssuer(k kubernetes.Interface, trustPEM string) (string, string, time.Time, error) {
func fetchIssuer(k kubernetes.Interface, trustPEM string, externalIssuer bool) (string, string, time.Time, error) {
crtName := k8s.IdentityIssuerCrtName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's too bad that we can't just change k8s.IdentityIssuerCrtName and k8s.IdentityIssuerKeyName to what cert-manager produces, so that we won't need the externalIssuer boolean here and in the identity service. Changing these constants will require an update to the variables defined in values.yaml, which will be a breaking change for some users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, whats even worse is that there is the possibility for this to change over time and if some people are running older/newer versions of cert-manager the names of their secret keys might be different. So I wonder whether we should make all of this configurable ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping it the way you have it is fine. Making it configurable can be a double-edged sword because the last thing we want is to support cases where users can accidentally provide some unsupported secret keys in the k/v pairs. If cert-manager changed their API, then it's a conscious effort for us and our users to upgrade.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than passing a boolean in, can we just check for both filenames?

}

// StartWatching starts watching the filesystem for cert updates
func (fscw *FsCredsWatcher) StartWatching() error {
Copy link
Contributor

@ihcsim ihcsim Oct 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we are trying to do too much in this method. How about instead of sending *tls.CA to the channel, we just send it an "event", and then let identity.Service do most of the TLS-related stuff?

I think there might be a way to make this and the changes to pkg/identity/services.go simpler. I am experimenting with it in this incomplete gist. The initializing and switching of the issuer object will be interesting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at the diff. I though about something similar in the beginning. We can go both ways but my initial idea was that this makes the service harder to test in case we want to add some extensive testing to it.

Since the service is has quite a bit of logic in there that has to do with issuing certs, we might want to exercise some of these scenarios by having it call into tls.CA created from different certs. If that is the case, we will have to put the issuer certs on disk and have different ones for every test case while pinging the service some dummy event. Supplying the tls.CA via the channel allows us to mock it in a much easier way I think.

Wrt to initializing and swapping the tls.CA, I dont see much of a problem as long as the lock is held. Maybe I am missing smth here? In any way, we can go either direction. Do not really have a strong opinion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats more interesting to me however is all the error handling in case we have errors from the error channel or invalid issuer certs written to the file system

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit of moving the TLS code out of the CredsWatcher is that we can convert it into a generic file watcher package, say pkg/watcher/file.

In terms of testing the identity.Service, I think certain unexported functions (like loadCredentials if we choose to move it out of the file watcher) can be mock by using function pointers. There are some examples in cli/cmd/install.go. Testing Certify() will be difficult even in its current state.

If that is the case, we will have to put the issuer certs on disk and have different ones for every test case while pinging the service some dummy event.

Sounds like a good integration test. We can create a separate issue for that, as I assume there will be quite a bit of new test code.

Whats more interesting to me however is all the error handling in case we have errors from the error channel or invalid issuer certs written to the file system

Good point. My first thought is first of all, the file watcher shouldn't worry about how to handle it; the identity service should. Then, the old cert shouldn't be rotated and the identity service shouldn't go down. As to how we notify the users in addition to generate logs, we can consider publishing k8s events. I don't know if we need to tackle this here, but controller/proxy-injector/webhook.go has some good examples on recording k8s events.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all great points. Thanks!

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work so far! Some high-ish level questions:

  • It would be really great if we did not have to persist this flag value. If think if we check for both file names, as in my comment above, we will no longer need to persist it.
  • Why does the flag need to apply to install config? as far as I can tell, it affects control-plane concerns only, not cluster concerns. Can it just be a install control-plane flag?
  • If I understand correctly, to use an external issuer, you MUST do a multistage install:
    • linkerd install config (to create the linkerd namesapce)
    • external issuer create secret in linkerd namespace
    • linkerd install control-plane reads the issuer secret
      Does that sound right?

Let me know if I've misunderstood anything.

@@ -4,7 +4,7 @@
###
### Identity Controller Service
###
{{ if .Identity.Issuer -}}
{{ if and (.Identity.Issuer) (eq .Identity.Issuer.External false) -}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still learning about helm template functions... could this be not .Identity.Issuer.External instead?

@@ -259,6 +262,13 @@ resources for the Linkerd control plane. This command should be followed by
os.Exit(1)
}

shouldExist := options.identityOptions.identityExternalIssuer
if !options.ignoreCluster && shouldExist {
if err := errIfLinkerdIdentityIssuerSecretMissing(); err != nil && !options.ignoreCluster {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think options.ignoreCluster shouldn't be changed at this point, and it was just checked on line 266

@@ -187,7 +187,7 @@ func TestRender(t *testing.T) {
}
}

func testInstallOptions() (*installOptions, error) {
func testInstallOptions(addCertDataOptions bool) (*installOptions, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about always including the certDataOptions but manually unsetting fields in the returned options when necessary?

@@ -369,7 +370,15 @@ func fetchIdentityValues(k kubernetes.Interface, idctx *pb.IdentityContext) (*ch
}, nil
}

func fetchIssuer(k kubernetes.Interface, trustPEM string) (string, string, time.Time, error) {
func fetchIssuer(k kubernetes.Interface, trustPEM string, externalIssuer bool) (string, string, time.Time, error) {
crtName := k8s.IdentityIssuerCrtName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than passing a boolean in, can we just check for both filenames?

@olix0r
Copy link
Member

olix0r commented Oct 16, 2019

@adleong

It would be really great if we did not have to persist this flag value. If think if we check for both file names, as in my comment above, we will no longer need to persist it.

Can you elaborate on this?

To my reasoning: if it's specified on the commandline, it should be persisted.

@ihcsim
Copy link
Contributor

ihcsim commented Oct 16, 2019

you MUST do a multistage install:

Not necessarily. Some users already create their own linkerd namespace outside of linkerd install. (Hence, we have the {{.InstallNamespace}} variable in the values.yaml.)

linkerd install config (to create the linkerd namesapce)
external issuer create secret in linkerd namespace
linkerd install control-plane reads the issuer secret
Does that sound right?

Yes.

Why does the flag need to apply to install config?

Yes, you are right. Looking closer at the Helm templates, the flag isn't used until the linkerd install control-plane phase. cc @zaharidichev

@adleong
Copy link
Member

adleong commented Oct 17, 2019

Not necessarily.

Ah, let me rephrase: simply doing a single step install with linkerd install --identity-external-issue=true will not work.

Can you elaborate on this?

As far as I can tell, the identity-external-issuer enables the following behavior:

  • The issuer secret is NOT created
  • The trust root is copied from the issuer secret to the Linkerd config

These are both install behaviors and the identity-external-issuer flag doesn't change any behaviors after install. Thus there is no reason to persist it. This is great because it creates a very clear loosely coupled boundary between the external issuer and Linkerd.

The exception to this is that the names of the cert and key files that cert-manager creates are different from the ones that Linkerd creates. Rather than using the identity-external-issuer to determine which file name to read, we could check both filenames.

@ihcsim
Copy link
Contributor

ihcsim commented Oct 17, 2019

Thus there is no reason to persist it.

My thought around this is that, from a support perspective, there are times when I'll like to know how the control plane is installed, along with the install options used. With this flag persisted, I can tell for sure at one point in an older release, the user had decided to use cert-manager, which will be helpful with certs-related issues, as per #3578 (comment), it sounds like cert-manager has some sort of revert/reconciliation mechanism, if the user tries to override the managed cert.

@zaharidichev
Copy link
Member Author

@ihcsim I see. I thinkg @adleong had some very good points around the lack of need for persisting that as we can deduce even during upgrade whether cert-manager was used by simply looking at the contents of the secret. That being said though, you comments also make sense. So should we persist it after all or not. I guess it does not hurt. Should I modify it to include it in the recordableFlagSet ?

@olix0r
Copy link
Member

olix0r commented Oct 17, 2019

@adleong ok, i see your point. I'm not 100% sold that it's worth the optimization. I don't see any harm in making the persisted flags explicitly reflect the install mechanism (i.e. for diagnostics, etc). Is there a reason that persisting the flag is harmful?

@adleong
Copy link
Member

adleong commented Oct 17, 2019

Persisting the flag is not harmful but I think the abstraction boundary is more clear if we don't. i.e. Linkerd simply acts on the issuer secret without knowing how it was created.

And as @zaharidichev said, we can reverse engineer if cert-manager was used by looking at the file names.

@zaharidichev
Copy link
Member Author

@adleong @ihcsim added some more changes along the comments you have. Currently I do not persist the flag. Let me know if we want to change that. Also shuffled some things around in the identitiy service. Btw, why is my CI not running on this. Is it because the PR has conflicts ?

@adleong
Copy link
Member

adleong commented Oct 17, 2019

why is my CI not running on this.

I... have no idea. Try pushing a new commit?

@siggy
Copy link
Member

siggy commented Oct 17, 2019

@zaharidichev yup it's the conflicts. ci merges your PR into master for testing.

@ihcsim
Copy link
Contributor

ihcsim commented Oct 18, 2019

@zaharidichev Let's not worry about persisting the flag for now. I am more keen on getting this tested, and merged so that users can try it out in the next edge. If we see the needs to add it in the future, we just need to change its flagset to recordableFlagSet (because we don't need it for linkerd install config anymore).

Btw, why is my CI not running on this

In addition to the conflicts, as a reminder, because your branch is on a fork, certain CI steps (like Docker build and integration tests) won't run even though GitHub Action still gives you the ✔️.

@@ -259,6 +261,7 @@ func (options *upgradeOptions) validateAndBuild(stage string, k kubernetes.Inter
return nil, nil, fmt.Errorf("could not build install configuration: %s", err)
}
values.Identity = identity
values.InstallNamespace = !identity.Issuer.External
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? My understanding of our earlier conversation around omitting the persistence of the --identity-external-issuer flag is that its value isn't needed during upgrade. If during upgrade, the code needs to make decisions based on whether an issuer is external, then I don't think what the isExternal() function below is doing is reliable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ihcsim I see. I think @adleong had some very good points around the lack of need for persisting that as we can deduce even during upgrade whether cert-manager was used by simply looking at the contents of the secret.

Well if this is not set, then an attempt to include the namespace in the produced output will be made, right? Just like I noted here, this can be deduced without persisting. If thats unreliable, lets persist it then. The isExternal() function is doing pretty much the same thing that the identity service is doing when looking at the contents of the secret.

I dont have the full context around the potential use and edge cases here, so its quite difficult for me to decide for sure. But why would it be unreliable? If the secret contains contents from cert manager and the flag is set to not use external issuer, it will not work at all after all. Again, up to you what we do here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well if this is not set, then an attempt to include the namespace in the produced output will be made, right?

My thought is that kubectl apply will leave it alone, unless there are some differences. But yes, if the users created their own linkerd namespace, then linkerd upgrade should leave it alone, like what you did here.

lets persist it then.

Sure. Thanks. Sorry for the back-and-forth.

Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
@zaharidichev
Copy link
Member Author

closing in favor of #3600

@zaharidichev zaharidichev removed this from PR in Stable Track Oct 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Stable Track
Awaiting triage
Development

Successfully merging this pull request may close these issues.

None yet

6 participants