Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(kubernetes): Improved error messages on failing deployments #3442

Merged
merged 13 commits into from
Mar 13, 2019

Conversation

german-muzquiz
Copy link
Contributor

When a Deploy (Manifest) stage fails because of a misconfigured kubernetes account (like lack of permissions in kubeconfig file), UI shows a misleading error message about a "notValidKind", where it hides the underlying issue:

default screen shot 2019-03-08 at 10 49 34 am

The code keeps track of kubernetes kinds that it can and cannot access using lists. When clouddriver starts, by default it tries to read all known kinds, and in case of errors, it adds them to the omitKinds list.

This PR changes the list of omitKinds to a map, where the value is a string explaining the reason for being omitted. This string is used to render the final error message to users:

screen shot 2019-03-08 at 11 38 06 am

screen shot 2019-03-08 at 11 20 51 am

@german-muzquiz
Copy link
Contributor Author

cc @ezimanyi

Copy link
Contributor

@ezimanyi ezimanyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR---agreed that the error message is not too helpful and we should improve it.

I'm not sure that we need to push the error message this far down in the data model, however, and store a custom-generated string for each omitted kind. Some alternatives I can think of:

  • Make the error message better, but still generic...ex: "Could not deploy kind 'KindName'. Please check that it is a valid Kubernetes kind, is configured for your account, and that your account has permission to access this Kind." The disadvantage is that users wouldn't know which particular case applied, but this would certainly be a lot better.
  • If we do want to give a more detailed error to the user, I think an enum of reasons for omitting a kind would make more sense than a new string for each one. (And I think it would be enough to group the read errors into a single category and not store the actual error, which will be in the logs and seems better suited for a log file. If it's easier to track, the kinds that lead to read errors could go in their own set/array rather than appended to omitKinds as long as it doesn't change the interface of the class.)

@@ -115,15 +115,16 @@ public boolean getOnlySpinnakerManaged() {

private final Path serviceAccountNamespacePath = Paths.get("/var/run/secrets/kubernetes.io/serviceaccount/namespace");

public boolean isValidKind(KubernetesKind kind) {
public Optional<String> checkIfInvalidKind(KubernetesKind kind) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contract of this function has gotten a lot more complicated, changing from a simple boolean isValid to an Optional that all callers have to unwrap and negate. We should think about how to support the need for additional info in validateKind without complicating the interface for all other callers. As one possibility, isValidKind could keep the same contract, but another function getInvalidReason could be used in the one place where it's needed.

.collect(Collectors.toCollection(ConcurrentHashMap::newKeySet));
omitKinds.addAll(unreadableKinds);
.filter(k -> !omitKinds.keySet().contains(k))
.collect(Collectors.toConcurrentMap(k -> k, k -> checkIfKindError(k, checkNamespace))).entrySet().parallelStream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should keep it as a Stream instead of calling Collect then immediately converting back to a Stream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having problems trying to convert the list to a map without using .collect, I think using .map to convert to a Map.Entry can achieve this.

@german-muzquiz
Copy link
Contributor Author

I think going with the enum of reasons is a good option if we don't want to show the underlying error in UI. Spotting the actual error from the log files has been proven not so easy to track, because by default all kinds are validated when clouddriver starts, so when a "Deploy (Manifest)" stage executes it doesn't actually log the real reason of why it failed.

Copy link
Contributor

@ezimanyi ezimanyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I left one suggestion to slightly deduplicate the code, let me know what you think. Thanks!

return String.format(this.errorMessage, kind);
}
}

public boolean isValidKind(KubernetesKind kind) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be refactored now to just return getInvalidKindReason(kind) != null? Just to make sure the two functions don't diverge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good!

@ezimanyi ezimanyi merged commit f71d181 into spinnaker:master Mar 13, 2019
ezimanyi added a commit that referenced this pull request Mar 13, 2019
#3442 inverted the check in isValidKind; fix
this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants