-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(kubernetes): Improved error messages on failing deployments #3442
Conversation
cc @ezimanyi |
…iver into ENG-3567/k8s-error-msg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR---agreed that the error message is not too helpful and we should improve it.
I'm not sure that we need to push the error message this far down in the data model, however, and store a custom-generated string for each omitted kind. Some alternatives I can think of:
- Make the error message better, but still generic...ex: "Could not deploy kind 'KindName'. Please check that it is a valid Kubernetes kind, is configured for your account, and that your account has permission to access this Kind." The disadvantage is that users wouldn't know which particular case applied, but this would certainly be a lot better.
- If we do want to give a more detailed error to the user, I think an enum of reasons for omitting a kind would make more sense than a new string for each one. (And I think it would be enough to group the read errors into a single category and not store the actual error, which will be in the logs and seems better suited for a log file. If it's easier to track, the
kinds
that lead to read errors could go in their own set/array rather than appended toomitKinds
as long as it doesn't change the interface of the class.)
@@ -115,15 +115,16 @@ public boolean getOnlySpinnakerManaged() { | |||
|
|||
private final Path serviceAccountNamespacePath = Paths.get("/var/run/secrets/kubernetes.io/serviceaccount/namespace"); | |||
|
|||
public boolean isValidKind(KubernetesKind kind) { | |||
public Optional<String> checkIfInvalidKind(KubernetesKind kind) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract of this function has gotten a lot more complicated, changing from a simple boolean
isValid
to an Optional
that all callers have to unwrap and negate. We should think about how to support the need for additional info in validateKind
without complicating the interface for all other callers. As one possibility, isValidKind
could keep the same contract, but another function getInvalidReason
could be used in the one place where it's needed.
.collect(Collectors.toCollection(ConcurrentHashMap::newKeySet)); | ||
omitKinds.addAll(unreadableKinds); | ||
.filter(k -> !omitKinds.keySet().contains(k)) | ||
.collect(Collectors.toConcurrentMap(k -> k, k -> checkIfKindError(k, checkNamespace))).entrySet().parallelStream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should keep it as a Stream
instead of calling Collect
then immediately converting back to a Stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was having problems trying to convert the list to a map without using .collect
, I think using .map
to convert to a Map.Entry
can achieve this.
I think going with the enum of reasons is a good option if we don't want to show the underlying error in UI. Spotting the actual error from the log files has been proven not so easy to track, because by default all kinds are validated when clouddriver starts, so when a "Deploy (Manifest)" stage executes it doesn't actually log the real reason of why it failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I left one suggestion to slightly deduplicate the code, let me know what you think. Thanks!
return String.format(this.errorMessage, kind); | ||
} | ||
} | ||
|
||
public boolean isValidKind(KubernetesKind kind) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be refactored now to just return getInvalidKindReason(kind) != null
? Just to make sure the two functions don't diverge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good!
#3442 inverted the check in isValidKind; fix this
When a
Deploy (Manifest)
stage fails because of a misconfigured kubernetes account (like lack of permissions inkubeconfig
file), UI shows a misleading error message about a "notValidKind", where it hides the underlying issue:The code keeps track of kubernetes kinds that it can and cannot access using lists. When clouddriver starts, by default it tries to read all known kinds, and in case of errors, it adds them to the
omitKinds
list.This PR changes the list of
omitKinds
to a map, where the value is a string explaining the reason for being omitted. This string is used to render the final error message to users: