-
Notifications
You must be signed in to change notification settings - Fork 704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor of error handling #33
Conversation
In the case of an error, I switched from returning two lists of strings representing errors and warnings, to a single list of "Problems" (I guess the name could be something different, but I don't think it's worth debating too intensely). Each "problem" is the combination of an error severity (ranging from the unfixable, to a simple suggestion), a message, an optional remediation, and an optional location/coordinate to the place in the config where this error was encountered. Right now it's mostly validating simple, generic properties of the config/request (like was this requested account defined exactly once), but once I plug in the rest of my validators it will return the full list of problems they encounter when validating your config and any changes you've made to it.
I've added on a second commit that significantly simplifies the validators. I'm dropping the annotation based approach in favor of simply having each class define how to validate itself. |
I added one more commit to take the kubernetes validation logic I tore out and reinstated it (with some extra checks) using the refactored validators, as well as expose this functionality in an endpoint. |
problemLocation = "Global:"; | ||
} | ||
|
||
if (severity == HalconfigProblem.Severity.FATAL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a switch
on severity
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't do that in java world
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice, intellij was complaining about non-constant values in a case statement which from a compiler standpoint didn't make sense (enum values should be determined statically). Let me switch this around
@@ -49,7 +49,7 @@ public static void failure(String message) { | |||
} | |||
|
|||
public static void remediation(String message) { | |||
AnsiMessage prefix = new AnsiMessage("? ") | |||
AnsiMessage prefix = new AnsiMessage("\t? ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why \t
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make clear that this recommendation is for the above error/warning
public class HalconfigCoordinates { | ||
String deployment; | ||
String provider; | ||
String webhook; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which webhook is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any webhook
} | ||
|
||
String output = res.toString(); | ||
// cut of trailing period |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/of/off/
|
||
@Override | ||
public String toString() { | ||
StringBuilder res = new StringBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not build a list of these ("deployment.${deployment}", etc) and then do list.join(".")
to handle the period interspersing. This way seems fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is this just for rendering or will this actually be consumed by something? If the latter, what's wrong with json?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's for rendering only, I can switch to a list.join, but don't really see the need for 4 elements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you ever add anything though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I change the coordinate layout a bigger refactoring is in order. I personally prefer the way this reads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
|
||
/** | ||
* This is reserved for Halyard configs that fall between unparseable (not valid yaml), and incorrectly configured | ||
* (provider-specific error). Essentially, when a config has problems that prevent halyard from validating it, although | ||
* it is readable by our yaml parser into the halfconfig Object, this is thrown | ||
* it is readable by our yaml parser into the hafconfig Object, this is thrown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/hafconfig/halconfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha, I miscorrected my typo
@@ -47,4 +55,112 @@ public String getKubeconfigFile() { | |||
return kubeconfigFile; | |||
} | |||
} | |||
|
|||
public List<HalconfigProblem> validateKubeconfig(DeploymentConfiguration deployment, HalconfigCoordinates coordinates) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, good catch
return result; | ||
} | ||
|
||
public List<HalconfigProblem> validateDockerRegistries(DeploymentConfiguration deployment, HalconfigCoordinates coordinates) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here with public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. changes to the docker registries outside this account mean the kubernetes account registries need to be validated without triggering live calls to the kubernetes cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after the switch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didnt get through all the details but here's some initial high level feedback before it is too late.
* | ||
* Warning: If any of the fields in a class implement Validatable, do not call validate on them!! This will mean even the simplest | ||
* config changes could cause the full halconfig for every deployment to be validated (very costly, many network calls). It is up | ||
* to whatever is performing a field update to decide what to validate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont understand this comment yet.
It sounds like you have a framework where this interface is only intended as a callback for the framework that is providing the control flow of when to call it.
It looks like this method is package protected. I suppose that is related.
If you want to discourage it from being called, make the return result a parameter where the method adds the problems into it. Then make that object hard to construct (e.g. package private or an abstract class that the controller derives into a private concrete class). I guess you still have a delegation problem if that was your concern. You might be able to add some sensor into that aggregator parameter to detect that case.
I see you have a "ProblemSet". Why not pass that in?
You could pass in either a new set for each all, or a running set across all callers.
Or at lesat return a ProblemSet. You could define a static final NO_PROBLEMS set, though it would need to be immutable.
After poking around this PR more I'm thinking the following:
At the end of the day you have a report with the different entities and their problems.
Somehow an entity is tied to the coordinates. So rather than passing in coordinates and getting out problems (which the callee constructs by passin gin the coordinates), why not pass in an object that already associates the coordinates and problems. If the callee needs to know their coordinates, they can get it from this.
That object acts as a problem factory. They use it to create a new problem (which the factory constructs with the right coordinates) and adds the problem to the list it is returning. The factory can return problem builders so you can use the builder pattern. The actual problems can be instantiated on demand when the caller retrieves them from this factory object.
I think that could be easier to use and take out some of the boilerplate code you'd need inside a validator. Because the problems are added immediately, you cannot speculatively create one (unless you add some kind of discard method) but I dont think that would be happening in practice anyway.
As for return result, it could be boolean, but the caller probably just cares whether there are any problems or not, which that factory thing keeping the list would know automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I like the idea of the problem constructor, I'll add that in.
As far as restricting calling "validate" recursively, I'll look into how I can make that happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the caller into the factory identifies themself ("this") when creating a problem then you could detect that someone you didnt call had called back into you. Or if you do want some kind of delegation, you can detect repeated problems. It could be helpful to have a call into the factory that confirms no problems are found as a means to audit coverage over all the configuration. That call too would act as a control point to detect duplicate calls if in fact that is something you want to protect against or debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm only mentioning the detection thing since you seem concerned about that situation. I do think the auditability of things that are ok to ensure coverage would be desirable regardless.
private String passwordFile; | ||
private String email; | ||
private List<String> repositories = new ArrayList<>(); | ||
|
||
public List<HalconfigProblem> validate(Halconfig context, HalconfigCoordinates coordinates) { | ||
return new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An in/out-param of the HalconfigProblemSet that contains the problem list would eliminate the need to do this. If you wanted you could have a method that confirms there are no problems (to distinguish from the config thing being overlooked or left out for some reason)
@ewiseblatt I followed your advice and created a |
@Setter(AccessLevel.PUBLIC) | ||
private HalconfigCoordinates coordinates; | ||
|
||
public void addProblem(Severity severity, String message, String remediation) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though this would return builders.
ProblemBuilder newProblem() {
ProblemBuilder builder = new ProblemBuilder
problems.append(builder)
return builder
}
problemSetBuilder.newProblem()
.set_severity(INFO)
.set_message("You are hosed")
.set_remediation("Try again")
You could have newProblem take a severity since it is required.
And maybe too a message if you'd know that exactly up front.
That gets rid of the different variations of addProblem.
This builder now contains a list of builders rather than problems.
You are just deferring the work you did here anyway.
public HalconfigProblemSet build() {
problemSet = new ProblemSet()
for (ProblemSetBuilder builder : problems) {
problemSet.add(builder.build())
}
return problemSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll switch it over - but I'm not sure what this buys me
Addressed |
DeploymentConfiguration deployment = deploymentService.getDeploymentConfiguration(coordinates); | ||
HalconfigProblemSetBuilder builder = new HalconfigProblemSetBuilder().setCoordinates(coordinates); | ||
|
||
validatable.validate(builder, deployment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awkward because one validate takes a builder and does not throw where this other one takes a validatable and throws exceptions.
I think you want different terminology at the minimum.
I'm not sure about this builder passed in as being a builder as opposed to a builder factory. When you set the coordinates on it above, are you expecting that the caller did not change them? Or are these mutated as the caller propagates down the tree?
Should the deployment be an attribute of the builder?
It isnt clear from the service how this is used. It seems terminal but I'd expect it to flow down in some kind of traversal. Maybe your comment earlier about calling validate was saying not to do that. I still dont understand how this is used for composite configs unless the parts take the identity of their container.
I cannot find the "not sure what it buys me" comment to respond to it. On Fri, Nov 18, 2016 at 4:03 PM, Lars Wander notifications@github.com
|
For posterity about the above comment: I was hoping you could explain the motivation/benefit of your proposed changes, which you described offline + some other ideas on how to make this easier to extend. I'll work on implementing those this weekend/next week. |
@ewiseblatt, I pulled the validation into a vistor that each yaml node accepts. Nodes to visit are decided by an iterator that returns nodes yaml nodes matching a given filter. The core validation logic is now very small: b5822cc#diff-af46e1bd0bad8e97c8e74a98cb583073R35 As far what new nodes will have to implement goes: It'll be a This isn't done yet (still need to to plug the last bits in), and I'll have it finished tomorrow. |
Ok, I think the refactor is done (pending feedback of course) The idea is now that every node in the yaml tree extends The core mechanism to keep in mind in the Next we have the Lastly, in the services package, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The seems generally ok. I assume the omission of services is just a scoping constraint for now.
A bunch of comments and thoughts. Minor mistakes here and there. We can let it evolve.
The one last thing you should pay attention to is how you make recommendations. These are going to be tedious, littered throughout the code, and produced by multiple authors. You should write up a policy and recommendations/guidelines for authors so that you have consistency and maximize intent. This is going to evolve over time as you get more experience, hind sight, and encounter new use cases that stretch your thinking. Ultimately you'll have to go back through everything and clean it up to adhere to those guidelines. But start thinking about it now because it looks like it might be one of the harder things to maintain and is a significant sensitivity point for the overall quality and value since those recommendations are significant component to the overall value you provide.
String message = problem.getMessage(); | ||
String remediation = problem.getRemediation(); | ||
|
||
if (reference != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not add a getReferenceTitle() or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good thinking
switch(severity) { | ||
case FATAL: | ||
case ERROR: | ||
AnsiUi.error(problemLocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why two calls here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally just like the way it looks
! In kubernetes provider:
! Context invalid
? Make sure the context exists
|
||
@Override | ||
public NodeIterator getChildren() { | ||
return NodeIteratorFactory.getListIterator(deploymentConfigurations.stream().map(d -> (Node) d).collect(Collectors.toList())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did deploymentConfigurations come from?
I'm interested in knowing if this is what you currently have, or if this is complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here: https://github.com/spinnaker/halyard/pull/33/files#diff-d72f4274fc3a2430fbf88f09602effb4R55
It's complete. The value is filled in by the YAML parser.
import lombok.Getter; | ||
|
||
/** | ||
* The "Node" class represents effectively a YAML node in our config hierarchy that can be validated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/effectively//
protected Node parent = null; | ||
|
||
@JsonIgnore | ||
public void parentify() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd. Why would the children not have a parent already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the snakeyaml parser alone doesn't add parents, so I have my "special" yaml parser call this before returning your halconfig object.
@@ -42,21 +48,32 @@ public Provider getProvider(String deploymentName, String providerName) { | |||
provider = providers.getDockerRegistry(); | |||
} else if (providerName.toLowerCase().equals("google")) { | |||
provider = providers.getGoogle(); | |||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont like this enumeration here. I let it go in node/Providers.java. Could you pick one or the other so there is only one place to modify when adding a new provider?
For example here you could
provider = providers.find(providerName)
And add the above into the node/Providers. Then this class doesnt care who the providers are and doesnt need to be modified when adding a new provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is leaving once the NodeFilter
is used to do the lookup here - I agree it's messy especially when I have the more powerful filters to rely on.
log.warn("Failed to invoke validate() on " + validator.getClass() + " for node " + c, e); | ||
} | ||
|
||
return runMatchingValidators(psBuilder, validator, node, c.getSuperclass()) + result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels weird because you are calling validate on each of the classes in the node's hieararchy. Why wouldnt the specialized validate already accomodate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way we avoid duplicating validation between subclasses - like with the given example for Account
vs. Kubernetes Account
. The first validator checks only the name, and the second checks all the k8s specific fields.
File kubeconfigFileOpen = new File(kubeconfigFile); | ||
kubeconfig = KubeConfigUtils.parseConfig(kubeconfigFileOpen); | ||
} catch (IOException e) { | ||
psBuilder.addProblem(ERROR, e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the problem were there is no kubeconfig file, would you want to instruct how to create one (or point to instructions)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't worked out the cleanest way to do that yet, and want to get some experience with the different modes of failure here first before making recommendations. I'll drop in a TODO
client.namespaces().list(); | ||
} catch (Exception e) { | ||
psBuilder.addProblem(ERROR, "Unable to communicate with your Kubernetes cluster: " + e.getMessage()) | ||
.setRemediation("Verify that these credentials work manually using \"kubectl\""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"these credentials" which are "these" or is that clear from the reference reported with the remediation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll make it clearer
DockerRegistryProvider dockerRegistryProvider = deployment.getProviders().getDockerRegistry(); | ||
if (dockerRegistryProvider == null || dockerRegistryProvider.getAccounts() == null || dockerRegistryProvider.getAccounts().isEmpty()) { | ||
psBuilder.addProblem(ERROR, "The docker registry provider has not yet been configured for this deployment") | ||
.setRemediation("Kubernetes needs a Docker Registry as an image source to run"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a useful remediation. Perhaps point to the spinnaker.io documentation for adding docker registries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think it's best to hold off on that, since ideally the registry should be added with halyard, but there is no documentation for that. I'll add a todo.
In the case of an error, I switched from returning two lists of strings
representing errors and warnings, to a single list of "Problems" (I
guess the name could be something different, but I don't think it's worth
debating too intensely). Each "problem" is the combination of an error severity
(ranging from the unfixable, to a simple suggestion), a message, an optional
remediation, and an optional location/coordinate to the place in the config
where this error was encountered. Right now it's mostly validating simple,
generic properties of the config/request (like was this requested account
defined exactly once), but once I plug in the rest of my validators it will
return the full list of problems they encounter when validating your config and
any changes you've made to it.
@ewiseblatt @duftler @ttomsu @jtk54 PTAL