Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor of error handling #33

Merged
merged 6 commits into from
Nov 21, 2016
Merged

Refactor of error handling #33

merged 6 commits into from
Nov 21, 2016

Conversation

lwander
Copy link
Member

@lwander lwander commented Nov 17, 2016

In the case of an error, I switched from returning two lists of strings
representing errors and warnings, to a single list of "Problems" (I
guess the name could be something different, but I don't think it's worth
debating too intensely). Each "problem" is the combination of an error severity
(ranging from the unfixable, to a simple suggestion), a message, an optional
remediation, and an optional location/coordinate to the place in the config
where this error was encountered. Right now it's mostly validating simple,
generic properties of the config/request (like was this requested account
defined exactly once), but once I plug in the rest of my validators it will
return the full list of problems they encounter when validating your config and
any changes you've made to it.

@ewiseblatt @duftler @ttomsu @jtk54 PTAL

Lars Wander added 2 commits November 17, 2016 16:17
In the case of an error, I switched from returning two lists of strings
representing errors and warnings, to a single list of "Problems" (I
guess the name could be something different, but I don't think it's worth
debating too intensely). Each "problem" is the combination of an error severity
(ranging from the unfixable, to a simple suggestion), a message, an optional
remediation, and an optional location/coordinate to the place in the config
where this error was encountered. Right now it's mostly validating simple,
generic properties of the config/request (like was this requested account
defined exactly once), but once I plug in the rest of my validators it will
return the full list of problems they encounter when validating your config and
any changes you've made to it.
@lwander
Copy link
Member Author

lwander commented Nov 18, 2016

I've added on a second commit that significantly simplifies the validators. I'm dropping the annotation based approach in favor of simply having each class define how to validate itself.

@lwander
Copy link
Member Author

lwander commented Nov 18, 2016

I added one more commit to take the kubernetes validation logic I tore out and reinstated it (with some extra checks) using the refactored validators, as well as expose this functionality in an endpoint.

@lwander
Copy link
Member Author

lwander commented Nov 18, 2016

problemLocation = "Global:";
}

if (severity == HalconfigProblem.Severity.FATAL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a switch on severity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do that in java world

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice, intellij was complaining about non-constant values in a case statement which from a compiler standpoint didn't make sense (enum values should be determined statically). Let me switch this around

@@ -49,7 +49,7 @@ public static void failure(String message) {
}

public static void remediation(String message) {
AnsiMessage prefix = new AnsiMessage("? ")
AnsiMessage prefix = new AnsiMessage("\t? ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why \t?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make clear that this recommendation is for the above error/warning

public class HalconfigCoordinates {
String deployment;
String provider;
String webhook;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which webhook is this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any webhook

}

String output = res.toString();
// cut of trailing period
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/of/off/


@Override
public String toString() {
StringBuilder res = new StringBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not build a list of these ("deployment.${deployment}", etc) and then do list.join(".") to handle the period interspersing. This way seems fragile.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is this just for rendering or will this actually be consumed by something? If the latter, what's wrong with json?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for rendering only, I can switch to a list.join, but don't really see the need for 4 elements

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you ever add anything though...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I change the coordinate layout a bigger refactoring is in order. I personally prefer the way this reads

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok


/**
* This is reserved for Halyard configs that fall between unparseable (not valid yaml), and incorrectly configured
* (provider-specific error). Essentially, when a config has problems that prevent halyard from validating it, although
* it is readable by our yaml parser into the halfconfig Object, this is thrown
* it is readable by our yaml parser into the hafconfig Object, this is thrown
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/hafconfig/halconfig

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, I miscorrected my typo

@@ -47,4 +55,112 @@ public String getKubeconfigFile() {
return kubeconfigFile;
}
}

public List<HalconfigProblem> validateKubeconfig(DeploymentConfiguration deployment, HalconfigCoordinates coordinates) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, good catch

return result;
}

public List<HalconfigProblem> validateDockerRegistries(DeploymentConfiguration deployment, HalconfigCoordinates coordinates) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here with public.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. changes to the docker registries outside this account mean the kubernetes account registries need to be validated without triggering live calls to the kubernetes cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Contributor

@jtk54 jtk54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after the switch.

Copy link

@ewiseblatt ewiseblatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didnt get through all the details but here's some initial high level feedback before it is too late.

*
* Warning: If any of the fields in a class implement Validatable, do not call validate on them!! This will mean even the simplest
* config changes could cause the full halconfig for every deployment to be validated (very costly, many network calls). It is up
* to whatever is performing a field update to decide what to validate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont understand this comment yet.
It sounds like you have a framework where this interface is only intended as a callback for the framework that is providing the control flow of when to call it.

It looks like this method is package protected. I suppose that is related.

If you want to discourage it from being called, make the return result a parameter where the method adds the problems into it. Then make that object hard to construct (e.g. package private or an abstract class that the controller derives into a private concrete class). I guess you still have a delegation problem if that was your concern. You might be able to add some sensor into that aggregator parameter to detect that case.

I see you have a "ProblemSet". Why not pass that in?
You could pass in either a new set for each all, or a running set across all callers.
Or at lesat return a ProblemSet. You could define a static final NO_PROBLEMS set, though it would need to be immutable.

After poking around this PR more I'm thinking the following:
At the end of the day you have a report with the different entities and their problems.
Somehow an entity is tied to the coordinates. So rather than passing in coordinates and getting out problems (which the callee constructs by passin gin the coordinates), why not pass in an object that already associates the coordinates and problems. If the callee needs to know their coordinates, they can get it from this.

That object acts as a problem factory. They use it to create a new problem (which the factory constructs with the right coordinates) and adds the problem to the list it is returning. The factory can return problem builders so you can use the builder pattern. The actual problems can be instantiated on demand when the caller retrieves them from this factory object.

I think that could be easier to use and take out some of the boilerplate code you'd need inside a validator. Because the problems are added immediately, you cannot speculatively create one (unless you add some kind of discard method) but I dont think that would be happening in practice anyway.

As for return result, it could be boolean, but the caller probably just cares whether there are any problems or not, which that factory thing keeping the list would know automatically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I like the idea of the problem constructor, I'll add that in.

As far as restricting calling "validate" recursively, I'll look into how I can make that happen.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the caller into the factory identifies themself ("this") when creating a problem then you could detect that someone you didnt call had called back into you. Or if you do want some kind of delegation, you can detect repeated problems. It could be helpful to have a call into the factory that confirms no problems are found as a means to audit coverage over all the configuration. That call too would act as a control point to detect duplicate calls if in fact that is something you want to protect against or debug.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only mentioning the detection thing since you seem concerned about that situation. I do think the auditability of things that are ok to ensure coverage would be desirable regardless.

private String passwordFile;
private String email;
private List<String> repositories = new ArrayList<>();

public List<HalconfigProblem> validate(Halconfig context, HalconfigCoordinates coordinates) {
return new ArrayList<>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An in/out-param of the HalconfigProblemSet that contains the problem list would eliminate the need to do this. If you wanted you could have a method that confirms there are no problems (to distinguish from the config thing being overlooked or left out for some reason)

@lwander
Copy link
Member Author

lwander commented Nov 18, 2016

@ewiseblatt I followed your advice and created a HalconfigProblemSetBuilder, which is passed into the validators and modified by them as they see fit. I don't see a nice way preventing Validators from recursively calling themselves just yet, so I think I'll save that for a later PR. Any objections to merging?

@Setter(AccessLevel.PUBLIC)
private HalconfigCoordinates coordinates;

public void addProblem(Severity severity, String message, String remediation) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though this would return builders.
ProblemBuilder newProblem() {
ProblemBuilder builder = new ProblemBuilder
problems.append(builder)
return builder
}

problemSetBuilder.newProblem()
.set_severity(INFO)
.set_message("You are hosed")
.set_remediation("Try again")

You could have newProblem take a severity since it is required.
And maybe too a message if you'd know that exactly up front.
That gets rid of the different variations of addProblem.

This builder now contains a list of builders rather than problems.
You are just deferring the work you did here anyway.

public HalconfigProblemSet build() {
problemSet = new ProblemSet()
for (ProblemSetBuilder builder : problems) {
problemSet.add(builder.build())
}
return problemSet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch it over - but I'm not sure what this buys me

@lwander
Copy link
Member Author

lwander commented Nov 18, 2016

Addressed

DeploymentConfiguration deployment = deploymentService.getDeploymentConfiguration(coordinates);
HalconfigProblemSetBuilder builder = new HalconfigProblemSetBuilder().setCoordinates(coordinates);

validatable.validate(builder, deployment);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awkward because one validate takes a builder and does not throw where this other one takes a validatable and throws exceptions.
I think you want different terminology at the minimum.

I'm not sure about this builder passed in as being a builder as opposed to a builder factory. When you set the coordinates on it above, are you expecting that the caller did not change them? Or are these mutated as the caller propagates down the tree?

Should the deployment be an attribute of the builder?
It isnt clear from the service how this is used. It seems terminal but I'd expect it to flow down in some kind of traversal. Maybe your comment earlier about calling validate was saying not to do that. I still dont understand how this is used for composite configs unless the parts take the identity of their container.

@ewiseblatt
Copy link

I cannot find the "not sure what it buys me" comment to respond to it.
If you dont know, then dont listen. Maybe I'm wrong or I can explain to
clarify.
From our side converstation it sounds like I have different expectations of
how this is going to be consumed.

On Fri, Nov 18, 2016 at 4:03 PM, Lars Wander notifications@github.com
wrote:

Addressed


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#33 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AEKkGL8gCYhgJtvS2Mycf6EhTzYRPX6Nks5q_hK0gaJpZM4K1017
.

@lwander
Copy link
Member Author

lwander commented Nov 18, 2016

For posterity about the above comment: I was hoping you could explain the motivation/benefit of your proposed changes, which you described offline + some other ideas on how to make this easier to extend. I'll work on implementing those this weekend/next week.

@lwander
Copy link
Member Author

lwander commented Nov 19, 2016

@ewiseblatt, I pulled the validation into a vistor that each yaml node accepts. Nodes to visit are decided by an iterator that returns nodes yaml nodes matching a given filter. The core validation logic is now very small: b5822cc#diff-af46e1bd0bad8e97c8e74a98cb583073R35

As far what new nodes will have to implement goes: It'll be a validateClassName(ClassName c) for each node added, as well as some information about the type of the node for the sake of filtering.

This isn't done yet (still need to to plug the last bits in), and I'll have it finished tomorrow.

@lwander
Copy link
Member Author

lwander commented Nov 20, 2016

Ok, I think the refactor is done (pending feedback of course)

The idea is now that every node in the yaml tree extends Node, which provides a few bits of utility that need to be implemented for each new type Node (i.e. when we add support for Fiat config, but not when target/veritas add support for configuring opentstack). Node requires us to describe how to: get children, get the parent, accept a validator, and say if it matches a NodeFilter or not. Most of the logic for these things have already been implemented in either the abstract Node class, or in related utility classes (see NodeIteratorFactory), so extending this isn't much work for new types Nodes. If instead you are adding a new account/provider/webhook/etc... you just need to extend the right superclass (Account/Provider/Webhook/etc...) and provide whatever fields should show up in the yaml.

The core mechanism to keep in mind in the Node package is the NodeIterator, which allows us to walk over each node in the yaml tree, or each node that matches an input NodeFilter. This is important during validation.

Next we have the Validator, which is simply a visitor that takes a ProblemSetFactory (to keep track of all the errors/warnings encountered during validation), and a Node. Inside the validate package, we have the custom validators (see AccountValidator for a small example), and the ValidatorCollector, which through a Spring annotation collects every class that implements Validator. Using reflection, the ValidatorCollector takes a node, and scans each Validator to see if it can validate the given Node or not.

Lastly, in the services package, the ValidatorService takes a NodeFilter, and then iterates over every node in the current halconfig that matches the filter, and runs the ValidatorCollector on it. Once complete. Incoming requests just format their NodeFilter to match what they way, and then have the ValidatorService do the rest.

Copy link

@ewiseblatt ewiseblatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The seems generally ok. I assume the omission of services is just a scoping constraint for now.

A bunch of comments and thoughts. Minor mistakes here and there. We can let it evolve.

The one last thing you should pay attention to is how you make recommendations. These are going to be tedious, littered throughout the code, and produced by multiple authors. You should write up a policy and recommendations/guidelines for authors so that you have consistency and maximize intent. This is going to evolve over time as you get more experience, hind sight, and encounter new use cases that stretch your thinking. Ultimately you'll have to go back through everything and clean it up to adhere to those guidelines. But start thinking about it now because it looks like it might be one of the harder things to maintain and is a significant sensitivity point for the overall quality and value since those recommendations are significant component to the overall value you provide.

String message = problem.getMessage();
String remediation = problem.getRemediation();

if (reference != null) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add a getReferenceTitle() or something like that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking

switch(severity) {
case FATAL:
case ERROR:
AnsiUi.error(problemLocation);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why two calls here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally just like the way it looks

! In kubernetes provider:
! Context invalid
  ? Make sure the context exists


@Override
public NodeIterator getChildren() {
return NodeIteratorFactory.getListIterator(deploymentConfigurations.stream().map(d -> (Node) d).collect(Collectors.toList()));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did deploymentConfigurations come from?

I'm interested in knowing if this is what you currently have, or if this is complete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From here: https://github.com/spinnaker/halyard/pull/33/files#diff-d72f4274fc3a2430fbf88f09602effb4R55

It's complete. The value is filled in by the YAML parser.

import lombok.Getter;

/**
* The "Node" class represents effectively a YAML node in our config hierarchy that can be validated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/effectively//

protected Node parent = null;

@JsonIgnore
public void parentify() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems odd. Why would the children not have a parent already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the snakeyaml parser alone doesn't add parents, so I have my "special" yaml parser call this before returning your halconfig object.

@@ -42,21 +48,32 @@ public Provider getProvider(String deploymentName, String providerName) {
provider = providers.getDockerRegistry();
} else if (providerName.toLowerCase().equals("google")) {
provider = providers.getGoogle();
} else {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont like this enumeration here. I let it go in node/Providers.java. Could you pick one or the other so there is only one place to modify when adding a new provider?

For example here you could
provider = providers.find(providerName)
And add the above into the node/Providers. Then this class doesnt care who the providers are and doesnt need to be modified when adding a new provider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is leaving once the NodeFilter is used to do the lookup here - I agree it's messy especially when I have the more powerful filters to rely on.

log.warn("Failed to invoke validate() on " + validator.getClass() + " for node " + c, e);
}

return runMatchingValidators(psBuilder, validator, node, c.getSuperclass()) + result;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels weird because you are calling validate on each of the classes in the node's hieararchy. Why wouldnt the specialized validate already accomodate this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way we avoid duplicating validation between subclasses - like with the given example for Account vs. Kubernetes Account. The first validator checks only the name, and the second checks all the k8s specific fields.

File kubeconfigFileOpen = new File(kubeconfigFile);
kubeconfig = KubeConfigUtils.parseConfig(kubeconfigFileOpen);
} catch (IOException e) {
psBuilder.addProblem(ERROR, e.getMessage());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the problem were there is no kubeconfig file, would you want to instruct how to create one (or point to instructions)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't worked out the cleanest way to do that yet, and want to get some experience with the different modes of failure here first before making recommendations. I'll drop in a TODO

client.namespaces().list();
} catch (Exception e) {
psBuilder.addProblem(ERROR, "Unable to communicate with your Kubernetes cluster: " + e.getMessage())
.setRemediation("Verify that these credentials work manually using \"kubectl\"");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"these credentials" which are "these" or is that clear from the reference reported with the remediation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll make it clearer

DockerRegistryProvider dockerRegistryProvider = deployment.getProviders().getDockerRegistry();
if (dockerRegistryProvider == null || dockerRegistryProvider.getAccounts() == null || dockerRegistryProvider.getAccounts().isEmpty()) {
psBuilder.addProblem(ERROR, "The docker registry provider has not yet been configured for this deployment")
.setRemediation("Kubernetes needs a Docker Registry as an image source to run");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a useful remediation. Perhaps point to the spinnaker.io documentation for adding docker registries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think it's best to hold off on that, since ideally the registry should be added with halyard, but there is no documentation for that. I'll add a todo.

@lwander lwander merged commit 63f29fd into spinnaker:master Nov 21, 2016
@lwander lwander deleted the error-refactor branch November 21, 2016 16:37
costimuraru pushed a commit to costimuraru/halyard that referenced this pull request Jun 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants