Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve configuration reloading #1149

Closed
roymiloh opened this Issue Oct 7, 2015 · 10 comments

Comments

Projects
None yet
5 participants
@roymiloh
Copy link
Contributor

roymiloh commented Oct 7, 2015

Improve configuration reloading

Current state of reloadConf lacks two things:

  1. Verbosity. it only returns whether the configuration reloaded successfully or not. If partially succeed, it returns false. It could be helpful for clients.
  2. Open to situations when it partially successes. It could be dangerous.

The proposal

  1. Returning verbose error(s) if reloading configuration failed for some reason.
  2. Reduce these situations as possible by validate first and then (if validations passed) commit it (assignation).

Implementation Details

  1. Returning error of the first error occurred or returning a list of error if deciding to show all errors.
  2. Adding ValidateConfig(conf config.*Config) error method to the interface Reloadable and as mentioned, iterate over all reloadables for validatation purposes, if all passed, iterate again and assign.

    Duplicated code between ValidateConfig and ApplyConfig is reasonable for now, a method to pass data between ValidateConfig and ApplyConfig could be "dirty" (@juliusv suggested to use the config instance for that) but more reliable.

Thanks to @juliusv for helping!

@roymiloh

This comment has been minimized.

Copy link
Contributor Author

roymiloh commented Oct 7, 2015

I made a prototype, I'll make a PR later.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented Oct 8, 2015

Sounds good to me for now, thanks! For the first error vs. list of errors, I would just go for the first error for now, but I don't feel strongly.

roymiloh added a commit to roymiloh/prometheus that referenced this issue Oct 8, 2015

More verbose and reliable config reloading.
also reflect changes to reload web handler. related to prometheus#1149
@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Oct 8, 2015

A multi-error error would be helpful in various places. A type alias for []error can implement the error interface, have a meaningful Error() string representation, and be converted to []error if a caller requires so.

We have to be aware that this is no commit/rollback behavior but simply validation. This means we are pushing configuration validation from a central point to the edges of the application. This stands in direct contrast to the central approach promtool takes when validating a configuration and transitively the referenced rule files.

We are still assuming that a valid configuration always succeeds, i.e. there are no runtime errors on "commit". The part missing in our current configuration validation is exactly the (inherently static) rule file parsing. The other methods in the proposal PR are always returning nil to begin with.
So we now have replicated config validation logic, once centralized for promtool, once semi-scattered over multiple components (with the main part remaining centralized).

Extending the central approach in prometheus to be equivalent to the one in promtool allows use to do proper code-reuse and prevent both validation semantics to diverge in the long run. At the same time we avoid complication of an interface.

Thanks a lot @roymiloh for all the effort and thought you put into this! I feel a bit different for the stated reasons though.

My proposal is to remove the bool return value and move the rule file validation to the configuration validation preceding the current call to ApplyConfig.

@roymiloh

This comment has been minimized.

Copy link
Contributor Author

roymiloh commented Oct 9, 2015

Thank you for reviewing.

The validation phase should contain logic only, it's not about I/O at all, so IMO assuming valid configuration always succeeds is reasonable. Having that's said, current PR does include I/O operations because current architecture does not allow to pass any content (= the loaded rule files) / arbitrary things between methods.

Actually, because of the separation of concerns, code reuse is achieved easily. You can even take the PR and use it in protool for checking rules (it's only about creating an instance of config and use the ValidateConfig of rule manager). Also, it's not that related, protool is not about applying configuration, so it's not centralized in the same terms of prometheus.

And still, I'm not sure how it solves the semi-problem of theoretical "commit" and how we reuse the already read rule files in validation phase (because we still use ApplyConfig; it's only possible if we attach it to the config instance, but same solution is also relevant to the current PR).

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented Oct 9, 2015

I agree with @fabxc that there's a tension between centralizing the logic and thus having that easily reusable for promtool without having to instantiate all kinds of objects (ok, right now it would be only the target manager, but still) and the alternative of decentralizing the validation logic and encapsulating it in the types that are authoritative for them.

Given that for the 100% atomic solution (don't allow for rule file changes between validation and application) we would need to read the rule files once outside of the target manager and attach them to the config object, that's already one more thing that we can't do decentralized anymore anyways.

But otherwise, I'm quite ambivalent about this and will let @fabxc make the decision here if he feels strong one way or the other.

@RichiH

This comment has been minimized.

Copy link
Member

RichiH commented Jan 7, 2016

I would also argue that this is not just about verbosity, but about trying to tell the user what the actual root cause is. Point in case:

Spurious 'target' stanza:

Having an extra "target' in front of the target as such:

    target_groups:
      - targets:
        - target: 127.127.0.1

results in

    INFO[842745] Loading configuration file prometheus.yml     source=main.go:196
    ERRO[842745] Couldn't load configuration (-config.file=prometheus.yml): yaml: unmarshal errors:
  line 362: cannot unmarshal !!map into string  source=main.go:208

Wrong indentation

The above indentation is incorrect, the lower is correct

  - job_name: 'foo'
   target_groups:
    target_groups:

which is harder to spot in a longer YAML including comments. This results in

INFO[843257] Loading configuration file prometheus.yml     source=main.go:196
ERRO[843257] Couldn't load configuration (-config.file=prometheus.yml): yaml: line 359: did not find expected '-' indicator  source=main.go:208
@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jan 7, 2016

Agreed. But these are YAML errors. We make an effort to give meaningful error messages for anything after that but the YAML parsing itself is something we cannot influence, unfortunately. This would require to write a better YAML parser, which is obviously not realistic for us to do.

@RichiH

This comment has been minimized.

Copy link
Member

RichiH commented Jan 7, 2016

Looking at https://github.com/go-yaml/yaml/issues I am not sure if filing those two will be of much use. Pity.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jul 13, 2016

This is covered by other PRs/issues.

@lock

This comment has been minimized.

Copy link

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.