
Deploy workflow #1540

Merged: 12 commits merged into exonum:dynamic_services from deploy-workflow on Nov 18, 2019
Conversation

@slowli (Contributor) commented Nov 13, 2019

This PR describes artifact deployment in greater detail and adds some tests covering deployment.

@codecov bot commented Nov 13, 2019

Codecov Report

Merging #1540 into dynamic_services will decrease coverage by 0.1%.
The diff coverage is 87.43%.


@@                 Coverage Diff                  @@
##           dynamic_services    #1540      +/-   ##
====================================================
- Coverage             93.87%   93.77%   -0.11%     
====================================================
  Files                   205      203       -2     
  Lines                 29086    29282     +196     
====================================================
+ Hits                  27304    27458     +154     
- Misses                 1782     1824      +42
Impacted Files Coverage Δ
exonum/src/runtime/mod.rs 91.46% <ø> (ø) ⬆️
exonum/src/runtime/rust/call_context.rs 92.85% <ø> (ø) ⬆️
exonum/src/runtime/rust/service.rs 89.85% <100%> (ø) ⬆️
exonum/src/runtime/dispatcher/mod.rs 91.48% <100%> (-2.54%) ⬇️
exonum/src/blockchain/tests.rs 97.1% <66.66%> (-0.04%) ⬇️
exonum/src/runtime/dispatcher/tests.rs 86.18% <87.64%> (+0.93%) ⬆️
components/merkledb/src/error.rs 60% <0%> (-10%) ⬇️
services/supervisor/src/lib.rs 85.36% <0%> (-5.64%) ⬇️
services/supervisor/src/transactions.rs 88.53% <0%> (-4.67%) ⬇️
... and 28 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f9e7e42...629e813.

@aleksuss added this to the Release 0.13 milestone Nov 14, 2019
Six review threads on exonum/src/runtime/dispatcher/tests.rs (all resolved; five marked outdated).
@popzxc added the core (Changes affecting exonum core) and documentation labels Nov 14, 2019
//! 4. Async deployment usually has a deadline and/or success and failure conditions. As an example,
//! the supervisor may collect confirmations from the validator nodes that have
//! successfully deployed the artifact, and once all the validator nodes have sent
//! their confirmations, the artifact is *committed*.
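To make the confirm-and-commit flow in point 4 concrete, here is a minimal Rust sketch. It is not the actual supervisor code; the names (`DeployTracker`, `PendingDeployment`, `DeployStatus`) are hypothetical, and the real supervisor's bookkeeping differs (it collects confirmations via transactions rather than in memory).

```rust
use std::collections::{HashMap, HashSet};

// All names here are hypothetical stand-ins for the real exonum/supervisor types.
type ArtifactId = String;
type ValidatorId = u16;

/// Tracks one pending artifact deployment.
struct PendingDeployment {
    deadline_height: u64,
    confirmations: HashSet<ValidatorId>,
}

/// Collects per-artifact deployment confirmations from validators.
struct DeployTracker {
    validators: HashSet<ValidatorId>,
    pending: HashMap<ArtifactId, PendingDeployment>,
}

#[derive(Debug, PartialEq)]
enum DeployStatus {
    Pending,
    Committed,
    TimedOut,
}

impl DeployTracker {
    /// Registers a confirmation from `from` and reports the resulting status:
    /// the artifact is committed only once *all* validators have confirmed,
    /// and times out once the current height exceeds the deadline.
    fn confirm(&mut self, artifact: &ArtifactId, from: ValidatorId, height: u64) -> DeployStatus {
        let entry = self
            .pending
            .get_mut(artifact)
            .expect("deployment was not requested for this artifact");
        if height > entry.deadline_height {
            return DeployStatus::TimedOut;
        }
        entry.confirmations.insert(from);
        if entry.confirmations.is_superset(&self.validators) {
            DeployStatus::Committed
        } else {
            DeployStatus::Pending
        }
    }
}
```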
Contributor commented:

Since there is no 'undeploy' operation for the case when some nodes fail to deploy the artifact, is it correct that the runtimes do not know if and when the artifact is actually committed, and just rely on the dispatcher issuing start_adding_service only if the conditions are met?

What would happen if

  1. Most nodes deployed an artifact correctly, but a couple of nodes failed (e.g., the admins forgot to put the artifact where it belongs). As a result, the runtimes on the 'good' nodes have it deployed and the runtimes on the 'bad' nodes don't. The runtimes on the 'good' nodes don't know that it was not committed.

  2. The administrators attempt a re-deploy. This time, as the 'good' nodes already have this artifact, the operation fails there (simply because it is an illegal attempt to deploy the artifact twice) and succeeds on the 'bad' ones.

Contributor commented:

So, "consensus-agreed result" = failure -> deploy of this particular artifact failed in the whole network, but 'good' runtimes will retain the artifact as deployed till restarted -> the redeploy of the same artifact is impossible without restart, is that correct?

I think a deploy error on some but all nodes is a realistic use-case (but the one we can probably agree to ignore in the upcoming release). Would a 'Runtime#unloadArtifact' operation, that the dispatcher invokes when the operation failed in terms of network but completed successfully on this node (i.e., if I remember the current policy correctly, failed on any single node), help? Or Runtime#unloadIfPresent if we'd like the dispatcher to not have local data on whether the local deployment was successful — but it might already have that.
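For illustration only, a sketch of what the proposed hook could look like on the Rust side; neither of these methods exists in the Runtime trait as of this PR, and the signature is invented.

```rust
// Hypothetical stand-in for the real artifact identifier type.
type ArtifactId = String;

/// Hypothetical extension of the `Runtime` trait sketching the `unloadArtifact` /
/// `unloadIfPresent` idea from the comment above; not part of exonum.
trait RuntimeUnload {
    /// Called by the dispatcher when the network-wide deployment of `artifact` failed,
    /// even though the local deployment may have succeeded. Implementations release the
    /// locally deployed artifact so that a later re-deploy can succeed without a node
    /// restart; the call is a no-op if the artifact was never deployed locally
    /// (the `unloadIfPresent` semantics).
    fn unload_artifact(&mut self, artifact: &ArtifactId);
}
```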

Contributor commented:

the redeploy of the same artifact is impossible without restart

@slowli, @aleksuss Is this limitation planned to be addressed?

Contributor commented:

AFAIK yes, we plan to.

/// if several alternative initial service configurations are tried), but as a rule of thumb,
/// a `Runtime` should not return an error or panic here unless it wants the node to stop forever.
/// An error or panic returned from this method will not be processed and will lead
/// to the node stopping. A runtime should only return an error / panic if the error is local
Contributor commented:

Am I right that the Java runtime violates this restriction, given that it performs some user and framework code (the same on all nodes and hopefully mostly deterministic, though we can't say that of all the framework and library code) that is not performed in start_adding_service (but is not considered likely to fail)? It does respect

The runtime should commit long-term resources for the service after a commit_service() call.

Is that fine given the small/unknown likelihood? Or shall we attempt to do everything in start_adding_service and then just flip a switch? (That would complicate things with clean-ups in after_commit and, if we need it, with restricting API operation of a newly added service before commit_service; also, with no start_adding_ for restarts, it would have to operate differently for newly added and restarted services.)

@slowli (PR author) replied:

Am I right that the Java runtime [...] violates this restriction?

There is no 100% non-violable restriction here, hence the use of "should" instead of "must" as per RFC 2119. The balance between delaying resource commitment and potentially ending up with a non-functional blockchain is up to the runtime. Subjectively, I don't think the probability of a service developer screwing up the HTTP API (but not screwing up anything else) is high enough to justify complicating the workflow because of it.
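For concreteness, a sketch of the split the quoted docs recommend: cheap, recoverable validation while the service is being added, and commitment of long-lived resources such as HTTP handlers only in commit_service, where an error would stop the node. The signatures are simplified and MyRuntime is hypothetical; the real Runtime trait methods take richer arguments.

```rust
// Simplified stand-ins for the real exonum types.
type InstanceSpec = String;
type ExecutionError = String;

struct MyRuntime {
    committed_services: Vec<InstanceSpec>,
}

impl MyRuntime {
    /// Cheap, deterministic validation. Returning an error here only rejects this
    /// particular configuration attempt and is recoverable.
    fn start_adding_service(&mut self, spec: &InstanceSpec) -> Result<(), ExecutionError> {
        if spec.is_empty() {
            return Err("invalid service configuration".to_owned());
        }
        Ok(())
    }

    /// Commits long-term resources (background tasks, HTTP API wiring, etc.).
    /// Per the quoted docs, an error or panic here stops the node, so a runtime
    /// should only fail on local, unrecoverable problems.
    fn commit_service(&mut self, spec: &InstanceSpec) {
        self.committed_services.push(spec.clone());
        // Hypothetical place to wire up HTTP handlers and start background workers.
    }
}
```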

Contributor commented:

Thanks, I agree it is reasonable not to complicate the workflow for highly unlikely scenarios (also given that users test their code with the testkit).

@dmitry-timofeev (Contributor) left a comment:

(Runtime) LGTM.

@aleksuss merged commit 6b11d47 into exonum:dynamic_services on Nov 18, 2019
@slowli deleted the deploy-workflow branch on January 22, 2020 at 11:10
Labels: core (Changes affecting exonum core), documentation

5 participants