Deploy workflow #1540
Codecov Report
@@            Coverage Diff             @@
##        dynamic_services    #1540      +/-   ##
====================================================
- Coverage          93.87%   93.77%   -0.11%
====================================================
  Files                205      203       -2
  Lines              29086    29282     +196
====================================================
+ Hits               27304    27458     +154
- Misses              1782     1824      +42
Continue to review full report at Codecov.
exonum/src/runtime/mod.rs
//! 4. Async deployment usually has a deadline and/or success and failure conditions. As an example,
//!    the supervisor may collect confirmations from the validator nodes that have
//!    successfully deployed the artifact, and once all the validator nodes have sent
//!    their confirmations, the artifact is *committed*.
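The confirmation-collection scheme described in the quoted doc can be sketched as follows. This is a toy model, not the actual supervisor service; `DeployTracker` and its methods are invented for illustration:

```rust
use std::collections::HashSet;

/// Hypothetical sketch: track deploy confirmations from validators
/// and consider an artifact committed once all of them have confirmed.
struct DeployTracker {
    validators: HashSet<u32>, // IDs of all validator nodes
    confirmed: HashSet<u32>,  // validators that reported a successful deploy
}

impl DeployTracker {
    fn new(validators: impl IntoIterator<Item = u32>) -> Self {
        Self {
            validators: validators.into_iter().collect(),
            confirmed: HashSet::new(),
        }
    }

    /// Record a confirmation; returns `true` once every validator has
    /// confirmed, i.e. the artifact may be considered *committed*.
    fn confirm(&mut self, validator_id: u32) -> bool {
        if self.validators.contains(&validator_id) {
            self.confirmed.insert(validator_id);
        }
        self.confirmed == self.validators
    }
}

fn main() {
    let mut tracker = DeployTracker::new([0, 1, 2]);
    assert!(!tracker.confirm(0)); // 1 of 3 confirmations
    assert!(!tracker.confirm(1)); // 2 of 3
    assert!(tracker.confirm(2));  // all validators confirmed: committed
}
```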
As there is no 'undeploy' operation for the case when some nodes failed to deploy the artifact, is it correct that the runtimes do not know if and when the artifact is actually committed, and just rely on the dispatcher (?) issuing start_adding_service
only if the conditions are met?
What would happen if:
-
Most nodes deployed an artifact correctly, but a couple of nodes failed (e.g., the admins forgot to put the artifact where it belongs). As a result, the runtimes on the 'good' nodes have it deployed, while those on the 'bad' nodes don't. The runtimes on the 'good' nodes don't know that it was not committed.
-
The administrators attempt a re-deploy. This time, as the 'good' nodes already have this artifact, the operation fails there (simply because it is an illegal attempt to deploy the artifact twice), and succeeds on the 'bad' ones.
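The divergence described above can be simulated in plain Rust. This is a toy model of per-node runtime state, not the actual exonum `Runtime`; all names here are invented:

```rust
use std::collections::HashSet;

/// Toy model of one node's runtime: it only knows which artifacts it
/// has deployed locally, not whether the network committed them.
#[derive(Default)]
struct NodeRuntime {
    deployed: HashSet<String>,
}

impl NodeRuntime {
    /// Deploy fails if the artifact file is missing, or if the artifact
    /// is already deployed (an illegal second deploy attempt).
    fn deploy(&mut self, artifact: &str, file_present: bool) -> Result<(), String> {
        if !file_present {
            return Err(format!("{artifact}: file not found"));
        }
        if !self.deployed.insert(artifact.to_owned()) {
            return Err(format!("{artifact}: already deployed"));
        }
        Ok(())
    }
}

fn main() {
    let mut good = NodeRuntime::default();
    let mut bad = NodeRuntime::default();

    // First attempt: the admins forgot the file on the 'bad' node.
    assert!(good.deploy("token:1.0", true).is_ok());
    assert!(bad.deploy("token:1.0", false).is_err());

    // Re-deploy: it now fails on the 'good' node (already deployed)
    // and succeeds on the 'bad' one, so the network never converges.
    assert!(good.deploy("token:1.0", true).is_err());
    assert!(bad.deploy("token:1.0", true).is_ok());
}
```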
So, "consensus-agreed result" = failure -> the deploy of this particular artifact failed in the whole network, but the 'good' runtimes will retain the artifact as deployed until restarted -> a redeploy of the same artifact is impossible without a restart. Is that correct?
I think a deploy error on some but not all nodes is a realistic use case (though one we can probably agree to ignore in the upcoming release). Would a 'Runtime#unloadArtifact' operation, which the dispatcher invokes when the operation failed network-wide but completed successfully on this node (i.e., if I remember the current policy correctly, failed on any single node), help? Or 'Runtime#unloadIfPresent',
if we'd like the dispatcher not to keep local data on whether the local deployment was successful (though it might already have that).
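To make the suggestion concrete, such an unload hook could look like the sketch below. The trait and its methods (`unload_artifact`, `is_artifact_deployed`, `unload_if_present`) are hypothetical and do not exist in exonum's `Runtime` interface:

```rust
use std::collections::HashSet;

/// Hypothetical extension of the dispatcher-facing runtime interface.
trait RuntimeExt {
    /// Called by the dispatcher when the network-wide deploy failed but
    /// the local deploy may have succeeded; the runtime drops the
    /// artifact so a later re-deploy can start from a clean slate.
    fn unload_artifact(&mut self, artifact: &str);

    /// Whether this runtime currently has the artifact deployed.
    fn is_artifact_deployed(&self, artifact: &str) -> bool;

    /// Variant for when the dispatcher keeps no local success/failure
    /// record: unload only if the artifact is actually present.
    fn unload_if_present(&mut self, artifact: &str) {
        if self.is_artifact_deployed(artifact) {
            self.unload_artifact(artifact);
        }
    }
}

/// Minimal in-memory runtime to exercise the trait.
struct MyRuntime {
    deployed: HashSet<String>,
}

impl RuntimeExt for MyRuntime {
    fn unload_artifact(&mut self, artifact: &str) {
        self.deployed.remove(artifact);
    }
    fn is_artifact_deployed(&self, artifact: &str) -> bool {
        self.deployed.contains(artifact)
    }
}

fn main() {
    let mut rt = MyRuntime {
        deployed: ["token:1.0".to_owned()].into_iter().collect(),
    };
    rt.unload_if_present("token:1.0");
    assert!(!rt.is_artifact_deployed("token:1.0"));
    rt.unload_if_present("token:1.0"); // no-op on a node that never had it
}
```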
AFAIK yes, we plan it.
/// if several alternative initial service configurations are tried), but as a rule of thumb,
/// a `Runtime` should not return an error or panic here unless it wants the node to stop forever.
/// An error or panic returned from this method will not be processed and will lead
/// to the node stopping. A runtime should only return an error / panic if the error is local
Am I right that the Java runtime, which performs some user and framework code (= the same on all nodes and hopefully mostly deterministic; we can't say that of all the framework and library code) outside of start_adding_service
(code that is not considered likely to fail), violates this restriction? But respects
The runtime should commit long-term resources for the service after a commit_service() call.
Is that fine given the small/unknown likelihood? Or shall we attempt to do everything in start_adding_service
and then just flip a switch? (That would complicate things with clean-ups in after_commit and, if we need that, a restriction of API operation of a newly added service before commit_service; also, with no start_adding_
for restarts, it would have to operate differently for newly added and restarted services.)
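The "flip a switch" alternative, i.e. doing the fallible work in the adding phase and committing long-term resources only on commit, can be sketched as below. These are toy types invented for illustration, not the actual `Runtime` trait methods:

```rust
/// Toy model of two-phase service activation: fallible validation
/// happens in the adding phase, while long-term resources (e.g. an
/// HTTP API) are committed only when the service is committed.
struct ServiceSlot {
    name: String,
    validated: bool, // set in the adding phase
    api_live: bool,  // the "switch", flipped only after commit
}

impl ServiceSlot {
    /// Adding phase: run the fallible, hopefully-deterministic checks.
    /// An error here is reported to the dispatcher; it is not fatal
    /// for the node.
    fn start_adding_service(name: &str) -> Result<Self, String> {
        if name.is_empty() {
            return Err("empty service name".to_owned());
        }
        Ok(Self {
            name: name.to_owned(),
            validated: true,
            api_live: false,
        })
    }

    /// Commit phase: flip the switch. Per the quoted doc, this should
    /// not fail unless the node cannot usefully continue.
    fn commit_service(&mut self) {
        assert!(self.validated, "{} was never validated", self.name);
        self.api_live = true; // e.g. wire up the service's HTTP endpoints
    }
}

fn main() {
    let mut slot =
        ServiceSlot::start_adding_service("token").expect("validation failed");
    assert!(!slot.api_live); // no long-term resources committed yet
    slot.commit_service();
    assert!(slot.api_live);
}
```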
Am I right that the Java runtime [...] violates this restriction?
There is no 100% non-violable restriction here, hence the use of "should" instead of "must" as per RFC 2119. The balance between delaying resource commitment and potentially ending up with a non-functional blockchain is up to the runtime. Subjectively, I don't think it's reasonable to assume that the probability of the service developer screwing up the HTTP API (but not screwing up anything else) is high enough to complicate the workflow because of it.
Thanks, I agree it is reasonable not to complicate the workflow for highly unlikely scenarios (also given that the user tests their code with the testkit).
(Runtime) LGTM.
This PR describes artifact deployment in greater detail and adds some tests covering deployment.