
(Epic) Define semantics (and testing strategy) when host version is revved #310

Closed
MonsieurNicolas opened this issue Aug 3, 2022 · 8 comments
Context:

  • network is at protocol 20 (corresponds to host version 20.0.0)
  • network is going to upgrade to 21 (host 21.0.0)
  • contract was developed against version 20

Scenarios:

  • Can the contract be updated to support 21 before the protocol switch (as in: built using SDK 21.0.0)? This implies adding code conditional on the protocol version if the contract wants to use "21 features" as soon as they're enabled.
  • Similar to the previous point: as host developers, how can we ship "21-ready" code that behaves in the same way as "20" until the network votes for 21 (at which point it follows 21 semantics)?
    * Even if we have both runtimes on the core side during the switchover, the current plan is to pretty much guarantee that version 21 supports 20 semantics, so that we don't have to keep a dependency from core on all historical releases.
    * This also implies that we have conditionals "in the right spot" to switch behavior (ie: there is a "config" of sorts that allows switching between protocol versions at runtime).
  • How can a contract developer test the behavior of their code compiled against 20 when the host revs to 21, as part of due diligence for a network upgrade? Note this implies some sort of isolation wrt types, as "recompiling using 21" may taint certain types/behavior if not done properly.
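The second scenario above can be sketched roughly like this. This is a hypothetical illustration, not the actual soroban-env-host API: the names `LedgerInfo` and `hash_len` are made up, and the point is only the shape of the runtime gate ("21-ready" code that keeps byte-for-byte "20" behavior until the network votes the upgrade in).

```rust
// Hypothetical sketch: "21-ready" host code whose behavior is gated on
// the protocol version in effect at runtime. All names are illustrative.

const PROTOCOL_V21: u32 = 21;

struct LedgerInfo {
    protocol_version: u32,
}

// A host function whose semantics change in protocol 21: it must keep
// the old behavior until the network has actually voted the upgrade in.
fn hash_len(ledger: &LedgerInfo) -> usize {
    if ledger.protocol_version >= PROTOCOL_V21 {
        64 // hypothetical "21 feature"
    } else {
        32 // pre-upgrade semantics, identical to what v20 shipped
    }
}

fn main() {
    // Before the vote, the "21-ready" binary still follows 20 semantics.
    assert_eq!(hash_len(&LedgerInfo { protocol_version: 20 }), 32);
    // After the vote, the same binary follows 21 semantics.
    assert_eq!(hash_len(&LedgerInfo { protocol_version: 21 }), 64);
}
```

A contract wanting to use "21 features" as soon as they're enabled would need an analogous conditional on its side.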

This is somewhat related to #289

@graydon commented Aug 4, 2022

The core part of this was started in stellar/stellar-core#3471

@graydon graydon self-assigned this Feb 9, 2023
@MonsieurNicolas commented:
Bumping this, as there were recent discussions that should be informed by this issue.

Additional things to think about:

  • we really want to ensure that everything is pinned so that we can have releases that are not protocol changes (ie: that do not require consensus)
    • in particular, we have to be sure to understand the consequences of revving the toolchain
  • any given network protocol version maps to a very specific version of the env. This happens to include things like the WASM parser, runtime, types, etc. We may be able to use the same version for different parts with additional testing, but by default we should isolate everything.
  • when invoking a contract
    • we need to know which version of the host to use
      • specifically, we need to know which version of the WASM parser to use in order to load the WASM blob. This may indicate that some properties like "protocol version" should be stored alongside code in ContractCodeEntry (I know we're still discussing this, but I didn't find issues to link to)
    • we probably need to have some sort of hook (via dependency injection?) so that the right thing happens when contracts from different environment versions call each other (in both directions of compat).
      • this probably has some interesting implications in terms of how and where the "version dispatch code" lives.
        • in the prototype above, it was started in core's bridge, but it may be cleaner to have an intermediate rust-multiversion-host, as this problem will exist for core and others (like soroban-rpc)
      • additional constraints may come when testing contracts that target x86
      • auth should work properly when the call tree / account abstraction method (__check_auth) are using different versions
  • certain scenarios need to be addressed, like how to install a contract that uses an old protocol version (I think this has to be allowed if we do not want to break contract deployment when the network upgrades)
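The "version dispatch" idea above can be sketched as follows. This is a toy illustration under the assumption that a protocol version is stored alongside the code (as suggested for ContractCodeEntry); `ContractCode`, `HostVersion`, and `select_host` are made-up names, not real soroban types.

```rust
// Hypothetical sketch of version dispatch: pick which host/env build
// should execute a contract, based on a protocol version pinned next
// to the WASM blob. All names here are illustrative.

#[derive(Debug, PartialEq)]
enum HostVersion {
    V20,
    V21,
}

struct ContractCode {
    // Stored alongside the WASM so the right parser/runtime is chosen.
    protocol_version: u32,
    wasm: Vec<u8>,
}

// Map the contract's pinned protocol version to a concrete host build.
fn select_host(code: &ContractCode) -> Result<HostVersion, String> {
    match code.protocol_version {
        0..=20 => Ok(HostVersion::V20), // old contracts keep old semantics
        21 => Ok(HostVersion::V21),
        v => Err(format!("unsupported protocol version {v}")),
    }
}

fn main() {
    let old = ContractCode { protocol_version: 20, wasm: vec![] };
    let new = ContractCode { protocol_version: 21, wasm: vec![] };
    assert_eq!(select_host(&old).unwrap(), HostVersion::V20);
    assert_eq!(select_host(&new).unwrap(), HostVersion::V21);
    // A version the node doesn't know about is an error, not a guess.
    let future = ContractCode { protocol_version: 22, wasm: vec![] };
    assert!(select_host(&future).is_err());
}
```

The cross-version call hook discussed above would sit on top of this: when a V20 contract invokes a V21 one (or vice versa), each frame's host is selected independently from its own pinned version.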

@MonsieurNicolas commented:
Talking to @dmkozh, I realized that I should have been a bit more explicit about what the problems are wrt multiple versions.

The "protocol version" attached to a contract is not really a protocol version, more like an ABI version.

All changes to the protocol bump "the protocol version", but there are really two kinds of changes:

  • a critical/security fix in a host function/built-in contract -- we want all contracts (even already-deployed ones) to see the change.
  • usability fixes and other additions to the protocol, where "v2" versions of everything are available, including critical parts of the runtime like the WASM runtime (if we can't guarantee that it's 100% backward compatible). "v1" contracts should not be able to observe any change (ie: when they call a host function, for example, it should behave exactly the same post protocol upgrade).

For the latter scenario, there are multiple possible strategies:

  • snapshot, where "v2" gets a new env (any breaking change is possible). This allows "v2" to be fairly clean, as there is only one version of everything in the same codebase.
  • mix versions in the same codebase (like we did in C++). For certain things, just adding a new "v2" version of a host function might be OK, but guaranteeing that there are no breaking changes for v1 may be a bit trickier, and the SDK may get pretty messy.

Regardless of which kinds of runtimes are available at a specific "protocol version", we need:

  • to be able to replay traffic from previous versions of the protocol
  • to ensure that "current" is stable and that we only switch to a new protocol version after consensus
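The two strategies above can be contrasted in a minimal sketch. The module and function names are made up; the point is only the structural difference: "snapshot" keeps whole envs side by side and dispatches wholesale, while "mixed" gates each host function internally in one codebase.

```rust
// Toy contrast of the two strategies, with made-up names.

const PROTOCOL_V21: u32 = 21;

// Strategy 1: snapshot. v1 and v2 envs live side by side; a v1
// contract can never observe v2 behavior because it never reaches it.
mod env_v1 {
    pub fn compute(x: u64) -> u64 { x + 1 }
}
mod env_v2 {
    pub fn compute(x: u64) -> u64 { x * 2 } // breaking change allowed
}

fn snapshot_dispatch(protocol: u32, x: u64) -> u64 {
    if protocol >= PROTOCOL_V21 { env_v2::compute(x) } else { env_v1::compute(x) }
}

// Strategy 2: mixed versions in one codebase (like the C++ approach):
// one function, internally gated. Proving v1 is untouched is harder,
// since every edit to this file is an edit to v1's code path too.
fn mixed_compute(protocol: u32, x: u64) -> u64 {
    if protocol >= PROTOCOL_V21 { x * 2 } else { x + 1 }
}

fn main() {
    // Either way, replaying pre-upgrade traffic must give pre-upgrade
    // results, and post-upgrade traffic the new results.
    assert_eq!(snapshot_dispatch(20, 10), 11);
    assert_eq!(mixed_compute(20, 10), 11);
    assert_eq!(snapshot_dispatch(21, 10), 20);
    assert_eq!(mixed_compute(21, 10), 20);
}
```

The replay requirement is the same for both: for any historical ledger, the pinned version must reproduce the original results exactly.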

@MonsieurNicolas commented:
stellar/stellar-core#3471 was merged, which sets the foundation for multi-version support -- the remaining work is to figure out how to implement this consistently between core, soroban-rpc, and potentially the CLI.

@graydon commented Aug 18, 2023

Re downstream systems, @anupsdf just ran a test where we approached this problem the way I (roughly) previously thought it would work, and it worked:

  1. Upgrade the downstream systems (CLI and RPC) from v20 -> v21 first. They do not need multiversioning support. They build a contract for v21 and support preflighting it (possibly slightly incorrectly, eg. admitting a v21 contract call before the core network would admit it, or giving slightly wrong fee, footprint, or other preflight advice -- this is harmless, as in the worst case it just causes a tx to bounce when it hits the real network).
  2. Upgrade the network second, using multi-versioning. The cut-over will cause ledgers to start being emitted with v21, but the RPC is already ready to receive them at that point.

@MonsieurNicolas commented:
I think this is mostly complete.
I would say that the one thing left to do is to have an automated test somewhere that ensures that multi-host actually works in core: maybe something we can do as soon as we have enabled protocol 20 by default, as any "vnext" build will be protocol 21 and we can have special behavior in vnext that we can test against. Probably an issue to open in the https://github.com/stellar/supercluster repo.

@anupsdf commented Aug 29, 2023

> I think this is mostly complete. I would say that the one thing left to do is to have some automated test somewhere that ensures that multi-host actually works in core: maybe something we can do as soon as we have enabled protocol 20 by default, as any "vnext" build will be protocol 21 and we can have a special behavior in vnext that we can test against. Probably an issue to open in the https://github.com/stellar/supercluster repo

Created a tracking issue for adding the automated test.

@graydon commented Sep 1, 2023

Ok, if that residue was all there was left here, closing this.

@graydon graydon closed this as completed Sep 1, 2023