(Epic) Define semantics (and testing strategy) when SDK version is revved #1095

MonsieurNicolas · 2022-08-03T16:59:33Z

This is similar to stellar/rs-soroban-env#310 but scenarios are around SDK changes that are not impacting the core protocol.

Context:

contract A and B are compiled with SDK 1.0.0
contract A depends on B (ie: A invokes B)

Scenario:

A's developer wants to upgrade to SDK 2.0.0 and wants to test what happens when calling B compiled with version 1.0.0
B gets recompiled using SDK 2.0.0, A's developer wants to test and understand how A would be impacted.

This is somewhat related to stellar/rs-soroban-env#289

Tasks

Give feedback

The text was updated successfully, but these errors were encountered:

leighmcculloch · 2022-08-08T21:40:33Z

contract A depends on B (ie: A invokes B)

There are a couple variations of this. For example:

contract A invokes contract B
contract A imports the Rust crate for contract B

For (1) the compatibility layer is the host layer, and so the SDK plays very little role in there being compatibility or incompatibility between A and B. Except for the contract spec. Bs contract spec might contain elements that are new if it is using a new SDK, in which case it would be ideal if A fails to build gracefully, indicating that a newer or unrecognized version of the contract spec is in use and A needs to use a newer SDK. The plan is to not make any breaking changes to the contract spec, so if A is using a newer contract spec that should have no bearing on anything. The contract spec has versioning already embedded into it for these cases.

For (2) things are much more fuzzy and I don't think we should do much other than use semver to the best of our ability. If anything in the SDK is a breaking change, increment the major version, and developers will use Cargo and versioning definitions to appropriately constrain their compatibility like they do for any Rust crate.

MonsieurNicolas · 2022-08-09T00:55:24Z

For (1) the compatibility layer is the host layer, and so the SDK plays very little role in there being compatibility or incompatibility between A and B. Except for the contract spec

yes, this is why I separated the "host compat issues" and this issue that tracks changes that may impact the contract spec in the broader sense (behavior included).
This includes issues related to custom types (UDTs), but I am sure there are more issues like if certain shared (embedded) functions/types behave slightly differently for example (therefore testing is needed).

Except for the contract spec. Bs contract spec might contain elements that are new if it is using a new SDK, in which case it would be ideal if A fails to build gracefully, indicating that a newer or unrecognized version of the contract spec is in use and A needs to use a newer SDK. The plan is to not make any breaking changes to the contract spec, so if A is using a newer contract spec that should have no bearing on anything. The contract spec has versioning already embedded into it for these cases.

"failing to build" does not make sense in this context: if you require all contracts to be build against the exact same SDK, there is no way to upgrade them (contracts get deployed if at all independently).
For clarity, what I was talking about here is a contract update that picks up the latest SDK (for bug fix, or whatever) and all effort is made by the party making those changes to not want to break older contracts.
I imagine that most changes that people will do would preserve contract specs in terms of "broad" function signature, and follow patterns like what we do in xdr: add enum values for examples is safe for inputs at bottom of the dependency tree (old clients calling an updated contract would call with the old enum values only), for UDTs it's a similar story.

graydon · 2023-06-22T22:43:34Z

@MonsieurNicolas I had a chat with @leighmcculloch and talked this through some more, and am recording the results of that for posterity / "an official plan" here.

Almost all of the time, if contract A depends on contract B (in the sense of calling it) A will import B's spec and env-meta, and generate their own client for B. The env-meta in B describes the env and protocol against which B was compiled, in terms of a protocol version. The spec describes how to invoke functions, and the spec fields are all versioned XDR (they have _V0 on their enum/struct names) and spec evolution is defined to be backward-compatible (new entries allowed that old code won't understand yet, but old entries retain meaning and new code can call old code). So if new A tries to call old B, A can import B, see that it's old, and generate an old-compatible client for it. If old A tries to call new B -- old A here being in the sense of a client with an old SDK -- then if the SDKs are similar enough (same env-meta protocol and B not using any new spec features) it will work, otherwise it will fail with a "A needs to upgrade SDK before being able to call B" message.
Occasionally A will actually link against B in the normal rust import-and-use sense. This should be rare, but it's a workflow people might do for eg. high speed native integration testing. This will face the exact same set of constraints that any rust linkage story faces when crate A calls crate B with some shared type they both import C: if C is part of the interface between A and B, cargo will need to be able to resolve C to a single version otherwise there'll be a link error. So sometimes it will work (when A and B are loose enough with their SDK or env version specifications in Cargo.toml) that they are resolvable to a single version, other times it will not work (eg. if A specifies SDK 2.0.1 exactly and B specifies 2.0.2 exactly). We can't control this. We can make it unlikely-to-fail by making the types of the SDK mostly not involved in the cross-contract calling interface (eg. mostly the interface can consist of env types) but A and B can still specify incompatible versions of the env transitively through SDKs. I do not think this will be a big problem because (a) it's rare (b) it's the same as happens in any rust project (c) it happens locally on a user's workstation and only because of something the user specifically did rather than accidentally at runtime or in production or whatever and (d) everyone knows how to fix it, you loosen Cargo.toml specs or upgrade one party or downgrade the other.

This bug does mention the testing strategy, which I think is relevant but fairly straightforward. We should test the following:

For each previous protocol P:
- Maintain a pair of contracts A and B where A calls B and both use only features in protocol P
- Check A and B can both build against new SDK, and A can call B when both are new
- Check that building A against new SDK and importing old B spec allows new A to call old B
- Check that building B against new SDK and deploying does not break old A's ability to call new B

There's one additional bit of diagnostic work to do here, which is bug #735 -- embedding the SDK version info into the meta section of a contract. That's assigned to me now but is not a strict requirement for anything, just a helper for diagnostics.

leighmcculloch · 2023-06-22T22:52:17Z

Regarding the testing story, I think it would be good if we could maintain a repo of test vectors for a variety of different SDK versions, that CI builds and runs against a variety of env versions. We could have it run daily against the tip of other repos too, so that we get warning ahead of time as the examples need updating, or things break from a compatibility standpoint.

leighmcculloch · 2023-06-22T22:54:20Z

Another idea for the testing story, I think it'd be interesting if contracts defined as part of their interface a set of test inputs, with expected outputs. If that existed, we could download latest ledger state, and run the tests across all recently used contracts that have test data, using an unreleased environment.

MonsieurNicolas · 2023-08-28T21:15:29Z

(going over "upgrade" related issues)
@leighmcculloch do you have some idea of practical steps we can take here for v1?

For the testing story, is it as simple as having

some test vectors that can be used in the context of multi versioning
potentially, allow to "export" those test vectors (both inputs and outputs), maybe xdr encoded, to allow for old vs new testing

On the implementation front, can we try out the proposed steps and see how it goes?

As to where we can record both of these.... is this a SEP of sorts (before we even talk about automating some of it)?
It sounds like there would be a fairly good number of expectations that need to be manually implemented by contract developers.

leighmcculloch · 2023-08-28T23:18:22Z

In terms of testing:

The scope needs deciding. The scope could be no stellar-core, and just the soroban-env, if the goal is to detect purely contract and environment changes over time. That scope wouldn't be testing features like expiry.

Agree on a goal, which I think should be consistency, not correct behavior. i.e. Tests don't assert "correct" outputs, they assert consistent outputs across versions, etc.

As a starting point:

create a repo.
write a couple of initial test vectors. we can add more as we go, so I don't think we need to spend much time getting right what the test vectors are right now, if we get it wrong we can change them.
get all the outputs (return values, events, data written) from an invocation into some easily comparable-by-machine and readable-by-humans form, such as xdr rendered as json (rs-stellar-xdr will do this for us), so that when we do diffs it's relatively easy for anyone to understand.
decide how we will define or generate test scenarios (ranges and matrix of versions), store results, compare results over time, etc.

Once we get the foundation implemented we could look at:

adding test vectors for auth
host function coverage tracking so we can identify host functions not tested
host value coverage tracking
supplement test vectors with pubnet contracts, pubnet history of inputs (data, args) -> outputs (return, events, data)

It sounds like there would be a fairly good number of expectations that need to be manually implemented by contract developers.

What are you thinking contract developers will need to implement? Iirc all the components that @graydon noted above should be covered by the SDK and Environment in the sense that we'd aim to not introduce any breakage into any existing API provided at either the Env or SDK levels. Breaking changes would be new functions/types. Changes to contract calling convention would be absorbed by either new SDKs, or the Environment.

leighmcculloch · 2023-08-28T23:22:51Z

is this a SEP of sorts

I think we need to write down a compatibility guarantee. What existing functionality won't or will change. E.g. will existing host functions change and if yes under what conditions (e.g. only security fixes, only bug fixes)?

What type of things do you see going into a SEP?

leighmcculloch · 2023-08-28T23:32:46Z

create a repo.

@MonsieurNicolas Could these tests fit into the integration testing that stellar-core does in supercluster? Or do you think this need to be a new thing?

MonsieurNicolas · 2023-09-05T23:44:10Z

@leighmcculloch this issue is specifically looking at contract behaviors, your recent comments make it look like this is about "host testing", which is not the intent.

So going back to the intent of this issue: which is focused on contract developers.
I don't think this is a "supercluster" type of test infra: this would be way too late into the testing pipeline and we want contract dev elopers to find compat issues (assuming there is some Soroban toolchain build they can use).
As any "env" is supposed to support current and any previous protocol version (with bit level fidelity), I think that it should be possible to run this kind of test using any env (as long as it meets the min requirement in the contract spec).

If we're looking at "beta testing" this: maybe using the https://github.com/stellar/soroban-examples repo should do the trick (it should be possible to spin up an AMM.sdk_v_X that swaps tokens implemented with sdk_v_Y and sdk_v_Z)? That way contract developers (could be AMM dev or token dev in this example), can test the various combinations of cross contract behavior when some "automagically generated client" is used.

Note that we could use Soroban-examples as part of the validation criteria for Soroban (core, tools, etc) as some sort of smoke test as we expect any "Soroban build" to not break examples.

I mentioned having a SEP of sorts because specifying test vectors in a standard way allows to test any possible cross contract matrix, regardless of environment (and I am not sure I fully understand yet how all this would work).

leighmcculloch · 2023-09-06T00:16:53Z

this issue is specifically looking at contract behaviors, your recent comments make it look like this is about "host testing", which is not the intent.

I agree we need sdkN x sdkM testing, but it is a slice of a larger set testing we need to do. The type of machinery we'll need for sdk testing seems like the same sort of machinery we need for testing envN x envM, and sdkN envM x sdkO envP.

Contract behaviors, beyond their own logic, are mostly defined by the host, not by the SDK, because the system is "host heavy" in terms of where functionality lives. The SDK is a very thin layer. It is 99% just changing how a host function is presented, so it looks more like common Rust code. If we intend to test contract behavior consistency outside of the contracts own internal logic, we need to test across all variations of 1) Env versions, 2) a matrix of SDK and Env versions, and 3) a matrix of SDK and SDK versions.

Are there other ways we intend to test Env's participation in consistency?

I mentioned having a SEP of sorts because specifying test vectors in a standard way allows to test any possible cross contract matrix, regardless of environment (and I am not sure I fully understand yet how all this would work).

Something that @graydon and I discussed offline previously was the idea of contracts embedding test vectors into their contract interface stored in contracts. It would allow anyone to execute them on an upcoming protocol release, and form part of their guarantees on expected behavior. There's tradeoffs here in terms of binary size which would probably lead to storing them elsewhere, and in general challenge of making it expressive enough though. This idea hasn't been explored beyond this.

leighmcculloch · 2023-09-06T00:20:23Z

Scenario:

A's developer wants to upgrade to SDK 2.0.0 and wants to test what happens when calling B compiled with version 1.0.0

B gets recompiled using SDK 2.0.0, A's developer wants to test and understand how A would be impacted.

Ah, so yes I see I misunderstood the issues intent.

The issue is specifically exampling how do contract developers test their own contracts compatibility across versions. The examples mention SDK version, but ideally it would also include env/protocol versions.

If the intent is to give contract developers the tools to run their own tests, then I think we're talking about building versioning tooling into the soroban-cli. I can brainstorm this some more.

leighmcculloch · 2023-09-06T00:31:58Z

I've opened a new issue for the other problem I was discussing:

Define testing strategy for sdk and env consistency across versions rs-soroban-env#1040

leighmcculloch · 2023-09-07T00:13:32Z

I wrote up a form of this comment, but lost it to a browser bug... (browser ate my homework!) ...hopefully this rewrite captures everything.

Scenario:

A's developer wants to upgrade to SDK 2.0.0 and wants to test what happens when calling B compiled with version 1.0.0

B gets recompiled using SDK 2.0.0, A's developer wants to test and understand how A would be impacted.

I think we've already built the bare minimum tools a contract developer needs to handle the scenarios that this issue is concerned with.

There are two approaches to writing integration tests – tests that exercise more than one contract at once – for Soroban contracts: 1) by loading dependencies as .wasm, 2) by registering dependencies natively. The tutorials and docs mostly discuss the former as it has less surprises, several advantages, one being specifically relevant to this discussion.

When developing contracts at arms length the docs recommend loading dependencies as .wasm. Because dependencies are loaded as .wasm it doesn't really matter what version of the SDK they're using. The other contract may not even be using a Rust SDK. It could be using the AssemblyScript SDK, or a future Zig SDK. The interface that the contract exposes isn't defined by the SDK it uses, but by the Env interface. And so the process of checking if a contract will upgrade safely is the same process for any integration test run.

For example:

I build contract A with SDKv1.
Someone else builds contract B with SDKv1.
Contract A calls contract B.
I write tests for contract A that load contract Bs .wasm file into an Env alongside contract A.
Tests executed in Env are running both contracts that were built with SDKv1.
I upgrade contract A to SDKv2.
I run the integration tests that still have contract B's .wasm file built with SDKv1.
Assuming my integration tests have sufficient coverage and sufficient assertions, I should be able to confirm that the behavior of contract A and contract B is consistent.
To be extra confident of consistency:
- The Rust SDK test framework provides a function for saving a snapshot of ledger state to a .json file. e.g. env.to_snapshot_file("testname.json"). I can add to the end of every test, run all the tests before and after the upgrade, and then compare the results.
- I can use property tests, or fuzzing (although not branch guided because the dependency is .wasm), to select a wide range of inputs and do the same.

The same process works if I find out that contract B plans to upgrade, and they happen to share a pre-release .wasm somewhere. I can load it and test its behavior. If the developer for contract B publishes pre-releases this works.

For developers that develop a set of contracts in close proximity as a single unit, and not at arms length, they have options. They can develop with each dependency being loaded as a .wasm. But more likely they will develop and test with all contracts loaded natively. In this case Rust won't allow us to test contract A and contract B if they are on different versions of the SDK, because the SDK types will be incompatible and Rust will not allow a contract built with SDKv1 to be loaded into an env in SDKv2. I don't think there's a way around this, it is just a tradeoff for developing contracts natively.

I think there is one thing we could do to make it easier to do this type of testing:

We could add a custom test attribute, such as #[contracttest], to replace the builtin Rust #[test] attribute. We can add additional behavior to contracttest tests such as automatically writing out the ledger state to the file system. This might encourage developers to commit test recordings and therefore validate that they are consistent over time. Test frameworks in other ecosystems do this, such as "VCRs" in the Ruby ecosystem.
Add an environment variable that triggers the same behavior ☝🏻 for all regular Rust tests that use an Env.

I don't think we need a SEP to support the above, but we could have a doc page that describes how to test an SDK upgrade.

It's unclear to me how useful publishing test vectors would be for this scenario. As a developer of contract A I'm more interested in my own integration tests that exercise both A and B together, than I am of tests of B in isolation.

leighmcculloch · 2023-09-07T15:45:22Z

Whilst we have a decent story so far in the SDK, I've opened #1082 to explore improving the SDKs test framework support for testing general contract consistency over time.

leighmcculloch · 2023-09-07T16:26:01Z

Opened issue for adding documentation mentioned above: stellar/stellar-docs#480

leighmcculloch · 2023-11-10T20:36:11Z

Adding functionality to the SDK to make it trivial to test contract consistency over time:

Autosave a test snapshot file on every test exit #1137

leighmcculloch · 2023-11-11T01:04:02Z

With #1137 implemented I think we're providing a lot for developers to use to detect unexpected inconsistencies in their contracts. The main two features we're providing is:

Integration testing of .wasm across SDK/env/protocol versions. (Existing functionality already in the SDK.)
Recording behavior for tracking over time. (Implemented in Autosave a test snapshot file on every test exit #1137)

The last thing missing is documentation to capture these important concepts, which will be addressed by:

Add docs for how to test contract consistency and sdk/env/protocol upgrades stellar-docs#480

anupsdf · 2024-04-08T23:41:03Z

Closing this because its done and only doc issue is pending.

graydon self-assigned this Feb 9, 2023

graydon mentioned this issue Mar 31, 2023

Sketch of a subtype judgment on contract specs stellar/rs-soroban-env#747

Draft

leighmcculloch mentioned this issue Sep 6, 2023

Define testing strategy for sdk and env consistency across versions stellar/rs-soroban-env#1040

Open

leighmcculloch mentioned this issue Sep 7, 2023

Add test utilities that make it easier to test contract consistency over time #1082

Closed

leighmcculloch mentioned this issue Apr 16, 2024

Add docs for how to test contract consistency and sdk/env/protocol upgrades stellar/stellar-docs#480

Open

graydon transferred this issue from stellar/rs-soroban-env Sep 21, 2023

graydon assigned leighmcculloch and unassigned graydon Sep 21, 2023

anupsdf closed this as completed Apr 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(Epic) Define semantics (and testing strategy) when SDK version is revved #1095

(Epic) Define semantics (and testing strategy) when SDK version is revved #1095

MonsieurNicolas commented Aug 3, 2022 •

edited by leighmcculloch

Loading

Tasks

leighmcculloch commented Aug 8, 2022

MonsieurNicolas commented Aug 9, 2022

graydon commented Jun 22, 2023

leighmcculloch commented Jun 22, 2023

leighmcculloch commented Jun 22, 2023

MonsieurNicolas commented Aug 28, 2023

leighmcculloch commented Aug 28, 2023 •

edited

Loading

leighmcculloch commented Aug 28, 2023

leighmcculloch commented Aug 28, 2023

MonsieurNicolas commented Sep 5, 2023

leighmcculloch commented Sep 6, 2023 •

edited

Loading

leighmcculloch commented Sep 6, 2023

leighmcculloch commented Sep 6, 2023

leighmcculloch commented Sep 7, 2023 •

edited

Loading

leighmcculloch commented Sep 7, 2023

leighmcculloch commented Sep 7, 2023

leighmcculloch commented Nov 10, 2023

leighmcculloch commented Nov 11, 2023

anupsdf commented Apr 8, 2024

(Epic) Define semantics (and testing strategy) when SDK version is revved #1095

(Epic) Define semantics (and testing strategy) when SDK version is revved #1095

Comments

MonsieurNicolas commented Aug 3, 2022 • edited by leighmcculloch Loading

Tasks

leighmcculloch commented Aug 8, 2022

MonsieurNicolas commented Aug 9, 2022

graydon commented Jun 22, 2023

leighmcculloch commented Jun 22, 2023

leighmcculloch commented Jun 22, 2023

MonsieurNicolas commented Aug 28, 2023

leighmcculloch commented Aug 28, 2023 • edited Loading

leighmcculloch commented Aug 28, 2023

leighmcculloch commented Aug 28, 2023

MonsieurNicolas commented Sep 5, 2023

leighmcculloch commented Sep 6, 2023 • edited Loading

leighmcculloch commented Sep 6, 2023

leighmcculloch commented Sep 6, 2023

leighmcculloch commented Sep 7, 2023 • edited Loading

leighmcculloch commented Sep 7, 2023

leighmcculloch commented Sep 7, 2023

leighmcculloch commented Nov 10, 2023

leighmcculloch commented Nov 11, 2023

anupsdf commented Apr 8, 2024

MonsieurNicolas commented Aug 3, 2022 •

edited by leighmcculloch

Loading

leighmcculloch commented Aug 28, 2023 •

edited

Loading

leighmcculloch commented Sep 6, 2023 •

edited

Loading

leighmcculloch commented Sep 7, 2023 •

edited

Loading