The Builtin Actor Debugging Story #592
This issue summarises the problem and the core of the solution well, but there are still many open questions. Let's flesh those out to build a mature solution.
Let's think through this in more detail. Can you propose some details on what the DX will look like from the Lotus perspective? i.e. can we think through how core devs would go about providing a remapped bundle? Config, command, env var? I think we need commands, see below.
This won't be enough. The FVM doesn't use the manifest to route calls to actors, it just uses it to resolve certain syscalls. This will likely need to be a
A few more things:
(BTW -- I realise that this issue is in ref-fvm and most of my considerations have to do with client integration, but my point here is that we need to take the full picture into account to avoid building local solutions)
This is really good to read, thanks @vyzo and @raulk. Some thoughts:
So, just to be a little more concrete for what this would look like in the FVM and in Lotus:
This is already a decent bit of work (I suspect complications will arise / it'll take a couple iterations to get right) for something that is fairly urgently needed for the M1 milestone -- it's not in our feature freeze, but we would all be comfortable going into v16 with this working, and it helps the testing work we have planned over the next 2-3 weeks. I would urge against doing anything more involved (unless strictly necessary) right now in the interest of protecting our timelines. I suspect we will have lots of scope for future improvement that we should capture, but not prioritize.
This is a good conversation start, but I would highlight that it's only one part of the full story of debugging [builtin] actors – tracing/logging/printlining. It's good, we should do it, and well. But it's not enough. In addition to this, we need a good story for executing actor code in an IDE, stepping execution, inspecting memory etc. I think we could get a very good 80/20 for this in pure Rust, skipping the WASM compilation and all the complications of inspecting and sourcemapping the WASM code and environment. Obviously it would miss some things like gas and issues that result from the compilation process, but for what I would expect to be the majority of debugging efforts around bugs in the actual actor code, this would be a far more powerful tool. This dev/debug environment should be independent of any node implementation, with everything behind the FVM API under direct control of the developer/debugger. It's closely related to what I think we should have for an actor integration testing harness: a pure-Rust VM that integrates the lowest FVM syscall level.
Good enough for now
The basics: to debug I go into builtin actors, make a code change, generate a debug bundle with
Under the hood this works as @vyzo and @arajasek describe above. One thing missing from the discussion here is integration that tracks debug gas. One major use of debug specs-actor code in the past has been testing out and measuring non-state-breaking but performance changes (which can be security critical i.e. network v4) against live network messages. Up until now we've collected timing information in these instances but the dual execution model makes this more challenging as the old lotus commands will time both runs and we won't have a comparison benchmark to determine time/gas diff without a lot of operational setup or annoying context switching. IMO some way to measure debug gas and debug execution time (to debug situations where time/gas mismatches are showing up) is high priority.
The perfect world
As above you build actors changes but you can load them into lotus at runtime with
Mapping code cids to debug code cids is identically handled in lotus and fvm for builtin and user actors. I am guessing that the future of the manifest is that it will grow to include user code ids (though maybe system actors are partitioned separately in chain state) and the fvm will do the exact same code cache thing for user actor code.
Strong agree that debug bundles should be a map of actor code-id to actor debug-code-id/debug code. When the fvm loads actor code id for the first time it checks for existence in debug set, if found marks this code as debug which then forces dual execution. For M1 this would mean you could theoretically only debug compile a subset of system actors instead of all 11. But probably we wouldn't add this to builtin-actors debug-build for simplicity. Post M2 you could have a single debug bundle with some system and some user actor debug code. As a side note in the perfect world we have a tool that adds / removes compiled wasm code into debug bundles. I think the idea that EVM contracts can leverage full WASM debugging assumes that there are debugging directives in EVM that cross compilers can pick up on and put into WASM. Seems reasonable but no idea if these are true.
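A minimal sketch of the code-id to debug-code-id map described above, assuming plain strings in place of real CIDs. `DebugBundle`, `CodeCache` and `load` are hypothetical names for illustration, not ref-fvm APIs. The debug set is consulted only the first time a code id is loaded, and a hit marks that code for dual execution.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a real CID; strings keep the sketch self-contained.
type CodeId = String;

/// Hypothetical debug bundle: mainnet code id -> debug code id.
struct DebugBundle {
    remap: HashMap<CodeId, CodeId>,
}

/// Hypothetical code cache that remembers which loaded codes have a debug twin.
struct CodeCache {
    loaded: HashSet<CodeId>,
    debug_marked: HashMap<CodeId, CodeId>,
}

impl CodeCache {
    fn new() -> Self {
        CodeCache {
            loaded: HashSet::new(),
            debug_marked: HashMap::new(),
        }
    }

    /// Loads `code`. On first load the debug set is checked once; the return
    /// value is the debug code id if this actor must be executed twice.
    fn load(&mut self, code: &str, bundle: &DebugBundle) -> Option<CodeId> {
        if self.loaded.insert(code.to_string()) {
            if let Some(debug) = bundle.remap.get(code) {
                self.debug_marked.insert(code.to_string(), debug.clone());
            }
        }
        self.debug_marked.get(code).cloned()
    }
}

fn main() {
    // Only a subset of actors needs a debug build (placeholder ids).
    let mut remap = HashMap::new();
    remap.insert("miner-code".to_string(), "miner-code-debug".to_string());
    let bundle = DebugBundle { remap };

    let mut cache = CodeCache::new();
    println!("miner -> {:?}", cache.load("miner-code", &bundle));
    println!("power -> {:?}", cache.load("power-code", &bundle));
}
```

Because the lookup is keyed only by code id, the same structure would serve builtin and user actors alike, which is the property the comment above argues for.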
I think sensible overwrite semantics for
More thoughts
No debug bundle releases
A debug bundle released and shipped with lotus doesn’t seem worth the effort. If I’m debugging actors I will probably want to write some code changes specific to my use to build into the bundle. I guess that even those just curious about reading more logs and using the debug defaults without changing code will be comfortable building a bundle from source and linking it to lotus with bundles.toml.
I don't think that this is a problem that needs to be solved since the bundle should only be changing on dev branches / next network version where pulling changes is less critical and more under debugger's control. If I'm missing a reason that this is a legitimate problem then releasing debug bundles would make more sense.
Runtime loading
Strong agree. Note this is a strict improvement of the current workflow: modify specs-actors => go mod replace in lotus => rebuild => restart node. So while it's important in the long run it's no problem to live without it for a while.
If the lotus fvm object is created for every block (like the current lotus VM) then this is reasonable. If it's created by DI when making a new node, this means you have to bring your node up and down, which is better than rebuilding but not ideal.
Questions
@raulk this is confusing me because I don't know what side of the fvm you are on. Is this about debuggable wasm actor code or are you talking about making the native ref-fvm debuggable? Is there some interaction between the two such that the native ref-fvm debug endpoints can be triggered by debug compiled wasm?
@anorth doesn't this restrict debugging only to wasm code that was compiled from rust? I see the benefits in debugging builtin actors for sure but it seems like this forces our work to not apply to the many user contracts expected to be developed from solidity/EVM. And making this work well in general is high leverage.
Yes, debugging in a debugger would be really nice to have, but it caters to different use cases and it is rather complex to do; it could also be outsourced with a grant. With regards to dual execution, let me summarize the result of the sync discussion yesterday:
Initially for M1 we want to support debug execution; this will unblock our debugging efforts, allowing us to quickly test changes and see debug output while the system is running on mainnet. The support will require the following (minimal) changes in FVM:
It's a good point, I was only addressing the issue as titled. However I think much of the point still stands: for development-time debugging, involving Lotus or any other full node is a whole lotta complication that's not needed. So for debugging any WASM actor, we want a WASM execution environment under full and direct developer control. And again, this is a thing we'd want for FVM integration tests, which should not depend on a node.
@anorth I am totally in agreement, but this is orthogonal. We still need to be able to debug actors as running on mainnet, it is a different part of the journey.
The Debug Problem
It has become quite apparent, especially as we are getting closer to the nv16 upgrade, that we need a mechanism to debug builtin actors.
This goes beyond enabling the actor_debugging context option (which enables the debug syscall), as debug code has to be compiled in and will diverge from the mainline code, at the very least in gas (e.g. debug prints, assertions, possibly different code paths for testing and so on).
At the root of the problem we have two issues:
Debugging Strategy
Here we propose a strategy for resolving the issue in a way that supports our debugging needs, without forking from mainnet.
The strategy is twofold:
Dual Code Execution
More specifically, we can provide a debug manifest at fvm instantiation.
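A minimal sketch of what dual execution could look like, under loud assumptions: `Executor`, `Receipt`, `DualResult` and `execute_dual` are hypothetical names, the gas numbers are invented, and the two runs happen one after the other rather than concurrently. Only the `actor_debugging` flag name comes from this issue. The mainnet run supplies the result and gas; the debug run is kept purely for its side effects.

```rust
/// Outcome of running one message; the fields are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Receipt {
    exit_code: u32,
    gas_used: u64,
}

/// Hypothetical executor. `actor_debugging` mirrors the context option above.
struct Executor {
    actor_debugging: bool,
}

impl Executor {
    /// Pretend execution: debug code emits a trace line and burns extra gas.
    fn execute(&self, msg: &str, trace: &mut Vec<String>) -> Receipt {
        let mut gas = 100 + msg.len() as u64;
        if self.actor_debugging {
            trace.push(format!("debug: handling {}", msg));
            gas += 25; // debug prints and assertions are not free
        }
        Receipt { exit_code: 0, gas_used: gas }
    }
}

/// Everything a caller might want back from a dual run.
struct DualResult {
    receipt: Receipt,       // authoritative: from mainnet code
    debug_receipt: Receipt, // side channel only: from debug code
    trace: Vec<String>,
}

/// Runs mainnet code for the result and debug code for its side effects.
fn execute_dual(msg: &str) -> DualResult {
    let mainnet = Executor { actor_debugging: false };
    let debug = Executor { actor_debugging: true };

    let mut discarded = Vec::new();
    let receipt = mainnet.execute(msg, &mut discarded);

    let mut trace = Vec::new();
    let debug_receipt = debug.execute(msg, &mut trace);

    DualResult { receipt, debug_receipt, trace }
}

fn main() {
    let out = execute_dual("send");
    println!("mainnet gas: {}", out.receipt.gas_used);
    println!("debug gas:   {}", out.debug_receipt.gas_used);
    println!("trace: {:?}", out.trace);
}
```

Keeping the debug receipt alongside the mainnet one is a design choice of this sketch; it would give the gas comparison asked for earlier in the thread without a second tool.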
Upon message execution time, we concurrently execute the mainnet code (for the result and gas) and the debug code, redirecting mainnet code to the debug code in the debug execution, which has actor_debugging enabled. The result of message execution will be the result of the mainnet code execution. The debug code will be executed for side effects, which can be observed through stdout/stderr or by collecting and side-returning the debug result/trace.
Debug Bundles
At the very least we need to enable tracing/debug for logs, through a build feature.
The same feature test can also be used to enable debug logic (assertions, experimental features and so on).
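As a sketch of such a build feature in Rust, assuming a hypothetical Cargo feature named `debug-logs` (it would need to be declared under `[features]` in the crate's Cargo.toml). `debug_log!` and `withdraw` are made-up names, and `eprintln!` stands in for whatever the actor would really call, such as the debug syscall.

```rust
/// Logs only when the crate is built with the hypothetical `debug-logs` feature.
macro_rules! debug_log {
    ($($arg:tt)*) => {
        if cfg!(feature = "debug-logs") {
            eprintln!($($arg)*);
        }
    };
}

/// Toy actor logic: subtracts `amount` from `balance` if funds suffice.
fn withdraw(balance: u64, amount: u64) -> Option<u64> {
    debug_log!("withdraw: balance={} amount={}", balance, amount);

    // The same feature test can gate extra debug-only logic.
    if cfg!(feature = "debug-logs") && amount == 0 {
        eprintln!("withdraw: zero-amount withdrawal, probably a caller bug");
    }

    balance.checked_sub(amount)
}

fn main() {
    println!("{:?}", withdraw(10, 3));
    println!("{:?}", withdraw(1, 3));
}
```

Building with `cargo build --features debug-logs` would turn the logging on; a default build compiles the same source with the branches statically false, which is why the two bundles end up with different code and different gas.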
The bundle can either be built by the developer for testing, or be part of the release workflow matrix which would build a builtin-actors-mainnet-debug.car bundle for consumption by lotus.
Lotus Integration
This is the easiest part, as we could simply load the debug bundle in the blockstore and pass the manifest CID to the fvm when enabling debug. This is probably best done by using an environment variable.