
feat(wasm): split-up build #3289

Merged
48 commits merged into main from tx-decompose on Nov 17, 2023
Conversation

@TalDerei (Collaborator) commented Nov 7, 2023

References #3236, #3274, and #3238. The build method has been refactored so that separate web-workers can operate on sub-components of the build process in parallel in the web context. The appropriate methods will be exposed to wasm using the wasm-bindgen attribute in the wasm crate.

The PR includes a preliminary wasm unit testing suite using wasm-bindgen-test that mocks IndexedDB calls using an interactive browser. This simplifies integration testing, rather than having to compile and test locally in my webgpu repo every time a change is made. The command to run the test suite is wasm-pack test --chrome -- --test test_build --target wasm32-unknown-unknown --release.

@TalDerei TalDerei requested review from hdevalence and removed request for hdevalence November 7, 2023 18:35
@TalDerei (Collaborator, Author) commented Nov 8, 2023

The parallel build process is split into two phases:

  • Phase 1: Spawn a distinct web-worker for each ActionPlan, and generate an Action by invoking the action_builder wasm function. The action_builder is generic over the ActionPlan: under the hood, it uses pattern matching to distinguish between the ActionPlan variants, and invokes Build(BuildPlan) to construct the proper Action. Each web worker will be supplied with an instance of the private witness. The result of the first phase will be a list of Actions, [Action]. (A rough sketch follows this list.)

  • Phase 2: A top-level build method, build_parallel, will be exposed for constructing the transaction. Among other fields, build_parallel will require the TransactionPlan, [Action], and AuthorizationData. Internally, it will call the build_tx_parallel and authorize methods to construct the Transaction and the binding signature.
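
A rough sketch of the phase-1 entry point described above. All names here (action_builder's exact signature, the build_spend / build_output helpers) are hypothetical and only illustrate the shape of the dispatch, not the actual API:

    use wasm_bindgen::prelude::*;

    #[wasm_bindgen]
    pub fn action_builder(action_plan: JsValue, witness: JsValue) -> Result<JsValue, JsValue> {
        let plan: ActionPlan = serde_wasm_bindgen::from_value(action_plan)?;
        let witness: WitnessData = serde_wasm_bindgen::from_value(witness)?;
        // Pattern-match on the ActionPlan variant and construct the matching Action.
        let action: Action = match plan {
            ActionPlan::Spend(spend_plan) => build_spend(&spend_plan, &witness),
            ActionPlan::Output(output_plan) => build_output(&output_plan, &witness),
            // ...remaining variants are handled analogously...
            _ => return Err(JsValue::from_str("unsupported ActionPlan variant")),
        };
        Ok(serde_wasm_bindgen::to_value(&action)?)
    }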

[Screenshot 2023-11-08 at 1:05:50 PM]

As I expose more of the lower-level build methods to WASM, I've generally noticed that some of the internal structs are not serializable, since they come from external libraries (arkworks) that don't implement the Serialize / Deserialize traits (for instance, the synthetic_blinding_factor field in UnauthTransaction: https://github.com/penumbra-zone/penumbra/blob/main/crates/core/transaction/src/plan/build.rs#L434). This construction was designed around that limitation. In the future, we shouldn't have to define extra data formats unless we're specifically exposing features from an external library like Arkworks.

@hdevalence (Member) commented Nov 8, 2023

@TalDerei I'm not totally sure I understand the phase 1 / phase 2 distinction. Are these phases of implementation, or different phases of using the same API? If we want to expose parallelism to the web context, the unit of parallelism has to be tasks assigned to different web workers, right? I'm also not sure about adding a new BuildPlan variant to the ActionPlan, I think building is something we want to be able to do to any ActionPlan, rather than a type of ActionPlan itself.

It would be good to see if we could reduce some of the code duplication we currently have between the existing parallel and non-parallel Rust methods, as a side effect of doing this wasm work.

What about something like the following (a signature sketch follows the list):

  • Add an ActionPlan::build_unauth(&self, witness_data: &WitnessData) -> Action method to the ActionPlan enum. Internally, this method would match on the ActionPlan variant and call the action-specific build methods inside. Because it doesn't have access to the AuthorizationData, it should fill any required authorization data in with dummy values (like an all 0s array).
  • Add a TransactionPlan::build_unauth_with_actions(&self, witness_data: &WitnessData, actions: Vec<Action>) -> Transaction method to the TransactionPlan. This would work like the existing TransactionPlan::build method, but it would slot in the provided prebuilt actions instead of using the ActionPlans in the TransactionPlan. Because it doesn't have access to the AuthorizationData, it would need to fill in the binding signature with dummy values (like an all 0s array).
  • Add a TransactionPlan::authorize(&self, transaction: &Transaction, auth_data: &AuthorizationData) -> Transaction method that would use the provided auth_data to overwrite all the dummy values in all of the transaction's actions, then use the blinding factors in the transaction plan (&self) to derive the synthetic blinding factor needed to compute the binding signature.
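
A signature-level sketch of that proposal (hypothetical; bodies elided):

    impl ActionPlan {
        /// Build the Action for this plan, filling any authorization
        /// fields with placeholder (all-zero) values.
        pub fn build_unauth(&self, witness_data: &WitnessData) -> Action {
            todo!()
        }
    }

    impl TransactionPlan {
        /// Like `build`, but slots in the prebuilt actions and uses a
        /// placeholder binding signature.
        pub fn build_unauth_with_actions(
            &self,
            witness_data: &WitnessData,
            actions: Vec<Action>,
        ) -> Transaction {
            todo!()
        }

        /// Overwrite the placeholder auth sigs, then derive the synthetic
        /// blinding factor and compute the real binding signature.
        pub fn authorize(
            &self,
            transaction: &Transaction,
            auth_data: &AuthorizationData,
        ) -> Transaction {
            todo!()
        }
    }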

The use of dummy values is unappealing, but the big upside of this approach is that we can use it in the TypeScript/WASM context without having to add any additional serialization formats, and I think we can also clean up the existing Rust code:

  • We can replace the existing TransactionPlan::build with an all-in-one method TransactionPlan::build(&self, witness_data: &WitnessData, auth_data: &AuthorizationData) -> Transaction that (1) calls ActionPlan::build_unauth on each action, (2) passes those to TransactionPlan::build_unauth_with_actions, (3) slots in the auth data with TransactionPlan::authorize, and then returns the completed transaction.

  • We can replace the existing TransactionPlan::build_concurrent with an all-in-one method TransactionPlan::build_concurrent(&self, witness_data: &WitnessData, auth_data: &AuthorizationData) -> Transaction that is exactly like TransactionPlan::build but wraps each call to ActionPlan::build_unauth in a tokio::spawn. This gives almost perfect code reuse between build and build_concurrent (see the sketch after this list).

  • The web code can farm out the calls to ActionPlan::build_unauth to individual web workers, then do the build_unauth_with_actions and authorize steps once it gets the AuthorizationData. This means we can start the expensive part (proving) before getting the auth signatures, but still only have one step once we get the AuthorizationData.

  • We can eliminate the UnauthTransaction type. I'm not sure any of the Rust code is using the delayed-authorization feature, and the reason it exists is this use case, which this new design would solve better.
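
A sketch of the shared structure between the two all-in-one methods, assuming the hypothetical signatures above (and eliding error handling and Clone bounds):

    impl TransactionPlan {
        pub fn build(
            &self,
            witness_data: &WitnessData,
            auth_data: &AuthorizationData,
        ) -> Transaction {
            // (1) Build each action without authorization data.
            let actions: Vec<Action> = self
                .actions
                .iter()
                .map(|plan| plan.build_unauth(witness_data))
                .collect();
            // (2) Assemble with placeholder signatures, (3) slot in the real ones.
            let unauth = self.build_unauth_with_actions(witness_data, actions);
            self.authorize(&unauth, auth_data)
        }

        pub async fn build_concurrent(
            &self,
            witness_data: &WitnessData,
            auth_data: &AuthorizationData,
        ) -> Transaction {
            // Identical shape, except each action is proven on its own task.
            let handles: Vec<_> = self
                .actions
                .iter()
                .map(|plan| {
                    let (plan, witness) = (plan.clone(), witness_data.clone());
                    tokio::spawn(async move { plan.build_unauth(&witness) })
                })
                .collect();
            let mut actions = Vec::new();
            for handle in handles {
                actions.push(handle.await.expect("action build task panicked"));
            }
            let unauth = self.build_unauth_with_actions(witness_data, actions);
            self.authorize(&unauth, auth_data)
        }
    }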

@TalDerei (Collaborator, Author) commented Nov 8, 2023

I should have clarified: phases 1 and 2 are different phases of the same API. I think your suggestions for refactoring this way make a lot of sense, and seem more idiomatic while reducing code duplication at the same time!

The only thing I'm wondering is why ActionPlan::build_unauth needs to do anything with AuthorizationData dummy values in the first place. Couldn't it skip that step and only perform the build work itself?

@hdevalence (Member) commented:

Well, suppose ActionPlan::build_unauth is called on an ActionPlan::SpendPlan. This has to return an Action::Spend with a fully constructed Spend inside, but that Spend has an auth_sig field. That field needs to be set to something, and the AuthorizationData is not available, so we can set it to a placeholder value instead.
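
A hypothetical sketch of that arm of build_unauth (the spend_body / spend_proof helpers and exact field names are illustrative, not the actual API):

    pub fn build_unauth(&self, witness_data: &WitnessData) -> Action {
        match self {
            ActionPlan::Spend(spend_plan) => Action::Spend(Spend {
                body: spend_plan.spend_body(witness_data),
                proof: spend_plan.spend_proof(witness_data),
                // AuthorizationData isn't available yet, so fill the
                // signature with an all-zero placeholder; `authorize`
                // overwrites it once the real signatures arrive.
                auth_sig: Signature::from([0u8; 64]),
            }),
            // ...other variants follow the same pattern...
            _ => todo!(),
        }
    }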

@TalDerei (Collaborator, Author) commented Nov 9, 2023

I see now, the auth_sig placeholder value was set to resemble the code screaming AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.

@TalDerei (Collaborator, Author) commented Nov 13, 2023

@hdevalence Currently, the memo field isn't being handled properly. There's a divergence in the wrappedMemoKey field in an Action between the serial and concurrent build methods.

Update: performing mem::take inside WasmPlanner returns the default planner state. Consequently, I reverted to passing the entire TransactionPlan into build_action, which internally calls build_unauth after performing some necessary deserialization work. This seems like overkill, since we only need the memo_plan field, but we can't access the memo field from self since it now contains the default memo. The memo field is now being handled properly, but it's ugly.

I'm trying to think through the best way to handle the memo field. There's also code duplication between the build_action, build_unauth_with_actions, and build methods with respect to the memo field that needs to be addressed.

@TalDerei (Collaborator, Author) commented Nov 14, 2023

@hdevalence breaking API changes:

penumbra-wasm package

  • new exposed build_action WASM method in wasm_planner.rs
  • new exposed build_parallel WASM method in tx.rs
  • modified the internals of the serial WASM build method in tx.rs to reduce code duplication

penumbra-transaction package

Our all-in-one build method internally calls build_unauth_with_actions and authorize_with_auth, per your suggestions. The conflict with this approach is that existing calls to the build method in main separate the build and authorize calls, since authorize takes an extra field for randomness. I suggest we strip authorize_with_auth out of the build method and call it separately to handle this? The downside is that we'll need to clone the TransactionPlan between the build and authorize_with_auth calls, unless they can take a reference to self rather than taking ownership.

Testing

I've manually tested the equivalence of the refactored serial and parallel build methods against the original build method. They yield the same transaction payload and proofs, except with different binding signatures, as expected.

Additionally, I'm currently running the existing unit / integration tests to check for failures. What's the idiomatic approach to testing correctness here?

TODOs

  • The methods currently support Spend and Output actions. Add support for other actions like Swap, SwapClaim, etc.

Relevant

According to one of the comments in effect_hash: "TransactionPlan::build builds the actions sorted by type, so hash the actions in the order they'll appear in the final transaction." The web-workers will need to implement some ordering mechanism to place the actions in the correct order.

Fix

The CI pipeline is currently failing.

@hdevalence (Member) commented:

> According to one of the comments in effect_hash: "TransactionPlan::build builds the actions sorted by type, so hash the actions in the order they'll appear in the final transaction." The web-workers will need to implement some ordering mechanism to place the actions in the correct order.

This is wrong (or should be wrong, we should check that the implementation actually doesn't do this). At one point in the past we attempted to do this, and then changed course.

The EffectHash should be computed on the actions in the order they are in the plan. Otherwise (as we experienced), there are all kinds of possibilities for subtle mismatches of ordering. Instead, choosing an ordering of the actions is the responsibility of the planner; once the TransactionPlan is generated, it should be used as-is.

@hdevalence (Member) commented:

> Additionally, I'm currently running the existing unit / integration tests to check for failures. What's the idiomatic approach to testing correctness here?

It should be sufficient to run the Rust tests (either with cargo test or cargo nextest) and to push a PR and see that the smoke test (running pcli against pd) passes.

@hdevalence (Member) commented:

> Our all-in-one build method internally calls build_unauth_with_actions and authorize_with_auth, per your suggestions. The conflict with this approach is that existing calls to the build method in main separate the build and authorize calls, since authorize takes an extra field for randomness. I suggest we strip authorize_with_auth out of the build method and call it separately to handle this? The downside is that we'll need to clone the TransactionPlan between the build and authorize_with_auth calls, unless they can take a reference to self rather than taking ownership.

This is a good catch. I think, though, that we should keep the existing API, and figure out a way to make it work.

At a high level, the TransactionPlan is supposed to be a fully deterministic description of the transaction to be built. So why do we need new randomness at all? Especially, why do we need randomness after the authorization signatures are already computed? Why does build_dao_transaction have to jump through hoops to add fake randomness?

Looking through the existing code, where is the randomness actually used? It's only used in the computation of the binding signature:

        // Compute the binding signature and assemble the transaction.
        let binding_signing_key = rdsa::SigningKey::from(synthetic_blinding_factor);
        let auth_hash = transaction.transaction_body.auth_hash();

        let binding_sig = binding_signing_key.sign(rng, auth_hash.as_bytes());
        tracing::debug!(bvk = ?rdsa::VerificationKey::from(&binding_signing_key), ?auth_hash);

Why does the decaf377-rdsa sign take an RNG? decaf377-rdsa is an EdDSA variant, so it should support deterministic signing. If we changed the decaf377-rdsa API to support deterministic signing, we wouldn't need any RNG at all, and then we wouldn't have this problem.
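
In the meantime, determinism can be simulated with the existing sign(rng, msg) API by seeding the RNG from the inputs. A minimal sketch, which for brevity seeds only from the message (a real scheme would also bind the secret key, as EdDSA / RFC 6979 do):

    use decaf377_rdsa::{Binding, Signature, SigningKey};
    use rand_chacha::ChaCha20Rng;
    use rand_core::SeedableRng;
    use sha2::{Digest, Sha256};

    fn sign_deterministic(signing_key: &SigningKey<Binding>, msg: &[u8]) -> Signature<Binding> {
        // Seed the RNG from the message so the nonce, and therefore the
        // signature, is a pure function of the inputs.
        let seed: [u8; 32] = Sha256::digest(msg).into();
        let rng = ChaCha20Rng::from_seed(seed);
        signing_key.sign(rng, msg)
    }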

self,
fvk: &FullViewingKey,
mut actions: Vec<Action>,
Member:

Shouldn't actions be immutable, since it's preconstructed data?

Collaborator (Author):

Right now, I'd call the actions semi-preconstructed data. If we think about this from the perspective of the web-workers, they each call build_action, which internally matches on the Action type. build_action should only be called for actions that are computationally intensive. It therefore only precomputes the output, spend, swap, swap-claim, and delegator-vote actions.

If we don't mark it as mutable, how should we handle pushing other actions to the transaction body?

Member:

We explicitly shouldn't handle that, we should require that all of the actions are built and passed in, rather than trying to manage things case by case.

As a first-pass implementation strategy, I think it would be a good idea to unconditionally spawn a new web worker for each ActionPlan, but even if we did something more sophisticated, it wouldn't change this interface.

In general we should try to avoid action-by-action special case handling and instead handle ActionPlans and Actions uniformly, as with some of the other PR suggestions.

Collaborator (Author):

The web-workers are heavy instances (rather than simply being lightweight threads) since they replicate the underlying VM state. Spawning unnecessary web-workers would degrade performance since it increases the message passing overhead.

@TalDerei (Collaborator, Author) Nov 15, 2023:

There are also many variants for an ActionPlan, which means we'd need to spawn more threads than the number of physical cores on the device.

@TalDerei (Collaborator, Author) Nov 16, 2023:

I implemented your suggestion as a first-pass strategy. The pre-built actions are now immutable by default. I'll see how the performance scales in terms of execution time and latency in the webgpu repository and report back.

Not to bikeshed, but this will increase the size / complexity of the exposed build_action wasm function. I don't think this will have much effect; if anything, it will be minor.

@hdevalence (Member) commented:

I'm worried about this work getting tripped up on other changes -- like I mentioned on Discord, I didn't realize that the change to remove the custom ordering logic from the TransactionPlan's EffectHash implementation didn't happen over the summer, so the plan I suggested was more complex than I'd intended, and now this PR has a larger scope than we would have liked.

For that reason, I tried carrying it further, filling in some missing match arms and cleaning up some of the code.

@hdevalence hdevalence marked this pull request as ready for review November 17, 2023 07:01
@hdevalence (Member) left a comment:

Happy to merge this on green CI unless there are other changes you'd like to get in.

We should squash the commits when merging, the history is a mess because of the rebasing.

We can fix any straggler bits in follow-up PRs.

Comment on lines +169 to +171
// TODO: the first may no longer be a spend because of ordering changes.
// Let's not try to fix this at the moment. Later we can put a "canonical ordering" into the planner.
/*
Member:

As follow-up, we should add a TransactionPlan::sort_actions(&mut self) and change the Planner to call it just before returning the final TransactionPlan, then re-enable this test (which implicitly depended on action ordering)
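
A sketch of that follow-up (the variant ordering shown is illustrative, not the canonical one):

    impl TransactionPlan {
        /// Sort the actions into a canonical order; the Planner would call
        /// this just before returning the final TransactionPlan.
        pub fn sort_actions(&mut self) {
            self.actions.sort_by_key(|plan| match plan {
                ActionPlan::Spend(_) => 0u8,
                ActionPlan::Output(_) => 1,
                ActionPlan::Swap(_) => 2,
                ActionPlan::SwapClaim(_) => 3,
                // ...remaining variants in canonical order...
                _ => u8::MAX,
            });
        }
    }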

@TalDerei TalDerei requested a review from grod220 November 17, 2023 09:32
@TalDerei (Collaborator, Author) commented:

We still need to make the relevant updates to the docs, for instance updating transaction_signing.md to reflect the removal of the UnauthTransaction type.

@@ -10,6 +10,7 @@ use web_sys::DomException;
use penumbra_tct::error::{InsertBlockError, InsertEpochError, InsertError};

pub type WasmResult<T> = Result<T, WasmError>;
pub type WasmOption<T> = Option<T>;
Contributor:

Unused, can remove

Collaborator (Author):

removed

Comment on lines +6 to +13
pub mod note_record;
pub mod planner;
pub mod storage;
pub mod swap_record;
pub mod tx;
pub mod utils;
pub mod view_server;
pub mod wasm_planner;
Contributor:

Any particular reason to expose these?

Collaborator (Author):

To make the modules visible to wasm_bindgen_test; otherwise, the test suite will fail since the modules are private.

web-sys = { version = "0.3.64", features = ["console"] }
serde_json = "1.0.107"
Contributor:

Not a big deal, but can move this to dev-dependencies as it's only in tests

Collaborator (Author):

moved to dev-dependencies

.key_path(Some(&IdbKeyPath::new(note_key)))
.to_owned();
let note_object_store = evt.db().create_object_store_with_params(
"SPENDABLE_NOTES",
Contributor:

We should use constants.tables for table names

let db_req: OpenDbRequest = IdbDatabase::open_u32(&constants.name, constants.version)?;
let mut db_req: OpenDbRequest = IdbDatabase::open_u32(&constants.name, constants.version)?;

// Conditionally create object stores in the `IdbDatabase` database for testing purposes
Contributor:

This feels a bit odd being only for testing purposes, but existing in the production code. Can we move this to our test suite?

Collaborator (Author):

Attempting to migrate this to the test suite has been extremely unsuccessful over the past couple of days. It boils down to (1) our decision to use IdbDatabase, a wrapper for the IndexedDB database, and (2) the tight coupling of the IndexedDBStorage instance when we initialize the ViewServer / WasmPlanner. Triggering the creation of an object store in the database requires an onupgradeneeded event, which requires a connection handle to the database instance. Unfortunately, the handle is consumed in the creation of the database. Anyway, we can't create the database in the test suite directly, because any changes made to it won't affect the database instance specific to the ViewServer / WasmPlanner.

Dependency injection via a constructor or setter would resolve this, but it doesn't seem to work in this environment, since Serialize is not implemented for indexed_db_futures::IdbDatabase.

For instance, something like

pub async fn set_storage(&mut self, storage: JsValue) -> WasmResult<()> {
    // Deserialize the JsValue into an IndexedDBStorage instance.
    let storage = serde_wasm_bindgen::from_value(storage)?;
    self.storage = Some(storage);
    Ok(())
}

would require somehow converting a JsValue into IndexedDBStorage or vice-versa.

TL;DR: the conditional inside the storage module is the simplest way to mock IndexedDB calls and bypass the aforementioned limitations in this environment. I think we can keep it for now.

Contributor:

Ah, indeed that is tricky. But if the db instance is exposed, couldn't we create the object store? Like this in the test:

let database: &IdbDatabase = storage_ref.get_database();
database.create_object_store(...);

However, if this is not possible, I think it would be worth extracting this logic into a separate method (create_tables_for_test()) so the ::new() method isn't full of this business logic.

@TalDerei (Collaborator, Author) Nov 17, 2023:

This won't work since it'll complain that the creation of an object store requires an upgradeneeded event.

memo_key = Some(memo_plan.key);
}

//
Contributor:

Looks like a leftover comment.

Collaborator (Author):

removed

Comment on lines +38 to +39
// Limit the use of Penumbra Rust libraries since we're mocking JS calls
// that are based on constructing objects according to protobuf definitions.
Contributor:

Why can't we import the structs instead of re-defining them here?

Collaborator (Author):

We'd need to mark the fields on the IndexedDbConstants and Tables structs as pub, since the test suite can't access the private fields.

Contributor:

I think that would be a worthwhile tradeoff instead of re-writing the structs. What do you think?

@TalDerei (Collaborator, Author) Nov 17, 2023:

@hdevalence are there any issues with making these fields pub?


// Convert note to `SpendableNoteRecord`.
let spendable_note: SpendableNoteRecord =
serde_json::from_str(spendable_note_json).unwrap();
Contributor:

I wonder if there's a way we can use serde_wasm_bindgen with JsValue here instead of serde_json 🤔 . Maybe it doesn't matter though.

/// auth_data: `pb::AuthorizationData`
/// Returns: `pb::Transaction`
#[wasm_bindgen]
pub fn build_parallel(
Contributor:

I can't really tell, but is this where web workers will be spawned later?

Collaborator (Author):

The web-workers will call the build_action method in the wasm planner, and then the prebuilt actions will be slotted into build_parallel where the final transaction can be assembled.
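
Roughly, the assembly step could look like this (a sketch: parameter names, deserialization details, and error conversion are simplified, and the unauth/authorize helpers are the hypothetical ones proposed earlier in the thread):

    #[wasm_bindgen]
    pub fn build_parallel(
        actions: JsValue,          // prebuilt Actions from the web workers
        transaction_plan: JsValue, // pb::TransactionPlan
        witness_data: JsValue,     // pb::WitnessData
        auth_data: JsValue,        // pb::AuthorizationData
    ) -> Result<JsValue, JsValue> {
        let actions: Vec<Action> = serde_wasm_bindgen::from_value(actions)?;
        let plan: TransactionPlan = serde_wasm_bindgen::from_value(transaction_plan)?;
        let witness: WitnessData = serde_wasm_bindgen::from_value(witness_data)?;
        let auth: AuthorizationData = serde_wasm_bindgen::from_value(auth_data)?;

        // Slot the prebuilt actions into the transaction body, then authorize.
        let unauth = plan.build_unauth_with_actions(&witness, actions);
        let tx = plan.authorize(&unauth, &auth);
        Ok(serde_wasm_bindgen::to_value(&tx)?)
    }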

.unwrap();
console_log!("Serial transaction is: {:?}", serial_transaction);
}
}
Contributor:

I didn't see any assertions in the test, should we add some?

@TalDerei (Collaborator, Author) Nov 17, 2023:

The assertions here would be tricky, because we're essentially comparing whether a serial transaction and a parallel transaction match. The only difference in their transaction payloads should be the binding signature, since it requires randomness. I've instead been manually inspecting whether they match in the dev console, but some kind of assertion would be nice.
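
One possible shape for such an assertion, assuming both transactions are in scope and that the body and signature types derive PartialEq (field names here may differ from the actual Transaction struct):

    // The transaction bodies should match exactly; only the binding
    // signature should differ, because it uses fresh randomness.
    assert_eq!(
        serial_transaction.transaction_body,
        parallel_transaction.transaction_body
    );
    assert_ne!(serial_transaction.binding_sig, parallel_transaction.binding_sig);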

@hdevalence (Member) commented:

Sounds good. I think it would be good to make some issues to track follow up work, but to merge this in the meantime for the avoidance of other possible conflicts.

@hdevalence hdevalence merged commit 2347ecb into main Nov 17, 2023
6 checks passed
@hdevalence hdevalence deleted the tx-decompose branch November 17, 2023 16:16
conorsch added a commit to penumbra-zone/osiris that referenced this pull request Nov 30, 2023
Corresponds to changes in [0], which split up the `build_concurrent`
logic.

[0] penumbra-zone/penumbra#3289
conorsch added a commit to penumbra-zone/galileo that referenced this pull request Dec 11, 2023
Corresponds to changes in [0], which split up the `build_concurrent`
logic.

[0] penumbra-zone/penumbra#3289