PLT-7583 Validator optimizations #12

Merged
merged 14 commits into from Dec 22, 2023

@bwbush bwbush commented Dec 9, 2023

  1. Use PlutusTx.asData for State.
  2. Use PlutusTx.asData for Action.
  3. Faster Value.geq, but with fallback.
  4. Tracing and profiling, including flamegraphs.
  5. Severed dependency of validators on marlowe-cardano repository, but tests still rely on that (for consistency).

Note to reviewers

Here is an efficient way to compare the validator files that were renamed or relocated:

git clone git@github.com:input-output-hk/marlowe-plutus --single-branch \
  --branch PLT-7583 marlowe-plutus.new
git clone git@github.com:input-output-hk/marlowe-plutus --single-branch \
  --branch main marlowe-plutus.main
git clone git@github.com:input-output-hk/marlowe-cardano --single-branch \
  --branch PLT-8148-plutus1.15.0.0-feature marlowe-cardano.feature
vimdiff marlowe-plutus.new/marlowe-plutus/src/Language/Marlowe/Plutus/Semantics/Types/Address.hs \
        marlowe-cardano.feature/marlowe/src/Language/Marlowe/Core/V1/Semantics/Types/Address.hs 
vimdiff marlowe-plutus.new/marlowe-plutus/src/Language/Marlowe/Plutus/Semantics/Types.hs \
        marlowe-cardano.feature/marlowe/src/Language/Marlowe/Core/V1/Semantics/Types.hs 
vimdiff marlowe-plutus.new/marlowe-plutus/src/Language/Marlowe/Plutus/Semantics.hs \
        marlowe-cardano.feature/marlowe/src/Language/Marlowe/Core/V1/Semantics.hs 
vimdiff marlowe-plutus.new/marlowe-plutus/src/Language/Marlowe/Plutus/ScriptTypes.hs \
        marlowe-cardano.feature/marlowe/src/Language/Marlowe/Scripts/Types.hs 
vimdiff marlowe-plutus.new/marlowe-plutus/src/Language/Marlowe/Plutus/Script.hs \
        marlowe-plutus.main/marlowe-plutus/src/Language/Marlowe/Plutus/Semantics.hs 

@bwbush bwbush self-assigned this Dec 9, 2023
@bwbush bwbush requested review from palas, jhbertra and paluh and removed request for jhbertra December 9, 2023 16:33
@bwbush bwbush marked this pull request as ready for review December 9, 2023 17:16

@palas palas left a comment

Thank you. The diffs were very useful too.
@paluh and I reviewed them, and we have some comments. One he will post himself.
We are not sure whether it is possible to keep makeIsDataIndexed with the asData, but if it is possible it would be desirable. So that is why I am asking for changes.
In addition to that, I've noticed that ScriptTypes.hs corresponds to Script/Types.hs, not sure if it would be better to have the same naming in both to make it easier to compare.
In the same line, we could order things in marlowe-cardano in the same way as in marlowe-plutus. But that would be a different PR, ofc.

| Assert Observation Contract
deriving stock (Generic, Data)
deriving newtype (ToData, FromData, UnsafeFromData, Haskell.Eq, Haskell.Ord, Haskell.Show)
|]
Contributor

I cannot find a makeIsDataIndexed for Contract type. Should there be?

Contributor

Also for Action

Contributor Author

Using asData makes makeIsDataIndexed redundant because the patterns include that information. I'll double-check the TH source code for makeIsDataIndexed, just to be 100% sure.

Contributor Author

> In the same line, we could order things in marlowe-cardano in the same way as in marlowe-plutus.

If you're referring to the ordering in the semantics files, we'd have to change those in marlowe-cardano because the order in marlowe-plutus is (unfortunately) constrained by the way GHC handles TH: i.e., identifiers don't resolve during compilation of the module unless the ordering declares them in a particular sequence.
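
The ordering constraint mentioned above can be reproduced in a small, self-contained sketch. This is not code from marlowe-plutus: the types `Party` and `Payee` and the empty splice are hypothetical, and the sketch only assumes GHC's documented behaviour that a top-level declaration splice ends the current declaration group.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH (Dec, Q)

-- Group 1: everything above the first top-level splice.
-- (Hypothetical type, used only for this illustration.)
newtype Party = Party String
  deriving (Eq, Show)

-- An empty top-level declaration splice. It generates no declarations,
-- but it still closes the declaration group above it.
$(return [] :: Q [Dec])

-- Group 2: may refer to Party, because Party lives in an earlier group.
data Payee = Account Party | External Party
  deriving (Eq, Show)

payeeParty :: Payee -> Party
payeeParty (Account p)  = p
payeeParty (External p) = p

-- If Payee were declared above the splice and Party below it, GHC would
-- report Party as not in scope, which is why declaration order matters
-- in modules that contain top-level splices.
```

Loading the file in GHCi and evaluating `payeeParty (Account (Party "alice"))` exercises both groups.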

I think we should reorder and clean up the marlowe module in marlowe-cardano, maybe when we upgrade to Conway near the start of PI5. Typeclass instances etc. are scattered about the file in a not very logical order. @jhbertra has some ideas for improvements.

Contributor

Ah, if it is not possible to be explicit in marlowe-plutus then it makes sense

@paluh paluh Dec 15, 2023

Contributor Author

Actually, makeIsDataIndexed is replaced by asData, whose splice expands to explicitly numbering the constructors in their order of appearance. The deriving stock Data refers to the type Data.Data (Data) in the base library and doesn't particularly relate to Plutus.
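
As a rough illustration of what "numbering the constructors in their order of appearance" means, here is a hand-written model. It is not the output of the asData splice: `Data` is a simplified stand-in for the Plutus data type, and the three-constructor `Action` plus the `toData`/`fromData` pair are hypothetical.

```haskell
-- Simplified stand-in for the Plutus data type (an assumption for this sketch).
data Data = Constr Integer [Data] | I Integer
  deriving (Eq, Show)

-- A toy type loosely named after Marlowe's Action (hypothetical fields).
data Action = Deposit Integer | Choice Integer | Notify
  deriving (Eq, Show)

-- Constructors are tagged 0, 1, 2 in the order they are declared.
toData :: Action -> Data
toData (Deposit n) = Constr 0 [I n]
toData (Choice n)  = Constr 1 [I n]
toData Notify      = Constr 2 []

-- Decoding inverts the same numbering and rejects anything else.
fromData :: Data -> Maybe Action
fromData (Constr 0 [I n]) = Just (Deposit n)
fromData (Constr 1 [I n]) = Just (Choice n)
fromData (Constr 2 [])    = Just Notify
fromData _                = Nothing
```

Reordering the constructors in the declaration would change the tags in this scheme, which is the information makeIsDataIndexed otherwise states explicitly.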

Contributor Author

> Ah, if it is not possible to be explicit in marlowe-plutus then it makes sense

I've added a comment documenting this in 392246d.

@paluh paluh left a comment

@bwbush, we had an interesting discussion with @palas that seems pretty important and is related to the semantic changes after migration to asData. I'm not sure if we understand asData so please correct me if I'm wrong below.

Our understanding of asData semantics:

  • asData makes the decoding of the Contract lazy (without memoization) - we decode BuiltinData on demand when we pattern match on a value.

  • If we consider a decoding failure, it will happen lazily as well.

  • If the above is true then we can probably say that every type A which was previously represented as a strict data type (so the set of values did not contain ⊥, right?) is turned into A ∪ { ⊥ } in the context of asData.

  • If the above is true then this is probably a problem from formalization point of view because we have to change the spec (semantics would work over Contract ∪ { ⊥ } instead of Contract).

  • What is also important is that we can show using a specific example that lazy validation which triggers exceptions can change semantics of the Marlowe program (example follows).
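
The understanding described in the bullets above can be modelled in plain Haskell. This is only a sketch of the idea, not of what asData generates: `Data`, the two-constructor `Contract`, `LazyContract`, `view` and the other names are hypothetical, and a decoding failure is modelled with `error`, whereas on chain it would abort the script.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Simplified stand-in for the Plutus data type (an assumption for this sketch).
data Data = Constr Integer [Data] | I Integer
  deriving (Eq, Show)

-- Strict representation: decoding validates the whole tree up front.
data Contract = Close | When Integer Contract
  deriving (Eq, Show)

decodeStrict :: Data -> Maybe Contract
decodeStrict (Constr 0 [])       = Just Close
decodeStrict (Constr 1 [I t, k]) = When t <$> decodeStrict k
decodeStrict _                   = Nothing

-- Lazy representation: a wrapper around the raw data.
newtype LazyContract = LazyContract Data

-- One layer of structure, recomputed on every match (no memoization).
data View = VClose | VWhen Integer LazyContract

view :: LazyContract -> View
view (LazyContract (Constr 0 []))       = VClose
view (LazyContract (Constr 1 [I t, k])) = VWhen t (LazyContract k)
view _                                  = error "decoding failure"

-- Looks only at the outermost layer.
outerTimeout :: LazyContract -> Maybe Integer
outerTimeout c = case view c of
  VWhen t _ -> Just t
  VClose    -> Nothing

-- Walks every layer, so any corrupt sub-term is eventually hit.
depth :: LazyContract -> Int
depth c = case view c of
  VClose    -> 0
  VWhen _ k -> 1 + depth k

decodesFully :: LazyContract -> IO Bool
decodesFully c = do
  r <- try (evaluate (depth c)) :: IO (Either ErrorCall Int)
  pure (either (const False) (const True) r)

-- The continuation (I 999) does not encode a contract.
corrupt :: Data
corrupt = Constr 1 [I 10, I 999]
```

In this model `decodeStrict corrupt` is rejected immediately, while `outerTimeout (LazyContract corrupt)` succeeds and the failure only appears once something walks into the continuation. That is the A ∪ { ⊥ } behaviour the bullets describe.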

Change of the Marlowe Semantics - example:

  • Let's analyze a specific example which violates the existing termination guarantees.

  • Let's compare the evaluation of our example to the evaluation of the same contract, merkleized and with a strict data representation. Merkleization encodes laziness in a slightly different manner, but it can probably be approximated as Case (Contract ∪ { ⊥ }), meaning that if the hash is invalid (or lost) then decoding of a specific branch can "stall forever".

  • The difference with merkleization is that the timeout branch of When is always revealed in a merkleized contract. The timeout branch itself can contain a contract which may include a When contract with a timeout which has to be revealed as well, and so on. Therefore, the timeout branch must contain a fully decoded "timeout based" path to a Close.

  • In the case of asData used over Contract, there's a possibility that a malicious actor forges a datum on the chain and puts an invalid piece of data in the timeout branch, which actually has to contain a Contract value. Decoding of this timeout continuation will trigger a runtime exception.

  • The difference really boils down to Contract ∪ { ⊥ } vs Case (Contract ∪ { ⊥ }) in this case.

  • Let's consider a hypothetical contract decoded lazily using asData:

    • When [ Case (Deposit ...) (When [] timeout INVALID_DATA)] timeout Close
    • If the user performs the Deposit the semantics won't enforce decoding of the INVALID_DATA in the same step (we don't have to pattern match on this case when timeout is not reached).
    • If that is true then we end up in a continuation which locks the funds.
  • If we consider the same but strictly decoded merkleized contract:

    • When [ Case (Deposit ...) hashOfTheContinuation] timeout Close
    • When we are applying input we have to provide the next part of the contract corresponding to the hashOfTheContinuation fully decoded.
    • So in this case we actually have to decode contract When [] timeout INVALID_DATA together with INVALID_DATA.
    • So if the INVALID_DATA doesn't encode Contract and is part of the initial hash then the input application will not happen because we cannot provide a contract which is correctly decoded and is corresponding to the hash value.
    • So merkleization doesn't cause locking in this particular case because it is impossible to perform the Deposit.
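
The contrast between the two cases above can be sketched as follows. Everything here is a hypothetical model: `hashData` is a toy, non-cryptographic hash, the contract type has no cases, and `applyDepositLazy` / `applyDepositMerkleized` are illustrative names, not functions from the Marlowe semantics.

```haskell
-- Simplified stand-in for the Plutus data type (an assumption for this sketch).
data Data = Constr Integer [Data] | I Integer
  deriving (Eq, Show)

data Contract = Close | When Integer Contract
  deriving (Eq, Show)

decodeStrict :: Data -> Maybe Contract
decodeStrict (Constr 0 [])       = Just Close
decodeStrict (Constr 1 [I t, k]) = When t <$> decodeStrict k
decodeStrict _                   = Nothing

type Hash = Int

-- Toy hash over the printed form of the data. NOT cryptographic.
hashData :: Data -> Hash
hashData = foldl step 7 . show
  where
    step h c = (h * 31 + fromEnum c) `mod` 1000003

-- Lazy style: the continuation is carried forward without being inspected.
applyDepositLazy :: Data -> Maybe Data
applyDepositLazy continuation = Just continuation

-- Merkleized style: the supplied continuation must match the committed
-- hash and must decode completely before the input is accepted.
applyDepositMerkleized :: Hash -> Data -> Maybe Contract
applyDepositMerkleized committed supplied
  | hashData supplied /= committed = Nothing
  | otherwise                      = decodeStrict supplied

-- Models `When [] timeout INVALID_DATA`: the inner continuation is garbage.
badContinuation :: Data
badContinuation = Constr 1 [I 20, I 999]

goodContinuation :: Data
goodContinuation = Constr 1 [I 20, Constr 0 []]
```

In this model the lazy step accepts `badContinuation`, so the funds move into a state that can fail later, while the merkleized step returns `Nothing` even when the hash matches, so the Deposit cannot be applied at all.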

Discussion:

  • Is the above understanding of asData correct?

  • If that is true shouldn't migration to asData be considered as a breaking change and enforce changes in formalization of the semantics?

  • We can say that the above attack vector is unrealistic because datum decoding and inspection of the contract is a natural step which every user or DApp should perform before applying any input. So is this a really harmful change in the semantics? Should we allow this new semantics on the chain?

  • Should we perhaps apply asData only to the Case data type so it preserves similar timeout branch semantics to the merkleized contract?

  • If we do this (apply to the Case data type only), maybe we can just use data Contract = ... | When [Case BuiltinData] Timeout Contract | ... (which somewhat mimics merkleized case):

    • It requires full decoding of the timeout branch (all the way down to the Close)
    • It explicitly says which parts of the contract decoding can crash in the evaluation loop of the semantics at runtime (with asData we have the same problem but encoded implicitly I think).
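
A minimal sketch of the last proposal, under the assumption that only case continuations stay undecoded while the timeout spine is decoded strictly. The encoding, the integer-tagged `Action`, and the decoder names are hypothetical and much simpler than the real Marlowe types.

```haskell
-- Simplified stand-in for the Plutus data type (an assumption for this sketch).
data Data = Constr Integer [Data] | List [Data] | I Integer
  deriving (Eq, Show)

-- An action is reduced to an integer tag to keep the sketch short.
newtype Action = Action Integer
  deriving (Eq, Show)

-- The continuation type is a parameter, so the type itself says where
-- undecoded data may remain.
data Case k = Case Action k
  deriving (Eq, Show)

data Contract = Close | When [Case Data] Integer Contract
  deriving (Eq, Show)

-- The case continuation k is kept as raw data and not validated here.
decodeCase :: Data -> Maybe (Case Data)
decodeCase (Constr 0 [I a, k]) = Just (Case (Action a) k)
decodeCase _                   = Nothing

-- The timeout branch is decoded recursively, all the way down to Close.
decodeContract :: Data -> Maybe Contract
decodeContract (Constr 0 []) = Just Close
decodeContract (Constr 1 [List cs, I t, k]) =
  When <$> traverse decodeCase cs <*> pure t <*> decodeContract k
decodeContract _ = Nothing
```

With this shape, garbage inside a case continuation is accepted at decoding time (and can only fail when that case is taken), whereas garbage in the timeout branch is rejected up front.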

CC: @jhbertra

@bwbush bwbush requested a review from jhbertra December 15, 2023 19:31

bwbush commented Dec 20, 2023

> Let's consider a hypothetical contract decoded lazily using asData:
>
> When [ Case (Deposit ...) (When [] timeout INVALID_DATA)] timeout Close
> If the user performs the Deposit the semantics won't enforce decoding of the INVALID_DATA in the same step (we don't have to pattern match on this case when timeout is not reached).
> If that is true then we end up in a continuation which locks the funds.

The above might not be correct because the Marlowe-Cardano specification requires that the output contract from computeTransaction equals the output contract in the Datum, and we use an instance Eq Contract that forces deserialization (i.e., it doesn't compare the serialized bytes but instead compares each field after deserializing it). I need to construct a corrupt datum to check that INVALID_DATA would be detected in the first post-creation transaction.
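
The argument about equality can be illustrated with a small model. It only shows the mechanism (field-by-field comparison forces decoding, raw comparison does not) and says nothing about what the compiled validator actually does. `LazyContract`, `view`, `rawEq` and `equalOrFails` are hypothetical names, and failure is modelled with `error`.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Simplified stand-in for the Plutus data type (an assumption for this sketch).
data Data = Constr Integer [Data] | I Integer
  deriving (Eq, Show)

-- Lazily decoded contract: a wrapper around the raw data.
newtype LazyContract = LazyContract Data

data View = VClose | VWhen Integer LazyContract

view :: LazyContract -> View
view (LazyContract (Constr 0 []))       = VClose
view (LazyContract (Constr 1 [I t, k])) = VWhen t (LazyContract k)
view _                                  = error "decoding failure"

-- Field-by-field equality: every layer of both sides gets decoded.
instance Eq LazyContract where
  a == b = case (view a, view b) of
    (VClose, VClose)         -> True
    (VWhen t k, VWhen t' k') -> t == t' && k == k'
    _                        -> False

-- Raw equality: compares the underlying data without decoding it.
rawEq :: LazyContract -> LazyContract -> Bool
rawEq (LazyContract x) (LazyContract y) = x == y

-- Nothing means the comparison itself hit a decoding failure.
equalOrFails :: LazyContract -> LazyContract -> IO (Maybe Bool)
equalOrFails a b = do
  r <- try (evaluate (a == b)) :: IO (Either ErrorCall Bool)
  pure (either (const Nothing) Just r)

corrupt :: LazyContract
corrupt = LazyContract (Constr 1 [I 10, I 999])
```

Here `rawEq corrupt corrupt` is `True`, while `equalOrFails corrupt corrupt` yields `Nothing` because the structural comparison walks into the invalid continuation.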

paluh commented Dec 20, 2023

I started wondering - maybe asData generates a validation pass over BuiltinData (when performing fromBuiltinData) which validates the structure against the schema required by the type?

bwbush commented Dec 20, 2023

Here is evidence that corrupt data is only detected later in the contract's lifecycle.

1. Removed `PlutusTx.asData` for `Contract`.
2. Added `PlutusTx.asData` for `Cases`, with cabal flag default `False`.
3. Added cabal flag for `PlutusTx.asData` for `Action`, with default `True`.
@bwbush bwbush requested review from paluh and ramsay-t and removed request for jhbertra December 21, 2023 18:28

bwbush commented Dec 21, 2023

> In addition to that, I've noticed that ScriptTypes.hs corresponds to Script/Types.hs, not sure if it would be better to have the same naming in both to make it easier to compare.

Fixed in c0b42fc.

bwbush commented Dec 21, 2023

@palas, @paluh, @ramsay-t in 69decf6 I removed asData Contract, added asData Case that defaults to not being used (--flag=-asdata-case), and set asData Action that defaults to being used (--flag=+asdata-action). Thus, the spine of Contract is fully deserialized, but deserialization of Action is "lazy". This seems the best compromise.

@ramsay-t ramsay-t left a comment

This makes sense to me - we are retaining the option to turn on the faster asData stuff, and the risk seems no worse than with bad Merkleization. We can revisit this in the New Year if we look at some sort of pre-checking for Marlowe contracts.

paluh commented Dec 22, 2023

Cool. Thanks!
P.S.
We have to discuss which version we want to expose as current next year and probably also plan how we will expose the "unaudited" one in the Runtime.

@bwbush bwbush requested review from paluh and removed request for paluh December 22, 2023 14:52
@shlevy shlevy merged commit d9c3093 into main Dec 22, 2023
4 checks passed
bwbush commented Dec 22, 2023
