Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Versioned serialization schemes for types #3014

Closed
JaredCorduan opened this issue Aug 31, 2022 · 12 comments · Fixed by #3138
Closed

Versioned serialization schemes for types #3014

JaredCorduan opened this issue Aug 31, 2022 · 12 comments · Fixed by #3138
Assignees
Labels
💳 technical-debt Issues related to technical debt we introduced

Comments

@JaredCorduan
Copy link
Contributor

JaredCorduan commented Aug 31, 2022

The problem

When we encounter a problem with our deserializers, such as in #3003, #2965 and #2444, it can be very difficult to implement a fix. It is difficult because we can only fix such issues during a hard fork, and leading up to the hard fork we must maintain two serializations for the same type in order to not cause unintended network splitting (the problematic version must be used before the hard fork, and the fixed version is used afterwards). This can be especially tricky with the FromCBOR typeclass, since it is not always easy to search for where all the problematic uses are located.

A solution

Create a new typeclass to replace ToCBOR and FromCBOR which depends not only on the type but also on the protocol version. The first instances will all be implemented identically to our current ToCBOR and FromCBOR instances for all protocol versions up to the current one.

Note that this change that will effect the downstream components. We will need to coordinate with consensus and node, and many other teams.

@JaredCorduan JaredCorduan added the 💳 technical-debt Issues related to technical debt we introduced label Aug 31, 2022
@michaelpj
Copy link
Contributor

My $0.02.

We have a similar problem in plutus with our serialization for scripts. I think the considerations are the same although t they're slightly different libraries. What we do at the moment is we don't use the typeclass that the library provides, but rather we just use explicit decoder values. For CBOR this would be something like:

-- don't do this
instance Serialise Foo where
  decode :: Decoder s Foo

-- do this
decodeFoo :: ProtocolVersion -> Decoder s Foo

It's a bit clunky, but it works okay. You can also then take a mixed approach: if you need to decode something that isn't version-sensitive (an integer, say), you can just call decode and use the typeclass version.

I guess we could define a typeclass for this, but that does force you to pick an interface for all the decoders to share. And in fact I lied above: we don't just pass the protocol version, in some cases we also pass a predicate that indicates whether particular builtins are allowed. So we do make use of the variability. Something to consider, anyway.

@HeinrichApfelmus
Copy link
Contributor

I second @michaelpj 's suggestion — for a slightly different reason.

In cardano-wallet, we always have trouble finding the piece of code that is responsible for serializing/deserializing a specific type, because it's hard to figure out which instance of which type class is selected (this information is inferred from the code, but not written explicitly). So, usually we end up with "Hm, I think it's ok to use toCBOR here, but I'm not quite sure". The "not quite sure part" makes me feel uneasy — it's acceptable for the Show type, but not if I want to explicitly control the format.

Put differently: I think that specifying the format explicitly on the value-level as opposed to implicitly on the type-level would help with whatever issues we have. 🤔

@JaredCorduan
Copy link
Contributor Author

In cardano-wallet, we always have trouble finding the piece of code that is responsible for serializing/deserializing a specific type, because it's hard to figure out which instance of which type class is selected (this information is inferred from the code, but not written explicitly). So, usually we end up with "Hm, I think it's ok to use toCBOR here, but I'm not quite sure". The "not quite sure part" makes me feel uneasy — it's acceptable for the Show type, but not if I want to explicitly control the format.

The reason that you probably cannot always trust the inferred serialization is because when we do find bugs, we cannot easily fix the bug, and have to work around it at an upstream type. My hope for a versionedToCBOR would be that you could always trust it.

@HeinrichApfelmus
Copy link
Contributor

The reason that you probably cannot always trust the inferred serialization is because when we do find bugs, we cannot easily fix the bug, and have to work around it at an upstream type. My hope for a versionedToCBOR would be that you could always trust it.

Yes and no. The trouble is that I'm often not sure which type I call toCBOR on — it might not even be the type Tx that I'm interested in. In that sense, serializeTx :: Tx -> ByteString would give me some additional type safety / visibility.

@JaredCorduan
Copy link
Contributor Author

Yes and no. The trouble is that I'm often not sure which type I call toCBOR on — it might not even be the type Tx that I'm interested in. In that sense, serializeTx :: Tx -> ByteString would give me some additional type safety / visibility.

Our types are often too polymorphic for that. Here's a good example that I stole from @lehins :

data AlonzoTx era = AlonzoTx
  { body :: !(Core.TxBody era),
    wits :: !(Core.TxWits era),
    isValid :: !IsValid,
    auxiliaryData :: !(StrictMaybe (AuxiliaryData era))
  }

We need to rely on type classes (or something equivalent) in order to decode body and wits.

@lehins
Copy link
Collaborator

lehins commented Sep 13, 2022

@HeinrichApfelmus This a classic argument type classes vs monomorphic function. Some people say you should embrace them, some say they are evil. You can make the same argument for almost any prominent type class in Haskell community. Be it To/FromJSON, Vector and MVector type class, any serialization library (binary, serialize, persist, etc), mtl vs transformers, etc.

My argument is, type classes are strictly superior in most cases, because you can always specialize the type class function, but you can't make a specialized function more polymorphic.

The trouble is that I'm often not sure which type I call toCBOR on — it might not even be the type Tx that I'm interested in.

So, to your trouble I can say, restrict the type and you'll know what you are interested in:

  • toCBOR @Tx or toCBOR :: Tx -> Encoding

  • vs serializeTx :: Tx -> ByteString

@lehins
Copy link
Collaborator

lehins commented Sep 13, 2022

To expand on the example @JaredCorduan posted above, Core.TxBody era and Core.TxWits era are type families parameterized on era, which means if we were to go the monomorphic route we'd have to write a separate deserializer for AlonzoTx for every era that it is used in (i.e. Alonzo, Babbage, Conway and possibly later ones as well) and for every crypto the era is parameterized on. So all of the sudden instead of one function that does deserialization we end up with six separate, albeit very similar deserializing functions. We have many types like this in ledger codebase, so it will simply not work for ledger. The only reason why monomorphic deserializing functions work well in plutus codebase, is because types in Plutus aren't parametrized on type arguments.

@michaelpj and @HeinrichApfelmus Thank you for your suggestions though.

@michaelpj
Copy link
Contributor

I'm not saying that the deserialisers need to be monomorphic! In fact, the Plutus ones are highly polymorphic. Just that having them be functions rather than typeclass methods avoids tying you to an upstream interface, which can be helpful.

@lehins
Copy link
Collaborator

lehins commented Sep 13, 2022

@michaelpj Do you mind sharing a link to a few modules in a Plutus repo where there examples of highly polymorphic deserializers. I'd like see some concrete examples of what you mean.

@HeinrichApfelmus
Copy link
Contributor

So, to your trouble I can say, restrict the type and you'll know what you are interested in:

toCBOR @Tx or toCBOR :: Tx -> Encoding

Fair enough. 🤔 My trouble is about imposing some type constraint, whether that is done through @ or a name suffix Tx does not matter.

I guess that the specific function from the ledger that made me nervous is toCBORForMempoolSubmission — I don't know if that is equal to, or supposed to be equal to the toCBOR instance unless I look at the source code. The alternative is toCBORForSizeComputation, and I felt that I would be better off using one explicitly rather than hoping that the toCBOR implementation does the right thing for me.

@lehins
Copy link
Collaborator

lehins commented Sep 13, 2022

I guess that the specific function from the ledger that made me nervous is toCBORForMempoolSubmission — I don't know if that is equal to, or supposed to be equal to the toCBOR instance unless I look at the source code. The alternative is toCBORForSizeComputation, and I felt that I would be better off using one explicitly rather than hoping that the toCBOR implementation does the right thing for me.

I have no idea why toCBORForMempoolSubmission is even exported, it is an internal detail and toCBOR is always the right choice.

@michaelpj
Copy link
Contributor

Do you mind sharing a link to a few modules in a Plutus repo where there examples of highly polymorphic deserializers. I'd like see some concrete examples of what you mean.

Ah, I see the issue. We don't have any examples with type families, that does make things tricky because you need to pass around deserializers for the parts, either as explicit values or as typeclass dictionaries, and the typeclass dictionaries sure are more convenient.

In plutus we do indeed use typeclasses for values of the actual types that we are polymorphic over, but usually that's just a few bits deep down. Anyway, here's the file for reference: https://github.com/input-output-hk/plutus/blob/master/plutus-core/untyped-plutus-core/src/UntypedPlutusCore/Core/Instance/Flat.hs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💳 technical-debt Issues related to technical debt we introduced
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants