Data structures for identities and verification. #85
massimiliano-mantione merged 15 commits into master from
Conversation
FintanH
left a comment
Looking good. The Entity struct makes more sense than the previous Entity trait for sure.
I have some questions and suggestions. Also, it seems like Clippy has some good suggestions for clean up when I loaded the branch into my vim :)
```rust
    result
}

pub fn set_name(mut self, name: String) -> Self {
```
I would say mut self returning Self is an anti-pattern.
I'm also sceptical of having pub on the fields and then having functions for setting and clearing them. If you want the user to be able to access them, then I'd get rid of the functions. If you don't want them to access them, then I would say remove the pubs on EntityData.
This is meant to be used as a "builder":
```rust
User::default()
    .set_something(...)
    .set_something_else(...)
    .build()?
```
So all the "setters" need to consume the data and return a "new one", of course they actually just move a mutated one.
Properties are public because this is just a temporary data structure with no invariant at all, it is meant to be used in the most convenient way when building a new entity.
Ah, my bad! In my head, I was thinking of &mut instead of just mut. This makes sense now 👌 Although I wonder if it's more "idiomatic" to use a reference and mutate that. Calling resident Rust aficionado @cloudhead 👀
I still think that having the fields public and having the setters doesn't make much sense. I'm thinking of it in the sense of how someone should use this API. Should they set the fields manually or use the functions? Pick one, so that they don't have to guess and there isn't a disagreement on which is the right way :)
I don't have a strong opinion on this, but I suspect that both are needed here.
The "mutator" methods consume the input and produce a new one, so they can only be used in a "chained invocation" way, where the last invocation is the builder or anyway something that consumes the value.
On the other hand, public properties allow a more traditional "make a value, do something with it, then do something else" coding style.
I don't know exactly how this API will be used so I provided both so that the API user can choose the style.
In general public properties would be enough (once you have them you can do whatever you want) but I see that the "chained mutations on a builder" is a common pattern and leads to more readable code so I provided that as well.
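The two styles under discussion can be sketched side by side; `UserData` and its fields here are hypothetical stand-ins for the actual `EntityData` definition:

```rust
// Hypothetical sketch: public fields plus consuming "chained" setters,
// so the API user can pick either style.
#[derive(Default, Debug, Clone, PartialEq)]
pub struct UserData {
    pub name: Option<String>,
    pub revision: Option<u64>,
}

impl UserData {
    // Consuming setter: moves `self` in and out, enabling chaining.
    pub fn set_name(mut self, name: String) -> Self {
        self.name = Some(name);
        self
    }

    pub fn set_revision(mut self, revision: u64) -> Self {
        self.revision = Some(revision);
        self
    }
}

fn main() {
    // Style 1: chained invocation.
    let a = UserData::default().set_name("alice".into()).set_revision(1);

    // Style 2: public fields, traditional imperative mutation.
    let mut b = UserData::default();
    b.name = Some("alice".into());
    b.revision = Some(1);

    assert_eq!(a, b);
}
```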
Maybe it's at this point where I need to understand why this data structure exists in the first place :)
Re: builders: both forms are okay, the only advantage of borrowing is that you can also use it this way:
```rust
let builder = Builder::new();
builder.set("foo", "bar");
builder.set("quk", "bar");
builder.build()
```
Otherwise, you're forced to use a one-liner approach or repeat the let binding a bunch of times.
Oh, I see... I am allowing that pattern by having public properties.
Without public properties I would need to borrow instead of move in all setters, and provide getters.
In this case, where all the invariants are checked at the end anyway, I feel that having public properties is more flexible and does not have downsides.
But I don't have strong opinions; if asked I can make setters borrow (take and return `&mut self`) and add getters.
The build method would still move and consume the builder instance (closing the build).
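The borrowing alternative could be sketched like this (hypothetical names, not the actual API): setters take and return `&mut Self`, while `build` still consumes the builder instance, closing the build:

```rust
// Hypothetical builder sketch with borrowing setters and a consuming build.
#[derive(Default)]
pub struct Builder {
    name: Option<String>,
}

#[derive(Debug, PartialEq)]
pub struct Built {
    pub name: String,
}

impl Builder {
    // Borrowing setter: allows multi-statement use without rebinding `let`.
    pub fn set_name(&mut self, name: &str) -> &mut Self {
        self.name = Some(name.to_owned());
        self
    }

    // Consumes the builder; all invariants are checked here at the end.
    pub fn build(self) -> Result<Built, String> {
        match self.name {
            Some(name) => Ok(Built { name }),
            None => Err("name is required".to_owned()),
        }
    }
}

fn main() {
    let mut builder = Builder::default();
    builder.set_name("foo");
    assert_eq!(builder.build().unwrap(), Built { name: "foo".into() });
}
```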
```rust
}

pub fn compute_hash(&self) -> Result<Multihash, Error> {
    Ok(Sha2_256::digest(&self.canonical_data()?))
```
Is there any specific reason for using SHA2-256?
Because it is unlikely to have collisions, and if I am not mistaken it is the next hash function that git will use (transitioning away from SHA1).
But we don’t benefit in any way from git’s hashing, or do we? As SHA2 is already considered obsolete, I’d lean towards using a recent algorithm (either SHA3 or BLAKE3, which claims to be much faster). There’s also the consideration that hashing is likely to become a bottleneck for testing, so swapping in a cryptographically insecure but fast algorithm becomes desirable. As a follow-up, the suggestion is to abstract over the hash algorithm and introduce a “canonical” hashing module.
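The suggested abstraction could look roughly like this; `CanonicalHash` and `StdHasher` are made-up names, and std's non-cryptographic SipHash stands in for SHA3/BLAKE2b/BLAKE3, which would come from an external crate in a real implementation:

```rust
// Hypothetical "canonical hashing" module: the algorithm sits behind a
// small trait, so swapping it (or using a fast insecure one in tests)
// is a one-line change at the call site.
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

pub trait CanonicalHash {
    fn hash_bytes(data: &[u8]) -> Vec<u8>;
}

// Stand-in implementation using std's SipHash (NOT cryptographically secure).
pub struct StdHasher;

impl CanonicalHash for StdHasher {
    fn hash_bytes(data: &[u8]) -> Vec<u8> {
        let mut h = DefaultHasher::new();
        h.write(data);
        h.finish().to_be_bytes().to_vec()
    }
}

// Entity hashing is then generic over the algorithm.
pub fn compute_hash<H: CanonicalHash>(canonical_data: &[u8]) -> Vec<u8> {
    H::hash_bytes(canonical_data)
}

fn main() {
    let a = compute_hash::<StdHasher>(b"canonical-json");
    let b = compute_hash::<StdHasher>(b"canonical-json");
    // Deterministic for equal canonical data.
    assert_eq!(a, b);
}
```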
+1, but I'd like to add the concern that it will likely be useful for the Registry to verify these documents at some point in the future, and there we are a lot more limited in terms of hashing functions (we have to compile them in the wasm blob), so using the same hashing function (Blake2b) would make things interop more easily.
It's the default in polkadot/substrate (it might be possible to change though), besides also being a really good hashing function.
it might be possible to change
Hm ok, good to know. Also disappointed we can't use Blake2b++, being so much betterer
I agree and I added one more checkbox in the #87 description to track this.
```rust
    Ok(key.sign(&self.canonical_data()?))
}

pub async fn sign(
```
It's unclear what this method does, and why: we're passing a secret key, and a Signatory, which could be either our own or someone else's public key. Then we sign the entity data with the secret key, stating that the signer is whoever is represented by `by`.
This method signs the entity.
The secret key is obviously needed, but then the code does the following:
- Get the public key from the secret key.
- Check that the entity has not already been signed using this same key.
- Check that this key is allowed to sign the entity and that the `by` argument is correct: either the key is owned by `self` or it belongs to the given certifier.
- If all checks pass, build the signature and add it to the signature set.
Leaving key revocation handling to another PR, does this logic make sense?
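The checks listed above can be sketched as follows; `Signatory`, `sign_entity`, and the string-based keys and URNs are hypothetical simplifications of the actual API:

```rust
// Hypothetical sketch of the signing checks, with strings standing in
// for keys and URNs.
use std::collections::{HashMap, HashSet};

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub enum Signatory {
    OwnedKey,
    User(String), // certifier URN
}

#[derive(Default)]
pub struct Entity {
    pub keys: HashSet<String>,                  // keys owned by the entity
    pub certifiers: HashMap<String, String>,    // certifier URN -> key
    pub signatures: HashMap<String, Signatory>, // key -> who signed with it
}

pub fn sign_entity(e: &mut Entity, key: &str, by: &Signatory) -> Result<(), String> {
    // 1. Reject a second signature with the same key.
    if e.signatures.contains_key(key) {
        return Err("already signed with this key".into());
    }
    // 2. Check that the key is allowed to sign and matches the `by` argument.
    let allowed = match by {
        Signatory::OwnedKey => e.keys.contains(key),
        Signatory::User(urn) => e.certifiers.get(urn).map(String::as_str) == Some(key),
    };
    if !allowed {
        return Err("key not allowed to sign for this signatory".into());
    }
    // 3. All checks passed: record the signature.
    e.signatures.insert(key.to_owned(), by.clone());
    Ok(())
}

fn main() {
    let mut e = Entity::default();
    e.keys.insert("k1".into());
    assert!(sign_entity(&mut e, "k1", &Signatory::OwnedKey).is_ok());
    assert!(sign_entity(&mut e, "k1", &Signatory::OwnedKey).is_err()); // duplicate
    assert!(sign_entity(&mut e, "k2", &Signatory::OwnedKey).is_err()); // unowned key
}
```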
I see. It’s the effects that threw me off here — maintaining the invariants that the identity document belongs to the private key (which is somewhat weak anyways), and that the doc is present in local storage could be factored out and expressed in the types.
```rust
    signature: &Signature,
    resolver: &impl Resolver<User>,
) -> Result<(), Error> {
    self.check_key(key, by, resolver).await?;
```
iiuc, this checks that the most recent user identity has this public key (ie. it was not revoked). If I'm not mistaken, check_history will fail then if any signature valid in the past was revoked.
```rust
    revisions: R,
) -> Result<(), HistoryVerificationError>
where
    R: Stream<Item = Entity<T>> + Unpin,
```
Ok, so you're walking the history from the most recent revision, which simplifies some things. However, I'm missing where we'd verify that the hash of the initial version matches the identity hash.
I also think this shouldn't return unit -- we will need to be able to distinguish between (a) history is valid, (b) head requires your signature or head quorum is not reached, so reset to head^, (c) history failed verification at revision X, and (d) an error occurred.
I added the parent hash verification (sorry, I just forgot it).
The HistoryVerificationError type encodes all the different error cases, and if I am not mistaken the only one not explicitly detected is distinguishing the two subcases of (b).
More generally we don't have a "protocol" for handling "draft" entities waiting to be signed.
They need to be moved from device to device (because the private keys needed to sign them should not leave the devices) but it is still not clear if this "movement" should be implemented by committing them to the entity history branch and pulling from other devices or "out of band" outside the git branch replication machinery.
In this code an entity with missing signatures is considered invalid (and the verification error states so).
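A minimal sketch of the newest-to-oldest walk with parent-hash verification, using a toy hash function and a hypothetical `Revision` shape (the real code hashes canonical entity data, not a std hash):

```rust
// Hypothetical hash-linked history walk, newest revision first.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Hash)]
pub struct Revision {
    pub revision: u64,
    pub parent_hash: Option<u64>,
    pub payload: String,
}

// Toy stand-in for the real canonical-data hash.
fn hash_of(r: &Revision) -> u64 {
    let mut h = DefaultHasher::new();
    r.hash(&mut h);
    h.finish()
}

// Walk from the most recent revision backwards, verifying each parent link.
pub fn check_history(newest_first: &[Revision]) -> Result<(), String> {
    for pair in newest_first.windows(2) {
        let (child, parent) = (&pair[0], &pair[1]);
        if child.parent_hash != Some(hash_of(parent)) {
            return Err(format!("broken link at revision {}", child.revision));
        }
    }
    // The oldest (initial) revision must have no parent.
    match newest_first.last() {
        Some(root) if root.parent_hash.is_some() => Err("root has a parent".into()),
        _ => Ok(()),
    }
}

fn main() {
    let root = Revision { revision: 1, parent_hash: None, payload: "a".into() };
    let head = Revision { revision: 2, parent_hash: Some(hash_of(&root)), payload: "b".into() };
    assert!(check_history(&[head, root]).is_ok());
}
```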
The errors are fine, but we will need to return something in order to implement whatever interaction flow will be used to prompt for a signature. This will also depend on who the verifier is, and we will need to inspect up to which revision (both native and in-doc) we verified. This suggests to return the entity itself, tagged by the quorum state.
I added the "verification status"; now verification methods update this status.
Missing signatures are not a "verification error" if you check a single revision: the entity has a status that states that it must be signed but it is otherwise OK.
However missing signatures become errors when checking the full history.
```rust
    info: T,
}

impl<T> Entity<T>
```
It's a bit unclear what the actual (crate-)public entry points are: the check_* methods seem more like private helpers, and could use some docs and/or more descriptive names. I also don't quite understand when we will deserialise the EntityData (and check the hash).
Was this cleared up during our call? 🤔 I think the entry point is something I'm still confused on. Something I've found useful is writing the crate docs, for example in surf or tracker.
```rust
/// Build an `Entity` from its data (the second step of deserialization).
/// It guarantees that the `hash` is correct.
pub fn from_data(data: data::EntityData<T>) -> Result<Self, Error> {
```
Wouldn't this be better served as an impl of TryFrom?
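A sketch of the `TryFrom` suggestion, with a hypothetical reduced `EntityData` (the hash verification `from_data` performs is elided here):

```rust
// Hypothetical: `from_data` expressed as a `TryFrom` impl so callers can
// use `.try_into()`. The field set is an illustrative reduction.
use std::convert::TryFrom;

#[derive(Default)]
pub struct EntityData {
    pub name: Option<String>,
}

pub struct Entity {
    pub name: String,
}

#[derive(Debug)]
pub enum Error {
    MissingName,
}

impl TryFrom<EntityData> for Entity {
    type Error = Error;

    fn try_from(data: EntityData) -> Result<Self, Self::Error> {
        // Same shape of checks `from_data` performs today.
        let name = data.name.ok_or(Error::MissingName)?;
        Ok(Entity { name })
    }
}

fn main() {
    let entity = Entity::try_from(EntityData { name: Some("radicle".into()) }).unwrap();
    assert_eq!(entity.name, "radicle");
    assert!(Entity::try_from(EntityData::default()).is_err());
}
```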
```rust
/// Convenience method to check if an entity is valid
pub async fn is_valid(&self, resolver: &impl Resolver<User>) -> bool {
    self.check_validity(resolver).await.is_ok()
```
This seems superfluous if the caller can just invoke an is_ok.
FintanH
left a comment
Some feedback around the documentation.
```rust
/// A type expressing *who* is signing an `Entity`:
///
/// * either a specific user (identified by their URN),
```
You can move these doc comments to the enum branches themselves. I'll add a suggestion below to demonstrate.
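What moving the doc comments onto the variants might look like (a sketch; variant names are illustrative, not the actual definition):

```rust
/// A type expressing *who* is signing an `Entity`.
pub enum Signatory {
    /// A specific user, identified by their URN.
    User(String),
    /// A key owned by the entity itself.
    OwnedKey,
}

fn main() {
    // Variants carry their own documentation, so `rustdoc` renders the
    // explanation next to each case.
    let _ = Signatory::User("some-urn".into());
    let _ = Signatory::OwnedKey;
}
```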
```rust
/// - They can be signed, either with a key they own, or using a key belonging
///   to a different entity (the certifier); note that when applying multiple
///   signatures, signatures are not themselves signed (what is signed is always
///   only the entity itself).
```
Suggestion for wording this:
/// Note that we only sign the entity itself. We do not sign the signatures associated with the entity.
The current wording comes across as complex to me.
```rust
/// - Each revision must be signed by all its owned keys and trusted certifiers.
/// - Each subsequent revision must be signed by a quorum of the previous keys
///   and certifiers, to prove that the entity evolution is actually under the
///   control of its current "owners" (the idea is taken from TUF).
```
```diff
- /// control of its current "owners" (the idea is taken from TUF).
+ /// control of its current "owners" (the idea is taken from [TUF](https://theupdateframework.io/)).
```
```rust
/// FIXME: probably we should merge owned keys and certifiers when checking
/// the quorum rules (now we are handling them separately)
pub fn check_update(&self, previous: &Self) -> Result<(), UpdateVerificationError> {
    if self.revision() <= previous.revision() {
```
You mention this in the comment, but I think it's pretty important that the updates are monotonic.
Why do you think it is important? I understand Massi's point more as an API ergonomics concern.
The reason TUF implies that there be no holes in the sequence is so you can't specify an arbitrarily large one. On the other hand, this may make stalemate situations more complicated to resolve -- our system can be (ab)used for voting schemes, which is not a consideration for TUF.
We can start with adhering strictly to TUF (next revision must be exactly previous + 1), and loosen it to a (small) window later.
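The strict TUF-style rule (next revision must be exactly previous + 1) could be sketched like this; the names are hypothetical, not the actual `check_update` signature:

```rust
// Hypothetical strict revision check: gaps and regressions are rejected,
// which is what makes "am I missing an update?" trivially decidable.
#[derive(Debug, PartialEq)]
pub enum UpdateError {
    NonMonotonic { previous: u64, proposed: u64 },
}

pub fn check_revision(previous: u64, proposed: u64) -> Result<(), UpdateError> {
    if proposed == previous + 1 {
        Ok(())
    } else {
        Err(UpdateError::NonMonotonic { previous, proposed })
    }
}

fn main() {
    assert!(check_revision(1, 2).is_ok());
    assert!(check_revision(1, 3).is_err()); // gap rejected
    assert!(check_revision(2, 1).is_err()); // regression rejected
}
```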
Yeah a couple things come to mind, though maybe we already have solutions for them:
- Do we have a way to tell if we're missing an update, between eg. Rev 31 and Rev 12319?
- Are there overflow attacks we should look out for?
- Given two updates to the same parent state, can we tell that they are conflicting (and not sequenced), even though they may have different revisions?
I think it makes things easier to think about at the very least, if you know ahead of time what the revision number of the next update is going to be, given that updates can be made concurrently.
Do we have a way to tell if we're missing an update, between eg. Rev 31 and Rev 12319?
I think in this case you should be able to tell since the hash doesn't match, ala Merkle trees and all that. But someone can correct me if I'm wrong here.
Yes it's all hash-linked -- which is not the case in TUF. The only reason really we have this number is that it seemed unclear how one would reify the order in a patch-based system, but I now think that it would be through native means.
Keep in mind that the logic here will verify only one peer's view of a given identity document. In order to reach a "consensus" (e.g. when talking about what the master head is), the views of all maintainers have to agree. If they don't, the revision number will make things more complicated: say there's a 50-50 split between two camps of maintainers at proposed rev 42, but the tiebreaker agrees with neither of them; they can't just propose rev 43, because that would violate the gap-less condition. This would be quite straightforward if that condition were lifted, or if there were no revision at all.
Gotcha, yeah I didn't catch that it was hash linked.
Having revision numbers with a strict +1 rule will make key revocation checks way simpler.
I would discuss this in the "key revocation handling" PR I will post as soon as this is merged (see #87).
Please consider removing the numeric revision altogether, for the reasons outlined above. Since we said we'll need a cache to make lookups O(1), we could just as well key by hash.
FintanH
left a comment
Left some final feedback around the code.
I don't feel like I want to block this PR in any way, but I feel like the builder pattern seems harder to maintain in the long run: there's the big build-up phase of the data and then all the validation logic happening in one place. I'm more in the camp of "Parse, don't validate". So if I were trying to ensure invariants, I would have types that can only be created through functions that check those invariants. These functions then compose to build the final data. For example:
```rust
struct Revision(u64);

impl TryFrom<u64> for Revision {
    type Error = RevisionMustBePositive;

    fn try_from(r: u64) -> Result<Revision, Self::Error> {
        if r > 0 {
            Ok(Revision(r))
        } else {
            Err(RevisionMustBePositive(r))
        }
    }
}
```
This keeps the functions small, easy to test, and easy to swap in and out.
Apologies if this comes off as a rant, but I'd like to express how I think about these kinds of problems. Just wanted to explain what I was saying on the call earlier and see if it resonates with you or anyone else.
Thanks for the work ❤️ 🌱
```rust
/// Build an `Entity` from its data (the second step of deserialization).
/// It guarantees that the `hash` is correct.
pub fn from_data(data: data::EntityData<T>) -> Result<Self, Error> {
    if data.name.is_none() {
```
Can the name be an empty string?
```rust
use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
```
I'd suggest moving the error types to their own module. It would minimise this module a bit more for exploration since it's quite large. But it's up to you.
```rust
    info: T,
}

impl<T> Entity<T>
```
Was this cleared up during our call? 🤔 I think the entry point is something I'm still confused on. Something I've found useful is writing the crate docs, for example in surf or tracker.
@FintanH I agree that most of the checks for well-formedness could actually be pushed down to the parsing phase with no loss in preciseness of the error messages (which I think was the original motivation). There is, however, one validation left which requires the value to be complete: verifying the hash. So the truth is probably somewhere in between.
```rust
let mut cleaned = EntityData::<T>::default();
cleaned.name = self.name.to_owned();
cleaned.revision = self.revision.to_owned();
cleaned.hash = self.hash.to_owned();
```
Question: Will hash ever be Some(hash) when you're calling compute_hash?
Hrmmm, could you not just have a function from the well-formed types to the hash? This has the benefit that you never try to hash malformed data. I think the con here is that the function is rigid, i.e. it needs to take all the parameters necessary.
Just a note that this isn't so much about the builder pattern, as it is about early validation. You could totally have, for example:

```rust
Builder::new()
    .set_revision(Revision::parse(rev)?)
    .set_user(User::parse(user)?)
```

if you wanted the strong typing plus the builder pattern.
@cloudhead: What functionality does the builder pattern provide then? As I understood it, the point was that you can create the … I feel like I'm missing something 😅
@FintanH I didn't look deeply into this particular use-case, so I can only speak generally: it's useful when you have a lot of parameters, and/or many of those parameters are optional. The alternatives are:

```rust
Thing::new(a, b, Some(c), None, None)
Thing { a, b, c, d, ..Default::default() }
Thing::new(ThingParams { a, b, c, d })
```
I have reimplemented User and Project as type instances of a generic `Entity<T>`, where `Entity` supports TUF verification using URI resolution where needed.

There is an `EntityData` struct that has two main roles. This is because `Entity` (and therefore `User` and `Project`) are essentially immutable: they have the invariant that their hash must always be correct, therefore it is not possible to mutate them without either breaking the invariant or recomputing the hash. The only thing that can be changed are signatures, which can be added with the `sign` function. This means that the invariants should be checked on deserialization (when passing from `EntityData` to `Entity`).

There is a `Resolver` async trait that is a placeholder for the resolution machinery, and revision histories are consumed as an async `Stream` during verification.

There is a small open point about the use of `bs58` for the representation of keys and hashes; it could be changed for another kind of encoding.