Data structures for identities and verification. #85
massimiliano-mantione merged 15 commits into master from
Conversation
FintanH
left a comment
Looking good. The Entity struct makes more sense than the previous Entity trait for sure.
I have some questions and suggestions. Also, it seems like Clippy has some good suggestions for clean up when I loaded the branch into my vim :)
```rust
    result
}

pub fn set_name(mut self, name: String) -> Self {
```
I would say mut self returning Self is an anti-pattern.
I'm also sceptical of having pub on the fields and then having functions for setting and clearing them. If you want the user to be able to access them, then I'd get rid of the functions. If you don't want them to access them, then I would say remove the pubs on EntityData.
This is meant to be used as a "builder":
```rust
User::default()
    .set_something(...)
    .set_something_else(...)
    .build()?
```
So all the "setters" need to consume the data and return a "new one", of course they actually just move a mutated one.
Properties are public because this is just a temporary data structure with no invariant at all, it is meant to be used in the most convenient way when building a new entity.
Ah, my bad! In my head, I was thinking of &mut instead of just mut. This makes sense now 👌 Although I wonder if it's more "idiomatic" to use a reference and mutate that. Calling resident Rust aficionado @cloudhead 👀
I still think that having the fields public and having the setters doesn't make much sense. I'm thinking of it in the sense of how someone should use this API. Should they set the fields manually or use the functions? Pick one, so that they don't have to guess and there isn't a disagreement on which is the right way :)
I don't have a strong opinion on this, but I suspect that both are needed here.
The "mutator" methods consume the input and produce a new one, so they can only be used in a "chained invocation" way, where the last invocation is the builder or anyway something that consumes the value.
On the other hand, public properties allow a more traditional "make a value, do something with it, then do something else" coding style.
I don't know exactly how this API will be used so I provided both so that the API user can choose the style.
In general public properties would be enough (once you have them you can do whatever you want) but I see that the "chained mutations on a builder" is a common pattern and leads to more readable code so I provided that as well.
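The two styles under discussion can be sketched side by side; `UserData` and its fields here are hypothetical stand-ins for the actual `EntityData` definition:

```rust
// Hypothetical sketch: public fields plus consuming "chained" setters,
// so the API user can pick either style.
#[derive(Default, Debug, Clone, PartialEq)]
pub struct UserData {
    pub name: Option<String>,
    pub revision: Option<u64>,
}

impl UserData {
    // Consuming setter: moves `self` in and out, enabling chaining.
    pub fn set_name(mut self, name: String) -> Self {
        self.name = Some(name);
        self
    }

    pub fn set_revision(mut self, revision: u64) -> Self {
        self.revision = Some(revision);
        self
    }
}

fn main() {
    // Style 1: chained invocation.
    let a = UserData::default().set_name("alice".into()).set_revision(1);

    // Style 2: public fields, traditional imperative mutation.
    let mut b = UserData::default();
    b.name = Some("alice".into());
    b.revision = Some(1);

    assert_eq!(a, b);
}
```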
Maybe it's at this point where I need to understand why this data structure exists in the first place :)
Re: builders: both forms are okay, the only advantage of borrowing is that you can also use it this way:
```rust
let builder = Builder::new();
builder.set("foo", "bar");
builder.set("quk", "bar");
builder.build()
```
Otherwise, you're forced to use a one-liner approach or repeat the let binding a bunch of times.
Oh, I see... I am allowing that pattern by having public properties.
Without public properties I would need to borrow instead of move in all setters, and provide getters.
In this case, where all the invariants are checked at the end anyway, I feel that having public properties is more flexible and does not have downsides.
But I don't have strong opinions; if asked I can make setters borrow (take and return `&mut self`) and add getters.
The build method would still move and consume the builder instance (closing the build).
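The borrowing alternative could be sketched like this (hypothetical names, not the actual API): setters take and return `&mut Self`, while `build` still consumes the builder instance, closing the build:

```rust
// Hypothetical builder sketch with borrowing setters and a consuming build.
#[derive(Default)]
pub struct Builder {
    name: Option<String>,
}

#[derive(Debug, PartialEq)]
pub struct Built {
    pub name: String,
}

impl Builder {
    // Borrowing setter: allows multi-statement use without rebinding `let`.
    pub fn set_name(&mut self, name: &str) -> &mut Self {
        self.name = Some(name.to_owned());
        self
    }

    // Consumes the builder; all invariants are checked here at the end.
    pub fn build(self) -> Result<Built, String> {
        match self.name {
            Some(name) => Ok(Built { name }),
            None => Err("name is required".to_owned()),
        }
    }
}

fn main() {
    let mut builder = Builder::default();
    builder.set_name("foo");
    assert_eq!(builder.build().unwrap(), Built { name: "foo".into() });
}
```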
```rust
}

pub fn compute_hash(&self) -> Result<Multihash, Error> {
    Ok(Sha2_256::digest(&self.canonical_data()?))
```
Is there any specific reason for using SHA2-256?
Because it is unlikely to have collisions, and if I am not mistaken it is the next hash function that git will use (transitioning away from SHA1).
But we don’t benefit in any way from git’s hashing, or do we? As SHA2 is already considered obsolete, I’d lean towards using a recent algorithm (either SHA3 or BLAKE3, which claims to be much faster). There’s also the consideration that hashing is likely to become a bottleneck for testing, so swapping in a cryptographically insecure but fast algorithm becomes desirable. As a follow-up, the suggestion is to abstract over the hash algorithm and introduce a “canonical” hashing module.
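The suggested abstraction could look roughly like this; `CanonicalHash` and `StdHasher` are made-up names, and std's non-cryptographic SipHash stands in for SHA3/BLAKE2b/BLAKE3, which would come from an external crate in a real implementation:

```rust
// Hypothetical "canonical hashing" module: the algorithm sits behind a
// small trait, so swapping it (or using a fast insecure one in tests)
// is a one-line change at the call site.
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

pub trait CanonicalHash {
    fn hash_bytes(data: &[u8]) -> Vec<u8>;
}

// Stand-in implementation using std's SipHash (NOT cryptographically secure).
pub struct StdHasher;

impl CanonicalHash for StdHasher {
    fn hash_bytes(data: &[u8]) -> Vec<u8> {
        let mut h = DefaultHasher::new();
        h.write(data);
        h.finish().to_be_bytes().to_vec()
    }
}

// Entity hashing is then generic over the algorithm.
pub fn compute_hash<H: CanonicalHash>(canonical_data: &[u8]) -> Vec<u8> {
    H::hash_bytes(canonical_data)
}

fn main() {
    let a = compute_hash::<StdHasher>(b"canonical-json");
    let b = compute_hash::<StdHasher>(b"canonical-json");
    // Deterministic for equal canonical data.
    assert_eq!(a, b);
}
```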
+1, but I'd like to add the concern that it will likely be useful for the Registry to verify these documents at some point in the future, and there we are a lot more limited in terms of hashing functions (we have to compile them in the wasm blob), so using the same hashing function (Blake2b) would make things interop more easily.
It's the default in polkadot/substrate (it might be possible to change though), besides also being a really good hashing function.
it might be possible to change
Hm ok, good to know. Also disappointed we can't use Blake2b++, being so much betterer
I agree and I added one more checkbox in the #87 description to track this.
```rust
    Ok(key.sign(&self.canonical_data()?))
}

pub async fn sign(
```
It's unclear what this method does, and why: we're passing a secret key, and a Signatory, which could be either our own or someone else's public key. Then we sign the entity data with the secret key, stating that the signer is whoever is represented by `by`.
This method signs the entity.
The secret key is obviously needed, but then the code does the following:
- Get the public key from the secret key.
- Check that the entity has not already been signed using this same key.
- Check that this key is allowed to sign the entity and that the `by` argument is correct: either the key is owned by `self` or it belongs to the given certifier.
- If all checks pass, build the signature and add it to the signature set.
Leaving key revocation handling to another PR, does this logic make sense?
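The checks listed above can be sketched as follows; `Signatory`, `sign_entity`, and the string-based keys and URNs are hypothetical simplifications of the actual API:

```rust
// Hypothetical sketch of the signing checks, with strings standing in
// for keys and URNs.
use std::collections::{HashMap, HashSet};

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub enum Signatory {
    OwnedKey,
    User(String), // certifier URN
}

#[derive(Default)]
pub struct Entity {
    pub keys: HashSet<String>,                  // keys owned by the entity
    pub certifiers: HashMap<String, String>,    // certifier URN -> key
    pub signatures: HashMap<String, Signatory>, // key -> who signed with it
}

pub fn sign_entity(e: &mut Entity, key: &str, by: &Signatory) -> Result<(), String> {
    // 1. Reject a second signature with the same key.
    if e.signatures.contains_key(key) {
        return Err("already signed with this key".into());
    }
    // 2. Check that the key is allowed to sign and matches the `by` argument.
    let allowed = match by {
        Signatory::OwnedKey => e.keys.contains(key),
        Signatory::User(urn) => e.certifiers.get(urn).map(String::as_str) == Some(key),
    };
    if !allowed {
        return Err("key not allowed to sign for this signatory".into());
    }
    // 3. All checks passed: record the signature.
    e.signatures.insert(key.to_owned(), by.clone());
    Ok(())
}

fn main() {
    let mut e = Entity::default();
    e.keys.insert("k1".into());
    assert!(sign_entity(&mut e, "k1", &Signatory::OwnedKey).is_ok());
    assert!(sign_entity(&mut e, "k1", &Signatory::OwnedKey).is_err()); // duplicate
    assert!(sign_entity(&mut e, "k2", &Signatory::OwnedKey).is_err()); // unowned key
}
```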
I see. It’s the effects that threw me off here — maintaining the invariants that the identity document belongs to the private key (which is somewhat weak anyways), and that the doc is present in local storage could be factored out and expressed in the types.
```rust
    signature: &Signature,
    resolver: &impl Resolver<User>,
) -> Result<(), Error> {
    self.check_key(key, by, resolver).await?;
```
iiuc, this checks that the most recent user identity has this public key (ie. it was not revoked). If I'm not mistaken, check_history will fail then if any signature valid in the past was revoked.
```rust
    revisions: R,
) -> Result<(), HistoryVerificationError>
where
    R: Stream<Item = Entity<T>> + Unpin,
```
Ok, so you're walking the history from the most recent revision, which simplifies some things. However, I'm missing where we'd verify that the hash of the initial version matches the identity hash.
I also think this shouldn't return unit -- we will need to be able to distinguish between (a) history is valid, (b) head requires your signature or head quorum is not reached, so reset to head^, (c) history failed verification at revision X, and (d) an error occurred.
I added the parent hash verification (sorry, I just forgot it).
The HistoryVerificationError type encodes all the different error cases, and if I am not mistaken the only one not explicitly detected is distinguishing the two subcases of (b).
More generally we don't have a "protocol" for handling "draft" entities waiting to be signed.
They need to be moved from device to device (because the private keys needed to sign them should not leave the devices) but it is still not clear if this "movement" should be implemented by committing them to the entity history branch and pulling from other devices or "out of band" outside the git branch replication machinery.
In this code an entity with missing signatures is considered invalid (and the verification error states so).
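A minimal sketch of the newest-to-oldest walk with parent-hash verification, using a toy hash function and a hypothetical `Revision` shape (the real code hashes canonical entity data, not a std hash):

```rust
// Hypothetical hash-linked history walk, newest revision first.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Hash)]
pub struct Revision {
    pub revision: u64,
    pub parent_hash: Option<u64>,
    pub payload: String,
}

// Toy stand-in for the real canonical-data hash.
fn hash_of(r: &Revision) -> u64 {
    let mut h = DefaultHasher::new();
    r.hash(&mut h);
    h.finish()
}

// Walk from the most recent revision backwards, verifying each parent link.
pub fn check_history(newest_first: &[Revision]) -> Result<(), String> {
    for pair in newest_first.windows(2) {
        let (child, parent) = (&pair[0], &pair[1]);
        if child.parent_hash != Some(hash_of(parent)) {
            return Err(format!("broken link at revision {}", child.revision));
        }
    }
    // The oldest (initial) revision must have no parent.
    match newest_first.last() {
        Some(root) if root.parent_hash.is_some() => Err("root has a parent".into()),
        _ => Ok(()),
    }
}

fn main() {
    let root = Revision { revision: 1, parent_hash: None, payload: "a".into() };
    let head = Revision { revision: 2, parent_hash: Some(hash_of(&root)), payload: "b".into() };
    assert!(check_history(&[head, root]).is_ok());
}
```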
The errors are fine, but we will need to return something in order to implement whatever interaction flow will be used to prompt for a signature. This will also depend on who the verifier is, and we will need to inspect up to which revision (both native and in-doc) we verified. This suggests to return the entity itself, tagged by the quorum state.
I added the "verification status"; now verification methods update this status.
Missing signatures are not a "verification error" if you check a single revision: the entity has a status that states that it must be signed but it is otherwise OK.
However missing signatures become errors when checking the full history.
```rust
    info: T,
}

impl<T> Entity<T>
```
It's a bit unclear what the actual (crate-)public entry points are: the check_* methods seem more like private helpers, and could use some docs and/or more descriptive names. I also don't quite understand when we will deserialise the EntityData (and check the hash).
Was this cleared up during our call? 🤔 I think the entry point is something I'm still confused on. Something I've found useful is writing the crate docs, for example in surf or tracker.
```rust
/// Build an `Entity` from its data (the second step of deserialization).
/// It guarantees that the `hash` is correct.
pub fn from_data(data: data::EntityData<T>) -> Result<Self, Error> {
```
Wouldn't this be better served as an impl of TryFrom?
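A sketch of the `TryFrom` suggestion, with a hypothetical reduced `EntityData` (the hash verification `from_data` performs is elided here):

```rust
// Hypothetical: `from_data` expressed as a `TryFrom` impl so callers can
// use `.try_into()`. The field set is an illustrative reduction.
use std::convert::TryFrom;

#[derive(Default)]
pub struct EntityData {
    pub name: Option<String>,
}

pub struct Entity {
    pub name: String,
}

#[derive(Debug)]
pub enum Error {
    MissingName,
}

impl TryFrom<EntityData> for Entity {
    type Error = Error;

    fn try_from(data: EntityData) -> Result<Self, Self::Error> {
        // Same shape of checks `from_data` performs today.
        let name = data.name.ok_or(Error::MissingName)?;
        Ok(Entity { name })
    }
}

fn main() {
    let entity = Entity::try_from(EntityData { name: Some("radicle".into()) }).unwrap();
    assert_eq!(entity.name, "radicle");
    assert!(Entity::try_from(EntityData::default()).is_err());
}
```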
```rust
/// Convenience method to check if an entity is valid
pub async fn is_valid(&self, resolver: &impl Resolver<User>) -> bool {
    self.check_validity(resolver).await.is_ok()
```
This seems superfluous if the caller can just invoke an is_ok.
FintanH
left a comment
Some feedback around the documentation.
```rust
/// A type expressing *who* is signing an `Entity`:
///
/// * either a specific user (identified by their URN),
```
You can move these doc comments to the enum branches themselves. I'll add a suggestion below to demonstrate.
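What moving the doc comments onto the variants might look like (a sketch; variant names are illustrative, not the actual definition):

```rust
/// A type expressing *who* is signing an `Entity`.
pub enum Signatory {
    /// A specific user, identified by their URN.
    User(String),
    /// A key owned by the entity itself.
    OwnedKey,
}

fn main() {
    // Variants carry their own documentation, so `rustdoc` renders the
    // explanation next to each case.
    let _ = Signatory::User("some-urn".into());
    let _ = Signatory::OwnedKey;
}
```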
```rust
/// - They can be signed, either with a key they own, or using a key belonging
///   to a different entity (the certifier); note that when applying multiple
///   signatures, signatures are not themselves signed (what is signed is always
///   only the entity itself).
```
Suggestion for wording this:
/// Note that we only sign the entity itself. We do not sign the signatures associated with the entity.
The current wording comes across as complex to me.
```rust
/// - Each revision must be signed by all its owned keys and trusted certifiers.
/// - Each subsequent revision must be signed by a quorum of the previous keys
///   and certifiers, to prove that the entity evolution is actually under the
///   control of its current "owners" (the idea is taken from TUF).
```
```diff
- /// control of its current "owners" (the idea is taken from TUF).
+ /// control of its current "owners" (the idea is taken from [TUF](https://theupdateframework.io/)).
```
```rust
/// FIXME: probably we should merge owned keys and certifiers when checking
/// the quorum rules (now we are handling them separately)
pub fn check_update(&self, previous: &Self) -> Result<(), UpdateVerificationError> {
    if self.revision() <= previous.revision() {
```
You mention this in the comment, but I think it's pretty important that the updates are monotonic.
Why do you think it is important? I understand Massi's point more as an API ergonomics concern.
The reason TUF implies that there be no holes in the sequence is so you can't specify an arbitrarily large one. On the other hand, this may make stalemate situations more complicated to resolve -- our system can be (ab)used for voting schemes, which is not a consideration for TUF.
We can start with adhering strictly to TUF (next revision must be exactly previous + 1), and loosen it to a (small) window later.
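The strict TUF-style rule (next revision must be exactly previous + 1) could be sketched like this; the names are hypothetical, not the actual `check_update` signature:

```rust
// Hypothetical strict revision check: gaps and regressions are rejected,
// which is what makes "am I missing an update?" trivially decidable.
#[derive(Debug, PartialEq)]
pub enum UpdateError {
    NonMonotonic { previous: u64, proposed: u64 },
}

pub fn check_revision(previous: u64, proposed: u64) -> Result<(), UpdateError> {
    if proposed == previous + 1 {
        Ok(())
    } else {
        Err(UpdateError::NonMonotonic { previous, proposed })
    }
}

fn main() {
    assert!(check_revision(1, 2).is_ok());
    assert!(check_revision(1, 3).is_err()); // gap rejected
    assert!(check_revision(2, 1).is_err()); // regression rejected
}
```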
Yeah a couple things come to mind, though maybe we already have solutions for them:
- Do we have a way to tell if we're missing an update, between eg. Rev 31 and Rev 12319?
- Are there overflow attacks we should look out for?
- Given two updates to the same parent state, can we tell that they are conflicting (and not sequenced), even though they may have different revisions?
I think it makes things easier to think about at the very least, if you know ahead of time what the revision number of the next update is going to be, given that updates can be made concurrently.
Do we have a way to tell if we're missing an update, between eg. Rev 31 and Rev 12319?
I think in this case you should be able to tell since the hash doesn't match, ala Merkle trees and all that. But someone can correct me if I'm wrong here.
Yes it's all hash-linked -- which is not the case in TUF. The only reason really we have this number is that it seemed unclear how one would reify the order in a patch-based system, but I now think that it would be through native means.
Keep in mind that the logic here will verify only one peer's view of a given identity document. In order to reach a "consensus" (e.g. when talking about what the master head is), the views of all maintainers have to agree. If they don't, the revision number will make things more complicated: say there's a 50-50 split between two camps of maintainers at proposed rev 42, but the tiebreaker agrees with neither of them; they can't just propose rev 43, because that would violate the gap-less condition. This would be quite straightforward if that condition were lifted, or if there were no revision at all.
Gotcha, yeah I didn't catch that it was hash linked.
Having revision numbers with a strict +1 rule will make key revocation checks way simpler.
I would discuss this in the "key revocation handling" PR I will post as soon as this is merged (see #87).
Please consider removing the numeric revision altogether, for the reasons outlined above. Since we said we'll need a cache to make lookups O(1), we could just as well key by hash.
FintanH
left a comment
Left some final feedback around the code.
I don't feel like I want to block this PR in any way, but I feel like the builder pattern seems harder to maintain in the long run: there's the big build-up phase of the data and then all the validation logic happening in one place. I'm more in the camp of "Parse, don't validate". So if I were trying to ensure invariants, I would have types that can only be created through functions that check those invariants. These functions then compose to build the final data. For example:
```rust
struct Revision(u64);

impl TryFrom<u64> for Revision {
    type Error = RevisionMustBePositive;

    fn try_from(r: u64) -> Result<Revision, Self::Error> {
        if r > 0 {
            Ok(Revision(r))
        } else {
            Err(RevisionMustBePositive(r))
        }
    }
}
```
This keeps the functions small, easy to test, and easy to swap in and out.
Apologies if this comes off as a rant, but I'd like to express how I think about these kinds of problems. Just wanted to explain what I was saying on the call earlier and see if it resonates with you or anyone else.
Thanks for the work ❤️ 🌱
```rust
/// Build an `Entity` from its data (the second step of deserialization).
/// It guarantees that the `hash` is correct.
pub fn from_data(data: data::EntityData<T>) -> Result<Self, Error> {
    if data.name.is_none() {
```
Can the name be an empty string?
```rust
use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
```
I'd suggest moving the error types to their own module. It would minimise this module a bit more for exploration since it's quite large. But it's up to you.
```rust
    info: T,
}

impl<T> Entity<T>
```
Was this cleared up during our call? 🤔 I think the entry point is something I'm still confused on. Something I've found useful is writing the crate docs, for example in surf or tracker.
@FintanH I agree that most of the checks for well-formedness could actually be pushed down to the parsing phase with no loss in preciseness of the error messages (which I think was the original motivation). There is, however, one validation left which requires the value to be complete: verifying the hash. So the truth is probably somewhere in between.
```rust
let mut cleaned = EntityData::<T>::default();
cleaned.name = self.name.to_owned();
cleaned.revision = self.revision.to_owned();
cleaned.hash = self.hash.to_owned();
```
Question: Will hash ever be Some(hash) when you're calling compute_hash?
Hrmmm, could you not just have a function from the well-formed types to the hash? This has the benefit that you never try to hash malformed data. I think the con here is that the function is rigid, i.e. it needs to take all the parameters necessary.
Just a note that this isn't so much about the builder pattern, as it is about early validation. You could totally have, for example:

```rust
Builder::new()
    .set_revision(Revision::parse(rev)?)
    .set_user(User::parse(user)?)
```

if you wanted the strong typing plus the builder pattern.
@cloudhead: What functionality does the builder pattern provide then? As I understood it, the point was that you can create the … I feel like I'm missing something 😅
@FintanH I didn't look deeply into this particular use-case, so I can only speak generally: it's useful when you have a lot of parameters, and/or many of those parameters are optional. The alternatives are:

```rust
Thing::new(a, b, Some(c), None, None)
Thing { a, b, c, d, ..Default::default() }
Thing::new(ThingParams { a, b, c, d })
```
I have reimplemented User and Project as type instances of a generic `Entity<T>`, where `Entity` supports TUF verification using URI resolution where needed.

There is an `EntityData` struct that has two main roles. This is because `Entity` (and therefore `User` and `Project`) are essentially immutable: they have the invariant that their hash must always be correct, therefore it is not possible to mutate them without either breaking the invariant or recomputing the hash. The only thing that can be changed are signatures, which can be added with the `sign` function. This means that the invariants should be checked on deserialization (when passing from `EntityData` to `Entity`).

There is a `Resolver` async trait that is a placeholder for the resolution machinery, and revision histories are consumed as an async `Stream` during verification.

There is a small open point about the use of `bs58` for the representation of keys and hashes; it could be changed for another kind of encoding.