This repository was archived by the owner on Apr 16, 2025. It is now read-only.

Data structures for identities and verification. #85

Merged
massimiliano-mantione merged 15 commits into master from entity-verification-WIP
Apr 15, 2020
Conversation

@massimiliano-mantione
Contributor

I have reimplemented User and Project as type instances of a generic Entity<T>, where Entity supports TUF verification using URI resolution where needed.

There is an EntityData struct that has two main roles:

  • a "builder" pattern intermediate object, and
  • an intermediate step for [de]serialization.

This is because Entity (and therefore User and Project) are essentially immutable.
They have the invariant that their hash must always be correct; therefore it is not possible to mutate them without either breaking the invariant or recomputing the hash.

The only things that can be changed are the signatures, which can be added with the sign function.

This means that the invariants should be checked on deserialization (when passing from EntityData to Entity).
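That flow (invariants checked when passing from EntityData to Entity) could be sketched roughly as follows. The names EntityData, Entity, and from_data mirror the PR, but the fields are illustrative and std's DefaultHasher stands in for the real Multihash/SHA2-256:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative stand-ins: the real EntityData/Entity carry many more fields,
// and the real hash is a Multihash (SHA2-256), not std's DefaultHasher.
struct EntityData {
    name: Option<String>,
    revision: u64,
    hash: Option<u64>, // the hash claimed by the serialized form, if any
}

struct Entity {
    name: String,
    revision: u64,
    hash: u64,
}

#[derive(Debug, PartialEq)]
enum Error {
    MissingName,
    InvalidHash,
}

fn compute_hash(name: &str, revision: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    revision.hash(&mut hasher);
    hasher.finish()
}

// The invariant is checked here, when passing from EntityData to Entity:
// a claimed hash must match the one recomputed from the data itself.
fn from_data(data: EntityData) -> Result<Entity, Error> {
    let name = data.name.ok_or(Error::MissingName)?;
    let computed = compute_hash(&name, data.revision);
    if let Some(claimed) = data.hash {
        if claimed != computed {
            return Err(Error::InvalidHash);
        }
    }
    Ok(Entity { name, revision: data.revision, hash: computed })
}

fn main() {
    let ok = from_data(EntityData { name: Some("radicle".into()), revision: 1, hash: None });
    assert!(ok.is_ok());

    // tampered hash: deserialization must fail instead of breaking the invariant
    let wrong = compute_hash("radicle", 1).wrapping_add(1);
    let bad = from_data(EntityData { name: Some("radicle".into()), revision: 1, hash: Some(wrong) });
    assert_eq!(bad.err(), Some(Error::InvalidHash));
}
```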

There is a Resolver async trait that is a placeholder for the resolution machinery, and revision histories are consumed as an async Stream during verification.

There is a small open point about the use of bs58 for representing keys and hashes; it could be changed to another kind of encoding.

@massimiliano-mantione massimiliano-mantione requested a review from a team as a code owner April 3, 2020 10:10
Contributor

@FintanH FintanH left a comment

Looking good. The Entity struct makes more sense than the previous Entity trait for sure.

I have some questions and suggestions. Also, it seems like Clippy has some good suggestions for clean up when I loaded the branch into my vim :)

Comment thread librad/src/keys/device.rs Outdated
Comment thread librad/src/id/entity/data.rs
Comment thread librad/src/id/entity/data.rs
Comment thread librad/src/id/entity/data.rs
Comment thread librad/src/id/entity/data.rs Outdated
result
}

pub fn set_name(mut self, name: String) -> Self {
Contributor

I would say mut self returning Self is an anti-pattern.

I'm also sceptical of having pub on the fields and then having functions for setting and clearing them. If you want the user to be able to access them, then I'd get rid of the functions. If you don't want them to access them, then I would say remove the pubs on EntityData.

Contributor Author

This is meant to be used as a "builder":

User::default()
  .set_something(...)
  .set_something_else(...)
  .build()?

So all the "setters" need to consume the data and return a "new one"; of course, they actually just move a mutated one.

Properties are public because this is just a temporary data structure with no invariants at all; it is meant to be used in the most convenient way when building a new entity.

Contributor

Ah, my bad! In my head, I was thinking of &mut instead of just mut. This makes sense now 👌 Although I wonder if it's more "idiomatic" to use a reference and mutate that. Calling resident Rust aficionado @cloudhead 👀

Contributor

I still think that having the fields public and having the setters doesn't make much sense. I'm thinking of it in the sense of how someone should use this API. Should they set the fields manually or use the functions? Pick one, so that they don't have to guess and there isn't a disagreement on which is the right way :)

Contributor Author

I don't have a strong opinion on this, but I suspect that both are needed here.

The "mutator" methods consume the input and produce a new one, so they can only be used in a "chained invocation" style, where the last invocation is the build (or anyway something that consumes the value).

On the other hand, public properties allow a more traditional "make a value, do something with it, then do something else" coding style.

I don't know exactly how this API will be used, so I provided both so that the API user can choose the style.

In general public properties would be enough (once you have them you can do whatever you want), but I see that "chained mutations on a builder" is a common pattern and leads to more readable code, so I provided that as well.

Contributor

Maybe it's at this point where I need to understand why this data structure exists in the first place :)

Contributor

Re: builders: both forms are okay, the only advantage of borrowing is that you can also use it this way:

let mut builder = Builder::new();
builder.set("foo", "bar");
builder.set("quk", "bar");
builder.build()

Otherwise, you're forced to use a one-liner approach or repeat the let binding a bunch of times.

Contributor Author

Oh, I see... I am allowing that pattern by having public properties.
Without public properties I would need to borrow instead of move in all setters, and provide getters.

In this case, where all the invariants are checked at the end anyway, I feel that having public properties is more flexible and has no downsides.

But I don't have strong opinions; if asked I can make the setters borrow (take and return `&mut self`) and add getters.
The build method would still move and consume the builder instance (closing the build).
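A minimal sketch of that alternative, with illustrative UserBuilder/User types (not the actual librad API): setters take and return `&mut self`, so the statement-by-statement style works, while build() still moves the instance and closes the build:

```rust
// Hypothetical sketch of borrowing setters plus a consuming build().
#[derive(Default)]
struct UserBuilder {
    name: Option<String>,
    revision: u64,
}

struct User {
    name: String,
    revision: u64,
}

impl UserBuilder {
    // setters borrow, so the builder can be mutated across statements
    fn set_name(&mut self, name: &str) -> &mut Self {
        self.name = Some(name.to_string());
        self
    }

    fn set_revision(&mut self, revision: u64) -> &mut Self {
        self.revision = revision;
        self
    }

    // build still moves self, "closing" the builder
    fn build(self) -> Result<User, String> {
        let name = self.name.ok_or_else(|| "name is required".to_string())?;
        Ok(User { name, revision: self.revision })
    }
}

fn main() {
    // statement-by-statement style enabled by the borrowing setters
    let mut b = UserBuilder::default();
    b.set_name("alice");
    b.set_revision(1);
    let user = b.build().unwrap();
    assert_eq!(user.name, "alice");
    assert_eq!(user.revision, 1);
}
```

Note the trade-off discussed in the thread: with this shape a single chained expression ending in build() no longer compiles (build would move out of a borrow), which is why the consuming-setter variant reads better as a one-liner.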

Comment thread librad/src/id/entity.rs
Comment thread librad/src/id/entity.rs Outdated
Comment thread librad/src/id/entity.rs Outdated
Comment thread librad/src/id/entity.rs Outdated
Contributor

@kim kim left a comment

Overall promising!

Comment thread librad/Cargo.toml
}

pub fn compute_hash(&self) -> Result<Multihash, Error> {
Ok(Sha2_256::digest(&self.canonical_data()?))
Contributor

Is there any specific reason for using SHA2-256?

Contributor Author

Because it is unlikely to have collisions, and if I am not mistaken it is the next hash function that git will use (transitioning away from SHA1).

Contributor

But we don't benefit in any way from git's hashing, or do we? As SHA2 is already considered obsolete, I'd lean towards using a recent algorithm (either SHA3 or BLAKE3, which claims to be much faster). There's also the consideration that hashing is likely to become a bottleneck for testing, so swapping in a cryptographically insecure but fast algorithm becomes desirable. As a follow-up, I suggest abstracting over the hash algorithm and introducing a "canonical" hashing module.

Contributor

+1, but I'd like to add the concern that it will likely be useful for the Registry to verify these documents at some point in the future, and there we are a lot more limited in terms of hashing functions (we have to compile them into the wasm blob), so using the same hashing function (Blake2b) would make interop easier.

Contributor

Blake2b

Why is this, actually?

Contributor

It's the default in polkadot/substrate (it might be possible to change though), besides also being a really good hashing function.

Contributor

it might be possible to change

Hm ok, good to know. Also disappointed we can't use Blake2b++, being so much betterer

Contributor Author

I agree and I added one more checkbox in the #87 description to track this.

Comment thread librad/src/id/entity.rs
Ok(key.sign(&self.canonical_data()?))
}

pub async fn sign(
Contributor

It's unclear what this method does, and why: we're passing a secret key, and a Signatory, which could be either our own or someone else's public key. Then we sign the entity data with the secret key, stating that the signer is whoever is represented by `by`.

Contributor Author

This method signs the entity.
The secret key is obviously needed, but then the code does the following:

  • Get the public key from the secret key.
  • Check that the entity has not been already signed using this same key.
  • Check that this key is allowed to sign the entity and the by argument is correct: either the key is owned by self or it belongs to the given certifier.
  • If all checks passed, build the signature and add it to the signature set.

Leaving key revocation handling to another PR, does this logic make sense?
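The checks listed above could be sketched like this. Signatory echoes the PR's name, but the key, signature, and resolver types are plain stand-ins (strings and maps), not librad's crypto or async Resolver:

```rust
use std::collections::{HashMap, HashSet};

// Stand-in for a public key; the real code derives it from the secret key.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct PublicKey(String);

// Who is signing: with a key owned by the entity itself, or as a certifier.
enum Signatory {
    OwnedKey,
    User(String), // certifier URN
}

#[derive(Debug, PartialEq)]
enum SignError {
    AlreadySigned,
    KeyNotOwned,
    KeyNotCertifier,
}

struct Entity {
    owned_keys: HashSet<PublicKey>,
    signatures: HashMap<PublicKey, String>,
}

// `certifier_keys` (URN -> keys) stands in for the Resolver machinery.
fn sign(
    entity: &mut Entity,
    key: &PublicKey,
    by: &Signatory,
    certifier_keys: &HashMap<String, HashSet<PublicKey>>,
) -> Result<(), SignError> {
    // 1. the entity must not already be signed with this same key
    if entity.signatures.contains_key(key) {
        return Err(SignError::AlreadySigned);
    }
    // 2. the key must be allowed to sign, and `by` must be correct for it
    match by {
        Signatory::OwnedKey => {
            if !entity.owned_keys.contains(key) {
                return Err(SignError::KeyNotOwned);
            }
        }
        Signatory::User(urn) => {
            let keys = certifier_keys.get(urn).ok_or(SignError::KeyNotCertifier)?;
            if !keys.contains(key) {
                return Err(SignError::KeyNotCertifier);
            }
        }
    }
    // 3. all checks passed: build the signature and add it to the set
    entity.signatures.insert(key.clone(), format!("sig-by-{}", key.0));
    Ok(())
}

fn main() {
    let k = PublicKey("k1".into());
    let mut e = Entity {
        owned_keys: [k.clone()].into_iter().collect(),
        signatures: HashMap::new(),
    };
    assert!(sign(&mut e, &k, &Signatory::OwnedKey, &HashMap::new()).is_ok());
    // signing twice with the same key is rejected
    assert_eq!(
        sign(&mut e, &k, &Signatory::OwnedKey, &HashMap::new()),
        Err(SignError::AlreadySigned)
    );
}
```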

Contributor

I see. It’s the effects that threw me off here — maintaining the invariants that the identity document belongs to the private key (which is somewhat weak anyways), and that the doc is present in local storage could be factored out and expressed in the types.

Comment thread librad/src/id/project.rs
Comment thread librad/src/id/uri.rs
Comment thread librad/src/id/entity.rs Outdated
Comment thread librad/src/id/entity.rs
signature: &Signature,
resolver: &impl Resolver<User>,
) -> Result<(), Error> {
self.check_key(key, by, resolver).await?;
Contributor

iiuc, this checks that the most recent user identity has this public key (ie. it was not revoked). If I'm not mistaken, check_history will fail then if any signature valid in the past was revoked.

Comment thread librad/src/id/entity.rs Outdated
revisions: R,
) -> Result<(), HistoryVerificationError>
where
R: Stream<Item = Entity<T>> + Unpin,
Contributor

Ok, so you're walking the history from the most recent revision, which simplifies some things. However, I'm missing where we'd verify that the hash of the initial version matches the identity hash.

I also think this shouldn't return unit -- we will need to be able to distinguish between (a) history is valid, (b) head requires your signature or head quorum is not reached, so reset to head^, (c) history failed verification at revision X, and (d) an error occurred.

Contributor Author

I added the parent hash verification (sorry, I just forgot it).

The HistoryVerificationError type encodes all the different error cases, and if I am not mistaken the only one not explicitly detected is distinguishing the two subcases of (b).

More generally, we don't have a "protocol" for handling "draft" entities waiting to be signed.
They need to be moved from device to device (because the private keys needed to sign them should not leave the devices), but it is still not clear whether this "movement" should be implemented by committing them to the entity history branch and pulling from other devices, or "out of band", outside the git branch replication machinery.

In this code an entity with missing signatures is considered invalid (and the verification error states so).

Contributor

The errors are fine, but we will need to return something in order to implement whatever interaction flow will be used to prompt for a signature. This will also depend on who the verifier is, and we will need to inspect up to which revision (both native and in-doc) we verified. This suggests to return the entity itself, tagged by the quorum state.

Contributor Author

I added the "verification status"; now the verification methods update this status.

Missing signatures are not a "verification error" if you check a single revision: the entity has a status that states that it must be signed but it is otherwise OK.

However missing signatures become errors when checking the full history.
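That distinction could be sketched with an illustrative status enum (the real code's HistoryVerificationError carries much more detail about which revision failed and why):

```rust
// Hypothetical sketch: a missing signature is not an error for a single
// revision, but becomes one when checking the full history.
#[derive(Debug, PartialEq)]
enum VerificationStatus {
    Verified,
    SignaturesMissing,
}

#[derive(Debug, PartialEq)]
struct HistoryVerificationError;

fn check_revision(missing_signatures: usize) -> VerificationStatus {
    if missing_signatures == 0 {
        VerificationStatus::Verified
    } else {
        // OK as a single revision: it "must still be signed" but is otherwise fine
        VerificationStatus::SignaturesMissing
    }
}

fn check_history(statuses: &[VerificationStatus]) -> Result<(), HistoryVerificationError> {
    // over the full history, any missing signature is a hard error
    if statuses.iter().all(|s| *s == VerificationStatus::Verified) {
        Ok(())
    } else {
        Err(HistoryVerificationError)
    }
}

fn main() {
    assert_eq!(check_revision(1), VerificationStatus::SignaturesMissing);
    assert!(check_history(&[check_revision(0), check_revision(1)]).is_err());
}
```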

Comment thread librad/src/id/entity.rs
info: T,
}

impl<T> Entity<T>
Contributor

It's a bit unclear what the actual (crate-)public entry points are: the check_* methods seem more like private helpers, and could use some docs and/or more descriptive names. I also don't quite understand when we will deserialise the EntityData (and check the hash).

Contributor

Was this cleared up during our call? 🤔 I think the entry point is something I'm still confused on. Something I've found useful is writing the crate docs, for example in surf or tracker.

Contributor

@xla xla left a comment

This looks very promising.

Comment thread librad/Cargo.toml
Comment thread librad/src/id/entity.rs Outdated
Comment thread librad/src/id/entity.rs

/// Build an `Entity` from its data (the second step of deserialization).
/// It guarantees that the `hash` is correct
pub fn from_data(data: data::EntityData<T>) -> Result<Self, Error> {
Contributor

Wouldn't this be better served as an impl of TryFrom?
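A sketch of what the TryFrom suggestion could look like, with illustrative fields; the real from_data would also recompute and check the hash here:

```rust
use std::convert::TryFrom;

// Hypothetical minimal EntityData/Entity pair; the real types carry
// revision, keys, certifiers, signatures, and the hash.
struct EntityData {
    name: Option<String>,
}

struct Entity {
    name: String,
}

#[derive(Debug, PartialEq)]
enum Error {
    MissingName,
}

impl TryFrom<EntityData> for Entity {
    type Error = Error;

    fn try_from(data: EntityData) -> Result<Self, Self::Error> {
        let name = data.name.ok_or(Error::MissingName)?;
        // hash recomputation and verification would also live here
        Ok(Entity { name })
    }
}

fn main() {
    let e = Entity::try_from(EntityData { name: Some("radicle".into()) }).unwrap();
    assert_eq!(e.name, "radicle");
    assert!(Entity::try_from(EntityData { name: None }).is_err());
}
```

With this impl, `data.try_into()` works wherever an `Entity` is expected, and the conversion composes with the rest of the `From`/`TryFrom` ecosystem.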

Comment thread librad/src/id/entity.rs Outdated
Comment thread librad/src/id/entity.rs Outdated

/// Convenience method to check if an entity is valid
pub async fn is_valid(&self, resolver: &impl Resolver<User>) -> bool {
self.check_validity(resolver).await.is_ok()
Contributor

This seems superfluous if the caller can just invoke an is_ok.

Comment thread librad/src/id/entity.rs Outdated
Comment thread librad/src/id/entity.rs
Contributor

@FintanH FintanH left a comment

Some feedback around the documentation.

Comment thread librad/src/id/entity.rs Outdated

/// A type expressing *who* is signing an `Entity`:
///
/// * either a specific user (identified by their URN),
Contributor

You can move these doc comments to the enum branches themselves. I'll add a suggestion below to demonstrate.

Comment thread librad/src/id/entity.rs
Comment thread librad/src/id/entity.rs Outdated
Comment thread librad/src/id/entity.rs
/// - They can be signed, either with a key they own, or using a key belonging
/// to a different entity (the certifier); note that when applying multiple
/// signatures, signatures are not themselves signed (what is signed is always
/// only the entity itself).
Contributor

Suggestion for wording this:

/// Note that we only sign the entity itself. We do not sign the signatures associated with the entity.

The current wording comes across as complex to me.

Comment thread librad/src/id/entity.rs
/// - Each revision must be signed by all its owned keys and trusted certifiers.
/// - Each subsequent revision must be signed by a quorum of the previous keys
/// and certifiers, to prove that the entity evolution is actually under the
/// control of its current "owners" (the idea is taken from TUF).
Contributor

Suggested change
/// control of its current "owners" (the idea is taken from TUF).
/// control of its current "owners" (the idea is taken from [TUF](https://theupdateframework.io/)).

Comment thread librad/src/id/entity.rs Outdated
Comment thread librad/src/id/entity.rs Outdated
Comment thread librad/src/id/entity.rs
Comment thread librad/src/id/entity.rs
/// FIXME: probably we should merge owned keys and certifiers when checking
/// the quorum rules (now we are handling them separately)
pub fn check_update(&self, previous: &Self) -> Result<(), UpdateVerificationError> {
if self.revision() <= previous.revision() {
Contributor

You mention this in the comment, but I think it's pretty important that the updates are monotonic.

Contributor

Why do you think it is important? I understand Massi's point more as an API ergonomics concern.

The reason TUF implies that there be no holes in the sequence is so you can't specify an arbitrarily large one. Otoh, this may make stalemate situations more complicated to resolve -- our system can be (ab-)used for voting schemes, which is not a consideration for TUF.

We can start with adhering strictly to TUF (next revision must be exactly previous + 1), and loosen it to a (small) window later.
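The strict rule could be sketched like this; check_update and UpdateVerificationError echo the PR's names, but the standalone function and Revision newtype are illustrative:

```rust
// Hypothetical sketch of the strict TUF-style rule: the next revision must
// be exactly previous + 1 (no holes, no repeats).
#[derive(Debug, PartialEq)]
enum UpdateVerificationError {
    NonMonotonicRevision,
}

struct Revision(u64);

fn check_update(current: &Revision, previous: &Revision) -> Result<(), UpdateVerificationError> {
    if current.0 != previous.0 + 1 {
        return Err(UpdateVerificationError::NonMonotonicRevision);
    }
    Ok(())
}

fn main() {
    assert!(check_update(&Revision(2), &Revision(1)).is_ok());
    // a gap in the sequence is rejected
    assert_eq!(
        check_update(&Revision(3), &Revision(1)),
        Err(UpdateVerificationError::NonMonotonicRevision)
    );
}
```

Loosening this later to a small window would only change the predicate, not the call sites.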

Contributor

@cloudhead cloudhead Apr 7, 2020

Yeah a couple things come to mind, though maybe we already have solutions for them:

  1. Do we have a way to tell if we're missing an update, between eg. Rev 31 and Rev 12319?
  2. Are there overflow attacks we should look out for?
  3. Given two updates to the same parent state, can we tell that they are conflicting (and not sequenced), even though they may have different revisions?

I think it makes things easier to think about at the very least, if you know ahead of time what the revision number of the next update is going to be, given that updates can be made concurrently.

Contributor

Do we have a way to tell if we're missing an update, between eg. Rev 31 and Rev 12319?

I think in this case you should be able to tell, since the hash doesn't match, à la Merkle trees and all that. But someone can correct me if I'm wrong here.

Contributor

@kim kim Apr 7, 2020

Yes it's all hash-linked -- which is not the case in TUF. The only reason really we have this number is that it seemed unclear how one would reify the order in a patch-based system, but I now think that it would be through native means.

Keep in mind that the logic here will verify only one peer's view of a given identity document. In order to reach a "consensus" (e.g. when talking about what the master head is), the views of all maintainers have to agree. If they don't, the revision number will make things more complicated: say there's a 50-50 split between two camps of maintainers at proposed rev 42, but the tiebreaker agrees with neither of them; they can't just propose rev 43, because that would violate the gap-less condition. This would be quite straightforward if that condition were lifted, or if there were no revision at all.

Contributor

Gotcha, yeah I didn't catch that it was hash linked.

Contributor Author

Having revision numbers with a strict +1 rule will make key revocation checks way simpler.
I would discuss this in the "key revocation handling" PR I will post as soon as this is merged (see #87).

Contributor

Please consider removing the numeric revision altogether, for the reasons outlined above. Since we said we'll need a cache to make lookups O(1), we could just as well key by hash.

Contributor

@FintanH FintanH left a comment

Left some final feedback around the code.

I don't feel like I want to block this PR in any way, but I feel like the builder pattern seems harder to maintain in the long run. There's the big build-up phase of the data and then all the validation logic happening in one place. I'm more in the camp of Parse, don't validate. So if I was trying to ensure invariants, I would have types that can only be created through functions that check those invariants. Then these functions compose to build the final data. For example:

struct Revision(u64);

impl TryFrom<u64> for Revision {
  type Error = RevisionMustBePositive;

  fn try_from(r: u64) -> Result<Revision, Self::Error> {
    if r > 0 { Ok(Revision(r)) } else { Err(RevisionMustBePositive(r)) }
  }
}

This keeps the functions small, easy to test, and easy to swap in and out.

Apologies if this comes off as a rant, but I'd like to express how I think about these kinds of problems. Just wanted to explain what I was saying on the call earlier and see if it resonates with you or anyone else.

Thanks for the work ❤️ 🌱

Comment thread librad/src/id/entity.rs
/// Build an `Entity` from its data (the second step of deserialization).
/// It guarantees that the `hash` is correct
pub fn from_data(data: data::EntityData<T>) -> Result<Self, Error> {
if data.name.is_none() {
Contributor

Can the name be an empty string?

Comment thread librad/src/id/entity.rs
Comment thread librad/src/id/entity.rs
use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
Contributor

I'd suggest moving the error types to their own module. It would minimise this module a bit more for exploration since it's quite large. But it's up to you.

Comment thread librad/src/id/entity.rs Outdated
@kim
Contributor

kim commented Apr 7, 2020

@FintanH I agree that most of the checks for well-formedness could actually be pushed down to the parsing phase with no loss in preciseness of the error messages (which I think was the original motivation). There is, however, one validation left which requires the value to be complete: verifying the hash. So the truth is probably somewhere in between.

let mut cleaned = EntityData::<T>::default();
cleaned.name = self.name.to_owned();
cleaned.revision = self.revision.to_owned();
cleaned.hash = self.hash.to_owned();
Contributor

Question: Will hash ever be Some(hash) when you're calling compute_hash?

@FintanH
Contributor

FintanH commented Apr 7, 2020

@FintanH I agree that most of the checks for well-formedness could actually be pushed down to the parsing phase with no loss in preciseness of the error messages (which I think was the original motivation). There is, however, one validation left which requires the value to be complete: verifying the hash. So the truth is probably somewhere in between.

Hrmmm, could you not just have a function from the well-formed types to the hash? This has the benefit that you never try to hash malformed data as well. I think the con here is that the function is rigid, i.e. it needs to take all the parameters necessary.

@cloudhead
Contributor

There's the big build-up phase of the data and then all the validation logic happening in one place. I'm more in the camp of Parse, don't validate.

Just a note that this isn't so much about the builder pattern, as it is about early validation. You could totally have, for example:

Builder::new().set_revision(Revision::parse(rev)?).set_user(User::parse(user)?)

If you wanted the strong typing plus the builder pattern.

@FintanH
Contributor

FintanH commented Apr 7, 2020

@cloudhead: What functionality does the builder pattern provide then? As I understood it, the point was that you can create the EntityData free-form and then calling Entity::from_data validates the data inside.

I feel like I'm missing something 😅

@cloudhead
Contributor

@FintanH I didn't look deeply into this particular use-case, so I can only speak generally: it's useful when you have a lot of parameters, and/or many of those parameters are optional.

The alternatives are:

  1. A constructor function that takes lots of parameters, some of which are Option -- this is ugly and you have no clue what's going on:
     Thing::new(a, b, Some(c), None, None)
  2. Literal syntax, but this requires all the fields to be set, or have a Default instance, and doesn't allow any logic during the construction:
     Thing { a, b, c, d, ..Default::default() }
  3. A mix of the two, which requires an extra type:
     Thing::new(ThingParams { a, b, c, d })
  4. The builder pattern.

@massimiliano-mantione massimiliano-mantione merged commit 577e1da into master Apr 15, 2020
@massimiliano-mantione massimiliano-mantione deleted the entity-verification-WIP branch April 15, 2020 10:10

5 participants