Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DSSE support for in-toto #503

Merged
merged 34 commits into from
Mar 15, 2023
Merged

Add DSSE support for in-toto #503

merged 34 commits into from
Mar 15, 2023

Conversation

PradyumnaKrishna
Copy link
Contributor

Fixes issue #445: Add support for DSSE

Description of the changes being introduced by the pull request:
This pull request aims to add DSSE support for in-toto, as DSSE signature wrapper is completed in securesystemslib.
I started with adding DSSE support in in_toto_run and created a convertor that converts Link or Layout object to DSSE Envelope.

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Copy link
Member

@adityasaky adityasaky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good!

in_toto/runlib.py Outdated Show resolved Hide resolved
in_toto/runlib.py Show resolved Hide resolved
indent = None if compact_json else 1
separators = (',', ':') if compact_json else (',', ': ')

with open(filename, 'w', encoding='utf8') as fp:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be offloaded to a dump method in the DSSE envelope?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is possible but Metadata wrapper for python-tuf uses different naming convention, to_file and from_file. In my opinion either this should be done in the in-toto itself or create a method that is suitable for both projects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion vis-à-vis the naming convention, especially since the eventual plan for Metablock is to deprecate it. I think it makes sense for the DSSE library to implement it rather than the implementation extending it.

LOG.info("Storing link metadata to '{}'...".format(filename))
link_metadata.dump(filename)
LOG.info("Storing link metadata to '{}'...".format(filename))
link_metadata.dump(filename)

return link_metadata

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be straightforward to implement for in_toto_record_* and in_toto_mock as well!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to hold off porting this feature to in_toto_record and in_toto_mock until the metadata architecture has crystallized (see #503 (review))

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I appreciate how non-invasive your patch is, but the interface and its implementation becomes more and more convoluted. I think we should seize the opportunity of introducing DSSE -- I assume it will become an API breaking change -- and clean up a little.

I'd like us to take inspiration from the latest developments in python-tuf and securesystemslib wrt metadata-related abstractions. Most notably (plus some ideas):

  • securesystemslib.signer.Signer -- DSSE envelope already uses this, we could make Metablock.sign use it too. I would even suggest to remove signing_key, gpg_keyid, gpg_use_default, gpg_home args from in_toto_run and have it accept a Signer instead.

  • tuf.api.metadata.Signed -- Base class for inner tuf metadata, akin to in-toto's Signable. We could port it to securesystemslib and use it for any inner tuf or in-toto metadata, regardless of the signature wrapper.

  • tuf.api.metadata.Metadata -- Container class for traditional signature wrapper, akin to in-toto's Metablock, but more modern and flexible, e.g. it uses the Signer interface and other useful abstractions like StorageBackendInterface. It would be nice to port this to securesystemslib and use it for TUF and in-toto.

  • tuf.api.serialization -- Provides abstract interfaces and default implementations to de/serialize different wireline formats for traditional metadata (see Metadata[Des|S]erializer), and to serialize payload (see SignedSerializer). I wonder if we can extend/reuse/replicate this module for DSSE.

in_toto/runlib.py Show resolved Hide resolved
@PradyumnaKrishna
Copy link
Contributor Author

@lukpueh thank you for providing some thoughts.

Here is my opinion about them, and I think we can continue with the Signer implementation and review again.

  • Signer

    • We can make Metablock.sign to use this and it will reduce the duplication from the implementation.
    • Accepting Signer in in_toto_run wouldn't make a major impact because somewhere we have to parse command line arguments and convert them into required Signer.
  • Signable and TUF Metadata

    • TUF metadata do not support GPGSignatures, and it would be tricky to add them to it because tuf metadata works on different type of Key implementation.
    • In future, in-toto will deprecate1 its current metadata wrapper (old-style envelope) because of it's flaw. Porting the metadata would be unnecessary job.
    • Lets say if we port the metadata wrapper to securesystemslib, then we have to change all the Metablock references to metadata, and that will be a major and breaking change for in-toto. Doing that is similar to integrating DSSE in in-toto.
    • Porting Signed is not effective, because in the end Signed is a Link or Layout and there will be no changes.
    • I believe in abstraction between metadata wrappers, lets think about it more.
  • Serialization
    The serialization will be inetegrated in both wrappers (DSSE and Metablock) and will be achieved easily.

I will push a patch soon, maybe tomorrow. After that we can discuss this again.

Footnotes

  1. https://github.com/in-toto/ITE/blob/master/ITE/5/README.adoc#backwards-compatibility

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff! in_toto_run looks a lot better now

@@ -137,14 +139,15 @@ def type_(self):
return self.signed.type_


def sign(self, key):
def sign(self, key=None, signer: Signer=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the effort to maintain backwards compatibility, but I'm more in favor of maintaining a clean API.

My preferred solution would be to replace the current sign(key: dict) and sign_gpg(keyid: str) with a sign(signer: Signer) and maybe add a isinstance(signer, Signer) check with an error message that points out the breaking change. We can remove/reword the error message after a few releases.

If we don't want to break the API (yet), we can also add a new sign method with a different name and deprecate current sign and sign_gpg more slowly.

In theupdateframework/python-tuf#2010 (comment) we had a similar problem. There we ended up with a mix of above two suggestion, i.e. create a new function with a different name and remove the old ones right away. The reason why we changed the name was that it was not as straight-forward as here to point out to the user that the function was called with the old argument types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need to worry too much about the signing interface of Metablock if we decide to henceforth only support creation of DSSE envelopes.

separators = (',', ': ')
if compact_json:
indent = None
separators = (',', ':')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that Metablock also has a compact_json attribute, which is ignored here. This should really only be a temporary solution. Adding a TODO comment or creating a GitHub issue might help to not forget.

@lukpueh
Copy link
Member

lukpueh commented Aug 4, 2022

Thank you for your assessment, @PradyumnaKrishna! I have some replies inline

  • Signer

    • We can make Metablock.sign to use this and it will reduce the duplication from the implementation.

See #503 (comment).

  • Accepting Signer in in_toto_run wouldn't make a major impact because somewhere we have to parse command line arguments and convert them into required Signer.

True. But it seems like a job that we could factor out to something like a Signer factory that's called in the cli. Seems like it would make the library more ergonomic. But we don't have to do this now.

  • Signable and TUF Metadata

    • TUF metadata do not support GPGSignatures, and it would be tricky to add them to it because tuf metadata works on different type of Key implementation.

TUF's Metadata class already uses securesystemslib's Signer and Signature objects, and it makes sense for it to use the Key implementation that you ported to securesystemslib instead of it's own. This would allow Metadata to support gpg signatures.

Moreover, there is a feature request to port the entire Metadata class to securesystemlib and use it for both TUF and in-toto (see secure-systems-lab/securesystemslib#272 (comment)).

But again, we don't have to do this right away. So I'm fine if we work with Metablock for the time being.

  • In future, in-toto will deprecate1 its current metadata wrapper (old-style envelope) because of it's flaw. Porting the metadata would be unnecessary job.

True. But even if we no longer support creating metadata with the traditional wrapper, we will probably need to support reading it and verifying its signatures for a while. So it might make sense to have a common class in securesystemslib.

  • Lets say if we port the metadata wrapper to securesystemslib, then we have to change all the Metablock references to metadata, and that will be a major and breaking change for in-toto. Doing that is similar to integrating DSSE in in-toto.

True. And we should carefully consider whether we want a breaking change or not. I think it's okay to break the API, as long as we stay backwards compatible with old metadata, i.e. support reading and verifying the traditional metadata wrapper.

  • Porting Signed is not effective, because in the end Signed is a Link or Layout and there will be no changes.

It makes sense if we have a common Metadata class in securesystemslib that contains a generic Signed object and provides methods to serialize/deserialize, sign/verify, etc. We could subclass the same Signed class then in in-toto (Link, Layout) and tuf (Root, Targets, Snapshot, Timestamp).

  • I believe in abstraction between metadata wrappers, lets think about it more.

👍

@adityasaky
Copy link
Member

Note: we want to be able to plug the same Link object into both signature wrappers (and sign them with the same keys) to meet the requirements of the transition phase. https://github.com/in-toto/ITE/blob/master/ITE/5/README.adoc#backwards-compatibility

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stellar work, @PradyumnaKrishna. I'm very impressed how well you navigate through the code base. Please address comments inline and we can discuss how to roll out this feature, and also move on to adding dsse support in verifylib.

in_toto/models/metadata.py Show resolved Hide resolved

return signature

def sign_key(self, key):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware that I said this was my preferred solution, but it now looks like this is the only breaking change of your DSSE integration. Maybe we can leave Metablock.sign as it was and find a different name for a signing function that accepts a Signer? You then won't be able to call metablock_or_envelope.sign() anymore, but it seems easier to also rename Envelope.sign, than to adopt a breaking change in Metablock.

We only need to find a good name. 😬 What about create_signature, or create_sig? The former clashes a bit with the old verify_signature. But if you choose the latter, you could call your new verify function verify_sig or so... I'm very open for better naming suggestions.

in_toto/models/common.py Outdated Show resolved Hide resolved

# pylint: disable-next=abstract-method
class MetadataLoader(SerializationMixin, JSONSerializable):
"""A Metadata loader to load metadata using suitable method."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With in-toto verification in mind, does it make sense to provide a more general metadata abstraction for traditional Metablock and new dsse Envelope objects? The following routine is typical for verifylib regardless of the metadata wrapper.

1. Load metadata from disk
2. Verify signatures
3. Extract payload for further processing

use_dsse = True
else:
link_metadata.verify(verification_key)
link = link_metadata.signed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above above more general metadata abstraction that provides a common verify method.

tests/test_runlib.py Show resolved Hide resolved
in_toto/runlib.py Show resolved Hide resolved

elif securesystemslib.formats.SIGNATURE_SCHEMA.matches(signature):
valid = key.verify(
Signature.from_dict(signature), self.signed.signable_bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, Metablock.signatures is already a list of Signature objects, so you don't have to case handle here. Can you add it to the issue with Metablock enhancements that may or may not be worth providing before we deprecate it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metablock.signatures is not a list of Signature objects right now. Also, I found a flaw in the verify method added previously, Signature.from_dict destroys the dict of signature provided.

The solution to add a method similar to the DSSE having same arguments would be tricky. The one way I can think of is to convert the whole Metablock wrapper to use Signer, Key and Signature as objects.

Right now I removed the method and using the Metablock verify signature method for verification.

@PradyumnaKrishna
Copy link
Contributor Author

PradyumnaKrishna commented Aug 22, 2022

@lukpueh I found that 7f3a801 was failing for some testcase which was not expected. It raises FileNotFoundError and StorageError.

ba415f8 This does not raises ThresholdVerificationError on some testcases.

Edit: MetadataLoader was working fine, A wrong testcase was affecting others.

@lukpueh
Copy link
Member

lukpueh commented Aug 23, 2022

ba415f8 This does not raises ThresholdVerificationError on some testcases.

Looks like too much indentation for this block:

# For each step, verify that we have enough validly signed links from
# distinct authorized functionaries. Links signed by different subkeys of
# the same main key are counted only once towards the threshold.
valid_authorized_links_cnt = (len(verified_key_link_dict) -
(len(used_main_keyids) - len(set(used_main_keyids))))
# TODO: To guarantee that links are signed by different functionaries
# we rely on the layout to not carry duplicate verification keys under
# different dictionary keys, e.g. {keyid1: KEY1, keyid2: KEY1}
# Maybe we should add such a check to the layout validation? Or here?
if valid_authorized_links_cnt < step.threshold:
raise ThresholdVerificationError("Step '{}' requires at least '{}'"
" links validly signed by different authorized functionaries. Only"
" found '{}'".format(step.name, step.threshold,
valid_authorized_links_cnt))

It used to execute after this loop, now it runs in every iteration:

for link_keyid, link in chain_link_dict.get(step.name, {}).items():

I admit that the function has become long enough for mistakes like this to easily happen. Maybe it makes sense to factor out some things to helper functions. I'd at least try to not make it any longer by adding DSSE support. With a common Metablock/Envelope abstraction that has a verify method, this should be possible, right?


def __init__(self, metadata):
self.metadata: Union[Envelope, Metablock] = metadata
self.dsse = isinstance(metadata, Envelope)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think composition is the best suited pattern here, it requires case handling in every function for objects of different classes that could have the same interfaces for serialization, signature creation/verification and payload extraction.

What if AnyMetadata was a base class for Metablock and Envelope that only implements the from_dict factory as you have it now, and defines common interfaces for above tasks? I can see two challenges there.

  1. current signature verification methods expect different arguments for Metablock and Envelope
    --> A good solution would be to implement a new additional verify method on Metablock that accepts the same arguments as Envelope's verify method.

  2. Envelope is implemented in securesystemslib, so it can't inherit from AnyMetadata in in-toto
    --> I can see multiple solutions:

    1. Use protocols to not require subclassing. This would require either dropping 3.7 or adding an additional dependency to pull in the backport
    2. Create a glue Envelope class in in-toto, which could inherit both from AnyMetadata and securesystemslib's Envelope. This solutions seems a bit messy.
    3. Implement AnyMetadata and a Metablock-like class in securesystemslib. The latter is also requested in #272. However, this would mean a breaking change.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that that it is possible to define a base class AnyMetadata for Envelope and Metablock and use them in it's from_dict method. I am surprised to see this behaviour. But if we are going to have a base class AnyMetadata then these all metadata need to be present in a single file otherwise they will cause circular import.

It is not possible to do first task without refactoring Metablock (see #503 (comment)) and I am not sure about second task either.

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a first batch of review comments. All minor stuff really. The PR is in great shape! 💯

Could you please also look at the sphinx docs and see if we need to explicitly enable docs for new classes (AnyMetadata, Envelope, serializers?).

Please also make sure that important methods inherited from securesystemslib (e.g. from/to_bytes/file are shown in the docstring of the inheriting class).

in_toto/models/metadata.py Show resolved Hide resolved
in_toto/models/metadata.py Outdated Show resolved Hide resolved
class Envelope(AnyMetadata, SSlibEnvelope):
"""in-toto copy of DSSE Envelope."""

def get_payload(self, deserializer: Optional[BaseDeserializer] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add docstring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

in_toto/models/metadata.py Outdated Show resolved Hide resolved
in_toto/models/metadata.py Outdated Show resolved Hide resolved
@lukpueh
Copy link
Member

lukpueh commented Sep 6, 2022

... there is a bit of redundant functionality in Metablock now. I'm fine leaving as is and decide later if we want to just sit out the DSSE transition phase and remove the whole Metablock class at once, or clean it up in a breaking change before.

Still, would you mind creating an issue for the following items?

  • JSONSerializer can be used instead of Metablock.__repr__ and Metablock.compact_json
  • Metablock.from_file and Metadata.to_file (both via SerializationMixin) can be used instead of Metadata.load and Metablock.dump
  • I don't think we need attr anymore on Metablock

More things to ticketize:

  • create_sig and verify_sigs can be used instead of sign, sign_gpg, verify_signature.
  • Maybe we can find better names for create_sig and verify_sigs (sign and verify_signature are already taken)

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding docs! Here's a review of just those:

Things that need to be fixed in securesystemslib for docstrings that get pulled in via inheritence:

  • default_serializer and default_deserializer don't need to show up in the docstrings, maybe we can just make them internal, i.e. _default_serializer and _default_deserializer, which should automatically exclude them.
  • please remove mention of Serializable, which no longer exists
  • please use double-backticks consistently and full import paths for class names etc., eg. ``securesystemslib.serialization.JSONDeserializer``

doc/source/model.rst Outdated Show resolved Hide resolved
in_toto/models/metadata.py Show resolved Hide resolved
in_toto/models/serialization.py Outdated Show resolved Hide resolved
in_toto/models/serialization.py Outdated Show resolved Hide resolved
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments on models subpackage...

payload=self.signable_bytes,
payload_type=ENVELOPE_PAYLOAD_TYPE,
signatures=[]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: IMO, the contained item does not necessarily need to know about the container. Does the method make more sense as static method on AnyMetadata, or classmethod on Envelope?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classmethod on envelope is a better approach I guess.

in_toto/models/metadata.py Outdated Show resolved Hide resolved
in_toto/models/metadata.py Outdated Show resolved Hide resolved
in_toto/models/metadata.py Outdated Show resolved Hide resolved
in_toto/models/metadata.py Outdated Show resolved Hide resolved
in_toto/models/metadata.py Outdated Show resolved Hide resolved
signature = Signature.from_dict(copy.deepcopy(signature))

else:
raise securesystemslib.exceptions.SignatureVerificationError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful here. A single malformed or unknown signature should not fail the entire verification. We only fail if good signatures don't meet the threshold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case was failing that's why I raised this exception.

def test_verify_layout_signatures_fail_with_malformed_signature(self):
"""Layout signature verification fails with malformed signatures. """

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did you have there before? IMO that particular test should also fail if you continue here instead of raising an error. Then the error is raised further below, because the threshold isn't met. Right?

But it looks like the old verifylib.verify_layout_signatures would have also failed for a malformed signature. Interestingly, in both cases it depends on where in the signature list the malformed signature is located, and whether enough valid signatures can be matched by keyid and successfully verified before. It also looks like old and new functions would fail for slightly different sequences of keyid-matching, valid, and malformed signatures .

Regardless, it shouldn't matter for in_toto_verify, because Metablock.validate() is called on load and will error on malformed signatures before we get to verify them.

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a review of verifylib.py and other module, where I didn't find anything comment-worthy.

Files still to be reviewed are runlib.py and tests.

in_toto/in_toto_sign.py Outdated Show resolved Hide resolved
in_toto/in_toto_sign.py Outdated Show resolved Hide resolved
in_toto/verifylib.py Show resolved Hide resolved
in_toto/verifylib.py Outdated Show resolved Hide resolved
in_toto/verifylib.py Outdated Show resolved Hide resolved
@lukpueh
Copy link
Member

lukpueh commented Oct 11, 2022

Things that need to be fixed in securesystemslib for docstrings that get pulled in via inheritence:

  • default_serializer and default_deserializer don't need to show up in the docstrings, maybe we can just make them internal, i.e. _default_serializer and _default_deserializer, which should automatically exclude them.
  • please remove mention of Serializable, which no longer exists
  • please use double-backticks consistently and full import paths for class names etc., eg. securesystemslib.serialization.JSONDeserializer

Thanks for addressing these in in-toto/securesystemslib#13! When re-building the docs, I noticed that a reader of SerializationMixin.{from, to}_{bytes, file} won't know about the default de/serializers, which might be useful usage information.

Should we mention cls._default_(de)serializer in SerializationMixin.{from, to}_{bytes, file} and show docstrings of the overridden methods (e.g. AnyMetadata.cls._default_(de)serializer) after all?

Alternatively, we could just mention the default de/serializers (i.e. JSON) to the class docstrings of the classes that inherit {from, to}_{bytes, file} (i.e. AnyMetadata, Metablock, Envelope).

Any thoughts?

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a review of the modules I haven't reviewed so far, that is runlib, test_runlib and test_verifylib.

in_toto/runlib.py Outdated Show resolved Hide resolved
in_toto/runlib.py Show resolved Hide resolved
in_toto/runlib.py Show resolved Hide resolved
tests/test_runlib.py Show resolved Hide resolved
@lukpueh
Copy link
Member

lukpueh commented Oct 11, 2022

Just finished reviewing the full PR. This is amazing work, @PradyumnaKrishna! Let's try to get it merged soon. Here's what's missing:

  • Address comments from reviews
  • Add more tests
  • Ticketize remaining issues (see e.g. #503 (comment), #503 (comment), #503 (comment))
  • Figure out to what extent this PR breaks the API
    • Is it only a few exceptions and a few semi-internal functions in verifylib/runlib?
    • Can we live with those, or should we aim for full backwards compatibility?
    • Will this require a major release?
  • Upstream and release in-toto/securesystemslib
  • Update securesystemslib dependency

Are you available to work on any of these things, @PradyumnaKrishna?

@PradyumnaKrishna
Copy link
Contributor Author

Thanks for addressing these in in-toto/securesystemslib#13! When re-building the docs, I noticed that a reader of SerializationMixin.{from, to}_{bytes, file} won't know about the default de/serializers, which might be useful usage information.

Should we mention cls._default_(de)serializer in SerializationMixin.{from, to}_{bytes, file} and show docstrings of the overridden methods (e.g. AnyMetadata.cls._default_(de)serializer) after all?

Alternatively, we could just mention the default de/serializers (i.e. JSON) to the class docstrings of the classes that inherit {from, to}_{bytes, file} (i.e. AnyMetadata, Metablock, Envelope).

Any thoughts?

I guess we can mention those default deserializer and serializer in AnyMetadata and also mention those two special classes in the class doc-string of SerializationMixin.

Also there is one small issue that I found during testing. This one line of code breaks for DSSE because Envelope signatures are object and Metablock signatures are dict.
The easiest fix would be if-else statement but I would like your opinion.

Are you available to work on any of these things, @PradyumnaKrishna?

Yes, I am available, I have couple uncommitted changes that I will push this week and address other issues. I will not be available next week but continue my work after that.

@lukpueh
Copy link
Member

lukpueh commented Oct 11, 2022

we can mention those default deserializer and serializer in AnyMetadata

👍

... also mention those two special classes in the class doc-string of SerializationMixin.

Which special classes?

The easiest fix would be if-else statement but I would like your opinion.

Good find! I guess if/else is okay.

Yes, I am available, I have couple uncommitted changes that I will push this week and address other issues. I will not be available next week but continue my work after that.

🚀

@PradyumnaKrishna
Copy link
Contributor Author

Which special classes?
I mean we can mention _default_serializer and _default_deserializer in the class doc-string of SerializationMixin.

@lukpueh
Copy link
Member

lukpueh commented Oct 11, 2022

I mean we can mention _default_serializer and _default_deserializer in the class doc-string of SerializationMixin.

I think that's not necessary. The SerializationMixin class docstring won't show in the in-toto docs. And I was more worried about users of {AnyMetadata, Envelope, Metablock}.{from, to}_{bytes, file}, than about users of SerializationMixin.

@@ -45,7 +46,7 @@ def validate(self):


@attr.s(repr=False, init=False)
class Signable(ValidationMixin):
class Signable(ValidationMixin, SerializationMixin):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SerializationMixin also adds from_bytes, from_file and to_file method, which we don't need, and, at least the former, won't work. Right? Maybe it's better to just reimplement to_bytes instead of using the mixin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I will remove SerializationMixin.

class Envelope(AnyMetadata, SSlibEnvelope):
"""in-toto copy of DSSE Envelope."""

def get_payload(self, deserializer: Optional[BaseDeserializer] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just took a look at the test_in_toto_{sign, record, run}.py additions. I don't think we need to duplicate all these cli tests for dsse, especially since the tests don't assert whether dsse was actually used.

In my mind the goal of the cli tests is to assert that certain combinations of cli options result in a specific exit value. And I think we get good coverage for that without testing all combinations of all options. So for the sake of dsse, it might be enough to test only one or two or a couple of exemplary invocations with --use-dsse, or with dsse metadata as inputs.

More thorough tests of combinations of parameters should be performed for the in_toto_{run, record, verify} library functions. There we should also assert that the input or output metadata files are actually dsse.

Does this make sense?

in_toto_verify(signed_layout, self.alice_pub_dict,
substitution_parameters={"EDITOR":"vim"})

self.assertEqual(self.layout.steps[0].expected_command[0], "vim")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's thorough testing. I wouldn't expect this to be affected by the wrapper. But you never know. :)

"""Test CLI command record start/stop with various arguments. """

# Start/stop recording using rsa key
args = ["--step-name", "test1", "--key", self.rsa_key_path, "--use-dsse"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You test with "--use-dsse" here but not below. Doesn't that make below tests duplicates of TestInTotoRecordTool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot to add it, will fix them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before you fix them, please see my comment above: #503 (review)

I'm not sure if we even need that many dsse-flavored cli tests.

keys = [SSlibKey.from_securesystemslib_key(verification_key)]

keys = create_key_list_from_key_dict(
{authorized_keyid: verification_key})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remembered that we have to be careful when using from_dict methods (which you call in create_key_list_from_key_dict), as they usually have the side-effect of popping elements from the passed dict.

The reason why we can still use verification_key["keyid"] below is that SslibKey.from_dict only pops some items put not the keyid and GPGKey.from_dict does not pop at all yet. The latter may change with in-toto/securesystemslib#7.

We should hopefully catch the change with our tests, but it might be worth already updating the code to not use key dicts after they have been passed to create_key_list_from_key_dict (and add a note about the side-effect to the function docstring).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found GPGKey.from_dict and SSlibKey.from_securesystemslib_key both doesn't have any side effects currently.

Comment on lines 1349 to 1351
use_metablock:
Use metablock to generate metadata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
use_metablock:
Use metablock to generate metadata.

in_toto_sign now utilizes methods that are common for Metablock
and DSSE signature wrappers.

in_toto_sign was the last command for which adding DSSE support
was left. Now it completes adding DSSE support fo complete
in_toto library.

Further Testcase and Documentation updates are required.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Add more informational logging and fixed documentations at several
places.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
AnyMetadata methods utilizes conditions to provide common
interfaces for some methods, which is not a good pattern to
follow. Hence, It is decided to Refactor them and add them to
Metablock and Envelope.

Refactoring Envelope and Metablock two methods:
* `deserialize_payload`: This method will return signed object
  after matching its type.
* `match_paylaod`: Matches type of the signed object.

`verify_signatures` method is not added to Metablock because
of current incompatibility of Metablock with Signer, Key and
Signature objects. It requires a major refactor in order to add
these methods to the Metablock.

AnyMetadata is reverted back to MetadataLoader.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
To avoid breaking change, `sign` method is renamed to `create_sig`
and `sign_key` is renamed back to `sign`.
All occurances of those methods are changes.

verify_sigs method is introduced identical to DSSE envelope
verify_sigs which can verify multiple signatures at once.
A deepcopy of those signatures are created before verifying because
Signature and GPGSignature destorys dict passed in from_dict method.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
in-toto's Envelope is a subclass of the DSSE Envelope present at
securesystemslib with the in-toto payload deserializer.
For creation and verification of singature, similar interface for
Metablock is added.

The references to previous Envelope are changed, and other methods
like payload extraction, signing and verification etc.
Signed.create_envelope still create SSlibEnvelope, if it creates
a in-toto copy of Envtope then it will cause a circular import.

Some tests are fixed, public_key dict conversion to Key are added
wherever required for signature verification without any breaking
change.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
AnyMetadataDeserializer and PayloadDeserializer moved to different
module, `models.serialization`. To avoid circular imports these
deserializers are imported inside default_deserializer of Metadata.
pylint werning are disabled for these imports.

Some documentation correction has been done, and sphinx docs are
enabled for new Metadata classes and deserializers.

in-toto's DSSE envelope `payload_type` is now stored under a
constant variable.

create_envelope() method of Signed now utilizes in-toto version
of DSSE envelope rather than securesystemslib version.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Minor Documentation changes, removed references to Metablock in
verifylib.py. Added subsection in shpnix docs.
Removed unrealated chanages in metadata.py.
default_serializer and default_deserializer class functions are
renamed according to the latest upstream changes made in
securesystemslib.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
A small fix in in_toto.runlib.in_toto_record_stop, where code
breaks for DSSE due to different signature type. In-toto current
metadata wrapper have signatures in the form of dict, where DSSE
Envelope signatures are in the form of objects. Hence, it breaks
for DSSE which is fixed by adding a if-else condition.

A new function introduced in verifylib that creates a list of
SSlibkey from the dict format. This function helps in reducing
code duplication.

in_toto.verifylib.get_summary_link now returns the link object
which was returned in the form of metadata previously. This change
does not cause major breaking change, all the corresponding
related functions are also updated.

Code optimization in in_toto.verifylib.in_toto_verify. This change
is related to the change in summary link. Payload extraction take
place twice during in_toto_verify, which has major impact on DSSE
metadata because deserialization have several steps.

1. In in_toto.verifylib.verify_sublayouts Link(s) are extracted for
the purpose of converting Layout(s) into Link(s).
2. in_toto_verifylib.get_links extracts Link(s) and creates a
new dict for the other steps.

Merging both functions makes the task simpler, and have potential
improvement. verify_sublayouts make changes to chain_link_dict
and returns it which only contains Link objects for the next steps
of in_toto_verify.

This change wouldn't have any side effect as get_summary_link now
returns the Link object instead of metadata object.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Canonical JSON is not a type/subset of JSON and hence encoding
and decoding of canonical json works differently. Which creates
an problem in decoding canonical json, as we can't use
JSONDeserializer to deserializer it.

Using JSON for DSSE payload is favourable in this situation
because we don't have to create new deserializer and supports
both serialization and deserialization from and to the Signable
object.

create_envelope is refactored to from_signable method of in_toto
Envelope.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Test Cases are added for in-toto-run/record/mock/sign commands for
DSSE metadata. Mostly tests are duplicate tests with use_dsse
parameter.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Adds in-toto envelope payload parsing and envelope creation
test cases.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
chain_link_dict data structure represents a dictionary containing
link object while it is used to store link metadata for the
functionary steps in in_toto_verify. It creates confusion around
the data structure and the data contained in the variable.

chain_link_dict is renamed to steps_metadata for the part till it
contains only link metadata per functionary steps.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Testcase for in_toto_verify with DSSE metadata files are added.
Found expires field in dsse layout and removed it to fix some
failing tests.
GPG testcase metadata files are generated using GPG Keys provided
in the keyring.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Remove Serialization Module and implements `load` and `dump`
methods over DSSE and Metablock wrappers, for the simplicity of
the metadata wrappers.
Metadata Abstraction have `to_dict`, `from_dict`, `load`, `dump`,
`create_signature`, `verify_signatures` and `get_payload`
functions to be used for simplicity.

Alias created for sign and verify methods in in-toto DSSE
envelope. and raises NotImplementedError for GPG signers.

Now, Envelope and Metablock share common methods for signing,
verifying of signatures and payload parsing, which results in
easier implementation and easy switching between the wrappers.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Refactor in_toto library according updated securesystemslib. All
occurances of Metadata are updated according to newer Metadata
abstraction.

Now in_toto uses `GPGSigner`, `GPGKey` and `GPGSignature` from
`in_toto.models._signer` which is removed from securesystemslib.
securesystemslib also removed serialization module.

Here is a summary of changes in Metadata methods:
AnyMetadata   ->  Metadata
.from_file()  ->  .load()
.to_file()    ->  .dump()
.verify_sigs  ->  .verify_signatures()
.create_sig   ->  .create_signature()

Exceptions are updated as well:
SignatureVerificationError -> securesystemslib.exceptions \
.VerificationError
StorageError -> System Defined Exceptions

Signed-off-by: Pradyumna Krishna <git@onpy.in>
verify_signatures method for `Metablock` requires unnecessary
additions like `create_key_list_from_key_dict` helper function.
It causes issue for GPG verification, where no exception is
raised for key expiration. ALso, some key may get dropped if
they doesn't matches with the public key schemas provided by
securesystemslib.

As a result a `verify_sigantures(keys: List[Key], threshold: int)`
is replaced with `verify_siganture(verification_key: dict)` where
it verifies with a signature with a single key. Using this no
changes are required for existing Metablock verification method
and works with GPG as well.

All references are refactored and `create_key_list_from_key_dict`
has been removed from the code. `verify_metadata_signatures`
function in verifylib is added back that verifies metadata
signatures, removed previously `verify_layout_signatures`.

Documentation fixes are made and tests has been updated as well.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Removes Serialization module from Sphinx docs and updates
`AnyMetadata` with `Metadata`.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
To lower diff in PR all unrelated changes has been removed, such
as extra space, line, indirect references etc.

SignatureVerificationError is raised instead of VerificationError
in create_signature methods of Envelope and Metablock to make
in_toto backward-compatible.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Type Annotations are wrong between from_dict and load methods of
super and sub classes of Metadata, and in_toto doesn't uses these
annotations. So, the easier fix is to remove them from the
Metadata classes.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
To maintain backward compatibility the following changes has
been made.
* compact_json argument from ``Metadata.dump`` is removed,
  reverting back to previous Metablock compact_json attribute.
* Raising FormatError in ``Metablock.load``.

Small fix in test_envelope, regarding setup test dir. No need to
setup test dir due to only reading operation are used in test.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
securesystemslib is upgraded to version 0.27.0 to provide DSSE
Envelope for in-toto DSSE support (currently experimental).

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Replaces securesystemslib GPGSigner with in-toto GPGSigner in
in_toto_sign.

Replaces Metablock references in documentation with Metadata.

Signed-off-by: Pradyumna Krishna <git@onpy.in>
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your persistence, @PradyumnaKrishna! 💯

@lukpueh lukpueh merged commit d20ace7 into in-toto:develop Mar 15, 2023
@adityasaky
Copy link
Member

Fantastic work, @PradyumnaKrishna!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants