Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode smessage #504

Merged
merged 4 commits into from Jan 14, 2019
Merged

Decode smessage #504

merged 4 commits into from Jan 14, 2019

Conversation

blag
Copy link
Contributor

@blag blag commented Jan 13, 2019

This PR fixes VerifyKey.verify() when smessage is not raw bytes.

I also add two tests, one for a hex-encoded smessage and another with a base64-encoded smessage. In both tests, signature is decoded to raw bytes before being passed to VerifyKey.verify().

I also make it more clear in the documentation that signature MUST always be raw bytes, even if smessage can be encoded.

This PR is a different fix then #256, as pointed out in this comment.

Special thanks to @lmctv for guiding me towards this solution.

@lmctv
Copy link
Contributor

lmctv commented Jan 13, 2019

It seems the doctest failure in signing.rst you can see at https://travis-ci.org/pyca/pynacl/jobs/478988632 is the only remaining problem.

@blag
Copy link
Contributor Author

blag commented Jan 14, 2019

I did not realize that the code in the documentation was actually tested - that's hella cool!

Copy link
Contributor

@lmctv lmctv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think the documentation changes are putting a bit too much emphasis on the detached signatures case, we can get a better balance later. The PR as it stands corrects a real bug. Approved.

@lmctv
Copy link
Contributor

lmctv commented Jan 14, 2019

@blag Thank you for your persistence.

@alex
Copy link
Member

alex commented Jan 14, 2019

This looks backwards incompatible for anyone relying on the current behavior. Was this taken into account before merging this?

@blag
Copy link
Contributor Author

blag commented Jan 15, 2019

This does not change any of the previous behavior (when it worked). This PR does fix some previous broken behavior.

Here's the previous behavior:

encoder smessage signature signature + smessage encoder.decode(smessage)
Raw bytes bytes bytes
Raw bytes None
Hex hex bytes bytes+hex ❌ (raises exception)
Hex hex None
Base64 base64 bytes bytes+base64 ❌ (raises exception)
Base64 base64 None

Here's the new behavior:

encoder smessage signature signature + encoder.decode(smessage) encoder.decode(smessage) encoder.decode(smessage)
Raw bytes bytes bytes
Raw bytes None bytes
Hex hex bytes bytes+bytes
Hex hex None bytes
Base64 base64 bytes bytes+bytes
Base64 base64 None bytes

@alex Do you have any specific use case that is now broken?

@blag blag deleted the decode-smessage branch January 15, 2019 01:14
@alex
Copy link
Member

alex commented Jan 15, 2019

As far as I can tell, if you were previously passing both signature and smessage encoded, they'd be contacted and decoded together. Now only smessage is will be decoded.

@blag
Copy link
Contributor Author

blag commented Jan 15, 2019

@alex See the discussion in #256.

@alex
Copy link
Member

alex commented Jan 15, 2019

Yes, so whether it worked was dependent on what encoding you were using. This doesn't change the fact that there were sets of arguments that used to work, which no longer do.

Frankly, I'd like to just remove encoder/decoder support entirely, as it's full of edge cases like this, and it adds basically no value.

@blag
Copy link
Contributor Author

blag commented Jan 15, 2019

Yes, so whether it worked was dependent on what encoding you were using. This doesn't change the fact that there were sets of arguments that used to work, which no longer do.

That's a fair point. Take it up with @lmctv, it's not my decision.

Frankly, I'd like to just remove encoder/decoder support entirely, as it's full of edge cases like this, and it adds basically no value.

So, if I understand you correctly, you don't want this PR to break anything, but you would be happy to break everybody else's code that uses encoders?

If we support encoders, and we want to support signatures that have different encodings than the message, then we really need to be able to specify two different encoders:

def verify(self, smessage, signature=None, encoder=encoding.RawEncoder, sig_encoder=encoding.RawEncoder):

Then, decoding and combining smessage and signature would reduce to this (for every type and combination of encoder):

if signature is not None:
    smessage = sig_encoder.decode(signature) + encoder.decode(smessage)

If that's amenable to you, I'll whip up a PR for it.

@alex
Copy link
Member

alex commented Jan 15, 2019

My proposal would be to deprecate them, and emit a warning, with a hope of someday in the future removing them, not simply break everything right away.

@reaperhulk
Copy link
Member

@lmctv yeah okay while the previous behavior is weird, this breaks BC. Separately encoding/decoding both for all encoders seems to be the backwards compatible compromise here. I agree with Alex that we should also strongly consider deprecating encoders entirely with an eye to removing them ASAP (which is probably years, but might as well start now)

@lmctv
Copy link
Contributor

lmctv commented Jan 16, 2019

  1. Would you prefer preparing a revert PR yourself, or merging a revert PR?
  2. I was thinking about something like: keep encoding for the "all in one" cases, deprecating the .message and the . signature attributes and replacing them with .raw_message and .raw_signature for advanced use. Could this work?

lmctv added a commit to lmctv/pynacl that referenced this pull request Jan 16, 2019
This reverts commit 3384aa8.

Full discussion at pyca#504
@lmctv
Copy link
Contributor

lmctv commented Jan 16, 2019

@reaperhulk I prepared a revert PR to save some time.

@blag blag mentioned this pull request Jan 16, 2019
@blag
Copy link
Contributor Author

blag commented Jan 17, 2019

@lmctv I have some questions around points of clarification:

keep encoding for the "all in one" cases

I don't know what you are referring to here. What are "all in one" cases?

deprecating the .message and the . signature attributes

That's fine, the entire SignedMessage class is a little weird anyway. It has three ways to use it:

  • bytes(smessage) - get the (encoded) signed message
  • smessage.signature - get the (separately, possibly differently, encoded) signature
  • smessage.message - get the (encoded) message

But users of this class still have to keep track of what encodings were used for each part, and they can differ (the "detached signature" case). It would make more sense - to me, if you are going to have a SignedMessage class at all - to only store the un-encoded byte array and then simply encode parts out on demand, something like this:

import nacl.bindings
import nacl.encoding

class SignedMessage(bytes):
    # Check Python 2 & 3 compatibility with this function signature
    def __init__(self, bytestream, encoder=None):
        # Guard against double encoding, since some users might pass
        # in hex-encoded or base64-encoded bytestreams, that we would
        # then double-encode when read out via .signature or .message
        if encoder is None:
            raise NotImplementedError("You must specify the encoding of "
                                      "SignedMessage objects with 'encoder'.")
        super(SignedMessge, self).__init__(encoder.decode(bytestream), **kwargs)

    @property
    def signature(self, encoder=nacl.encoding.RawEncoder):
        return encoder.encode(self[:nacl.bindings.crypto_sign_BYTES])

    @property
    def message(self, encoder=nacl.encoding.RawEncoder):
        return encoder.encode(self[nacl.bindings.crypto_sign_BYTES:])

That way there's no need for raw_ attributes, there's no need for recipients to keep track of how the signed message was encoded, and recipients can decode each part as they need to.

That is a rewrite of the API, however, so it would necessitate probably a major version bump.

replacing them with .raw_message and .raw_signature for advanced use

The prefix raw_ to me implies that these attributes would always be byte arrays. Is that what you mean? That still keeps a variant of the current, weird SignedMessage API.

Edit: clarified a few things.

@blag
Copy link
Contributor Author

blag commented Jan 18, 2019

@lmctv Can I get your feedback on ^ please.

lmctv added a commit to lmctv/pynacl that referenced this pull request Apr 13, 2019
This reverts commit 3384aa8.

Full discussion at pyca#504
lmctv added a commit to lmctv/pynacl that referenced this pull request Jun 23, 2019
This reverts commit 3384aa8.

Full discussion at pyca#504
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants