Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API request: decouple hashing process from signature generation / verification #1648

Closed
catlee opened this issue Feb 5, 2015 · 40 comments
Closed

Comments

@catlee
Copy link

catlee commented Feb 5, 2015

Currently to sign or verify a signature you need the entire message data and key in the same place so you can create the signer or verifier object. e.g. from the docs:

>>> signer = private_key.signer(
...     padding.PSS(
...         mgf=padding.MGF1(hashes.SHA256()),
...         salt_length=padding.PSS.MAX_LENGTH
...     ),
...     hashes.SHA256()
... )
>>> message = b"A message I want to sign"
>>> signer.update(message)
>>> signature = signer.finalize()

I'd like to be able to decouple this process so that I generate the hash of the message in one place, and then sign it in another place. So something like this:

>>> message = b"A message I want to sign"
>>> hasher = hashes.SHA256()
>>> hasher.update(message)
>>> digest = hasher.digest()

# Elsewhere....

>>> signer = private_key.signer(
...     padding.PSS(
...         mgf=padding.MGF1(hashes.SHA256()),
...         salt_length=padding.PSS.MAX_LENGTH
...     ),
...     hashes.SHA256()
... )
>>> signature = signer.sign_digest(digest)
@reaperhulk
Copy link
Member

We talked about this a bit in #1579 but at the time we didn't have any explicit user requests like yours. Your proposal of adding a sign_digest solves the issue with needing to know the hash to embed the proper OID in the signature.

@reaperhulk
Copy link
Member

Pinging @public and @alex for opinions. Since I'm still mostly without a decent net connection I haven't thought about this deeply yet, but more eyes are good :)

@alex
Copy link
Member

alex commented Feb 6, 2015

I'm still not sure I understand why this is desired; you say you want to generate the hash in one place and the signature in another, but not why.

@catlee
Copy link
Author

catlee commented Feb 6, 2015

In our build infrastructure we generate a lot of data that needs to be
signed. Right now we have the keys on dedicated servers, and the build
machines submit files to be signed over the network. This introduces quite
a bit of network load, since you're transferring all the unsigned data to
the signing servers, and then downloading the signed bits back.

If instead we could generate what is essentially the hash of the data to be
signed and send that to the server, it can respond with just the signature,
which we can include in the final package. This would be orders of
magnitude less data to transfer. I'm looking at RSA-PKCS1-SHA1 signatures
right now, but I'm hoping to apply this technique to other types of signing
that we do.

On Fri, Feb 6, 2015 at 4:19 PM, Alex Gaynor notifications@github.com
wrote:

I'm still not sure I understand why this is desired; you say you want
to generate the hash in one place and the signature in another, but not why.


Reply to this email directly or view it on GitHub
#1648 (comment).

@dstufft
Copy link
Member

dstufft commented Feb 6, 2015

You could just sign the hash of the files instead of the file itself.

On Feb 6, 2015, at 4:25 PM, Chris AtLee notifications@github.com wrote:

In our build infrastructure we generate a lot of data that needs to be
signed. Right now we have the keys on dedicated servers, and the build
machines submit files to be signed over the network. This introduces quite
a bit of network load, since you're transferring all the unsigned data to
the signing servers, and then downloading the signed bits back.

If instead we could generate what is essentially the hash of the data to be
signed and send that to the server, it can respond with just the signature,
which we can include in the final package. This would be orders of
magnitude less data to transfer. I'm looking at RSA-PKCS1-SHA1 signatures
right now, but I'm hoping to apply this technique to other types of signing
that we do.

On Fri, Feb 6, 2015 at 4:19 PM, Alex Gaynor notifications@github.com
wrote:

I'm still not sure I understand why this is desired; you say you want
to generate the hash in one place and the signature in another, but not why.


Reply to this email directly or view it on GitHub
#1648 (comment).


Reply to this email directly or view it on GitHub.

@catlee
Copy link
Author

catlee commented Feb 6, 2015

The signature needs to be embedded into the file itself.

On Fri, Feb 6, 2015 at 4:28 PM, Donald Stufft notifications@github.com
wrote:

You could just sign the hash of the files instead of the file itself.

On Feb 6, 2015, at 4:25 PM, Chris AtLee notifications@github.com
wrote:

In our build infrastructure we generate a lot of data that needs to be
signed. Right now we have the keys on dedicated servers, and the build
machines submit files to be signed over the network. This introduces
quite
a bit of network load, since you're transferring all the unsigned data to
the signing servers, and then downloading the signed bits back.

If instead we could generate what is essentially the hash of the data to
be
signed and send that to the server, it can respond with just the
signature,
which we can include in the final package. This would be orders of
magnitude less data to transfer. I'm looking at RSA-PKCS1-SHA1 signatures
right now, but I'm hoping to apply this technique to other types of
signing
that we do.

On Fri, Feb 6, 2015 at 4:19 PM, Alex Gaynor notifications@github.com
wrote:

I'm still not sure I understand why this is desired; you say you want
to generate the hash in one place and the signature in another, but
not why.


Reply to this email directly or view it on GitHub
<
https://github.com/pyca/cryptography/issues/1648#issuecomment-73314256>.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub
#1648 (comment).

@alex
Copy link
Member

alex commented Feb 6, 2015

So, I think this use case makes sense, but I'm not sure about the
key.signer(...).sign_digest() API, it seems like key.sign_digest()
makes more sense?

On Fri, Feb 6, 2015 at 1:29 PM, Chris AtLee notifications@github.com
wrote:

The signature needs to be embedded into the file itself.

On Fri, Feb 6, 2015 at 4:28 PM, Donald Stufft notifications@github.com
wrote:

You could just sign the hash of the files instead of the file itself.

On Feb 6, 2015, at 4:25 PM, Chris AtLee notifications@github.com
wrote:

In our build infrastructure we generate a lot of data that needs to be
signed. Right now we have the keys on dedicated servers, and the build
machines submit files to be signed over the network. This introduces
quite
a bit of network load, since you're transferring all the unsigned data
to
the signing servers, and then downloading the signed bits back.

If instead we could generate what is essentially the hash of the data
to
be
signed and send that to the server, it can respond with just the
signature,
which we can include in the final package. This would be orders of
magnitude less data to transfer. I'm looking at RSA-PKCS1-SHA1
signatures
right now, but I'm hoping to apply this technique to other types of
signing
that we do.

On Fri, Feb 6, 2015 at 4:19 PM, Alex Gaynor notifications@github.com
wrote:

I'm still not sure I understand why this is desired; you say you
want
to generate the hash in one place and the signature in another, but
not why.


Reply to this email directly or view it on GitHub
<
https://github.com/pyca/cryptography/issues/1648#issuecomment-73314256>.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub
<#1648 (comment)
.


Reply to this email directly or view it on GitHub
#1648 (comment).

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

@reaperhulk
Copy link
Member

I agree with a one-shot sign_digest making more sense on the key itself rather than on the AsymmetricSignerContext. We would need to add this method to the interface for RSAPrivateKey, DSAPrivateKey and EllipticCurvePrivateKey, but that expansion + implementation can obviously be done in multiple PRs.

@reaperhulk
Copy link
Member

Some notes for future implementation for RSA:

In OpenSSL 1.0.1 we can use EVP_PKEY_sign to accomplish this easily. 0.9.8 is trickier.

We'll need to add RSA_padding_add_PKCS1_type_1 (http://wiki.openssl.org/index.php/Manual:RSA_padding_add_PKCS1_type_1(3)) to the bindings. Unfortunately that doesn't add the ASN.1 data that defines the message digest (unlike RSA_padding_add_PKCS1_PSS), so we'll need to find a way to add those bytes before calling the padding function. (See: http://tools.ietf.org/html/rfc3447#appendix-A.2.4 for information about the DigestInfo sequence).

Once that's done we can call RSA_private_encrypt (yes that signs) to sign it. This is the same function we already use for PSS on 0.9.8.

@reaperhulk
Copy link
Member

This is a much simpler problem on EC/ECDSA where the signature algorithm isn't embedded as metadata in the signature.

@reaperhulk reaperhulk added this to the Ninth Release milestone Feb 20, 2015
@stavxyz
Copy link

stavxyz commented May 14, 2015

+1

Implementing Chef's authentication protocol would require something like this, although in this case it is not a pre-computed digest, it's the raw data, as is. More specifically, to generate the API request headers. Here's my attempt in a gist.

https://docs.chef.io/auth.html
https://docs.chef.io/auth.html#header-format

@reaperhulk reaperhulk removed this from the Tenth Release milestone Jun 17, 2015
@reaperhulk
Copy link
Member

Thinking about this more I'm wondering if we want to deprecate the entirety of the asymmetric signature and verification contexts? What they do is entirely covered by replacing them with a signing method that looks like key.sign(data, padding_type, optional_hash_algorithm_name) and providing various examples of use. e.g.

# traditional hash-then-sign
ctx = hashes.Hash(hashes.SHA256(), backend)
ctx.update(data)
digest = ctx.finalize()
signature = key.sign(digest, padding.PKCS1v15(), hashes.SHA256())

This example is less elegant than the current asymmetric contexts in that you have to pass SHA256 twice (due to the DigestInfo syntax in the signature), but it also allows this:

# pass existing digest
signature = key.sign(digest, padding.PKCS1v15(), hashes.SHA256())

or even potentially signing ugly things like chef's auth protocol (this presupposes that we don't generate DigestInfo if no hash algorithm is passed):

signature = key.sign(data, padding.PKCSv15(), None)

@lvh
Copy link
Member

lvh commented Aug 19, 2015

@reaperhulk Interesting. I'm all for killing anything named context, but I do find the traditional hash-then-sign example confusing; it looks like you're hashing the hash and then signing. Can key.sign take a hash context, which knows both about the hash in question and the intermediate state?

@asyd
Copy link

asyd commented Aug 27, 2015

+1 ! I need no padding to implement the particle protocol (http://particle.io) in python!

@dstufft
Copy link
Member

dstufft commented Aug 27, 2015

@reaperhulk Doesn't the single shot API rule out signing something that doesn't fit into memory?

@reaperhulk
Copy link
Member

@dstufft not necessarily. If the API just signs whatever bytes you provide then you hash your data with a hashcontext incrementally and sign the digest bytes.

@dstufft
Copy link
Member

dstufft commented Aug 27, 2015

@reaperhulk Right, but that assumes you have control over which bytes you need to sign.

@reaperhulk
Copy link
Member

@dstufft When you say control do you mean "you can manipulate the structure of the bytes before having the RSA/DSA signer sign them" or something else?

@dstufft
Copy link
Member

dstufft commented Aug 27, 2015

Maybe I don't understand what you're actually proposing there since I'm not actually familiar with these APIs, I'm just concerned when we introduce one shot APIs instead of incremental APIs. What if someone wants to directly sign a 1GB chunk of data on a system that has less than a GB of RAM?

@dstufft
Copy link
Member

dstufft commented Aug 27, 2015

By wants I of course mean, they are interacting with something that mandates that.

@reaperhulk
Copy link
Member

With RSA/DSA signing the total payload you sign must be <= the size of the private key (how much less depends on a variety of factors, but usually it's quite a few bytes less). So, for example, anything you sign with RSA will be less than 256 bytes if you use a 2048-bit key. This forces you to sign either very small payloads or hashes, so any one shot API we define here would only move the incremental processing to a different part of the code. For example, right now if you want to sign a 2GB file it'd look something like:

signer = private_key.signer(
    padding.PKCS1v15(),
    hashes.SHA256()
)
for data in some_big_file:
    signer.update(data)

signature = signer.sign()

With a "one shot" API it'd look more like

ctx = hashes.Hash(hashes.SHA256(), backend)
for data in some_big_file:
    ctx.update(data)
digest = ctx.finalize()
signature = key.sign(digest, padding.PKCS1v15(), hashes.SHA256())

The confusion might be arising from the fact that the final arg of the block above appears to be specifying how the data will be hashed. In reality it's just a lousy API that tells the sign function how to construct an ASN.1 header describing what's in the digest.

We've tried very hard to escape raw signing functions, but I'm no longer sure that's the right approach. For composability (but perhaps not security) it would make more sense to just have key.sign do a raw sign and force you to pad everything beforehand. Then things like PKCS1 vs PSS and whether or not you have a DigestInfo sequence or not become things you do to the bytes before you pass them to the signer.

@stavxyz
Copy link

stavxyz commented Aug 28, 2015

To get this to work in the meantime, I created a no-op HashContext which gets used when None is passed as the hashing algorithm to the signer() method, like pkey.signer(padding.PKCS1v15(), None)

stavxyz@5414a5c

@danpascu
Copy link

I have a similar problem. I'm trying to implement the OTR protocol and in my case what I need to sign is the HMAC of a message, produced with a given key and the SHA256 algorithm. In my case the problem would be solved if the signer could be build with a HashContext rather than an algorithm, seeing that hmac.HMAC already implements the HashContext interface. For me this would do:

have private_key.signer(hash_context) instead of private_key.signer(hash_algorithm) like in:

signer = private_key.signer(hmac.HMAC(mac_key, hashes.SHA256(), backend))
signer.update(message)
signature = signer.finalize()

This would also preserve the existing functionality as one could pass hashes.Hash(algorithm) as the hash context to get the current behavior.

Same goes for the verifier.

Edit: in my case I need to use a DSA key to sign/verify the message, not sure how this applies to other asymmetric key algorithms.

joernheissler pushed a commit to joernheissler/cryptography that referenced this issue Jan 6, 2016
@Badg
Copy link
Contributor

Badg commented Feb 20, 2016

Realizing I'm late to the discussion, but I thought I'd add a belated +1 (and this seemed more appropriate a place than #1529). I think the argument for a streaming digest is a good one, but in my case it's a much bigger problem that I don't have access to the digest itself.

My use case is (for the purposes of this discussion) conceptually similar to IPFS in that I'm using the hash digest as a content address. The digest is then embedded into the file, and then that is signed -- so we need access to the digest. Actually, come to think of it, any application where the digest is embedded into a file/packet/message needs some kind of workaround for the current API.

There are some other quirks imposed by the parsing library we're using, that would make it easier (and a bit safer -- verifying the content address sooner rather than later seems pertinent) if we could separate signing from hashing, but it's not worth detailing here because it's essentially an equivalent argument to the performance considerations that @reaperhulk discussed.

@reaperhulk
Copy link
Member

One shot APIs for sign/verify are now available so we could potentially now implement this by allowing padding to be None.

@Badg
Copy link
Contributor

Badg commented Aug 27, 2016

I'm not clear on the innards of the backend calls, so it's entirely possible I'm completely off-base here (or maybe I'm talking about separating a different hash? either way, please let me know if I'm wandering!), but I'm not sure that's the case. At least with the openSSL backend, the RSA signature context is making an internal call to self._hash_ctx.finalize() about halfway through the context.finalize() operation and using that as the data to sign. Meanwhile the one-shot signing is just wrapping the existing hazmat api calls into a single function:

    def sign(self, data, padding, algorithm):
        signer = self.signer(padding, algorithm)
        signer.update(data)
        signature = signer.finalize()
        return signature

So it seems to me that, to circumvent the internal hashing, the context.finalize() method would need to be split into two separate operations: generating the data, and then signing it.

As an alternative (that I think is very inelegant, but gets the job done), there could be a dedicated noop hash function. I have a workaround for a project I'm working on that does this, by replacing the context._hash_ctx attribute after the context is created, for example:

class _NoopSHA512(hashes.SHA512):
    def __init__(self, noopdata, *args, **kwargs):
        self.__data = noopdata
        super().__init__(*args, **kwargs)
        self.algorithm = self

    def copy(self):
        ''' Total NOOP, because self cannot change.
        '''
        return self

    def update(self, data):
        # Noop noop noop
        pass

    def finalize(self):
        # Yay we get to do something!
        return self.__data
class NeedsASignature:
    def _sign(self, data):
        ''' Signing method.
        '''
        signer = self._signature_key.signer(
            padding.PSS(
                mgf = padding.MGF1(hashes.SHA512()),
                salt_length = _PSS_SALT_LENGTH
            ),
            hashes.SHA512()
        )
        signer._hash_ctx = _NoopSHA512(data)
        return signer.finalize()

I've verified this as working against an existing implementation using a different crypto library (a pycrypto fork), but unless I modified the internal _hash_ctx it was inevitably re-hashing the existing data.

Another (slightly less inelegant but still quick and dirty) approach would be to add an optional kwarg to context.finalize() to bypass the hash finalization, for example:

    def finalize(self, data_to_sign=None):
        if data_to_sign is None:
            data_to_sign = self._hash_ctx.finalize()

If desired, this could also be baked into the one-shot API, so that it calls sign directly on the data.

Is there an easier way that I'm missing? I'd be happy to help contribute a patch for this, particularly since I'm not very keen on my current workaround, but I would definitely defer to pyca on what approach seems best.

@reaperhulk
Copy link
Member

@Badg You aren't wrong about the internals right now, but there's nothing that prevents us from altering those internals to support new features. We could have an entirely separate code path for the one shot APIs if None is passed for padding.

My biggest concern with supporting this type of thing is that it exposes textbook RSA. The type of people we want to expose it for understand that risk, but things like this have a tendency to get cargo-culted around so whatever we ultimately design I want it to be as clear as possible that this is dangerous. Something like:

private_key.sign(data=data, padding=None, algorithm=None, danger_zone=True)

@alex
Copy link
Member

alex commented Aug 29, 2016

Wait, how does it expose textbook RSA? There are two seperate things here,
direct RSA signatures, and pre-hashed data.

On Sun, Aug 28, 2016 at 9:20 PM, Paul Kehrer notifications@github.com
wrote:

@Badg https://github.com/Badg You aren't wrong about the internals right
now
, but there's nothing that prevents us from altering those internals
to support new features. We could have an entirely separate code path for
the one shot APIs if None is passed for padding.

My biggest concern with supporting this type of thing is that it exposes
textbook RSA. The type of people we want to expose it for understand that
risk, but things like this have a tendency to get cargo-culted around so
whatever we ultimately design I want it to be as clear as possible that
this is dangerous. Something like:

private_key.sign(data=data, padding=None, algorithm=None, danger_zone=True)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1648 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAADBDZd-jrFNs4AIijTjT9PtMG9yDw2ks5qkjPogaJpZM4DcLWf
.

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: D1B3 ADC0 E023 8CA6

@reaperhulk
Copy link
Member

@alex at least one of the use cases we've talked about in the past involved directly signing unpadded, unhashed data (I believe it was chef?)

@alex
Copy link
Member

alex commented Aug 29, 2016

Ok, so that's one thing. Let's solve one use case at a time :-) I think
hashing can be solved with algorithm=PreHashedData(SHA256())

On Sun, Aug 28, 2016 at 9:22 PM, Paul Kehrer notifications@github.com
wrote:

@alex https://github.com/alex at least one of the use cases we've
talked about in the past involved directly signing unpadded, unhashed data
(I believe it was chef?)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1648 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAADBAZ7scZ-3aODKXI7_uBgoPfjieYFks5qkjRfgaJpZM4DcLWf
.

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: D1B3 ADC0 E023 8CA6

@Badg
Copy link
Contributor

Badg commented Aug 29, 2016

Ah, yeah, like @alex I was also getting confused re: use cases. I've just been considering pre-hashed data that still uses padding; I don't think I read up the chain far enough to realize padding removal was in the mix too.

Where the external API for pre-hashed data is concerned, is there a preference between:

  1. Passing an explicit None to private_key.sign(), something like this:
    private_key.sign(data=data, padding=padding, algorithm=None)
  2. Passing an explicit noop hash to private_key.sign(), something like this:
    private_key.sign(data=data, padding=padding, algorithm=PreHashed(SHA256()))

@reaperhulk
Copy link
Member

For the case where we want to precompute the hash we still need to know the digest algorithm (as that information is embedded in the ASN.1 structure of the padding as we construct it) so a PreHashed(SHA256()) arg is better than None.

@reaperhulk
Copy link
Member

Here's a potential approach for supporting Prehashed. I'm unsure where to put the Prehashed class right now and we probably will need to put something to error if you try to use prehashing with the signaturecontext.

@reaperhulk reaperhulk added this to the Sixteenth Release milestone Aug 29, 2016
@Badg
Copy link
Contributor

Badg commented Aug 31, 2016

Prehashed already raises a TypeError if you try to use it in the signature context, because it isn't a hashes.Hash() instance. Is that sufficient? I verified this with a dedicated test here.

It took me the whole morning to get a dev environment set up for cryptography (Windows development... ugh), so I can't really put much more time into it until the weekend, but @reaperhulk I can mirror your refactoring on the verification side, and add a test for that as well (half using context, half using single-shot, just like yours).

@reaperhulk
Copy link
Member

@Badg I think I want an explicit "Prehashed is only allowed with one shot APIs" or something just to avoid future issues being filed.

If you want to grab my changes and then implement it on the verify side, write documentation, and find a home for Prehashed I'm happy to review! If you don't have time I'll probably take a look at this again in a week or two so we can get it completed before 1.6 release.

@Badg
Copy link
Contributor

Badg commented Oct 15, 2016

I've been totally swamped by work and haven't had a chance to look at this any further. As much as I really want to help out I'm not entirely sure what my schedule is going to be like when I'm under such extreme financial pressure to push a product launch, so if you need to go for it, don't block on me!

@alex
Copy link
Member

alex commented Nov 20, 2016

done!

@bryan-diehl
Copy link

Another use case requiring Hash to be separate from Digital Signature:
I am using Openssl (command line for initial testing and then api for final code) to generate the SHA256 hash, and then generating the ECDSA Signature separately because in the final systme the ECDSA Signature will be generated in hardware TPM. But I need to be able to test the system without TPM using openssl (again command line initially and then api c code later).
So my request is to support ECDSA signature separate from Hash for both command line and for api.
Thank you!

@reaperhulk
Copy link
Member

This issue was closed 2 years ago because it was completed. You can do this via prehashed. An elliptic curve example is here: https://cryptography.io/en/latest/hazmat/primitives/asymmetric/ec/

@bryan-diehl
Copy link

Thank you!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 1, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

10 participants