-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API request: decouple hashing process from signature generation / verification #1648
Comments
We talked about this a bit in #1579 but at the time we didn't have any explicit user requests like yours. Your proposal of adding a |
I'm still not sure I understand why this is desired; you say you want to generate the hash in one place and the signature in another, but not why. |
In our build infrastructure we generate a lot of data that needs to be If instead we could generate what is essentially the hash of the data to be On Fri, Feb 6, 2015 at 4:19 PM, Alex Gaynor notifications@github.com
|
You could just sign the hash of the files instead of the file itself.
|
The signature needs to be embedded into the file itself. On Fri, Feb 6, 2015 at 4:28 PM, Donald Stufft notifications@github.com
|
So, I think this use case makes sense, but I'm not sure about the On Fri, Feb 6, 2015 at 1:29 PM, Chris AtLee notifications@github.com
"I disapprove of what you say, but I will defend to the death your right to |
I agree with a one-shot |
Some notes for future implementation for RSA: In OpenSSL 1.0.1 we can use We'll need to add Once that's done we can call |
This is a much simpler problem on EC/ECDSA where the signature algorithm isn't embedded as metadata in the signature. |
+1 Implementing Chef's authentication protocol would require something like this, although in this case it is not a pre-computed digest, it's the raw data, as is. More specifically, to generate the API request headers. Here's my attempt in a gist. https://docs.chef.io/auth.html |
Thinking about this more I'm wondering if we want to deprecate the entirety of the asymmetric signature and verification contexts? What they do is entirely covered by replacing them with a signing method that looks like # traditional hash-then-sign
ctx = hashes.Hash(hashes.SHA256(), backend)
ctx.update(data)
digest = ctx.finalize()
signature = key.sign(digest, padding.PKCS1v15(), hashes.SHA256()) This example is less elegant than the current asymmetric contexts in that you have to pass SHA256 twice (due to the DigestInfo syntax in the signature), but it also allows this: # pass existing digest
signature = key.sign(digest, padding.PKCS1v15(), hashes.SHA256()) or even potentially signing ugly things like chef's auth protocol (this presupposes that we don't generate DigestInfo if no hash algorithm is passed): signature = key.sign(data, padding.PKCSv15(), None) |
@reaperhulk Interesting. I'm all for killing anything named context, but I do find the traditional hash-then-sign example confusing; it looks like you're hashing the hash and then signing. Can key.sign take a hash context, which knows both about the hash in question and the intermediate state? |
+1 ! I need no padding to implement the particle protocol (http://particle.io) in python! |
@reaperhulk Doesn't the single shot API rule out signing something that doesn't fit into memory? |
@dstufft not necessarily. If the API just signs whatever bytes you provide then you hash your data with a hashcontext incrementally and sign the digest bytes. |
@reaperhulk Right, but that assumes you have control over which bytes you need to sign. |
@dstufft When you say control do you mean "you can manipulate the structure of the bytes before having the RSA/DSA signer sign them" or something else? |
Maybe I don't understand what you're actually proposing there since I'm not actually familiar with these APIs, I'm just concerned when we introduce one shot APIs instead of incremental APIs. What if someone wants to directly sign a 1GB chunk of data on a system that has less than a GB of RAM? |
By wants I of course mean, they are interacting with something that mandates that. |
With RSA/DSA signing the total payload you sign must be <= the size of the private key (how much less depends on a variety of factors, but usually it's quite a few bytes less). So, for example, anything you sign with RSA will be less than 256 bytes if you use a 2048-bit key. This forces you to sign either very small payloads or hashes, so any one shot API we define here would only move the incremental processing to a different part of the code. For example, right now if you want to sign a 2GB file it'd look something like: signer = private_key.signer(
padding.PKCS1v15(),
hashes.SHA256()
)
for data in some_big_file:
signer.update(data)
signature = signer.sign() With a "one shot" API it'd look more like ctx = hashes.Hash(hashes.SHA256(), backend)
for data in some_big_file:
ctx.update(data)
digest = ctx.finalize()
signature = key.sign(digest, padding.PKCS1v15(), hashes.SHA256()) The confusion might be arising from the fact that the final arg of the block above appears to be specifying how the data will be hashed. In reality it's just a lousy API that tells the sign function how to construct an ASN.1 header describing what's in the digest. We've tried very hard to escape raw signing functions, but I'm no longer sure that's the right approach. For composability (but perhaps not security) it would make more sense to just have |
To get this to work in the meantime, I created a no-op HashContext which gets used when |
I have a similar problem. I'm trying to implement the OTR protocol and in my case what I need to sign is the HMAC of a message, produced with a given key and the SHA256 algorithm. In my case the problem would be solved if the signer could be build with a HashContext rather than an algorithm, seeing that hmac.HMAC already implements the HashContext interface. For me this would do: have private_key.signer(hash_context) instead of private_key.signer(hash_algorithm) like in: signer = private_key.signer(hmac.HMAC(mac_key, hashes.SHA256(), backend)) This would also preserve the existing functionality as one could pass hashes.Hash(algorithm) as the hash context to get the current behavior. Same goes for the verifier. Edit: in my case I need to use a DSA key to sign/verify the message, not sure how this applies to other asymmetric key algorithms. |
Realizing I'm late to the discussion, but I thought I'd add a belated +1 (and this seemed more appropriate a place than #1529). I think the argument for a streaming digest is a good one, but in my case it's a much bigger problem that I don't have access to the digest itself. My use case is (for the purposes of this discussion) conceptually similar to IPFS in that I'm using the hash digest as a content address. The digest is then embedded into the file, and then that is signed -- so we need access to the digest. Actually, come to think of it, any application where the digest is embedded into a file/packet/message needs some kind of workaround for the current API. There are some other quirks imposed by the parsing library we're using, that would make it easier (and a bit safer -- verifying the content address sooner rather than later seems pertinent) if we could separate signing from hashing, but it's not worth detailing here because it's essentially an equivalent argument to the performance considerations that @reaperhulk discussed. |
One shot APIs for sign/verify are now available so we could potentially now implement this by allowing |
I'm not clear on the innards of the backend calls, so it's entirely possible I'm completely off-base here (or maybe I'm talking about separating a different hash? either way, please let me know if I'm wandering!), but I'm not sure that's the case. At least with the openSSL backend, the RSA signature context is making an internal call to def sign(self, data, padding, algorithm):
signer = self.signer(padding, algorithm)
signer.update(data)
signature = signer.finalize()
return signature So it seems to me that, to circumvent the internal hashing, the context.finalize() method would need to be split into two separate operations: generating the data, and then signing it. As an alternative (that I think is very inelegant, but gets the job done), there could be a dedicated noop hash function. I have a workaround for a project I'm working on that does this, by replacing the context._hash_ctx attribute after the context is created, for example: class _NoopSHA512(hashes.SHA512):
def __init__(self, noopdata, *args, **kwargs):
self.__data = noopdata
super().__init__(*args, **kwargs)
self.algorithm = self
def copy(self):
''' Total NOOP, because self cannot change.
'''
return self
def update(self, data):
# Noop noop noop
pass
def finalize(self):
# Yay we get to do something!
return self.__data class NeedsASignature:
def _sign(self, data):
''' Signing method.
'''
signer = self._signature_key.signer(
padding.PSS(
mgf = padding.MGF1(hashes.SHA512()),
salt_length = _PSS_SALT_LENGTH
),
hashes.SHA512()
)
signer._hash_ctx = _NoopSHA512(data)
return signer.finalize() I've verified this as working against an existing implementation using a different crypto library (a pycrypto fork), but unless I modified the internal Another (slightly less inelegant but still quick and dirty) approach would be to add an optional kwarg to context.finalize() to bypass the hash finalization, for example: def finalize(self, data_to_sign=None):
if data_to_sign is None:
data_to_sign = self._hash_ctx.finalize() If desired, this could also be baked into the one-shot API, so that it calls sign directly on the data. Is there an easier way that I'm missing? I'd be happy to help contribute a patch for this, particularly since I'm not very keen on my current workaround, but I would definitely defer to pyca on what approach seems best. |
@Badg You aren't wrong about the internals right now, but there's nothing that prevents us from altering those internals to support new features. We could have an entirely separate code path for the one shot APIs if My biggest concern with supporting this type of thing is that it exposes textbook RSA. The type of people we want to expose it for understand that risk, but things like this have a tendency to get cargo-culted around so whatever we ultimately design I want it to be as clear as possible that this is dangerous. Something like: private_key.sign(data=data, padding=None, algorithm=None, danger_zone=True) |
Wait, how does it expose textbook RSA? There are two seperate things here, On Sun, Aug 28, 2016 at 9:20 PM, Paul Kehrer notifications@github.com
"I disapprove of what you say, but I will defend to the death your right to |
@alex at least one of the use cases we've talked about in the past involved directly signing unpadded, unhashed data (I believe it was chef?) |
Ok, so that's one thing. Let's solve one use case at a time :-) I think On Sun, Aug 28, 2016 at 9:22 PM, Paul Kehrer notifications@github.com
"I disapprove of what you say, but I will defend to the death your right to |
Ah, yeah, like @alex I was also getting confused re: use cases. I've just been considering pre-hashed data that still uses padding; I don't think I read up the chain far enough to realize padding removal was in the mix too. Where the external API for pre-hashed data is concerned, is there a preference between:
|
For the case where we want to precompute the hash we still need to know the digest algorithm (as that information is embedded in the ASN.1 structure of the padding as we construct it) so a |
Here's a potential approach for supporting |
It took me the whole morning to get a dev environment set up for |
@Badg I think I want an explicit "Prehashed is only allowed with one shot APIs" or something just to avoid future issues being filed. If you want to grab my changes and then implement it on the verify side, write documentation, and find a home for |
I've been totally swamped by work and haven't had a chance to look at this any further. As much as I really want to help out I'm not entirely sure what my schedule is going to be like when I'm under such extreme financial pressure to push a product launch, so if you need to go for it, don't block on me! |
done! |
Another use case requiring Hash to be separate from Digital Signature: |
This issue was closed 2 years ago because it was completed. You can do this via prehashed. An elliptic curve example is here: https://cryptography.io/en/latest/hazmat/primitives/asymmetric/ec/ |
Thank you! |
Currently to sign or verify a signature you need the entire message data and key in the same place so you can create the signer or verifier object. e.g. from the docs:
I'd like to be able to decouple this process so that I generate the hash of the message in one place, and then sign it in another place. So something like this:
The text was updated successfully, but these errors were encountered: