Hash functions for public keys and scripts #388

Open — wants to merge 2 commits into base: master
Conversation

@dr-orlovsky (Contributor) commented Jan 10, 2020

Introduces functions to compute bitcoin hashes for PublicKey and Script, which were strangely missing.

dr-orlovsky changed the title from "Hashtypes fns" to "Hash functions for public keys and scripts" on Jan 10, 2020

@codecov-io commented Jan 10, 2020

Codecov Report

Merging #388 into master will decrease coverage by 0.61%.
The diff coverage is 94.28%.


@@            Coverage Diff             @@
##           master     #388      +/-   ##
==========================================
- Coverage   85.55%   84.94%   -0.62%     
==========================================
  Files          40       40              
  Lines        8660     7311    -1349     
==========================================
- Hits         7409     6210    -1199     
+ Misses       1251     1101     -150
Impacted Files Coverage Δ
src/util/key.rs 75.44% <100%> (+2.67%) ⬆️
src/blockdata/script.rs 78.65% <84.61%> (-1.11%) ⬇️
src/network/message_blockdata.rs 89.23% <0%> (-3.7%) ⬇️
src/blockdata/block.rs 78.83% <0%> (-1.59%) ⬇️
src/util/contracthash.rs 78.5% <0%> (-1.29%) ⬇️
src/blockdata/transaction.rs 93.92% <0%> (-0.94%) ⬇️
src/util/hash.rs 93.75% <0%> (-0.85%) ⬇️
src/util/bip32.rs 87.14% <0%> (-0.83%) ⬇️
src/consensus/encode.rs 85.15% <0%> (-0.44%) ⬇️
... and 7 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7587c4b...acbef2d.

@stevenroose (Collaborator) left a comment

I'm not sure I like having these methods spread around. I'd be more interested in some kind of ScriptHash::from_script constructor, so all the methods are confined to the hash types. But I'm also not sure of this opinion yet.

@@ -216,6 +217,16 @@ impl Script {
    /// Creates a new empty script
    pub fn new() -> Script { Script(vec![].into_boxed_slice()) }

    /// Returns 160-bit hash of the script
    pub fn script_hash(&self) -> ScriptHash {
        ScriptHash::hash(&self.serialize())

@stevenroose (Collaborator) commented Jan 13, 2020

This makes an unnecessary allocation. Please serialize directly into the hash engine.
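For illustration, a direct-to-engine version could look roughly like this (a sketch only: it assumes the raw script bytes are reachable as self.0 and that the bitcoin_hashes Hash/HashEngine traits are in scope):

    pub fn script_hash(&self) -> ScriptHash {
        let mut engine = ScriptHash::engine();
        engine.input(&self.0);               // feed the raw script bytes, no intermediate Vec
        ScriptHash::from_engine(engine)
    }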


    /// Returns 256-bit hash of the script for P2WSH outputs
    pub fn wscript_hash(&self) -> WScriptHash {
        WScriptHash::hash(&self.serialize())

@stevenroose (Collaborator) commented Jan 13, 2020

Idem.

    /// Returns bitcoin 160-bit hash of the public key
    pub fn pubkey_hash(&self) -> PubkeyHash {
        if self.compressed {
            PubkeyHash::hash(&self.key.serialize())

@stevenroose (Collaborator) commented Jan 13, 2020

Idem about allocation.

    /// Returns bitcoin 160-bit hash of the public key for witness program
    pub fn wpubkey_hash(&self) -> WPubkeyHash {
        if self.compressed {
            WPubkeyHash::hash(&self.key.serialize())

@stevenroose (Collaborator) commented Jan 13, 2020

Idem about allocation.

@dr-orlovsky (Contributor, Author) commented Jan 14, 2020

@stevenroose I am still not happy with all the hash-related stuff. Hashes are used everywhere, and even after the hash type refactoring there is a lot of duplicated code (#391 (comment)) and unclear hash generation conventions (txid, wtxid, the bitcoin_hash we got rid of, block_hash). I would prefer to have a consistent API for Pubkeys/Scripts: if we already generate a hash from a data structure with a *_hash method, why do we need to change it now? Or, if we do change it, wouldn't the rest of the hash generation have to be refactored as well, so we end up with Txid::from_tx(&tx) instead of tx.txid()?

My arguments on this matter are the following:

  • Hashes are part of the bitcoin protocol rules, so there is always a definite set of hashes that need to be created from bitcoin data structures. Mostly it's a single hash per data structure, with the only known exception of Transaction. From this point of view, having the hash generation methods right on the data structures, like <datatype>.<hash_type>(), is more logical than delegating this to Hash type constructors like Hash::from_<datatype>. The latter also seems more verbose (compare block.block_hash() with BlockHash::from(&block); both styles are sketched right after this list).
  • If we have stuck to certain hash type naming conventions (#284 (comment) and below), we should follow the same for the names of the hash generation methods: txid() -> Txid, block_hash() -> BlockHash, wpubkey_hash() -> WPubkeyHash, i.e. the method name is the snake_case form of the CamelCase hash type name, with the exception of single-word abbreviations like Txid.
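For illustration, here are the two styles side by side (identifiers are illustrative only, not an agreed API):

    // hash generation as methods on the data types, named after the hash types:
    let h1: BlockHash = block.block_hash();
    let h2: WPubkeyHash = public_key.wpubkey_hash();

    // hash generation as constructors/conversions on the hash types:
    let h3 = BlockHash::from(&block);
    let h4 = ScriptHash::from_script(&script);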

@stevenroose (Collaborator) commented Jan 14, 2020

Hmm, yeah, I agree that semantically it makes more sense to have <type>.<hash_type>() methods.

I don't think the engine-based hash generation needs a shortcut. It's actually pretty common to do that in many other languages: you create a hasher (engine), write data in (we use consensus_encode) and then finalize the hasher (Hash::from_engine).

@elichai (Collaborator) commented Jan 14, 2020

I don't think the engine-based hash generation needs a shortcut. It's actually pretty common to do that in many other languages: you create a hasher (engine), write data in (we use consensus_encode) and then finalize the hasher (Hash::from_engine).

I'd actually even say that most languages only have a hasher by default (except, obviously, Python); even the Rust sha2 crate shows engine-style usage by default.

That's not to say that a one-line hash can't be useful; it is very useful. But I don't think that writing 3 lines here and there is that bad.
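For reference, engine-style usage with the sha2 crate looks roughly like this (a sketch; the exact method names vary a bit between sha2 versions):

    use sha2::{Digest, Sha256};

    let mut hasher = Sha256::new();     // create the hasher ("engine")
    hasher.update(b"hello ");           // stream data in
    hasher.update(b"world");
    let digest = hasher.finalize();     // finalize into the 32-byte digest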

@dr-orlovsky (Contributor, Author) commented Jan 14, 2020

I think I can explain the pattern with the Engine. It's a kind of Java style that came from enterprise patterns and is used for work that requires internal state which is costly to initialize. Instead of initializing all that internal state again and again for each operation, a class is preferred that stores the pre-initialized state and performs operation-specific tasks as instance methods. However, this is not the case if you initialize such an engine each time, so in our situation, from a best-practice perspective, we either have to add a class-level method on the engine that does the hashing task, or work on persisting pre-initialized HashEngine instances with some sort of singleton pattern or the like.

@elichai (Collaborator) commented Jan 14, 2020

I think I can explain the pattern with the Engine. It's a kind of Java style that came from enterprise patterns and is used for work that requires internal state which is costly to initialize. Instead of initializing all that internal state again and again for each operation, a class is preferred that stores the pre-initialized state and performs operation-specific tasks as instance methods. However, this is not the case if you initialize such an engine each time, so in our situation, from a best-practice perspective, we either have to add a class-level method on the engine that does the hashing task, or work on persisting pre-initialized HashEngine instances with some sort of singleton pattern or the like.

The reason hashes usually come as engines is different:
it means you don't need to store all the data in a buffer in order to hash it. You can stream it into the engine, and whenever a "block" is full it runs the internal hash function and continues streaming. That way you can, for example, easily hash a 1 GB file without any heap allocation (taking minimal memory).
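For illustration, a sketch of that streaming idea with bitcoin_hashes (the path and buffer size are arbitrary):

    use bitcoin_hashes::{sha256d, Hash, HashEngine};
    use std::{fs::File, io::Read};

    fn hash_file(path: &str) -> std::io::Result<sha256d::Hash> {
        let mut engine = sha256d::Hash::engine();
        let mut file = File::open(path)?;
        let mut buf = [0u8; 8192];                // fixed-size buffer, no heap growth
        loop {
            let n = file.read(&mut buf)?;
            if n == 0 { break; }
            engine.input(&buf[..n]);              // stream each chunk into the engine
        }
        Ok(sha256d::Hash::from_engine(engine))
    }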

@dr-orlovsky (Contributor, Author) commented Jan 14, 2020

However, bitcoin hashes operate on very small data, "streamed" all at once in 95% of cases, so they do not benefit from this pattern at all. And where there is such a case (like Taproot tagged hashes), the midstate is stored in a different way and embedded into the type itself...

@stevenroose (Collaborator) commented Jan 15, 2020

I think even for small values, we can benefit from this pattern. There is by definition 0 overhead of using an engine over using the HashTrait::hash function, because that one uses an engine internally. So the difference is the allocation of the data when using the latter method.

So even for very small bits of data, where the allocation might be small, it's still more efficient. But then there is the problem that, even for small values, neither the compiler nor the code can estimate the size of the allocation. So in many cases there will be frequent re-allocations.

The two main things that are hashed are block headers and transactions. Headers are always 80 bytes, but the compiler doesn't know that and our code also doesn't anticipate it. So whether it will be one or multiple allocations depends on the default allocation size of Vec<u8> providing enough space.

For transactions, it becomes a lot more complex. There is really no way for the compiler, or the serialization code, to know the size of a serialized tx. Transactions range in size from ~100 bytes to many kilobytes. While serializing them into a Vec<u8>, the vector will be frequently re-allocated to make more space.
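To make the contrast concrete, the two patterns for hashing a transaction look roughly like this (a sketch; serialize stands for bitcoin::consensus::serialize, and witness handling is ignored):

    // Allocating pattern: encode the whole tx into a Vec<u8>, which may be
    // re-allocated several times as it grows, then hash the buffer.
    let h_alloc = sha256d::Hash::hash(&serialize(&tx));

    // Engine pattern: stream the encoding straight into the engine, no intermediate buffer.
    let mut engine = sha256d::Hash::engine();
    tx.consensus_encode(&mut engine).expect("engines don't error");
    let h_engine = sha256d::Hash::from_engine(engine);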

I really think the usage of the hash engines you mention is both standard practice and not a problem for readability or conciseness. It's 2 lines of boilerplate instead of 1.

@stevenroose (Collaborator) commented Jan 15, 2020

So I would suggest reducing this PR to just adding those hash methods where they are missing, using the same pattern used elsewhere.

let mut engine = HashYouWant::engine();
whatever_data_to_be_hashed.consensus_encode(&mut engine).expect("engines don't error");
more_data_to_be_hashed.consensus_encode(&mut engine).expect("engines don't error");
HashYouWant::from_engine(engine)

@dr-orlovsky (Contributor, Author) commented Jan 15, 2020

Maybe it's worth adding something like:

macro_rules! hash {
    ($hash:ident, $($item:expr),+) => {
        {
            let mut engine = $hash::engine();
            $(
                $item.consensus_encode(&mut engine).expect("engines don't error");
            )+
            $hash::from_engine(engine)
        }
    }
}
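For illustration, a call site (with hypothetical identifiers) would then look like:

    let sighash = hash!(SigHash, tx.version, tx.lock_time);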

@stevenroose (Collaborator) commented Jan 15, 2020

While that looks nice in theory, in many cases it won't work because other data is also pushed into the hash engine. And I feel like it's better to always do the same thing instead of having a macro that hides what's actually going on in half of the cases.

With rg -trust "::engine()" -A 10 I found 7 cases where we could apply your macro, and 8 cases where we couldn't:

  • blockdata/block.rs:
109:        let mut encoder = WitnessCommitment::engine();
110-        witness_root.consensus_encode(&mut encoder).unwrap();
111-        encoder.input(witness_reserved_value);
112-        WitnessCommitment::from_engine(encoder)
  • util/bip32.rs:
566:        let mut engine = XpubIdentifier::engine();
567-        self.public_key.write_into(&mut engine);
568-        XpubIdentifier::from_engine(engine)
  • util/bip143.rs:
49:            let mut enc = SigHash::engine();
50-            for txin in &tx.input {
51-                txin.previous_output.consensus_encode(&mut enc).unwrap();
52-            }
53-            SigHash::from_engine(enc)
--
57:            let mut enc = SigHash::engine();
58-            for txin in &tx.input {
59-                txin.sequence.consensus_encode(&mut enc).unwrap();
60-            }
61-            SigHash::from_engine(enc)
--
65:            let mut enc = SigHash::engine();
66-            for txout in &tx.output {
67-                txout.consensus_encode(&mut enc).unwrap();
68-            }
69-            SigHash::from_engine(enc)
  • util/address.rs:
246:        let mut hash_engine = PubkeyHash::engine();
247-        pk.write_into(&mut hash_engine);
--
268:        let mut hash_engine = WPubkeyHash::engine();
269-        pk.write_into(&mut hash_engine);
--
283:        let mut hash_engine = WPubkeyHash::engine();
284-        pk.write_into(&mut hash_engine);

@dr-orlovsky (Contributor, Author) commented Jan 15, 2020

True. OK, I agree with your approach then. Sorry for taking so long to discuss: I'd like to understand the rationale for design decisions in rust-bitcoin better (to follow it in other projects built on top of it), which is why I keep digging :)

@dr-orlovsky (Contributor, Author) commented Jan 23, 2020

We have to decide on closing this issue. While there are not a lot of use cases for the macro in rust-bitcoin itself, there are some in the library we are building for L2/L3 (https://github.com/lnp-bp/rust-lnpbp); so it either has to go into rust-bitcoin or be kept on top of it. What would you suggest?

The reference implementation is here: https://github.com/pandoracore/rust-bitcoin/blob/ddb3efff95d8b39ce7a6fee0514c18d1a9c785b6/src/hash_types.rs#L84

If you are up for it, I'll add it to this PR instead of the current code. Otherwise I will close the request.

@apoelstra (Member) commented Jan 24, 2020

My feeling is that this functionality belongs in miniscript. I'm also not thrilled about doing things to support p2sh, which I feel is insecure (only 80 bits of collision resistance), and is less efficient and more complicated than p2wsh in all cases.

@dr-orlovsky (Contributor, Author) commented Jan 24, 2020

I'll try to move it there and will share a WIP.
