Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reverse bitcoin_hash in memory #144

Merged
merged 1 commit into from
Dec 11, 2014
Merged

Reverse bitcoin_hash in memory #144

merged 1 commit into from
Dec 11, 2014

Conversation

swansontec
Copy link
Contributor

This is a pretty big change. Please see the email on the mailing list. Let's hold off on merging this for a day or two, to be sure it's stable.

@evoskuil
Copy link
Member

evoskuil commented Dec 3, 2014

This looks good. I'm queuing up patches for -protocol/-client/-explorer.

@evoskuil
Copy link
Member

evoskuil commented Dec 3, 2014

So is it true that the obelisk wire protocol hash is reversed and the in-memory representation is now "unreversed", meaning that we need to explicitly reverse in -protocol (and will need to in new -server code)? Doing so fixes test failures in -protocol.

@evoskuil
Copy link
Member

evoskuil commented Dec 3, 2014

I'm going to templatize encode/decode_hash as follows:

template <typename Hash>
std::string encode_hash(const Hash& hash);
template <typename Hash>
bool decode_hash(Hash& out, const std::string& in)

Because we lost implementation for the other hash sizes.

@evoskuil
Copy link
Member

evoskuil commented Dec 3, 2014

We have to make a decision for BX on hash encoding. A large number of tests are failing due to the reversal. Wire encoding is what is typically presented, such as in blockchain.info info. The easiest way to repair the tests, existing documentation, and to retain correlation with other services, is to reverse the hash for all presentation. That sort of begs the question however, why we wouldn't just treat the reversed format as canonical internally.

@evoskuil
Copy link
Member

evoskuil commented Dec 3, 2014

I tried reversing the output for hash encode and decode, but the problem is much deeper than that. Any libbitcoin function that accepts or returns, directly or indirectly, a bitcoin_hash or a bitcoin_short_hash will exhibit the behavior change. So all code that has taken a dependency on these functions will need to be modified, and without a comprehensive test suite it would be nearly impossible to do this reliably. Even with such coverage in BX I cringe at having to make this change. Given that the public and wire representation of these hashes is generally the reversed representation I think we are much better off not attempting this change and instead treating these hashes as opaque data.

@genjix
Copy link
Contributor

genjix commented Dec 3, 2014

As an API our constraints are different to a monolithic blob, so I tried to keep the representation close to what people expect. There are 2 options: custom hash type with reversed representation, or reversing the data in hashing functions (which is what libbitcoin does).

@evoskuil
Copy link
Member

evoskuil commented Dec 3, 2014

@genjix I think that they key question is "what people expect". While there are a number of conversions internally to accomplish this, the API does expose what people expect to see. I think this is the proper choice as it hides the confusion in the library implementation instead of exposing it to API users.

BX uses most of the libbitcoin APIs and after looking through the tests failures, and trying to make it work, it's clear that in order to represent what people expect there would be conversions in a lot of places, where there are none presently.

@swansontec
Copy link
Contributor Author

I'm sorry, but both you guys you have it all wrong. In the old system, everything followed this double-reversal pattern:

  • sha256(sha256) → reverse → memory → reverse → serializer
  • sha256(sha256) → reverse → memory → reverse → signature
  • sha256(sha256) → reverse → memory → reverse → script ops
  • sha256(sha256) → reverse → memory → reverse → checksum
  • sha256(sha256) → reverse → memory → reverse → block difficulty
  • sha256(sha256) → reverse → memory → base16

As you can see, the only operation that isn't double-reversed is the base16 conversion. This means that everything external to the libraries is already the right way around! The obelisk wire protocol, for instance, has the bytes in the same order you would get from sha256(sha256), since the serializer was un-reversing them.

In the new system, things work like this:

  • sha256(sha256) → memory → serializer
  • sha256(sha256) → memory → signature
  • sha256(sha256) → memory → script ops
  • sha256(sha256) → memory → checksum
  • sha256(sha256) → memory → block difficulty
  • sha256(sha256) → memory → reverse → text

Now, instead of reversing bytes everywhere, we reverse the bytes in one place - the text conversion. This is why encode_hash and decode_hash must NOT be templates, since they ONLY for bitcoin hashes. Nothing else has this crazy reversing requirement.

So, I have good confidence that I can fix BX completely, reliably, and correctly by swapping out a grand total of 3 functions. None of the other libraries should need any changes whatsoever.

As far as outside library users like are concerned, yes, this breaks API. However, the current API is insane, and is not what anybody would expect. It is utterly inconsistent with all other Bitcoin software. Just look at that first chart again, and notice that every operation that touches the blockchain has to un-reverse the hash before it can proceed.

Here at AirBitz, these reversed hashes have bitten us pretty hard. Any time we go to cobble together a tool in another language, like Python, we have to account for the fact that the binary data inside our database is reversed. We would not be having these nightmares if the binary data had been the right way round in the first place.

@evoskuil
Copy link
Member

evoskuil commented Dec 3, 2014

The experience I had after checking out the changes was that compilation was relatively easy to achieve and I had a handful of test failures in libbitcoin-protocol (all that hit the decode_hash calls) and a large number in libbitcoin-explorer.

The libbitcoin-protocol failures are fixed by reversing the bc::decode_hash() calls. This doesn't sound like what you expected, so please confirm.

In bx, after reversing primitives::btc160 and primitives::btc256 serialization << and deserialization >> there are still had a large number of test failures. But I'll take another look at it, since what you are saying makes sense. I may have missed something.

@swansontec
Copy link
Contributor Author

I'll take a poke as well. One requirement with this new branch is that all bitcoin_hashes must go through encode_hash and decode_hash for their text representations, and that no other hashes should go through these functions. We never had this requirement before. As a result, we are quite sloppy with our text ↔ binary conversions, especially in the unit tests. For example, we were using decode_hash to decode EC private keys in some of our unit tests, which obviously won't work now. The new base16_literal provides a nice alternative to decode_hash, at least for the unit tests.

@evoskuil
Copy link
Member

evoskuil commented Dec 3, 2014

I noticed something along these lines, which is why I want to take another look. I'm tied up in the clang/osx Travis build configuration at the moment.

@swansontec
Copy link
Contributor Author

This looks like a bug in libbitcoin-protocol! The protobuf field is "bytes", and the description indicates that it's bytes, but the converter is actually shoving in base16 ASCII text.

@swansontec
Copy link
Contributor Author

I just ran the following procedure, so it looks like the hash reversal doesn't actually affect libbitcoin-protocol:

  1. Compile libbitcion at reverse~1, which is the last commit where the hashes are the old way around.
  2. Fix libbitcoin-protocol to use the proper functions, so it compiles and passes its tests again.
  3. Upgrade libbtitcon to reverse, where the branches are the right way around.
  4. Compile and run libbitcoin-protocol unit tests again.

Things still work at step 4, so we're good here. Pull requests coming soon, once I add a 'hash_literal' to libbitcoin.

@evoskuil
Copy link
Member

evoskuil commented Dec 3, 2014

Cool, I won't get to this until the morning but I'm hopeful based on seeing something I missed last time around.

@swansontec
Copy link
Contributor Author

Ok, I've updated the branch based on my work with libbitcoin-protocol. The base16 and hash conversions now have a bettery symmetry:

  • encode_base16 / encode_hash
  • decode_base16 / decode_hash
  • base16_literal / hash_literal

@swansontec
Copy link
Contributor Author

The protocol changes are in libbitcoin/libbitcoin-protocol#36

@swansontec
Copy link
Contributor Author

I have finished fixing bx. I was able to do the same 4-step process using reverse~1, so again, the change doesn't actually affect anything (assuming you use the right API's).

As with libbitcoin-protocol, there were some cases where decode_hash was being used for ec_secret's. There were also a lot of cases where base16(hash) was being used instead of the more-appropriate btc256(hash). That was about the extent of the problem, though.

The pull request is libbitcoin/libbitcoin-explorer#171

@swansontec
Copy link
Contributor Author

I have just done another update. This one fixes the problem with bx fetch-history. Apparently, our short_hashes are reversed in the obelisk binary protocol (and absolutely nowhere else). Ugh.

@swansontec swansontec mentioned this pull request Dec 4, 2014
@swansontec
Copy link
Contributor Author

I have created a new pull request, #147, which includes just the base16 changes. We can go ahead and pull that while we wait to decide on the hash reversal.

@evoskuil
Copy link
Member

evoskuil commented Dec 7, 2014

After further discussion with @swansontec, and narrowing the actual issue down to this simple commit, I'm convinced we should take this change. The expected representation is achieved through the encode/decode functions, which should always be used with to textually serialize our hashes in any case.

With this change, the reversal only takes place in the encode_hash
and decode_hash functions.
evoskuil added a commit that referenced this pull request Dec 11, 2014
Reverse bitcoin_hash in memory
@evoskuil evoskuil merged commit c8f5698 into version2 Dec 11, 2014
@evoskuil evoskuil deleted the reverse branch December 13, 2014 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants