How Bitcoin Hashing Works

stiggy87 edited this page Apr 11, 2013 · 5 revisions

From Bitcoin's inception to its current state, a lot has been written about it. There are wikis, forums, StackExchange questions, GitHub repos, etc., and each one of these describes a different way of hashing the block.

Why the difference? Because everyone optimizes differently (some people don't change the nonce, some don't use the nonce, etc.). This allows people to claim their miner is better because it is faster. Are they doing correct Bitcoin hashing? Who really knows.

This page is meant to give you a basic understanding of how hashing works. Why? Because there is no real central location that explains how everything works, the way you would expect from an Open Source project.

The Block Header

The Block Header is what everyone hashes. It is 80 bytes of data that will ultimately be hashed twice (not really, but we'll get into that later).

The header contains this info:

| Name          | Byte Size | Description                                            |
|---------------|-----------|--------------------------------------------------------|
| Version       | 4         | Block version number                                   |
| Previous Hash | 32        | Hash of the previous block header                      |
| Merkle Root   | 32        | The hash based on all of the transactions in the block |
| Time          | 4         | Current time stamp as seconds (unix format)            |
| Bits          | 4         | Target value in compact form                           |
| Nonce         | 4         | User adjusted value starting from 0                    |
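To make the layout concrete, here is a minimal Python sketch that packs the six fields into 80 bytes and hashes them. It uses the publicly known values of the genesis block (block 0) as sample data. The variable names are only illustrative, and it assumes the usual serialization: integers little-endian, and the two hashes byte-reversed relative to how they are normally displayed.

```python
import hashlib
import struct

# Sample data: the well-known genesis block (block 0) header fields.
version = 1
prev_hash = bytes(32)  # block 0 has no predecessor, so this is all zeros
merkle_root = bytes.fromhex(
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b")
timestamp = 1231006505
bits = 0x1d00ffff
nonce = 2083236893

# Assumption: integers are little-endian, hashes are stored byte-reversed.
header = (
    struct.pack("<I", version)
    + prev_hash[::-1]
    + merkle_root[::-1]
    + struct.pack("<III", timestamp, bits, nonce)
)

# Double SHA-256, then reverse the bytes to get the usual display form.
digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
block_hash = digest[::-1].hex()

print(len(header))  # 80
print(block_hash)   # 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```

The field sizes from the table add up to 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes, which is what `len(header)` reports.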

Most Bitcoin mining protocols have a call named "Getwork." For your miner to get work to do, it needs to request it. When you send out a request, you get back a JSON Getwork string that contains these values plus some padding. The padding extends the 80-byte header so the data splits cleanly into two 64-byte blocks to hash.
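The padding arithmetic can be sketched as follows. This assumes the padding is the standard SHA-256 message padding (a 0x80 byte, zeros, then the message length in bits as a 64-bit big-endian integer); `sha256_pad` is a hypothetical helper name, and the all-zero header is only a placeholder.

```python
import struct

def sha256_pad(message_len: int) -> bytes:
    """Return standard SHA-256 padding for a message of message_len bytes."""
    pad = b"\x80"                                  # mandatory leading 1 bit
    pad += b"\x00" * ((55 - message_len) % 64)     # zeros up to the length field
    pad += struct.pack(">Q", message_len * 8)      # length in bits, big-endian
    return pad

header = bytes(80)  # placeholder header, real field values do not matter here
padded = header + sha256_pad(len(header))

# Split into the two 64-byte blocks that SHA-256 processes one after the other.
chunks = [padded[i:i + 64] for i in range(0, len(padded), 64)]
print(len(padded), [len(c) for c in chunks])  # 128 [64, 64]
```

For an 80-byte header the padding is 48 bytes, giving 128 bytes in total.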

The Mid-state and Endianness

A lot of code and discussions mention the mid-state. This is the result of the first hashing step you do: the SHA-256 state after processing the first 64 bytes of the block header, usually given in network endian.

This means that when you get work to do, it is not in the format you need. To hash to the mid-state properly, you need to do endian swapping (byte swapping) on the first 64 bytes of data. Once this is done, you can push it into a SHA-256 hashing algorithm.
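A byte swap of this kind can be sketched in a few lines. The assumption here (not confirmed by this page) is that the swap is done per 32-bit word: the four bytes inside each word are reversed while the order of the words stays the same. `swap32` is a hypothetical helper name.

```python
import struct

def swap32(data: bytes) -> bytes:
    """Reverse the byte order inside each 4-byte word, keeping word order."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    count = len(data) // 4
    words = struct.unpack(f">{count}I", data)   # read words as big-endian
    return struct.pack(f"<{count}I", *words)    # write them back little-endian

sample = bytes.fromhex("0102030405060708")
print(swap32(sample).hex())  # 0403020108070605
```

Applying `swap32` twice returns the original bytes, so the same helper converts in both directions.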

The output of this is a 256-bit (32-byte) mid-state. The good thing is that the mid-state only needs to be calculated once for the entire block, because the nonce sits in the remaining 16 bytes of the header.
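Python's `hashlib` does not expose the internal SHA-256 state, so the sketch below implements the compression function by hand to show what a mid-state is. It is an illustration, not how any particular miner does it: the names `compress`, `IV` and `K` are my own, the constants are derived from prime roots with floating-point math, and it works on the serialized header bytes directly (no Getwork byte swapping). The genesis block header is used as sample data.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def first_primes(count):
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

PRIMES = first_primes(64)
# SHA-256 constants: first 32 fractional bits of square roots (initial state)
# and cube roots (round constants) of the first primes.
IV = [int((p ** 0.5 % 1) * 2**32) for p in PRIMES[:8]]
K = [int((p ** (1 / 3) % 1) * 2**32) for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """Run one SHA-256 compression over a 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return tuple((x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h)))

# Sample data: genesis block header (80 bytes).
merkle = bytes.fromhex(
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b")
header = struct.pack("<I32s32sIII", 1, bytes(32), merkle[::-1],
                     1231006505, 0x1d00ffff, 2083236893)
padding = b"\x80" + bytes(39) + struct.pack(">Q", 80 * 8)

midstate = compress(IV, header[:64])               # computed once per block
final = compress(midstate, header[64:] + padding)  # redone for every nonce
digest = struct.pack(">8I", *final)

print(digest == hashlib.sha256(header).digest())  # True
```

Only the second `compress` call depends on the nonce. Together with the single compression needed to hash the 32-byte result a second time, that is the per-nonce work once the mid-state is cached.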

Note: The current version of Getwork provides the mid-state, but it will be removed.

The Nonce and 2nd Hash (With the Target)

The nonce (rhymes with once) is a user-editable 4-byte field used to calculate the final hash. It typically starts at 0, and after every unsuccessful hash it is incremented and the header is hashed again. This continues until all 2^32 values have been checked. If the last one is also invalid, the miner needs new work: the extended nonce (extra nonce) is increased, which changes the Merkle Root, and the whole process starts again.
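The nonce loop described above can be sketched like this. `scan_nonces` and `header76` (the first 76 bytes of the header, everything except the nonce) are hypothetical names. To keep the example fast, it scans a small window just below the known genesis block nonce instead of starting at 0, and it assumes the hash is compared to the target as a little-endian 256-bit integer.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def scan_nonces(header76: bytes, target: int, start: int, count: int):
    """Try `count` nonces from `start`; return the first valid one, else None."""
    for nonce in range(start, min(start + count, 2**32)):
        digest = double_sha256(header76 + struct.pack("<I", nonce))
        if int.from_bytes(digest, "little") <= target:
            return nonce
    return None  # range exhausted: the caller needs new work

# Sample data: genesis block header without its nonce.
merkle = bytes.fromhex(
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b")
header76 = struct.pack("<I32s32sII", 1, bytes(32), merkle[::-1],
                       1231006505, 0x1d00ffff)
target = 0xffff << 208  # expanded form of bits 0x1d00ffff

found = scan_nonces(header76, target, 2083236893 - 1000, 2000)
print(found)  # 2083236893
```

A real miner would start at 0 and cover the full 2^32 range, which is far too slow to do in pure Python.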

So how do you determine if the hash is valid or not? The target. The final hash needs to be less than or equal to the target.

This target comes from the "Bits" field, only it has to be expanded (padded) to a full 256-bit number. However, since meeting such a low target is (in most cases) so difficult, most miners choose the largest target value they can compare against, check whether the hash they got meets it, and then send the result off.
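Expanding the compact "Bits" value into a full target can be sketched as below. `bits_to_target` is a hypothetical helper name; the sketch assumes the usual compact encoding (top byte is a size in bytes, low three bytes are the leading digits) and ignores the sign bit, which valid block headers do not set.

```python
def bits_to_target(bits: int) -> int:
    """Expand the 4-byte compact 'Bits' value into a 256-bit integer target."""
    exponent = bits >> 24          # total size of the target in bytes
    mantissa = bits & 0x007FFFFF   # top three bytes of the target (sign bit dropped)
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))

target = bits_to_target(0x1d00ffff)
print(f"{target:064x}")
# 00000000ffff0000000000000000000000000000000000000000000000000000
```

A hash is valid when, read as a 256-bit number, it is less than or equal to this value.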

Most miners compare the value to this: 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffff00000000 (byte swapped of course).
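As a sketch of what that comparison amounts to: assuming the value is 32 bytes (28 bytes of 0xff followed by 4 zero bytes) and that "byte swapped" means reading it as a little-endian number, the check is the same as asking whether the last four bytes of the raw digest are zero. The function names are hypothetical.

```python
# Assumption: 28 bytes of 0xff followed by 4 zero bytes, read little-endian.
SHARE_TARGET_BYTES = b"\xff" * 28 + b"\x00" * 4
SHARE_TARGET = int.from_bytes(SHARE_TARGET_BYTES, "little")  # 2**224 - 1

def meets_share_target(digest: bytes) -> bool:
    """Full comparison of a raw 32-byte double-SHA-256 digest to the target."""
    return int.from_bytes(digest, "little") <= SHARE_TARGET

def quick_check(digest: bytes) -> bool:
    """Equivalent shortcut: the four most significant bytes must be zero."""
    return digest[28:] == b"\x00\x00\x00\x00"

# Sample data: the genesis block hash, converted from display form to raw bytes.
genesis = bytes.fromhex(
    "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f")[::-1]
print(meets_share_target(genesis), quick_check(genesis))  # True True
```

The shortcut is why many miners only inspect one 32-bit word of the result before deciding whether to submit it.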

More info to be added later

Inconsistencies in the Miners and Community

One thing I've found in the course of writing a miner from scratch is how knowledge has broken down over time.

After scouring the Internet for more information on how miners work (and looking at some really poorly written code), I've come to the conclusion that there are a lot of inconsistencies in how miners work and what they do.

  1. In most implementations of miners I have seen, the mid-state is never used beyond the first hash. No information explains if it is used to prove work in pools or anywhere else.
  2. Endianness is what the powerful mining developers use to confuse newbies. In reality, you get data in Big Endian (network endian) and you byte swap it so it is now Little Endian, and you hash that. The hash is now back in Big Endian, so you have to swap it again. In your end result, the most significant bytes (the leftmost part of the string) should be nothing but 0's.
  3. Newbie Vs. Veteran... in Cryptology? Seriously, most of the information I've found comes from newbies who are not familiar with cryptology (which is the only field SHA-256 is used in) and who are trying to learn some very obscure mathematics and programming.
  4. Newbie Vs. Veteran... in Programming? Most of the C/C++ code I've seen for miners is horrendous. Most programmers do not do basic memory management (like defining the absolute length of an array, which they know because Bitcoin data will not change size). They use a lot of pointers (and in some cases function pointers) to do the work where it is not all that necessary. Plus, the commenting of code is non-existent. As a past professor always told me, he would rather see more green than white in code. (In most IDEs, comments are colored green and white is the background.) Finally, self-documenting code is non-existent (self-documenting code is where you name functions and variables so that whoever reads them understands what they are doing).
  5. Veterans having a lack of interest in helping newbies in general. In many forums I've seen, about 25% of the veterans provide information, but not enough of it in layman's terms to help someone understand. Don't get me wrong, there are some great ones out there who really bend over backwards to help others, but they are a minority.
  6. No explanation of basic units of measure. I've received a lot of emails asking what ZynqBTC's output is. I have no idea how to measure this, and apparently no one really does, so everyone guesses. My basic guess is that a miner producing one hash per clock cycle gives 1Mhash/s per MHz of clock. So for an FPGA miner, a 200MHz clock would give you 200Mhash/s, but in reality this isn't true because a hash is not produced every cycle. If we say a hash takes 5 cycles, that means 40Mhash/s for 1 miner on 1 FPGA, and if you have 2 miners, that's 80Mhash/s. If you have 5 miners (what ZynqBTC has been known to implement), you get 200Mhash/s. But in reality, if all miners are mining the same block over the same nonces (assuming the nonces start at the same value), the work is duplicated and you're back down to 40Mhash/s. Voodoo magic.
  7. Lack of a central knowledge base that is USEFUL. I've scoured the endless numbers of wikis, and every entry lacks depth, avoids layman's terms, and just glosses over key pieces of data. Or even better, relevant pieces of information are not next to each other. That the first hash is only done once was found under the Getwork wiki entry, but not in the process algorithm (huh?).
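The back-of-the-envelope numbers in item 6 can be written down as a small function. This is only one reading of that guess, under the assumption that each miner core needs a fixed number of clock cycles per hash and that the cores work on different nonces; `hash_rate` is a hypothetical helper, not a measurement of ZynqBTC.

```python
def hash_rate(clock_hz: float, cycles_per_hash: int, cores: int = 1) -> float:
    """Estimated hashes per second for `cores` identical miner cores."""
    return clock_hz / cycles_per_hash * cores

MHASH = 1e6
print(hash_rate(200e6, 1) / MHASH)     # 200.0  (one hash every clock cycle)
print(hash_rate(200e6, 5) / MHASH)     # 40.0   (one hash every 5 cycles)
print(hash_rate(200e6, 5, 5) / MHASH)  # 200.0  (five such cores, distinct nonces)
```

If several cores scan the same nonces on the same block, the extra cores add nothing useful, so the effective rate stays at the single-core figure.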

The list can go on, and it is really sad to see everyone claim miners are open source when they treat them like proprietary data. That is why my project will be done in such a way that if a user implements it and still doesn't understand, then either they don't want to understand or I have failed.