pow: pure go ethash #3750

Merged
merged 7 commits into from
Mar 9, 2017

karalabe
Member

@karalabe karalabe commented Mar 6, 2017

Acknowledgements

  • Huge thanks to @3esmit for offering access to his MIPS64 router, on which I could debug big-endian ethash issues and optimize memory usage patterns! This PR would be much less useful without it!
  • The PR also deserves a shout-out to @fjl for various memory allocation optimizations.

This PR replaces the C++ ethash implementation from https://github.com/ethereum/ethash with a custom one fully written in Go. The reasons for doing this are numerous.

Issues with C++ ethash

Features enabled by Go

  • Generating a single mining DAG can be parallelized to use all available cores on the machine. In C++ this is all but impossible without adding boost threads as a dependency. With Go we can use simple goroutines and use all our capacity to generate the DAG.
  • By managing the DAGs ourselves, the PR finally adds the capability to rotate old ones out instead of keeping them stored on disk forever. This should help users running private networks who don't want to constantly babysit the size of the ethash folder.
  • By managing the caches ourselves, the PR adds the capability to store verification caches on disk (yes, in a rotating manner) and load them if available. This is essential for lower capability devices (phones, routers), as generating a cache usually takes considerably more time than loading it from disk. It also lets devices become usable quickly after a restart instead of waiting for caches to be rebuilt.
  • Last but not least, pure Go enables running on all platforms Go supports, even the exotic ones (e.g. mobiles, big endian), as well as in all environments where Go transpilers exist (e.g. GopherJS).
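A minimal sketch of parallel generation with plain goroutines, not the PR's code: `generateParallel` and `item` are hypothetical names, and each `uint32` stands in for one DAG item (real ethash dataset items are 64 bytes). It relies on every item depending only on read-only inputs (for ethash, the verification cache and the item index), so disjoint ranges can be filled concurrently without locks.

```go
package main

import (
	"runtime"
	"sync"
)

// generateParallel is a hypothetical helper that sets dataset[i] = item(i) for
// every index, splitting the range into contiguous batches, one goroutine per
// CPU core. Each goroutine writes a disjoint range, so no locking is needed;
// item must be safe to call concurrently.
func generateParallel(dataset []uint32, item func(i int) uint32) {
	if len(dataset) == 0 {
		return
	}
	threads := runtime.NumCPU()
	if threads > len(dataset) {
		threads = len(dataset)
	}
	batch := (len(dataset) + threads - 1) / threads // ceiling division

	var wg sync.WaitGroup
	for start := 0; start < len(dataset); start += batch {
		end := start + batch
		if end > len(dataset) {
			end = len(dataset)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				dataset[i] = item(i)
			}
		}(start, end)
	}
	wg.Wait()
}
```

Because the result does not depend on the batch layout, the parallel output is identical to a serial loop over the same `item` function.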

Performance

  • Generating the verification cache via C++ for epoch 0 on a Zenbook Pro takes 650ms. The same thing using pure Go takes 800ms (previously 850ms). However, with the disk cache extension, loading it from there (which always happens apart from the very first startup) is instantaneous due to memory mapping (previously 100ms).
    • On a MIPS router, generating the first cache takes 35s, while loading it from disk is instantaneous (previously 4.6s). The cache for block 2.6M takes 1m7s to generate but still memory maps instantly (previously 12s to load).
  • Generating the mining DAG via C++ for epoch 0 on a Zenbook Pro takes a bit over 6 minutes. The same thing using pure Go plus the threading it enables takes a bit over 1 minute.
  • Verifying a PoW for a random header using C++ took 2.7ms on the Zenbook, whereas the pure Go version was benchmarked at 1.4ms (previously 1.6ms). The comparison is not fully fair because I didn't write a full benchmark for C++.
  • Mining on 8 threads on the Zenbook was benchmarked via ethminer at a hashrate of 200-250KH/s. Measuring the hashrate of the Go implementation (via gometrics) resulted in 240-250KH/s.
  • We can now memory map both mining datasets and verification caches, the latter reducing geth's hard memory requirement on mainnet by about 120MB (until now, 3 recent caches + 1 future one were kept in memory).

Fixes

Extras

The PR introduces the following CLI flags (and eth.Config variables):

ETHASH OPTIONS:
  --ethash.cachedir            Directory to store the ethash verification caches (default = inside the datadir)
  --ethash.cachesinmem value   Number of recent ethash caches to keep in memory (16MB each) (default: 2)
  --ethash.cachesondisk value  Number of recent ethash caches to keep on disk (16MB each) (default: 3)
  --ethash.dagdir              Directory to store the ethash mining DAGs (default = inside home folder)
  --ethash.dagsinmem value     Number of recent ethash mining DAGs to keep in memory (1+GB each) (default: 1)
  --ethash.dagsondisk value    Number of recent ethash mining DAGs to keep on disk (1+GB each) (default: 2)
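The `*ondisk` flags imply a retention window over epochs. A minimal sketch of choosing which stored epochs to delete, assuming the limit counts the current epoch and the ones before it while pre-generated future epochs are always kept; `epochsToEvict` is a hypothetical helper and the exact counting rule in geth may differ.

```go
package main

import "sort"

// epochsToEvict is a hypothetical helper returning, in ascending order, the
// stored epochs outside the window of the keep most recent epochs ending at
// current (every epoch e with e+keep <= current). Epochs newer than current,
// such as a pre-generated future epoch, are never evicted.
func epochsToEvict(current uint64, keep int, stored []uint64) []uint64 {
	if keep < 0 {
		keep = 0
	}
	var evict []uint64
	for _, e := range stored {
		if e+uint64(keep) <= current {
			evict = append(evict, e)
		}
	}
	sort.Slice(evict, func(i, j int) bool { return evict[i] < evict[j] })
	return evict
}
```

Under these assumptions, with a limit of 3 and current epoch 5, epochs 3, 4 and 5 (plus a future 6) survive while 1 and 2 are deleted.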

Notes

  • The PR drops support for all autodag flags and RPC endpoints. Automatic DAG pre-generation is baked in and there's no way to disable it. There really isn't any meaningful scenario where it isn't needed, so this is simply a lot less code and mess.
  • The PR currently hard-codes cachedir == "ethash", cachesinmem == 2 and cachesondisk == 3 on mobile. This should be a sane default. We can surface these as parameters later, but I'd rather not go there for now.

@mention-bot

@karalabe, thanks for your PR! By analyzing the history of the files in this pull request, we identified @fjl, @obscuren and @ebuchman to be potential reviewers.

@karalabe karalabe force-pushed the pure-go-ethash branch 2 times, most recently from 9f8ff72 to 7b0e06f March 6, 2017 12:45
@karalabe karalabe requested review from fjl and obscuren March 6, 2017 16:31
@karalabe karalabe added this to the 1.6.0 milestone Mar 6, 2017
crypto/crypto.go Outdated
		return result
	}
}

Contributor

This function is easy to implement on top of hash.Hash; there is no need to add specialized variants to crypto/sha3.

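import "hash"

// NewHasher wraps h in a one-shot helper: each call hashes data, returns a
// freshly allocated digest, and resets h for the next call.
func NewHasher(h hash.Hash) func([]byte) []byte {
    return func(data []byte) []byte {
        h.Write(data)
        result := h.Sum(nil)
        h.Reset()
        return result
    }
}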

Contributor

or maybe even

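import "hash"

// NewHasher reuses one preallocated digest buffer across calls to avoid an
// allocation per hash.
func NewHasher(h hash.Hash) func([]byte) []byte {
    result := make([]byte, h.Size())
    return func(data []byte) []byte {
        h.Write(data)
        h.Sum(result[:0]) // appends into result's backing array in place
        h.Reset()
        return result
    }
}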

// You should have received a copy of the GNU Lesser General Public License
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>.

package pow
Contributor

Please move ethash to pow/ethash. We can remove package pow if you want or keep it around for the interface (and FakePow). I'm suggesting this because having ethash here adds a ton of dependencies to the pow package, including "unsafe".

Member Author

I already moved it in my PoA PR: all of it into a consensus package, with ethash and rinkeby in their own packages. If it's OK, let's leave it to that PR to avoid a merge nightmare for me :D

pow/ethash.go Outdated
logger.Info("Loading ethash DAG from disk")
start := time.Now()
d.dataset = prepare(dsize, bufio.NewReader(dump))
logger.Info("Loaded ethash DAG from disk", "elapsed", common.PrettyDuration(time.Since(start)))
Contributor

@fjl fjl Mar 7, 2017

It would be better to mmap DAGs and caches. mmap'ing means that the memory used by loaded DAGs doesn't count as used and is not tracked by GC.

Changing to mmap might mean that we need to store DAGs with native endianness. Your code translates everything from LE when loading. Doing this would destroy any advantages of mmap because the whole DAG would be paged in when touching it.

Another advantage would be that memory errors can be caught better. The libethash Go wrapper panics if mmap fails, leading to the infamous reports that you have marked as fixed in the description. These issues happen if the OS doesn't allow mmap of the file. If you mmap the file from Go, errors can be reported and overallocations (e.g. on 32-bit) won't crash the process.

@karalabe
Member Author

karalabe commented Mar 7, 2017

@fjl PTAL. Full mmap support done.

@karalabe
Member Author

karalabe commented Mar 7, 2017

PS: Please don't squash when merging. I'd like to retain the commits mostly as is.

@fjl fjl changed the title from "Pure go ethash" to "pow: pure go ethash" Mar 7, 2017
Contributor

@fjl fjl left a comment

Only minor issues remain.

While trying this out I noticed that logging is super noisy:

DEBUG[03-07|23:36:53] Evicted ethash cache                     epoch=1 used=2017-03-07T23:34:21+0100
DEBUG[03-07|23:36:53] Using pre-generated cache                epoch=3
DEBUG[03-07|23:36:53] Requiring new future ethash cache        epoch=4
DEBUG[03-07|23:36:53] Failed to load old ethash cache          seed=0xb903bd7696740696b2b18bd1096a2873bb8ad0c2e7f25b00a0431014edb3f539 err="open /Users/fjl/Library/Ethereum/testnet/geth/ethash/cache-R23-b903bd7696740696b2b18bd1096a2873bb8ad0c2e7f25b00a0431014edb3f539.le: no such file or directory"
DEBUG[03-07|23:36:53] Generating ethash verification cache     seed=0xb903bd7696740696b2b18bd1096a2873bb8ad0c2e7f25b00a0431014edb3f539
INFO [03-07|23:36:55] Generated ethash verification cache      seed=0xb903bd7696740696b2b18bd1096a2873bb8ad0c2e7f25b00a0431014edb3f539 elapsed=1.646s

IMHO there shouldn't be an INFO-level message for every new cache. Please move it to DEBUG.
It would be nice if the "Generated..." message had the epoch instead of the seed hash (it's much easier to see what's going on).

The "Generating..." message can be removed.

"Using pre-generated...", "Requiring new future..." should be at TRACE level.

type ChainManager interface {
	GetBlockByNumber(uint64) *types.Block
	CurrentBlock() *types.Block
}
Contributor

You can delete this interface.

Member Author

Already done in my PoA PR, which cleans up the package as a whole (also dropping Blocks). I'd rather postpone it to that.

pow/ethash.go Outdated
	if err != nil {
		logger.Error("Failed to memory map ethash cache", "err", err)
	}
}
Contributor

@fjl fjl Mar 7, 2017

This is unsafe if multiple processes try to generate the same DAG. You can work around it by creating the file with a temporary name first (e.g. .full-R23-hash.<pid>.tmp), then moving it into place once the content is valid. AFAIK there is no need to remap the file after the move (not sure about Windows though).

Contributor

@fjl fjl Mar 7, 2017

Same for caches.

Contributor

A follow-up question would be how those .tmp files would get deleted after a process crash. Maybe leave a TODO comment about this.

Contributor

This test succeeds with nprocs=1 but fails with nprocs=3:

func TestDiskCacheConcurrent(t *testing.T) {
	tmpdir, err := ioutil.TempDir("", "ethash-test")
	if err != nil {
		t.Fatal(err)
	}
	block := types.NewBlockWithHeader(&types.Header{
		Number:      big.NewInt(3311058), // Ethereum main net
		ParentHash:  common.HexToHash("0xd783efa4d392943503f28438ad5830b2d5964696ffc285f338585e9fe0a37a05"),
		UncleHash:   common.HexToHash("0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347"),
		Coinbase:    common.HexToAddress("0xc0ea08a2d404d3172d2add29a45be56da40e2949"),
		Root:        common.HexToHash("0x77d14e10470b5850332524f8cd6f69ad21f070ce92dca33ab2858300242ef2f1"),
		TxHash:      common.HexToHash("0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421"),
		ReceiptHash: common.HexToHash("0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421"),
		Difficulty:  big.NewInt(167925187834220),
		GasLimit:    big.NewInt(4015682),
		GasUsed:     big.NewInt(0),
		Time:        big.NewInt(1488928920),
		Extra:       []byte("www.bw.com"),
		MixDigest:   common.HexToHash("0x3e140b0784516af5e5ec6730f2fb20cca22f32be399b9e4ad77d32541f798cd0"),
		Nonce:       types.EncodeNonce(0xf400cd0006070c49),
	})

	var wg sync.WaitGroup
	nprocs := 3
	wg.Add(nprocs)
	for i := 0; i < nprocs; i++ {
		go func() {
			defer wg.Done()
			ethash := NewFullEthash(tmpdir, 0, 1, "", 0, 0)
			if err := ethash.Verify(block); err != nil {
				t.Error(err)
			}
		}()
	}
	wg.Wait()
}

Member Author

Thanks for the test. I had figured caches wouldn't conflict since they go into different folders, and I don't expect people to mine with multiple processes (especially with DAG rotation). The problem nonetheless stands. Will try to fix.

A side effect of the fix: currently, if geth crashes (terminates) while generating a DAG/cache, the file will be corrupted and we'll never know, so the temporary-file approach is actually a better solution overall.

@karalabe
Member Author

karalabe commented Mar 8, 2017

@fjl PTAL

I've implemented the generate-rename-open mechanism. It passes your test that generates the same things concurrently, and it should also solve issues if a generation is aborted midway through (i.e. no junk will be left in the main file to be loaded next time).

I also modified the log levels a bit. They now report epochs instead of seeds. I dropped the "generating..." message and changed the "generated" one so that it's only displayed if generation took more than 3 seconds. The rationale is that embedded devices, which might take 1+ minutes to generate a mainnet DAG, should not just hang with no output.

Btw, a more elegant solution to logging the epochs inside generateCache and generateDataset would be to pass in a logger instance that already has the epoch, instead of passing the epoch just to make a logger out of it. Unfortunately this screws up Go's optimizer/inliner/something, as the performance drops by 50%. Making the logger inside is fast.

pow/ethash.go Outdated

os.Rename(temp, path)

d.dump, d.mmap, d.dataset, err = memoryMap(path, false)
Contributor

Can the mapping stay established? It should work.

Contributor

Please move the memory mapping logic into its own function, e.g.

generateMappedFile(dir, name, size, func(dataset []uint32) { generateDataset(dataset, d.epoch, cache) })

This way it can be reused for both cache and DAG.

@karalabe
Member Author

karalabe commented Mar 8, 2017

@fjl Done, PTAL

pow/ethash.go Outdated
if write {
	file, err = os.OpenFile(path, os.O_RDWR, os.ModePerm)
} else {
	file, err = os.OpenFile(path, os.O_RDONLY, os.ModePerm)
Contributor

os.ModePerm is supposed to be used as a mask. Its value is 0777, so all files will be created with read/write/execute permission for all users. It would be better to use a sane mode like 0644.

pow/ethash.go Outdated
if err != nil {
	file.Close()
	return nil, nil, nil, err
}
Contributor

The logic is more complicated than it needs to be. You can just do:

flag, mapflag := os.O_RDONLY, mmap.RDONLY
if write {
    flag, mapflag = os.O_RDWR, mmap.RDWR
}
file, err := os.OpenFile(path, flag, 0644)
if err != nil {
    return nil, nil, nil, err
}
mem, err := mmap.Map(file, mapflag, 0)
if err != nil {
    file.Close()
    return nil, nil, nil, err
}

@karalabe
Member Author

karalabe commented Mar 8, 2017

@fjl Done (squashed the last few commits), PTAL

pow/ethash.go Outdated
	return nil, nil, nil, err
}
generator(buffer)

Contributor

Please add an error-checked call to mem.Flush() here.

Member Author

Fixed. PTAL

pow/ethash.go Outdated
if err = dump.Truncate(int64(size)); err != nil {
	return nil, nil, nil, err
}
dump.Close()
Contributor

@fjl fjl Mar 9, 2017

Now that I look at this again, it seems weird that you're closing the file and reopening it just below. You can reuse the file that's already open.

Member Author

Fixed. PTAL

pow/ethash.go Outdated
if err != nil {
	logger.Error("Failed generate mapped ethash dataset", "err", err)

	d.dataset = make([]uint32, dsize/2)
Contributor

@fjl fjl Mar 9, 2017

I'm not sure I like how errors are handled here. If mmap fails because of an out of memory condition, this allocation will crash the process. We could avoid that by always using mmap. If no file is needed (path is "") the mmap call can use the ANON option to allocate a non-file-backed buffer. If that fails, Search/Verify simply can't do any work.

Member Author

People will not use in-memory DAGs anyway; this is only useful for testing, and there we don't really care.

Contributor

@fjl fjl Mar 9, 2017

The issue is that it will always try to generate it in memory if mmap fails. This means that the OOM crashes linked in the PR description are not fixed at all.

Contributor

@fjl fjl Mar 9, 2017

What I'm trying to say: Go handles allocation failures by crashing. This is usually OK, but hurts if the program deals with large buffers like we do here. Using mmap means we can handle allocation errors gracefully, without crashing, regardless of whether the memory is file-backed or not.

Member Author

This would require extra code to track that generation failed and handle that in Search/Verify. Given that there's no meaningful way to get out of this problem, we might as well crash :P

@fjl
Contributor

fjl commented Mar 9, 2017

Will merge when CI is done.

Contributor

@fjl fjl left a comment

The file names look compatible now. DAG for epoch 0 is identical to the one generated by libethash.

@fjl fjl merged commit f3579f6 into ethereum:master Mar 9, 2017
4 participants