
cmd, core, eth, les, node: chain freezer on top of db rework #19244

Merged
merged 9 commits into ethereum:master May 16, 2019

Conversation

7 participants
@karalabe
Member

commented Mar 8, 2019

This PR implements a long-desired feature: moving ancient chain segments out of the active database into immutable append-only flat files.

There are a few goals of this new mechanism:

By storing less data in the active key/value database, compactions should in theory be a lot faster since there's less to shuffle around. In practice we already prefixed chain items with their block numbers, so compactions might not benefit too much in themselves.
By storing less data in the active key/value database, the same amount of cache and file descriptor allowance should provide better performance, since more state trie nodes can be kept in active memory.
By moving the ancient chain items into their own flat files, we can split the database into an active set requiring a fast SSD disk and an immutable set for which a slow HDD disk suffices. This should permit more people to run Ethereum nodes without huge SSDs (--datadir.ancient=/path/on/hdd).
By keeping ancient chain items in flat files, we can piggyback on the operating system's built-in memory manager and file cache, allowing Geth to potentially max out available memory without actually hogging it. This should reduce the strain on Go's GC and the amount of memory used by Geth, while at the same time increasing overall system stability.
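The goals above boil down to a two-tier read path: lookups consult the immutable ancient store first and fall back to the active key/value database. A minimal sketch of that idea (the types and function names here are illustrative, not the real geth API):

```go
package main

import "fmt"

// Illustrative stand-ins for the two storage tiers: frozen items live in
// flat files on slow storage, recent items in the fast key/value store.
type ancientStore map[uint64][]byte // frozen items, keyed by block number
type activeDB map[uint64][]byte     // recent items, keyed by block number

// readBody checks the cheap immutable store before the active database.
func readBody(frozen ancientStore, active activeDB, number uint64) ([]byte, bool) {
	if blob, ok := frozen[number]; ok {
		return blob, true // served from the ancient flat files (HDD is fine)
	}
	blob, ok := active[number] // served from the active database (SSD)
	return blob, ok
}

func main() {
	frozen := ancientStore{1: []byte("old-body")}
	active := activeDB{100: []byte("new-body")}
	if blob, ok := readBody(frozen, active, 1); ok {
		fmt.Printf("%s\n", blob)
	}
}
```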

@@ -51,6 +54,24 @@ func DeleteCanonicalHash(db ethdb.Deleter, number uint64) {
}
}

// readAllHashes retrieves all the hashes assigned to blocks at a certain heights,

@ligi

ligi Mar 8, 2019

Member
Suggested change
// readAllHashes retrieves all the hashes assigned to blocks at a certain heights,
// readAllHashes retrieves all the hashes assigned to blocks at certain heights,
@@ -22,10 +22,44 @@ import (
"github.com/ethereum/go-ethereum/ethdb/memorydb"
)

// freezerdb is a databse wrapper that enabled freezer data retrievals.

@ligi

ligi Mar 8, 2019

Member
Suggested change
// freezerdb is a databse wrapper that enabled freezer data retrievals.
// freezerdb is a databse wrapper that enables freezer data retrievals.
// freezerBlockGraduation is the number of confirmations a block must achieve
// before it becomes elligible for chain freezing. This must exceed any chain
// reorg depth, since the freezer also deletes all block siblings.
freezerBlockGraduation = 60000

@Matthalp

Matthalp Mar 8, 2019

Contributor

Consider making this a variable that is passed into the freezer so that its value can correspond to eth/downloader/MaxForkAncestry in some way.

freezer := &freezer{
tables: make(map[string]*freezerTable),
}
for _, name := range []string{"hashes", "headers", "bodies", "receipts", "diffs"} {

@Matthalp

Matthalp Mar 8, 2019

Contributor

Nit: IMHO "diffs" is an overloaded terminology. I would have used tds.

// reserving it for go-ethereum. This would also reduce the memory requirements
// of Geth, and thus also GC overhead.
type freezer struct {
tables map[string]*freezerTable // Data tables for storing everything

@Matthalp

Matthalp Mar 8, 2019

Contributor

I understand that using a map for the freezerTable adds some niceties like being able to iterate across the tables. However, I would consider making each freezer table an explicit member of the struct. It makes it immediately clear what the freezer is storing without having to look at newFreezer.

}
freezer.tables[name] = table
}
// Truncate all data tables to the same length

@Matthalp

Matthalp Mar 8, 2019

Contributor

Just giving my two wei: a method called truncateAllTableToEqualLength would remove this comment and decrease the cognitive load to know what the remaining code is doing.

@hadv

hadv Apr 26, 2019

Contributor

loving your usage of my two wei 👍

freezerBatchLimit = 30000
)

// freezer is an memory mapped append-only database to store immutable chain data

@Matthalp

Matthalp Mar 8, 2019

Contributor

Can you point me to where the memory mapping is happening? My understanding is that os.File is not guaranteed to be memory-mapped as its left up to the OS.

return nil
}

// Ancient retrieves an ancient binary blob from the append-only immutable files.

@Matthalp

Matthalp Mar 8, 2019

Contributor

It seems like the kind is around because each table is not coupled to the data it stores. This could be changed if it was placed within the ChainDB proposed in #19200 (as discussed previously).

"github.com/ethereum/go-ethereum/common"
"github.com/ethereum/go-ethereum/log"
"github.com/ethereum/go-ethereum/metrics"
"github.com/golang/snappy"

@Matthalp

Matthalp Mar 8, 2019

Contributor

Has there been any benchmarking to determine if snappy compression is helpful (e.g. low overhead, high data savings)? Perhaps it is useful for bodies and receipts (and headers?), but may not do much for the other pieces of data.

Have you considered making this an optional parameter for each freezerTable instance?

@holiman

holiman Mar 13, 2019

Contributor

From an earlier test

16 blocks.

  • as one binary file: uncompressed 332K, compressed 254K
  • as 16 individually compressed files: 257K

Afaik, we haven't tested that on headers

@karalabe

karalabe Mar 13, 2019

Author Member

(chart image)

@karalabe

karalabe Mar 13, 2019

Author Member

Though we cut off a lot of redundant data from the receipts, so I expect snappy to be less effective since the charts were plotted.

@Matthalp

Matthalp Mar 13, 2019

Contributor

Storage-wise it seems worthwhile. Is the decompression overhead for reads something to consider?

for {
// Retrieve the freezing threshold. In theory we're interested only in full
// blocks post-sync, but that would keep the live database enormous during
// dast sync. By picking the fast block, we still get to deep freeze all the

@Matthalp

Matthalp Mar 8, 2019

Contributor

s/dast/fast

}
// Inject all the components into the relevant data tables
if err := f.tables["hashes"].Append(f.frozen, hash[:]); err != nil {
log.Error("Failed to deep freeze hash", "number", f.frozen, "hash", hash, "err", err)

@Matthalp

Matthalp Mar 8, 2019

Contributor

It seems like that if one of the subsequent freezerTable.Appends fails and then the import is retried (without reopening the freezer) then this will result in a panic since the entry will already have been written? Is that the failure model you want?
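One way to make a retried freeze harmless, in the spirit of the concern above, is to skip appends whose index is already below the table's item count and reject gaps. This is a hypothetical guard, not code from the PR; `items` stands in for the table's item counter:

```go
package main

import "fmt"

// appendOnce is a hypothetical idempotent-append helper: an index that
// was already written (by a partially failed earlier attempt) is skipped
// instead of panicking, and out-of-order writes are rejected.
func appendOnce(items *uint64, want uint64, push func()) error {
	switch {
	case want < *items:
		return nil // already written by an earlier attempt, nothing to do
	case want > *items:
		return fmt.Errorf("append gap: have %d items, want index %d", *items, want)
	}
	push()
	*items++
	return nil
}

func main() {
	var items uint64
	var data []string
	for _, idx := range []uint64{0, 1, 1} { // the second write of index 1 is a retry
		if err := appendOnce(&items, idx, func() { data = append(data, "blob") }); err != nil {
			panic(err)
		}
	}
	fmt.Println(len(data)) // prints 2: the retry was skipped
}
```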

if n := len(ancients); n > 0 {
context = append(context, []interface{}{"hash", ancients[n-1]}...)
}
log.Info("Deep froze chain segment", context...)

@Matthalp

Matthalp Mar 8, 2019

Contributor

This is another example of something I would have extracted out to not distract from the core functionality in this method.

(Resolved review thread on core/rawdb/freezer_table.go, outdated)

@karalabe karalabe force-pushed the karalabe:freezer-2 branch from bc733f7 to 1aa8834 Mar 26, 2019

@Matthalp
Contributor

left a comment

Left a few comments. They really center around two points:
(1) Using atomic within code regions that are already guarded by locks
(2) File handle management. It seems like the freezer will have quite a few file handles open across its tables, which may compete with LevelDB's handle budget unless we're careful (which I know you are well aware of, since you raised the point to me a while back)

errOutOfBounds = errors.New("out of bounds")
)

// indexEntry contains the number/id of the file that the data resides in, aswell as the

@Matthalp

Matthalp Mar 26, 2019

Contributor

s/aswell/as well/

offset uint32 // stored as uint32 ( 4 bytes)
}

const indexEntrySize = 6

@Matthalp

Matthalp Mar 26, 2019

Contributor

Nit: move const to the top of the file

@karalabe

karalabe Mar 27, 2019

Author Member

Whilst I generally agree with that, this number depends on the above type declaration. Moving it elsewhere would give the false sense that it can be modified freely.

// offset within the file to the end of the data
// In serialized form, the filenum is stored as uint16.
type indexEntry struct {
filenum uint32 // stored as uint16 ( 2 bytes)

@Matthalp

Matthalp Mar 26, 2019

Contributor

Is the index's storage footprint such a concern that you have to use uint[16|32] instead of uint64? Using uint64 seems simpler, consistent with the BlockChain API, and is the common word size in modern processors. It just feels like an additional piece of context to remember without having any substantial impact on a running system.

@karalabe

karalabe Mar 27, 2019

Author Member

In-memory wise it doesn't matter tbh, it could even be a plain int. On-disk storage wise we know it won't ever exceed uint16, so no point in wasting more disk space on zeroes.

@Matthalp

Matthalp Mar 27, 2019

Contributor

I hear you. Just figured I would point it out.
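The compact on-disk layout being discussed (filenum as uint16, offset as uint32, 6 bytes total) can be sketched as follows. Field names follow the diff; the big-endian byte order is an assumption for this illustration:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// indexEntry mirrors the struct in the diff: in memory both fields are
// uint32, but on disk filenum takes 2 bytes and offset takes 4.
const indexEntrySize = 6

type indexEntry struct {
	filenum uint32 // stored as uint16 ( 2 bytes)
	offset  uint32 // stored as uint32 ( 4 bytes)
}

// append serializes the entry into its 6-byte form and appends it to b.
func (e indexEntry) append(b []byte) []byte {
	buf := make([]byte, indexEntrySize)
	binary.BigEndian.PutUint16(buf[:2], uint16(e.filenum))
	binary.BigEndian.PutUint32(buf[2:], e.offset)
	return append(b, buf...)
}

// unmarshal decodes a 6-byte blob back into the entry.
func (e *indexEntry) unmarshal(b []byte) {
	e.filenum = uint32(binary.BigEndian.Uint16(b[:2]))
	e.offset = binary.BigEndian.Uint32(b[2:])
}

func main() {
	in := indexEntry{filenum: 3, offset: 123456}
	var out indexEntry
	out.unmarshal(in.append(nil))
	fmt.Println(out == in) // prints true
}
```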


// newTable opens a freezer table with default settings - 2G files and snappy compression
func newTable(path string, name string, readMeter metrics.Meter, writeMeter metrics.Meter) (*freezerTable, error) {
return newCustomTable(path, name, readMeter, writeMeter, 2*1000*1000*1000, false)

@Matthalp

Matthalp Mar 26, 2019

Contributor

Nit: Consider making this a constant. Also should this be 2 * 1024 * 1024 * 1024?

@karalabe

karalabe Mar 27, 2019

Author Member

Yeah, I did wonder about this. I think I'd go with your suggestion.

@karalabe

karalabe Mar 27, 2019

Author Member

We just need to take extra care not to ever overflow it, since many file systems + 32bit architectures are hard limited to that number.

@karalabe

karalabe Mar 27, 2019

Author Member

Ah, scratch 32 bit arch, that should handle 4GB (or 3 in case of windows, lol).

@holiman

holiman Mar 27, 2019

Contributor

Meh. ISO standards FTW

t.releaseFilesAfter(0, false)
// Open all except head in RDONLY
for i := uint32(0); i < t.headId; i++ {
if _, err = t.openFile(i, os.O_RDONLY); err != nil {

@Matthalp

Matthalp Mar 26, 2019

Contributor

Are there any issues with file handler management here? The scenario I'm imagining is when multiple tables each have a bunch of files open (and LevelDB is also active).

@karalabe

karalabe Mar 27, 2019

Author Member

The current historical chain is about 80GB in size, so that's +- 40 fds. These will probably be opened either way if someone syncs from us. I think 40 is safe, but if we grow we probably want to eventually add fd management. That seems an overcomplication at this point however.

@Matthalp

Matthalp Mar 27, 2019

Contributor

Agree. I was just asking to serve as a sanity check.

t.lock.Lock()
defer t.lock.Unlock()
// If out item count is corrent, don't do anything
if atomic.LoadUint64(&t.items) <= items {

@Matthalp

Matthalp Mar 26, 2019

Contributor

Is the atomic.LoadUint64 needed here if the write lock has already been acquired?

@karalabe

karalabe Mar 27, 2019

Author Member

The locking mechanism was designed to allow reading and appending to run concurrently (since append just pushes some data that readers are unaware of, and at the very end bumps the item count, making it available).

If append does not take the write lock and just uses atomics, then the write lock here doesn't protect the items counter. Append will only obtain a write lock if it needs to switch to a new file. That said, Append and Truncate may not really be safe to use concurrently, as they might race on what the final value of items should be.

@Matthalp

Matthalp Mar 27, 2019

Contributor

If it were me I would be more conservative and lock more frequently -- especially for writes. I feel like any slow down would be negligible and it ensures the code is more safe. Just my two wei.

}

// releaseFilesAfter closes all open files with a higher number, and optionally also deletes the files
func (t *freezerTable) releaseFilesAfter(num uint32, remove bool) {

@Matthalp

Matthalp Mar 26, 2019

Contributor

Nit: I find the remove option to be misleading for a method about releasing. I would consider adding a second method that makes this more explicit. They can still use a common helper function.

return errClosed
}
// Ensure only the next item can be written, nothing else
if atomic.LoadUint64(&t.items) != item {

@Matthalp

Matthalp Mar 26, 2019

Contributor

Is the atomic.LoadUint64 necessary if you've already acquired a read lock?

@karalabe

karalabe Mar 27, 2019

Author Member

Yes, Append can run concurrently (since it doesn't change internals, just adds something to the end of the file and then atomically marks it available).

@karalabe karalabe force-pushed the karalabe:freezer-2 branch 2 times, most recently from bd9760b to 5810ffe Mar 27, 2019

@karalabe

Member Author

commented Mar 27, 2019

Nitpick to do: make data file numbers zero-padded to 3 digits.

TL;DR

-rw-r--r-- 1 root root 1999998703 Mar 27 16:37 receipts.0.cdat
-rw-r--r-- 1 root root 1999997444 Mar 27 16:54 receipts.1.cdat
-rw-r--r-- 1 root root 1999999967 Mar 27 19:48 receipts.10.cdat
-rw-r--r-- 1 root root 1999999260 Mar 27 20:08 receipts.11.cdat
-rw-r--r-- 1 root root 1999994597 Mar 27 20:30 receipts.12.cdat
-rw-r--r-- 1 root root 1999993520 Mar 27 20:51 receipts.13.cdat
-rw-r--r-- 1 root root 1999992762 Mar 27 21:12 receipts.14.cdat
-rw-r--r-- 1 root root 1999990541 Mar 27 21:33 receipts.15.cdat
-rw-r--r-- 1 root root 1999988154 Mar 27 21:53 receipts.16.cdat
-rw-r--r-- 1 root root 1999995262 Mar 27 22:14 receipts.17.cdat
-rw-r--r-- 1 root root 1273323457 Mar 27 22:27 receipts.18.cdat
-rw-r--r-- 1 root root 1999989717 Mar 27 17:07 receipts.2.cdat
-rw-r--r-- 1 root root 1999995186 Mar 27 17:21 receipts.3.cdat
-rw-r--r-- 1 root root 1999999174 Mar 27 17:40 receipts.4.cdat
-rw-r--r-- 1 root root 1999990962 Mar 27 18:03 receipts.5.cdat
-rw-r--r-- 1 root root 1999995600 Mar 27 18:26 receipts.6.cdat
-rw-r--r-- 1 root root 1999997331 Mar 27 18:47 receipts.7.cdat
-rw-r--r-- 1 root root 1999993270 Mar 27 19:08 receipts.8.cdat
-rw-r--r-- 1 root root 1999998447 Mar 27 19:28 receipts.9.cdat
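The listing above shows the problem: with plain %d numbers the files sort lexicographically, so receipts.10.cdat lands before receipts.2.cdat. Zero-padding (here to four digits, purely for illustration) restores numeric order:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedNames generates n data-file names with the given format verb and
// returns them in lexicographic (directory-listing) order.
func sortedNames(format string, n int) []string {
	names := make([]string, n)
	for i := range names {
		names[i] = fmt.Sprintf(format, i)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Unpadded: 0, 1, 10, 11, 2, 3, ... so index 2 is receipts.10.cdat.
	fmt.Println(sortedNames("receipts.%d.cdat", 12)[2])
	// Padded: numeric and lexicographic order agree.
	fmt.Println(sortedNames("receipts.%04d.cdat", 12)[2])
}
```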
@holiman

Contributor

commented Mar 28, 2019

Here are some charts from the fast-sync test run with @rjl493456442's changes.

This is the master. It has a very intense download of blocks/headers/state for about 4 hours, then a long drawn-out state download which takes roughly another 8 hours.
(Datadog chart: EXP (52.56.200.108) vs Master (52.56.85.219))

For the experimental, it looks a bit different. It has an 'intense' download lasting six hours, followed by one hour of less intense download, and finishes the fast sync in a total of 7 hours.
(Datadog chart: EXP (52.56.200.108) vs Master (52.56.85.219))

So 7 hours instead of 12. I'm not quite sure why.

The malloc/gc overhead was more stable on experimental:

(Datadog chart: EXP (52.56.200.108) vs Master (52.56.85.219))

@holiman

Contributor

commented Mar 28, 2019

More stats on the data

root@mon07:/datadrive/geth/geth/chaindata# du -h .
88G    ./ancient
153G    .
root@mon06:/datadrive/geth/geth/chaindata# du -h .
154G    .

So the master leveldb stores 154G of data, whereas the experimental has 65G in leveldb and 88G in the ancient store

@Matthalp

Contributor

commented Mar 28, 2019

@holiman The results make sense to me. There is less pressure on LevelDB since it's now responsible for handling half the amount of data as before, which reduces its operation overhead (e.g. compaction). The LevelDB storage hierarchy is also organized by powers of 10, so going from ~150 GB to half that means that one less level is needed.

@Matthalp

Contributor

commented Mar 28, 2019

Nonetheless it's probably worth running it a few times. I'm not sure the exact benchmark setup but if a different peer is being used for each sync then that could introduce some variation as well.

@holiman

Contributor

commented Mar 28, 2019

Yeah - we did just that. We swapped the machines, wiped the node-ids and ran again. It's still running, but this time it looks like the master will finish ahead of this PR. So that was probably a one-off thing - the big difference for statesync is the quality of your peers.

@holiman

Contributor

commented Mar 29, 2019

So on the second run, they were roughly equal -- this PR making a fast sync in ~10h, the master half an hour faster. Also, on the second run, the difference in mallocs and gc was not as pronounced:

(Datadog chart: EXP (52.56.200.108) vs Master (52.56.85.219))

Note: since we switched machines, the colours are swapped in that chart; this PR is the blue line.

@karalabe karalabe force-pushed the karalabe:freezer-2 branch from 5810ffe to ffbf258 Apr 24, 2019

@@ -367,9 +367,12 @@ func exportPreimages(ctx *cli.Context) error {

func copyDb(ctx *cli.Context) error {
// Ensure we have a source chain directory to copy
if len(ctx.Args()) != 1 {
if len(ctx.Args()) < 1 {
utils.Fatalf("Source chaindata directory path argument missing")

@holiman

holiman Apr 24, 2019

Contributor
Suggested change
utils.Fatalf("Source chaindata directory path argument missing")
utils.Fatalf("Source directory paths missing (chaindata and ancient)")
utils.Fatalf("Source chaindata directory path argument missing")
}
if len(ctx.Args()) < 2 {

@holiman

holiman Apr 24, 2019

Contributor

Don't we by default put ancient inside the chain directory? Shouldn't we default to that here too?

@karalabe

karalabe Apr 24, 2019

Author Member

By default yes, but if we want to copydb from a database where the ancient is outside?

@@ -1569,7 +1576,7 @@ func MakeChainDatabase(ctx *cli.Context, stack *node.Node) ethdb.Database {
if ctx.GlobalString(SyncModeFlag.Name) == "light" {
name = "lightchaindata"
}
chainDb, err := stack.OpenDatabase(name, cache, handles, "")
chainDb, err := stack.OpenDatabaseWithFreezer(name, cache, handles, "", "")

@holiman

holiman Apr 24, 2019

Contributor

Shouldn't you pass the cfg.DatabaseFreezer in here somehow?

@karalabe karalabe requested a review from zsfelfoldi as a code owner Apr 25, 2019

@karalabe karalabe force-pushed the karalabe:freezer-2 branch from 6828915 to ebca031 Apr 25, 2019

@karalabe karalabe added this to the 1.9.0 milestone Apr 25, 2019

return i, fmt.Errorf("containing header #%d [%x…] unknown", block.Number(), block.Hash().Bytes()[:4])
}
// Compute all the non-consensus fields of the receipts
if err := receiptChain[i].DeriveFields(bc.chainConfig, block.Hash(), block.NumberU64(), block.Transactions()); err != nil {

@rjl493456442

rjl493456442 Apr 26, 2019

Member

We don't need to derive the receipt fields here since they will be discarded during the RLP encoding anyway.

continue
}
// Compute all the non-consensus fields of the receipts
if err := receiptChain[i].DeriveFields(bc.chainConfig, block.Hash(), block.NumberU64(), block.Transactions()); err != nil {

@rjl493456442

rjl493456442 Apr 26, 2019

Member

We don't need to derive the receipt fields here since they will be discarded during the RLP encoding anyway.

t.lock.RLock()
}

defer t.lock.RUnlock()

@rjl493456442

rjl493456442 Apr 26, 2019

Member

The lock is used to protect map[uint32]*os.File as well as head and index file descriptor. Since os.File.Write is not thread safe, should we use Lock here instead of RLock?

@holiman

holiman Apr 26, 2019

Contributor

I think maybe we should add a freezer.md where we explain things like the concurrency assumptions and the data layout. To answer the question, I think no; we do not cater for multiple callers of Append, the aim is to prevent concurrent calling of Append/Truncate.

// feed us valid blocks until head. All of these blocks might be written into
// the ancient store, the safe region for freezer is not enough.
if d.checkpoint != 0 && d.checkpoint > MaxForkAncestry+1 {
d.ancientLimit = height - MaxForkAncestry - 1

@rjl493456442

rjl493456442 Apr 26, 2019

Member

d.ancientLimit = height - MaxForkAncestry - 1 -> d.ancientLimit = d.checkpoint - MaxForkAncestry - 1
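The corrected computation can be sketched as follows. This is a hedged illustration: the checkpoint branch follows the diff and the suggested fix, while the fallback-to-height branch and the MaxForkAncestry value are assumptions for the example.

```go
package main

import "fmt"

// MaxForkAncestry's value here is illustrative only; the real constant
// lives in eth/downloader.
const MaxForkAncestry = 90000

// ancientLimit returns the highest block number eligible for the ancient
// store: with a trusted checkpoint the limit derives from the checkpoint
// (the fix rjl493456442 points out), otherwise from the chain height.
func ancientLimit(checkpoint, height uint64) uint64 {
	if checkpoint != 0 && checkpoint > MaxForkAncestry+1 {
		return checkpoint - MaxForkAncestry - 1
	}
	if height > MaxForkAncestry+1 {
		return height - MaxForkAncestry - 1
	}
	return 0 // chain too short, keep everything in the active database
}

func main() {
	fmt.Println(ancientLimit(8000000, 8100000)) // 8000000 - 90000 - 1 = 7909999
}
```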

@holiman

Contributor

commented Apr 26, 2019

Data taken from mon07, comparing the effectiveness of snappy versus storing the plain encoding.
(charts: receipts, headers, hashes, diffs, bodies)

Sorry for the crappiness of the labeling, but aside from that, in order of size:

  • bodies: 76 e9 versus 55 e9 (snappy wins)
  • receipts: 61 e9 versus 22 e9 (snappy wins)
  • hashes: 260 e6 versus 240 e6 (plain would be smaller)
  • difficulty: 91 e6 versus 75 e6 (plain would be smaller)
  • headers: 4.1 e9 versus 3.2 e9 (snappy wins)

Also, the index for each type takes roughly 44MB on disk.
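These measurements led to per-table compression flags (a later commit in this PR disables compression for hashes and difficulties). A toy sketch of that decision; the table names follow the PR, while the helper and map are illustrative:

```go
package main

import "fmt"

// uncompressedTables lists the tables where snappy loses per the
// measurements: hashes are fixed-size pseudo-random data and barely
// compress, and total difficulties are tiny values.
var uncompressedTables = map[string]bool{
	"hashes": true,
	"diffs":  true,
}

// disableSnappy reports whether a table should be stored uncompressed.
func disableSnappy(table string) bool {
	return uncompressedTables[table]
}

func main() {
	for _, name := range []string{"hashes", "headers", "bodies", "receipts", "diffs"} {
		fmt.Println(name, "compressed:", !disableSnappy(name))
	}
}
```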

@@ -218,6 +219,39 @@ func NewBlockChain(db ethdb.Database, cacheConfig *CacheConfig, chainConfig *par
if err := bc.loadLastState(); err != nil {
return nil, err
}
if frozen, err := bc.db.Ancients(); err == nil && frozen >= 1 {

@Matthalp

Matthalp Apr 30, 2019

Contributor

Nit: frozen > 0 is slightly clearer.

rawdb.DeleteBody(db, hash, num)
rawdb.DeleteReceipts(db, hash, num)
}
// Todo(rjl493456442) txlookup, bloombits, etc

@Matthalp

Matthalp Apr 30, 2019

Contributor

Nit: s/Todo/TODO (and possibly end in a ':'):

// TODO(rjl493456442): txlookup, bloombits, etc
//
// Notably, it can happen that system crashes without truncating the ancient data
// but the head indicator has been updated in the active store. Regarding this issue,
// system will self recovery by truncating the extra data during the setup phase.

@Matthalp

Matthalp Apr 30, 2019

Contributor

s/recovery/recover/


case *number-freezerBlockGraduation <= f.frozen:
log.Debug("Ancient blocks frozen already", "number", *number, "hash", hash, "frozen", f.frozen)
time.Sleep(freezerRecheckInterval)

@Matthalp

Matthalp Apr 30, 2019

Contributor

Should this be an error?

@holiman

Contributor

commented May 7, 2019

TODO, as far as I know

  • Make sure that if you use an ancient database which already has data, but an otherwise clean leveldb, things at least 'work'.

  • Optional: Make sure that if you use an ancient database which already has data, but an otherwise clean leveldb, you don't re-download things that already exist (but do properly truncate on errors)

@adamschmideg

Collaborator

commented May 7, 2019

Waiting for karalabe#15 to be merged

@Matthalp

Contributor

commented May 10, 2019

@karalabe Did you consider using separate LevelDB instances to store blockchain constructs (e.g. what each Table is storing) rather than using a freezer? I think it will accomplish most of the goals of this implementation without all of the implementation complexity. It seems like it would be pretty straightforward to do inside of #19200. The storage overhead would probably be negligible and the storage bottleneck for block processing will still be in the LevelDB instance storing trie data.

@karalabe karalabe force-pushed the karalabe:freezer-2 branch 2 times, most recently from a971ccd to c3bb1b3 May 14, 2019

karalabe and others added some commits Mar 8, 2019

freezer: implement split files for data
* freezer: implement split files for data

* freezer: add tests

* freezer: close old head-file when opening next

* freezer: fix truncation

* freezer: more testing around close/open

* rawdb/freezer: address review concerns

* freezer: fix minor review concerns

* freezer: fix remaining concerns + testcases around truncation

* freezer: docs

* freezer: implement multithreading

* core/rawdb: fix freezer nitpicks + change offsets to uint32

* freezer: preopen files, simplify lock constructs

* freezer: delete files during truncation
all: integrate the freezer with fast sync
* all: freezer style syncing

core, eth, les, light: clean up freezer relative APIs

core, eth, les, trie, ethdb, light: clean a bit

core, eth, les, light: add unit tests

core, light: rewrite setHead function

core, eth: fix downloader unit tests

core: add receipt chain insertion test

core: use constant instead of hardcoding table name

core: fix rollback

core: fix setHead

core/rawdb: remove canonical block first and then iterate side chain

core/rawdb, ethdb: add hasAncient interface

eth/downloader: calculate ancient limit via cht first

core, eth, ethdb: lots of fixes

* eth/downloader: print ancient disable log only for fast sync
freezer: disable compression on hashes and difficulties (#14)
* freezer: disable compression on hashes and difficulties

* core/rawdb: address review concerns

* core/rawdb: address review concerns
core, cmd, vendor: fixes and database inspection tool (#15)
* core, eth: some fixes for freezer

* vendor, core/rawdb, cmd/geth: add db inspector

* core, cmd/utils: check ancient store path forceily

* cmd/geth, common, core/rawdb: a few fixes

* cmd/geth: support windows file rename and fix rename error

* core: support ancient plugin

* core, cmd: streaming file copy

* cmd, consensus, core, tests: keep genesis in leveldb

* core: write txlookup during ancient init

* core: bump database version

@karalabe karalabe force-pushed the karalabe:freezer-2 branch from c3bb1b3 to 536b3b4 May 16, 2019

@karalabe karalabe removed the status:triage label May 16, 2019

@karalabe karalabe force-pushed the karalabe:freezer-2 branch from 79c0a43 to 996e2de May 16, 2019

@karalabe

Member Author

commented May 16, 2019

@Matthalp This PR did get a bit more complex than I anticipated, though that's because we decided to split the data files into 2GB chunks. Without that the code was really really simple.

That said, the operations we are doing are relatively straightforward. The complexity comes mostly from optimizations. With our current code we have a very nice control over how much resources we waste (0 memory, 10-20 file descriptors vs. caches + files for leveldb), and we control how much computational overhead we have (0 vs. compaction for leveldb).

Yes, leveldb might be a tad nicer, but by rolling our own files we can take advantage of all the properties of the data we are storing and optimize for our specific use case.

@karalabe karalabe force-pushed the karalabe:freezer-2 branch from b50a100 to 9eba3a9 May 16, 2019

@karalabe karalabe merged commit f5d89cd into ethereum:master May 16, 2019

0 of 2 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
continuous-integration/travis-ci/pr The Travis CI build failed

// Always keep genesis block in active database.
if b.NumberU64() != 0 {
deleted = append(deleted, b)

@holiman

holiman May 22, 2019

Contributor

This can cause a huge spike in memory; our monitoring node spiked at nearly 40G while shoving four million blocks from leveldb to ancients
