
new blooms database #8712

Merged
merged 44 commits into from Jun 20, 2018

Conversation

debris
Collaborator

@debris debris commented May 25, 2018

This PR introduces a new database for header blooms.

Why?

The only time we access blooms is when we are filtering logs. Filtering logs requires iterating over all blooms in the database, and this is very inefficient with RocksDB. As the blockchain grew bigger and bigger, it took more and more time to filter all blooms starting from the genesis block.

How?

I built an increment-only, sequential database for storing blooms. It consists of three database layers: top, mid, and bot.

  • the bot layer contains all header blooms
  • the mid layer contains blooms, each created from 16 consecutive blooms of the bot layer
  • the top layer contains blooms, each created from 16 consecutive blooms of the mid layer

Blooms at each layer are placed next to each other on disk, so it is very efficient to find them (~O(1)) and to iterate over them (~O(1) per step). It's a huge improvement compared to the previous database implementation, where every lookup and iteration step was probably O(log n).

size of the database (for 1,000,000 blocks)

  • bot layer - 256 MB (1,000,000 * size_of_bloom (256 bytes))
  • mid layer - 16 MB (bot / 16)
  • top layer - 1 MB (mid / 16)
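
As a rough illustration of why lookups are ~O(1): a minimal sketch (not the actual blooms-db code) of how such a layered layout maps a block number straight to byte offsets in the three files, assuming 256-byte blooms and 16-way grouping at each level.

    const BLOOM_SIZE: u64 = 256; // one bloom is 256 bytes
    const GROUP: u64 = 16;       // 16 blooms are aggregated into one bloom of the next layer

    /// Byte offsets of the blooms covering `block` in the bot, mid and top files.
    fn offsets(block: u64) -> (u64, u64, u64) {
        let bot_index = block;                   // one bloom per block
        let mid_index = block / GROUP;           // one bloom per 16 blocks
        let top_index = block / (GROUP * GROUP); // one bloom per 256 blocks
        (bot_index * BLOOM_SIZE, mid_index * BLOOM_SIZE, top_index * BLOOM_SIZE)
    }

    fn main() {
        // Finding the blooms for any block touches one bloom per layer, each at a
        // directly computable offset - hence the ~O(1) lookup.
        let (bot, mid, top) = offsets(1_000_000);
        println!("bot: {}, mid: {}, top: {}", bot, mid, top);
    }

Filtering can then scan the top layer and only descend into the 256-block and 16-block ranges whose aggregated blooms match.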

Other issues addressed

todo list (in this PR)

  • new blooms-db
  • new database interface for parity (aka BlockChainDB)
  • integrate new db interface with parity and tests (😓 )
  • BlockChainDB should not expose &RwLock in the interface
  • fix and tests for blooms-db flushing
  • migrate traces to the new blooms-db
  • remove old bloomchain crate
  • filter multiple bloom possibilities at the same time
  • restore blooms after snapshot
  • write database migration script

benchmarks

  • So far I have run only a single benchmark for blooms-db. The results are promising, but I believe they do not fully show how much everything has improved.

    benchmark from old stackoverflow thread

    time curl -X POST --data '{"id":8,"jsonrpc":"2.0","method":"eth_getLogs","params":[{"fromBlock":"0x0","toBlock":"0xf0be2", "address": "0x33990122638b9132ca29c723bdf037f1a891a70c"}]}' -H "Content-Type: application/json" http://127.0.0.1:8545 >> /dev/null
    

    old parity:

    0.77s
    

    this branch:

    0.01s
    

This PR should also reduce disk writes by roughly 70 GB when doing a full sync:

3 * 5,670,000 * 16 * 256 bytes = 69.67296 GB
  • 3 - bloom levels
  • 5,670,000 - number of blocks
  • 16 - currently blooms are stored in groups of 16, so every insert rewrites a whole group
  • 256 - size of a single bloom in bytes

@debris debris added A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. M4-core ⛓ Core client code / Rust. labels May 25, 2018
@tjayrush

A couple of questions.

  1. Won't the higher-level blooms, given that they are the same width and are getting filled with more and more data, get quickly saturated? In effect, that would make them less useful, because they would tend to report higher numbers of false positives. Did you think about widening the mid-layer and top-layer blooms so as to keep a more consistent false-positive rate?

  2. How hard would it be to enhance the blooms to note every single address in a block (as opposed to the current list which, I think, doesn't include every address)? The reason I'm asking is that I want to use the bloom to answer more questions than just "Is there an event on my address in this block?" I want the blooms to answer "Is my address involved in any possible way in this block?" With the ability to answer that question, one could easily build lists of all transactions an account was ever involved in.

I wrote about this here: Adaptive Enhanced Bloom Filters.
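
For context on the saturation concern: the standard false-positive estimate for a Bloom filter with m bits, k hash functions, and n inserted items (a general formula, not something specific to this PR) is

    p \approx \left(1 - e^{-kn/m}\right)^{k}

A mid- or top-level bloom accrues the entries of 16 lower-level blooms into the same 2048 bits (256 bytes), so m stays fixed while n grows roughly 16x per level and the false-positive rate rises accordingly; the trade-off discussed below is that checking these more saturated blooms is cheap, and a false positive only costs descending one level.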

@@ -1459,8 +1459,7 @@ mod tests {
use std::sync::Arc;
use rustc_hex::FromHex;
use hash::keccak;
use kvdb::{KeyValueDB, DBTransaction};
use kvdb_memorydb;
use kvdb::{DBTransaction};
Collaborator

Remove braces around a single import

@@ -33,7 +33,7 @@ use snapshot::io::{SnapshotReader, PackedWriter, PackedReader};
use tempdir::TempDir;
use rand::Rng;

use kvdb::{KeyValueDB, DBValue};
use kvdb::{DBValue};
Collaborator

Remove braces around a single import

@@ -28,8 +27,7 @@ use snapshot::io::{PackedReader, PackedWriter, SnapshotReader, SnapshotWriter};

use parking_lot::Mutex;
use snappy;
use kvdb::{KeyValueDB, DBTransaction};
use kvdb_memorydb;
use kvdb::{DBTransaction};
Collaborator

Remove braces around a single import

@@ -24,11 +24,11 @@ use ids::BlockId;
use snapshot::service::{Service, ServiceParams};
use snapshot::{self, ManifestData, SnapshotService};
use spec::Spec;
use test_helpers::{self, generate_dummy_client_with_spec_and_data};
use test_helpers::{generate_dummy_client_with_spec_and_data};
Collaborator

Remove braces around a single import

use test_helpers_internal::restoration_db_handler;

use io::IoChannel;
use kvdb_rocksdb::{Database, DatabaseConfig};
use kvdb_rocksdb::{DatabaseConfig};
Collaborator

Remove braces around a single import

@@ -19,7 +19,7 @@
use std::path::Path;
use std::sync::Arc;
use parking_lot::RwLock;
use kvdb::{KeyValueDB, KeyValueDBHandler};
use kvdb::{KeyValueDB};
Collaborator

Remove braces around a single import

@@ -108,18 +104,15 @@ impl<T> TraceDB<T> where T: DatabaseExtras {
let current_size = self.cache_size();

let mut traces = self.traces.write();
//let mut blooms = self.blooms.write();
let mut cache_manager = self.cache_manager.write();

cache_manager.collect_garbage(current_size, | ids | {
for id in &ids {
match *id {
Collaborator

Use if let instead of match for a single pattern?

@debris
Collaborator Author

debris commented May 27, 2018

@niklasad1 this PR is still in progress and quite far from being ready for review :)

@tjayrush

  1. Won't the higher level blooms, given that they are the same width and are getting filled with more and more data, get quickly saturated?

They will, especially the top-level blooms. But the cost of checking them is so low that it doesn't matter.

Currently we use exactly the same three-layer bloom filters for filtering events. The bottleneck is RocksDB I/O, and this PR is trying to address only that problem. :)

Did you think about widening the mid-layer, top-layer blooms so as to keep a more consistent false-positive rate?

Yes, that's a possible improvement.

How hard would it be to enhance the blooms to note every single address in a block (as opposed to the current list which, I think, doesn't include every address). The reason I'm asking is because I want to use the bloom to answer more questions than just "Is there an event on my address in this block."

You can use transaction traces exactly for that. They already utilize the same bloom filters.

@tjayrush

tjayrush commented May 27, 2018

@debris

You can use transaction traces exactly for that. They already utilize the same bloom filters.

The reason I'm suggesting adding every address to the bloom is to avoid having to request traces. If one is trying to do a full accounting of a particular address (or group of addresses), one needs to visit every trace, but if the blooms included all addresses involved in a block, one could use them to avoid a huge number of trace requests. Requesting traces (especially across the 2016 DDoS transactions) is brutal. Did you have a chance to glance at the paper I mentioned above? With that method, I can collect full lists of transactions per account about 100 times faster than reading every trace. The current blooms don't include every address, so they miss a lot of blocks if one wants a full list of transactions.

Also, one thing I found was that because the block-level blooms "roll up" the transaction-level blooms, two unfortunate things happen: (1) the block-level blooms are becoming saturated, so they report more false positives than you might want (this is exacerbated in the hierarchical scheme you suggest without widening the mid- and upper-level blooms), and (2) the receipt-level blooms are far too undersaturated and therefore take up a ton of space that isn't well utilized.

@5chdn 5chdn added this to the 1.12 milestone May 31, 2018
Collaborator Author

@debris debris left a comment

I think it's ready for review

fn open(&self, db_path: &Path) -> Result<Arc<BlockChainDB>, Error> {
let key_value = Arc::new(kvdb_rocksdb::Database::open(&self.config, &db_path.to_string_lossy())?);
let blooms_path = db_path.join("blooms");
let trace_blooms_path = db_path.join("trace_blooms");
Collaborator Author

I'm not sure if this is a good location for the blooms db. I'm open to any other suggestions.

@@ -1293,10 +1294,11 @@ impl snapshot::DatabaseRestore for Client {
let mut tracedb = self.tracedb.write();
self.importer.miner.clear();
let db = self.db.write();
db.restore(new_db)?;
db.key_value().restore(new_db)?;
Collaborator Author

Because blooms-db is placed in the same directory, it also gets moved. Maybe we should move this function out of the KeyValueDB interface to make it more descriptive.


// TODO: restore blooms properly
Collaborator Author

ah, I forgot to reopen the blooms db, will fix it

Collaborator

@debris Fixed? If so remove the comment 👋

Collaborator Author

not yet, that's the only remaining thing :)

Collaborator Author

done!

// Some(3) -> COL_EXTRA
// 3u8 -> ExtrasIndex::BlocksBlooms
// 0u8 -> level 0
let blooms_iterator = db.key_value()
Collaborator Author

from ethcore/src/blockchain/extras.rs

// Some(4) -> COL_TRACE
// 1u8 -> TraceDBIndex::BloomGroups
// 0u8 -> level 0
let trace_blooms_iterator = db.key_value()
Collaborator Author

from ethcore/src/trace/db.rs

Collaborator

@sorpaas sorpaas left a comment

LGTM overall. Just not sure about some aspects regarding the database consistency issues.

});

for (number, blooms) in blooms_iterator {
db.blooms().insert_blooms(number, blooms.iter())?;
Collaborator

insert_blooms is not atomic. If any of the top, mid, or bot file writes fails, the blooms db basically enters a corrupt state -- we will have top-level blooms saying something exists that cannot be found in mid or bot (true negative). So I think it may be better to just panic (expect) here, like we do in blockchain.rs?

Collaborator Author

I don't think it matters, even if it ends up in a corrupt state. On the next parity launch the migration will start from scratch; corrupted data will be overwritten and the database state will be consistent.

// constant forks may lead to an increased ratio of false positives in bloom filters,
// since we do not rebuild the top or mid level, but we should not be worried about that:
// most of the time events at block n(a) also occur at block n(b) or n+1(b)
self.top.accrue_bloom::<ethbloom::BloomRef>(pos.top, bloom)?;
Collaborator

Regarding the possible blooms-db true-negative corrupt situation: given that we only insert one bloom at a time, I think a simple solution may be to create a metadata file storing the current writing bloom position before attempting the insertion. Then, if we hit any error case below, the next time the program starts we would only need to rewrite that particular bloom position to fix the corruption.

This doesn't solve the on-disk consistency issue, though; otherwise we would need to flush within the for loop, which may be a bad performance penalty. So I'm not sure whether we would want this.
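
For illustration only, a minimal sketch of the marker-file idea proposed here (hypothetical helpers, not part of this PR): record the bloom position before the non-atomic insert, clear it after success, and on startup re-insert whatever position is still marked.

    use std::fs;
    use std::io;
    use std::path::Path;

    /// Record the block number we are about to insert (hypothetical helper).
    fn mark_pending(dir: &Path, block: u64) -> io::Result<()> {
        fs::write(dir.join("pending"), block.to_le_bytes())
    }

    /// Remove the marker once the insert has fully succeeded.
    fn clear_pending(dir: &Path) -> io::Result<()> {
        match fs::remove_file(dir.join("pending")) {
            Err(ref e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
            other => other,
        }
    }

    /// On startup: the position (if any) whose blooms must be re-inserted.
    fn pending(dir: &Path) -> io::Result<Option<u64>> {
        match fs::read(dir.join("pending")) {
            Ok(bytes) if bytes.len() == 8 => {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(&bytes);
                Ok(Some(u64::from_le_bytes(buf)))
            }
            Ok(_) => Ok(None),
            Err(ref e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }

    fn main() -> io::Result<()> {
        let dir = Path::new("."); // stand-in for the blooms directory
        mark_pending(dir, 42)?;   // about to insert blooms for block 42
        // ... the non-atomic bloom insert would happen here ...
        clear_pending(dir)?;
        assert_eq!(pending(dir)?, None);
        Ok(())
    }

As the reply below points out, the PR ultimately does not need this: bloom existence is not consensus-critical, and an interrupted insert is repaired anyway when the affected block is reimported on the next launch.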

Collaborator Author

@debris debris Jun 15, 2018

Then, if we hit any error case below, the next time the program starts we would only need to rewrite that particular bloom position to fix the corruption.

That's what I did initially. Then I thought about the problem again and realised that none of this matters. The existence of blooms is not consensus-critical. If one of the writes fails (or partially fails), the particular block will be reimported on the next launch of parity and the database will overwrite the corrupted data.

///
/// This database does not guarantee atomic writes.
pub struct Database {
database: Mutex<db::Database>,
Collaborator

Any reason we use a Mutex and not an RwLock? I think it would be safe for multiple parties to read blooms at the same time?

Collaborator Author

tl;dr

Reading a file mutates the object that represents the file stream, namely by altering the position. The very data being mutated is stored in kernel space, and simultaneous mutation won't cause the kernel any trouble because it is smart enough to synchronize accesses.
Still, two threads simultaneously reading the same file stream will break internal logic. (I guess even a single read() call on a BufReader could yield a non-contiguous slice.)

Collaborator Author

That's why I made the iterator function of db::Database take &mut self.
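
To illustrate the point, a hypothetical sketch (not the blooms-db implementation): reading a bloom means seeking within a shared File handle, which mutates the stream position, so the read method needs &mut self and the wrapper has to hand out exclusive access, which is exactly what a Mutex provides and an RwLock read guard does not.

    use std::fs::File;
    use std::io::{self, Read, Seek, SeekFrom};
    use std::sync::Mutex;

    const BLOOM_SIZE: usize = 256; // one bloom = 256 bytes, as in the PR description

    struct BloomFile {
        file: File,
    }

    impl BloomFile {
        // Reading seeks and therefore mutates the stream position, hence `&mut self`.
        fn read_bloom(&mut self, index: u64) -> io::Result<[u8; BLOOM_SIZE]> {
            self.file.seek(SeekFrom::Start(index * BLOOM_SIZE as u64))?;
            let mut bloom = [0u8; BLOOM_SIZE];
            self.file.read_exact(&mut bloom)?;
            Ok(bloom)
        }
    }

    // Even "read-only" callers need exclusive access to the handle,
    // so the field is guarded by a Mutex rather than an RwLock.
    struct Database {
        bot: Mutex<BloomFile>,
    }

    fn main() -> io::Result<()> {
        // "bot.bdb" is a made-up file name; create a small stand-in so the sketch runs.
        std::fs::write("bot.bdb", vec![0u8; BLOOM_SIZE * 4])?;
        let db = Database { bot: Mutex::new(BloomFile { file: File::open("bot.bdb")? }) };
        let _bloom = db.bot.lock().unwrap().read_bloom(2)?;
        Ok(())
    }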

Collaborator

Ah got it!

@5chdn
Contributor

5chdn commented Jun 18, 2018

Please add a ✔️ tick if you think this looks good.

@@ -152,7 +169,8 @@ pub trait BlockProvider {
}

/// Returns numbers of blocks containing given bloom.
fn blocks_with_bloom(&self, bloom: &Bloom, from_block: BlockNumber, to_block: BlockNumber) -> Vec<BlockNumber>;
fn blocks_with_bloom<'a, B, I, II>(&self, blooms: II, from_block: BlockNumber, to_block: BlockNumber) -> Vec<BlockNumber>
where BloomRef<'a>: From<B>, II: IntoIterator<Item = B, IntoIter = I> + Copy, I: Iterator<Item = B>, Self: Sized;
Collaborator

I think that each trait bound deserves its own line (to make it easier to read), e.g.:

    where
        BloomRef<'a>: From<B>,
        II: IntoIterator<Item = B, IntoIter = I> + Copy,
        I: Iterator<Item = B>,
        Self: Sized;

.map(|b| b as BlockNumber)
.collect()
fn blocks_with_bloom<'a, B, I, II>(&self, blooms: II, from_block: BlockNumber, to_block: BlockNumber) -> Vec<BlockNumber>
where BloomRef<'a>: From<B>, II: IntoIterator<Item = B, IntoIter = I> + Copy, I: Iterator<Item = B> {
Collaborator

I think that each trait bound deserves its own line (to make it easier to read), e.g.:

where
    BloomRef<'a>: From<B>,
    II: IntoIterator<Item = B, IntoIter = I> + Copy,
    I: Iterator<Item = B>,
    Self: Sized;

@@ -622,7 +621,8 @@ impl Service {

match is_done {
true => {
db.flush().map_err(UtilError::from)?;
// TODO: flush also blooms?
Collaborator

@niklasad1 niklasad1 Jun 19, 2018

As I understand it, the API doesn't provide flushing, but it does flush every time new blooms are inserted?!

If so, clarify the comment; alternatively, change the API and actually flush!

Collaborator Author

that's an obsolete comment ;)
Yes, this db doesn't provide flushing, because writes are non-atomic and happen immediately when someone calls insert

// config,
bloom_config: BloomConfig,
// tracing enabled
cache_manager: RwLock<CacheManager<H256>>,
Collaborator

Missing docs for cache_manager

Collaborator Author

ahh... it's not a public field and no change has been made here :p but I'll add a description ;p

for key in blooms_keys {
self.note_used(CacheId::Bloom(key));
}
// TODO: replace it with database for trace blooms
Collaborator

Is this supposed to be fixed in this PR?

Collaborator Author

oh, yes, obsolete comment 😅

let numbers = chain.filter(filter);
let possibilities = filter.bloom_possibilities();
let numbers = self.db.trace_blooms()
.filter(filter.range.start as u64, filter.range.end as u64, &possibilities).expect("TODO: blooms pr");
Collaborator

If this is fixed change the expect message!

let blooms = rlp::decode_list::<Bloom>(&group);
(number, blooms)
});

Collaborator

This code is identical to #L38-#L52 except for the variables; consider moving it to a function instead?

Collaborator Author

It's not exactly the same; notice that the endianness is completely different. Also, the function would have 4 variables and imo would be less readable than what we have here. As this is a migration, this code will never be modified again.

Collaborator

Ok, makes sense then NVM!

[package]
name = "blooms-db"
version = "0.1.0"
authors = ["debris <marek.kotewicz@gmail.com>"]
Collaborator

This should be admin@parity.io?

Collaborator Author

😅

Collaborator

Forgot one thing: also add a license. I guess it should be: license = "GPL-3.0"

Collaborator Author

I believe it doesn't matter for subcrates :)

Collaborator Author

actually, you are right. We usually have a license field so I'll also add it :)

Collaborator

@niklasad1 niklasad1 left a comment

Overall, a really high-quality PR but some minor things to fix!

@debris
Collaborator Author

debris commented Jun 20, 2018

@niklasad1 @sorpaas I addressed the grumbles. Can you review the PR again and mark it as accepted (if it's good)? :)

@debris
Collaborator Author

debris commented Jun 20, 2018

Also, I merged the fs-swap branch into this one. I thought I'd need it, but it turned out I didn't. If it's a problem, I'll remove those PRs.

Collaborator

@niklasad1 niklasad1 left a comment

👍

@niklasad1 niklasad1 added A9-buythatmanabeer 🍻 Pull request is reviewed well and worth buying the author a beer. and removed A0-pleasereview 🤓 Pull request needs code review. labels Jun 20, 2018
@5chdn 5chdn merged commit 458afcd into master Jun 20, 2018
@5chdn 5chdn deleted the new-blooms-db branch June 20, 2018 13:13
ordian added a commit to ordian/parity that referenced this pull request Jun 20, 2018
* 'master' of https://github.com/paritytech/parity:
  new blooms database (openethereum#8712)
  ethstore: retry deduplication of wallet file names until success (openethereum#8910)
  Update ropsten.json (openethereum#8926)
dvdplm added a commit that referenced this pull request Jun 20, 2018