
Dynamic LMDB mapsize allocation [1.1.0] #2605

Merged: 16 commits, Feb 27, 2019

@yeastplume (Member) commented Feb 19, 2019

Mostly to support #2525, but also to make the backend store a bit more flexible. This:

  • Allocates DB space in chunks of 128MB at a time
  • Checks whether more space is needed every time store.batch() is called. If so, increases the mapsize in 128MB chunks until the space used relative to the mapsize is beneath a threshold (currently 65%, can be tweaked later)

Does this need to be any more complicated than this?
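The policy described above could be sketched roughly like this (constants and names here are illustrative assumptions, not the actual code from this PR):

```rust
// Illustrative sketch of the growth policy described above; the names
// `target_mapsize`, `used`, etc. are assumptions, not the PR's code.
const ALLOC_CHUNK_SIZE: usize = 134_217_728; // grow in 128 MB chunks
const RESIZE_PERCENT: f64 = 0.65; // grow until usage is beneath this

/// Given the bytes currently used and the current mapsize, return the
/// mapsize after adding as many chunks as needed to get usage below the
/// threshold. Called (conceptually) on every store.batch().
fn target_mapsize(used: usize, mut mapsize: usize) -> usize {
    while used as f64 / mapsize as f64 > RESIZE_PERCENT {
        mapsize += ALLOC_CHUNK_SIZE;
    }
    mapsize
}
```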

yeastplume added some commits Feb 19, 2019

@yeastplume yeastplume changed the title Dynamic LMDB mapsize allocation Dynamic LMDB mapsize allocation [1.1.0] Feb 19, 2019

@yeastplume yeastplume added this to the 1.1.0 milestone Feb 19, 2019

@antiochp (Member) commented Feb 20, 2019

Is it worth considering doing a "double the size each time" strategy?
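For comparison, the doubling strategy might look something like this (a loose sketch with made-up names, not actual code):

```rust
// Illustrative sketch of a "double the mapsize each time" strategy, as
// opposed to fixed 128 MB increments; names are assumptions.
fn doubled_mapsize(used: usize, mut mapsize: usize, threshold: f64) -> usize {
    while used as f64 / mapsize as f64 > threshold {
        mapsize *= 2; // fewer resizes over time, but increasingly large jumps
    }
    mapsize
}
```

The trade-off is fewer resize pauses as the db grows, at the cost of over-allocating more aggressively.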

env_info.mapsize + ALLOC_CHUNK_SIZE
};
unsafe {
self.env.set_mapsize(new_mapsize)?;

@DavidBurkett (Contributor) commented Feb 20, 2019

You need to make sure there are no active txns before resizing. See: https://github.com/monero-project/monero/blob/master/src/blockchain_db/lmdb/db_lmdb.cpp#L517-L539

@yeastplume (Author, Member) commented Feb 20, 2019

Yep 👍 I'll look into enforcing that. I was thinking that calling it only from the batch creation function would help here, but of course that doesn't take multiple threads with open txns into account.

@yeastplume (Author, Member) commented Feb 20, 2019

In the Monero code it just seems to be implemented via a simple reference count in a global atomic. @antiochp @ignopeverell As far as I can see, within the node the Store struct is never wrapped in any mutexes. Can you confirm whether there are multiple threads trying to access the ChainStore or PeerStore at any given time?

There's also a bit of an issue here with multiple wallet invocations trying to access the store at the same time, which is possible under the current architecture.
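The Monero-style reference count mentioned here might translate to Rust roughly as follows (a loose sketch of the idea only, not the actual mdb_txn_safe code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Loose Rust sketch of Monero's approach: a global atomic counts active
// txns, and a resize is only attempted when the count is zero. Names
// (`TxnGuard`, `safe_to_resize`) are illustrative assumptions.
static ACTIVE_TXNS: AtomicUsize = AtomicUsize::new(0);

/// RAII guard: constructing it registers an active txn, dropping it
/// deregisters the txn.
struct TxnGuard;

impl TxnGuard {
    fn new() -> TxnGuard {
        ACTIVE_TXNS.fetch_add(1, Ordering::SeqCst);
        TxnGuard
    }
}

impl Drop for TxnGuard {
    fn drop(&mut self) {
        ACTIVE_TXNS.fetch_sub(1, Ordering::SeqCst);
    }
}

fn safe_to_resize() -> bool {
    ACTIVE_TXNS.load(Ordering::SeqCst) == 0
}
```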

@antiochp (Member) commented Feb 20, 2019

Are we talking any txns here? Or just write txns? (All lmdb access is via a read txn or a write txn.)

If write txns, then lmdb itself is the mutex: it only supports a single write txn at a time (across all threads).
If we successfully create a batch then we're good to go; we guarantee no other thread has a write txn active.

@DavidBurkett (Contributor) commented Feb 20, 2019

My understanding is that setting the db mapsize rearranges the indices, affecting reads, too.

@antiochp (Member) commented Feb 20, 2019

We don't actually have any mechanism to expose long-lived read txns via our store impl, so I think we're fine there: every read is effectively in its own read txn currently.
So if we take a write lock via our batch and then resize the db we should be good; the next read will simply create a new read txn on the resized db.

@@ -24,6 +24,10 @@ use lmdb_zero::LmdbResultExt;

use crate::core::ser;

/// number of bytes to grow the database by when needed
pub const ALLOC_CHUNK_SIZE: usize = 134_217_728; //128 MB

@DavidBurkett (Contributor) commented Feb 20, 2019

Seems a bit low. When Grin has full blocks, this could happen every hour or so, especially for archive peers. I'd be curious to see some metrics around how long this takes to resize. I'd also be interested to see how disruptive mdb_txn_safe::prevent_new_txns() and mdb_txn_safe::wait_no_active_txns() are (see comment on line 152). If either of those operations have a noticeable effect on performance, it'd be better to resize less often.

@yeastplume (Author, Member) commented Feb 20, 2019

Sure, can try and collect some metrics once that's implemented.

@yeastplume (Author, Member) commented Feb 21, 2019

Just another thought here: it might be debatable whether this is the right size for the chain, but it's already too large for the wallet and peer DBs. I might think about making this a parameter somehow, without adding too much cruft.

@DavidBurkett (Contributor) commented Feb 21, 2019

Good point. Maybe we should consider @antiochp's doubling proposal? Or will that eventually be too excessive?

@ignopeverell (Member) commented Feb 25, 2019

Seems easy enough to tune once we have live data.

}

/// Increments the database size by one ALLOC_CHUNK_SIZE
pub fn do_resize(&self) -> Result<(), Error> {

@DavidBurkett (Contributor) commented Feb 20, 2019

Should we check available space and try to fail more gracefully? Or do you think that's more complexity than it's worth? It's trivial to do a check like that in C++, but I'm not sure if Rust provides APIs for that.

@yeastplume (Author, Member) commented Feb 20, 2019

I'd thought about this as well, but if you try to allocate more than the available disk space you get:

Error: LmdbErr(Error::Code(12, 'Cannot allocate memory'))

Which I think is graceful enough without having to add more complexity here.

@DavidBurkett (Contributor) commented Feb 20, 2019

Cool. I agree.

yeastplume added some commits Feb 21, 2019

@yeastplume (Author, Member) commented Feb 21, 2019

Right, unfortunately just performing the resize within calls to batch results in segfaulty behaviour somewhere in LMDB. It would seem, all things considered, that the safest thing to do here on resize is to close/drop the database entirely, perform the resize, then re-open it.

I've tested this by setting the chunk size to something very small (2MB) and syncing a chain from scratch. With the close/reopen behaviour in place it expands the db as needed and fully syncs without issue. Without it, it inevitably segfaults somewhere.

The downside here is that the calls to open and close the db, and therefore batch, now require a mutable reference, which means that higher-up references in ChainStore and PeerStore now need to be wrapped in mutexes. It's more cumbersome, but you could argue it's more belt-and-suspendery, since we're now much more sure it's safe to reallocate the DB size at the point it's being done. Also, I believe Windows will need this in place anyway in order to resize, due to its aggressive file locking.
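The close → resize → reopen sequence described here can be sketched with stub types (`Env` below is a stand-in for the real lmdb environment; this is not the PR's actual code):

```rust
// Runnable sketch of the close → resize → reopen sequence; `Env` is a
// stub standing in for the real lmdb environment. In the actual change
// the whole environment is dropped and re-opened at the larger size.
const ALLOC_CHUNK_SIZE: usize = 134_217_728; // 128 MB

struct Env {
    mapsize: usize,
}

impl Env {
    fn open(mapsize: usize) -> Env {
        Env { mapsize }
    }
}

struct Store {
    env: Option<Env>, // None only momentarily, while resizing
}

impl Store {
    /// Drop the environment (closing every handle, so no txn can be live),
    /// then reopen it one chunk larger.
    fn do_resize(&mut self) {
        let old = self.env.take().map(|e| e.mapsize).unwrap_or(0);
        self.env = Some(Env::open(old + ALLOC_CHUNK_SIZE));
    }
}
```

Holding the environment in an `Option` is what forces callers through a mutable reference, which is the trade-off discussed above.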

@yeastplume (Author, Member) commented Feb 21, 2019

Also refactored the store itself to be more encapsulated and to ensure callers don't need to explicitly import the lmdb crate.

@antiochp (Member) commented Feb 21, 2019

> The downside here is that the calls to open and close the db, and therefore batch now requires a mutable reference, which means that higher-up references in ChainStore and PeerStore now need to be wrapped in mutexes.

That would mean we lose any ability to have multiple readers on the db simultaneously?
Right now that's a nice benefit of LMDB that may be hard to give up (it lets multiple peer threads read the db for some early existence checks etc.).

yeastplume added some commits Feb 21, 2019

@yeastplume (Author, Member) commented Feb 21, 2019

I've just tried implementing as an RwLock here. How much of a performance hit is this likely to be for the peer store?

yeastplume added some commits Feb 21, 2019

@yeastplume yeastplume referenced this pull request Feb 25, 2019

Open

(GrinWin) - Windows 10 Support Meta-Issue #2525

9 of 11 tasks complete

@@ -142,7 +141,7 @@ impl OrphanBlockPool {
 /// maintains locking for the pipeline to avoid conflicting processing.
 pub struct Chain {
 	db_root: String,
-	store: Arc<store::ChainStore>,
+	store: Arc<RwLock<store::ChainStore>>,

@ignopeverell (Member) commented Feb 25, 2019

I'm fine with the use of an RwLock, but can't this be pushed down into our LMDB store? All regular operations would be considered reads, except for the close/open of the DB, which would take the write lock.
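Pushing the lock down might look roughly like this (stubbed, illustrative types only): ordinary operations share the read lock, and only the close/resize path takes it exclusively.

```rust
use std::sync::RwLock;

// Stubbed sketch of an RwLock held inside the store itself; `Env` and the
// method names are assumptions, not the PR's actual API.
struct Env {
    mapsize: usize,
}

struct Store {
    env: RwLock<Env>,
}

impl Store {
    /// A regular operation: shared lock, so many readers run concurrently.
    fn mapsize(&self) -> usize {
        self.env.read().unwrap().mapsize
    }

    /// The resize path: the exclusive lock blocks all readers while it runs.
    fn resize(&self, chunk: usize) {
        self.env.write().unwrap().mapsize += chunk;
    }
}
```

With the lock internal to the store, callers keep `&self` access, so ChainStore and PeerStore no longer need their own wrapping mutexes.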

@yeastplume (Author, Member) commented Feb 26, 2019

Took a bit of doing and testing, but I think I've managed it in the latest push.

yeastplume added some commits Feb 26, 2019

@yeastplume (Author, Member) commented Feb 26, 2019

Think this is ready for merging if anyone wants to give a final review and a little thumbs-up somewhere. Since the last comments I've:

  • Moved the RwLock into store::Store itself, and moved all the locks to occur before each read/write transaction. I've tested by setting the increment chunk size to a small value (1MB at a time) and syncing from scratch, doing loads of DB resizes along the way. The current iteration works without issue.
  • Changed the allocation size logic to keep allocating chunks (currently 128MB each) until at least 45% of the total mapsize is free. No particular reason for choosing this value, but it can be tweaked at any stage.
@ignopeverell (Member) left a review comment:

Very nice looking, I like that it removes the lmdb dependency in a few crates!

@yeastplume yeastplume merged commit beaae28 into mimblewimble:milestone/1.1.0 Feb 27, 2019

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)
@antiochp (Member) commented Feb 27, 2019

> I've just tried implementing as an RwLock here. How much of a performance hit is this likely to be for the peer store?

I suspect effectively zero given everything else going on.

@yeastplume yeastplume deleted the yeastplume:lmdb_resize branch Mar 4, 2019
