Batched graph updates #3367

Merged · 7 commits · Nov 26, 2020

Conversation

@cfromknecht cfromknecht (Contributor) commented Aug 2, 2019

This PR introduces batched operations for channel graph ingestion in order to speed up IGD. Currently a single db transaction is used to process each channel_announcement, channel_update, and node_announcement, which with the current graph size results in around 160k-200k transactions.

With this PR we bring this down to being linear in the time it takes to sync the graph, as we'll now make at most 2 db transactions/sec. On my machine I was able to get a 4 minute graph sync using this, implying only about 500 db transactions were made in total.

So far I'm seeing a 2x speedup in IGD. The CPU and memory usage should be much lower, but I'm still in the process of tuning and gathering exact performance metrics.

batch package

To accomplish this, we introduce a new batch package, which exposes a simple API for constructing batched operations on a bbolt database. The primary reason we require this, and can't use bbolt's native Batch operation, is that we maintain two caches external to bbolt, the channel cache and the reject cache.

Since both are write through caches, it is critical that we maintain external consistency whenever a write succeeds. Updating the cache should atomically add the whole batch of updates, or none of them. To achieve this, we need a mechanism for doing locking at a level above bbolt, since we don't have the luxury of reentrant locks.

Most of the batching logic is borrowed from bbolt itself, but we also add some slight modifications to handle any updates to the external cache under a single mutex.

There are two primary components:

// Request defines an operation that can be batched into a single bbolt
// transaction.
type Request struct {
	// Reset is called before each invocation of Update and is used to
	// clear any possible modifications to local state as a result of
	// previous calls to Update that were not committed due to a
	// concurrent batch failure.
	//
	// NOTE: This field is optional.
	Reset func()

	// Update is applied alongside other operations in the batch.
	//
	// NOTE: This method MUST NOT acquire any mutexes.
	Update func(tx kvdb.RwTx) error

	// OnCommit is called if the batch or a subset of the batch including
	// this request all succeeded without failure. The passed error should
	// contain the result of the transaction commit, as that can still fail
	// even if none of the closures returned an error.
	//
	// NOTE: This field is optional.
	OnCommit func(commitErr error) error
}

// Scheduler abstracts a generic batching engine that accumulates an incoming
// set of Requests, executes them, and returns the error from the operation.
type Scheduler interface {
	// Execute schedules a Request for execution with the next available
	// batch. The resulting error is returned to the caller.
	Execute(req *Request) error
}

Different Schedulers may implement different policies for when a batch gets executed. The one currently implemented, TimeScheduler, uses a fixed horizon after which it executes any requests that were added concurrently. Future schedulers may take into account metrics such as the number of pending items in the batch or overall request throughput before deciding when to trigger, but that is left as an area for future research.
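For concreteness, here's roughly what wiring up a TimeScheduler and submitting a request looks like. This is a usage sketch only: the constructor arguments mirror the graph.go diff further down (db handle, optional external mutex, batch interval), and the empty Update closure is purely illustrative.

chanScheduler := batch.NewTimeScheduler(db.DB, &g.cacheMu, time.Second)

// Execute blocks until the request's batch has been committed (or has
// failed) and returns the result to this caller.
if err := chanScheduler.Execute(&batch.Request{
	Update: func(tx kvdb.RwTx) error {
		// Write this caller's data alongside the rest of the batch.
		return nil
	},
}); err != nil {
	return err
}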

Using this package, it becomes simple to express our batched operations; here's an example of BatchedAddChannelEdge:

func (c *ChannelGraph) BatchedAddChannelEdge(edge *ChannelEdgeInfo) error {
	var alreadyExists bool
	return c.chanScheduler.Execute(&batch.Request{
		Reset: func() {
			alreadyExists = false
		},
		Update: func(tx kvdb.RwTx) error {
			err := c.addChannelEdge(tx, edge)

			// Silence ErrEdgeAlreadyExist so that the batch can
			// succeed, but propagate the error via local state.
			if err == ErrEdgeAlreadyExist {
				alreadyExists = true
				return nil
			}

			alreadyExists = false
			return err
		},
		OnCommit: func(err error) error {
			switch {
			case err != nil:
				return err
			case alreadyExists:
				return ErrEdgeAlreadyExist
			default:
				c.rejectCache.remove(edge.ChannelID)
				c.chanCache.remove(edge.ChannelID)
				return nil
			}
		},
	})
}

whose logic largely coincides with what we'd expect, given the unbatched variant:

func (c *ChannelGraph) AddChannelEdge(edge *ChannelEdgeInfo) error {
	c.cacheMu.Lock()
	defer c.cacheMu.Unlock()

	err := kvdb.Update(c.db, func(tx *bbolt.Tx) error {
		return c.addChannelEdge(tx, edge)
	}, func() {})
	if err != nil {
		return err
	}

	c.rejectCache.remove(edge.ChannelID)
	c.chanCache.remove(edge.ChannelID)

	return nil
}

Note that in the batched variant, c.cacheMu is passed into the creation of the chanScheduler and is automatically acquired by the scheduler, so it is safe to modify the cache directly within the OnCommit closure.
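To make the locking concrete, the scheduler essentially wraps each batch in the mutex it was constructed with, roughly like this. This is an illustrative sketch only; the type and method names below are assumptions, not the actual contents of batch/scheduler.go.

// Illustrative sketch, not lnd's actual scheduler: holding the
// caller-supplied mutex across both the db commit and the OnCommit closures
// keeps the write-through caches externally consistent with bbolt.
type lockedScheduler struct {
	locker sync.Locker // e.g. &ChannelGraph.cacheMu; nil if there is no cache
}

func (s *lockedScheduler) runBatch(run func()) {
	if s.locker != nil {
		s.locker.Lock()
		defer s.locker.Unlock()
	}

	// run commits the bbolt transaction and then invokes each request's
	// OnCommit, so readers never observe a half-applied set of cache
	// updates.
	run()
}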

There are a few other areas in the code base where we may wish to adopt this, e.g. the CircuitMap, which should be a straightforward change using this primitive. I suspect this may be of interest to other projects using bbolt, so we may want to consider making the batch package its own versioned go module.

testing

Before writing more tests, I'm putting this up on travis with a temporary commit that causes the unbatched APIs to use the batched ones under the hood. This should give us a good idea that the semantics are unchanged before proceeding to write more extensive tests that handle edge cases in the batched API.

Closes #3288

@wpaulino wpaulino added this to the 0.8.0 milestone Aug 2, 2019
@wpaulino wpaulino added the enhancement (Improvements to existing features / behaviour), gossip, graph, optimization, and v0.8.0 labels Aug 2, 2019
@cfromknecht cfromknecht force-pushed the batched-graph-updates branch 2 times, most recently from 88adc5e to d34b61a Compare August 2, 2019 01:09
db: db,
rejectCache: newRejectCache(rejectCacheSize),
chanCache: newChannelCache(chanCacheSize),
}
g.chanScheduler = batch.NewTimeScheduler(db.DB, &g.cacheMu, 10*time.Millisecond)
g.nodeScheduler = batch.NewTimeScheduler(db.DB, nil, 10*time.Millisecond)
@Crypt-iQ Crypt-iQ (Collaborator) commented Aug 4, 2019

I was able to set this slightly higher and still achieve the same sync time of about 2 minutes (granted, the peers I synced with on different runs were very fast). I used as high as 8 seconds, which means fewer db txns. What do you think? Perhaps the two schedulers could be staggered at different intervals so as to not cause db contention, since we receive their messages at around the same time anyway?

With 10ms, I see writes taking a minimum of 20ms with very small batches (1-6 requests with the occasional large batch). Since there can only be one db writer at a time:

  • If there is another batch during this time, it will block until the first one completes. This can cause several batches to be stalled (not really a problem in terms of time in this scenario but not ideal).
  • More db txns means more heap usage.

So far this doesn't affect the overall sync time in any way, but may cause more write contention with the database.

Collaborator

I do see some memory usage from dereference calls when growing the mmap, since these are large writes (~10MB) when an 8 second batch time is used, so a case could be made for InitialMmapSize as discussed in #3278, but this is nothing compared to the GetBlock calls (6 GB+) made in the router.

Collaborator

I also see a reduction from the 10ms scheduler at 845MB:

ROUTINE ======================== github.com/lightningnetwork/lnd/batch.(*batch).trigger in /Users/nsa/go/src/github.com/lightningnetwork/lnd/batch/batch.go
         0   845.71MB (flat, cum)  7.09% of Total
         .          .     27:}
         .          .     28:
         .          .     29:// trigger is the entry point for the batch and ensures that run is started at
         .          .     30:// most once.
         .          .     31:func (b *batch) trigger() {
         .   845.71MB     32:	b.start.Do(b.run)
         .          .     33:}
         .          .     34:
         .          .     35:// run executes the current batch of requests. If any individual requests fail
         .          .     36:// alongside others they will be retried by the caller.
         .          .     37:func (b *batch) run() {

to 269MB with the 8s scheduler:

ROUTINE ======================== github.com/lightningnetwork/lnd/batch.(*batch).trigger in /Users/nsa/go/src/github.com/lightningnetwork/lnd/batch/batch.go
         0   269.59MB (flat, cum)  1.80% of Total
         .          .     27:}
         .          .     28:
         .          .     29:// trigger is the entry point for the batch and ensures that run is started at
         .          .     30:// most once.
         .          .     31:func (b *batch) trigger() {
         .   269.59MB     32:	b.start.Do(b.run)
         .          .     33:}
         .          .     34:
         .          .     35:// run executes the current batch of requests. If any individual requests fail
         .          .     36:// alongside others they will be retried by the caller.
         .          .     37:func (b *batch) run() {

Contributor (author)

@Crypt-iQ yes, the 10ms time is just a temporary commit so that I could drop the batched version into the regular API and verify it has the same semantics w/o having to modify any timeouts.

In practice we should be using a timeout on the order of seconds. I did my testing with a duration of 1 second and it performed quite well. We could probably do 8 seconds as well, though this also delays the time till first write pretty significantly. Most of the writes were taking 10-20ms, so only about 2-4% of one core would be dedicated to the db txn itself if we go with 1 second, and even lower with 8.

The main thing left to do is benchmark this on mobile and get a sense for how these parameters behave on different architectures, just haven't gotten around to doing so :)

@joostjager joostjager (Contributor) left a comment

It will indeed be interesting to see how fast the IGD is on low-powered devices. I saw my rpi3 take 5 hours for a sync on a 100 Mbit fiber connection.

My main question related to this PR is whether this is the right optimization path to take.

How would this PR compare to a different, quite extreme, approach: The graph only exists in memory. It is updated in memory and queried from memory (which we want at some point anyway to speed up path finding). Existing caches can be removed / merged with the in-memory graph.

Then there is a background process that periodically (maybe once a minute) serializes all of it into a byte array and writes it in a single bbolt key (or file even). On startup, that value is deserialized into memory again. Because the graph is non critical data, it is not a problem if we don't have all the very latest data on disk.

Resolved review threads on batch/batch.go and batch/scheduler.go.
db: db,
rejectCache: newRejectCache(rejectCacheSize),
chanCache: newChannelCache(chanCacheSize),
}
g.chanScheduler = batch.NewTimeScheduler(db.DB, &g.cacheMu, time.Second)
g.nodeScheduler = batch.NewTimeScheduler(db.DB, nil, time.Second)
Contributor

If there is no node cache, couldn't this just use the normal bbolt batching?

channeldb/graph.go (outdated review thread, resolved)
edgeNotFound bool
)
return c.chanScheduler.Execute(&batch.Request{
Update: func(tx *bbolt.Tx) error {
Contributor

Is it an option to add two error return parameters, to prevent this pattern with the local variable?

discovery/gossiper.go (resolved review thread)
@Roasbeef Roasbeef requested review from valentinewallace and removed request for halseth August 13, 2019 01:41
@valentinewallace valentinewallace (Contributor) left a comment

Great feature, so excited to get this in for mobile users!! 🙌

Mostly for feedback I just have some questions. Looking forward to reading what the testing looks like, but time-based batching scheduler seems like the perfect starting point for batching db txs like these 👍

batch/batch.go (resolved review thread)
batch/batch.go (outdated diff):
// return the error directly.
for _, req := range b.reqs {
if req.OnSuccess != nil {
req.errChan <- req.OnSuccess(err)
Contributor

If we've reached this point, won't err always be nil? (Afaict, err only gets assigned here, and this line prevents us from ever reaching this line w/ a non-nil error)

If this is the case, seems like passing in nil might be a bit more clear (maybe that's just not conventional?).

Contributor (author)

This is a little subtle, but this value can still be non-nil. It's true that at this point none of the requests failed inside the db txn; however, it's still possible for the txn commit itself to fail. That error is passed into OnSuccess so that any errors in persisting the changes can be returned at the call sites.
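In other words, the error flow being described looks roughly like this. It is an illustrative sketch of the success path only, following the quoted diff above rather than the exact batch.run implementation; reqs and errChan are the names from that snippet.

// Every Update closure returned nil, yet the bbolt commit itself can still
// fail, and that commit error is what reaches each OnSuccess/OnCommit.
commitErr := kvdb.Update(db, func(tx kvdb.RwTx) error {
	for _, req := range reqs {
		if err := req.Update(tx); err != nil {
			// Failing requests are handled separately and retried
			// individually by their callers.
			return err
		}
	}
	return nil
}, func() {})

// No closure failed, but commitErr may still be non-nil if persisting the
// transaction failed, so it is propagated back to the call sites.
for _, req := range reqs {
	if req.OnSuccess != nil {
		req.errChan <- req.OnSuccess(commitErr)
	}
}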

batch/scheduler.go (resolved review thread)
// in a channel update.
// information. Note that this method is expected to only be called to update an
// already present node from a node announcement, or to insert a node found in a
// channel update.
//
// TODO(roasbeef): also need sig of announcement
func (c *ChannelGraph) AddLightningNode(node *LightningNode) error {
Contributor

Curious if the plan is to get rid of this function since it just purely wraps BatchedAddLightningNode now? And same w/ AddChannelEdge.

Contributor (author)

At some point, yes, possibly. Currently they are still useful, especially in unit tests, since we wouldn't have to set custom timeouts and such for tests. It would be a pretty big investment to do so though, because most of the test infra is already written with the one-off methods.

Contributor (author)

Ended up making the timeout configurable from the config file, so everything passes through the original methods!

Resolved review threads on batch/interface.go and batch/scheduler.go.
@cfromknecht cfromknecht modified the milestones: 0.8.0, 0.8.1 Sep 16, 2019
@cfromknecht cfromknecht added v0.8.1 and removed v0.8.0 labels Sep 16, 2019
@wpaulino wpaulino removed the v0.8.1 label Oct 8, 2019
@joostjager joostjager modified the milestones: 0.8.1, 0.9 Oct 8, 2019
@Crypt-iQ (Collaborator)

Ran some benchmarks using Neutrino and assumechanvalid, this is a significant speedup!

A full graph sync on mainnet (about 31k channels as we speak) went down from about 32 minutes to 5m 45s on my setup! 😮

However, I experimented a bit with different batch intervals, and setting it to 500ms instead of 1s actually gave me a 4m 35s run! So I think it would make sense to expose this as a configuration option, at least until we know more about what is the "right" number here.

Could you get some pprof output comparisons? There is also a memory gain to be had and that would be especially beneficial to neutrino users.

@Roasbeef Roasbeef (Member) left a comment

Solid PR! Also a relatively compact diff. Completed my first "new" pass since this PR was resurrected. Will do at least one additional pass. Still need to wrap my head around the way the batch itself interacts with the scheduler in a concurrent safe manner...

discovery/gossiper.go (resolved review thread)
func (b *batch) run() {
// Clear the batch from its scheduler, ensuring that no new requests are
// added to this batch.
b.clear(b)
Member

Still getting through this, but possibly couldn't a new batch instance be created that already removes the scheduled items from the scheduler? Seems like an odd "up call" in this context.

Resolved review threads on batch/interface.go, batch/scheduler.go, and channeldb/graph.go (outdated).
@Roasbeef Roasbeef (Member) left a comment

LGTM 🥮

(pending resolution of the last commit)

I think we can either use a build tag to force the lower interval during the tests, or make it a config option and have all the tests specify their own value. No strong feelings in either direction.

@halseth halseth (Contributor) left a comment

Looks pretty good, still think we should make the interval configurable, should also help setting it low during tests!

Resolved (outdated) review threads on channeldb/graph.go.
c.chanCache.remove(edge.ChannelID)

return nil
return c.BatchedAddChannelEdge(edge)
Contributor

Is this change to be reverted, or does it make sense to only have AddChannelEdge and let that be the batched version?

return err
}

if s.locker != nil {
Contributor

add a comment here

func (b *batch) run() {
// Clear the batch from its scheduler, ensuring that no new requests are
// added to this batch.
b.clear(b)
Contributor

Not sure if we should copy bbolt if it is not to our standards 😛 Just looks like this could be made much simpler, but also not a blocker for me.

@Roasbeef (Member)

Ping on this @cfromknecht, just needs the config flag now AFAIU.

This allows for 1,000 different validation operations to proceed
concurrently. Now that we are batching operations at the db level, the
average number of outstanding requests will be higher since the commit
latency has increased. To compensate, we allow for more outstanding
requests to keep the gossiper busy while batches are constructed.

This allows for 1,000 different persistent operations to proceed
concurrently. Now that we are batching operations at the db level, the
average number of outstanding requests will be higher since the commit
latency has increased. To compensate, we allow for more outstanding
requests to keep the router busy while batches are constructed.
@Roasbeef (Member)

Config check is failing:

Command line flag --db.batch-commit-interval not added to sample-lnd.conf

)

// DB holds database configuration for LND.
type DB struct {
Backend string `long:"backend" description:"The selected database backend."`

BatchCommitInterval time.Duration `long:"batch-commit-interval" description:"The maximum duration the channel graph batch schedulers will wait before attempting to commit a batch of pending updates. This can be used to trade off database contention for commit latency."`
Contributor

Alternatively we could add this behind a build flag to avoid exposing it to "normal" users (this would also not require updating the sample config).

This will permit a greater degree of tuning or customization depending
on various hardware/environmental factors.
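For example, on hardware where longer batching pays off, the interval could then be adjusted at startup. The flag name comes from the config check above; the 500ms value is only illustrative, echoing the interval experiment earlier in this thread:

lnd --db.batch-commit-interval=500ms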
@Roasbeef Roasbeef merged commit 7e298f1 into lightningnetwork:master Nov 26, 2020
@cfromknecht cfromknecht deleted the batched-graph-updates branch December 3, 2020 23:10
Closes: routing+channeldb: implement batched channel announcement+update writing