
Implement a Cache For Monitor Updates #75

Merged
merged 5 commits into from Apr 29, 2021
Merged

Conversation

dave-tucker (Collaborator) commented Apr 12, 2021

(See Original Comments on eBay#35)

This makes a few significant API changes in order to provide a cache for monitor updates.
Not only does it improve the API, but it also makes it possible to implement a Get (from cache) in the ORM API proposed in #78

The API changes are as follows:

  1. Restrict the client to a single DatabaseSchema. This mirrors how ovsdb-server is run in the wild (even though technically a server can support multiple schemas), and it has the advantage of removing the database argument from the Transact, Monitor... APIs.
  2. Replace Disconnected(*OvsClient) with Disconnected(). The client needs to know about a handler to register it, but the handler shouldn't need to know about the client. Even if it did, that could be achieved through channels (see the sketch below). Either way, the Go compiler didn't seem to like passing that client around once I'd added the cache feature.
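As a rough illustration of the channel approach mentioned in point 2 (the handler type and channel here are hypothetical, not part of this PR's API), a handler could relay the disconnect itself:

// disconnectNotifier is a hypothetical handler that only cares about
// the Disconnected() callback and relays it over a channel instead of
// holding a reference to the client.
type disconnectNotifier struct {
	disconnected chan struct{}
}

// Disconnected matches the new zero-argument signature.
func (d *disconnectNotifier) Disconnected() {
	close(d.disconnected)
}

// Elsewhere, whatever used to receive the *OvsClient can simply wait:
//   <-n.disconnected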

The feature itself can be seen in operation in the play_with_ovs example.

In short, the OvsClient exposes a Cache and associated handler to ensure the cache is updated.
As such, a user can call Monitor without defining a handler, and the initial state and all subsequent updates will populate into the cache.

The cache API is very simple:

// Get a list of all tables
client.Cache.Tables()

// Get a table
client.Cache.Table("Open_vSwitch")

// Get a list of all row uuids in a table
client.Cache.Table("Open_vSwitch").Rows()

// Get a row in a table by UUID
client.Cache.Table("Open_vSwitch").Row("uuid")

In addition, a user may choose to register a callback so they can perform an action when a cache item is updated:

ovs.Cache.AddEventHandler(&libovsdb.EventHandlerFuncs{
	AddFunc: func(table string, row libovsdb.Row) {
		if table == bridgeTable {
			update <- row
		}
	},
})
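Since the cache also emits Update and Delete events, a handler can watch for removals too. The DeleteFunc signature below is assumed to mirror AddFunc's (table, row) shape, and the deleted channel is a placeholder; treat this as a sketch rather than the exact API:

ovs.Cache.AddEventHandler(&libovsdb.EventHandlerFuncs{
	// DeleteFunc is assumed to take the same arguments as AddFunc.
	DeleteFunc: func(table string, row libovsdb.Row) {
		if table == bridgeTable {
			// e.g. signal that a bridge row disappeared from the cache
			deleted <- row
		}
	},
})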

Misc changes included in this PR to make sure CI is green:

  1. Fix linting/formatting of encoding_test.go
  2. Fix play_with_ovs which was a case of s/"ovsTable"/ovsTable/g

/cc @amorenoz @hzhou8

Closes: #85

amorenoz mentioned this pull request on Apr 13, 2021
coveralls commented Apr 13, 2021

Coverage increased (+1.06%) to 77.99% when pulling e554ba1 on dave-tucker:cache into 177adf8 on socketplane:main.

dave-tucker (Collaborator, Author) commented Apr 14, 2021

@vtolstov, to your comment in #78:

populate is getting called for every update notification, which can happen a lot. We also claim a write lock on the cache while doing populate. In summary, we need that code to execute as fast as possible to avoid blocking reads, but the event handlers are "unknown": they could be fast-running or slow-running code.

Calling the handlers synchronously doesn't seem like a good idea, because if a library user writes a slow handler, all cache reads will be blocked until every handler is done processing events.

Calling the handlers asynchronously gets past the locking issues, but as you mentioned, having an unbounded number of goroutines isn't great either.

One other thing I can think of is to create buffered channels for Add/Update/Delete operations and have a goroutine that reads those channels and dispatches events synchronously. Worst case, the channel buffer fills and we block on sending updates to the channel until there is room. To avoid blocking in that case, we could use a ring buffer for the channels, and then we'd drop events when the channels are full.
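A minimal sketch of that buffered-channel idea (names, types, and the buffer size are illustrative only, not the code in this PR):

// cacheEvent is a hypothetical envelope for one notification.
type cacheEvent struct {
	table string
	row   libovsdb.Row
}

// dispatchEvents drains the buffered channel and calls each handler
// synchronously, so a slow handler never holds the cache's write lock.
func dispatchEvents(events <-chan cacheEvent, handlers []func(table string, row libovsdb.Row)) {
	for e := range events {
		for _, h := range handlers {
			h(e.table, e.row)
		}
	}
}

// In populate, sending to the buffered channel stays cheap while the
// write lock is held:
//   events := make(chan cacheEvent, 1024)
//   go dispatchEvents(events, handlers)
//   events <- cacheEvent{table: "Bridge", row: row}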

I'm curious as to what behaviour would be preferred?

dave-tucker (Collaborator, Author):

I pushed an update that changes the locks to RWMutex as suggested by @vtolstov in #78

vtolstov (Contributor) replied:

> @vtolstov, to your comment in #78:
>
> populate is getting called for every update notification, which can happen a lot. We also claim a write lock on the cache while doing populate. In summary, we need that code to execute as fast as possible to avoid blocking reads, but the event handlers are "unknown": they could be fast-running or slow-running code.
>
> Calling the handlers synchronously doesn't seem like a good idea, because if a library user writes a slow handler, all cache reads will be blocked until every handler is done processing events.
>
> Calling the handlers asynchronously gets past the locking issues, but as you mentioned, having an unbounded number of goroutines isn't great either.
>
> One other thing I can think of is to create buffered channels for Add/Update/Delete operations and have a goroutine that reads those channels and dispatches events synchronously. Worst case, the channel buffer fills and we block on sending updates to the channel until there is room. To avoid blocking in that case, we could use a ring buffer for the channels, and then we'd drop events when the channels are full.
>
> I'm curious as to what behaviour would be preferred?

I'd prefer something like a worker pool of goroutines, so the library user can decide how many resources are used and how fast updates are processed. Another thought: maybe we could have a deadline for event processing and, if the consumer is slow, log a warning and drop some events.
I have something like github.com/panjf2000/ants in mind, so we can either wait for a free goroutine in the pool or get an error that all goroutines in the pool are busy and the event will be dropped.
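For reference, a bounded worker pool along those lines might look roughly like this, assuming the github.com/panjf2000/ants/v2 API (a sketch only, not code from the PR):

import (
	"log"

	"github.com/panjf2000/ants/v2"
)

// newDispatcher returns a function that hands each cache event to a
// bounded worker pool; when every worker is busy it logs and drops.
func newDispatcher(workers int, handle func(table string, row libovsdb.Row)) (func(string, libovsdb.Row), error) {
	pool, err := ants.NewPool(workers, ants.WithNonblocking(true))
	if err != nil {
		return nil, err
	}
	return func(table string, row libovsdb.Row) {
		if err := pool.Submit(func() { handle(table, row) }); err != nil {
			// ants returns ErrPoolOverload when the pool is full
			log.Printf("dropping cache event for table %s: %v", table, err)
		}
	}, nil
}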

dave-tucker (Collaborator, Author) commented Apr 15, 2021

@vtolstov

> I'd prefer something like a worker pool of goroutines, so the library user can decide how many resources are used and how fast updates are processed.

Ack, ants looks really cool. Thanks for sharing!

Kubernetes has a similar problem in the client-go cache, and I think perhaps we could follow the same pattern:

  1. We state emphatically in the documentation that the cache is not designed to service slow handlers.
  2. If you really need to do something slow in a handler function, then instead write a small handler to push events into a queue in your code, and service that queue with as many goroutines as you want.

This way, we can keep the libovsdb footprint small (one additional goroutine to dispatch events to listeners) and give the user complete control of the thread pool for event handling.
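Sketched out on the library user's side, reusing the EventHandlerFuncs example from the PR description, point 2 could look something like this (the channel size, worker count, and slowProcess are all placeholders):

// The registered handler stays tiny: it only enqueues work.
work := make(chan libovsdb.Row, 256)

ovs.Cache.AddEventHandler(&libovsdb.EventHandlerFuncs{
	AddFunc: func(table string, row libovsdb.Row) {
		if table == bridgeTable {
			work <- row
		}
	},
})

// The slow processing happens in as many workers as the user wants.
for i := 0; i < 4; i++ {
	go func() {
		for row := range work {
			slowProcess(row) // slowProcess is the user's own function
		}
	}()
}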

Now I just need to cobble together a quick ring buffer and we should be good to go 😉

vtolstov (Contributor) replied:

> @vtolstov
>
> > I'd prefer something like a worker pool of goroutines, so the library user can decide how many resources are used and how fast updates are processed.
>
> Ack, ants looks really cool. Thanks for sharing!
>
> Kubernetes has a similar problem in the client-go cache, and I think perhaps we could follow the same pattern:
>
>   1. We state emphatically in the documentation that the cache is not designed to service slow handlers.
>   2. If you really need to do something slow in a handler function, then instead write a small handler to push events into a queue in your code, and service that queue with as many goroutines as you want.
>
> This way, we can keep the libovsdb footprint small (one additional goroutine to dispatch events to listeners) and give the user complete control of the thread pool for event handling.
>
> Now I just need to cobble together a quick ring buffer and we should be good to go

I think that this will be good.

dave-tucker (Collaborator, Author) commented Apr 20, 2021

@vtolstov done. The ring buffer isn't very well optimised, but it works well enough for the single-writer, multiple-reader pattern we have currently:

$ go test -bench=. -run=XXX ./internal/buffer
goos: linux
goarch: amd64
pkg: github.com/socketplane/libovsdb/internal/buffer
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkWrite-8        96289977                12.37 ns/op
BenchmarkRead-8         551214096                1.920 ns/op
PASS
ok      github.com/socketplane/libovsdb/internal/buffer 3.472s

I've used RWMutex on the cache as suggested, but based on our usage pattern we might actually be better off using sync.Map; it's something worth experimenting with.
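For context, the read path with an RWMutex looks roughly like the sketch below (a simplified, hypothetical stand-in using the standard sync package, not the actual cache.go), which is why concurrent reads stay cheap while populate takes the write lock:

// tableCache is a simplified stand-in for a per-table cache.
type tableCache struct {
	mu   sync.RWMutex
	rows map[string]libovsdb.Row
}

// Row takes only a read lock, so many readers can proceed in parallel.
func (t *tableCache) Row(uuid string) libovsdb.Row {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.rows[uuid]
}

// set takes the write lock, as populate does for each update notification.
func (t *tableCache) set(uuid string, row libovsdb.Row) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.rows[uuid] = row
}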

@hzhou8 @amorenoz I think this is pretty ready for review now if you have a spare minute.

coveralls commented Apr 22, 2021

Pull Request Test Coverage Report for Build 792357558

  • 151 of 182 (82.97%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+1.0%) to 77.635%

Changes missing coverage:

  File        Covered Lines   Changed/Added Lines   %
  client.go   32              39                    82.05%
  cache.go    119             143                   83.22%

Totals:
  • Change from base Build 790941841: 1.0%
  • Covered Lines: 788
  • Relevant Lines: 1015

💛 - Coveralls

dave-tucker (Collaborator, Author):

@amorenoz comments addressed. Please take another look.

Thanks to your comments I made some improvements to the ring buffer implementation and added some additional docs.
Read performance increased 🎉 and so did write performance, until it was spoiled by Go's race detector.
I've instead had to use atomic.Value for the backing array 😢 write performance is not as good now, but it's good enough.

While RFC 7047 allows for a database server to have multiple databases,
in the wild, this is not the case for Open vSwitch or OVN.

There are cases where supporting multiple databases would be odd.
For example, if you were to Monitor the address_set table of the OVN
North and South databases, the client would get no indication in the
corresponding update notification of which DB the update was for.

As such, it seems sensible to only support one database per client.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Since the Client can only have one database, we no longer
need to provide the client methods with a database name.

This provides a much simpler API.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This can be handled client-side with a channel.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Also use log.Fatal for errors as it's less verbose

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
The client now includes a cache, and also registers a handler
to ensure the cache is updated once Monitor or MonitorAll is called.
It also permits clients to register callbacks for Add/Update/Delete
cache events, which is a nicer API than registering your own handler and
unpacking the updates yourself.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
dave-tucker (Collaborator, Author):

@amorenoz suggested that we use channels instead.
I've implemented it and (as we guessed) it simplifies the implementation and performs a little better.

The only drawback is that when the buffer is full we drop events, whereas the ring buffer would overwrite older events.

In both cases you're losing events, and it's up to the library user to make sure that doesn't happen.
However with the ring buffer you lose old events, whereas with the channels you lose new events.

As neither has a clear advantage over the other I suggest we stick with the simple implementation for now, and we can re-visit the buffer again if we need it in future.
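The two policies boil down to a couple of lines; this is only an illustration of the trade-off, not code from the PR:

// Channel approach: a non-blocking send drops the newest event when
// the buffer is full.
select {
case events <- e:
default:
	// buffer full: the new event e is dropped
}

// Ring buffer approach: the write always succeeds, but once the buffer
// wraps around, the oldest unread event is overwritten instead.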

dave-tucker (Collaborator, Author):

# CHANNELS
$ time go run ./example/stress/stress.go -ovsdb tcp::49154 -ninserts 100
Summary:
        Insertions: 202
        Deletions: 100
go run ./example/stress/stress.go -ovsdb tcp::49154 -ninserts 100  0.68s user 0.15s system 129% cpu 0.645 total

# BUFFER
$ time go run ./example/stress/stress.go -ovsdb tcp::49154 -ninserts 100
Summary:
        Insertions: 188
        Deletions: 90
go run ./example/stress/stress.go -ovsdb tcp::49154 -ninserts 100  1.27s user 0.18s system 177% cpu 0.822 total

amorenoz (Collaborator):

LGTM

dave-tucker merged commit d8f47af into ovn-org:main on Apr 29, 2021