
Implement a Cache For Monitor Updates #75

Merged
merged 5 commits into from Apr 29, 2021
Merged

Conversation

dave-tucker (Collaborator) commented Apr 12, 2021

(See Original Comments on eBay#35)

This makes a few significant API changes in order to provide a cache for monitor updates.
Not only does it improve the API, but it also makes it possible to implement a Get (from cache) in the ORM API proposed in #78

The API changes are as follows:

  1. Restrict the client to a single DatabaseSchema. This mirrors how ovsdb-server is run in the wild (even though technically a server can support multiple schemas), and it has the advantage of removing the database argument from the Transact, Monitor... APIs.
  2. Replace Disconnected(*OvsClient) with Disconnected(). The client needs to know about a handler to register it, but the handler shouldn't need to know about the client. Even if it did, that could be achieved through channels (see the sketch below). Either way, the Go compiler didn't seem to like passing that client around once I'd added the cache feature.
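As a rough illustration of the channel approach mentioned in point 2 (the handler type and channel here are hypothetical, not part of this PR's API), a handler could relay the disconnect itself:

// disconnectNotifier is a hypothetical handler that only cares about
// the Disconnected() callback and relays it over a channel instead of
// holding a reference to the client.
type disconnectNotifier struct {
	disconnected chan struct{}
}

// Disconnected matches the new zero-argument signature.
func (d *disconnectNotifier) Disconnected() {
	close(d.disconnected)
}

// Elsewhere, whatever used to receive the *OvsClient can simply wait:
//   <-n.disconnected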

The feature itself can be seen in operation in the play_with_ovs example.

In short, the OvsClient exposes a Cache and associated handler to ensure the cache is updated.
As such, a user can call Monitor without defining a handler, and the initial state and all subsequent updates will populate into the cache.

The cache API is very simple:

// Get a list of all tables
client.Cache.Tables()

// Get a table
client.Cache.Table("Open_vSwitch")

// Get a list of all row uuids in a table
client.Cache.Table("Open_vSwitch").Rows()

// Get a row in a table by UUID
client.Cache.Table("Open_vSwitch").Row("uuid")

In addition, a user may choose to register a callback so they can perform an action when a cache item is updated:

ovs.Cache.AddEventHandler(&libovsdb.EventHandlerFuncs{
	AddFunc: func(table string, row libovsdb.Row) {
		if table == bridgeTable {
			update <- row
		}
	},
})
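Since the cache also emits Update and Delete events, a handler can watch for removals too. The DeleteFunc signature below is assumed to mirror AddFunc's (table, row) shape, and the deleted channel is a placeholder; treat this as a sketch rather than the exact API:

ovs.Cache.AddEventHandler(&libovsdb.EventHandlerFuncs{
	// DeleteFunc is assumed to take the same arguments as AddFunc.
	DeleteFunc: func(table string, row libovsdb.Row) {
		if table == bridgeTable {
			// e.g. signal that a bridge row disappeared from the cache
			deleted <- row
		}
	},
})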

Misc changes included in this PR to make sure CI is green:

  1. Fix linting/formatting of encoding_test.go
  2. Fix play_with_ovs which was a case of s/"ovsTable"/ovsTable/g

/cc @amorenoz @hzhou8

Closes: #85

amorenoz mentioned this pull request on Apr 13, 2021
coveralls commented Apr 13, 2021

Coverage increased (+1.06%) to 77.99% when pulling e554ba1 on dave-tucker:cache into 177adf8 on socketplane:main.

dave-tucker (Collaborator, Author) commented Apr 14, 2021

@vtolstov, to your comment in #78:

populate is getting called for every update notification, which can happen a lot. We also claim a write lock on the cache while doing populate. In summary, we need that code to execute as fast as possible to avoid blocking reads, but the event handlers are "unknown": they could be fast-running or slow-running code.

Calling the handlers synchronously doesn't seem like a good idea, because if a library user writes a slow handler, all cache reads will be blocked until every handler is done processing events.

Calling the handlers asynchronously gets past the locking issues, but as you mentioned, having an unbounded number of goroutines isn't great either.

One other thing I can think of is to create buffered channels for Add/Update/Delete operations and have a goroutine that reads those channels and dispatches events synchronously. Worst case, the channel buffer fills and we block on sending updates to the channel until there is room. To avoid blocking in that case, we could use a ring buffer for the channels, and then we'd drop events when the channels are full.
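A minimal sketch of that buffered-channel idea (names, types, and the buffer size are illustrative only, not the code in this PR):

// cacheEvent is a hypothetical envelope for one notification.
type cacheEvent struct {
	table string
	row   libovsdb.Row
}

// dispatchEvents drains the buffered channel and calls each handler
// synchronously, so a slow handler never holds the cache's write lock.
func dispatchEvents(events <-chan cacheEvent, handlers []func(table string, row libovsdb.Row)) {
	for e := range events {
		for _, h := range handlers {
			h(e.table, e.row)
		}
	}
}

// In populate, sending to the buffered channel stays cheap while the
// write lock is held:
//   events := make(chan cacheEvent, 1024)
//   go dispatchEvents(events, handlers)
//   events <- cacheEvent{table: "Bridge", row: row}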

I'm curious as to what behaviour would be preferred?

dave-tucker (Collaborator, Author):

I pushed an update that changes the locks to RWMutex as suggested by @vtolstov in #78

vtolstov (Contributor) replied:

> @vtolstov, to your comment in #78:
>
> populate is getting called for every update notification, which can happen a lot. We also claim a write lock on the cache while doing populate. In summary, we need that code to execute as fast as possible to avoid blocking reads, but the event handlers are "unknown": they could be fast-running or slow-running code.
>
> Calling the handlers synchronously doesn't seem like a good idea, because if a library user writes a slow handler, all cache reads will be blocked until every handler is done processing events.
>
> Calling the handlers asynchronously gets past the locking issues, but as you mentioned, having an unbounded number of goroutines isn't great either.
>
> One other thing I can think of is to create buffered channels for Add/Update/Delete operations and have a goroutine that reads those channels and dispatches events synchronously. Worst case, the channel buffer fills and we block on sending updates to the channel until there is room. To avoid blocking in that case, we could use a ring buffer for the channels, and then we'd drop events when the channels are full.
>
> I'm curious as to what behaviour would be preferred?

I'd prefer something like a worker pool of goroutines, so the library user can decide how many resources are used and how fast updates are processed. Another thought: maybe we could have a deadline for event processing and, if the consumer is slow, log a warning and drop some events.
I have something like github.com/panjf2000/ants in mind, so we can either wait for a free goroutine in the pool or get an error that all goroutines in the pool are busy and the event will be dropped.
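For reference, a bounded worker pool along those lines might look roughly like this, assuming the github.com/panjf2000/ants/v2 API (a sketch only, not code from the PR):

import (
	"log"

	"github.com/panjf2000/ants/v2"
)

// newDispatcher returns a function that hands each cache event to a
// bounded worker pool; when every worker is busy it logs and drops.
func newDispatcher(workers int, handle func(table string, row libovsdb.Row)) (func(string, libovsdb.Row), error) {
	pool, err := ants.NewPool(workers, ants.WithNonblocking(true))
	if err != nil {
		return nil, err
	}
	return func(table string, row libovsdb.Row) {
		if err := pool.Submit(func() { handle(table, row) }); err != nil {
			// ants returns ErrPoolOverload when the pool is full
			log.Printf("dropping cache event for table %s: %v", table, err)
		}
	}, nil
}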

dave-tucker (Collaborator, Author) commented Apr 15, 2021

@vtolstov

> I'd prefer something like a worker pool of goroutines, so the library user can decide how many resources are used and how fast updates are processed.

Ack, ants looks really cool. Thanks for sharing!

Kubernetes has a similar problem in the client-go cache, and I think perhaps we could follow the same pattern:

  1. We state emphatically in the documentation that the cache is not designed to service slow handlers.
  2. If you really need to do something slow in a handler function, then instead write a small handler to push events into a queue in your code, and service that queue with as many goroutines as you want.

This way, we can keep the libovsdb footprint small (one additional goroutine to dispatch events to listeners) and give the user complete control of the thread pool for event handling.
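Sketched out on the library user's side, reusing the EventHandlerFuncs example from the PR description, point 2 could look something like this (the channel size, worker count, and slowProcess are all placeholders):

// The registered handler stays tiny: it only enqueues work.
work := make(chan libovsdb.Row, 256)

ovs.Cache.AddEventHandler(&libovsdb.EventHandlerFuncs{
	AddFunc: func(table string, row libovsdb.Row) {
		if table == bridgeTable {
			work <- row
		}
	},
})

// The slow processing happens in as many workers as the user wants.
for i := 0; i < 4; i++ {
	go func() {
		for row := range work {
			slowProcess(row) // slowProcess is the user's own function
		}
	}()
}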

Now I just need to cobble together a quick ring buffer and we should be good to go 😉

vtolstov (Contributor) replied:

> @vtolstov
>
> > I'd prefer something like a worker pool of goroutines, so the library user can decide how many resources are used and how fast updates are processed.
>
> Ack, ants looks really cool. Thanks for sharing!
>
> Kubernetes has a similar problem in the client-go cache, and I think perhaps we could follow the same pattern:
>
>   1. We state emphatically in the documentation that the cache is not designed to service slow handlers.
>   2. If you really need to do something slow in a handler function, then instead write a small handler to push events into a queue in your code, and service that queue with as many goroutines as you want.
>
> This way, we can keep the libovsdb footprint small (one additional goroutine to dispatch events to listeners) and give the user complete control of the thread pool for event handling.
>
> Now I just need to cobble together a quick ring buffer and we should be good to go

I think that this will be good.

dave-tucker (Collaborator, Author) commented Apr 20, 2021

@vtolstov done. The ring buffer isn't very well optimised, but it works well enough for the single-writer, multiple-reader pattern we have currently:

$ go test -bench=. -run=XXX ./internal/buffer
goos: linux
goarch: amd64
pkg: github.com/socketplane/libovsdb/internal/buffer
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkWrite-8        96289977                12.37 ns/op
BenchmarkRead-8         551214096                1.920 ns/op
PASS
ok      github.com/socketplane/libovsdb/internal/buffer 3.472s

I've used RWMutex on the cache as suggested, but based on our usage pattern we might actually be better off using sync.Map; it's something worth experimenting with.
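For context, the read path with an RWMutex looks roughly like the sketch below (a simplified, hypothetical stand-in using the standard sync package, not the actual cache.go), which is why concurrent reads stay cheap while populate takes the write lock:

// tableCache is a simplified stand-in for a per-table cache.
type tableCache struct {
	mu   sync.RWMutex
	rows map[string]libovsdb.Row
}

// Row takes only a read lock, so many readers can proceed in parallel.
func (t *tableCache) Row(uuid string) libovsdb.Row {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.rows[uuid]
}

// set takes the write lock, as populate does for each update notification.
func (t *tableCache) set(uuid string, row libovsdb.Row) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.rows[uuid] = row
}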

@hzhou8 @amorenoz I think this is pretty ready for review now if you have a spare minute.

coveralls commented Apr 22, 2021

Pull Request Test Coverage Report for Build 792357558

  • 151 of 182 (82.97%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+1.0%) to 77.635%

Changes missing coverage:

  File        Covered Lines   Changed/Added Lines   %
  client.go   32              39                    82.05%
  cache.go    119             143                   83.22%

Totals:
  • Change from base Build 790941841: 1.0%
  • Covered Lines: 788
  • Relevant Lines: 1015

💛 - Coveralls

dave-tucker (Collaborator, Author):

@amorenoz comments addressed. Please take another look.

Thanks to your comments I made some improvements to the ring buffer implementation and added some additional docs.
Read performance increased 🎉 and so did write performance, until it was spoiled by Go's race detector.
I've instead had to use atomic.Value for the backing array 😢 write performance is not as good now, but it's good enough.

While RFC 7047 allows for a database server to have multiple databases,
in the wild, this is not the case for Open vSwitch or OVN.

There are cases where supporting multiple databases would be odd.
For example, if you were to Monitor the address_set table of the OVN
North and South databases, the client would get no indication in the
corresponding update notification of which DB the update was for.

As such, it seems sensible to only support one database per client.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Since the Client can only have one database, we no longer
need to provide the client methods with a database name.

This provides a much simpler API.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This can be handled client-side with a channel.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Also use log.Fatal for errors as it's less verbose

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
The client now includes a cache, and also registers a handler
to ensure the cache is updated once Monitor or MonitorAll is called.
It also permits clients to register callbacks for Add/Update/Delete
cache events, which is a nicer API than registering your own handler and
unpacking the updates yourself.

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
dave-tucker (Collaborator, Author):

@amorenoz suggested that we use channels instead.
I've implemented it and (as we guessed) it simplifies the implementation and performs a little better.

The only drawback is that when the buffer is full we drop events, whereas the ring buffer would overwrite older events.

In both cases you're losing events, and it's up to the library user to make sure that doesn't happen.
However with the ring buffer you lose old events, whereas with the channels you lose new events.

As neither has a clear advantage over the other I suggest we stick with the simple implementation for now, and we can re-visit the buffer again if we need it in future.
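The two policies boil down to a couple of lines; this is only an illustration of the trade-off, not code from the PR:

// Channel approach: a non-blocking send drops the newest event when
// the buffer is full.
select {
case events <- e:
default:
	// buffer full: the new event e is dropped
}

// Ring buffer approach: the write always succeeds, but once the buffer
// wraps around, the oldest unread event is overwritten instead.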

dave-tucker (Collaborator, Author):

# CHANNELS
$ time go run ./example/stress/stress.go -ovsdb tcp::49154 -ninserts 100
Summary:
        Insertions: 202
        Deletions: 100
go run ./example/stress/stress.go -ovsdb tcp::49154 -ninserts 100  0.68s user 0.15s system 129% cpu 0.645 total

# BUFFER
$ time go run ./example/stress/stress.go -ovsdb tcp::49154 -ninserts 100
Summary:
        Insertions: 188
        Deletions: 90
go run ./example/stress/stress.go -ovsdb tcp::49154 -ninserts 100  1.27s user 0.18s system 177% cpu 0.822 total

amorenoz (Collaborator):

LGTM

dave-tucker merged commit d8f47af into ovn-org:main on Apr 29, 2021