Storage: Watch support #82282
Conversation
force-pushed from 1f3d959 to 615597e
Added an initial implementation using a single poller that broadcasts out to all watchers. I still need to look at tweaking the startup process for new watches that specify a "since" value (which may be in the past), but it's more or less working and has the basics we need to swap in a different tail mechanism by providing the broadcaster with a different connectFunc implementation.
fields := s.getReadFields(rr)

entityQuery := selectQuery{
Do we need SELECT FOR SHARE? This would ensure we don't skip entities that are in the process of being committed. I suspect transaction t1 with revision_version=1 might be committed after t2 with revision_version=2, leading to a gap. If that's correct, using SELECT FOR SHARE would ensure we wait for all in-progress transactions to finish before returning.
Interesting, I'll have to think about that. It's supported by MySQL and Postgres but not SQLite. It does seem like it shouldn't hurt anything if we added it via dialect.
7d70f73
to
aa641ba
Compare
// result doesn't match our watch params, skip it
if !watchMatches(r, result) {
    s.log.Debug("watch result not matched", "guid", result.Guid, "action", result.Action, "rv", result.ResourceVersion)
    break
If I understand the code correctly we should continue on to the next event instead of stopping the watch, shouldn't we?
Suggested change:
-    break
+    continue
It has more or less the same impact, since we go to the next iteration of the for loop and call select again, but yes, continue may make more sense.
force-pushed from aa641ba to f81d028
@DanCech Running it locally to test #83772, I found that watchInit appears to cache results, even for deleted entities. If I create and delete playlists and run
So when you run kubectl get with watch enabled, it performs list + watch: it'll first get the list of items, then start a watch from the resourceVersion returned by the list call. You can see what it's doing if you experiment with the

What you may be seeing when you delete a playlist is that it won't be returned in the list, but when the watch is started from the most recent resourceVersion returned by the list, that resourceVersion is older than the resourceVersion of some create, update and/or delete events for that playlist, so those events are included in the initial set of events returned by the watch.

There are a few ways we could deal with that; one may be to return a fresh snowflake in the list response rather than the most-recent resourceVersion of the items we're returning, which would mean the watch would be started from the time of the list command rather than the time of the most recently created/updated item in the list.
Unless we see any major blockers I'd like to get this PR merged and do any further cleanup in follow-up PRs, since it's already quite substantial and we are starting to accumulate PRs off this branch.
@DanCech it seems like I remember the etcd implementation doing something like this. Is that what it does?
force-pushed from 4294c07 to 155ee64
Co-authored-by: Igor Suleymanov <radiohead@users.noreply.github.com>
force-pushed from 155ee64 to 1400a7e
This PR adds unified storage support for k8s WATCH
Within the storage server it uses a broadcaster pattern to consume events from the database and dispatch them to all subscribed watchers; each watcher filters the broadcast events and returns matching events to the gRPC client.
When a new watch is initiated, watchInit reads the first batch of records from the database, then subscribes to the broadcaster. The broadcaster maintains an in-memory circular buffer of recent events to help bridge any gap between the watchInit query and the subscriber connecting; the contents of the buffer are replayed to each new subscriber before any new events.
The subscriber is responsible for discarding any duplicate events and for filtering events received from the broadcaster to return only those events applicable to the requested watch.
In the k8s storage implementation, we simply use the grpc client to connect to the storage api and stream the results to the consumer via the k8s-provided StreamWatcher implementation.
Currently the database event consumer is implemented via a background poller which reads from the entity_history table, but we can add support for alternative implementations that tail binary logs or use changefeeds to discover new entries in entity_history.
Right now the circular buffer size, broadcaster subscription channel buffer size, SQL batch size, and poll interval are hardcoded, but they can be made configurable in the future if/when needed.