Serialize hosts writes per instance #2753
Conversation
Commenting on the general approach, not the details, as this is still a WIP. My main concern would be that the goroutine can't process fast enough, so requests start piling up and waiting for the context to expire (do we set a timeout on request handlers? i.e. would it ever expire?), consuming resources on the Fleet server during that time. Saving a host stores a lot of data, especially when packs/software/additional data/users are involved.

There's also a lot of similarity with the async processing work: this is basically addressing the same issues with a different solution. In async mode we aim to process requests very fast by storing temporarily to Redis and asynchronously processing the save to MySQL in a controlled manner; here we queue the requests until they get a chance to save to MySQL (or time out waiting).

That being said, while I think the `MarkHostsSeen` processing would fit well in the same async approach as labels and policies (small payload), I don't think the same can be said of `SaveHosts`: it would be quite a lot to store to Redis even temporarily. Our load testing already shows that it takes a relatively big Redis cluster to handle labels (not so much due to the size of the payload in memory, but due to the high workload of the Redis process, which would be impacted even more by the size of the payload of the save-hosts processing).

So I think this PR's approach is worth a shot, keeping an eye on the impact of requests idling while they wait. But it seems like, more and more, we see the need for a real queue to decouple requests from processing (which gocloud makes relatively easy to support in an abstract way, so you don't force a specific provider, with implementations for AWS SQS, GCP, Azure, etc.: https://pkg.go.dev/gocloud.dev/pubsub).
Yeah, this is my main concern as well (plus testing would be pretty annoying, unless we make it configurable). The first approach didn't have the async save, so I added it with a counter to see how things look. If it works well, then the question would be how to make it solid enough that requests don't pile up. We can probably make it all configurable at least.

I cleaned it up a bit and made this approach configurable. For the time being, I haven't added any tests for this approach; I will if tomorrow's call results in us going with it.
server/datastore/mysql/migrations/tables/20211102135149_AddHostSeenTimesTable.go
```go
		errCh: errCh,
		item:  host,
	}:
		return <-errCh
```
@lucasmrod @mna after a convo with Lucas, he helped me see that I could remove the `select` from here and defer-close `errCh` above. Just mentioning it explicitly to bring more eyes here.
Last note on my end: given the latest state of the code, we don't need a buffered `errCh` channel anymore (the creator will always read it after writing to `writeCh`, but!... maybe it makes sense under high load to not block the serializing goroutine...). So I'm ok with leaving it as-is :)
Yes, that works. It didn't work in the previous implementation because the context could be cancelled and cause the function to return between the write to `writeCh` and the read from `errCh`. Now it is guaranteed not to return before the other goroutine writes to `errCh`, though note that this means the context can only unblock the call if it expires before the write to `writeCh`. After that, it can block indefinitely waiting for the result from `errCh`. This should be fine, though, as the item's `ctx` is also used in the other goroutine to stop the database actions early if necessary.
```go
	}
	defer rows.Close()

	for rows.Next() {
```
<Just a nit, as this PR has been up for too long now :)>

Small improvement to not mix row scanning + deletion in the same for loop:
- Make a helper function that calls the `SELECT` and returns a slice of ids (the `rows.Err()` + `rows.Close()` would get called when the function returns).
- Then in the for loop here you iterate such returned slice of `id`s and call `DELETE` on them.
The reasoning behind the scan+delete in a loop, though, is to avoid storing a potentially silly amount of ids in memory.
server/datastore/mysql/migrations/tables/20211102135149_AddHostSeenTimesTable.go
I'd be a bit concerned about stalling requests waiting on the write to `writeCh`, but load tests have shown that it works quite nicely and fast enough, so 👍!
Checklist for submitter
If some of the following don't apply, delete the relevant line.
- Documented any API changes (docs/01-Using-Fleet/03-REST-API.md)
- Documented any permissions changes