Skip to content

Support async saving of hosts' last seen time#5640

Merged
mna merged 6 commits intofleetdm:mainfrom
mna:mna-5536-async-host-last-seen
May 10, 2022
Merged

Support async saving of hosts' last seen time#5640
mna merged 6 commits intofleetdm:mainfrom
mna:mna-5536-async-host-last-seen

Conversation

@mna
Copy link
Copy Markdown
Member

@mna mna commented May 9, 2022

#5536

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes (in changes/ and/or orbit/changes/).
  • Documented any API changes (docs/Using-Fleet/REST-API.md)
  • Documented any permissions changes
  • Ensured that input data is properly validated, SQL injection is prevented (using placeholders for values in statements)
  • Added support on fleet's osquery simulator cmd/osquery-perf for new osquery data ingestion features.
  • Added/updated tests
  • Manual QA for all new/changed functionality

Note: to be validated/tested in load testing environment, to see if this unblocks further scaling regarding number of hosts.

@mna mna force-pushed the mna-5536-async-host-last-seen branch from fb0d150 to 74fcc10 Compare May 9, 2022 15:23
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 9, 2022

Codecov Report

Merging #5640 (bf99af1) into main (4dd2e57) will decrease coverage by 0.00%.
The diff coverage is 64.19%.

@@            Coverage Diff             @@
##             main    #5640      +/-   ##
==========================================
- Coverage   58.16%   58.16%   -0.01%     
==========================================
  Files         358      359       +1     
  Lines       33028    33144     +116     
==========================================
+ Hits        19212    19279      +67     
- Misses      11888    11925      +37     
- Partials     1928     1940      +12     
Impacted Files Coverage Δ
cmd/fleet/serve.go 6.35% <0.00%> (-0.03%) ⬇️
server/service/async/async.go 50.94% <0.00%> (-3.79%) ⬇️
server/service/async/collect.go 71.73% <ø> (ø)
server/service/hosts.go 68.62% <ø> (-0.24%) ⬇️
server/service/osquery.go 72.46% <0.00%> (-0.32%) ⬇️
server/service/service.go 83.87% <ø> (-6.13%) ⬇️
server/service/async/async_host_seen.go 71.55% <71.55%> (ø)
cmd/fleetctl/testing_utils.go 100.00% <100.00%> (ø)
server/datastore/mysql/hosts.go 81.36% <100.00%> (ø)
server/service/async/async_label.go 75.31% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4dd2e57...bf99af1. Read the comment docs.

@mna mna marked this pull request as ready for review May 9, 2022 15:51
@mna mna requested a review from a team as a code owner May 9, 2022 15:51
Comment on lines +23 to +26
if !t.AsyncEnabled {
t.seenHostSet.addHostID(hostID)
return nil
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the caller do this. It would be more readable and it's a bit unexpected that async.Task.SomeMethod could be doing it synchronously depending on config/creation-arg (same case with labels and policies).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way you also don't need to define "synchronized" related stuff on the async package (// seenHostSet implements synchronized storage for the set of seen hosts.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this also apply to FlushHostsLastSeen below? I kind of like the idea of either completely separating "syncronized" vs "non-syncronized" code OR keeping all the !task.AsyncEnabled checks in a centralized place as much as possible (eg: moving the check that is done in serve.go here)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's not ideal API-wise, but it's quite useful as a way to have the full feature implemented in one location instead of scattered across files/packages. To be honest, I don't think it's really cleaner to have this condition check at the call-site. It's also how the other 2 async features are implemented, so for consistency I think it's best for all of them to take full "ownership" of the feature. Maybe the bigger issue is the naming instead of the code separation, though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(+1 to this being a naming issue, that was my first thought)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this also apply to FlushHostsLastSeen below? I kind of like the idea of either completely separating "syncronized" vs "non-syncronized" code OR keeping all the !task.AsyncEnabled checks in a centralized place as much as possible (eg: moving the check that is done in serve.go here)

Yeah I'd like that too, although in that case I wanted to avoid creating a goroutine for nothing, hence the check inside serve.go. I don't know, it wouldn't be a big deal either to have it start and run a no-op every 1-10s, let me know how y'all feel about it, I don't mind either way!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd trust your instinct! the code as-is LGTM!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we agree about the naming issue, I'll address this in a subsequent PR (and suggestions welcome, I think task or job is too generic, asyncable? it conveys that it can be but is not necessarily async?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to naming change for now (And doing it on a separate PR sounds good too.)

name: "collect_last_seen",
pool: t.Pool,
ds: t.Datastore,
execInterval: t.CollectorInterval,
Copy link
Copy Markdown
Member

@lucasmrod lucasmrod May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense for host seen times to have same collector interval as the other two? (30s is the default IIRC). Maybe we should allow user to configure per-feature?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah eventually I think we may want to expand the configs so each async feature has its own options (at least for the interval), but for now I think it's ok at least to try this out in load testing and see if we see the gains we expect.

Copy link
Copy Markdown
Member

@lucasmrod lucasmrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Left a couple of comments.

@Desmi-Dizney
Copy link
Copy Markdown
Contributor

Desmi-Dizney added a commit that referenced this pull request May 17, 2022
noahtalerman pushed a commit that referenced this pull request May 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants