New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report resource usage counts by handling heartbeat events #35968
Conversation
The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with |
457c248
to
f37ed08
Compare
for _, item := range result.Items { | ||
report := &prehogv1.ResourcePresenceReport{} | ||
if err := proto.Unmarshal(item.Value, report); err != nil { | ||
return nil, trace.Wrap(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to abort the listing operation if there is one bad report in storage? Can we log the failure and keep trying the rest of the reports instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially, but if we make that change we should do it for user activity reports too, so for now I think it's best to be consistent with current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably something we want to look into. I know we've had various bugs caused by getting resources failing due to one bad resource aborting the entire operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failing usage data submission will result in a cluster alert, which will hopefully prompt the customer into calling us. That'd still work if we skipped over invalid data, but we would need to tweak the logic around creating and deleting the cluster alerts (to still create one), since we don't want to keep ignoring some logic bug.
f37ed08
to
a03f93d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do-not-merge
because the .protos need to be updated in cloud master first, then copied here, other than that LGTM.
Protos were already merged in cloud master (see https://github.com/gravitational/cloud/pull/6823) but I pulled the latest in here (only differences were in comments). |
6e930ff
to
89c35c9
Compare
cc @timothyb89 - this pulls in some of your new bot protos. Let me know if that's okay. |
I think it's fine, it'll just be competing with #35881 so one of us will have a conflict to resolve 🙂 |
…se-anon-key * origin/master: (344 commits) Undelete CreateHostUserMode_HOST_USER_MODE_DROP (gravitational#36273) allow cwd to be changed in difftest (gravitational#35946) Auth device list component (gravitational#36235) make unified resources responsive (gravitational#35961) Support running Teleport in a "hot reload" mode (gravitational#35040) Prevent deleting enum values, allow deleting enum reservations in types.proto (gravitational#36248) Remove support for legacy (Amazon Linux 2) AMIs (gravitational#36153) Bump version(s) used for teleport-lab and teleport-quickstart (gravitational#36167) Allow Reconciler update handler to examine old value during update (gravitational#36171) Validate the user still exists during account reset (gravitational#35676) ButtonTextWithAddIcon shared component (gravitational#36103) Refactor hostname resolution for SSH connections via the WebUI (gravitational#35773) add structuredClone to jest JSDOMEnvironment (gravitational#36213) fix flaky `lib/auth` cache-enabled tests (gravitational#36216) Report resource usage counts by handling heartbeat events (gravitational#35968) Reviewer bot should use the stable version of Go (gravitational#36242) RFD 0153 Resource Guidelines (gravitational#34103) Use cmp and cmpots properly in operator tests (gravitational#36215) Relax Kubernetes CRD discovery when building cache (gravitational#36214) Add Access List messages to TAG protobuf (gravitational#36176) ...
Buddy PR for #34954
Closes #34954