Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watcher System Metrics #8338

Merged
merged 6 commits into from
Sep 28, 2021
Merged

Watcher System Metrics #8338

merged 6 commits into from
Sep 28, 2021

Conversation

rosstimothy
Copy link
Contributor

Purpose

Currently we have no insight into which events are being emitted, how often they are being emitted, and what the size of an event is. This makes it difficult to determine the source of high network utilization issues. By exposing metrics for the watcher system and consuming them on a new pane in tctl top we can get a real time view of the events being emitted.

Implementation

  • Expose Prometheus metrics in the Auth GRPC Server
    -- record events in WatchEvents streaming GRPC method before we Send on the stream
    -- use Event.Size() to estimate the amount of data being sent
  • Add a new pane to tctl top
    2021-09-21_17-07

@russjones
Copy link
Contributor

@fspmarshall @andrejtokarcik Can you two review?

Copy link
Contributor

@fspmarshall fspmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the watcherEventsEmitted vec is reigned in a bit.

lib/auth/grpcserver.go Outdated Show resolved Hide resolved
tool/tctl/common/top_command.go Outdated Show resolved Hide resolved
tool/tctl/common/top_command.go Outdated Show resolved Hide resolved
tool/tctl/common/top_command.go Outdated Show resolved Hide resolved
tool/tctl/common/top_command.go Outdated Show resolved Hide resolved
Count: int64(bucket.GetCumulativeCount()),
UpperBound: bucket.GetUpperBound(),
})
}
}
return out
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a big deal but your new additions reveal getComponentHistogram and getHistogram share most of the code, making their respective purpose/differences unnecessarily unclear. I suppose they could be nicely unified by means of a third function with the signature func(metric *dto.MetricFamily, hist *dto.Histogram) Histogram.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to getHistogramWithFilter(metric *dto.MetricFamily, histogramFilter func(metrics []*dto.Metric) *dto.Histogram) Histogram to unite the shared code in one place and allow callers to provide filtering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this solution doesn't really seem to make the code more readable/succinct. Feel free to roll back to the original variant, as you prefer.

tool/tctl/common/top_command.go Outdated Show resolved Hide resolved
tool/tctl/common/top_command.go Outdated Show resolved Hide resolved
@rosstimothy rosstimothy force-pushed the tross/event-metrics branch 2 times, most recently from ee32bb6 to 582a3b7 Compare September 27, 2021 13:28
lib/utils/circular_buffer.go Outdated Show resolved Hide resolved
lib/utils/circular_buffer_test.go Outdated Show resolved Hide resolved
lib/utils/circular_buffer.go Outdated Show resolved Hide resolved
lib/utils/circular_buffer_test.go Show resolved Hide resolved
Count: int64(bucket.GetCumulativeCount()),
UpperBound: bucket.GetUpperBound(),
})
}
}
return out
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this solution doesn't really seem to make the code more readable/succinct. Feel free to roll back to the original variant, as you prefer.

lib/utils/circular_buffer_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@russjones russjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot.

@rosstimothy rosstimothy merged commit fb0ab2b into master Sep 28, 2021
@rosstimothy rosstimothy deleted the tross/event-metrics branch September 28, 2021 16:16
zmb3 pushed a commit that referenced this pull request Sep 28, 2021
* add event watcher prometheus metrics and a new tctl top tab to visualize them
rosstimothy added a commit that referenced this pull request Sep 29, 2021
* add event watcher prometheus metrics and a new tctl top tab to visualize them

(cherry picked from commit fb0ab2b)
rosstimothy added a commit that referenced this pull request Sep 29, 2021
* add event watcher prometheus metrics and a new tctl top tab to visualize them

(cherry picked from commit fb0ab2b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants