pipeline-manager: add metrics to compiler server #1031

Merged: 1 commit, Nov 17, 2023

@lalithsuresh lalithsuresh commented Nov 17, 2023

This commit starts adding metrics to the pipeline-manager components, beginning with the compiler server. It exposes histograms of compiler latencies and a counter of compiler invocations. Both metrics are faceted by two labels: the phase (SQL vs. Rust) and the status (Success vs. Error).
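To illustrate the labeling scheme, here is a minimal std-only sketch of two metrics keyed by a (phase, status) label. It is a model of the idea, not the `prometheus-client` API used in the commit: the `Phase` and `Status` enums, the `MetricLabel` field names, the `CompilerMetrics` struct, and its methods are all assumptions made for illustration, and latencies are kept as raw observations rather than a real histogram.

```rust
use std::collections::HashMap;

// Hypothetical label values; the PR only says "SQL vs Rust" and "Success vs Error".
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Phase {
    Sql,
    Rust,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Status {
    Success,
    Error,
}

// One time series exists per distinct label combination.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct MetricLabel {
    phase: Phase,
    status: Status,
}

// Stand-in for the two metric families: a counter and a latency record per label.
#[derive(Default)]
struct CompilerMetrics {
    invocations: HashMap<MetricLabel, u64>,
    latency_secs: HashMap<MetricLabel, Vec<f64>>,
}

impl CompilerMetrics {
    // Record one compiler invocation and how long it took.
    fn record(&mut self, phase: Phase, status: Status, elapsed_secs: f64) {
        let label = MetricLabel { phase, status };
        *self.invocations.entry(label).or_insert(0) += 1;
        self.latency_secs.entry(label).or_default().push(elapsed_secs);
    }

    // Number of invocations seen for one label combination.
    fn invocations(&self, phase: Phase, status: Status) -> u64 {
        let label = MetricLabel { phase, status };
        self.invocations.get(&label).copied().unwrap_or(0)
    }

    // Latency observations seen for one label combination.
    fn observations(&self, phase: Phase, status: Status) -> &[f64] {
        let label = MetricLabel { phase, status };
        self.latency_secs.get(&label).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut metrics = CompilerMetrics::default();
    metrics.record(Phase::Sql, Status::Success, 2.5);
    metrics.record(Phase::Rust, Status::Success, 90.0);
    metrics.record(Phase::Rust, Status::Error, 12.0);
    println!(
        "rust/success invocations: {}",
        metrics.invocations(Phase::Rust, Status::Success)
    );
    println!(
        "rust/error latencies: {:?}",
        metrics.observations(Phase::Rust, Status::Error)
    );
}
```

With two phases and two statuses, this yields at most four series per metric, which keeps label cardinality small.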

For now, we'll always serve metrics via a scrape endpoint on a separate port (0.0.0.0:9000/metrics) from the usual APIs (80/8080). This lets us avoid exposing the scrape endpoint outside the Docker ensemble or cluster in a real deployment.
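As a rough sketch of the separate-listener idea, the std-only example below answers `GET /metrics` on its own socket and returns 404 for everything else. It is not the server code from this commit: `respond`, `serve_one`, and the sample metric line are hypothetical, and the demo binds to an OS-assigned loopback port instead of 0.0.0.0:9000 so it can run anywhere.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};

// Build a minimal HTTP/1.1 response for one request line.
// Only `GET /metrics` is answered; anything else gets a 404.
fn respond(request_line: &str, metrics_body: &str) -> String {
    if request_line.starts_with("GET /metrics ") {
        format!(
            "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
            metrics_body.len(),
            metrics_body
        )
    } else {
        "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\nConnection: close\r\n\r\n".to_string()
    }
}

// Accept a single connection on the metrics listener and answer it.
fn serve_one(listener: &TcpListener, metrics_body: &str) -> std::io::Result<()> {
    let (mut stream, _peer) = listener.accept()?;
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?;
    let request = String::from_utf8_lossy(&buf[..n]);
    let request_line = request.lines().next().unwrap_or("");
    stream.write_all(respond(request_line, metrics_body).as_bytes())
}

fn main() -> std::io::Result<()> {
    // Port 0 asks the OS for a free port for this demo; the PR uses 0.0.0.0:9000.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Hypothetical metric line, served once from a background thread.
    let server =
        std::thread::spawn(move || serve_one(&listener, "compiler_invocations_total 3\n"));

    // Act as the scraper: one request, read until the server closes the socket.
    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"GET /metrics HTTP/1.1\r\nHost: localhost\r\n\r\n")?;
    let mut response = String::new();
    client.read_to_string(&mut response)?;
    server.join().unwrap()?;

    print!("{response}");
    Ok(())
}
```

Because the metrics listener is a distinct socket, a deployment can publish the API port while leaving the metrics port reachable only inside the cluster network.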

Is this a user-visible change (yes/no): yes

@@ -51,6 +51,8 @@ refinery = {version = "0.8.10", features = ["tokio-postgres"]}
reqwest = {version = "0.11.18", features = ["json"]}
url = {version = "2.4.0"}
dirs = "5.0"
prometheus-client = "0.22.0"
lazy_static = "1.4.0"

invocations: Family::<MetricLabel, Counter>::default(),
latency: Family::<MetricLabel, Histogram>::new_with_constructor(|| {
// These are buckets for measuring SQL and rust compilation times
let buckets = [1.0, 5.0, 10.0, 20.0, 50.0, 100.0, 200.0, 400.0];
Collaborator

what do the numbers mean?

Collaborator Author

They're histogram CDF points, i.e. bucket upper bounds in seconds. Buckets are cumulative, so if compilation takes, say, 15 seconds, it'll add 1 to buckets 20, 50, 100, 200 and 400.

I don't have a good sense yet of what the most useful CDF points are; I hope to tweak them as we go.
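A small std-only sketch of the cumulative counting described above, using the bucket bounds from the diff. The `cumulative_counts` helper is hypothetical and models what a scrape would report per bucket, not how the `prometheus-client` histogram stores data internally.

```rust
// Bucket upper bounds in seconds, copied from the diff.
const BUCKETS: [f64; 8] = [1.0, 5.0, 10.0, 20.0, 50.0, 100.0, 200.0, 400.0];

// Cumulative counts: counts[i] is the number of observations <= BUCKETS[i].
fn cumulative_counts(observations: &[f64]) -> [u64; 8] {
    let mut counts = [0u64; 8];
    for &secs in observations {
        for (i, &upper) in BUCKETS.iter().enumerate() {
            if secs <= upper {
                counts[i] += 1;
            }
        }
    }
    counts
}

fn main() {
    // A 15-second compilation lands in every bucket whose bound is >= 15.
    let counts = cumulative_counts(&[15.0]);
    for (upper, count) in BUCKETS.iter().zip(counts.iter()) {
        println!("le={upper}: {count}");
    }
}
```

An observation above 400 seconds increments none of these buckets; in the Prometheus exposition format it is still captured by the implicit `+Inf` bucket.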

Collaborator

Got it. Maybe this can be a comment :)

Signed-off-by: Lalith Suresh <lalith@feldera.com>
@lalithsuresh lalithsuresh merged commit 7133ae4 into main Nov 17, 2023
5 checks passed
@lalithsuresh lalithsuresh deleted the metrics branch November 17, 2023 05:59