Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metrics): add sampled (de)serialization duration metrics for RPC #3618

Merged
merged 13 commits into from
Jul 4, 2022

Conversation

Sunt-ing
Copy link
Contributor

@Sunt-ing Sunt-ing commented Jul 4, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

PLEASE DO NOT LEAVE THIS EMPTY !!!
Add sampled (de)serialization duration metrics for RPC with the given frequency

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

#3284

@Sunt-ing
Copy link
Contributor Author

Sunt-ing commented Jul 4, 2022

Bench result in 3-compute-node mode:
image

@codecov
Copy link

codecov bot commented Jul 4, 2022

Codecov Report

Merging #3618 (903a142) into main (70a281a) will decrease coverage by 0.01%.
The diff coverage is 59.61%.

@@            Coverage Diff             @@
##             main    #3618      +/-   ##
==========================================
- Coverage   74.38%   74.37%   -0.02%     
==========================================
  Files         776      776              
  Lines      110110   110155      +45     
==========================================
+ Hits        81905    81926      +21     
- Misses      28205    28229      +24     
Flag Coverage Δ
rust 74.37% <59.61%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/compute/src/rpc/service/exchange_metrics.rs 0.00% <0.00%> (ø)
src/compute/src/rpc/service/exchange_service.rs 0.00% <0.00%> (ø)
src/stream/src/executor/merge.rs 92.33% <100.00%> (+0.41%) ⬆️
src/stream/src/executor/monitor/streaming_stats.rs 100.00% <100.00%> (ø)
src/meta/src/hummock/mock_hummock_meta_client.rs 40.56% <0.00%> (-0.95%) ⬇️
src/meta/src/manager/id.rs 94.94% <0.00%> (-0.57%) ⬇️
src/connector/src/filesystem/file_common.rs 80.35% <0.00%> (-0.45%) ⬇️
src/frontend/src/expr/utils.rs 98.99% <0.00%> (-0.26%) ⬇️
src/storage/src/hummock/local_version_manager.rs 81.38% <0.00%> (-0.12%) ⬇️
src/storage/src/hummock/iterator/merge_inner.rs 90.00% <0.00%> (+0.62%) ⬆️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

Copy link
Contributor

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this metrics is only used within scale service, and should not be put in dashboard.

grafana/risingwave-dashboard.py Outdated Show resolved Hide resolved
@Sunt-ing
Copy link
Contributor Author

Sunt-ing commented Jul 4, 2022

No metric is used in all services. And since the panels of these two metrics locate in the "streaming actors" section, where most metrics that the scaling service needs locate, developers who do not care for scaling service can easily fold and unsee this section.

@skyzh
Copy link
Contributor

skyzh commented Jul 4, 2022

The metrics you added do not convey any actual meaning. If developers observe "30ms" from the metrics, what can they know about the system?

@skyzh
Copy link
Contributor

skyzh commented Jul 4, 2022

... and thus it's better not to add it to dashboard.

@Sunt-ing
Copy link
Contributor Author

Sunt-ing commented Jul 4, 2022

IMO, since sampling is used, the important information is not the accurate number but the existence and trend. If the graph of this metric exists, developers can know that this metric is not been disrupted. If this metric is growing, developers can know how the (de)serialization durations change.
Also, IMO this Grafana panel is temporal and will be removed after the Orchestrator is fully built. And that, we may use other methods to observe it.

@skyzh
Copy link
Contributor

skyzh commented Jul 4, 2022

IMO, since sampling is used, the important information is not the accurate number but the existence and trend.

If we want to explore trend, we should use rate function.

If this metric is growing, developers can know how the (de)serialization durations change.

It will change as long as there's an exchange. There's no need to have yet another metrics to prove a fact.

Also, IMO this Grafana panel is temporal and will be removed after the Orchestrator is fully built. And that, we may use other methods to observe it.

You can just leave it on your local branch, instead of merging into main.

@Sunt-ing
Copy link
Contributor Author

Sunt-ing commented Jul 4, 2022

With the rapid development, rebase can be miserable. Also, others who also develop the scaling service can benefit from it. I don't think this brings more overhead than its benefits.

@skyzh
Copy link
Contributor

skyzh commented Jul 4, 2022

I'd only approve for the changes of metrics and not the dashboard. You can find another reviewer who agree on you.

@Sunt-ing
Copy link
Contributor Author

Sunt-ing commented Jul 4, 2022

Thanks for your review of the changes of metrics.

@Sunt-ing Sunt-ing enabled auto-merge (squash) July 4, 2022 12:08
@Sunt-ing Sunt-ing merged commit 927fae2 into main Jul 4, 2022
@Sunt-ing Sunt-ing deleted the sunt_serde_duration_metrics branch July 4, 2022 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants