Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(source): add metrics for throughput of each partition #3505

Merged
merged 9 commits into from
Jun 28, 2022

Conversation

tabVersion
Copy link
Contributor

@tabVersion tabVersion commented Jun 27, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

PLEASE DO NOT LEAVE THIS EMPTY !!!

Please explain IN DETAIL what the changes are in this PR and why they are needed:

as title, the metric records throughput before data sending to channel in connector source, which indicates the max throughput of source.

  • Summarize your change (mandatory)
  • How does this PR work? Need a brief introduction for the changed logic (optional)
  • Describe clearly one logical change and avoid lazy messages (optional)
  • Describe any limitations of the current code (optional)
  • Add the 'user-facing changes' label if your PR contains changes that are visible to users (optional)

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

close #3504

Signed-off-by: tabVersion <tabvision@bupt.icu>
Signed-off-by: tabVersion <tabvision@bupt.icu>
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 879 files.

Valid Invalid Ignored Fixed
876 2 1 0
Click to see the invalid file list
  • src/source/src/monitor/metrics.rs
  • src/source/src/monitor/mod.rs

src/source/src/monitor/metrics.rs Show resolved Hide resolved
src/source/src/monitor/mod.rs Show resolved Hide resolved
Signed-off-by: tabVersion <tabvision@bupt.icu>
.stream_reader(
state,
self.column_ids.clone(),
self.metrics.registry.clone(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wcy-fdu I had some trouble with building SourceMetrics by this registry. pls take a look

@skyzh
Copy link
Contributor

skyzh commented Jun 27, 2022

Is this metrics available in redpanda / kafka-exporter? I wonder if it is necessary also to monitor on our side… Rest LGTM

Copy link
Contributor

@Sunt-ing Sunt-ing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM!

self.metrics
.partition_input_count
.with_label_values(&[
self.context.actor_id.to_string().as_str(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce overhead, a new field "actor_id_str" can be added to self.context and then used here. In this way, frequent type conversions can be avoided.

.partition_input_count
.with_label_values(&[
self.context.actor_id.to_string().as_str(),
self.context.source_id.to_string().as_str(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and used here

.with_label_values(&[
self.context.actor_id.to_string().as_str(),
self.context.source_id.to_string().as_str(),
id.as_str(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar change can be done here

@Sunt-ing
Copy link
Contributor

Sunt-ing commented Jun 27, 2022

It's normal for different sources to have different metrics. Take Flink connector metrics as an example: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#connectors

@tabVersion
Copy link
Contributor Author

tabVersion commented Jun 27, 2022

Is this metrics available in redpanda / kafka-exporter? I wonder if it is necessary also to monitor on our side

Sure kafka provides metrics like this but currently, we do not commit to kafka so it does not know the consumption process. cc @shanicky

Besides, there exists a bounded channel that involves tokio scheduling issues to the throughput so I think it is not appropriate to use source_output_row_count, counted after the channel, as the ideal throughput of our system.

Copy link
Contributor

@KeXiangWang KeXiangWang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: tabVersion <tabvision@bupt.icu>
Signed-off-by: tabVersion <tabvision@bupt.icu>
@codecov
Copy link

codecov bot commented Jun 28, 2022

Codecov Report

Merging #3505 (39f3107) into main (6281eab) will increase coverage by 0.00%.
The diff coverage is 79.26%.

@@           Coverage Diff           @@
##             main    #3505   +/-   ##
=======================================
  Coverage   74.48%   74.49%           
=======================================
  Files         770      771    +1     
  Lines      108140   108209   +69     
=======================================
+ Hits        80547    80605   +58     
- Misses      27593    27604   +11     
Flag Coverage Δ
rust 74.49% <79.26%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/compute/src/server.rs 0.00% <0.00%> (ø)
src/source/src/lib.rs 88.88% <ø> (ø)
src/source/src/manager.rs 74.75% <50.00%> (-0.13%) ⬇️
src/source/src/monitor/metrics.rs 80.00% <80.00%> (ø)
src/source/src/connector_source.rs 71.22% <82.00%> (+2.62%) ⬆️
src/stream/src/executor/source.rs 95.41% <100.00%> (+0.04%) ⬆️
src/frontend/src/expr/utils.rs 98.99% <0.00%> (ø)
src/connector/src/filesystem/file_common.rs 80.80% <0.00%> (+0.44%) ⬆️
src/meta/src/manager/id.rs 95.50% <0.00%> (+0.56%) ⬆️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

Signed-off-by: tabVersion <tabvision@bupt.icu>
@tabVersion tabVersion added mergify/can-merge Indicates that the PR can be added to the merge queue and removed mergify/can-merge Indicates that the PR can be added to the merge queue labels Jun 28, 2022
Signed-off-by: tabVersion <tabvision@bupt.icu>
@tabVersion tabVersion added the mergify/can-merge Indicates that the PR can be added to the merge queue label Jun 28, 2022
tabVersion and others added 2 commits June 28, 2022 20:48
Signed-off-by: tabVersion <tabvision@bupt.icu>
@mergify mergify bot merged commit 1ee3dcf into main Jun 28, 2022
@mergify mergify bot deleted the tab/source-input-rate branch June 28, 2022 13:14
@@ -369,6 +369,9 @@ def section_streaming(panels):
panels.target(
"rate(stream_source_output_rows_counts[15s])", "source_id = {{source_id}}"
),
panels.target(
"rate(partition_input_count[5s])", "{{actor_id}}-{{source_id}}-{{partition}}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5s doesn't make much sense as our collect interval is by default set to 15s.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mergify/can-merge Indicates that the PR can be added to the merge queue type/feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

metric for input rate
4 participants