Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add proposal of Kmesh observability #337

Merged
merged 8 commits into from
Jun 11, 2024

Conversation

LiZhenCheng9527
Copy link
Collaborator

@LiZhenCheng9527 LiZhenCheng9527 commented May 18, 2024

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:
a part of #320

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Signed-off-by: LiZhenCheng9527 <lizhencheng6@huawei.com>
@LiZhenCheng9527
Copy link
Collaborator Author

/hold
Discuss the metrics that need to be achieved first, then add detail design with @lec-bit.

Signed-off-by: LiZhenCheng9527 <lizhencheng6@huawei.com>
Signed-off-by: LiZhenCheng9527 <lizhencheng6@huawei.com>
Signed-off-by: LiZhenCheng9527 <lizhencheng6@huawei.com>

As shown in the follow figure, Kmesh's Ads controller passes the access log filter to the Accesslog Controller to parse out the access log format.

Then according to the configuration, monitor the workload, collect the metrics to generate access log, and store the generated access log locally or send it to istiod.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be you miss some details here, how can theser metrics recorded. How can we correlate a metric with a connection/req, i am asking becuase accesslog is one per request for http. and one per connection for tcp

@hzxuzhonghu
Copy link
Member

cc @zirain


The telemetry controller combines the metrics into an access log based on the access log format, and puts it in the envoyfilter to return to the control plane.

For metric features, provide 15001 port for Prometheus queries.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For metric features, provide 15001 port for Prometheus queries.
For metric features, provide 15020 port for Prometheus queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTY, it should be configurable and support prometheus annotations.


The xDS controller fetches the access log format and metric configurations from it. Then get the corresponding metrics from the workloads.

The telemetry controller combines the metrics into an access log based on the access log format, and puts it in the envoyfilter to return to the control plane.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is exporter a better name?

@zirain
Copy link
Contributor

zirain commented May 24, 2024

Another brainstorming: consider support OpenTelemetry with push mode?

Signed-off-by: LiZhenCheng9527 <lizhencheng6@huawei.com>
@codecov-commenter
Copy link

codecov-commenter commented May 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Flag Coverage Δ
unittests 31.10% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Signed-off-by: LiZhenCheng9527 <lizhencheng6@huawei.com>
Comment on lines +86 to +91
connection_opens: The total number of TCP connections opened
connection_close: The total number of TCP connections closed
received_bytes: The size of total bytes received during request in case of a TCP connection
sent_bytes: The size of total bytes sent during response in case of a TCP connection
on_demand_dns: The total number of requests that used on-demand DNS (unstable)
on_demand_dns_cache_misses: The total number of cache misses for requests on-demand DNS (unstable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some metrics like sent_bytes should have some labels, please double check what does it look like, and socument it here

Signed-off-by: LiZhenCheng9527 <lizhencheng6@huawei.com>
Signed-off-by: LiZhenCheng9527 <lizhencheng6@huawei.com>
Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@kmesh-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kmesh-bot kmesh-bot merged commit f396556 into kmesh-net:main Jun 11, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants