koordlet: change metric cache to tsdb #1228

Merged
merged 1 commit into from May 9, 2023

zwzhang0107
Contributor

@zwzhang0107 zwzhang0107 commented Apr 18, 2023

Ⅰ. Describe what this PR does

The metric cache is the koordlet module that persists metric data.
It currently uses sqlite3 as its storage, which is a structured SQL database.
sqlite3 performs poorly for time-series data such as resource metrics: it consumes more space and is slow to query.
In addition, because it is a structured database, new tables and interfaces must be added whenever a new metric is added to the metric cache, which makes it hard to extend.
We plan to replace sqlite3 with tsdb for better performance and extensibility.

This PR does two things:

  • Adds the tsdb framework to the metric cache, providing append and query interfaces.
  • Moves CPU throttled metric storage to tsdb and adapts the cpu_burst QoS plugin and the pod throttled collector to the new interface.
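
As a rough illustration of what an append and query interface over label-keyed series could look like, here is a minimal in-memory sketch in Go. Every name in it (`Store`, `Sample`, `Append`, `Query`, the metric name `pod_cpu_throttled_ratio`, the `pod_uid` label) is hypothetical and not taken from this PR, and a plain map stands in for the real tsdb storage. The point it tries to show is that a series is identified by a metric name plus labels, so adding a new metric needs no new table or interface.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is one timestamped value. Timestamps are Unix milliseconds.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// Store is a hypothetical in-memory stand-in for a tsdb-backed metric cache.
// It is NOT the koordlet implementation.
type Store struct {
	series map[string][]Sample
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{series: map[string][]Sample{}}
}

// seriesKey builds a canonical key from the metric name and its labels.
// Label names are sorted so the same label set always maps to the same key.
func seriesKey(metric string, labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString(metric)
	for _, name := range names {
		b.WriteString("," + name + "=" + labels[name])
	}
	return b.String()
}

// Append adds a sample to the series identified by metric name and labels.
// A previously unseen metric simply creates a new series.
func (s *Store) Append(metric string, labels map[string]string, sample Sample) {
	key := seriesKey(metric, labels)
	s.series[key] = append(s.series[key], sample)
}

// Query returns the samples of one series whose timestamps fall in
// the closed interval [startMs, endMs], in insertion order.
func (s *Store) Query(metric string, labels map[string]string, startMs, endMs int64) []Sample {
	var out []Sample
	for _, sample := range s.series[seriesKey(metric, labels)] {
		if sample.TimestampMs >= startMs && sample.TimestampMs <= endMs {
			out = append(out, sample)
		}
	}
	return out
}

func main() {
	store := NewStore()
	pod := map[string]string{"pod_uid": "pod-a"}

	store.Append("pod_cpu_throttled_ratio", pod, Sample{TimestampMs: 1000, Value: 0.10})
	store.Append("pod_cpu_throttled_ratio", pod, Sample{TimestampMs: 2000, Value: 0.25})
	store.Append("pod_cpu_throttled_ratio", pod, Sample{TimestampMs: 3000, Value: 0.40})

	// Only the samples at 2000 ms and 3000 ms fall inside the window.
	for _, sample := range store.Query("pod_cpu_throttled_ratio", pod, 1500, 3000) {
		fmt.Println(sample.TimestampMs, sample.Value)
	}
}
```

A consumer such as a QoS plugin would, under these assumptions, call `Query` with a metric name, a label set, and a time window instead of going through a metric-specific table accessor.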

Ⅱ. Does this pull request fix one issue?

see #586

Ⅲ. Describe how to verify it

Resource consumption comparison: before and after saving pod and container throttled ratios to tsdb.

We deployed koordlet v1.2 and the latest version with the tsdb refactor on the same node and started 60 pods on it for data collection.

In v1.2, only the latest 30 minutes of metrics are saved in sqlite; in the latest version, the time range is extended to 24 hours by default.

image

Ⅳ. Special notes for reviews

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@zwzhang0107
Contributor Author

/hold

@zwzhang0107
Contributor Author

/hold cancel

@codecov

codecov bot commented Apr 23, 2023

Codecov Report

Patch coverage: 65.84% and project coverage change: +0.06 🎉

Comparison is base (6a6c6dd) 64.90% compared to head (ce1e9a8) 64.96%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1228      +/-   ##
==========================================
+ Coverage   64.90%   64.96%   +0.06%     
==========================================
  Files         312      313       +1     
  Lines       32660    32938     +278     
==========================================
+ Hits        21197    21398     +201     
- Misses       9916     9981      +65     
- Partials     1547     1559      +12     
Flag Coverage Δ
unittests 64.96% <65.84%> (+0.06%) ⬆️


Impacted Files Coverage Δ
pkg/koordlet/metriccache/metric_cache.go 50.98% <0.00%> (-6.05%) ⬇️
.../koordlet/qosmanager/metricsquery/metrics_query.go 64.04% <ø> (+10.27%) ⬆️
pkg/koordlet/resmanager/cpu_evict.go 74.50% <0.00%> (ø)
pkg/koordlet/resmanager/memory_evict.go 70.08% <0.00%> (ø)
pkg/koordlet/resmanager/cpu_burst.go 72.50% <23.07%> (-0.82%) ⬇️
pkg/koordlet/metriccache/util.go 70.61% <50.00%> (-1.86%) ⬇️
pkg/koordlet/resmanager/metrics_query.go 64.35% <50.00%> (+2.11%) ⬆️
pkg/koordlet/metriccache/metric_result.go 61.29% <61.29%> (ø)
...collectors/podthrottled/pod_throttled_collector.go 51.66% <62.50%> (ø)
pkg/koordlet/metriccache/tsdb_storage.go 70.58% <70.58%> (ø)
... and 2 more

... and 30 files with indirect coverage changes


@jasonliu747
Member

/cc @LambdaHJ

@koordinator-bot koordinator-bot bot requested a review from LambdaHJ April 28, 2023 04:02
pkg/koordlet/metriccache/metric_types.go (outdated, resolved)
pkg/koordlet/metriccache/metric_types.go (outdated, resolved)
pkg/koordlet/metriccache/metric_types.go (outdated, resolved)
pkg/koordlet/metriccache/metric_resources.go (outdated, resolved)
Contributor

@LambdaHJ LambdaHJ left a comment

/lgtm

config/manager/koordlet.yaml (outdated, resolved)
Signed-off-by: 佑祎 <zzw261520@alibaba-inc.com>
@koordinator-bot koordinator-bot bot removed the lgtm label May 9, 2023
@koordinator-bot

New changes are detected. LGTM label has been removed.

@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes, LambdaHJ

@hormes hormes added the lgtm label May 9, 2023
@koordinator-bot koordinator-bot bot merged commit f377ce8 into koordinator-sh:main May 9, 2023
15 checks passed
4 participants