enhance: add more metrics for cache and search #31777

Merged: 3 commits merged into milvus-io:master from enhance_rg_level_metrics_3 on Apr 10, 2024

chyezh (Contributor) commented Apr 1, 2024

issue: #30931

sre-ci-robot added the size/XL label (denotes a PR that changes 500-999 lines) Apr 1, 2024
mergify bot added the dco-passed (DCO check passed) and kind/enhancement (issues or changes related to enhancement) labels Apr 1, 2024
mergify bot (Contributor) commented Apr 1, 2024

@chyezh The ut workflow job failed; comment rerun ut to trigger the job again.

codecov bot commented Apr 1, 2024

Codecov Report

Attention: Patch coverage is 89.10506% with 28 lines in your changes missing coverage. Please review.

Project coverage is 81.56%. Comparing base (1b76766) to head (ee01639).
Report is 11 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #31777      +/-   ##
==========================================
- Coverage   81.63%   81.56%   -0.08%     
==========================================
  Files         978      990      +12     
  Lines      121665   121680      +15     
==========================================
- Hits        99326    99251      -75     
- Misses      18525    18611      +86     
- Partials     3814     3818       +4     
Files Coverage Δ
internal/querynodev2/segments/utils.go 11.81% <100.00%> (+1.63%) ⬆️
internal/querynodev2/tasks/query_task.go 92.85% <100.00%> (+0.26%) ⬆️
internal/querynodev2/tasks/search_task.go 90.78% <100.00%> (+6.34%) ⬆️
pkg/metrics/querynode_metrics.go 100.00% <100.00%> (ø)
pkg/util/cache/cache.go 93.92% <86.66%> (ø)
...ernal/querynodev2/segments/metricsutil/observer.go 97.63% <97.63%> (ø)
internal/querynodev2/segments/retrieve.go 83.48% <50.00%> (-0.83%) ⬇️
internal/querynodev2/segments/search.go 89.85% <50.00%> (-2.09%) ⬇️
...nternal/querynodev2/segments/metricsutil/record.go 91.52% <91.52%> (ø)
internal/querynodev2/segments/manager.go 79.00% <0.00%> (-1.70%) ⬇️

... and 220 files with indirect coverage changes

chyezh (Contributor, Author) commented Apr 1, 2024

rerun ut

mergify bot added the ci-passed label Apr 1, 2024
globalObserver = newSegmentsObserver()
go func() {
d := 15 * time.Minute
ticker := time.NewTicker(d)
Contributor commented:

We'd better make d configurable.

chyezh (Contributor, Author) replied:

It's just a metric expiry time; we do not change it frequently.

A configurable metric expiry time will be provided in the main PR: #31562

longjiquan (Contributor) left a comment:

LGTM

jaime0815 (Contributor) commented:

/lgtm

chyezh (Contributor, Author) commented Apr 3, 2024

Rebased and resolved the conflict.

mergify bot added the ci-passed label Apr 3, 2024
case SearchSegmentAccessRecord:
o.prom.ObserveSearchAccess(mm)
default:
panic("unknown segment access metric")
Contributor commented:

Why not just log.Warn?

chyezh (Contributor, Author) replied:

It's just an assertion on an unreachable code branch. With a panic, the message makes such bugs easy to find during development and CI.
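The pattern the author defends, panicking in the default branch of a type switch that should cover every record type, can be sketched as follows. Only SearchSegmentAccessRecord appears in the snippet above; accessRecord, QuerySegmentAccessRecord and observe are hypothetical stand-ins for the real observer types.

```go
package main

import "fmt"

// Hypothetical record types standing in for the segment access records.
type QuerySegmentAccessRecord struct{ segmentID int64 }
type SearchSegmentAccessRecord struct{ segmentID int64 }

// accessRecord is sealed by an unexported method, so only types in this
// package can implement it.
type accessRecord interface{ isAccessRecord() }

func (QuerySegmentAccessRecord) isAccessRecord()  {}
func (SearchSegmentAccessRecord) isAccessRecord() {}

// observe dispatches on the concrete record type. The default branch is
// reachable only if a new record type is added without a matching case (or a
// nil record is passed), so it panics: the mistake surfaces in development
// and CI instead of being dropped silently, as a log.Warn would allow.
func observe(r accessRecord) string {
	switch r := r.(type) {
	case QuerySegmentAccessRecord:
		return fmt.Sprintf("query access on segment %d", r.segmentID)
	case SearchSegmentAccessRecord:
		return fmt.Sprintf("search access on segment %d", r.segmentID)
	default:
		panic(fmt.Sprintf("unknown segment access metric: %T", r))
	}
}

func main() {
	fmt.Println(observe(SearchSegmentAccessRecord{segmentID: 7})) // search access on segment 7
}
```

Including %T in the panic message names the unhandled type directly, which is what makes the failure quick to diagnose.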


// CacheLoadRecord records the metrics of a cache load.
type CacheLoadRecord struct {
bytes uint64
Contributor commented:

Would numBytes be a better name?

chyezh force-pushed the enhance_rg_level_metrics_3 branch from 250cb76 to 65f9316 on April 8, 2024 08:53
mergify bot removed the ci-passed label Apr 8, 2024
chyezh force-pushed the enhance_rg_level_metrics_3 branch from 65f9316 to 8c09950 on April 8, 2024 11:35
Signed-off-by: chyezh <chyezh@outlook.com>
chyezh force-pushed the enhance_rg_level_metrics_3 branch from 8c09950 to ee01639 on April 9, 2024 06:09

chyezh (Contributor, Author) commented Apr 9, 2024

Rebased and resolved the conflict.

mergify bot (Contributor) commented Apr 9, 2024

@chyezh The E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

chyezh (Contributor, Author) commented Apr 9, 2024

/run-cpu-e2e

chyezh (Contributor, Author) commented Apr 9, 2024

rerun ut

mergify bot added the ci-passed label Apr 10, 2024

czs007 (Contributor) commented Apr 10, 2024

/approve
/lgtm

sre-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chyezh, czs007

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sre-ci-robot merged commit c9faa6d into milvus-io:master Apr 10, 2024
15 checks passed
chyezh deleted the enhance_rg_level_metrics_3 branch April 10, 2024 02:56
chyezh added a commit to chyezh/milvus that referenced this pull request Apr 10, 2024
issue: milvus-io#30931

---------

Signed-off-by: chyezh <chyezh@outlook.com>
jaime0815 pushed a commit that referenced this pull request Apr 10, 2024
issue: #30931

Signed-off-by: chyezh <chyezh@outlook.com>
chyezh (Contributor, Author) commented Apr 15, 2024

Related dev PRs: #31868, #32097

sunby pushed a commit to sunby/milvus that referenced this pull request Apr 22, 2024
Signed-off-by: chyezh <chyezh@outlook.com>

Add metric for lru and fix lost delete data when enable lazy load  (milvus-io#31868)

Signed-off-by: chyezh <chyezh@outlook.com>

feat: Support stream reduce v1 (milvus-io#31873)

related: milvus-io#31410

---------

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

Change do wait lru dev (milvus-io#31878)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

enhance: add config for disk cache (milvus-io#31881)

fix config not initialized (milvus-io#31890)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

fix error handle in search (milvus-io#31895)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

fix: thread safe vector (milvus-io#31898)

fix: insert record cannot reinsert (milvus-io#31900)

enhance: cancel concurrency restrict for stream reduce and add metrics (milvus-io#31892)

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix: bit set (milvus-io#31905)

fix bitset clear to reset (milvus-io#31908)

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

Fix 0404 lru dev (milvus-io#31914)

fix:
1. sealed_segment num_rows reset to std::nullopt
2. sealed_segment lazy_load reset to true after evicting to avoid shortcut

---------

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix possible block due to unpin fifo activating principle (milvus-io#31924)

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

Add lru reloader lru dev (milvus-io#31952)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

fix query limit (milvus-io#32060)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

fix: lru cache lost delete and wrong mem size (milvus-io#32072)

issue: milvus-io#30361

Signed-off-by: chyezh <chyezh@outlook.com>

enhance: add more metrics for cache and search (milvus-io#31777) (milvus-io#32097)

issue: milvus-io#30931

Signed-off-by: chyezh <chyezh@outlook.com>

fix:panic due to empty search result when stream reducing(milvus-io#32009) (milvus-io#32083)

related: milvus-io#32009

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix: sealed segment may not exist when throw (milvus-io#32098)

issue: milvus-io#30361

Signed-off-by: chyezh <chyezh@outlook.com>

Major compaction 1st edition (milvus-io#31804) (milvus-io#32116)

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Co-authored-by: chasingegg <chao.gao@zilliz.com>

fix: inconsistent between state lock and load state (milvus-io#32171)

issue: milvus-io#30361

Signed-off-by: chyezh <chyezh@outlook.com>

enhance: Throw error instead of crash when index cannot be built (milvus-io#31844)

issue: milvus-io#27589

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>

(cherry picked from commit 1b76766)
Signed-off-by: jaime <yun.zhang@zilliz.com>

update knowhere to support clustering (milvus-io#32188)

Signed-off-by: chasingegg <chao.gao@zilliz.com>

fix: segment release is not sync with cache (milvus-io#32212)

issue: milvus-io#32206

Signed-off-by: chyezh <chyezh@outlook.com>

fix: incorrect pinCount resulting unexpected eviction(milvus-io#32136) (milvus-io#32238)

related: milvus-io#32136

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix: possible panic when stream reducing (milvus-io#32247)

related: milvus-io#32009

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

enhance: [lru-dev] add the related data size for the read apis (milvus-io#32274)

cherry-pick: milvus-io#31816

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>

add debug log (milvus-io#32303)

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>

Refine code for analyze task scheduler (milvus-io#32122)

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>

fix: memory leak on stream reduce (milvus-io#32345)

related: milvus-io#32304

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

feat: adding cache stats support (milvus-io#32344)

See milvus-io#32067

Signed-off-by: Ted Xu <ted.xu@zilliz.com>

Fix bug for version (milvus-io#32363)

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>

fix: remove sub entity in load delta log, update entity num in segment itself (milvus-io#32350)

issue: milvus-io#30361

Signed-off-by: chyezh <chyezh@outlook.com>

fix: clear data when loading failure (milvus-io#32370)

issue: milvus-io#30361

Signed-off-by: chyezh <chyezh@outlook.com>

fix: stream reduce memory leak for failing to release stream reducer(milvus-io#32345) (milvus-io#32381)

related: milvus-io#32345

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

Keep InProgress state when getting task state is init (milvus-io#32394)

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>

add log for search failed (milvus-io#32367)

related: milvus-io#32136

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

enable asan by default (milvus-io#32423)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

Major compaction refactoring (milvus-io#32149)

Signed-off-by: wayblink <anyang.wang@zilliz.com>

Lru dev debug (milvus-io#32414)

Co-authored-by: wayblink <anyang.wang@zilliz.com>

fix: protect loadInfo with atomic, remove rlock at cache to avoid dead lock (milvus-io#32436)

issue: milvus-io#32435

Signed-off-by: chyezh <chyezh@outlook.com>

fix: use Get but not GetBy of SegmentManager (milvus-io#32438)

issue: milvus-io#32435

Signed-off-by: chyezh <chyezh@outlook.com>

fix: return growing segment when sealed (milvus-io#32460)

issue: milvus-io#32435

Signed-off-by: chyezh <chyezh@outlook.com>

enhance: add request resource for lru loading process(milvus-io#32205) (milvus-io#32452)

related: milvus-io#32205

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix: unexpected deleted index files when lazy loading(milvus-io#32136) (milvus-io#32469)

related: milvus-io#32136

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix: reference count leak cause release blocked (milvus-io#32465)

issue: milvus-io#32379

Signed-off-by: chyezh <chyezh@outlook.com>

Fix compaction fail (milvus-io#32473)

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Labels: approved, ci-passed, dco-passed, kind/enhancement, lgtm, size/XL

5 participants