Analyzer - Add support to store values of metrics in data diagnosis #392

yukirora · 2022-08-19T08:19:07Z

Description
Add support to store values of metrics in data diagnosis.

Take the following rules as an example:

    nccl_store_rule:
      categories: NCCL_DIS
      store: True
      metrics:
        - nccl-bw:allreduce-run0/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run1/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run2/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run3/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run4/allreduce_1073741824_busbw
    nccl_rule:
      function: multi_rules
      criteria: 'lambda label:True if min(label["nccl_store_rule"].values())/max(label["nccl_store_rule"].values())<0.95 else False'
      categories: NCCL_DIS

nccl_store_rule stores the values of its metrics in a dict and saves them into label["nccl_store_rule"]. nccl_rule can then use those values through label["nccl_store_rule"].values() in its criteria. In this example, the criteria is met when the lowest stored busbw value is less than 95% of the highest.
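To make the criteria above concrete, here is a minimal sketch of how the lambda could be applied to the stored values. It assumes the criteria string is evaluated as an ordinary Python lambda and that `label` is a plain dict keyed by rule name. The helper names `build_label` and `is_violated` and the bandwidth numbers are hypothetical and are not taken from the SuperBench code.

```python
# Illustrative sketch only; not SuperBench's actual implementation.

# The criteria string copied from nccl_rule above.
CRITERIA = (
    'lambda label:True if min(label["nccl_store_rule"].values())'
    '/max(label["nccl_store_rule"].values())<0.95 else False'
)

# The five metric names listed under nccl_store_rule.
METRICS = [f"nccl-bw:allreduce-run{i}/allreduce_1073741824_busbw" for i in range(5)]


def build_label(values):
    """Hypothetical helper: keep metric values in a dict under the store rule's name."""
    return {"nccl_store_rule": dict(zip(METRICS, values))}


def is_violated(criteria, label):
    """Hypothetical helper: evaluate the criteria string as a lambda and apply it to label."""
    return bool(eval(criteria)(label))


# Made-up bus bandwidth numbers, for illustration only.
degraded = build_label([230.0, 229.5, 231.0, 210.0, 230.5])  # run3 lags behind
healthy = build_label([230.0, 229.5, 231.0, 228.0, 230.5])

print(is_violated(CRITERIA, degraded))  # 210.0 / 231.0 is below 0.95
print(is_violated(CRITERIA, healthy))   # 228.0 / 231.0 is above 0.95
```

Under these assumptions, the rule is met only when the spread between the slowest and fastest run exceeds 5%, so one store rule can feed a comparison across several metrics.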

cp5555 · 2022-08-20T04:24:53Z

Would you please give an example in the description for this feature?

tests/analyzer/test_data_diagnosis.py

superbench/analyzer/data_diagnosis.py

codecov · 2022-08-22T02:06:06Z

Codecov Report

Merging #392 (5a45ed4) into main (10a79c4) will increase coverage by 13.62%.
The diff coverage is 88.46%.

@@             Coverage Diff             @@
##             main     #392       +/-   ##
===========================================
+ Coverage   75.02%   88.65%   +13.62%     
===========================================
  Files          83       83               
  Lines        5158     5264      +106     
===========================================
+ Hits         3870     4667      +797     
+ Misses       1288      597      -691

| Flag | Coverage Δ | |
|---|---|---|
| cpu-python3.6-unit-test | `75.10% <88.00%> (+0.07%)` | ⬆️ |
| cpu-python3.7-unit-test | `75.10% <88.00%> (+0.07%)` | ⬆️ |
| cuda-unit-test | `88.58% <88.46%> (?)` | |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| superbench/analyzer/data_diagnosis.py | `89.13% <87.50%> (+0.84%)` | ⬆️ |
| superbench/analyzer/diagnosis_rule_op.py | `97.16% <100.00%> (+0.26%)` | ⬆️ |
| superbench/benchmarks/context.py | `100.00% <0.00%> (ø)` | |
| superbench/analyzer/summary_op.py | `98.36% <0.00%> (+0.28%)` | ⬆️ |
| ...perbench/benchmarks/micro_benchmarks/micro_base.py | `86.11% <0.00%> (+0.39%)` | ⬆️ |
| superbench/benchmarks/reducer.py | `93.93% <0.00%> (+0.60%)` | ⬆️ |
| superbench/benchmarks/result.py | `93.33% <0.00%> (+0.83%)` | ⬆️ |
| ...perbench/benchmarks/model_benchmarks/model_base.py | `86.30% <0.00%> (+0.86%)` | ⬆️ |

... and 27 more


add support for storing values of metrics in data diagnosis

65d10ee

yukirora added the tool label Aug 19, 2022

yukirora requested review from cp5555 and abuccts August 19, 2022 08:19

yukirora requested a review from a team as a code owner August 19, 2022 08:19

yukirora added 3 commits August 19, 2022 17:10

fix lint issue

fc2fd52

fix lint issue

954e2af

fix test issue

6ba1ed0

cp5555 mentioned this pull request Aug 20, 2022

V0.6.0 Release Plan #359

Closed

27 tasks

cp5555 reviewed Aug 20, 2022

tests/analyzer/test_data_diagnosis.py

cp5555 reviewed Aug 20, 2022

tests/analyzer/test_data_diagnosis.py (outdated)

cp5555 reviewed Aug 20, 2022

tests/analyzer/test_data_diagnosis.py (outdated)

cp5555 reviewed Aug 20, 2022

superbench/analyzer/data_diagnosis.py (outdated)

yukirora and others added 2 commits August 21, 2022 21:42

update according to comments

0e5248f

fix test issue

0b2c918

yukirora mentioned this pull request Aug 22, 2022

V0.6.0 Test Plan #393

Closed

27 tasks

abuccts approved these changes Aug 22, 2022

cp5555 approved these changes Aug 22, 2022

Merge branch 'main' into yutji/store-value

5a45ed4

yukirora enabled auto-merge (squash) August 23, 2022 03:03

yukirora merged commit 733860d into main Aug 23, 2022

yukirora deleted the yutji/store-value branch August 23, 2022 03:25
