Analyzer - Add support to store values of metrics in data diagnosis #392

Merged: yukirora merged 7 commits into main from yutji/store-value on Aug 23, 2022


yukirora
Contributor

@yukirora yukirora commented Aug 19, 2022

Description
Add support to store values of metrics in data diagnosis.

Take the following rules as an example:

    nccl_store_rule:
      categories: NCCL_DIS
      store: True
      metrics:
        - nccl-bw:allreduce-run0/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run1/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run2/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run3/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run4/allreduce_1073741824_busbw
    nccl_rule:
      function: multi_rules
      criteria: 'lambda label:True if min(label["nccl_store_rule"].values())/max(label["nccl_store_rule"].values())<0.95 else False'
      categories: NCCL_DIS

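nccl_store_rule stores the values of its metrics in a dict and saves them into label["nccl_store_rule"]; nccl_rule can then use those values through label["nccl_store_rule"].values() in its criteria. In this example, the criteria evaluates to True when the lowest of the five stored busbw values is below 95% of the highest.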
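
The flow described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the analyzer's actual implementation: the helpers `store_metrics` and `violates` are hypothetical names, and the bandwidth values are made up. Only the rule name, the metric names, and the criteria string come from the example rules.

```python
# Hypothetical sketch of the store-then-evaluate flow; not SuperBench code.


def store_metrics(label, rule_name, metrics, node_data):
    """Save this node's values for the listed metrics under label[rule_name]."""
    label[rule_name] = {m: node_data[m] for m in metrics if m in node_data}
    return label


def violates(criteria, label):
    """Evaluate a criteria string (a lambda taking `label`) and return a bool."""
    return bool(eval(criteria)(label))


# Metric names and criteria string copied from the example rules.
metrics = [
    f"nccl-bw:allreduce-run{i}/allreduce_1073741824_busbw" for i in range(5)
]
criteria = (
    'lambda label:True if min(label["nccl_store_rule"].values())'
    '/max(label["nccl_store_rule"].values())<0.95 else False'
)

# Illustrative (made-up) bus bandwidth values for two nodes.
healthy = dict(zip(metrics, [230.0, 231.5, 229.8, 230.4, 231.0]))
degraded = dict(zip(metrics, [230.0, 231.5, 200.0, 230.4, 231.0]))

healthy_flag = violates(
    criteria, store_metrics({}, "nccl_store_rule", metrics, healthy)
)
degraded_flag = violates(
    criteria, store_metrics({}, "nccl_store_rule", metrics, degraded)
)

print(healthy_flag, degraded_flag)
```

With these values, the healthy node's min/max ratio stays above 0.95, while the degraded node's third run drops the ratio below the threshold, so only the degraded node satisfies the criteria.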

@yukirora yukirora added the tool label Aug 19, 2022
@yukirora yukirora requested a review from a team as a code owner August 19, 2022 08:19
@cp5555
Contributor

cp5555 commented Aug 20, 2022

Would you please give an example in the description for this feature?

@cp5555 cp5555 mentioned this pull request Aug 20, 2022
27 tasks
@codecov

codecov bot commented Aug 22, 2022

Codecov Report

Merging #392 (5a45ed4) into main (10a79c4) will increase coverage by 13.62%.
The diff coverage is 88.46%.

@@             Coverage Diff             @@
##             main     #392       +/-   ##
===========================================
+ Coverage   75.02%   88.65%   +13.62%     
===========================================
  Files          83       83               
  Lines        5158     5264      +106     
===========================================
+ Hits         3870     4667      +797     
+ Misses       1288      597      -691     
| Flag | Coverage Δ |
|---|---|
| cpu-python3.6-unit-test | 75.10% <88.00%> (+0.07%) ⬆️ |
| cpu-python3.7-unit-test | 75.10% <88.00%> (+0.07%) ⬆️ |
| cuda-unit-test | 88.58% <88.46%> (?) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| superbench/analyzer/data_diagnosis.py | 89.13% <87.50%> (+0.84%) ⬆️ |
| superbench/analyzer/diagnosis_rule_op.py | 97.16% <100.00%> (+0.26%) ⬆️ |
| superbench/benchmarks/context.py | 100.00% <0.00%> (ø) |
| superbench/analyzer/summary_op.py | 98.36% <0.00%> (+0.28%) ⬆️ |
| ...perbench/benchmarks/micro_benchmarks/micro_base.py | 86.11% <0.00%> (+0.39%) ⬆️ |
| superbench/benchmarks/reducer.py | 93.93% <0.00%> (+0.60%) ⬆️ |
| superbench/benchmarks/result.py | 93.33% <0.00%> (+0.83%) ⬆️ |
| ...perbench/benchmarks/model_benchmarks/model_base.py | 86.30% <0.00%> (+0.86%) ⬆️ |
| ... and 27 more | |


@yukirora yukirora mentioned this pull request Aug 22, 2022
27 tasks
@yukirora yukirora enabled auto-merge (squash) August 23, 2022 03:03
@yukirora yukirora merged commit 733860d into main Aug 23, 2022
@yukirora yukirora deleted the yutji/store-value branch August 23, 2022 03:25