Bug - Fix issues for Ansible and benchmarks #267

abuccts · 2021-12-15T07:49:34Z

Description

Fix issues for Ansible and benchmarks:

* Clean up the Ansible runner private data dir to avoid running out of disk space when the number of nodes is large.
* Support both absolute and relative paths when fetching results.
* Use a deterministic image in the Ansible test to avoid image updates.
* Update logging format.
* Delete torch models and inputs after export.
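
The path handling item can be illustrated with a minimal sketch. It is not the code from this PR: `resolve_fetch_path` and its parameters are hypothetical names, and the sketch only assumes that a relative output directory should be anchored at some working directory while an absolute one is used as given.

```python
from pathlib import PurePosixPath


def resolve_fetch_path(output_dir: str, working_dir: str) -> PurePosixPath:
    """Return an absolute path for output_dir (hypothetical helper).

    In pathlib, joining an absolute path onto another path discards the
    left operand, so an absolute output_dir is returned unchanged and a
    relative one is anchored at working_dir.
    """
    return PurePosixPath(working_dir) / output_dir


# Relative path: anchored at the working directory.
print(resolve_fetch_path('outputs/run1', '/home/user'))
# Absolute path: returned as is.
print(resolve_fetch_path('/data/outputs/run1', '/home/user'))
```

Relying on the join semantics avoids a separate `is_absolute()` branch, at the cost of being less explicit about the two cases.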

codecov · 2021-12-15T07:56:40Z

Codecov Report

Merging #267 (7f0f759) into release/0.4 (682ed06) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@               Coverage Diff               @@
##           release/0.4     #267      +/-   ##
===============================================
+ Coverage        87.81%   87.83%   +0.02%     
===============================================
  Files               75       75              
  Lines             4350     4358       +8     
===============================================
+ Hits              3820     3828       +8     
  Misses             530      530

| Flag | Coverage Δ | |
|---|---|---|
| cpu-unit-test | `72.04% <15.38%> (-0.12%)` | ⬇️ |
| cuda-unit-test | `87.79% <100.00%> (+0.02%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...nchmarks/micro_benchmarks/_export_torch_to_onnx.py | `95.83% <100.00%> (+0.71%)` | ⬆️ |
| superbench/runner/ansible.py | `97.91% <100.00%> (+0.04%)` | ⬆️ |
| superbench/runner/runner.py | `86.71% <100.00%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 682ed06...7f0f759.

__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

abuccts added 5 commits December 15, 2021 15:40

Cleanup Ansible runner private data dir

43747ba

Clean up the Ansible runner private data dir to avoid running out of disk space when the number of nodes is large.

Support both absolute and relative paths in fetch

9ab64ca

Support both absolute and relative paths when fetching results.

Use a deterministic image in Ansible test

c2c51dd

Use a deterministic image in the Ansible test to avoid image updates.

Update logging format

e6416ac

Update logging format.

Delete torch models and inputs after export

1363e49

Delete torch models and inputs after export.
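
The cleanup commits above share one pattern: temporary artifacts are removed as soon as they are no longer needed, so they do not accumulate across nodes or runs. A minimal sketch of that pattern for a scratch directory follows; `scratch_private_data_dir` is a hypothetical name, and the actual change in `superbench/runner/ansible.py` may be structured differently.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def scratch_private_data_dir(prefix: str = 'ansible-'):
    """Yield a temporary directory and always remove it afterwards.

    Hypothetical helper: stands in for a runner's private data dir.
    """
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Remove the directory tree even if the body raised an exception.
        shutil.rmtree(path, ignore_errors=True)


with scratch_private_data_dir() as data_dir:
    # Simulate artifacts written during a run.
    (data_dir / 'artifacts').mkdir()
    (data_dir / 'artifacts' / 'stdout').write_text('ok')
    assert data_dir.exists()

# The directory is gone once the block exits.
assert not data_dir.exists()
```

Putting the removal in a `finally` clause means a failed run does not leave its directory behind, which matters most when many nodes each produce their own artifacts.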

abuccts added the bug label Dec 15, 2021

abuccts requested review from cp5555 and guoshzhao December 15, 2021 07:49

abuccts requested a review from a team as a code owner December 15, 2021 07:49

cp5555 reviewed Dec 15, 2021

tests/ansible/tests/test_deploy.yaml

yzygitzh approved these changes Dec 16, 2021

Update according to comment

7f0f759

Update according to comment.

cp5555 approved these changes Dec 16, 2021

abuccts enabled auto-merge (squash) December 16, 2021 05:59

abuccts merged commit a15f773 into release/0.4 Dec 16, 2021

abuccts deleted the xiongyf/v0.4-bug-fixes branch December 16, 2021 06:21

abuccts mentioned this pull request Dec 29, 2021

Release - SuperBench v0.4.0 #278

Merged
