Bug - Fix issues for Ansible and benchmarks #267

Merged: 6 commits merged on Dec 16, 2021


@abuccts (Member) commented Dec 15, 2021

Description

Fix issues for Ansible and benchmarks:

  • Clean up the Ansible runner private data dir to avoid running out of disk space when the number of nodes is large.
  • Support both absolute and relative paths when fetching results.
  • Use a deterministic image in the Ansible test so that the test is not affected by image updates.
  • Update the logging format.
  • Delete torch models and inputs after export to ONNX.
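The path fix can be illustrated with a small helper: an absolute result path is used as given, while a relative one is joined to a base directory. This is a minimal sketch under that assumption, not SuperBench's actual code; `resolve_result_path` and its `base_dir` argument are hypothetical names chosen for illustration.

```python
import os


def resolve_result_path(path, base_dir):
    """Return a normalized path to a result directory (hypothetical helper).

    Absolute paths are kept as given (after normalization); relative paths
    are interpreted relative to `base_dir`. `~` is expanded first so that
    home-relative paths are treated as absolute.
    """
    path = os.path.expanduser(path)
    if os.path.isabs(path):
        return os.path.normpath(path)
    return os.path.normpath(os.path.join(base_dir, path))


if __name__ == '__main__':
    print(resolve_result_path('/tmp/outputs', '/data/sb'))  # /tmp/outputs
    print(resolve_result_path('outputs/run1', '/data/sb'))  # /data/sb/outputs/run1
    print(resolve_result_path('./a/../b', '/data/sb'))      # /data/sb/b
```

`os.path.normpath` collapses `.` and `..` purely lexically, so the helper works even when the path does not exist yet on the local machine (for example, a path on a remote node).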

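For the private data dir cleanup: by default, ansible-runner keeps each run's artifacts (such as job event files) under the private data directory it is given, so a directory that is never removed keeps growing as the node count and the number of runs increase. One plausible shape of the fix is to give each run a fresh temporary directory and delete it afterwards. This is a sketch under that assumption, not the code merged in this PR; `run_and_cleanup` and `fake_run` are hypothetical, with `run_fn` standing in for a runner call that accepts a `private_data_dir` keyword.

```python
import os
import shutil
import tempfile


def run_and_cleanup(run_fn, **kwargs):
    """Run `run_fn` in a fresh private data dir, then delete that dir.

    `run_fn` is a hypothetical stand-in for a runner call that takes a
    `private_data_dir` keyword argument and writes artifacts beneath it.
    """
    private_data_dir = tempfile.mkdtemp(prefix='runner-')
    try:
        return run_fn(private_data_dir=private_data_dir, **kwargs)
    finally:
        # Always remove the directory, even if the run raises, so artifacts
        # from many hosts and repeated runs do not pile up on disk.
        shutil.rmtree(private_data_dir, ignore_errors=True)


def fake_run(private_data_dir, n_events):
    """Fake runner: writes one small file per event and returns the dir."""
    artifacts = os.path.join(private_data_dir, 'artifacts')
    os.makedirs(artifacts)
    for i in range(n_events):
        with open(os.path.join(artifacts, f'{i}.json'), 'w') as f:
            f.write('{}')
    return private_data_dir


if __name__ == '__main__':
    used_dir = run_and_cleanup(fake_run, n_events=3)
    print(os.path.exists(used_dir))  # False
```

Using `try`/`finally` keeps the cleanup independent of whether the run succeeds, which matters most on large clusters where a failed run would otherwise leave its artifacts behind.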
@abuccts abuccts added the bug Something isn't working label Dec 15, 2021
@abuccts abuccts requested a review from a team as a code owner December 15, 2021 07:49
@codecov bot commented Dec 15, 2021

Codecov Report

Merging #267 (7f0f759) into release/0.4 (682ed06) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@               Coverage Diff               @@
##           release/0.4     #267      +/-   ##
===============================================
+ Coverage        87.81%   87.83%   +0.02%     
===============================================
  Files               75       75              
  Lines             4350     4358       +8     
===============================================
+ Hits              3820     3828       +8     
  Misses             530      530              
| Flag | Coverage Δ |
|---|---|
| cpu-unit-test | 72.04% <15.38%> (-0.12%) ⬇️ |
| cuda-unit-test | 87.79% <100.00%> (+0.02%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...nchmarks/micro_benchmarks/_export_torch_to_onnx.py | 95.83% <100.00%> (+0.71%) ⬆️ |
| superbench/runner/ansible.py | 97.91% <100.00%> (+0.04%) ⬆️ |
| superbench/runner/runner.py | 86.71% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 682ed06...7f0f759.

Update according to comment.
@abuccts abuccts enabled auto-merge (squash) December 16, 2021 05:59
@abuccts abuccts merged commit a15f773 into release/0.4 Dec 16, 2021
@abuccts abuccts deleted the xiongyf/v0.4-bug-fixes branch December 16, 2021 06:21
abuccts added a commit that referenced this pull request Dec 29, 2021
abuccts added a commit that referenced this pull request Dec 30, 2021
__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
Labels
bug Something isn't working
3 participants