Bug - Fix issues for Ansible and benchmarks #267
Clean up the Ansible runner private data directory to avoid running out of disk space when the number of nodes is large.
Support both absolute and relative paths when fetching results.
Use a deterministic image in the Ansible test to avoid image updates.
Update the logging format.
Delete torch models and inputs after export.
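The first two fixes can be illustrated with a minimal sketch. This is not the code from this PR: the helper names `resolve_output_dir` and `run_with_private_data_dir`, the `ansible-runner-` prefix, and the idea of resolving relative paths against a caller-supplied base directory are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def resolve_output_dir(path, base_dir):
    """Return an absolute path (hypothetical helper).

    Absolute paths are returned as-is; relative paths are resolved
    against base_dir, which is an assumed convention here.
    """
    p = Path(path).expanduser()
    return p if p.is_absolute() else (Path(base_dir) / p).resolve()


def run_with_private_data_dir(task):
    """Run task(private_data_dir) and always remove the directory afterwards.

    Hypothetical wrapper: creating one temporary directory per run and
    deleting it in a finally block keeps disk usage bounded even when
    many nodes (and therefore many runs) are involved.
    """
    private_data_dir = tempfile.mkdtemp(prefix='ansible-runner-')
    try:
        return task(private_data_dir)
    finally:
        # ignore_errors avoids masking the task's own exception.
        shutil.rmtree(private_data_dir, ignore_errors=True)
```

Removing the directory in a `finally` block means the cleanup also runs when the task raises, which is the case most likely to leave stale directories behind.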
Codecov Report
@@             Coverage Diff              @@
##           release/0.4    #267    +/-   ##
============================================
+ Coverage      87.81%    87.83%  +0.02%
============================================
  Files             75        75
  Lines           4350      4358      +8
============================================
+ Hits            3820      3828      +8
  Misses           530       530
Update according to comment.
__Description__
Fix issues for Ansible and benchmarks:
* Clean up the Ansible runner private data directory to avoid running out of disk space when the number of nodes is large.
* Support both absolute and relative paths when fetching results.
* Use a deterministic image in the Ansible test to avoid image updates.
* Update the logging format.
* Delete torch models and inputs after export.
__Description__
Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__
* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
Description
Fix issues for Ansible and benchmarks: