Support topo-aware IB performance validation #373

Merged: 4 commits, Jul 26, 2022

@jeseszhang1010 jeseszhang1010 commented Jul 11, 2022

Add a new pattern, topo-aware, so that users can run the IB performance
test based on the VMs' topology information. This lets users validate
IB performance across VM pairs at different distances as a quick test,
instead of running the full pair-wise test.

To run with the topo-aware pattern, the user needs to specify three
required and two optional parameters in the YAML config file:
--pattern topo-aware
--ibstat path to the ibstat output
--ibnetdiscover path to the ibnetdiscover output
--min_dist minimum distance of VM pairs (optional, default 2)
--max_dist maximum distance of VM pairs (optional, default 6)
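
As a rough illustration of how these five parameters relate, here is a minimal standalone sketch using Python's `argparse`. It is not the SuperBench parser: the function name `parse_topo_aware_args`, the file names in the usage lines, and the `min_dist <= max_dist` check are assumptions made for this example. Only the option names and the defaults (2 and 6) come from the description above.

```python
import argparse


def parse_topo_aware_args(argv):
    """Parse the five options listed above (hypothetical helper, not SuperBench code)."""
    parser = argparse.ArgumentParser(description='topo-aware parameter sketch')
    parser.add_argument('--pattern', required=True, help='traffic pattern name')
    parser.add_argument('--ibstat', help='path to ibstat output')
    parser.add_argument('--ibnetdiscover', help='path to ibnetdiscover output')
    parser.add_argument('--min_dist', type=int, default=2, help='minimum distance of VM pairs')
    parser.add_argument('--max_dist', type=int, default=6, help='maximum distance of VM pairs')
    args = parser.parse_args(argv)

    if args.pattern == 'topo-aware':
        # The two topology inputs are required only for this pattern.
        if not args.ibstat or not args.ibnetdiscover:
            parser.error('--ibstat and --ibnetdiscover are required for topo-aware')
        # Assumption for this sketch: an empty distance range is treated as an error.
        if args.min_dist > args.max_dist:
            parser.error('--min_dist must not exceed --max_dist')
    return args


# Hypothetical file names, used only to exercise the parser.
args = parse_topo_aware_args([
    '--pattern', 'topo-aware',
    '--ibstat', 'ibstat.txt',
    '--ibnetdiscover', 'ibnetdiscover.txt',
])
print(args.pattern, args.min_dist, args.max_dist)
```

Omitting `--min_dist` and `--max_dist` falls back to the defaults stated in the list.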

The newly added topo_aware module then parses the topology
information, builds a graph, and generates the VM pairs at
the specified distances (number of hops).
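
The graph step can be sketched as follows. This is an illustration under stated assumptions, not the topo_aware module itself: it skips parsing of the ibstat and ibnetdiscover outputs, uses a hand-written toy link list, and measures distance with a plain breadth-first search from the standard library. The names `hop_distances` and `pairs_by_distance` and the toy node names are hypothetical.

```python
from collections import deque
from itertools import combinations


def hop_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pairs_by_distance(graph, vms, min_dist=2, max_dist=6):
    """Group VM pairs by hop distance, keeping only min_dist..max_dist."""
    dist_from = {vm: hop_distances(graph, vm) for vm in vms}
    result = {}
    for a, b in combinations(sorted(vms), 2):
        d = dist_from[a].get(b)  # None if b is unreachable from a
        if d is not None and min_dist <= d <= max_dist:
            result.setdefault(d, []).append((a, b))
    return result


# Hypothetical toy fabric: two leaf switches under one spine switch.
links = [
    ('vm0', 'leaf0'), ('vm1', 'leaf0'),
    ('vm2', 'leaf1'), ('vm3', 'leaf1'),
    ('leaf0', 'spine0'), ('leaf1', 'spine0'),
]
graph = {}
for u, v in links:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

pairs = pairs_by_distance(graph, ['vm0', 'vm1', 'vm2', 'vm3'])
print(pairs)
```

In this toy fabric, VMs under the same leaf switch are 2 hops apart and VMs under different leaf switches are 4 hops apart, so testing one pair per distance covers both path types without running every pair.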

The specified IB test then runs across these generated VM pairs.

Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>


jeseszhang1010 commented Jul 11, 2022

@cp5555 @abuccts
I'm seeing cpu-unit-test and cuda-unit-test failures. Please grant me access to the build logs or paste the failures here so I can try to fix them.

Thanks @abuccts for granting me access. The builds are failing on code lint. I will fix the lint errors shortly.

@cp5555 cp5555 mentioned this pull request Jul 18, 2022
@jeseszhang1010 jeseszhang1010 force-pushed the topo-aware branch 2 times, most recently from cc0860b to e4c1071 Compare July 21, 2022 01:12
@jeseszhang1010 jeseszhang1010 force-pushed the topo-aware branch 2 times, most recently from 3ec835a to cf94e24 Compare July 22, 2022 16:25
codecov bot commented Jul 22, 2022

Codecov Report

Merging #373 (d941ed3) into main (5d448ee) will decrease coverage by 0.01%.
The diff coverage is 88.66%.

@@            Coverage Diff             @@
##             main     #373      +/-   ##
==========================================
- Coverage   88.89%   88.88%   -0.02%     
==========================================
  Files          82       83       +1     
  Lines        5044     5191     +147     
==========================================
+ Hits         4484     4614     +130     
- Misses        560      577      +17     
| Flag | Coverage Δ |
|---|---|
| cpu-unit-test | 75.14% <88.66%> (+0.39%) ⬆️ |
| cuda-unit-test | 88.80% <88.66%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...arks/micro_benchmarks/ib_validation_performance.py | 89.30% <81.81%> (-0.64%) ⬇️ |
| superbench/common/utils/topo_aware.py | 89.13% <89.13%> (ø) |
| superbench/common/utils/__init__.py | 100.00% <100.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

This commit adds a unit test to verify that the generated topology-aware
config file is correct. Four new data files are added so that the test
can invoke the gen_topo_aware_config function to generate a topology-aware
config file and compare it with the expected config file.

Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
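
The generate-then-compare shape of such a test can be sketched as below. This is a hedged illustration, not the test added in this PR: `write_config` is a hypothetical stand-in for the real generator (whose signature is not shown here), the `a,b;c,d` line format is an assumption, and the expected file is created in a temporary directory rather than read from the repository's data files.

```python
import tempfile
import unittest
from pathlib import Path


def write_config(rounds, path):
    """Hypothetical stand-in generator: one line per round, pairs written as 'a,b' joined by ';'."""
    lines = [';'.join(f'{a},{b}' for a, b in pairs) for pairs in rounds]
    Path(path).write_text('\n'.join(lines) + '\n')


class GeneratedConfigTest(unittest.TestCase):
    def test_generated_matches_expected(self):
        with tempfile.TemporaryDirectory() as tmp:
            expected = Path(tmp) / 'expected_config'
            generated = Path(tmp) / 'generated_config'
            # Assumed expected content for this sketch only.
            expected.write_text('0,1;2,3\n0,2;1,3\n')
            write_config([[(0, 1), (2, 3)], [(0, 2), (1, 3)]], generated)
            # Compare line by line so a mismatch reports the differing lines.
            self.assertEqual(
                generated.read_text().splitlines(),
                expected.read_text().splitlines(),
            )


# Run the test case directly without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(GeneratedConfigTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```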
@jeseszhang1010 jeseszhang1010 merged commit ef4d657 into microsoft:main Jul 26, 2022
@cp5555 cp5555 added the benchmarks SuperBench Benchmarks label Aug 11, 2022
@yukirora yukirora mentioned this pull request Aug 22, 2022
rafsalas19 pushed a commit to rafsalas19/superbenchmark that referenced this pull request Jan 26, 2023
An enhancement for topo-aware IB performance validation (microsoft#373).
This PR auto-generates the required ibstat file `ib_traffic_topo_aware_ibstat.txt`, which is used as input to build the graph.