This repository has been archived by the owner on Sep 30, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 8
/
README.infiniband
84 lines (64 loc) · 2.92 KB
/
README.infiniband
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
#
# Copyright (c) 2013-2014 University of Wisconsin-La Crosse.
# All rights reserved.
#
# Additional copyrights may follow.
# See COPYING in top-level directory.
#
First you have to build netloc, install it and add netloc binary path
to your PATH environment variable. In bash, just do:
$ PATH=$PATH:/path/to/netloc/installation/bin
$ export PATH
Then create a directory to gather your cluster information and enter it:
$ mkdir mycluster-data
$ cd mycluster-data/
Create a subdirectory "hwloc" and place the lstopo XML output of your nodes
there.
$ ssh node001 lstopo ~/mycluster-data/hwloc/node001.xml
[...]
$ ls -l hwloc/
-rw-r--r-- 1 user group 74242 13 sept. 15:07 node001.xml
[...]
-rw-r--r-- 1 user group 26438 24 sept. 08:59 node263.xml
-rw-r--r-- 1 user group 26438 13 sept. 08:11 node264.xml
Now run the netloc_ib_gather_raw script to gather IB network information.
It uses the hwloc information to find out the IB subnets to query.
It uses ibnetdiscover and ibroute utilities which require privileged
access. You can either run the entire script as root, or pass --sudo
so that ibnetdiscover and ibroute are called by sudo.
If the hwloc XML files are not in the "hwloc", specify it with --hwloc-dir.
$ netloc_ib_gather_raw --out-dir ib-raw --sudo
Found 1 subnets in hwloc directory:
Subnet fe80:0000:0000:0000 is locally accessible from board qib0 port 1.
Looking at fe80:0000:0000:0000 (through local board qib0 port 1)...
Running ibnetdiscover...
Getting routes...
Running ibroute for switch LID 19...
[...]
Running ibroute for switch LID 7...
Running ibroute for switch LID 12...
Now you have IB information under the "ib-raw" directory.
Each subnet gets one file and one directory.
$ ls -l ib-raw/
-rw-r--r-- 1 user group 274493 22 janv. 15:21 ib-subnet-fe80:0000:0000:0000.txt
drwxr-xr-x 2 user group 4096 22 janv. 15:21 ibroutes-fe80:0000:0000:0000/
The last step is to run netloc_ib_extract_dats to collect the IB information.
It automatically looks inside the "ib-raw" directory by default and generates
the output under "netloc". These can be changed with "--raw-dir" and "--out-dir".
$ netloc_ib_extract_dats
----------------------------------------------------------------------
Processing Subnet: fe80:0000:0000:0000
----------------------------------------------------------------------
--------------- General Network Information
The netloc information is now ready for use under the "netloc" subdirectory.
$ ls -l netloc/
-rw-r--r-- 1 user group 3226573 22 janv. 15:31 IB-fe80:0000:0000:0000-log-paths.ndat
-rw-r--r-- 1 user group 384011 22 janv. 15:31 IB-fe80:0000:0000:0000-nodes.ndat
-rw-r--r-- 1 user group 3237964 22 janv. 15:31 IB-fe80:0000:0000:0000-phy-paths.ndat
$ lsnettopo netloc/
Network: IB-fe80:0000:0000:0000
Type : InfiniBand
Subnet : fe80:0000:0000:0000
Hosts : 279
Switches: 25
[...]