Description
While testing on a node with 4 HFIs (Intel Omni-Path devices) using the openib BTL, we saw a warning message that is shown when the default subnet GID is used, to warn the user of possible misbehavior. This is described in detail here: https://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
However, I found that the text does not reflect what the code actually tests. The text says that two ports have the default subnet GID. Instead, the code prints the warning when at least one openib BTL is available, the CURRENT subnet ID is the default one, and the warning message is not masked. See:
"ompi/mca/btl/openib/btl_openib_component.c" line 750 in ompi-release branch v1.10:

    if(mca_btl_openib_component.ib_num_btls > 0 &&
       IB_DEFAULT_GID_PREFIX == subnet_id &&
       mca_btl_openib_component.warn_default_gid_prefix) {
        opal_show_help("help-mpi-btl-openib.txt", "default subnet prefix",
                       true, ompi_process_info.nodename);
    }

In my setup, as I mentioned, I have 4 ports, and only hfi1_1 has the default subnet GID. Each time that port is evaluated, the warning message is shown. When I restrict the devices with "-x OMPI_MCA_btl_openib_if_include=hfi1_x,hfi1_y" (where x and y are numbers other than 1 and smaller than 4), the warning message is not shown.
Therefore, there are two ways to fix this:
a) update the error message (and the FAQ entry) to describe the actual behavior.
b) update the code to compare all subnet GIDs.
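
To make option b) concrete, here is a minimal standalone sketch of what "compare all subnet GIDs" could look like. It is not the openib BTL code: `count_default_prefix_ports` and `should_warn_default_prefix` are hypothetical helpers, and the array of subnet IDs stands in for whatever the component collects per port. The only value taken as given is the InfiniBand default subnet prefix, fe80::/64.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* InfiniBand default subnet prefix (fe80::/64), defined locally for this sketch. */
#define DEFAULT_GID_PREFIX UINT64_C(0xfe80000000000000)

/* Hypothetical helper: count how many ports report the default subnet prefix. */
static size_t count_default_prefix_ports(const uint64_t *subnet_ids, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (subnet_ids[i] == DEFAULT_GID_PREFIX) {
            ++count;
        }
    }
    return count;
}

/*
 * Hypothetical predicate: warn only when the warning is enabled AND at least
 * two ports share the default prefix, which is what the help text describes.
 */
static bool should_warn_default_prefix(const uint64_t *subnet_ids, size_t n,
                                       bool warn_enabled)
{
    return warn_enabled && count_default_prefix_ports(subnet_ids, n) >= 2;
}
```

With this predicate, a setup like mine (4 ports, only one of them on the default prefix) would not trigger the warning, while two or more ports on the default prefix still would.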