[Deployment] Disable IB driver installation by default #2595
Conversation
Disable IB driver installation by default. Azure VM kernels have the IB kernel modules built into the vmlinux image, so IB driver installation fails in this case. If IB installation is needed during deployment, set the `skip-ib-installation` field to `false` in the config.
src/drivers/config/drivers.yaml
Outdated
@@ -28,4 +28,4 @@ version: "384.111"
 pre-installed-nvidia-path: /usr/local/nvidia


-skip-ib-installation: false
+skip-ib-installation: true
- Can the configuration name be positive (enable-IB-driver-installation) instead of negative (skip, disable, turn-off)?
- Please add some comments and update the documentation to explain this setting and the potential issues once it's enabled.
Updated.
Rename skip-ib-installation to enable-ib-installation in config.
src/drivers/config/drivers.yaml
Outdated
@@ -27,5 +27,8 @@ version: "384.111"

 pre-installed-nvidia-path: /usr/local/nvidia


-skip-ib-installation: false
+# Azure VM builtin IB kernel modules into vmlinux image,
It may not apply to Azure only. How about the following?
Some servers already have IB drivers installed, so this flag is disabled by default. If this flag is enabled, OpenPAI will try its best to install the correct IB driver, but the installation may fail due to compatibility issues.
The installation works if IB drivers have already been installed and we need to re-install the drivers in the /var/drivers path. The flag should be enabled in this case.
The only issue is that the Linux kernel on Azure has built-in IB kernel modules, which makes the re-installation fail. Kernels of that kind are only used on Azure VMs.
We haven't tested on AWS, Aliyun, or other hardware configurations, so don't single out Azure here; this may be a common case among cloud providers.
Disable IB driver installation by default.
Azure VM kernels have IB kernel modules built into the vmlinux image,
so IB driver installation will fail in this case.
If IB installation is needed during deployment,
set the `enable-ib-installation` field to `true` in config.
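
For illustration, here is a minimal sketch of how the relevant part of `src/drivers/config/drivers.yaml` might look with IB installation turned on. It assumes the renamed `enable-ib-installation` key sits next to `pre-installed-nvidia-path`, as the diff context suggests. The key nesting and the comment wording are assumptions, not the merged text.

```yaml
# src/drivers/config/drivers.yaml (fragment; surrounding keys taken from the diff context)
version: "384.111"
pre-installed-nvidia-path: /usr/local/nvidia

# Illustrative comment, not the merged wording:
# leave this at false (the default) when the kernel already has IB modules
# built in, because IB driver installation fails in that case.
enable-ib-installation: true
```

With the flag left at `false`, the deployment skips the IB driver installation step, which is the default behavior this PR introduces.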