Make DRAC configuration steps more robust #193
Comments
Just to clarify what happens in each stage: stage1
stage2
tl;dr: At this point this appears to be a firmware issue, one that was likely resolved in a newer firmware version. I have a bit more data on this from my experience and testing while bringing up BOG04. All machines at BOG04 were stuck in stage1 because of a known bug in the epoxy_client. However, since we can log in as root on stage1 boots, I was able to log in and experiment with ipmitool manually. What I found is that, more often than not, calls to ipmitool to modify network settings would yield something like the following:
The command would hang for about 30s, then dump that last message and exit. In some cases the value had actually been modified despite the warning; in other cases it had not. The problem existed in stage1 as well as when booted into stage3 as part of the cluster.

Searches yielded little, but a number of results indicated that a firmware issue was the likely root cause, so on mlab2-bog04 I upgraded the iDRAC with Lifecycle Controller firmware to 4.20.20.20, the latest version. After the upgrade, calls to ipmitool to modify network settings returned nearly instantly, and worked.

The workaround suggested in the first comment of this issue is likely the easiest fix for now, and can't hurt in any case. I discovered that, despite the warning/error, the setting eventually took after a couple of tries. Upgrading the firmware would be nice, and there are several possible options:
The latter option sounds the best, but at the moment I have no idea how it could be accomplished, though I am sure it can be done.
We currently apply a basic DRAC configuration in stage1 and a full one in stage2. To apply these configurations we use ipmitool, which waits for confirmation from the DRAC after sending each command. For unknown reasons, on R640s some commands can take a long time to be confirmed, and ipmitool times out. Both the stage1 and the stage2 scripts should be modified to tolerate these transient failures and retry each command a few times (e.g. 10).
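The retry behaviour proposed here could look roughly like the sketch below. It is only an illustration, not the actual stage1/stage2 scripts: the function name `run_with_retries`, its parameters, and the default delay and timeout are all assumptions. The optional `verify` callback reflects the observation in the comments that a setting is sometimes applied even though ipmitool reports a failure.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=10, delay=5.0, timeout=60.0,
                     verify=None, runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it succeeds, retrying up to `attempts` times.

    Hypothetical helper: names and defaults are illustrative only.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = runner(cmd, capture_output=True, text=True,
                            timeout=timeout)
            ok = result.returncode == 0
        except subprocess.TimeoutExpired:
            # Treat a hung command the same as a failed one.
            ok = False

        # A setting may have been applied despite the reported failure,
        # so an optional read-back check can override the exit status.
        if not ok and verify is not None:
            ok = verify()

        if ok:
            return attempt
        if attempt < attempts:
            sleep(delay)

    raise RuntimeError(f"{cmd!r} failed after {attempts} attempts")
```

A call might then be `run_with_retries(["ipmitool", "lan", "set", "1", "ipsrc", "static"])`, where LAN channel 1 is an assumption. Injecting `runner` and `sleep` keeps the helper testable without a real DRAC.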