first-boot infinite loop if exit code non-zero #202
Comments
As far as I’m concerned this works as intended. It was a deliberate move on my part to require scripts to exit 0 to indicate success. Which “new behaviour” would people be relying on? Imagr has behaved this way for as long as I can remember.
It's "new" for me :-) - I only just rebuilt my netrestore.nbi, since the old one worked for El Capitan and Sierra. The commit I mentioned is dated 16 Aug 2016. Whatever version of Imagr I was using before didn't go into an infinite loop.
Checking the exit code is a good thing to do. People who have used Imagr first-boot scripts since this change was made will generally be relying on the retries to cover up sporadic failures, so removing the retries now isn't the right answer. A retry limit, a cancel button and presumably a dialogue if the retry limit is exceeded would be a nicer way to handle a permanently non-zero exit code.
A dialogue on exceeding the retry limit would require local intervention on failure, so its use should be configurable and/or could have a configurable timeout.
What happens if there are multiple first-boot scripts? I only have one, so I don't know. What should happen if there are multiple first-boot scripts? Should too many non-zero exit codes prevent later first-boot scripts from running? I like things to fail fast, so I'd want that. It's more conservative, so I think that should probably be the default. Others might want things to work as best as possible despite failures, but would still want to know if any steps failed.
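To make the two policies concrete, here is a minimal Python 3 sketch of fail-fast versus keep-going handling for several scripts. It is not Imagr's code: `run_scripts` and its `fail_fast` flag are hypothetical names invented for illustration, and each script is assumed to be a command list suitable for `subprocess.run`.

```python
import subprocess


def run_scripts(commands, fail_fast=True):
    """Run each command in order and return a list of (command, exit_code).

    Hypothetical helper, not part of Imagr. With fail_fast=True, stop after
    the first non-zero exit code so later scripts never run. With
    fail_fast=False, run every script and report all exit codes so that
    failures are still visible afterwards.
    """
    results = []
    for command in commands:
        code = subprocess.run(command).returncode
        results.append((command, code))
        if fail_fast and code != 0:
            break
    return results
```

Either way the caller gets back every exit code that was observed, so a keep-going run can still report which steps failed.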
My first-boot script (with its non-zero exit code) is getting rerun in a loop even after I restart the mac, preventing a local login. So, to debug a non-zero exit code I need to either:
If you wish to do a PR to do this I would look at it, but I will not be working on this as I consider this a feature, not a bug. I personally think your scripts should be able to recover from a failure.
Description of Issue/Question
This part:
fed0b94#diff-60e0e6a591efb16cc6f2c5fc2391f857
...of this commit:
fed0b94
...means that a non-zero exit code from a first-boot script now
causes the script to be re-run. This is a problem for a few reasons:
It means that first-boot scripts now need to be idempotent. That's
not a bad goal, but it's generally hard to achieve, mainly because
it's hard to test. It's also generally
not cheap to implement - e.g. if you can't atomically perform some
operation, you might need to fingerprint some huge files to check
that an earlier attempt at running a first-boot script succeeded.
If you don't manage to make every single step idempotent, a first-boot
script can now fail on the first try and then "succeed" the next
time. This means that you could be left with a broken installation
without realizing it. I.e. first-boot scripts can no longer assume
that they are working in a pristine environment.
There doesn't seem to be a way to break out of the loop other than
doing a hard reset, so it's hard to debug problems.
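As an illustration of what idempotency costs in practice, here is a minimal Python 3 sketch of guarding one step with a marker file. `run_once`, the step name and the marker directory are all hypothetical. The sketch only skips steps that completed fully; it does nothing for a step that failed halfway and left partial state behind, which is exactly the hard case described above.

```python
from pathlib import Path


def run_once(step_name, action, marker_dir):
    """Run action() unless a marker shows this step already completed.

    Hypothetical helper for illustration. Returns True if the step ran,
    False if it was skipped. The marker is written only after action()
    returns without raising, so a step that fails is retried next time.
    """
    marker = Path(marker_dir) / f"{step_name}.done"
    if marker.exists():
        return False
    action()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```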
Since people will be relying on the new behaviour too, it looks like a
configurable per-first-boot-file retry limit is required here. If it
defaults to some small number, then people that rely on these retries
should remain happy, but first boot scripts will eventually stop
running if they are broken.
I could then configure imagr to only allow one attempt at running each
first-boot script, avoiding the need to write idempotent scripts.
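A rough Python 3 sketch of the retry limit proposed here, assuming each first-boot script is run as a subprocess. `run_with_retry_limit` and `max_attempts` are hypothetical names, not existing Imagr settings, and the sketch keeps the count in memory only, so it would not by itself stop retries that resume after a reboot.

```python
import subprocess


def run_with_retry_limit(command, max_attempts=3):
    """Run command until it exits 0 or max_attempts runs have been made.

    Hypothetical helper, not Imagr's implementation.
    Returns (succeeded, attempts_made).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if subprocess.run(command).returncode == 0:
            return True, attempt
    return False, max_attempts
```

Setting `max_attempts=1` gives the single-attempt behaviour described above, while a small default keeps retries available for people who rely on them.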
A UI for cancelling the retries early would be nice.
(I don't like filing bugs and then not offering to fix them, but I'm
busy and it looks like I can wrap my one first-boot script with code
that saves the real exit code somehow but then returns with a zero
exit code.)
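The wrapper workaround could look roughly like this in Python 3. It is a sketch under assumptions: `run_and_record` is a hypothetical helper, and the script and status file paths in the usage note are placeholders. The wrapper itself would then always exit 0 so that the script is not re-run.

```python
import subprocess
from pathlib import Path


def run_and_record(command, status_file):
    """Run command, write its real exit code to status_file, and return it.

    Hypothetical helper for illustration. The caller is expected to exit 0
    regardless, so the real result survives only in status_file.
    """
    code = subprocess.run(command).returncode
    Path(status_file).write_text(f"{code}\n")
    return code
```

A wrapper script would call something like `run_and_record(["/bin/sh", "/path/to/real-first-boot.sh"], "/var/tmp/first-boot.exitcode")` and then `sys.exit(0)`. The trade-off is that a real failure is only visible to whoever checks the status file.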
Setup
This is slightly edited, and could presumably be simplified to a
config with just a first-boot script with a non-zero exit code.
Steps to Reproduce Issue
Non-zero exit code from a first-boot script.
Versions Report