Operating System 5.8 update breaking VHDX image #1092
Comments
Hm, I did test the upgrade from release 4 to 5.8 using KVM, and haven't noticed this. Is this also reproducible if you import/install a new 5.8 installation as a new machine?
I have not tried yet, but I could do so over the weekend. Also, the key here is that the upgrade worked and the first reboot went through fine. It was only a second reboot after the OS upgrade which failed. (I only caught the problem because I was rebooting in a vain effort to get something unrelated working, and happened to have done the upgrade only an hour or so before, so I still had a snapshot to recover from.) One more point which occurred to me after I logged the case.
I see the exact same behavior as tonyjabson. When updating from OS 4.17 to 5.8, the VHDX image fails to load the OS. I have tested this twice (on Core 0.118.4 and Core 2020.12.0) and can reproduce it.

version: 0.118.4
Home Assistant Cloud
logged_in: false
Hass.io
host_os: HassOS 4.17
I had a similar no-boot issue on three HassOS VMs I run when upgrading. They would not boot from saved state, but you could reload the last snapshot (taken after the upgrade) and they would boot. Then a further restart (from saved state) and the issue would happen again. It dumps you to the EFI shell, and when you look in the EFI directory there is nothing there: no .efi file to boot.

Needing a solution, I made a HA snapshot, started from scratch with a fresh release copy of 5.8, and restored my HA snapshot; now all is good. Still, it's a bummer that the upgrade failed. Maybe on major upgrades the upgrade button in the UI should warn and suggest doing what I ultimately had to do.

This post's answer could explain why one can't boot a second time: https://askubuntu.com/questions/454557/virtualbox-virtual-machines-wont-boot-after-cloning. I also found this video, which explains how to boot when the firmware can't find the .efi file: https://www.youtube.com/watch?v=YCegkcVheJA. So apparently the virtual NVRAM settings are getting borked per that post, and the EFI partition doesn't seem to have the .efi boot file (per the EFI shell), and/or the UUIDs were changed during the upgrade process.

Related: after some further investigating, whenever I change the UUID of the virtual disk, or clone it (which does the same thing), I get a similar issue. This has been pointed out in many places about poorly designed VMs: if the image has a "hard coded" UUID, then changing it will cause problems. It seems that going forward HassOS should support changing the UUID of the filesystem partition. Normally, if I did this on hardware, I would burn the new filesystem image, change the UUID, then go into fstab and change the UUID accordingly. In the case of a VM I have no idea how to mount the partitions of a virtual drive without booting. I suppose you'd have to get another support VM running, mount your virtual drive there, and then maybe you could edit what needs to be edited.
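For what it's worth, on a Linux host (or a helper VM) you can expose the partitions of a virtual disk without booting it, using qemu-nbd. A rough sketch, assuming root access, the `qemu-utils` package, and a hypothetical disk path; qemu-nbd understands qcow2, VDI, VHDX and raw images:

```shell
# Load the network block device module so qemu can expose the disk
sudo modprobe nbd max_part=8

# Attach the virtual disk (path is hypothetical) to /dev/nbd0
sudo qemu-nbd --connect=/dev/nbd0 /path/to/hassos.vdi

# Partitions now appear as /dev/nbd0p1, /dev/nbd0p2, ...
sudo mount /dev/nbd0p1 /mnt    # e.g. the EFI/boot partition

# ... inspect or edit what needs editing, then clean up:
sudo umount /mnt
sudo qemu-nbd --disconnect /dev/nbd0
```

This is only an illustration of the general approach, not a supported HassOS procedure; libguestfs (`guestmount`) is an alternative that avoids the nbd module.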
@dkebler the update doesn't change the UUID. In fact, our image always ships with the same UUID. However, the EFI directory definitely should not be empty. My best guess is that something went wrong while writing the update. I'd love to reproduce this; since, as you say, it happened in three independent instances, it sounds like it should be reproducible. I just tried upgrading from 4.11 and 4.17 to 5.8, and in both situations things worked. I also tried it with taking a snapshot. Which version had you been using before?
@PerWeimann are you using Hyper-V then too? So far, the only way I was able to corrupt my EFI partition was to force-power-off the machine right after the update (which, obviously, is not a good idea). I can probably improve the update process to shorten the window in which a forceful power-off will corrupt the EFI partition. Did you do a proper reboot (no system reset) as well as a proper power-off?
When we write the update to the boot partition, there is nothing which makes sure that data is written to disk. This leaves a rather large window (probably around 30 s) where a machine reset/power-off can lead to a corrupted boot partition. Use the sync mount option to minimize the corruption window. Note that sync is normally not ideal for flash drives, but since we write very little to the boot partition, and typically only on an OS update, this shouldn't be a problem.
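The failure mode described here (data still sitting in the page cache when the machine is reset) is not specific to HassOS. Besides the sync mount option, the usual application-level fix is the durable-write pattern: write to a temp file, fsync it, atomically rename it into place, then fsync the directory. A minimal Python sketch, not the actual update code, with hypothetical file names:

```python
import os

def write_durably(path: str, data: bytes) -> None:
    """Write data so it survives a sudden reset: flush user-space
    buffers, fsync the file, atomically rename it into place, then
    fsync the containing directory so the new entry is on disk too."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # flush Python's buffer to the kernel
        os.fsync(f.fileno())   # force the kernel to write to the device
    os.replace(tmp, path)      # atomic rename over the old file
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)       # make the rename itself durable
    finally:
        os.close(dir_fd)

# Hypothetical example: writing a boot loader file durably
write_durably("bootx64.efi", b"EFI payload")
```

The fix described in this comment takes the simpler route of mounting the boot partition with the `sync` option, which makes every write synchronous without changing the code that writes the files.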
It happened to me on VirtualBox. I could not get it to boot after a power-off, and with a fresh install it's booting very slowly.
I use VirtualBox, starting from a VHDX I had converted to a VDI (although the OP had the same issue without doing that). I was on 4.17 before I did the upgrade. I had no issues with earlier 4.x upgrades; this only happened jumping to 5.x. I usually do an ACPI shutdown, so it shuts down normally. I did not restart the VM during the upgrade, and it came back fine showing the upgrade to HassOS 5.8. It is on the next start of the VM that the issue arises.

Per the other part of my comment: since this project has complete control over the partition labels and /dev names, I suggest you mount the root filesystem and other partitions based either on the /dev name or a label you give them. That way it's not dependent on a unique UUID, and the disk can be cloned and still boot. As you said, that's supposedly not the issue (but maybe it is?). Either way, as a virtual drive it's easy to change the UUID, whereas on a metal install a write to parts of the filesystem partition isn't going to change the UUID, and if you move the disk it will be to another machine, where the UUID would not conflict. With VMs, though, it's possible to have alternate copies on the same host, and if you don't change the UUID you can't even load the VM copy, because it complains about having the same UUID; thus you must clone it. Then, when you try to boot the clone, it fails for the reasons I've already stated.
We do mount by filesystem label, so from an OS perspective it doesn't matter. I'm not sure what the firmware (UEFI BIOS) is doing; it might be that it tries to do something with GPT partition UUIDs (like remembering which partition UUID you booted from last time, or similar). I am just saying, we always ship with the same UUID, since we ship as an image. The image also didn't change the UUID... But in theory it shouldn't even matter, since we use the FS label.
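For illustration, mounting by filesystem label rather than UUID looks like this in a conventional /etc/fstab; the labels and mount points here are examples, not necessarily the ones HassOS actually uses:

```
# /etc/fstab: mount by label, so cloning the disk (new UUIDs) still works
LABEL=hassos-boot   /mnt/boot   vfat   ro,sync    0  2
LABEL=hassos-data   /mnt/data   ext4   defaults   0  2
```

Because the label lives inside the filesystem itself, it is copied along with a clone, whereas tools that regenerate disk or partition UUIDs leave it untouched.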
This update broke mine too. I didn't want to deal with the EFI stuff, and I have daily snapshots backed up to Google Drive, so I just downloaded a new image and restored from that. I had to resync some integrations, but it was still probably faster than fighting the corrupted EFI directory. Next time I'll take a VirtualBox snapshot before doing an OS upgrade (my last one was from February, and that snapshot couldn't be restored).
I got promising results by changing EEPROM settings. HA at least got through more of its boot sequence. I'm not sure what the problem is now, but I think I need to adjust the systemd timeout.
I have had no issues updating my VHDX on Hyper-V.
@agners - Sorry for the late reply! Yes, running Hyper-V on Server 2016. The virtual machine is running Hyper-V configuration version 8.0. I'm still able to reproduce this when upgrading from OS 4.17 to OS 5.10.
@PerWeimann is this reproducible with 5.11? It is currently in the beta channel, but you can use
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates.
Hardware Environment
Windows Hyper-V running on Windows Server x86.
Home Assistant OS release:
Supervisor logs:
System Health
version: 2020.12.0
installation_type: Home Assistant OS
dev: false
hassio: true
docker: true
virtualenv: false
python_version: 3.8.6
os_name: Linux
os_version: 5.4.77
arch: x86_64
timezone: Europe/London
logged_in: true
subscription_expiration: December 22, 2020, 12:00 AM
relayer_connected: true
remote_enabled: false
remote_connected: false
alexa_enabled: true
google_enabled: true
can_reach_cert_server: ok
can_reach_cloud_auth: ok
can_reach_cloud: ok
host_os: HassOS 4.17
update_channel: stable
supervisor_version: 2020.12.6
docker_version: 19.03.12
disk_total: 8.2 GB
disk_used: 3.8 GB
healthy: true
supported: true
board: ova
supervisor_api: ok
version_api: ok
installed_addons: Samba share (9.3.0), Node-RED (7.2.8), Mosquitto broker (5.1), File editor (5.2.0), Check Home Assistant configuration (3.6.0)
dashboards: 1
mode: auto-gen
resources: 0
If I apply the System Update to Operating System 5.8 from 4.17, the update succeeds. The instance reboots and everything looks good.
If I then reboot again, the VM won't boot any longer, as the virtual disk is no longer bootable, and it hangs looking for a DHCP PXE boot.
I have replicated this twice (I take a snapshot before I update anything, so I was able to recover and replicate the problem).
I have been able to reboot 4 times in a row having updated Home Assistant but not the host OS, so I'm fairly sure it's a host OS issue.