Straw Man 1-prep/templates/iiab-expand-rootfs.service based on 2020's PR #2522 + /usr/sbin/iiab-expand-rootfs "bash -xe" exit-on-error (to defer deleting /.expand-rootfs) #3337
See also @jvonau and @tim-moody's #3325 suggestions from 9 days ago:
|
FWIW the RasPiOS version of
And its Ubuntu man page is: |
Just FYI |
An oversimplified (but plausible) concern is that rootfs expansion race conditions (similar to #3325) might occur during 1 out of every 20 boots:
I don't know if {Ubuntu, Mint, Debian, RasPiOS} abide by this default? And what are default maximums for allowed days/weeks/months (between each fsck) which also affect this? One very hypothetical collision avoidance tactic could be to force fsck to take a holiday on next boot — eliminating both race condition risk and also the annoying "double delay" (completing both fsck and rootfs expansion, during the same boot). e.g. with crude hacks like
https://www.thegeekdiary.com/maintaining-linux-filesystems-using-fsck-and-tune2fs/ |
Just FYI others skip fsck, or (temporarily!?) disable fsck, as follows:
|
Not an endorsement of this kind of approach, but FYI/recap:
|
@jvonau's approach 2 years ago (looks reasonable) was to defer rootfs expansion until As confirmed by
CLARIF: the above |
Across our 4 mainline OS's, here is
|
FYI this PR's
|
Further history #723 |
TimeoutSec=0 would be a good addition for the revised unit file. |
The above might appear dangerous. But in the end this appears harmless — as systemd apparently ignores
According to: https://serverfault.com/questions/1062205/does-systemd-fail-if-a-dependency-in-after-requires-doesnt-exist/1096424#1096424 |
Done. I used the more modern syntax: (FWIW systemd has been transitioning towards 0 meaning 0, and infinity meaning infinity, ever since systemd 229 was released 6.5 years ago.) Documentation: |
@deldesir a great introduction to systemd unit files would appear to be: https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files |
More than a good addition! This looks like a real lifesaver! That was overlooked in the era of smaller drives / microSD cards / etc: 3798685 |
Here's a person whose 256GB IIAB microSD cards only show 15GB (in one case) and 60GB (in the other case) : It would appear he/she is facing the same set of #3325-like issues. |
Stretch Goal: Merge this PR this month (August 2022) if we find enough testing talent to validate this step forward! |
@pickypet please help us test this in early September if you can! A simple working smoke-test with Mint 21 or Ubuntu 22.04 will hopefully be sufficient, demonstrating that the rootfs is indeed expanding properly when using a larger/conventional HDD. (Call me along the way wherever you need help, so this hopefully happens soon in early Sept. Thanks!) |
It didn't happen last month so let's make it happen this month (September 2022). And possibly evolve/improve on this PR if @jvonau discovers further refinements that are necessary — based on various "reproducers" he's now generating on different Raspberry Pi OS's (and different faster/slower Raspberry Pi hardware) here: |
https://linuxconfig.org/how-to-force-fsck-to-check-filesystem-after-system-reboot-on-linux
pid ordering looks sane. |
@jvonau the failure of Can you however test an actual rootfs expansion — on RasPiOS and any other OS? |
It's expected, and it's why the service is conditional. No, what would that further prove? The question of the ordering is already answered; recreating a 'dirty filesystem' and prolonging the fsck process would be the real acid test. |
Methodology: jvonau@kickass:/mnt/scratch/git/iiab$
jvonau@kickass:/mnt/scratch/git/iiab$ Results:
Order looks perfect... Next step: recreate the dirty filesystem by yanking the power in the middle of the filesystem resizing. |
This will be one of the most important tests, Great 🏗️ |
-- Journal begins at Mon 2022-04-04 14:41:41 UTC, ends at Tue 2022-09-20 19:20:26 UTC. -- -- Boot 32eff329181c422eb8907161d6b77328 -- -- Boot bd5e13fd2a86499ea0594268c5fab443 -- -- Journal begins at Mon 2022-04-04 14:41:41 UTC, ends at Tue 2022-09-20 19:25:45 UTC. -- -- Boot 32eff329181c422eb8907161d6b77328 -- -- Boot bd5e13fd2a86499ea0594268c5fab443 -- |
Thanks @tim-moody for the excellent suggestion to reboot immediately — which is almost immediate (i.e. invisible) to the operator thankfully: |
Good progress during today's call (http://minutes.iiab.io). @jvonau recommends further improving the Sounds great. Hopefully that converges in coming days and this PR can be merged at that point. |
To gather as much info as needed, could the exit codes from growpart and resize2fs be echoed back for better data capture moving forward? Just having rc=$? after each would be enough to view the return code when using 'bash -x'.
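A minimal sketch of that return-code capture, not the PR's actual script: `step_ok` and `step_fail` are hypothetical stand-ins for growpart and resize2fs, and `run_step` is an illustrative helper. One caveat under `bash -e`: a plain `rc=$?` on the line after a failing command is never reached, so the sketch records the code with `|| rc=$?` instead.

```shell
#!/bin/bash
# Sketch only: step_ok / step_fail are hypothetical stand-ins for
# growpart and resize2fs; run_step is an illustrative helper.
step_ok()   { return 0; }
step_fail() { return 3; }

# run_step NAME CMD...: run CMD, echo its return code, pass that code back.
run_step() {
    local name=$1
    shift
    local rc=0
    "$@" || rc=$?     # '|| rc=$?' records the code before 'set -e' can exit
    echo "$name rc=$rc"
    return "$rc"
}

set -e
out_ok=$(run_step grow step_ok)
# Guarded with '||' so this demo keeps running after the simulated failure.
out_fail=$(run_step resize step_fail) || fail_rc=$?
echo "$out_ok"
echo "$out_fail"
```

In a real script the second call would likely be left unguarded, so that exit-on-error still stops the run after the return code has been echoed.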
|
raspi-config method failure on current image noted at #3375 (comment). Think your call to raspi-config might be incomplete, I didn't see the correct init= line get written to /boot/cmdline.txt when I used the same syntax from the command line. |
Given that upstream have updated the routine used during firstboot to support raspi-imager's seeding of system values (written to /boot/firstrun.sh, i.e. country code, ssid/pw, ssh), perhaps it might be wise to have the prefabbed images behave in the same way? You gain the very valuable visual, user-facing feedback at the monitor that the filesystem is being tinkered with and that the user should wait until it completes, while the wanted customization is being performed, with the auto reboot at the end. You lose the ability to ssh in until after the auto reboot takes place if it is called as init= in /boot/cmdline.txt, but I don't see a reason why calling firstboot in place of raspi-config from iiab-expand-rootfs would not work. I'm generally in favor of doing things the same way as upstream, so that the behavior is a consistent user experience whether it's a stock image or a modified one, as people come to expect the same user experience. |
|
Is there evidence of a bug in Or enough to try to find reproducer pattern(s) ?! |
@jvonau should this PR be merged now — or do the |
From 9/26: @jvonau confirms that no additional changes are required to the After= and Before= lines. He also confirmed that the After= line is more important than the Before= line regarding avoiding the race condition. /cc @holta |
Future Work, if/as Raspberry Pi OS evolves, e.g. to allow visual indication of ongoing progress during rootfs expansion — and better blocking during boot process (to fully protect rootfs): |
To move the state-of-the-art forward, this is just a Straw Man PR to build on and evolve @jvonau and @tim-moody's strict ordering of resize service #2522 discussion 2 years ago and similar work:
The race condition between iiab-expand-rootfs.service and fsck would appear to be the most serious problem (rootfs can fail to fully expand: race condition betw iiab-expand-rootfs & systemd-fsck ? (journalctl confusingly/regularly portrays 2 simultaneous boots due to lack of RTC?) #3325).
Other resiliency improvements can certainly also be added where they prove effective.
Personally...I'm not at all in favor of
systemctl disable iiab-expand-rootfs.service
after rootfs has expanded (e.g. near the bottom of /usr/sbin/iiab-expand-rootfs). The reason is that grassroots communities should retain the full freedom to touch
/.expand-rootfs
at absolutely anytime — to expand their own regional/spontaneous/generative/remix IIAB disk images — even when (especially when) their community tooling (and skills!) remain comparatively primitive.
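The flag-file idea in this PR's title (exit on error so that deleting /.expand-rootfs is deferred) can be sketched roughly as follows. This is an assumption-laden illustration, not the real /usr/sbin/iiab-expand-rootfs: it uses a temp-dir flag instead of /.expand-rootfs, a hypothetical `expand-sketch` script, and `sh -c "exit N"` as a stand-in for the real expansion commands. `-x` is omitted here only to keep the output quiet.

```shell
#!/bin/bash
# Sketch only: a temp-dir flag stands in for /.expand-rootfs, and
# 'sh -c "exit N"' stands in for the real expansion commands.
work=$(mktemp -d)
flag="$work/.expand-rootfs"
script="$work/expand-sketch"    # hypothetical script name

cat > "$script" <<'EOF'
# $1 = flag path, $2 = exit code the pretend expansion step returns
[ -f "$1" ] || exit 0     # no flag file: nothing to do
sh -c "exit $2"           # pretend expansion; a failure here stops the script under -e
rm "$1"                   # reached only if the step above succeeded
EOF

touch "$flag"

# Failed run: 'bash -e' exits at the failing step, so the flag survives for next boot.
bash -e "$script" "$flag" 1 || true
if [ -f "$flag" ]; then after_failure=kept; else after_failure=deleted; fi

# Successful run: the flag is removed, so later boots skip expansion.
bash -e "$script" "$flag" 0
if [ -f "$flag" ]; then after_success=kept; else after_success=deleted; fi

echo "after failure: $after_failure, after success: $after_success"
rm -rf "$work"
```

Because the service stays enabled in this design, touching the flag file again at any later time would re-arm expansion on the next boot.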