Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rootfs can fail to fully expand: race condition betw iiab-expand-rootfs & systemd-fsck ? (journalctl confusingly/regularly portrays 2 simultaneous boots due to lack of RTC?) #3325

Open
holta opened this issue Jul 29, 2022 · 58 comments
Labels
Milestone

Comments

@holta
Copy link
Member

holta commented Jul 29, 2022

This happens to be on [64-bit] "RasPiOS with desktop" [on RPi 400] but it likely occurs on many/most if not all OS's [and hardware].

In the journalctl log below, you can see many examples of /usr/sbin/iiab-expand-rootfs being launched twice simultaneously. The output of these simultaneous / duplicate processes is made even more clear by 32-digit hex 'Boot' identifier numbers at the top of each chunk of output, interweaved together, evidenced by each of the 2 running processes having 100% identical timestamps.

Unhealthy "conjoined / siamese twins" in a sense.

The 1st example (at the top of the log below) being the most onerous, as contention between those 2 processes caused the crash below (Line 48 of 117, made clear by: /usr/sbin/iiab-expand-rootfs: line 59: 669 Segmentation fault resize2fs $ROOT_PART) — that prevented the drive from fully expanding to its actual physical capacity.

journalctl -u iiab-expand-rootfs confirms the problem

Click To Expand 'journalctl -u iiab-expand-rootfs' Log Output
root@box:~# journalctl -u iiab-expand-rootfs
-- Journal begins at Mon 2022-04-04 14:52:30 UTC, ends at Fri 2022-07-29 03:00:03 UTC. --
Jun 16 20:51:03 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jun 16 20:51:03 box iiab-expand-rootfs[434]: + '[' -f /.expand-rootfs ']'
Jun 16 20:51:03 box iiab-expand-rootfs[434]: + '[' -f /.resize-rootfs ']'
Jun 16 20:51:03 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jun 16 20:51:03 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot 84cc35b34186456ba2adf08f7b4d5399 --
Jun 16 20:52:17 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + '[' -f /.expand-rootfs ']'
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + echo '/usr/sbin/iiab-expand-rootfs: Expanding rootfs partition'
Jun 16 20:52:17 box iiab-expand-rootfs[453]: /usr/sbin/iiab-expand-rootfs: Expanding rootfs partition
Jun 16 20:52:17 box iiab-expand-rootfs[468]: ++ findmnt / -o SOURCE -n
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + ROOT_PART=/dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[505]: ++ lsblk -no pkname /dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + ROOT_DEV=/dev/mmcblk0
Jun 16 20:52:17 box iiab-expand-rootfs[541]: ++ echo /dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[542]: ++ grep -o '[[:digit:]]*$'
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + ROOT_PART_NUM=2
Jun 16 20:52:17 box iiab-expand-rootfs[551]: ++ cut -f 1 -d:
Jun 16 20:52:17 box iiab-expand-rootfs[550]: ++ tail -n 1
Jun 16 20:52:17 box iiab-expand-rootfs[549]: ++ parted /dev/mmcblk0 -ms unit s p
-- Boot 8ac77d02beb348ebbfb5fe4562f05b65 --
Jun 16 20:52:17 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jun 16 20:52:17 box iiab-expand-rootfs[455]: + '[' -f /.expand-rootfs ']'
Jun 16 20:52:17 box iiab-expand-rootfs[455]: + echo '/usr/sbin/iiab-expand-rootfs: Expanding rootfs partition'
Jun 16 20:52:17 box iiab-expand-rootfs[455]: /usr/sbin/iiab-expand-rootfs: Expanding rootfs partition
Jun 16 20:52:17 box iiab-expand-rootfs[473]: ++ findmnt / -o SOURCE -n
Jun 16 20:52:17 box iiab-expand-rootfs[455]: + ROOT_PART=/dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[496]: ++ lsblk -no pkname /dev/mmcblk0p2
Jun 16 20:52:18 box iiab-expand-rootfs[455]: + ROOT_DEV=/dev/mmcblk0
Jun 16 20:52:18 box iiab-expand-rootfs[522]: ++ grep -o '[[:digit:]]*$'
Jun 16 20:52:18 box iiab-expand-rootfs[521]: ++ echo /dev/mmcblk0p2
Jun 16 20:52:18 box iiab-expand-rootfs[455]: + ROOT_PART_NUM=2
Jun 16 20:52:18 box iiab-expand-rootfs[535]: ++ parted /dev/mmcblk0 -ms unit s p
Jun 16 20:52:18 box iiab-expand-rootfs[537]: ++ cut -f 1 -d:
Jun 16 20:52:18 box iiab-expand-rootfs[536]: ++ tail -n 1
-- Boot 84cc35b34186456ba2adf08f7b4d5399 --
Jun 16 20:52:18 box iiab-expand-rootfs[453]: + LAST_PART_NUM=2
Jun 16 20:52:18 box iiab-expand-rootfs[453]: + '[' 2 -ne 2 ']'
Jun 16 20:52:18 box iiab-expand-rootfs[453]: + growpart /dev/mmcblk0 2
-- Boot 8ac77d02beb348ebbfb5fe4562f05b65 --
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + LAST_PART_NUM=2
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + '[' 2 -ne 2 ']'
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + growpart /dev/mmcblk0 2
Jun 16 20:52:19 box iiab-expand-rootfs[631]: NOCHANGE: partition 2 is size 999210975. it cannot be grown
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + resize2fs /dev/mmcblk0p2
Jun 16 20:52:19 box iiab-expand-rootfs[669]: resize2fs 1.46.2 (28-Feb-2021)
Jun 16 20:52:20 box iiab-expand-rootfs[455]: /usr/sbin/iiab-expand-rootfs: line 59:   669 Segmentation fault      resize2fs $ROOT_PART
Jun 16 20:52:20 box iiab-expand-rootfs[455]: + rm -f /.expand-rootfs /.resize-rootfs
Jun 16 20:52:20 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jun 16 20:52:20 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot 84cc35b34186456ba2adf08f7b4d5399 --
Jun 16 20:52:20 box iiab-expand-rootfs[632]: CHANGED: partition=2 start=532480 old: size=23317568 end=23850048 new: size=999210975 end=999743455
Jun 16 20:52:20 box iiab-expand-rootfs[453]: + resize2fs /dev/mmcblk0p2
Jun 16 20:52:20 box iiab-expand-rootfs[735]: resize2fs 1.46.2 (28-Feb-2021)
-- Boot 02c8554a46b84a88938c27716049dcbb --
Jun 16 21:15:17 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jun 16 21:15:17 box iiab-expand-rootfs[463]: + '[' -f /.expand-rootfs ']'
Jun 16 21:15:17 box iiab-expand-rootfs[463]: + '[' -f /.resize-rootfs ']'
Jun 16 21:15:17 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jun 16 21:15:17 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot ebc3de71ad43488cbea813b5409c4aff --
Jun 16 21:44:58 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jun 16 21:44:58 box iiab-expand-rootfs[434]: + '[' -f /.expand-rootfs ']'
Jun 16 21:44:58 box iiab-expand-rootfs[434]: + '[' -f /.resize-rootfs ']'
Jun 16 21:44:58 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jun 16 21:44:58 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot 565a5a9c7aae4b11b5f82343d3a62805 --
Jun 16 21:44:58 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jun 16 21:44:58 box iiab-expand-rootfs[454]: + '[' -f /.expand-rootfs ']'
Jun 16 21:44:58 box iiab-expand-rootfs[454]: + '[' -f /.resize-rootfs ']'
Jun 16 21:44:59 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jun 16 21:44:59 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot 9bd1b134a67845908742051d69d9474c --
Jul 27 23:50:01 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jul 27 23:50:01 box iiab-expand-rootfs[489]: + '[' -f /.expand-rootfs ']'
Jul 27 23:50:01 box iiab-expand-rootfs[489]: + '[' -f /.resize-rootfs ']'
Jul 27 23:50:01 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jul 27 23:50:01 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot ddb53ceab7eb4263a2846a92e7299da0 --
Jul 28 00:19:39 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jul 28 00:19:39 box iiab-expand-rootfs[445]: + '[' -f /.expand-rootfs ']'
Jul 28 00:19:39 box iiab-expand-rootfs[445]: + '[' -f /.resize-rootfs ']'
Jul 28 00:19:40 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jul 28 00:19:40 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot 63c5d799e67a4e21b21088803ac6b2b0 --
Jul 28 13:23:53 box systemd[1]: Starting Root Filesystem Auto-Expander...
-- Boot 99cdf14cc3da416c8671a878bbf9d129 --
Jul 28 13:23:53 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jul 28 13:23:53 box iiab-expand-rootfs[469]: + '[' -f /.expand-rootfs ']'
Jul 28 13:23:53 box iiab-expand-rootfs[469]: + '[' -f /.resize-rootfs ']'
-- Boot 63c5d799e67a4e21b21088803ac6b2b0 --
Jul 28 13:23:53 box iiab-expand-rootfs[464]: + '[' -f /.expand-rootfs ']'
Jul 28 13:23:53 box iiab-expand-rootfs[464]: + '[' -f /.resize-rootfs ']'
Jul 28 13:23:53 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jul 28 13:23:53 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot 99cdf14cc3da416c8671a878bbf9d129 --
Jul 28 13:23:53 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jul 28 13:23:53 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot ef13041c8a3c40a89e130149e8fc8fb1 --
Jul 28 13:48:50 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jul 28 13:48:50 box iiab-expand-rootfs[470]: + '[' -f /.expand-rootfs ']'
Jul 28 13:48:50 box iiab-expand-rootfs[470]: + '[' -f /.resize-rootfs ']'
Jul 28 13:48:51 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jul 28 13:48:51 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot bc1ac6387d584b468ded38c9aee772c5 --
Jul 28 21:51:36 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jul 28 21:51:36 box iiab-expand-rootfs[456]: + '[' -f /.expand-rootfs ']'
Jul 28 21:51:36 box iiab-expand-rootfs[456]: + '[' -f /.resize-rootfs ']'
Jul 28 21:51:36 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jul 28 21:51:36 box systemd[1]: Finished Root Filesystem Auto-Expander.
-- Boot 1e391b0fe8a74fa0bb73ab9be6f051fd --
Jul 28 22:04:19 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jul 28 22:04:19 box iiab-expand-rootfs[451]: + '[' -f /.expand-rootfs ']'
Jul 28 22:04:19 box iiab-expand-rootfs[451]: + '[' -f /.resize-rootfs ']'
Jul 28 22:04:19 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jul 28 22:04:19 box systemd[1]: Finished Root Filesystem Auto-Expander.

Remediation

  1. Reverting to raspi-config --expand-rootfs (with manual reboot, already in our code but commented out for now) [ might help (?) as part of a short-term fix on RasPiOS — e.g. if it's smart enough to use some kind of lock file arrangement to avoid running 2 processes simultaneously: ]

    # raspi-config --expand-rootfs # REQUIRES A REBOOT

  2. But it's hopefully also possible to fix iiab-expand-rootfs.service across all OS's — to put an end to this erroneous and quite common simultaneous-double-launching of /usr/sbin/iiab-expand-rootfs ?

Related:

@holta holta added the bug label Jul 29, 2022
@holta holta added this to the 8.0 milestone Jul 29, 2022
@tim-moody
Copy link
Contributor

So to summarize, because one line of the script logs twice, you leapt to the conclusion that the script ran twice, ignoring the fact that the other three lines, including start and end messages only appear once. As a result you made changes that broke it and now think that because it's broken it should be eliminated altogether. As further evidence that it ran twice you took the fact that for one user it did nothing, not the same thing twice, without any idea as to why that user had a problem. Is that pretty much it?

@holta
Copy link
Member Author

holta commented Jul 29, 2022

Read the log and you will see that it started 2 simultaneous copies of /usr/sbin/iiab-expand-rootfs 2 times out of 10.

So it's almost definitely a timing issue during boot. Systemd should be celebrated for helping us to clearly see this in its more modern logging system.

If you're accusing others of breaking things without evidence, that's not helpful.

@holta
Copy link
Member Author

holta commented Jul 29, 2022

it started 2 simultaneous copies of /usr/sbin/iiab-expand-rootfs 2 times out of 10.

Clarification for people who don't read the logs:

The log shows that double-launching of iiab-expand-rootfs happens repeatedly — both during initial boot (when trying to expand the root filesystem) and also on subsequent boot(s) as well.

@tim-moody
Copy link
Contributor

Please cite the actual lines which you take to be evidence that it ran twice.

Does 'Starting Root Filesystem Auto-Expander' appear twice?

where does the log come from, what OS?

@holta
Copy link
Member Author

holta commented Jul 29, 2022

where does the log come from, what OS?

Please do read this ticket (the top, especially). To restate it: this occurred on 64-bit "RasPiOS with desktop" on RPi 400.

@tim-moody
Copy link
Contributor

Please cite the actual lines which you take to be evidence that it ran twice.

@tim-moody
Copy link
Contributor

Jul 28 22:04:19 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jul 28 22:04:19 box iiab-expand-rootfs[451]: + '[' -f /.expand-rootfs ']'
Jul 28 22:04:19 box iiab-expand-rootfs[451]: + '[' -f /.resize-rootfs ']'
Jul 28 22:04:19 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jul 28 22:04:19 box systemd[1]: Finished Root Filesystem Auto-Expander.

@holta
Copy link
Member Author

holta commented Jul 29, 2022

Please cite the actual lines which you take to be evidence that it ran twice.

Please analyze the log more patiently:

  • Lines 7-55 show the 1st example of duplicate invocation of iiab-expand-rootfs
  • Lines 86-99 show another example

As explained at the top of this ticket, the interweaved / interleaved chunks (i.e. sections, each with a Boot identifier 32-digit hex number) make clear how the 2 processes are happening simultaneously.

@tim-moody
Copy link
Contributor

If I run journalctl -u iiab-expand-rootfs I don't see the -- Boot 84cc35b34186456ba2adf08f7b4d5399 -- line, did you add that, is it a fresh boot? what does journalctl -b -u iiab-expand-rootfs show?

@holta
Copy link
Member Author

holta commented Jul 29, 2022

If I run journalctl -u iiab-expand-rootfs I don't see the -- Boot 84cc35b34186456ba2adf08f7b4d5399 -- line, did you add that,

No, I assume Systemd adds those into journalctl logging.

what does journalctl -b -u iiab-expand-rootfs show?

Indeed the -b flag gets rid of the interweaving / interleaving.

Making these 2 invocations (that have identical timestamps) more readable:

root@box:~# journalctl -b 84cc35b34186456ba2adf08f7b4d5399 -u iiab-expand-rootfs
-- Journal begins at Mon 2022-04-04 14:52:30 UTC, ends at Fri 2022-07-29 12:57:08 UTC. --
Jun 16 20:52:17 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + '[' -f /.expand-rootfs ']'
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + echo '/usr/sbin/iiab-expand-rootfs: Expanding rootfs partition'
Jun 16 20:52:17 box iiab-expand-rootfs[453]: /usr/sbin/iiab-expand-rootfs: Expanding rootfs partition
Jun 16 20:52:17 box iiab-expand-rootfs[468]: ++ findmnt / -o SOURCE -n
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + ROOT_PART=/dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[505]: ++ lsblk -no pkname /dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + ROOT_DEV=/dev/mmcblk0
Jun 16 20:52:17 box iiab-expand-rootfs[541]: ++ echo /dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[542]: ++ grep -o '[[:digit:]]*$'
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + ROOT_PART_NUM=2
Jun 16 20:52:17 box iiab-expand-rootfs[551]: ++ cut -f 1 -d:
Jun 16 20:52:17 box iiab-expand-rootfs[550]: ++ tail -n 1
Jun 16 20:52:17 box iiab-expand-rootfs[549]: ++ parted /dev/mmcblk0 -ms unit s p
Jun 16 20:52:18 box iiab-expand-rootfs[453]: + LAST_PART_NUM=2
Jun 16 20:52:18 box iiab-expand-rootfs[453]: + '[' 2 -ne 2 ']'
Jun 16 20:52:18 box iiab-expand-rootfs[453]: + growpart /dev/mmcblk0 2
Jun 16 20:52:20 box iiab-expand-rootfs[632]: CHANGED: partition=2 start=532480 old: size=23317568 end=23850048 new: size=999210975 end=999743455
Jun 16 20:52:20 box iiab-expand-rootfs[453]: + resize2fs /dev/mmcblk0p2
Jun 16 20:52:20 box iiab-expand-rootfs[735]: resize2fs 1.46.2 (28-Feb-2021)
root@box:~# journalctl -b 8ac77d02beb348ebbfb5fe4562f05b65 -u iiab-expand-rootfs
-- Journal begins at Mon 2022-04-04 14:52:30 UTC, ends at Fri 2022-07-29 12:57:08 UTC. --
Jun 16 20:52:17 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jun 16 20:52:17 box iiab-expand-rootfs[455]: + '[' -f /.expand-rootfs ']'
Jun 16 20:52:17 box iiab-expand-rootfs[455]: + echo '/usr/sbin/iiab-expand-rootfs: Expanding rootfs partition'
Jun 16 20:52:17 box iiab-expand-rootfs[455]: /usr/sbin/iiab-expand-rootfs: Expanding rootfs partition
Jun 16 20:52:17 box iiab-expand-rootfs[473]: ++ findmnt / -o SOURCE -n
Jun 16 20:52:17 box iiab-expand-rootfs[455]: + ROOT_PART=/dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[496]: ++ lsblk -no pkname /dev/mmcblk0p2
Jun 16 20:52:18 box iiab-expand-rootfs[455]: + ROOT_DEV=/dev/mmcblk0
Jun 16 20:52:18 box iiab-expand-rootfs[522]: ++ grep -o '[[:digit:]]*$'
Jun 16 20:52:18 box iiab-expand-rootfs[521]: ++ echo /dev/mmcblk0p2
Jun 16 20:52:18 box iiab-expand-rootfs[455]: + ROOT_PART_NUM=2
Jun 16 20:52:18 box iiab-expand-rootfs[535]: ++ parted /dev/mmcblk0 -ms unit s p
Jun 16 20:52:18 box iiab-expand-rootfs[537]: ++ cut -f 1 -d:
Jun 16 20:52:18 box iiab-expand-rootfs[536]: ++ tail -n 1
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + LAST_PART_NUM=2
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + '[' 2 -ne 2 ']'
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + growpart /dev/mmcblk0 2
Jun 16 20:52:19 box iiab-expand-rootfs[631]: NOCHANGE: partition 2 is size 999210975. it cannot be grown
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + resize2fs /dev/mmcblk0p2
Jun 16 20:52:19 box iiab-expand-rootfs[669]: resize2fs 1.46.2 (28-Feb-2021)
Jun 16 20:52:20 box iiab-expand-rootfs[455]: /usr/sbin/iiab-expand-rootfs: line 59:   669 Segmentation fault      resize2fs $ROOT_PART
Jun 16 20:52:20 box iiab-expand-rootfs[455]: + rm -f /.expand-rootfs /.resize-rootfs
Jun 16 20:52:20 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jun 16 20:52:20 box systemd[1]: Finished Root Filesystem Auto-Expander.

@tim-moody
Copy link
Contributor

so these are two separate boots.

Root Filesystem Auto-Expander didn't finish on the first one (? manual reboot) and so ran again on the second one and succeeded.

@tim-moody
Copy link
Contributor

OK. Now I see /usr/sbin/iiab-expand-rootfs: line 59: 669 Segmentation fault resize2fs $ROOT_PART

@tim-moody
Copy link
Contributor

that could be run manually to test version

@tim-moody
Copy link
Contributor

but the segfault coudl also be from corrupt file system from previous reboot

@holta
Copy link
Member Author

holta commented Jul 29, 2022

I don't know enough about Systemd to understand why 1 of the 2 "simultaneous boots" (specifically "84cc35b34186456ba2adf08f7b4d5399") does not appear here:

root@box:~# journalctl --list-boots
-12 fd13670672ed4eb1ad9711a25c622cae Mon 2022-04-04 14:52:30 UTC—Mon 2022-04-04 14:52:47 UTC
-11 051858ef96664f8b8fdfff4325e993af Mon 2022-04-04 14:52:47 UTC—Thu 2022-06-16 19:37:21 UTC
-10 549ffca864984728b6a2a83b4da065cd Thu 2022-06-16 19:37:21 UTC—Thu 2022-06-16 20:51:00 UTC
 -9 a06560a4507447f094afd6ff382231be Thu 2022-06-16 20:51:00 UTC—Thu 2022-06-16 20:52:14 UTC
 -8 8ac77d02beb348ebbfb5fe4562f05b65 Thu 2022-06-16 20:52:14 UTC—Thu 2022-06-16 21:15:14 UTC
 -7 02c8554a46b84a88938c27716049dcbb Thu 2022-06-16 21:15:14 UTC—Thu 2022-06-16 21:44:55 UTC
 -6 565a5a9c7aae4b11b5f82343d3a62805 Thu 2022-06-16 21:44:55 UTC—Thu 2022-06-16 22:00:03 UTC
 -5 ebc3de71ad43488cbea813b5409c4aff Wed 2022-07-27 23:32:46 UTC—Wed 2022-07-27 23:49:57 UTC
 -4 9bd1b134a67845908742051d69d9474c Wed 2022-07-27 23:49:57 UTC—Thu 2022-07-28 00:19:36 UTC
 -3 99cdf14cc3da416c8671a878bbf9d129 Thu 2022-07-28 00:19:36 UTC—Thu 2022-07-28 13:48:47 UTC
 -2 ef13041c8a3c40a89e130149e8fc8fb1 Thu 2022-07-28 13:48:47 UTC—Thu 2022-07-28 21:51:33 UTC
 -1 bc1ac6387d584b468ded38c9aee772c5 Thu 2022-07-28 21:51:33 UTC—Thu 2022-07-28 22:04:16 UTC
  0 1e391b0fe8a74fa0bb73ab9be6f051fd Thu 2022-07-28 22:04:16 UTC—Fri 2022-07-29 12:40:02 UTC

@tim-moody
Copy link
Contributor

? never completed

@tim-moody
Copy link
Contributor

https://patchwork.ozlabs.org/project/linux-ext4/patch/20160810200113.GA10523@thunk.org/

this is old, but could be relevant

need to try to reproduce the resize2fs segmentation fault

@holta
Copy link
Member Author

holta commented Jul 29, 2022

? never completed

Possible.

These 3+3 pastes might shed more light on these "simultaneous boots" hopefully:

root@box:~# journalctl -b 84cc35b34186456ba2adf08f7b4d5399 | head -2
-- Journal begins at Mon 2022-04-04 14:52:30 UTC, ends at Fri 2022-07-29 13:35:30 UTC. --
Jun 16 20:52:13 box kernel: Booting Linux on physical CPU 0x0000000000 [0x410fd083]

root@box:~# journalctl -b 84cc35b34186456ba2adf08f7b4d5399 | grep -i 'ext4\|mmcblk'
Jun 16 20:52:13 box kernel: Kernel command line: coherent_pool=1M 8250.nr_uarts=0 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 video=HDMI-A-1:1920x1080M@60 smsc95xx.macaddr=DC:A6:32:D8:7F:ED vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  console=ttyS0,115200 console=tty1 root=PARTUUID=f91b8a44-02 rootfstype=ext4 fsck.repair=yes rootwait quiet splash plymouth.ignore-serial-consoles
Jun 16 20:52:13 box kernel: mmcblk0: mmc0:aaaa SN512 477 GiB
Jun 16 20:52:13 box kernel:  mmcblk0: p1 p2
Jun 16 20:52:13 box kernel: mmcblk0: mmc0:aaaa SN512 477 GiB
Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Jun 16 20:52:13 box kernel: VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): re-mounted. Opts: (null). Quota mode: none.
Jun 16 20:52:15 box systemd-fsck[287]: /dev/mmcblk0p1: 293 files, 61656/516190 clusters
Jun 16 20:52:17 box systemd[1]: Started Periodic ext4 Online Metadata Check for All Filesystems.
Jun 16 20:52:17 box systemd[1]: Starting Remove Stale Online ext4 Metadata Check Snapshots...
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + ROOT_PART=/dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[505]: ++ lsblk -no pkname /dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[453]: + ROOT_DEV=/dev/mmcblk0
Jun 16 20:52:17 box iiab-expand-rootfs[541]: ++ echo /dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[549]: ++ parted /dev/mmcblk0 -ms unit s p
Jun 16 20:52:17 box systemd[1]: Finished Remove Stale Online ext4 Metadata Check Snapshots.
Jun 16 20:52:18 box iiab-expand-rootfs[453]: + growpart /dev/mmcblk0 2
Jun 16 20:52:20 box iiab-expand-rootfs[453]: + resize2fs /dev/mmcblk0p2
Jun 16 20:52:20 box kernel: EXT4-fs (mmcblk0p2): resizing filesystem from 2914696 to 124901371 blocks
Jun 16 20:52:31 box kernel: EXT4-fs (mmcblk0p2): resized to 6291456 blocks
Jun 16 20:52:41 box kernel: EXT4-fs (mmcblk0p2): resized to 11010048 blocks
Jun 16 20:52:53 box kernel: EXT4-fs (mmcblk0p2): resized to 16252928 blocks

root@box:~# journalctl -b 84cc35b34186456ba2adf08f7b4d5399 | tail -40
Jun 16 20:52:45 box systemd[1]: kalite-serve.service: Can't open PID file /library/ka-lite/kalite.pid (yet?) after start: Operation not permitted
Jun 16 20:52:45 box systemd[1]: Started Provides the kalite Server.
Jun 16 20:52:45 box lightdm[1408]: Error getting user list from org.freedesktop.Accounts: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.Accounts was not provided by any .service files
Jun 16 20:52:45 box lightdm[1408]: pam_unix(lightdm-autologin:session): session opened for user iiab-admin(uid=1000) by (uid=0)
Jun 16 20:52:45 box systemd-logind[503]: New session 3 of user iiab-admin.
Jun 16 20:52:45 box systemd[1]: Started Session 3 of user iiab-admin.
Jun 16 20:52:45 box systemd[1]: systemd-fsckd.service: Succeeded.
Jun 16 20:52:47 box rtkit-daemon[1363]: Supervising 5 threads of 3 processes of 1 users.
Jun 16 20:52:47 box rtkit-daemon[1363]: Successfully made thread 1450 of process 1356 owned by '1000' RT at priority 5.
Jun 16 20:52:47 box rtkit-daemon[1363]: Supervising 6 threads of 3 processes of 1 users.
Jun 16 20:52:48 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:48 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:48 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:48 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:48 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:48 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:48 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:48 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:48 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:48 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:48 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:48 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:49 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:49 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:49 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:49 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:49 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:49 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:49 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:49 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:49 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:49 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:49 box kernel: hdmi-audio-codec hdmi-audio-codec.5.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
Jun 16 20:52:49 box kernel:  MAI: soc_pcm_open() failed (-19)
Jun 16 20:52:49 box bluetoothd[1028]: Endpoint registered: sender=:1.31 path=/MediaEndpoint/A2DPSink/sbc
Jun 16 20:52:49 box bluetoothd[1028]: Endpoint registered: sender=:1.31 path=/MediaEndpoint/A2DPSource/sbc
Jun 16 20:52:49 box kernel: Bluetooth: RFCOMM TTY layer initialized
Jun 16 20:52:49 box kernel: Bluetooth: RFCOMM socket layer initialized
Jun 16 20:52:49 box kernel: Bluetooth: RFCOMM ver 1.11
Jun 16 20:52:53 box kernel: EXT4-fs (mmcblk0p2): resized to 16252928 blocks
root@box:~# journalctl -b 8ac77d02beb348ebbfb5fe4562f05b65 | head -2
-- Journal begins at Mon 2022-04-04 14:52:30 UTC, ends at Fri 2022-07-29 13:35:30 UTC. --
Jun 16 20:52:13 box kernel: Booting Linux on physical CPU 0x0000000000 [0x410fd083]

root@box:~# journalctl -b 8ac77d02beb348ebbfb5fe4562f05b65 | grep -i 'ext4\|mmcblk'
Jun 16 20:52:13 box kernel: Kernel command line: coherent_pool=1M 8250.nr_uarts=0 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 video=HDMI-A-1:1920x1080M@60 smsc95xx.macaddr=DC:A6:32:D8:7F:ED vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  console=ttyS0,115200 console=tty1 root=PARTUUID=f91b8a44-02 rootfstype=ext4 fsck.repair=yes rootwait quiet splash plymouth.ignore-serial-consoles
Jun 16 20:52:13 box kernel: mmcblk0: mmc0:aaaa SN512 477 GiB
Jun 16 20:52:13 box kernel:  mmcblk0: p1 p2
Jun 16 20:52:13 box kernel: mmcblk0: mmc0:aaaa SN512 477 GiB
Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): write access will be enabled during recovery
Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): orphan cleanup on readonly fs
Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): 5 orphan inodes deleted
Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): recovery complete
Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Jun 16 20:52:13 box kernel: VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): re-mounted. Opts: (null). Quota mode: none.
Jun 16 20:52:16 box systemd-fsck[287]: /dev/mmcblk0p1: 293 files, 61656/516190 clusters
Jun 16 20:52:17 box systemd[1]: Started Periodic ext4 Online Metadata Check for All Filesystems.
Jun 16 20:52:17 box systemd[1]: Starting Remove Stale Online ext4 Metadata Check Snapshots...
Jun 16 20:52:17 box iiab-expand-rootfs[455]: + ROOT_PART=/dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[496]: ++ lsblk -no pkname /dev/mmcblk0p2
Jun 16 20:52:18 box iiab-expand-rootfs[455]: + ROOT_DEV=/dev/mmcblk0
Jun 16 20:52:18 box iiab-expand-rootfs[521]: ++ echo /dev/mmcblk0p2
Jun 16 20:52:18 box iiab-expand-rootfs[535]: ++ parted /dev/mmcblk0 -ms unit s p
Jun 16 20:52:18 box systemd[1]: Finished Remove Stale Online ext4 Metadata Check Snapshots.
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + growpart /dev/mmcblk0 2
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + resize2fs /dev/mmcblk0p2
Jun 16 20:52:19 box kernel: EXT4-fs (mmcblk0p2): resizing filesystem from 15728640 to 124901371 blocks
Jun 16 20:52:19 box kernel: kernel BUG at fs/ext4/resize.c:528!
Jun 16 20:52:20 box kernel: pc : ext4_flex_group_add+0x9a0/0x1828
Jun 16 20:52:20 box kernel: lr : ext4_flex_group_add+0x98/0x1828
Jun 16 20:52:20 box kernel:  ext4_flex_group_add+0x9a0/0x1828
Jun 16 20:52:20 box kernel:  ext4_resize_fs+0xa98/0x1108
Jun 16 20:52:20 box kernel:  __ext4_ioctl+0x11dc/0x17b0
Jun 16 20:52:20 box kernel:  ext4_ioctl+0x40/0x68
Jun 16 21:15:04 box systemd[1]: Stopped Periodic ext4 Online Metadata Check for All Filesystems.

root@box:~# journalctl -b 8ac77d02beb348ebbfb5fe4562f05b65 | tail -40
Jun 16 21:15:13 box systemd[1]: systemd-modules-load.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Stopped Load Kernel Modules.
Jun 16 21:15:13 box systemd[1]: Stopping Network Time Synchronization...
Jun 16 21:15:13 box systemd[1]: Stopping Update UTMP about System Boot/Shutdown...
Jun 16 21:15:13 box systemd[1]: fake-hwclock.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Stopped Restore / save the current clock.
Jun 16 21:15:13 box systemd[1]: systemd-timesyncd.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Stopped Network Time Synchronization.
Jun 16 21:15:13 box systemd[1]: systemd-update-utmp.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Stopped Update UTMP about System Boot/Shutdown.
Jun 16 21:15:13 box systemd[1]: systemd-tmpfiles-setup.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Stopped Create Volatile Files and Directories.
Jun 16 21:15:13 box systemd[1]: Stopped target Local File Systems.
Jun 16 21:15:13 box systemd[1]: Unmounting /boot...
Jun 16 21:15:13 box systemd[1]: boot.mount: Succeeded.
Jun 16 21:15:13 box systemd[1]: Unmounted /boot.
Jun 16 21:15:13 box systemd[1]: Reached target Unmount All Filesystems.
Jun 16 21:15:13 box systemd[1]: systemd-fsck@dev-disk-by\x2dpartuuid-f91b8a44\x2d01.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Stopped File System Check on /dev/disk/by-partuuid/f91b8a44-01.
Jun 16 21:15:13 box systemd[1]: Removed slice system-systemd\x2dfsck.slice.
Jun 16 21:15:13 box systemd[1]: Stopped target Local File Systems (Pre).
Jun 16 21:15:13 box systemd[1]: systemd-tmpfiles-setup-dev.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Stopped Create Static Device Nodes in /dev.
Jun 16 21:15:13 box systemd[1]: systemd-sysusers.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Stopped Create System Users.
Jun 16 21:15:13 box systemd[1]: systemd-remount-fs.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Stopped Remount Root and Kernel File Systems.
Jun 16 21:15:13 box systemd[1]: systemd-fsck-root.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Stopped File System Check on Root Device.
Jun 16 21:15:13 box systemd[1]: Reached target Shutdown.
Jun 16 21:15:13 box systemd[1]: Reached target Final Step.
Jun 16 21:15:13 box systemd[1]: systemd-reboot.service: Succeeded.
Jun 16 21:15:13 box systemd[1]: Finished Reboot.
Jun 16 21:15:13 box systemd[1]: Reached target Reboot.
Jun 16 21:15:13 box systemd[1]: Shutting down.
Jun 16 21:15:13 box systemd[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog
Jun 16 21:15:13 box systemd[1]: Failed to set timeout to 600s: Invalid argument
Jun 16 21:15:13 box systemd-shutdown[1]: Syncing filesystems and block devices.
Jun 16 21:15:14 box systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Jun 16 21:15:14 box systemd-journald[144]: Journal stopped

@tim-moody tim-moody changed the title iiab-expand-rootfs.service often starts 2 simultaneous copies of /usr/sbin/iiab-expand-rootfs, causing it to crash iiab-expand-rootfs.service often starts 2 simultaneous copies of /usr/sbin/iiab-expand-rootfs by booting twice simultaneously, causing it to crash Jul 29, 2022
@jvonau
Copy link
Contributor

jvonau commented Jul 29, 2022

to 124901371 blocks
resized to 16252928 blocks

Didn't finish, rebooted/powered off too soon.

@jvonau
Copy link
Contributor

jvonau commented Jul 29, 2022

resizing filesystem from 15728640 to 124901371 blocks

@tim-moody
Copy link
Contributor

there's still the segfault on the second boot

@holta
Copy link
Member Author

holta commented Jul 29, 2022

Potentially more revealing details:

root@box:~# grep -in 'ext4\|mmcblk' /var/log/messages.1 | wc -l
92
root@box:~# grep -in 'ext4\|mmcblk' /var/log/messages.1 | pastebinit -b sprunge.us
http://sprunge.us/EEpAjD

http://sprunge.us/EEpAjD?en

root@box:~# grep -in 'ext4\|mmcblk' /var/log/syslog.1 | wc -l
149
root@box:~# grep -in 'ext4\|mmcblk' /var/log/syslog.1 | pastebinit -b sprunge.us
http://sprunge.us/9Au8Yx

http://sprunge.us/9Au8Yx?en

@jvonau
Copy link
Contributor

jvonau commented Jul 29, 2022

How about pastebin the entire journal?

@jvonau
Copy link
Contributor

jvonau commented Jul 29, 2022

Jun 16 19:37:24 raspberrypi kernel: [ 1.586697] mmcblk0: mmc0:aaaa SC128 119 GiB
Jun 16 21:15:17 box kernel: [ 1.675755] mmcblk0: mmc0:aaaa SN512 477 GiB

Looks like 2 different SDcards were involved.

@jvonau
Copy link
Contributor

jvonau commented Jul 29, 2022

Think there is a race condition between systemd-fsck and iiab-expand-rootfs

Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Jun 16 20:52:13 box kernel: VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
Jun 16 20:52:13 box kernel: EXT4-fs (mmcblk0p2): re-mounted. Opts: (null). Quota mode: none.
Jun 16 20:52:16 box systemd-fsck[287]: /dev/mmcblk0p1: 293 files, 61656/516190 clusters
Jun 16 20:52:17 box systemd[1]: Started Periodic ext4 Online Metadata Check for All Filesystems.
Jun 16 20:52:17 box systemd[1]: Starting Remove Stale Online ext4 Metadata Check Snapshots...
Jun 16 20:52:17 box iiab-expand-rootfs[455]: + ROOT_PART=/dev/mmcblk0p2
Jun 16 20:52:17 box iiab-expand-rootfs[496]: ++ lsblk -no pkname /dev/mmcblk0p2
Jun 16 20:52:18 box iiab-expand-rootfs[455]: + ROOT_DEV=/dev/mmcblk0
Jun 16 20:52:18 box iiab-expand-rootfs[521]: ++ echo /dev/mmcblk0p2
Jun 16 20:52:18 box iiab-expand-rootfs[535]: ++ parted /dev/mmcblk0 -ms unit s p
Jun 16 20:52:18 box systemd[1]: Finished Remove Stale Online ext4 Metadata Check Snapshots.
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + growpart /dev/mmcblk0 2
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + resize2fs /dev/mmcblk0p2
Jun 16 20:52:19 box kernel: EXT4-fs (mmcblk0p2): resizing filesystem from 15728640 to 124901371 blocks
Jun 16 20:52:19 box kernel: kernel BUG at fs/ext4/resize.c:528!
Jun 16 20:52:20 box kernel: pc : ext4_flex_group_add+0x9a0/0x1828
Jun 16 20:52:20 box kernel: lr : ext4_flex_group_add+0x98/0x1828
Jun 16 20:52:20 box kernel:  ext4_flex_group_add+0x9a0/0x1828
Jun 16 20:52:20 box kernel:  ext4_resize_fs+0xa98/0x1108
Jun 16 20:52:20 box kernel:  __ext4_ioctl+0x11dc/0x17b0
Jun 16 20:52:20 box kernel:  ext4_ioctl+0x40/0x68
Jun 16 21:15:04 box systemd[1]: Stopped Periodic ext4 Online Metadata Check for All Filesystems.

@jvonau
Copy link
Contributor

jvonau commented Jul 29, 2022

Might want to look at using 'After=' in the systemd unit file, for better staging of events.

@tim-moody
Copy link
Contributor

what is

Jun 16 20:52:19 box kernel: kernel BUG at fs/ext4/resize.c:528!

reminds of the segfault

@jvonau
Copy link
Contributor

jvonau commented Aug 2, 2022

PS Now that several people are in fact installing IIAB on 1TB microSD cards, these kinds of people need to be aware that /usr/bin/iiab-expand-rootfs can take 8 minutes (or more?) to do its job, on 1st boot.

(Non-intuitive to people who don't realize what's happening, and could easily get frustrated by a very tedious 10+ minute boot sequence!)

How about a pop up a warning that the filesystem is being altered and not to reboot until complete? Tie the resize warning into the netwarn routine staged to display before the iiab-network question. Kind of makes the ssh version of netwarn a priority now. Still need to fix the staging to prevent the running of the resize routine before the fsck service is completed.

I've given more time to this project than I wished to in the past with the warnings unheeded, now the chickens are hatched and come home to roost. Good luck with whatever solution you choose for the home-brew'd images that are produced with the presently employed build routine.

@holta
Copy link
Member Author

holta commented Aug 2, 2022

need to fix the staging to prevent the running of the resize routine before the fsck service is completed

  1. Agreed.

  2. Today I got permission from the owner of the machine to keep it online for a few more days, just in case there's further need to probe the logs of this live test case machine (512GB microSD card in a Raspberry Pi 400) — hoping we can learn from that.

  3. A UX to educate operators of every different kind (in every different country) will never be easy, naturally, pop-ups and/or command-line notification reminders! Information Overload (TMI, banner blindness) are very real issues, which is what makes UX design so hard, during boot-time especially. Still, educating implementers/operators is necessary: so if there are very clean, low-maintenance and non-complex ways of doing this (for headless IIAB's as well, to keep Linux know-it-alls from shooting themselves in the foot too often!) then yes these should definitely be investigated. ✔️

@jvonau
Copy link
Contributor

jvonau commented Aug 8, 2022

The opening comment clearly demonstrates that the resize operation can be launched before systemd-fsck-root.service, that in itself was the red flag warning that was glossed over becoming the primary cause of this issue. Bit of fiddling around, discovery pointed to a better approach back then, but in the face of resistance put forth why bother life is too short for the adversarial environment present. I figured it was just a matter of time before somebody ran across this known weakness to support my point of view, kind of sorry others have had their data put at risk using the service. Just remember I can't be the one that the finger points at, no commit access here.

@jvonau
Copy link
Contributor

jvonau commented Aug 11, 2022

Jun 16 20:52:19 box iiab-expand-rootfs[631]: NOCHANGE: partition 2 is size 999210975. it cannot be grown
Jun 16 20:52:19 box iiab-expand-rootfs[455]: + resize2fs /dev/mmcblk0p2
Jun 16 20:52:19 box iiab-expand-rootfs[669]: resize2fs 1.46.2 (28-Feb-2021)
Jun 16 20:52:20 box iiab-expand-rootfs[455]: /usr/sbin/iiab-expand-rootfs: line 59: 669 Segmentation fault resize2fs $ROOT_PART
Jun 16 20:52:20 box iiab-expand-rootfs[455]: + rm -f /.expand-rootfs /.resize-rootfs

To prevent the erroneous removal of /.expand-rootfs when there was a problem running the script perhaps bash should be called with -e to exit on first error for an "anti-snowballing" effect. #3339 (comment)

@jvonau
Copy link
Contributor

jvonau commented Sep 13, 2022

Used http://download.iiab.io/8.0/Old/iiab-8.0preview2-220616-MEDICAL-raspios32-lite-g2e2824c3.img.zip dated 2022-Jun-16 21:52
written to SDCard booted directly on a RPi0-W
AP came up out of the box, ssh'd in over WiFi, ran some diagnostics, sudo journalctl -b 0 -u iiab-expand-rootfs > ~/W-boot0.txt
retrieved the file from an internet connected machine and to the WiFi LAN of the RPi0-W interesting results returned:

cat W-boot0.txt 
-- Journal begins at Mon 2022-04-04 12:05:58 UTC, ends at Tue 2022-09-13 12:47:16 UTC. --
Jun 16 20:54:07 box systemd[1]: Starting Root Filesystem Auto-Expander...
Jun 16 20:54:08 box iiab-expand-rootfs[286]: + '[' -f /.expand-rootfs ']'
Jun 16 20:54:08 box iiab-expand-rootfs[286]: + echo '/usr/sbin/iiab-expand-rootfs: Expanding rootfs partition'
Jun 16 20:54:08 box iiab-expand-rootfs[286]: /usr/sbin/iiab-expand-rootfs: Expanding rootfs partition
Jun 16 20:54:08 box iiab-expand-rootfs[305]: ++ findmnt / -o SOURCE -n
Jun 16 20:54:08 box iiab-expand-rootfs[286]: + ROOT_PART=/dev/mmcblk0p2
Jun 16 20:54:08 box iiab-expand-rootfs[331]: ++ lsblk -no pkname /dev/mmcblk0p2
Jun 16 20:54:09 box iiab-expand-rootfs[286]: + ROOT_DEV=/dev/mmcblk0
Jun 16 20:54:09 box iiab-expand-rootfs[348]: ++ echo /dev/mmcblk0p2
Jun 16 20:54:09 box iiab-expand-rootfs[349]: ++ grep -o '[[:digit:]]*$'
Jun 16 20:54:09 box iiab-expand-rootfs[286]: + ROOT_PART_NUM=2
Jun 16 20:54:09 box iiab-expand-rootfs[362]: ++ parted /dev/mmcblk0 -ms unit s p
Jun 16 20:54:09 box iiab-expand-rootfs[365]: ++ tail -n 1
Jun 16 20:54:09 box iiab-expand-rootfs[366]: ++ cut -f 1 -d:
Jun 16 20:54:23 box iiab-expand-rootfs[286]: + LAST_PART_NUM=2
Jun 16 20:54:23 box iiab-expand-rootfs[286]: + '[' 2 -ne 2 ']'
Jun 16 20:54:23 box iiab-expand-rootfs[286]: + growpart /dev/mmcblk0 2
Jun 16 20:54:24 box iiab-expand-rootfs[472]: /usr/bin/growpart: 522: arithmetic expression: expecting primary: "5*1000*1000+15*1000+32+"
Jun 16 20:54:24 box iiab-expand-rootfs[286]: + resize2fs /dev/mmcblk0p2
Jun 16 20:54:24 box iiab-expand-rootfs[484]: resize2fs 1.46.2 (28-Feb-2021)
Jun 16 20:54:25 box iiab-expand-rootfs[484]: The filesystem is already 1275303 (4k) blocks long.  Nothing to do!
Jun 16 20:54:25 box iiab-expand-rootfs[286]: + rm -f /.expand-rootfs /.resize-rootfs
Jun 16 20:54:25 box systemd[1]: iiab-expand-rootfs.service: Succeeded.
Jun 16 20:54:25 box systemd[1]: Finished Root Filesystem Auto-Expander.

The growpart line looks a bit strange, should just be the partition number and was not resized.

@jvonau
Copy link
Contributor

jvonau commented Sep 13, 2022

https://bugs.launchpad.net/cloud-utils/+bug/1881014 my guess is the patch mentioned has not made its way upstream to Debian yet.
Edit: Yup see change log below

@holta
Copy link
Member Author

holta commented Sep 13, 2022

https://bugs.launchpad.net/cloud-utils/+bug/1881014 my guess is the patch mentioned has not made its way upstream to Debian yet.

That's really great progress in helping everyone understand these frustrating failures!

Can a more recent OS (possibly Ubuntu 22.10 daily build) be used to help confirm?

Or 22.04.1 on RPi if that's faster to test?

@holta
Copy link
Member Author

holta commented Sep 13, 2022

I don't know if Ubuntu 22.10 daily build kinetic-live-server-arm64.iso is actually meant to run on RPi?

(Or is this a more generalized ARM build...that's possibly too painful to install on RPi prior to its 2022-10-20 release date?)

@jvonau
Copy link
Contributor

jvonau commented Sep 13, 2022

Ubuntu is distributing the patched version, it's fixed over there. The issue is what uname -r returns for the RasPiOS kernels, some kernels the routine works because there are no non-arithmetic trailing characters present within the output. Would be better to know what is returned for the various kernels once booted.

@holta
Copy link
Member Author

holta commented Sep 13, 2022

Ubuntu is distributing the patched version, it's fixed over there.

Indeed 👍

Testing a recent Ubuntu-on-RPi could be extremely useful confirmation that both (1) intermittent failures and (2) deterministic failures are...both...increasingly tamed (with a few prayers included!)

@jvonau
Copy link
Contributor

jvonau commented Sep 13, 2022

Known slowness on W. Pic of boot failure due to /boot not being clean, with that partition being a fat filesystem this could be caused by not 'ejecting' the media before removal on any given OS. TODO - check the version of cloud-guest-utils being used on RasPiOS, the patch mentioned above might be already included but is not enough to cover the /boot/kernel.img display of uname but otherwise works with the rest of the RasPiOS's kernels (kernel7.img kernel7l.img kernel8.img)

@jvonau
Copy link
Contributor

jvonau commented Sep 14, 2022

Info from booted W

iiab-admin@box:~ $ apt list cloud-guest-utils
Listing... Done
cloud-guest-utils/stable,now 0.31-2 all [installed]

iiab-admin@box:~ $ uname -r
5.15.32+

iiab-admin@box:~ $ ls /lib/modules
5.15.32+  5.15.32-v7+  5.15.32-v7l+  5.15.32-v8+

Looks like that patch went upstream as canonical/cloud-utils#2 but doesn't really cover X.X.X+ as the error above notes

"510001000+15*1000+32+"

Edit: Debian is lagging in updating and is behind upstream's version

apt changelog cloud-guest-utils
cloud-utils (0.31-2) unstable; urgency=medium

  * Add support for IMDSv2 in Amazon EC2. (Closes: #952563)
  * Bump debhelper-compat to version 12
  * Bump standards compliance to 4.5.0 (no changes needed)
  * Add noahm@debian.org to Uploaders

 -- Noah Meyerhans <noahm@debian.org>  Thu, 28 May 2020 15:11:57 -0700

@holta
Copy link
Member Author

holta commented Sep 14, 2022

Debian is lagging in updating and is behind upstream's version

I'll assume this same lag also applies to the very latest 64-bit Raspberry Pi OS from 2022-09-06.

i.e. it too has not yet included the https://bugs.launchpad.net/cloud-utils/+bug/1881014 patch.

(But if for some obscure this is wrong, please someone clarify if poss!)

@jvonau
Copy link
Contributor

jvonau commented Sep 14, 2022

Debian is lagging in updating and is behind upstream's version

I'll assume this same lag also applies to the very latest 64-bit Raspberry Pi OS from 2022-09-06.

sudo apt list cloud-guest-utils is your friend and can answer that question for you but from the looks of http://raspbian.raspberrypi.org/raspbian/pool/main/c/cloud-utils/ I'd say the newer version needed should be present now.

@holta
Copy link
Member Author

holta commented Sep 15, 2022

Even the very latest (from 2022-09-06) 64-bit "Raspberry Pi OS with desktop" unfortunately shows:

root@box:~# apt list cloud-guest-utils
Listing... Done
cloud-guest-utils/stable,stable,now 0.31-2 all [installed]

@holta
Copy link
Member Author

holta commented Sep 15, 2022

FWIW Debian 11.5 on x86_64 (also unfortunately) has the older version:

root@deb11-srv:~# apt list cloud-guest-utils
Listing... Done
cloud-guest-utils/stable 0.31-2 all

More positively Debian 12 (Bookworm) that many Debian community people are already using (prior to its release around mid-2023) is all set — as confirmed by:

https://packages.debian.org/bookworm/cloud-guest-utils

@holta
Copy link
Member Author

holta commented Sep 26, 2022

Now that PR #3337 is merged (thanks to testing work by @jvonau and @tim-moody) this ticket should be closed too in coming weeks — once reliable rootfs expansion is validated by others too.

(Preferably on both RasPiOS and Ubuntu!)

@holta
Copy link
Member Author

holta commented Dec 22, 2022

Appears solved:

Am moving this ticket to the IIAB 8.1 Milestone so these issues continue to be monitored / confirmed.

@holta holta modified the milestones: 8.0, 8.1 Dec 22, 2022
@holta holta modified the milestones: 8.1, 8.2 Jan 1, 2023
@holta holta modified the milestones: 8.2, 8.3 Jan 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants