trim/discard after reboot on cf-card leads to failure #2357

Closed
c-schwamborn opened this issue Apr 13, 2018 · 16 comments

@c-schwamborn

I just tested a fresh install (v18.1) via the NanoBSD i386 image on a PC Engines ALIX system with a CF card. The first boot looks fine, but after the first reboot the system no longer boots. Investigating the issue, the last console output revealed that a TRIM command was issued to the file system, leading to a disconnect of the CF card:
clean, 7046762 free (1226 frags, 880692 blocks, 0.0% fragmentation)
(ada0:ata0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
(ada0:ata0:0:0:0): CAM status: Command timeout
(ada0:ata0:0:0:0): Retrying command
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SDCFHSNJC-008G HDX 7.08> s/n BMZ080115030144 detached
g_vfs_done():ufs/OPNsense_Nano[WRITE(offset=2932998144, length=8192)]error = 6
g_vfs_done():ufs/OPNsense_Nano[WRITE(offset=2990456832, length=8192)]error = 6
g_vfs_done():ufs/OPNsense_Nano[WRITE(offset=3047964672, length=8192)]error = 6
g_vfs_done():ufs/OPNsense_Nano[WRITE(offset=3278061568, length=3072)]error = 6
g_vfs_done():ufs/OPNsense_Nano[WRITE(offset=5865824256, length=8192)]error = 6
g_vfs_done():ufs/OPNsense_Nano[WRITE(offset=5865832448, length=8192)]error = 6
g_vfs_done():ufs/OPNsense_Nano[WRITE(offset=8192, length=2048)]error = 6
g_vfs_done():ufs/OPNsense_Nano[WRITE(offset=32768, length=8192)]error = 6
g_vfs_done():ufs/OPNsense_Nano[READ(offset=9027584, length=4096)]error = 6
g_vfs_done():ufs/OPNsense_Nano[WRITE(offset=1897799680, length=8192)]error = 6
g_vfs_done():ufs/OPNsense_Nano[WRITE(offset=2875449344, length=8192)]error = 6
(ada0:ata0:0:0:0): Periph destroyed
/usr/local/etc/rc: /usr/local/etc/rc.subr.d/recover: Device not configured
/usr/local/etc/rc: /etc/rc.d/syscons: Device not configured
/usr/local/etc/rc: /usr/local/etc/rc.importer: Device not configured
/usr/local/etc/rc: /sbin/consco: not found
vm_fault: pager read error, pid 1 (init)
vnode_pager_generic_getpages_done: I/O read error 5
The last two lines repeat indefinitely.
After successfully booting the system in safe mode, I disabled TRIM with tunefs for the root file system, and after that the system booted normally.
Maybe resizing the file system to the size of the medium enables the file system's trim/discard option.

cheers
Christian
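
For readers who hit the same failure, the manual workaround described above can be sketched roughly as follows. This is an illustration, not an official procedure: `disable_trim` is a hypothetical helper name, the device label is the one seen in the console output, and it assumes a boot into single-user mode where the root file system is still mounted read-only. `tunefs -p` and `tunefs -t disable` are standard FreeBSD tunefs options.

```shell
#!/bin/sh
# Sketch of the manual workaround; disable_trim is a hypothetical helper name.
# Assumes single-user mode, where / is still mounted read-only.
disable_trim() {
	dev="$1"
	# print the current tunable settings first, so the TRIM flag is visible
	tunefs -p "${dev}" || return 1
	# clear the TRIM flag on the file system
	tunefs -t disable "${dev}"
}
```

It would be called as `disable_trim /dev/ufs/OPNsense_Nano`, with the label adjusted to the system at hand.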

@fichtner
Member

fichtner commented Apr 13, 2018

Hi Christian,

Can you try to add "# notrim" to the root disk entry in /etc/fstab prior to the first reboot?

Cheers,
Franco

@fichtner fichtner added the support Community support label Apr 13, 2018
@c-schwamborn
Author

UFS doesn't seem to know the option 'notrim':
Trying to mount root from ufs:/dev/ufs/OPNsense_Nano [rw,notrim]...
Mounting from ufs:/dev/ufs/OPNsense_Nano failed with error 22: mount option <notrim> is unknown.

@fichtner
Member

No, put it at the end of the line with a leading "#"; it's a comment that is evaluated in our code:

https://github.com/opnsense/core/blob/master/src/etc/rc#L81-L84

The trouble is that the driver seems to say TRIM is OK when in fact it may not be :/
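
The idea of a marker comment in /etc/fstab can be sketched roughly as below. This is not the OPNsense rc code linked above; the function name `plan_trim` and the dry-run behaviour are assumptions made for illustration. It reads an fstab-style file and prints, without running it, the `tunefs -t` call that would apply to each UFS entry, choosing `disable` when the line carries `notrim` after a `#`.

```shell
#!/bin/sh
# Hypothetical sketch: print (not run) the tunefs call per UFS fstab entry.
# plan_trim is an illustrative name, not a function from OPNsense.
plan_trim() {
	fstab="${1:-/etc/fstab}"
	while read -r dev mnt fstype rest; do
		# skip blank lines and full-line comments
		case "${dev}" in ''|'#'*) continue ;; esac
		# only UFS file systems carry the TRIM flag handled here
		[ "${fstype}" = "ufs" ] || continue
		case "${rest}" in
		*'#'*notrim*) echo "tunefs -t disable ${dev}" ;;
		*)            echo "tunefs -t enable ${dev}" ;;
		esac
	done < "${fstab}"
}
```

Because the marker sits behind a `#`, mount itself never sees it, which is why it does not trigger the "mount option is unknown" error a real `notrim` option would.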

@c-schwamborn
Author

Sorry, I thought it was a regular file system option. Isn't that a bit unconventional?
Sadly, it didn't work:

Mounting filesystems...
tunefs: soft updates remains unchanged as enabled
tunefs: file system reloaded
tunefs: issue TRIM to the disk cleared
tunefs: file system reloaded
** /dev/ufs/OPNsense_Nano
FILE SYSTEM CLEAN; SKIPPING CHECKS
clean, 3158490 free (1234 frags, 394657 blocks, 0.0% fragmentation)
** /dev/ufs/OPNsense_Nano
FILE SYSTEM CLEAN; SKIPPING CHECKS
clean, 3158490 free (1234 frags, 394657 blocks, 0.0% fragmentation)
uhub0: 4 ports with 4 removable, self powered
(ada0:ata0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
(ada0:ata0:0:0:0): CAM status: Command timeout
(ada0:ata0:0:0:0): Retrying command
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SDCFH-004G HDX 5.11> s/n CLZ101210182346 detached

Hmm, during the first boot I did in fact find "tunefs: issue TRIM to the disk set".

@fichtner
Member

fichtner commented Apr 13, 2018 via email

@c-schwamborn
Author

Stupid question: Why is trim/discard enabled on those images anyway? I thought they were only for legacy devices like these ALIX boards. To my knowledge it's not possible to use discard or TRIM on CF or SD cards, or do you intend these images for some sort of VM usage on SSDs or thin LVM?

@c-schwamborn
Author

> Nothing about enabling TRIM is conventional. :)

Don't tell me about it, I've read the SSD feature exclusion list in the Linux kernel code.

> Let me give you an image for that later today.

You don't have to create an image just for me; do it only if it helps to fix the issue for future images.
By the way: this must be a new issue. A friend of mine installed an ALIX system earlier this year with the previous release (17.x) and that worked well (except for the fact that he could only update from the command line).

@fabianfrz
Member

I use an APU 1 board with an SD card and have no issues with TRIM. Maybe the problem is vendor-specific.

@fichtner
Member

> Stupid question: Why is trim/discard enabled on those images anyway? I thought they were only for legacy devices like these ALIX boards. To my knowledge it's not possible to use discard or TRIM on CF or SD cards, or do you intend these images for some sort of VM usage on SSDs or thin LVM?

It's a good question. I think this was done unilaterally on all images while we were on FreeBSD 10, almost exactly 2 years back (44b610a), but FreeBSD 11 changed the game somehow, causing these reboot-fail issues in very specific cases. @fabianfrz is right, I also have APUs/ALIX/WRAP and have never seen the issue, otherwise I would have been able to inspect it.

We've tried almost everything here to help since early 2017 (FreeBSD 11 with OPNsense 17.1) but nothing seemed to help. It could be that this was missed all along. At least I know we never looked for this specifically.

Here is an image based on 18.1.6. It's quite important to try it, because we are about to release images and if this helps I'm willing to add it last minute:

https://pkg.opnsense.org/FreeBSD:11:i386/snapshots/OPNsense-18.1.6-notrim-OpenSSL-nano-i386.img.bz2

Thank you,
Franco

@c-schwamborn
Author

Loading the image now.

To be specific about the hardware: I tested this on three different alix.2d13 boards with two CF cards, both SanDisk Ultra (a 4 GB and an 8 GB card).

My guess is that the issue here is not TRIM in principle, but the underlying driver deciding which devices are capable of performing trim/discard. I think SD cards are similar to USB sticks, whereas CF cards are connected via the old IDE/ATA interface, at least on those old ALIX boards.

@c-schwamborn
Author

Good news: That worked.
I think the line you were looking for is "tunefs: issue TRIM to the disk remains unchanged as disabled".

What did you change, if I may ask? I've seen the rootfs in /etc/fstab now contains # notrim at the end by default. Is that static, or somehow dynamically added?

@fichtner
Member

fichtner commented Apr 14, 2018 via email

@c-schwamborn
Author

Yes, sounds good. For the time being this might be the safest way to go.
I agree that most probably something has changed or may be broken in FreeBSD 11. Enabling TRIM for an SD card cannot be correct.
In my opinion the TRIM feature is insignificant for a firewall anyway, as the write load on the device should be minimal.

@fichtner fichtner self-assigned this Apr 14, 2018
@fichtner fichtner added bug Production bug and removed support Community support labels Apr 14, 2018
@fichtner fichtner added this to the 18.7 milestone Apr 14, 2018
fichtner added a commit to opnsense/tools that referenced this issue Apr 14, 2018
@fichtner
Member

Okay, commit via opnsense/tools@2fb26bf

Again, thanks for the help!

Cheers,
Franco

@c-schwamborn
Author

Thank you for your quick response.

Cheers,
Christian

@Tupsi

Tupsi commented Mar 22, 2020

I am seeing this again after upgrading to 20.1.3. In fact, it seems I have trouble upgrading to it completely. I installed from a USB stick a few days ago and upgraded to 20.1.2 in the GUI without incident, but now I am seeing strange effects after upgrading to 20.1.3, and the console is throwing a lot of ATA TRIM errors (for the first time here). I tried adding the # no trim option to my fstab as described, to no avail.

My changes to settings no longer persist across reboots (like enabling SSH). They're gone after a reboot.
I just realised this is a closed ticket, so I had better open a new one.
