Iron Oxidizer edited this page Apr 2, 2022 · 20 revisions

Tips for setting up your Linux system running Moonfire NVR.

Configuring filesystems

A typical Moonfire NVR setup stores its SQLite3 database on the root filesystem on some kind of flash device and sample files on dedicated hard drives.

Root filesystem on flash

Moonfire NVR has no special requirements for the root filesystem. Your goal should be to avoid filesystem corruption without causing too many flash write cycles. If possible, choose forgiving hardware. Quoting the hardware recommendations:

microSD cards have a reputation for wearing out. Prefer M.2 NVMe SSDs, M.2/2.5" SATA SSDs, eMMC, or "high-endurance" microSD cards, in that order.

If you're stuck with a generic microSD card, you'll need to be especially vigilant about write cycles.

  • disable swap. swapon --show shouldn't show a swap partition or swap file. On Debian-based systems, you can disable a swapfile with sudo systemctl disable dphys-swapfile && sudo systemctl stop dphys-swapfile. If you're short on RAM, consider zram-config instead.

  • ensure filesystem journalling is enabled. It usually is, but some cheap SBCs disable it to reduce write cycles. Unfortunately, this comes at the expense of a near certainty of eventually corrupting the filesystem. There are wiser ways to reduce write cycles!

    If using ext4, find the device name of your root filesystem via df, then run sudo tune2fs -l /path/to/device and check that the Filesystem features: line includes has_journal. If not, enable it with sudo tune2fs -j /path/to/device and reboot.

  • decide where to keep the systemd log (confusingly also called a "journal"). Keeping it on flash is best for debugging. Keeping it in RAM is best for reducing write cycles. Less common setups include on a hard disk (shared with a disk you use for Moonfire NVR's sample data), over the network, or a hybrid setup where logs are copied from RAM to permanent storage hourly or daily.

    • in RAM: add Storage=volatile to /etc/systemd/journald.conf and reboot. See the journald.conf manpage for more information.
    • on flash: this is the default on most systems. The Storage= setting in journald.conf defaults to auto, which logs to disk if /var/log/journal exists, which it usually does.
    • on disk:
      • symlink /var/log/journal to a subdirectory of your hard disk mount point.
      • use systemctl edit systemd-journald.service to make journald depend on your mount point by adding text like the following:
        [Unit]
        RequiresMountsFor=/path/to/mount
        
      • set a SystemMaxUse= in your /etc/systemd/journald.conf and account for its space when configuring Moonfire NVR's retention.
      • remove nofail from the /etc/fstab line for your disk if present.
    • in RAM, copied to flash or disk: see log2ram.
    • over the network: see this digitalocean.com guide.
  • look at the filesystem options for / via fgrep ' / ' /proc/mounts. There are several options to reduce write cycles:

    • relatime or noatime: see man 8 mount.
    • lazytime: see man 8 mount.
    • commit=300: wait up to 5 minutes for some writes. The default is 5 seconds on ext4. See the manual page for your filesystem: man 5 ext4 or man 5 btrfs.
    • data=writeback: see man 5 ext4. This can't be set in /etc/fstab.
    • journal_async_commit: see man 5 ext4. This can't be set in /etc/fstab.

    The options which can't be set in /etc/fstab have to be passed to the kernel at boot time via its rootflags= commandline option. Trying to set them in /etc/fstab will make your system boot with a read-only filesystem. You'll have to undo this with sudo mount -o remount,rw /path/to/device / and remove them. Some other options (at least noatime) can't be set via rootflags= or your system won't boot at all until you remove them.

  • consider increasing the system-wide (not per-filesystem) time and memory thresholds for forcing dirty pages to be flushed to disk. You can do this by adding lines such as the following to /etc/sysctl.conf:

    vm.dirty_writeback_centisecs=30000
    vm.dirty_background_ratio=50
    vm.dirty_ratio=80
    

    These take effect on boot or immediately after running sudo sysctl -p. This lonesysadmin.net article describes these in detail.

  • consider using an I/O tracing tool to find recurring writes. E.g. sudo btrace -a write /dev/sdb will list all write operations on /dev/sdb as they happen. (On Debian-based systems, install it first with sudo apt install blktrace.) These tools aren't user-friendly but they can tell you exactly what's going on.

  • consider using F2FS as it is designed from the ground up to increase flash endurance and performance.
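
Several of the checks in the list above can be gathered into a small read-only audit. This is a minimal sketch, not something shipped with Moonfire NVR: the function names (has_option, root_mount_options, audit_root_fs) are made up for illustration, and it only reports the current state without changing anything.

```shell
#!/bin/sh
# Read-only audit of some root filesystem settings discussed above.
# All function names are illustrative assumptions.

# has_option OPTIONS NAME: succeed if the comma-separated OPTIONS string
# contains NAME, either bare (noatime) or as NAME=value (commit=300).
has_option() {
    case ",$1," in
        *",$2,"*|*",$2="*) return 0 ;;
        *) return 1 ;;
    esac
}

# root_mount_options: print the mount options of / from /proc/mounts.
# If / is listed more than once, the last entry wins.
root_mount_options() {
    awk '$2 == "/" { opts = $4 } END { print opts }' /proc/mounts
}

# audit_root_fs: report mount options, swap state, and writeback settings.
audit_root_fs() {
    opts=$(root_mount_options)
    echo "root mount options: $opts"
    for o in noatime relatime lazytime commit; do
        if has_option "$opts" "$o"; then
            echo "  $o: set"
        else
            echo "  $o: not set"
        fi
    done

    # swapon --show prints nothing when no swap is active.
    if [ -n "$(swapon --show 2>/dev/null)" ]; then
        echo "swap: active"
    else
        echo "swap: none"
    fi

    # Current system-wide dirty page thresholds.
    sysctl vm.dirty_writeback_centisecs vm.dirty_background_ratio vm.dirty_ratio
}
```

After sourcing the file, run audit_root_fs to print a summary. The ext4 journal check is left out because tune2fs -l needs the device path and usually root.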

Sample directory filesystems on hard disks

Moonfire NVR writes to these filesystems continuously, keeping them nearly full with large files. In general, write cycles are not a concern, although the same options described above can also be beneficial if the disks get too busy. (You can tell this via the %util column in iostat -x.)

  • Filesystem choice: avoid copy-on-write filesystems like btrfs in favor of old-fashioned ext4. Copy-on-write filesystems can have strange behaviors when nearly full. E.g. deletes can fail due to insufficient disk space.
  • Consider using mkfs.ext4 -T largefile to create the filesystem. The -T largefile will give you about 1.5% more disk space. The cost is that you won't be able to fill the filesystem with tiny files, but this isn't a problem with Moonfire NVR.
  • Consider reducing the superuser reserve from its default 5%. This reserve was created to allow the superuser to log in and recover from a full disk. This isn't relevant for a filesystem dedicated to an application. I don't recommend a 0% reserve because you may start to have excess fragmentation, but even 1% is likely fine on ext4. You can lower the reserve to 1% when creating the filesystem via mkfs.ext4 -m 1 or after the fact via tune2fs -m 1 /path/to/device.
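
To get a feel for what the superuser reserve costs, the arithmetic can be sketched in shell. The function name reserve_gib and the 4000 GiB disk size are illustrative assumptions, not values from a real setup.

```shell
# reserve_gib SIZE_GIB PERCENT: print the space (in whole GiB) held back
# by a superuser reserve of PERCENT on a filesystem of SIZE_GIB.
reserve_gib() {
    echo $(( $1 * $2 / 100 ))
}

# Example: a hypothetical 4000 GiB sample file disk.
reserve_gib 4000 5   # default reserve: prints 200
reserve_gib 4000 1   # reduced reserve: prints 40
```

On a disk of that size, lowering the reserve from 5% to 1% frees roughly 160 GiB for recordings.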

You might also add nofail or noauto to /etc/fstab so that if the HDD is disconnected, broken, or corrupted, the system will still boot. (Caveat: due to a bug, systemd versions prior to 250 will still go into emergency mode if the HDD is corrupted.) This allows you to ssh into the system to examine it, rather than needing a console. Make sure that the systemd service which starts Moonfire NVR has a RequiresMountsFor=/path/to/mount line for the HDD. (If you are using Docker, use sudo systemctl edit docker.service to add this via a drop-in snippet.)
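
As a rough sketch of the drop-in mentioned above, the helper below prints the two lines systemd needs. The function name print_mount_dropin, the unit name moonfire-nvr.service, and the file name mounts.conf are assumptions for illustration; substitute whatever your system actually uses.

```shell
# print_mount_dropin MOUNTPOINT: print a systemd drop-in which makes a unit
# require, and start after, the mount at MOUNTPOINT.
print_mount_dropin() {
    printf '[Unit]\nRequiresMountsFor=%s\n' "$1"
}

# Possible usage (unit and file names are assumptions):
#   sudo mkdir -p /etc/systemd/system/moonfire-nvr.service.d
#   print_mount_dropin /path/to/mount |
#       sudo tee /etc/systemd/system/moonfire-nvr.service.d/mounts.conf
#   sudo systemctl daemon-reload
```

This writes the same kind of snippet that systemctl edit creates interactively, which can be handy when provisioning several machines.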

Disable UAS

Many Moonfire NVR setups run on cheap SBCs which have no built-in SATA; thus they use USB bridges to attach hard drives. Unfortunately cheap USB bridges often have a serious problem when running in "UAS" mode on Linux. This can cause filesystem corruption, as mentioned in the hardware recommendations and troubleshooting guide. (I'm unsure if the problem is with the devices' firmware, Linux's UAS implementation, or some combination.)

I recommend proactively disabling UAS when setting up a USB bridge, rather than waiting to see if you get filesystem corruption. UAS is often recommended because it improves I/O performance with SATA SSDs, but Moonfire NVR's performance will be perfectly fine without it.

First, check if you're using UAS. In the output below, Driver=uas tells us we have a dangerous configuration.

$ lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M

Then, use the lsusb command without -t to get the vendor and product ID (174c:55aa in the output below):

$ lsusb
Bus 002 Device 002: ID 174c:55aa ASMedia Technology Inc. Name: ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge, ASM1153E SATA 6Gb/s bridge
...

Next, create a file which sets the quirks=VID:PID:u option (where VID:PID match the line above) on the usb-storage module, telling it to disable UAS:

$ sudo sh -c 'echo "options usb-storage quirks=174c:55aa:u" >> /etc/modprobe.d/blacklist_uas.conf'
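
If you'd rather not copy the ID by hand, the two steps above can be sketched as small helpers. The function names usb_ids and quirks_line are hypothetical. The sketch only prints text, so review the output before writing it to /etc/modprobe.d.

```shell
# usb_ids: read `lsusb` output on stdin and print one VID:PID per line.
usb_ids() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "ID") { print $(i + 1); break } }'
}

# quirks_line VID:PID: print the modprobe option which disables UAS for
# that device.
quirks_line() {
    echo "options usb-storage quirks=$1:u"
}

# Possible usage: list all IDs, then pick the one for your bridge.
#   lsusb | usb_ids
#   quirks_line 174c:55aa
```

The usb-storage quirks parameter takes a comma-separated list, so with more than one bridge the entries go into a single quirks= value, e.g. quirks=VID1:PID1:u,VID2:PID2:u.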

If your system uses an initial ramfs, you'll need to regenerate it to include this new configuration. On Ubuntu systems:

$ sudo update-initramfs -u

(Raspberry Pi OS doesn't appear to use an initial ramfs. There, the command above will do nothing. That's fine.)

Then, reboot:

$ sudo shutdown -r now

Most importantly, confirm your system no longer uses UAS. lsusb -t should now say Driver=usb-storage rather than Driver=uas, and you should see messages like the following in journalctl:

$ journalctl -b0
Jul 12 07:51:36 nuc kernel: usb 2-1: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Jul 12 07:51:36 nuc kernel: usb 2-1: New USB device found, idVendor=174c, idProduct=55aa, bcdDevice= 1.00
Jul 12 07:51:36 nuc kernel: usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
Jul 12 07:51:36 nuc kernel: usb 2-1: Product: ASM1156-PM
Jul 12 07:51:36 nuc kernel: usb 2-1: Manufacturer: ASMT
Jul 12 07:51:36 nuc kernel: usb 2-1: SerialNumber: 00000000000000000000
...
Jul 12 07:51:36 nuc kernel: usb 2-1: UAS is ignored for this device, using usb-storage instead
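
The confirmation step can also be scripted, for example to re-check after a kernel upgrade. A minimal sketch, assuming the lsusb -t output format shown earlier; the function name uas_in_use is made up for illustration.

```shell
# uas_in_use: read `lsusb -t` output on stdin; succeed if any interface is
# still bound to the uas driver.
uas_in_use() {
    grep -q 'Driver=uas'
}

# Possible usage:
#   if lsusb -t | uas_in_use; then
#       echo "WARNING: a device is still using UAS"
#   fi
#   journalctl -k -b0 | grep 'UAS is ignored'
```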

You can find more information on this stackoverflow.com question.

Realtime clock on Raspberry Pi

Most Linux systems have a built-in battery-backed "realtime clock" (RTC). Since Linux 2.6, the kernel sets the system time on startup from this clock. This means that the time seen by applications will always be approximately correct, even after a long power outage. Later in the boot process a time syncer application (e.g. systemd-timesyncd, chrony, or ntpd) fetches the exact time over the network. Often, the correction will be mild and done by "slewing" (adjusting the clock rate) rather than "stepping" (changing the current time).

The Raspberry Pi doesn't come with an RTC. Instead, /etc/init.d/fake-hwclock saves the current time to a file (hourly via /etc/cron.hourly/fake-hwclock and on shutdown via /lib/systemd/system/fake-hwclock.service) and loads it back on startup (via /lib/systemd/system/fake-hwclock.service again). This is better than thinking the time is January 1st 1970 on startup but can produce confusing results with Moonfire NVR. After a two-hour power outage, the time may be up to three hours behind on startup, because the last hourly save may have happened almost an hour before the power failed. When the time syncer fetches the exact time, the system time will be stepped, but likely too late. Moonfire NVR will have already started recording streams and noted their starting timestamp from an incorrect clock. Streams will keep having an incorrect time until you manually restart Moonfire NVR.

To fix this, you can buy an add-on RTC for your Pi such as this DS3231 module. It connects to your Pi's I2C bus via the GPIO header. Setting it up can be a little tricky. There are various blog entries describing how to do it. As of 2021-07-13, I find the following approach works on the Raspberry Pi OS 64-bit beta (based on Debian 10 Buster):

Create /etc/addon-hwclock

#!/bin/bash
# Use add-on hardware RTC module. See /etc/systemd/system/addon-hwclock.service.

# Be verbose so failures can be examined with `journalctl --unit addon-hwclock`.
set -o errexit
set -o xtrace

if [[ "$1" = "start" ]]; then
    # Load the kernel modules needed by the RTC.
    # udevd might do some/all of this later, but we want it now (before root fsck).
    for i in i2c_bcm2835 rtc_ds1307; do /sbin/modprobe $i; done

    # Potentially useful debugging commands. Uncomment to taste.
    # /usr/bin/dtc --in-format fs /proc/device-tree
    # /usr/sbin/i2cdetect -y 1

    echo ds3231 0x68 > /sys/class/i2c-adapter/i2c-1/new_device

    # Debugging, again.
    # /usr/bin/dtc --in-format fs /proc/device-tree
    # /usr/sbin/i2cdetect -y 1

    /sbin/hwclock --hctosys --utc --verbose

elif [[ "$1" = "stop" ]]; then
    /sbin/hwclock --systohc --utc --verbose
fi

Create /etc/systemd/system/addon-hwclock.service

[Unit]
Description=Add-on hardware RTC module
DefaultDependencies=no

# Run after fake-hwclock so it doesn't override the time set from the real hwclock.
After=fake-hwclock.service

# Filesystems record their last mount time; fsck complains if it's in the future.
# Run before the first fsck to avoid this. Earlier in the boot process is better anyway.
Before=sysinit.target systemd-fsck-root.service time-set.target
Conflicts=shutdown.target

# Note because of the WantedBy=sysinit.target, the system will boot into
# emergency mode on failure. Be a little defensive with the ConditionFileIsExecutable
# to avoid that annoying failure mode.
ConditionFileIsExecutable=/etc/addon-hwclock

[Service]
Type=oneshot
RemainAfterExit=yes

ExecStart=/etc/addon-hwclock start
ExecStop=/etc/addon-hwclock stop

[Install]
WantedBy=sysinit.target

Activate them

sudo chmod a+rx /etc/addon-hwclock
sudo systemctl enable addon-hwclock.service

That covers the stuff that should happen on each boot. Before rebooting, you'll also want to run a variation of those commands to set the hardware RTC properly the first time:

sudo sh -c 'for i in i2c_bcm2835 rtc_ds1307; do /sbin/modprobe $i; done'
sudo sh -c 'echo ds3231 0x68 > /sys/class/i2c-adapter/i2c-1/new_device'
sudo /sbin/hwclock --systohc --utc --verbose

Then reboot and check sudo journalctl -b0 to ensure it works. You should see a line that says Time read from Hardware Clock before the line that says Starting File System Check on Root Device.
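
The ordering check can be sketched as a small filter over the boot log. This assumes the two message texts quoted above appear verbatim in your journal; the function name rtc_before_fsck is hypothetical.

```shell
# rtc_before_fsck: read boot log text on stdin; succeed only if the hwclock
# line appears, and appears before the root fsck line.
rtc_before_fsck() {
    awk '
        /Time read from Hardware Clock/ && !rtc { rtc = NR }
        /Starting File System Check on Root Device/ && !fsck { fsck = NR }
        END { exit !(rtc && fsck && rtc < fsck) }
    '
}

# Possible usage:
#   sudo journalctl -b0 | rtc_before_fsck && echo "RTC was read before fsck"
```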

Why not other ways?

  • This thepihut.com guide suggests setting the RTC from /etc/rc.local. But that happens later in the boot process, as defined by /lib/systemd/system/rc-local.service. We want the time to be set correctly before the root filesystem's fsck, and certainly before Moonfire NVR starts.
  • This learn.adafruit.com guide suggests:
    • setting dtoverlay=i2c-rtc,ds3231 in /boot/config.txt (see raspberrypi.org devicetree documentation and kernel.org devicetree documentation) rather than running an echo ds3231 0x68 > /sys/class/i2c-adapter/i2c-1/new_device command. This didn't make /dev/rtc0 appear after reboot. I later found it works if I also add device_tree_param=i2c1=on as in this raspmer.blogspot.com post. Later still, I realized that if I'd followed the raspi-config instructions earlier in the learn.adafruit.com guide, an equivalent dtparam=i2c_arm=on likely would have been added for me. Note that sudo i2cdetect -y 1 still won't work unless you load the i2c-dev module (e.g. via sudo modprobe i2c-dev or by adding it to /etc/modules).
    • depending on /lib/udev/hwclock-set. It's started by /lib/systemd/system/systemd-udevd.service (which eventually launches it via /lib/udev/rules.d/85-hwclock.rules). systemd-udevd.service doesn't start until after the initial fsck, and as far as I can tell nothing waits for its rules to complete before completing system startup.
    • modifying /lib/udev/hwclock-set. This file is part of a system package, so upgrades may clobber your local changes.
    • disabling fake-hwclock.service: the fake hwclock is better than nothing, so I leave it in case something goes wrong with the hardware module setup. But the downside of leaving fake-hwclock.service in place is that journalctl log timestamps are wrong for the first several lines of the boot, so you might instead run systemctl disable fake-hwclock.service and remove the After=fake-hwclock.service from /etc/systemd/system/addon-hwclock.service.
  • This askubuntu.com answer sets up a systemd service:
    • It omits the Before=systemd-fsck-root.service so it doesn't affect the time during the initial fsck.
    • It races with fake-hwclock.service so its time may be overridden.
    • It depends on systemd-modules-load.service (and thus probably expects a /etc/modules-load.d/ file listing the relevant modules).
    • It creates a file in /lib/systemd when man systemd.unit says locally-installed files should go in /etc/systemd instead.
    • I'm unsure how /dev/rtc0 is created there; I don't see anything which creates a device tree mapping.