filesystem corruption #107
Comments
Unless the error was
Wow, that is slow! But if it's still writing, I'd expect it to eventually complete (given unlimited patience...). But it might not be until your
I haven't tried a vacuum on a Pi4 with a database as large as that. I'll try one shortly.
The step that it's doing is a SQLite3 vacuum. It's not strictly necessary; you can use the
My primary setup right now is an Intel NUC, with a 5.1G database (vs your 975M). A vacuum there takes about 10 minutes. Admittedly, it's a much faster machine with a Samsung M.2 SSD rather than an SD card.
btw, in looking into this I discovered a minor bug. My code tries to set the
// WAL is the preferred journal mode for normal operation; it reduces the number of syncs
// without compromising safety.
set_journal_mode(&conn, "wal")?;
if !args.no_vacuum {
info!("...vacuuming database after upgrade.");
conn.execute_batch(r#"
pragma page_size = 16384;
vacuum;
"#)?;
}
but the SQLite3 docs for pragma page_size say this:
I'd intended for that |
What does
|
Actually, my Pi4 does have a comparably sized database, and it's much faster there. Best case: you're using the old SQLite version which is why it's slow for you. Worst case: your microSD card is failing. :-/
|
* give a rule of thumb for update time in the documentation
* log the SQLite3 version, which can affect performance
* do the vacuum in non-WAL mode, to correctly set the page size and to avoid very slow behavior on older SQLite3 versions. Larger page sizes are generally faster (including subsequent vacuum operations). This won't help much for the first vacuum after this change, but it will help afterward.
* likewise, set the page size properly on "moonfire-nvr init".
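The third item above comes down to statement ordering. What follows is a minimal sketch, not the project's actual change: the helper name `vacuum_batch` is hypothetical, and it only builds the SQL text; how that text is executed against the connection is left out. It relies on SQLite's documented behavior that a new `page_size` takes effect at the next `vacuum` only when the database is not in WAL mode.

```rust
/// Builds a SQL batch that leaves WAL mode, sets the page size, vacuums,
/// and then returns to WAL mode for normal operation.
///
/// Hypothetical helper for illustration; not part of Moonfire NVR.
fn vacuum_batch(page_size: u32) -> String {
    // SQLite page sizes must be a power of two between 512 and 65536.
    assert!(
        page_size.is_power_of_two() && (512..=65536).contains(&page_size),
        "invalid SQLite page size: {}",
        page_size
    );
    format!(
        "pragma journal_mode = delete;\n\
         pragma page_size = {};\n\
         vacuum;\n\
         pragma journal_mode = wal;\n",
        page_size
    )
}
```

Calling `vacuum_batch(16384)` yields four statements in the order delete-mode, page size, vacuum, WAL. The point of the ordering is that the page size pragma is issued while the journal mode is not WAL, so the vacuum can actually rebuild the file with the new page size.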
Haven't fully digested or considered Scott's comments above. But I left my process running and inserted a "date" command to log when the process finished. Conclusion: ~18 hours. Here's the console session started shortly after 12 noon on Wednesday, Feb 10th:
The other console that was monitoring the file sizes of the databases:
I then performed a check:
The results, covering 6 months, are available at: https://pastebin.com/A3wu0Mkz
Lastly, I tried to start up again and failed:
I'll review/consider Scott's comments during the day and may not get back until this evening; I have to answer to a higher call. |
That's "read-only filesystem", as you can see on errno(3). (I'm preparing a commit now to improve error messages. These cryptic errors are getting old.) I think if you take a look in your kernel logs ( |
Yup. dmesg -T is showing all kinds of problems. I was going to try to post the colorized version; it's pretty bad. I tried to reboot and no luck. I'll have to wait until this evening to debug and determine where things are. I guess I'm the problem child. But there may be others... someday. Edit [2/11/2021 19:46 PST]: colorized dmesg log at: http://salem1.mooo.com/moonfire/jlpoole_raspberry_pi_dmesg_Feb_11_2021.html |
Inspired by the poor error message here: #107 (comment)
* print the friendlier Display version of the error rather than Debug. Eg, "EROFS: Read-only filesystem" rather than "Sys(EROFS)". Do this everywhere: on command exit, on syncer retries, and on stream retries.
* print the most immediate problem and additional lines for each cause.
* print the backtrace or an advertisement for RUST_BACKTRACE=1 if it's unavailable.
* also mention RUST_BACKTRACE=1 in the troubleshooting guide.
* add context in various places, including pathnames. There are surely many places more it'd be helpful, but this is a start.
* allow subcommands to return failure without an Error. In particular, "moonfire-nvr check" does its own error printing because it wants to print all the errors it finds. Printing "see earlier errors" with a meaningless stack trace seems like it'd just confuse. But I also want to get rid of the misleading "Success" at the end and 0 return to the OS.
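The first two items above can be illustrated with the standard library alone. This is a minimal sketch, assuming a hypothetical wrapper type `ContextError` and a hypothetical helper `format_chain`; Moonfire NVR's real error types are not shown in this thread and may look quite different. It formats the most immediate problem using `Display`, then adds one line per underlying cause by following `std::error::Error::source`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error wrapper that adds context (e.g. a pathname) to a cause.
#[derive(Debug)]
struct ContextError {
    context: String,
    cause: Box<dyn Error + 'static>,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display shows only this layer's message; causes are reached via source().
        write!(f, "{}", self.context)
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.cause)
    }
}

/// Formats the error with `Display`, followed by one "caused by" line per source.
fn format_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = format!("error: {}", err);
    let mut cur = err.source();
    while let Some(cause) = cur {
        out.push_str(&format!("\ncaused by: {}", cause));
        cur = cause.source();
    }
    out
}
```

Compared with printing the `Debug` form, this keeps the top line readable and still exposes every layer of the failure, which is the property the list above asks for.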
No worries about being the problem child. Your reports are really helpful. I guess SD card failure is pretty common in Raspberry Pi setups; I'll update the hardware recommendations to suggest using the Samsung Endurance microSD card or putting a SATA SSD in a USB dock alongside the hard drive. Raspberry Pi 4 supports USB booting now. With my latest commit, that error message is more informative:
|
This has me thinking. Two things on my mind: 1) I recall specifically a vendor limiting their warranty on hard drives if used for video -- hence an acknowledgement that video storage really taxes the use of the heads and 2) SSDs are not good for repeated read/writes... they have a life of 100,00-300,00 if I recall. When I was investigating my SNUC for my xen server, I had a short dialog with their engineers about SSDs and concluded that SSDs are not good for hard-hitting read/writes which may exceed their design, e.g. two years worth of intensive read/writes. I do think my throwing a worst case at the set-up and software does help identify issues to consider. Stress testing. |
I took a stab at updating the hardware recommendations to account for flash endurance (and strengthen the existing language about HDD endurance). Tweaks welcome. fwiw, it's write cycles that matter, and Moonfire NVR tries to be as gentle as it can by only committing to the database once every few minutes (with the recommended |
Also, the write cycle count is per block. With write leveling, it's effectively the number of times you can rewrite the entire drive. Write amplification (extra data written because of journaling, minimum size of a write, etc.) eats into that, but it's not like x write cycles means that the device can only handle x database transactions before it fails. My understanding is that a modern, high-quality SSD will typically last for 10+ years with loads much heavier than what Moonfire NVR puts on it. This article seems consistent with that. I'm less confident in budget microSD cards, and they have no SMART monitoring to see when you're approaching the limit. I shouldn't have used a random Raspberry Pi starter kit microSD card in an example setup. Sorry! I'm also going to add a short section to the troubleshooting guide about storage device and filesystem problems. |
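The endurance reasoning above reduces to simple arithmetic: with wear leveling, total writable data is roughly capacity times rated write cycles, and write amplification multiplies what the host writes. Here is a rough sketch of that estimate; the function name and every input are assumptions chosen for illustration, not measurements of any real device or of Moonfire NVR's actual write load.

```rust
/// Rough lifetime estimate, in years, for a wear-leveled flash device.
///
/// Hypothetical helper: all inputs are caller-supplied assumptions.
/// - `capacity_gb`: device capacity
/// - `rated_write_cycles`: rated per-block write cycles
/// - `host_writes_gb_per_day`: data the host writes per day
/// - `write_amplification`: physical bytes written per host byte (>= 1.0)
fn estimated_lifetime_years(
    capacity_gb: f64,
    rated_write_cycles: f64,
    host_writes_gb_per_day: f64,
    write_amplification: f64,
) -> f64 {
    // With wear leveling, the cycle count applies to rewriting the whole drive.
    let total_writable_gb = capacity_gb * rated_write_cycles;
    // Write amplification inflates what actually lands on the flash.
    let physical_gb_per_day = host_writes_gb_per_day * write_amplification;
    total_writable_gb / physical_gb_per_day / 365.0
}
```

For example, `estimated_lifetime_years(365.0, 1000.0, 10.0, 10.0)` evaluates to 10 years for those made-up figures. The estimate ignores other failure modes, so treat it as an upper bound on wear-out rather than a prediction.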
For recovering from corruption, as in #107. These should aid in restoring database integrity without throwing away the entire database. I only added the conditions that came up in #107, so far.
* "Missing ... row" => --trash-orphan-sample-files
* "Recording ... missing file" => --delete-orphan-rows
* "bad video_index" => --trash-corrupt-rows
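The mapping above can be written as a small lookup. This is only a sketch: `suggested_flag` is a hypothetical helper, and the substring matching is an assumption based on the abbreviated messages quoted above, since the full text of the check output is not shown here. The three flag names are the ones listed above.

```rust
/// Suggests a recovery flag for one line of check output.
///
/// Hypothetical helper; the substring patterns are guesses derived from the
/// abbreviated messages "Missing ... row", "Recording ... missing file",
/// and "bad video_index".
fn suggested_flag(check_line: &str) -> Option<&'static str> {
    if check_line.contains("bad video_index") {
        Some("--trash-corrupt-rows")
    } else if check_line.contains("Recording") && check_line.contains("missing file") {
        Some("--delete-orphan-rows")
    } else if check_line.contains("Missing") && check_line.contains("row") {
        Some("--trash-orphan-sample-files")
    } else {
        // Conditions other than the three above have no flag yet.
        None
    }
}
```

A line that matches none of the three patterns returns `None`, mirroring the note that only the conditions seen in #107 are covered so far.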
I think I've done all I can about this:
When you get working hardware and a clean filesystem, you can either start from a fresh database or try running I've done the equivalent of these flags by hand when I had my own filesystem corruption; now it's a bit more approachable. |
We had a local ice storm and the power went down; of course, my Raspberry Pi is not on any power management that would have shut it down nicely. That gives opportunity for a corrupted database. I copied your command line from above and got this error message. I built this version of moonfire a couple of days ago after wrestling with installing rustc and yarn:
|
That output isn't from the latest version; maybe the version in your |
You are correct. I was in the wrong directory /usr/local/src/moonfire-nvr/target/release/ from the June 4, 2020, build, and the June 4 build was deployed to /usr/local/bin.
What tripped me up was not specifying the local file with a prefix "./" as you suggested, and the build target is in a different place: /usr/local/src/moonfire-nvr/server/target/release. I did not expect there to be two release directories after I brought my git clone up to date, so I wandered down the more familiar path of /usr/local/src/moonfire-nvr/target. I upgraded the database from V5 to V6, then performed the cleansing operation you suggested. 634 rows and files were deleted. I started up the Feb 11 2021 build and the service is now running. Thank you. |
Ahh, yeah, after I reorganized in dd66c7b, the top-level |
I had my moonfire-nvr running along for months with an 8 TB cap on four video feeds. On a raspberry Pi4. When I halted it and tried to start up, the process would commence and then die. I therefore tried "moonfire-nvr check" to see if that would rectify any database issue and there were several lines printed out. I then tried "moonfire-nvr upgrade" to see if that might fix the database issues.
I commenced the upgrade near 1:00 p.m. and it is over 4 hours later. In a separate console, I've been monitoring the database files and see that they are being updated.
Here's the console where I launched the update:
root@raspberrypi:/home/jlpoole# moonfire-nvr upgrade
I0210 122631.030 main moonfire_db::upgrade] Upgrading database from version 5 to version 5...
I0210 122631.051 main moonfire_db::upgrade] ...database now in journal_mode delete (requested delete).
I0210 122631.066 main moonfire_db::upgrade] ...database now in journal_mode wal (requested wal).
I0210 122631.066 main moonfire_db::upgrade] ...vacuuming database after upgrade.
Here's the other console showing a listing of the SQLite database files:
jlpoole@raspberrypi:~ $ sudo ls -la /var/lib/moonfire-nvr/db
total 1155968
drwx------ 2 moonfire-nvr moonfire-nvr 4096 Feb 10 12:26 .
drwxr-xr-x 5 moonfire-nvr moonfire-nvr 4096 Jul 10 2020 ..
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 1022054400 Feb 10 12:26 db
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 327680 Feb 10 14:25 db-shm
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 161314512 Feb 10 14:25 db-wal
jlpoole@raspberrypi:~ $ sudo ls -lah /var/lib/moonfire-nvr/db
total 1.2G
drwx------ 2 moonfire-nvr moonfire-nvr 4.0K Feb 10 12:26 .
drwxr-xr-x 5 moonfire-nvr moonfire-nvr 4.0K Jul 10 2020 ..
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 975M Feb 10 12:26 db
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 320K Feb 10 14:25 db-shm
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 155M Feb 10 14:25 db-wal
jlpoole@raspberrypi:~ $ sudo ls -lah /var/lib/moonfire-nvr/db
total 1.2G
drwx------ 2 moonfire-nvr moonfire-nvr 4.0K Feb 10 12:26 .
drwxr-xr-x 5 moonfire-nvr moonfire-nvr 4.0K Jul 10 2020 ..
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 975M Feb 10 12:26 db
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 320K Feb 10 14:26 db-shm
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 155M Feb 10 14:26 db-wal
jlpoole@raspberrypi:~ $ sudo ls -lah /var/lib/moonfire-nvr/db
[sudo] password for jlpoole:
total 1.2G
drwx------ 2 moonfire-nvr moonfire-nvr 4.0K Feb 10 12:26 .
drwxr-xr-x 5 moonfire-nvr moonfire-nvr 4.0K Jul 10 2020 ..
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 975M Feb 10 12:26 db
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 352K Feb 10 14:47 db-shm
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 166M Feb 10 14:47 db-wal
jlpoole@raspberrypi:~ $ sudo ls -lah /var/lib/moonfire-nvr/db
total 1.2G
drwx------ 2 moonfire-nvr moonfire-nvr 4.0K Feb 10 12:26 .
drwxr-xr-x 5 moonfire-nvr moonfire-nvr 4.0K Jul 10 2020 ..
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 975M Feb 10 12:26 db
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 352K Feb 10 14:59 db-shm
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 172M Feb 10 14:59 db-wal
jlpoole@raspberrypi:~ $ ps -efww |grep moonfire
root 18553 18539 0 12:26 pts/3 00:00:38 moonfire-nvr upgrade
jlpoole 18896 18772 0 15:01 pts/5 00:00:00 grep --color=auto moonfire
jlpoole@raspberrypi:~ $ sudo ls -lah /var/lib/moonfire-nvr/db
[sudo] password for jlpoole:
total 1.2G
drwx------ 2 moonfire-nvr moonfire-nvr 4.0K Feb 10 12:26 .
drwxr-xr-x 5 moonfire-nvr moonfire-nvr 4.0K Jul 10 2020 ..
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 975M Feb 10 12:26 db
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 384K Feb 10 15:26 db-shm
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 182M Feb 10 15:26 db-wal
jlpoole@raspberrypi:~ $ sudo ls -lah /var/lib/moonfire-nvr/db
[sudo] password for jlpoole:
total 1.2G
drwx------ 2 moonfire-nvr moonfire-nvr 4.0K Feb 10 12:26 .
drwxr-xr-x 5 moonfire-nvr moonfire-nvr 4.0K Jul 10 2020 ..
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 975M Feb 10 12:26 db
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 416K Feb 10 17:38 db-shm
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 210M Feb 10 17:38 db-wal
jlpoole@raspberrypi:~ $ sudo ls -lah /var/lib/moonfire-nvr/db
total 1.2G
drwx------ 2 moonfire-nvr moonfire-nvr 4.0K Feb 10 12:26 .
drwxr-xr-x 5 moonfire-nvr moonfire-nvr 4.0K Jul 10 2020 ..
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 975M Feb 10 12:26 db
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 416K Feb 10 17:39 db-shm
-rw-r--r-- 1 moonfire-nvr moonfire-nvr 210M Feb 10 17:38 db-wal
jlpoole@raspberrypi:~ $
Question: for 8 TBs worth of video (1920x1200), should a database upgrade take over 4 hours? Is there a metric I can use to predict when the upgrade will finish? Or am I in doldrums and should kill the process? While I wait for this to resolve, of course, my ability to record is off-line, so knowing what an "upgrade" should cost in terms of time is important as one may want to weigh the risk of downtime.