feature: Change compression algorithm for memory filesystem Netflow backups #5278
Comments
This issue has been automatically timed-out (after 180 days of inactivity). For more information about the policies for this repository, please refer to the contributing guidelines. If someone wants to step up and work on this issue, let us know so it can be reopened.
Has this ever been changed? I never noticed this issue until recently. Another improvement could be the ability to configure log rotation for all of the reporting data (netflow, rrd, ...), so we only keep X days of that data.
When using a memory disk, these files should not be in there anymore; the backup itself can safely be disabled in "System: Settings: Miscellaneous".
What do you mean by memory disk?
You can also use the backup/restore on disk, but usually it's the same disk so it doesn't hold much value (and may cause the machine to take a long time shutting down or starting up). We changed the defaults a long time ago in f1ea003.
Okay, so the compression algorithm is still gzip? And disabling all periodic backups or the netflow backup at "System: Settings: Miscellaneous" will lead to data loss of the aggregated data at each reboot?
yes
no
Ah, then I misunderstood the periodic backup function, or maybe the description label is misleading. Sorry for bothering you, but I would really like to know how exactly this works; you could also just point me to documentation if it exists.
to be honest, the documentation on this subject is a bit light, but you certainly don't need the archives when content is stored on disk. only for dhcp does it more or less make sense, if you're reinstalling the device and want to keep your leases (reinstalling with config import can use these backups). so, long story short, just stick to our new defaults and you should be good.
As the opener of this issue, I'm 99% sure that when I opened it, the option was to place the entirety of `/var` into a memory disk. As of right now on my 23.1.7 box, it seems like the option has been reduced to `/var/log`.
@g-a-c yes, that's indeed the case. full memory disk (sd/cf) installs are less common nowadays and we changed the behavior to improve stability as well.
Is your feature request related to a problem? Please describe.
I run OPNsense with a memory filesystem on `/var` (to avoid constant SSD writes), which collects Netflow data for use with Insight. Part of this is obviously the backup system, which on shutdown (or periodically) can back up that data to real storage to make it persistent.

My Netflow data is now running into the gigabytes in size (currently it's approximately 1.6 GB, but it's been larger), and if I reboot the firewall appliance this takes an increasingly long time, which I think I've narrowed down to the backup step. The script that backs up Netflow data uses `gzip` compression with the default settings, and on my appliance this currently takes over 90 seconds to run with this 1.6 GB of source data. The script can take so long to run that it causes the UI to refresh and the box to give the impression it has rebooted, when actually the backup script is still running in the background and the appliance is yet to go down.

I believe this is a similar problem to #2876, which was reported several years ago and resulted in a way to just turn off the backups completely.
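For illustration, the expensive step boils down to something like the sketch below. This is not the actual OPNsense backup script; the `/var/netflow` data location and the archive path are assumptions:

```sh
#!/bin/sh
# Minimal sketch of the costly operation, NOT the real OPNsense backup
# script. /var/netflow is an assumed location for the aggregated data.
# With ~1.6 GB of source data, this gzip step is where the 90+ seconds go.
time tar -czf /root/netflow.tar.gz -C /var netflow
```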
Describe the solution you'd like
I compared a couple of different algorithms available in the OPNsense version of `bsdtar`, including `gzip`, `gzip-1`, `lz4` and `zstd`, to see if there was a better trade-off between compression algorithms:

| algorithm | real time (compression) | real time (decompression) |
| --- | --- | --- |
| `bzip2` default | … | … |
| `gzip` default | … | … |
| `gzip-1` | … | … |
| `zstd-1` | … | … |
| `lz4` default | … | … |

It wasn't exactly a scientific test (more of a for-loop running different combinations on my Netflow data, compressing to the internal SSD and decompressing to `/dev/null`), but it looks like using `zstd` compression at level 1 achieves almost as small a Netflow archive as the current `gzip` default level, but in a fraction of the time.
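A loop of that kind might look like the following sketch. The `bench` helper, paths, and archive names are mine, not from the original test; `bsdtar` is the base-system `tar` on OPNsense, so the `--lz4`/`--zstd` filters and per-filter `--options` are available:

```sh
#!/bin/sh
# Sketch of the kind of comparison described above. /var/netflow and the
# output directory are assumptions; adjust to where your data lives.
SRC_DIR=/var
SRC=netflow
OUT=/root/bench
mkdir -p "$OUT"

bench() {
  # $1 = label, remaining args = bsdtar compression flags
  name=$1; shift
  echo "== $name =="
  time tar -cf "$OUT/netflow.$name.tar" "$@" -C "$SRC_DIR" "$SRC"
  # No compression flag on extract: bsdtar detects the algorithm itself.
  time tar -xf "$OUT/netflow.$name.tar" -O > /dev/null
}

bench bzip2 --bzip2
bench gzip  --gzip
bench gzip1 --gzip --options gzip:compression-level=1
bench zstd1 --zstd --options zstd:compression-level=1
bench lz4   --lz4
```

Note that the extract step passes no compression flag at all, which is the auto-detection the portability argument below relies on.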
I discounted `lz4` because the files are considerably larger than the current default; `bzip2` because compression is considerably slower; and `gzip-1` because it isn't better than `zstd` at either time or size. But switching to `zstd` appears to have considerable upsides going into a reboot, and a very negligible downside coming out of the reboot afterwards. For anyone who has periodic backups enabled every hour, this should also make that periodic job run much quicker, which means less missing data, since the backup script stops the `flowd_aggregate` service while it is running.

Since the compression algorithm can be automatically detected by the `tar` extract step, there would still be portability: a `gzip` archive could be restored even by the same script that may generate `zstd` archives, because the decompression algorithm wouldn't have to be specified inside the script.
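In other words (a sketch, with archive names chosen only for illustration), the restore side stays algorithm-agnostic:

```sh
# bsdtar inspects the archive itself, so the same command restores an old
# gzip backup or a new zstd one; no -z or --zstd needed on extraction.
tar -xf netflow.tar.gz  -C /var   # existing gzip archive
tar -xf netflow.tar.zst -C /var   # future zstd archive, identical command
```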
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you considered.
Additional context
These figures came from an Atom C2558 firewall appliance with 16 GB of DDR3 RAM and an internal mSATA SSD of some sort (I actually don't know the model number). I don't have a way of checking whether these figures are reproducible on other similarly low-powered appliances like the APU series.