Allow to choose the compression algorithm #12

Merged on Jan 12, 2021 (4 commits)

maniackcrudelis
Collaborator

Allow selecting the compression algorithm for tar: gzip, lzop, zstd, bzip2, lzma, lzip, xz, or no compression.

To use lzop, zstd or lzip, the corresponding package (lzop, zstd or lzip) has to be installed.

And force compression for Yunohost backups
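
A minimal sketch of how such a choice could be mapped onto tar, assuming GNU tar (which accepts a long option named after each of these algorithms). The function name `tar_compress_option` and the value `none` are hypothetical and are not taken from archivist.sh:

```shell
# Hypothetical helper: translate a compression name into the matching GNU tar option.
# "none" (an assumed keyword) yields an empty string, i.e. a plain uncompressed tar.
tar_compress_option() {
    case "$1" in
        gzip|lzop|zstd|bzip2|lzma|lzip|xz)
            echo "--$1"
            ;;
        none)
            echo ""
            ;;
        *)
            echo "unsupported compression: $1" >&2
            return 1
            ;;
    esac
}

# Example (left as a comment so that sourcing this file has no side effects).
# The substitution is unquoted on purpose so that an empty result adds no argument:
#   tar --create $(tar_compress_option zstd) --file backup.tar.zst /some/dir
```

lzop, zstd and lzip are run by tar as external programs, which is why those packages need to be installed.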
@lapineige
Collaborator

Force new backups if compression format modified

I don't understand this: does that mean that if you change the parameter, it will start new backups right away ?

@maniackcrudelis
Collaborator Author

That means if you change the compression format, it will recreate all the backups to stick to that new compression format.
If you started with gzip (so far, you didn't have a choice) and change to zstd, it will recreate the backups with zstd as the compression algorithm, even if the backups are the same as before.

To make it simple, if you change the compression format, the script will ignore the checksums next time it runs.
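
The rule above can be sketched as follows. This is only an illustration of the decision, not the code from archivist.sh; the function name `needs_rebuild` and its four arguments are hypothetical:

```shell
# Hypothetical sketch: decide whether a backup has to be rebuilt.
# Arguments: previous format, configured format, previous checksum, current checksum.
# Returns 0 (true) when a rebuild is needed.
needs_rebuild() {
    local old_format="$1" new_format="$2" old_sum="$3" new_sum="$4"
    if [ "$old_format" != "$new_format" ]; then
        # The compression format changed: the checksums are not even looked at.
        return 0
    fi
    # Same format: rebuild only if the content checksum differs.
    [ "$old_sum" != "$new_sum" ]
}

# Example (as a comment): same checksum, but gzip -> zstd still forces a rebuild.
#   needs_rebuild gzip zstd abc123 abc123 && echo "rebuild"
```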

@lapineige
Collaborator

Ok, then it does recreate the backups even if nothing changed (which will not be the case for me) but only during next planned backup.
It will be next sunday for me, sorry 😅

Where do we choose the algorithm ?

I can't upgrade :

Error: There doesn't seem to be any manifest file in /var/cache/yunohost/from_file ... It looks like an app was not correctly installed/removed.

@maniackcrudelis
Collaborator Author

I did not upgrade the package yet, you would have to update the script on its own.

And add the option to your Backup_list.conf to change the compression algorithm.

If you want to try it, you can run archivist without waiting for the cron to do it.

@maniackcrudelis
Collaborator Author

You can upgrade to this branch YunoHost-Apps/archivist_ynh#29

It won't add the option to your config file since it doesn't update it.
But you can add it manually.

The config-panel doesn't work though...

@lapineige
Collaborator

lapineige commented Jan 12, 2021

I don't understand where I am supposed to put ynh_compression_mode= and files_compression_mode= and what the difference between them is 😅

edit: ok I've got it from this https://github.com/maniackcrudelis/archivist/pull/12/files

But I still don't understand the difference.

edit 2 : it fails.
If I run archivist.sh, I have

yunohost: error: unrecognized arguments: --no-compress

@maniackcrudelis
Collaborator Author

yunohost: error: unrecognized arguments: --no-compress

Looks like you didn't update the script.
You can try from YunoHost-Apps/archivist_ynh#29 if you want, could be easier.

@lapineige
Collaborator

But that's what I did 😅

@maniackcrudelis
Collaborator Author

If you did an upgrade from YunoHost in CLI, you have to add -F to force the upgrade, otherwise it'll just think there's nothing to do.

@lapineige
Collaborator

lapineige commented Jan 12, 2021

Strangely it did upgrade, including the dependencies (which makes me think it was the right version).

I'll try again, with -F this time as it's needed (yunohost app upgrade archivist -u https://github.com/YunoHost-Apps/archivist_ynh/tree/compression -F)

edit: still fails.

@maniackcrudelis
Collaborator Author

Indeed... You're right I do have that error too on a VM...
I'll investigate and fix that.

@maniackcrudelis
Collaborator Author

... Job half done...
-F is "Force the update, even though the app is up to date" but it actually doesn't affect ynh_check_app_version_changed...
Just a fucking layer over it...

sudo YNH_FORCE_UPGRADE=1 yunohost app upgrade archivist -u https://github.com/YunoHost-Apps/archivist_ynh/tree/compression -F

What a nice job...

@lapineige
Collaborator

It fails :( : https://pastebin.com/yHXc4XXW

@maniackcrudelis
Collaborator Author

It doesn't sound related to the package.
It fails because it can't find the previous backup, which is expected to be a tar.gz... and I bet the backup exists but is a plain tar.

If you have turned on the global compression setting for YunoHost, it doesn't work, as explained on the forum.
It may be why. If so, I can tell you what to do.
If you didn't, that's a YunoHost bug and I can't help you on that.

@lapineige
Collaborator

If you have turned on the global compression setting for YunoHost, it doesn't work, as explained on the forum.

I did turn it on. Could you link to that post on the forum ? I think I missed it.

I do have 2 files, archivist-pre-upgrade1.tar and archivist-pre-upgrade2.tar, which are symbolic links to their .tar.gz versions.

@maniackcrudelis
Collaborator Author

Disable that setting, it doesn't work well.
And remove both tar links and tar.gz files, the code as it is won't remove the link and will break any further backup.
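
Before removing anything, the .tar links and their targets can be listed first. A minimal sketch, in which the function name `list_tar_links` is hypothetical and the archive directory in the usage comment is an assumption to adapt to your setup:

```shell
# Hypothetical helper: print every *.tar symbolic link in a directory with its target.
# It only lists; nothing is deleted.
list_tar_links() {
    local dir="$1" link
    for link in "$dir"/*.tar; do
        # Skip regular files (and the unexpanded pattern when nothing matches).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}

# Example (as a comment; the path is an assumption):
#   list_tar_links /home/yunohost.backup/archives
```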
Clearly, that setting hasn't been tested...

@lapineige
Collaborator

lapineige commented Jan 12, 2021

Ok, thanks a lot, I wouldn't have been able to do it alone 😅

Upgrade did work… as well as the backups ! 🎉

By the way : nice warning message, it helps to understand that there was a change.

WARNING: Compression format has been modified for YunoHost backups. All backups will be rebuilt
WARNING: Compression format has been modified for files and directories. All backups will be rebuilt

@maniackcrudelis
Collaborator Author

Let's go for an upgrade then

@maniackcrudelis maniackcrudelis merged commit de9afa6 into master Jan 12, 2021
@maniackcrudelis maniackcrudelis deleted the compression branch January 12, 2021 17:08
@lapineige
Collaborator

Bonus: during the backups, I saw this message

Warning: It's hightly recommended to make your backup when the service is stopped. Please stop synapse service with this command before to run the backup 'systemctl stop matrix-synapse.service'

Is that something this app should do, or should we ping the Synapse maintainer ?

@maniackcrudelis
Collaborator Author

The backup is operated by the backup script of the app, so that's indeed something that should be done on Synapse package.

@lapineige
Collaborator

lapineige commented Jan 12, 2021

I noticed it kept the (old) .gz files.
edit: no, in fact it stopped before the end (connection issue I guess), but after a 2nd try it worked.

Before I delete them, a quick size comparison (gzip size, then zstandard) :

  • 215k / 79k
  • 518M / 641M (1.8G tar) ← this one (Nextcloud) is strange, but I changed a lot of things in the meantime, including the installation of ~10 apps, that might explain the difference.
  • 7.6M / 7.1M
  • 2.3G / 2.1G (4.2G tar)
  • 4.9M / 4.8M
  • 1.2M / 1.2M
  • 159M / 206M (tar: 1.5G)
  • 1G / 0.9G (1.3G tar)

Which gives (Nextcloud bias excluded) an overall gain around 347M for a weekly backup. Pretty small, but still something. I expected more.
I could not measure and compare compression time precisely, but it was pretty short (a few tens of seconds) for the Nextcloud backup. That's a real gain for me.

I can compare this final size (tar archives + zstd ones) to the previous one (tar.gz archives only) : more than 4GB lost with tar archives. For a single weekly backup.
And as a result of YunoHost's removal of compression : archivist now creates normal (uncompressed) tar backups + compressed backups, which results in much bigger storage use, while before it was 2 compressed backups.

So I'll still need to manually delete the uncompressed backups (and replace them with the ones made by archivist if needed).

Anyway, thanks for implementing that feature, it's a game changer for me.

@maniackcrudelis
Collaborator Author

maniackcrudelis commented Jan 12, 2021

And as a result of YunoHost's removal of compression : archivist now creates normal (uncompressed) tar backups + compressed backups, which results in much bigger storage use, while before it was 2 compressed backups.

Actually the purpose of archivist isn't to replace YunoHost backups but to duplicate them elsewhere as a recurring task.
It was, by the way, doing a cp, not a move.
You can probably add a post_backup instruction to remove them if you want.

I ran a few benchmarks myself with exactly the same directories and ynh core; these were successive backups for testing purposes. And clearly gzip is not great, while zstd was better.
I use xz on my devices, which is fast and efficient actually. Clearly a huge difference with uncompressed tar !!!

Anyway, thanks for implementing that feature, it's a game changer for me.

Very happy to read that :)

@lapineige
Collaborator

Actually the purpose of archivist isn't to replace YunoHost backups but to duplicate them elsewhere as a recurring task.

It wasn't a criticism :), but thanks for clarifying this.
My point was that YunoHost backups now use way more storage, and this was a tangible example.

You can probably add a post_backup instruction to remove them if you want.

I guess that would launch a script ?

And clearly gzip is not great, while Zstd was better.

And for both compression and decompression. Gzip is one of the slowest to decompress.

I use xz on my devices, which is fast and efficient actually.

If you have time and don't mind the memory use, indeed it is very good at compressing, and not that slow compared to Zstd (but much better than lzma).
See https://linuxreviews.org/Comparison_of_Compression_Algorithms for example.

That's indeed probably the best bet for archiving with minimum storage space. But I can't afford it on my tiny VPS and Raspberry Pi 😅

@lapineige
Collaborator

Btw, regarding storage space: if I understand correctly, Archivist creates a temporary backup, checks whether it differs from the previous one, then does the real backup. Why does it make 2 backups ?
Also, am I right in saying that old backups are removed at the end of the whole process (after all backups) ?

@maniackcrudelis
Collaborator Author

maniackcrudelis commented Jan 13, 2021

Btw, regarding storage space: if I understand correctly, Archivist creates a temporary backup, checks whether it differs from the previous one, then does the real backup. Why does it make 2 backups ?

For YunoHost backups it does, not for simple file backups. The reason is that the final backup, compressed or not, contains a timestamp.
So even if it's exactly the same backup, the timestamp is different, so the checksum will be different.
So if you use a normal backup, it will be considered different each time.
That's why it makes a temporary backup which is neither compressed nor even concatenated with tar. That way it can compare the real backup content, without the timestamp, and decide whether or not it's the same.

Of course, if you're not moving your backup elsewhere, you don't care. But the purpose was to avoid recreating backups and sending them to another server every day just because the timestamp has changed.
Still, what you gain in bandwidth is partly lost in CPU time. Although without compression it's not that bad.
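
A minimal sketch of that idea: hashing the files of an uncompressed backup directory gives a value that depends only on file paths and contents, not on when the backup was made. The function name `dir_checksum` is hypothetical, sha256sum is chosen purely for illustration (the checksum tool archivist actually uses is not stated here), and GNU find, sort and xargs are assumed:

```shell
# Hypothetical helper: one checksum summarising the content of a directory tree.
# Each file is hashed with its path relative to the directory, in a stable order,
# then the resulting list is hashed once more. File modification times play no part.
dir_checksum() {
    (
        cd "$1" || exit 1
        find . -type f -print0 \
            | LC_ALL=C sort -z \
            | xargs -0 -r sha256sum \
            | sha256sum \
            | cut -d ' ' -f 1
    )
}

# Example (as a comment): rebuild only when the content really changed.
#   [ "$(dir_checksum /tmp/new_backup)" = "$previous_sum" ] || echo "content changed"
```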

Also, am I right if I say that old backups are removed at the end of the whole process (after all backups) ?

Backups which are created by Archivist are indeed removed if they are not in the backup list anymore. At the end of the process.
That's a cleanup step which consists of checking the list of backups to do and removing everything which is not in that list. Usually old backup files.
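
Such a cleanup step could look roughly like this. It is a sketch under assumptions, not the code from archivist.sh: the function name `clean_old_backups` and the format of the keep list (one file name per line) are hypothetical:

```shell
# Hypothetical sketch: remove from backup_dir every entry whose name is not in keep_list.
# keep_list is assumed to be a text file with one expected file name per line.
clean_old_backups() {
    local backup_dir="$1" keep_list="$2" file
    for file in "$backup_dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$file" ] || continue
        # -x: whole-line match, -F: fixed string (no regex), -q: quiet.
        if ! grep -qxF "$(basename "$file")" "$keep_list"; then
            echo "Removing $file"
            rm -r -- "$file"
        fi
    done
}

# Example (as a comment):
#   clean_old_backups /mnt/backups /tmp/expected_backups.list
```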

I guess that would launch a script ?

It can run a command or a script, as you prefer.
Keep in mind though that the backup system of YunoHost won't restore a tar.gz or a tar.zstd, nor will it restore a backup which is not in its own directory.
So be careful when deleting those backups.

@lapineige
Collaborator

Thanks a lot for these explanations.

Backups which are created by Archivist are indeed removed if they are not in the backup list anymore. At the end of the process.
That's a cleanup step which consists of checking the list of backups to do and removing everything which is not in that list. Usually old backup files.

Does that include the previous backup of an app that will be saved ?

I guess that would launch a script ?

It can run a command or a script, as you prefer.

Ok. I think I prefer to manually remove then, to be sure of what my system is doing.

Keep in mind though that the backup system of YunoHost won't restore a tar.gz or a tar.zstd, nor will it restore a backup which is not in its own directory.
So be careful when deleting those backups.

I am aware of that, but as it takes a lot of space, and is essentially a (compressed) duplicate, I prefer to remove those tar files, and decompress the other ones when needed.

@maniackcrudelis
Collaborator Author

Does that include the previous backup of an app that will be saved ?

Not sure I get it.
In the case of an app that already has a backup, if archivist makes a new backup, it will overwrite the previous one.
If that's a backup of an app that has been removed from YunoHost, it's not in the list anymore, so it will be removed.

Anyway, archivist does not do incremental backups. Again, the initial purpose was mainly to duplicate backups in different places. And I thought at that time that incremental backups would eat up too much storage space.

@lapineige
Collaborator

lapineige commented Jan 13, 2021

In the case of an app that already has a backup, if archivist makes a new backup, it will overwrite the previous one.

My point is: it will remove the previous one, but does it do that at the end of all backups ? Or right after a backup for app1, then after backup of app 2… ?

Anyway, archivist does not do incremental backups. Again, the initial purpose was mainly to duplicate backups in different places. And I thought at that time that incremental backups would eat up too much storage space.

And I'm fine with that :) (just waiting for Borg to be integrated in Yunohost ;)

@maniackcrudelis
Collaborator Author

My point is: it will remove the previous one, but does it do that at the end of all backups ? Or right after a backup for app1, then after backup of app 2… ?

When creating a new YunoHost backup, it deletes the previous backup made by archivist (using YunoHost command) and then creates a new one.
Then the previous backup, in $backup_dir is overwritten.

@lapineige
Collaborator

And then it goes to the next app to backup ?

If so, then I have nothing to comment. If not (= if it does it after all new backups) then maybe it would be better to do that after every app backup (to clear some storage before doing another backup).

@maniackcrudelis
Collaborator Author

It does it for each app, before going to the next one.
That's the same for each part of the backup.

See just here
https://github.com/maniackcrudelis/archivist/blob/master/archivist.sh#L527-L539
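
For readers who don't want to open the script, the per-app order described in this thread (delete the previous backup, create the new one, overwrite the copy in $backup_dir, then move on to the next app) can be sketched as below. This is not the linked code: the helper names `run` and `backup_one_app`, the backup naming scheme and the `archives_dir` default are assumptions. By default the commands are only printed:

```shell
# Hypothetical sketch of the per-app sequence. With DRY_RUN=1 (the default)
# commands are printed instead of executed, so this is safe to try anywhere.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

backup_one_app() {
    local app="$1" backup_dir="$2"
    local name="${app}_backup"                                    # assumed naming scheme
    local archives_dir="${3:-/home/yunohost.backup/archives}"    # assumed location
    # 1. Delete the previous YunoHost backup made for this app.
    run yunohost backup delete "$name"
    # 2. Create the new backup.
    run yunohost backup create --name "$name" --apps "$app"
    # 3. Overwrite the previous copy in the destination directory.
    run cp "$archives_dir/$name.tar" "$backup_dir/"
}

# Example (as a comment): each app is fully handled before the next one starts.
#   for app in nextcloud synapse; do backup_one_app "$app" /mnt/backups; done
```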
