
Snapshot Usage Consideration #670

Closed
mwaeckerlin opened this Issue Mar 9, 2018 · 14 comments

Comments

mwaeckerlin commented Mar 9, 2018

I did not find any information about best-practice usage patterns for lizardfs makesnapshot.

Most important question: could there be performance losses if there are too many snapshots, e.g. because every snapshot increases the number of chunks? Even though the chunks are not physically duplicated at the moment of the snapshot, they will be on the next write.

Currently my system contains 567893 chunks.

The snapshot itself is fast, just a few seconds, but what happens if I snapshot every hour and keep 23 hourly, 6 daily, 3 weekly and 12 monthly backups?

Native btrfs snapshots killed (performance-wise) my Gluster filesystem that was running on top of them, so I'd better ask about the recommended usage before I write a snapshot cron job.
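
For reference, the kind of cron job I have in mind would look roughly like the sketch below. It is only an illustration of the idea, not a tested recipe: the mount point /srv, the /srv/.snapshots directory and the timestamped names are placeholders; only the lizardfs makesnapshot call itself is the real tool.

#!/bin/sh
# hypothetical hourly snapshot job, e.g. dropped into /etc/cron.hourly/
# assumptions: LizardFS is mounted on /srv, the data lives in /srv/data,
# and snapshots are collected under /srv/.snapshots
SRC=/srv/data
DST=/srv/.snapshots/hourly-$(date +%Y%m%d-%H)
mkdir -p /srv/.snapshots
lizardfs makesnapshot "$SRC" "$DST"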

MarkyM commented Mar 9, 2018

It will NOT degrade performance; the LizardFS architecture was designed to facilitate this option.
The only drawback of snapshots is that they DO take master memory.
Chunks are not copied at all; a new copy is only created when a chunk is modified.
But each file in a snapshot takes ~150 bytes of master RAM, so if you are snapshotting directories with a massive number of files, the master's RAM usage will grow.
NO IMPACT ON PERFORMANCE though.
Just add RAM to the master and you are good.
When RAM is scarce, use the BerkeleyDB option to save RAM on the master server. It slows things down, but actually not that much with some cache assigned.
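
If you want a rough number for your own tree, a throwaway sketch like the following counts the objects under a directory and applies the ~150 bytes per entry figure (the /srv/data path is only an example):

#!/bin/sh
# estimate the master RAM one additional snapshot of a directory would cost,
# assuming ~150 bytes of master RAM per file/directory entry (figure above)
DIR=/srv/data                      # example path, use your snapshot source
OBJECTS=$(find "$DIR" | wc -l)
echo "$OBJECTS objects -> ~$((OBJECTS * 150 / 1024 / 1024)) MiB of master RAM per snapshot"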

mwaeckerlin commented Mar 9, 2018

Thx, @MarkyM. Well, my master has 40 GiB of memory, of which 31 GiB are free, and it could be upgraded to something like 256 GiB if necessary. So there should be no limitation from this side.

Do you think that, in this case, hourly snapshots are fine?

MarkyM commented Mar 9, 2018

It SHOULD be very OK, but it all depends on the number of files. Generally, 1 billion files takes about 150 GiB of RAM. From the data I have (40 GiB RAM, 31 GiB free), that makes ~6 GiB of RAM for the LizardFS master. You need 24+6+12+1 = ~40 times more if everything is snapshotted, which gives ~240 GiB total, so it should be perfectly OK with expanded RAM.
Please also make sure the master and shadows have enough disk space (~1 TB) for metadata saves.
Of course these numbers go down when not everything is snapshotted and the master takes less RAM.
This can be checked in the GUI (there is a master RAM usage table there).
Please also note that initialisation of the master after an outage will take a bit longer (reading metadata from the local HDD, plus an integrity check and some other bookkeeping operations).
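
Spelled out, that estimate is just the current master RAM scaled linearly by the number of retained copies. A quick sketch of the same arithmetic (the 6 GiB input is the figure quoted above, not a measurement):

#!/bin/sh
# linear scaling of the current master RAM by the number of retained snapshot copies
CURRENT_MASTER_RAM_GIB=6     # approximate current usage, as quoted above
COPIES=40                    # 24+6+12+1, rounded down to ~40 as above
echo "~$((CURRENT_MASTER_RAM_GIB * COPIES)) GiB of master RAM if everything is snapshotted"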

borkd commented Mar 9, 2018

If you plan to use extended attributes filesystem-wide, plan for the space they will occupy. In some scenarios this can easily multiply the initial memory requirements.

HerbertKnapp commented May 3, 2018

Don't do it on a production system. You'll max out your master in no time even with a small setup (think Outlook folders with lots of small files) and will run into real trouble trying to restore your data. Doing exactly what you're trying to do, because it sounded very alluring, was the only time I had to start from scratch with an empty metadata file. If you still want to go down this road, monitor RAM usage closely and take every snapshot manually before trying anything automated with short intervals.
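
A simple guard in front of any automated job helps with the "monitor RAM closely" part. A minimal sketch, assuming the master process is named mfsmaster and picking 32 GiB as an arbitrary threshold (adjust both to your setup; the snapshot paths are examples):

#!/bin/sh
# skip the automated snapshot when the master's resident memory is above a threshold
LIMIT_KIB=$((32 * 1024 * 1024))                      # 32 GiB, arbitrary safety limit
RSS_KIB=$(ps -C mfsmaster -o rss= | awk '{s+=$1} END {print s+0}')
if [ "$RSS_KIB" -ge "$LIMIT_KIB" ]; then
    echo "master RSS ${RSS_KIB} KiB above threshold, skipping snapshot" >&2
    exit 1
fi
lizardfs makesnapshot /srv/data "/srv/.snapshots/hourly-$(date +%Y%m%d-%H)"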

4Dolio commented May 10, 2018

I would advise caution: the larger the metadata, the slower any master start becomes. Shadows need more time to sync, and cold starts and stops take a non-negligible amount of time. Your master's metadata read and write speed to disk will become a factor, as could network speed. I find that once you get into the 15-50 GiB (RAM) or 4-12 GiB (disk) range, you need to start accounting for startup durations. While I was originally thrilled at the prospect of a massive, singular, optimized namespace, I learned that in practice it is more practical to run multiple smaller fragments.
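
A quick way to see which side of that range you are on is to look at the metadata the master keeps on disk. A small sketch; the data path and process name are assumptions that differ between distributions, so adjust them to your setup:

#!/bin/sh
# show the size of the saved metadata files and the master's current memory use
DATA_PATH=/var/lib/lizardfs            # assumption: adjust to your master's DATA_PATH
du -h "$DATA_PATH"/metadata.mfs* 2>/dev/null
ps -C mfsmaster -o rss=,etime=         # assumption: the master process is named mfsmaster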

mwaeckerlin commented May 11, 2018

Currently I have 70 snapshots of two volumes (48 hourly, 14 daily, 6 weekly, 2 monthly). Up to now, everything still works as it did before, so it does not seem to be a problem.

4Dolio commented May 15, 2018

That is great. I am running the older 2.6.0 version and I suspect snapshots work better in newer versions. Have you attempted to delete snapshots yet? I am curious how you find that behaves. Also, how many objects are in your snapshots? In older versions, very large object counts would cause the master to get busy.

mwaeckerlin commented May 16, 2018

Have you attempted to delete snapshots yet?

Yes, I use my script from here: https://mrw.sh/admin-scripts/backup

It uses the new lizardfs rremove.

It takes hourly, daily, weekly and monthly backups, but keeps only the last 24 hourly, 7 daily and 4 weekly snapshots, so every hour a snapshot is deleted (well, in fact at least two, since I split my volume into three subvolumes: one for the Docker volumes, one for the configuration files and one for the backups).

These are the current capacities:
[screenshot of the current capacity table]
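
The actual logic is in the script linked above; a stripped-down sketch of the same idea (hourly snapshot plus pruning with lizardfs rremove, using placeholder paths and a placeholder retention count) looks like this:

#!/bin/sh
# take an hourly snapshot, then prune everything beyond the newest KEEP copies
# (placeholder paths and retention; the real script is the one linked above)
SRC=/srv/data
SNAPDIR=/srv/.snapshots
KEEP=24
mkdir -p "$SNAPDIR"
lizardfs makesnapshot "$SRC" "$SNAPDIR/hourly-$(date +%Y%m%d-%H%M)"
ls -1d "$SNAPDIR"/hourly-* | sort | head -n -"$KEEP" | while read -r old; do
    lizardfs rremove "$old"            # recursive removal of one old snapshot
done                                   # note: head -n -N needs GNU coreutils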

mwaeckerlin commented May 17, 2018

Strange: after doing fine for more than a month, everything suddenly broke down yesterday. I don't know why, but I have now decreased the number of snapshots I keep and reduced the trash time from 24 h to 1 h. Let's see if or when my services become stable again; currently I cannot even SSH into my server … :(

guestisp commented May 17, 2018

@mwaeckerlin, a little bit OT (I don't want to hijack the thread): are you using 13 GB of RAM with about 47,000,000 objects? Is that true? I'm trying to estimate the RAM usage per object for my new cluster.

mwaeckerlin commented May 17, 2018

40 GB of memory, but I still got a crash with memory full. After the reboot, I am now in a complete mess!

Who can help? This is urgent, all my servers are down: #700

mwaeckerlin commented Jul 19, 2018

Problem solved: daily snapshots are OK, but hourly snapshots cannot be cleaned up fast enough (removing one huge snapshot per hour fails).

mwaeckerlin commented Dec 25, 2018

Hint: Set trash time to 0 for all files, e.g. if mounted on /srv:

sudo lizardfs settrashtime -r -l 0 /srv
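
A quick way to spot-check that the new value took effect is the matching get command (example paths only):

lizardfs gettrashtime /srv
lizardfs gettrashtime -r /srv        # recursive summary for the whole tree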