
Trash Does Not Cleanup #702

Closed
mwaeckerlin opened this issue May 18, 2018 · 36 comments

Comments

mwaeckerlin commented May 18, 2018

I have a strange situation.

I create hourly, daily, weekly and monthly snapshots. After each snapshot, old snapshots are removed using lizardfs rremove. Since I have a lot of files and chunks, there are a lot of removals each hour. By default the trash-bin time is 24h; I reduced it yesterday to 1h.
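
For illustration, the rotation looks roughly like this (paths and snapshot names here are hypothetical, not my actual script):

# take the new hourly snapshot
lizardfs makesnapshot /mnt/lizardfs/live /mnt/lizardfs/snapshots/hourly-$(date +%Y%m%d%H)
# recursively remove the snapshot taken 24 hours ago
lizardfs rremove /mnt/lizardfs/snapshots/hourly-$(date -d '24 hours ago' +%Y%m%d%H)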

But the trash space seems to be constantly growing. Current status:

LizardFS v3.12.0
Memory usage:   20GiB
Total space:    89TiB
Available space:        27TiB
Trash space:    382TiB
Trash files:    28424063
Reserved space: 0B
Reserved files: 0
FS objects:     50357273
Directories:    1309839
Files:  47818938
Chunks: 8350029
Chunk copies:   16699906
Regular copies (deprecated):    16699906

So many trash files, but where are they? I cannot see a single file in trash:

# mfsmount -o mfsmeta,mfsmaster=universum /mnt
mfsmaster accepted connection with parameters: read-write,restricted_ip
# ls /mnt
reserved  trash
# cd /mnt/trash
# time ls -lA
total 0

real    8m53.309s
user    0m0.000s
sys     0m0.004s

Questions:

  1. Why is there no restore folder?
  2. Why do I see no trash files?
  3. How can I force immediate trash bin cleanup?

Edit: ls -lA takes ~9 minutes before it shows 0 files.

@mwaeckerlin

My current configuration:

In /etc/mfs/mfsmaster.cfg:

LOAD_FACTOR_PENALTY = 0.5
ENDANGERED_CHUNKS_PRIORITY = 0.6
REJECT_OLD_CLIENTS = 1
CHUNKS_WRITE_REP_LIMIT = 20
CHUNKS_READ_REP_LIMIT = 100

In /etc/mfs/mfschunkserver.cfg:

MASTER_HOST = universum
HDD_TEST_FREQ = 3600
ENABLE_LOAD_FACTOR = 1
NR_OF_NETWORK_WORKERS = 10
NR_OF_HDD_WORKERS_PER_NETWORK_WORKER = 4
PERFORM_FSYNC = 0

mwaeckerlin commented May 18, 2018

It's still growing, currently:

Trash space:    394TiB
Trash files:    29234626

Now it seems to be cleaning up the trash:

Two hours later:

Trash space:    390TiB
Trash files:    29197844

One more hour later:

Trash space:    381TiB
Trash files:    29331928

Three more hours later:

Trash space:    354TiB
Trash files:    29553804

But the questions above still remain: How can I immediately clear the trash?

4Dolio commented May 18, 2018

I think your trash is perhaps too large for ls. Try using find instead. find is faster as it does not attempt to stat each file as it goes. If you just want to purge the trash, find can -delete (or -exec rm) as it goes. 28 million files in one folder is quite a lot, so it can easily overwhelm things like ls.

@mwaeckerlin

Do you mean:

find /mnt/trash -exec rm {} \;

Or which command do you mean I should execute?

Ok, it has been running (for ½h now), but it does not seem to be a success:

# time find /mnt/trash
/mnt/trash
 [… still running …]

@guestisp

find /mnt/trash -print -delete

4Dolio commented May 18, 2018

Good call with the print... it is probably working, but some verbose output is helpful to see what it is doing. It could take a very long time... you should quote your '{}' if you do it the original way; guestisp's way is 'better'. Do you see any change in the CGI? Fewer trash files maybe?

@guestisp

With the above command, every deleted file is also printed.

Anyway, rsync is even faster; just sync an empty directory:

mkdir /tmp/empty
rsync -av --delete /tmp/empty/ /mnt/trash/

@mwaeckerlin

The SSH connection aborts before find ends, but not a single file is found in this time. So there's something wrong with the meta filesystem!

Anyway, at least, trash space is still decreasing:

Trash space:    332TiB
Trash files:    28902103

@mwaeckerlin

@guestisp, @4Dolio, unfortunately it does not do anything at all. The meta filesystem is somehow defective!

$ time sudo rsync -av --delete /tmp/empty/ /mnt/trash/
sending incremental file list
./

sent 59 bytes  received 19 bytes  0.18 bytes/sec
total size is 0  speedup is 0.00

real    7m9.107s
user    0m0.008s
sys     0m0.000s

But still more or less the same as before (it was 328TiB / 28715421 just before I started rsync):

Trash space:    328TiB
Trash files:    28680974

4Dolio commented May 19, 2018

Try this:
find /mnt/trash/ | head

And does your mfsexports.cfg have a valid metadata entry for that client?
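
For reference, a metadata export in mfsexports.cfg is just a line whose path column is "."; the network below is only an example and has to match the client:

# mfsexports.cfg on the master
192.168.0.0/24    .    rw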

@mwaeckerlin

@4Dolio, head just takes the first lines of the output. Since there is no output at all, it wouldn't change anything.

4Dolio commented May 19, 2018

You are working on a massive set; it can't hurt.

4Dolio commented May 19, 2018

Can you tab complete /mnt/trash/und^t or /mnt/res^t?

mwaeckerlin commented May 19, 2018

Absolutely nothing: not with tab completion, not with ls, nor with find, nor with rsync. Directly on the host (no SSH), find terminates with no result after ~8 minutes.

So there is some massive bug in the meta file system!

4Dolio commented May 19, 2018

shrugs... there should be a /mnt/reserved, and I feel like it should still exist and be visible. It seems like you might not be connected properly, like an export problem. Sometimes a normal data mount will act oddly if, say, no chunkservers happen to be present. Just grasping for clues.

4Dolio commented May 19, 2018

If you set trash time to zero on a non-empty file, lock it, and delete it, I think it will show up in reserved until you let go of the lock (maybe from a different client?). .oO( I once managed to delete all data except the locked reserved files because LIO iSCSI was still active and holding locks... manually rolled back the text changelog to before the deletion, rebuilt, restarted, saved the reserved chunks and was back online. Was lucky, don't try this in production ;)

Maybe reserved no longer exists? I just rolled back my 3.10/3.12 to 2.6, so I cannot check.

Maybe mfsmeta is bugged out? It is normally magical and awesome. You should just be able to find $magic delete to purge all or some of the trash. Sigh.

I would say no more snapshots until trashtime is 0 and the trash purges in more or less real time...

4Dolio commented May 19, 2018

Maybe try to mount the mfsmeta from a different client?

@mwaeckerlin

Yes, @4Dolio, /mnt/reserved exists, but is also empty. But the restore folder does not exist.

And yes, I already mounted it directly on the master host and on other hosts too. It's the same everywhere.

The normal filesystem works. All data seem to be there. Just meta is strange.

@mwaeckerlin

Could it be that deleting after the timeout is an expensive and slow operation? Could it be that I snapshot and delete faster (all data once per hour) than the data can be cleaned up? Could this be the reason for my crash two days ago, see #700?

Now I have removed the hourly snapshot and reduced to daily snapshots.

4Dolio commented May 19, 2018

The existence of metadata.mfs.tmp indicates the master was dumping that file, but it did not finish the dump and rename it to metadata.mfs. And the lock indicates it did not exit cleanly and clear the lock.

We don't know why it died, though, nor what unusual state that resulted in.

I have never used rremove, which you mentioned using, so I don't know what it does; it is too new for my experience. If you didn't modify trash times first, then you may still have 24 hours of retention after deletion, so you accumulate 24 × (object count per snapshot). Perhaps your trash object count is correct?

Maybe you need to settrashtime 0 before the rremove so the files get purged more quickly, and do not stack up and overload mfsmeta into your broken state?
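
Something along these lines (the snapshot path is just an example, not your actual layout):

# set trash time to 0 recursively on the old snapshot, then remove it,
# so its files skip the trash instead of sitting there for 24 hours
lizardfs settrashtime -r 0 /mnt/lizardfs/snapshots/hourly-2018051700
lizardfs rremove /mnt/lizardfs/snapshots/hourly-2018051700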

Maybe the broken state is only related to the crash?

The only thing I can suggest is to wait for it to clear out the trash... maybe a dev can jump in with other things to try...

mwaeckerlin commented May 19, 2018

Well, I had to hard-reset the computer, since it was no longer responsive.

rremove was introduced as a fast `rm -f` replacement. There is an issue about this topic somewhere here.

Yesterday I decreased the trash time from 24h to 1h. But there are still files in the trash that I removed on Thursday. I can't see those files, but I know they are there, because they have missing chunks and I can still get the list of missing chunks, and they are still listed. The missing chunks are from February, when my hard disk ran full.

Yeah, still waiting…

@guestisp

Do you have free space available?

@mwaeckerlin

Now, plenty:

$ lizardfs-admin info universum 9421
LizardFS v3.12.0
Memory usage:   9.9GiB
Total space:    89TiB
Available space:        27TiB
Trash space:    218TiB
Trash files:    18714600
Reserved space: 0B
Reserved files: 0
FS objects:     35706013
Directories:    1144199
Files:  33335611
Chunks: 8366008
Chunk copies:   16729323
Regular copies (deprecated):    16729323

I added two more 10TB disks… :)

4Dolio commented May 20, 2018

Your trash object count is lower at least... 18 million... it would be interesting to know if mfsmeta begins to work, and when...

@mwaeckerlin

I'll keep you up to date. All my services have been down since Thursday. I hope I'll get them back up today. It's a Docker swarm running on top of LizardFS, where the nodes are both LizardFS chunk-/master-/logger-servers and swarm nodes at the same time. That used to work well for some months, but currently Docker on the swarm master is no longer responsive, so I migrated to another node. The migration is still in progress.

You'll find my configuration here.

@guestisp

Are you using Lizard replicating over a Powerline? Seriously?

Powerline has huge and unstable latency...

mwaeckerlin commented May 20, 2018

@guestisp

Are you using Lizard replicating over a Powerline?

Not any more. After the problems described in #659, all nodes are now in the same location, connected through the same 1Gb switch.

mwaeckerlin commented May 20, 2018

→ I updated my blog post.

@mwaeckerlin

Now, trash space begins to stabilize:

LizardFS v3.12.0
Memory usage:   12GiB
Total space:    89TiB
Available space:        27TiB
Trash space:    82TiB
Trash files:    6042804
Reserved space: 0B
Reserved files: 0
FS objects:     22029983
Directories:    1109053
Files:  19695237
Chunks: 8377626
Chunk copies:   16749271
Regular copies (deprecated):    16749271

@mwaeckerlin

Now that I no longer create hourly backups and only delete one backup a day instead of one every hour, it's back to normal:

Trash space:    69MiB
Trash files:    6743

So: Deletion of snapshots is much too slow!

4Dolio commented May 22, 2018

Did your mfsmeta start working?

True. And unfortunate. Long, long ago, continuous snapshotting was floated. It can't be done yet, nor can snapshots really be used as rapidly as hourly.

@mwaeckerlin

Did your mfsmeta start working?

Yes!

4Dolio commented May 22, 2018

Ok, good. I wonder at what point it becomes broken?

dminca commented Jul 12, 2018

Another way of deleting trash files is to mount the MFSMETA filesystem and purge them manually.

First of all, allow that server to mount LizardFS by adding the server's IPv4 address to the exports file.

On LizardFS master

# mfsexports.cfg
172.26.3.1/32 .                rw

On server X

sudo mkdir -p /mnt/mfsmeta
sudo mfsmount -m /mnt/mfsmeta/ -o mfsmaster=lizardfs-master.org

You can trash/purge the deleted files from here:

/mnt/mfsmeta/trash
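
For example, to purge everything currently sitting in the trash (assuming the mount above; this is the same find -delete approach suggested earlier in this thread):

find /mnt/mfsmeta/trash -type f -print -delete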

@mwaeckerlin

Final conclusion: LizardFS is too slow at cleaning up for hourly snapshots (where the oldest snapshot is removed each hour). Daily snapshots (incl. cleanup) are not a problem.

If you need a backup script for your cron job, try mine:
https://mrw.sh/admin-scripts/backup/src/branch/master/lizardfs
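
A daily rotation along the lines of this conclusion could look roughly like the following crontab entry (paths and retention are illustrative; the actual script is the one linked above):

# snapshot once a day at 03:00 and drop the snapshot from a week ago
0 3 * * * lizardfs makesnapshot /mnt/lizardfs/live /mnt/lizardfs/snapshots/daily-$(date +\%F) && lizardfs rremove /mnt/lizardfs/snapshots/daily-$(date -d '7 days ago' +\%F)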

mwaeckerlin commented Nov 9, 2018

In the last update of my backup scripts, I accidentally re-enabled hourly snapshots, so it creates and deletes one snapshot per hour. Now my master server has been down for two days.

As additional information worth mentioning: even though the server has 40GB of RAM and normally needs ~23GB, memory is exhausted and it is swapping. So memory seems to be the limit.

I suppose this is not because of the snapshots, but due to the delete operations?

I just ordered another 32GB of RAM and I'll upgrade the server to 72GB this evening. Let's see how much this helps.
