Strange chunk distribution and replication. #521
As for the increased space usage, it could be a lot of things. My first idea is sparse files - perhaps the old filesystem on the magnetic hard drives had them and the new one does not. You can try running the fallocate command mentioned in issue #370 and see if space usage drops. If it's not that, it may still be something with the filesystems. What was mounted before and what is mounted now? If the number of chunks is the same, it doesn't really look like an internal LizardFS problem (I don't completely rule it out, though). As for question 2: changing the goal to 2 and then back to 3 won't kickstart replication. Here are the master's configuration entries that would (mfsmaster.cfg):
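A sketch of the entries in question - the option names are real LizardFS settings, while the values shown are only the usual defaults, not a recommendation:

```
# mfsmaster.cfg - settings that control when and how fast replication runs
OPERATIONS_DELAY_INIT = 300         # seconds after master start before replication begins
OPERATIONS_DELAY_DISCONNECT = 3600  # seconds after a chunkserver disconnect before re-replication
CHUNKS_WRITE_REP_LIMIT = 2          # max replications written to one chunkserver per loop
CHUNKS_READ_REP_LIMIT = 10          # max replications read from one chunkserver per loop
CHUNKS_LOOP_PERIOD = 1000           # milliseconds between chunk loop passes
CHUNKS_LOOP_MAX_CPU = 60            # percentage of CPU the chunk loop may consume
```

Lowering the delays and raising the limits makes the master schedule replications sooner and in larger batches.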
Hi @psarna, The FS and mount parameters for the drives are identical between the magnetic and SSD drives. I create a single GPT partition and run mkfs.xfs on it without arguments. The chunk drives are mounted with the recommended "rw,noexec,nodev,noatime,nodiratime,largeio,inode64,barrier=0". I'll take a look to see whether sparse files are in use, but I don't believe so. The dataset should be roughly the 550 GB we were seeing before. If I can get replication started again, I can attempt to determine whether disk geometry or the number of disks behaves differently. The chunk loop parameters are configured to permit a high rate of replication without affecting client I/O. The first chunk server to replicate began doing so immediately and at a high rate. Our current settings:
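Along these lines in mfsmaster.cfg (the option names are real; the numbers below are placeholders rather than our exact values):

```
# mfsmaster.cfg - chunk loop tuned for aggressive replication (placeholder values)
CHUNKS_LOOP_PERIOD = 500       # run the chunk loop twice a second
CHUNKS_LOOP_MAX_CPU = 90       # let the loop use most of a core
CHUNKS_WRITE_REP_LIMIT = 50    # many concurrent replication writes per chunkserver
CHUNKS_READ_REP_LIMIT = 100    # ...and reads
```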
Here's the replication rate graph. Our first chunk server finished replicating around 18:00, taking nearly a week to complete. The second chunk server has not yet started, nearly 12 hours later.
I was able to spend a little more time investigating this issue. Sparse files are definitely involved, but they are equally involved on all chunk servers (both those with 500 GB of use and those with 1.2 TB). To make things easier: call the first converted chunk server (the one with too much data on it) chunk server A, and the second one (the one that won't replicate) chunk server B.
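In case it's useful, this is roughly how I'd measure sparseness - the mount points are illustrative, substitute the paths from your mfshdd.cfg:

```sh
# Allocated (on-disk) vs. apparent (logical) size of the chunk data:
du -sh /mnt/chunk*
du -sh --apparent-size /mnt/chunk*

# GNU find's %S prints a sparseness ratio (allocated/apparent); values
# near 1.0 mean a chunk file is not sparse at all:
find /mnt/chunk* -name 'chunk_*.mfs' -printf '%S %p\n' | sort -rn | head

# The fallocate pass mentioned in #370 (presumably --dig-holes, which
# deallocates zero-filled regions). Try a single file before sweeping a disk:
find /mnt/chunk* -name 'chunk_*.mfs' -exec fallocate --dig-holes {} +
```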
Observations so far:
Would it be ill-advised to mark all disks on chunk server A, which has too much data on it, as evicted, to see what happens when the master moves data off those disks? I'm curious whether chunk server B will then replicate the undergoal chunks or will still refuse to. I'm tempted to rebuild both chunk servers A and B from scratch, but during that time we'd have only one copy of each chunk. This will also complicate maintenance if we can't take chunk servers offline to upgrade or expand them without completely rebuilding them.
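For reference, marking disks for removal is done in the chunk server's mfshdd.cfg by prefixing each path with an asterisk (paths illustrative):

```
# mfshdd.cfg on chunk server A - '*' marks a disk for removal, so the
# master replicates its chunks elsewhere before the disk is retired
*/mnt/chunk01
*/mnt/chunk02
/mnt/chunk03
```

After editing, reload the chunk server (e.g. mfschunkserver reload) and the master should begin draining the marked disks.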
Try turning down the OPERATIONS_DELAY_* values to 0, perhaps.
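That would look like this in mfsmaster.cfg; zeroing both delays makes the master schedule replications immediately instead of waiting out its grace periods:

```
# mfsmaster.cfg
OPERATIONS_DELAY_INIT = 0        # don't wait after the master starts
OPERATIONS_DELAY_DISCONNECT = 0  # don't wait after a chunkserver disconnect
```

The master should pick the change up on reload (e.g. mfsmaster reload).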
Did you resolve the undergoal replication issue, @dogshoes?
I'm running a LizardFS cluster with three chunk servers holding approximately 18 million files. The goal is "3". This is a pre-production cluster I'm evaluating.
I'm converting the chunk servers one by one from magnetic media to SSD. I took the first chunk server down and performed the maintenance: removed the four old magnetic drives and installed the seven new SSDs. I started the server, it rejoined the cluster, and chunks were replicated to it. About halfway through, I noticed that space was being consumed on this chunk server at twice the expected rate, and the trend continued until replication finished. The two untouched nodes had 509 GiB of data each, while the new node had 1.2 TiB, despite all of them holding the same number of chunks: 19512380.
I made note of the oddity and moved on to the second chunk server. Similar maintenance was performed and the node was brought back online. However, no chunks are being replicated to this updated chunk server, and all of our chunks are now being reported as undergoal.
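One way to watch this from the master, assuming the lizardfs-admin tool is installed (host and port are illustrative - 9421 is the usual client port):

```sh
# Per-goal counts of safe / endangered / lost chunks as the master sees them:
lizardfs-admin chunks-health localhost 9421

# General cluster state, including connected chunkservers:
lizardfs-admin info localhost 9421
```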
In an attempt to kickstart replication again, I used mfssetgoal to change the goal from 3 to 2 and back to 3. The chunks went from undergoal to stable and back to undergoal, but no replication to the new node started.
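Concretely, something like this, with the mount point being illustrative:

```sh
# Drop the goal to 2 and restore it to 3 across the whole tree:
mfssetgoal -r 2 /mnt/lizardfs
mfssetgoal -r 3 /mnt/lizardfs

# Spot-check how many copies a single file's chunks actually have:
mfscheckfile /mnt/lizardfs/path/to/file
```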
Since then, some data has been written to the cluster and those chunks are stable and are present on all three chunk servers. The only chunks present on the second chunk server are those written since the drives were swapped.
No errors are reported on the chunk server or on our metadata master. The mfschunkserver daemon has set up the expected directory structure on the new drives.
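By "expected structure" I mean the 256 hex-named subdirectories the chunk server creates on each data disk (path illustrative):

```sh
ls /mnt/chunk01
# 00  01  02  ...  FD  FE  FF   - chunk files land in these as chunk_<id>_<version>.mfs
```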
Not sure what's going on. Any advice?