zfs send - quota bug - userused underflow #3789
We are seeing this bug as well, except in our case I don't know exactly how to reproduce it. It looks like it happens when files are added/removed, and then in some cases userused is decremented too much, or it is not incremented enough when it should be. But, like I said, unfortunately at this moment I am not sure how to reproduce it. It is a very big problem, however, as more and more of our users are getting "Disk quota exceeded" |
I'm also seeing it on a few datasets.

```
# zfs userspace rubyweapon/home/dhe/private -p
TYPE        NAME  USED                  QUOTA
POSIX User  root  18446744073708409344  none
POSIX User  dhe   4818886144            none
```

Which is about equal to

I went through snapshots looking for a possible time indicator of when the issue began. The most recent snapshot I have is dated 2015-08-22; the bootup/import date was 2015-08-04. That might help with guessing when it started.

One other thing: "find $mountpoint -uid 0" on a snapshot returns nothing, but userspace on the snapshot shows root has 1.5k of space. Is that pool metadata? |
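For reference, a USED value like 18446744073708409344 is exactly what an unsigned 64-bit counter looks like after it has been decremented below zero: the subtraction wraps around to 2^64 minus the excess. A minimal C demonstration (the 1142272-byte delta here is simply whatever net amount happened to be freed after the counter had already reached zero):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t used = 0;		/* per-user accounting already at zero */
	uint64_t freed = 1142272;	/* bytes freed that were never charged to the user */

	used -= freed;			/* unsigned arithmetic wraps around instead of going negative */
	printf("%llu\n", (unsigned long long)used);
	/* prints 18446744073708409344, i.e. 2^64 - 1142272 */
	return 0;
}
```

The same wraparound is why the broken values always show up as numbers just below 2^64 rather than as small negative ones.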
I did a bit of digging to get to the root cause. The |
Just to be clear: I am seeing this symptom on a dataset that is not on the receiving end of zfs send/receive. Do you think it can be the same issue? |
If it matters, my dataset had send/recv done in the past, but that was last year at least. |
Let me clarify the problem I found a bit further: the specific issue I referred to above causes the entries in the user/group tracking objects not to be incremented as objects are restored to the dataset. The effect is as if they're all set to zero. As soon as storage is allocated in the newly-received filesystem, entries are created for the appropriate user/group and are incremented appropriately. Similarly, as storage is freed, entries are decremented, which is what leads to the negative values. I've only researched the exact scenario laid out by @mw171, which involves a send/receive, and have found the root cause of the problem, which is outlined earlier in this issue. Since most of the code to do the work already exists, I'm working up a patch which will add a command to recompute the usage values for a specific filesystem. This will make it possible to correct things if they ever get out of sync. I expect to have this working tomorrow once I get a chance to finish it up. It would certainly be interesting to know of any other situations under which the usage values get out of sync. |
Re-reading my comment, even further clarification is needed w.r.t. the phrase "as if they're all set to zero": This is only true in the context of a |
@dweeezil: Great! At the moment all we can do is run the filesystems with bad used counts without userquota, so a "usage-recompute command" would be a great help for us. Thanks! |
You accidentally closed this. |
sorry, wrong button... |
Here's an update on the insane hack I've whipped up to fix the user/group usage tables: it would be nice to develop a more proper and robust patch to rebuild the usage tables, but in the meantime the https://github.com/dweeezil/zfs/tree/userused branch contains one of the more wretched hacks I've ever developed. It adds a new module parameter, zfs_dmu_userused_action, and a zfs userspace_rebuild subcommand.
The module parameter is global and will impact all operations. In other words, this is a hack of epic proportions and must only be used when the pool is otherwise quiescent. I wrote it mainly to understand how these tables are managed and not to provide a "production-ready" way to rebuild them. I do have an idea in mind as to how a proper solution could be implemented but am not sure whether it's worth the effort. Mode 1 causes the values to be cleared, mode 2 causes them to be recomputed, and mode 0 is normal tracking. There may be ways other than the send/receive mentioned in the original report in which the tables can become out of sync with reality. The usual disclaimer applies to this patch, but with less dire warnings: it will very likely not eat your pool, but if used improperly it will definitely mess up your user/group used tables. |
@dweeezil awesome work! We have quite a few (864 at the moment) users that have underflowed, and probably quite a few users with decreased quota counts that haven't underflowed yet. As long as the bug is not fixed, however, recalculating would not really help, since it will just get out of sync again. In any case, I would really love there to be a "production-ready" solution. I am not sure how else we could recover from this. |
@tjikkun So far, I'm not aware of any other operation which causes the values to become out of sync. How old is your pool and at what spa version was it created? How were the filesystems created and at what zpl version? Have your user/group values been out of sync for a long time? Have you got any idea when they became corrupted, and whether that would correlate with a specific version of the zfs code? |
@dweeezil Well, it happens on pools of varying ages; some are only a little more than a week old, some are a year old. Not sure exactly what version we started at; most probably it was 0.6.3. |
@behlendorf While tracking down the initially reported problem caused by a3000f9, I'm going to revert it locally and make sure I can reproduce #3443. Maybe that can shed some light on the manner in which |
Reverted the offending commit, a3000f9, from master until we have a better fix. We'll do the same for the 0.6.5.2 release. |
Can you try the attached patch? I do not have a physical ZFS test rig right now, so this has only gone through ztest. It should go through the ZFS test suite before being committed, especially given that ztest does not test the objset eviction path at all. Justin |
Hi @scsiguy, the attachment didn't seem to make it to github. |
You can pull it from this FreeBSD bug report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202607 Justin |
Like comment #3789 (comment) we too have lots of users with the wrong quota. What are the chances of a command like |
@tjikkun My hack in https://github.com/dweeezil/zfs/tree/userused does actually work. As I mentioned, however, it is not production-ready and must be used when no other access (actually, no new file creation or removal) is being performed on a filesystem. Here's a little script I used to test it:
That script will wipe all the used values and allow them to be recomputed. Once they're wiped, echo 2 into zfs_dmu_userused_action and run "zfs userspace_rebuild tank/whatever" and it'll recompute the values. After it's done, change the parameter back to 0 and it should be fixed. |
I have a ZFS test system on order and will try to root cause this problem once it arrives. |
@scsiguy I'm not sure the original problem happens any more. At least I've not seen it happen with current master code. |
@dweeezil While I appreciate that work very much, for me the problem exists in a production environment. My guess is that a userspace recalculation would take about as much time as a scrub. If that is so, there is no way I can keep those systems offline for that long. |
@tjikkun: same here.
|
@scsiguy yes, but only with an already affected zfs:
|
@mw171 so you have explicitly tested against a "clean pool" and been unable to repro the problem? |
@scsiguy yes, just tried once more:
|
@tjikkun Can you expand on why augmenting scrub to fix this corruption won't work for you? Scrub doesn't prevent access to the pool, and it could be a special version of scrub that only recalculates user/group used without doing a full read/verify of user data. |
Re: #3789 (comment)
|
@scsiguy augmenting scrub would be totally awesome. If that can be done that would be best. |
Unfortunately an underflow is still possible:
|
Same with 0.6.5-35_g5c79067. It would be very helpful in repairing a damaged ZFS if underflows of userused could be prevented. Please help! |
What do you think of the following little patch?

```
*** zfs/module/zfs/zap.c.orig	Tue Oct  6 16:58:34 2015
--- zfs/module/zfs/zap.c	Sun Nov 22 16:53:58 2015
***************
*** 1137,1142 ****
  	if (value == 0)
  		err = zap_remove(os, obj, name, tx);
! 	else
  		err = zap_update(os, obj, name, 8, 1, &value, tx);
  	return (err);
  }
--- 1137,1144 ----
  	if (value == 0)
  		err = zap_remove(os, obj, name, tx);
! 	else {
! 		if (value >> 63) value = 0;
  		err = zap_update(os, obj, name, 8, 1, &value, tx);
+ 	}
  	return (err);
  }
```

It does the job for me:
|
From a style standpoint you probably want to write |
Checking back in on this issue: First off, we definitely don't want a patch in the generic ZAP code as @mw171 is proposing since it would affect all other ZAP operations. If we really wanted a hack, IIRC, there is still some condition under which root's quota can underflow as shown above. I'm not sure why this happens and I've not tried to track it down. AFAIK, this problem doesn't happen for any non-root users with the current code. It would be nice to track down the problem with root, but it would also be interesting to know if anyone can cause an underflow in a non-root user on a fresh filesystem with current code. |
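To make the distinction concrete: the patch above clamps in the generic ZAP code, so every caller of that path would see the new behaviour (the `value >> 63` test simply checks whether the top bit is set, i.e. whether the counter has wrapped below zero). A clamp that only affects the user/group accounting would instead live in the accounting code that calls into the ZAP. A purely illustrative sketch, using a made-up helper name rather than any actual ZFS function:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not actual ZFS code: apply a signed delta to an
 * unsigned userused counter and clamp at zero rather than letting the
 * value wrap around to a number just below 2^64.
 */
static uint64_t
userused_apply_delta(uint64_t used, int64_t delta)
{
	uint64_t next = used + (uint64_t)delta;	/* wraps if it would go below zero */

	if (delta < 0 && next > used)
		return (0);			/* wrapped past zero; clamp instead */
	return (next);
}
```

Clamping at this level keeps zap_update()/zap_increment() behaving generically for every other ZAP consumer, which is the objection raised above.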
@dweeezil: Is there any zap operation in which it makes sense that an underflow can happen? |
@tjikkun Since the
I wish these functions didn't have such generic names because they might be tempting to use for ZAPs in other places in which this behavior wouldn't be appropriate. |
@dweeezil if I already have a faulty userused of e.g. 2^64-1, how could I get a corrected value near zero with your patch? |
We've been hit by this on several servers and had to disable quotas on some as well as resort to du to calculate usage on the affected filesystems. This isn't a show-stopping bug, but it is a really annoying one. Is there a recommended method of fixing this once a user is affected or an ETA for a method to fix it? Some of the affected servers are larger than 250 TB, so the du workaround is really painful. |
Closing; to my knowledge this issue hasn't been observed in the 0.7.x series or newer. |
hi,
there seems to be a bug in the userused computation under 0.6.5:
Regards,
Martin