Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support single files scrubbing #16018

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oshogbo
Copy link
Contributor

@oshogbo oshogbo commented Mar 21, 2024

Motivation and Context

Currently, we have various scrubbing approaches, including scrubbing an entire pool or scrubbing saved errors. However, the ability to target specific data for scrubbing can be particularly beneficial. For instance, one simple application of this is scrubbing a single file rather than the entire pool. One of the simplest examples is to scrub a single file instead of the whole pool. This scrubbing could be desired for files written prior to disk/controller errors or for older files that warrant inspection. For a large pools limiting the actual data that we have scrub can be quite usefull.

Description

This introduces the functionality of scrubbing data from files. Originally, the errloglist structure was utilized for tracking error blocks identified during scrub operations. Same structure can be used record blocks from specific files. By generating these records we can simbply scurb it. This allows scrubbing a single file, leveraging existing infrastructure.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

How Has This Been Tested?

A new test was added.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@oshogbo oshogbo force-pushed the oshogbo/scrub_single_file branch 3 times, most recently from 80cb074 to adca54a Compare March 21, 2024 20:05
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 27, 2024
@tonyhutter
Copy link
Contributor

  1. The two test cases are nearly identical except one makes a 10MB file and the other makes a 1GB file. Could you combine the tests?
  2. It would be desirable to have a test case where you scramble a file and they verify the file scrub actually repairs it.
  3. What is the behavior if you are actively scrubbing a file and someone else tries to scrub the pool (and vice versa)?

@oshogbo
Copy link
Contributor Author

oshogbo commented Mar 29, 2024

The two test cases are nearly identical except one makes a 10MB file and the other makes a 1GB file. Could you combine the tests?

Yes, I can do that.

It would be desirable to have a test case where you scramble a file and they verify the file scrub actually repairs it.

Ok, I can try to add such.

What is the behavior if you are actively scrubbing a file and someone else tries to scrub the pool (and vice versa)?

The file will be added to the error list; however, the scrub will fail, as you can't start another scrub.

man/man8/zfs-scrub.8 Outdated Show resolved Hide resolved
@tonyhutter
Copy link
Contributor

What is the behavior if you are actively scrubbing a file and someone else tries to scrub the pool (and vice versa)?

The file will be added to the error list; however, the scrub will fail, as you can't start another scrub.

Just so I'm understanding correctly - if I do:

zfs scrub /tank/huge_file
zpool scrub tank

...the zpool scrub will fail and the zfs scrub will continue, with /tank/huge_file being put in the error list? Is that correct? If it's in the error list, will /tank/huge_file be listed as having errors if I do zpool status -v ?

And conversely, if I do:

zpool scrub tank
zfs scrub /tank/huge_file

The zfs scrub will fail?

If this is correct, then we should print an error in both cases saying something to the effect of "I can't do that, there's a scrub in progress". We should mention this behavior in the man pages as well.

Some other scenarios that come to mind:

  1. What is the behavior if you do a zpool scrub, pause the scrub, then scrub a file? Does the "paused state" get canceled when you scrub the file, or can you resume the zpool scrub after scrubbing the file?

  2. If you start a zfs scrub is there any way to cancel the scrub? Will zpool scrub -s cancel it?

@oshogbo
Copy link
Contributor Author

oshogbo commented Apr 24, 2024

...the zpool scrub will fail and the zfs scrub will continue, with /tank/huge_file being put in the error list? Is that correct? If it's in the error list, will /tank/huge_file be listed as having errors if I do zpool status -v ?

Yes, that is correct. You will see the file on error list.

zpool scrub tank
zfs scrub /tank/huge_file

The zfs scrub will fail?

Yes, as there is already a running scrub.

If this is correct, then we should print an error in both cases saying something to the effect of "I can't do that, there's a scrub in progress". We should mention this behavior in the man pages as well.

I have added that only one scrub can work.

$ sudo ./zfs scrub /ztest1/t ; sudo zpool scrub ztest1
cannot scrub ztest1: currently resilvering

What is the behavior if you do a zpool scrub, pause the scrub, then scrub a file? Does the "paused state" get canceled when you scrub the file, or can you resume the zpool scrub after scrubbing the file?

So currently the file will be added to the error list and the scrub will be still paused.

If you start a zfs scrub is there any way to cancel the scrub? Will zpool scrub -s cancel it?

Yes, its a normal scrub it behaves the same.

This introduces the functionality of scrubbing data from
files. Originally, the `errloglist` structure was utilized
for tracking error blocks identified during scrub operations.
Same structure can be used record blocks from specific files.
By generating these records we can simbply scurb it.
This allows scrubbing a single file, leveraging existing
infrastructure.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Comment on lines +64 to +67
log_must zinject -a -t data -C 0,1 -e io $mntpnt/file

log_must zfs scrub $mntpnt/file
log_must is_pool_without_errors $TESTPOOL true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the test corrupts the first two DVAs of $mntpnt/file then I would expect scrub to say "I repaired data" rather than "everything was fine". Does this check that scrub repaired data here?

Comment on lines +1976 to +1977

function is_pool_without_errors #pool <verbose>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this return true if there were no errors, or does it return true if there were errors but they were all repaired?

@tonyhutter
Copy link
Contributor

If it's in the error list, will /tank/huge_file be listed as having errors if I do zpool status -v ?

Yes, that is correct. You will see the file on error list.

Hmm.. I'm worried that the user will think their files are corrupted when they really arn't. Is there a way around that?

Also, minor typo in commit message:

By generating these records we can simbply scurb it.

Lastly, are you planning to add delegation support for zfs scrub? If so, I could see some potential issues there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants