
btrfs-progs: scrub: show the scrubbing rate and estimated time to finish #177

Closed
wants to merge 1 commit

Conversation

gkowal (Contributor) commented Jun 4, 2019

The estimation is based on the allocated bytes, so it might be
overestimated.

Example output for running scrub:

scrub status for xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
scrub started at Fri May 31 15:56:57 2019, running for 0:04:31
total 62.55GiB scrubbed at rate 236.37MiB/s, time left: 0:12:31
no errors found

Signed-off-by: Grzegorz Kowal <grzegorz@amuncode.org>
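
For illustration, a minimal sketch of the estimate described above, assuming the bytes scrubbed so far, the total bytes to scrub and the elapsed seconds are already known; the names are hypothetical, not the ones used in btrfs-progs:

    #include <stdint.h>

    /*
     * Sketch only: derive the average scrub rate and the remaining time
     * from the progress counters. 'total' is the allocated-bytes total
     * taken from DEV_INFO, so the result may overestimate the time left.
     */
    static void scrub_estimate(uint64_t scrubbed, uint64_t total,
                               uint64_t elapsed_sec,
                               uint64_t *rate, uint64_t *sec_left)
    {
            /* Average rate over the whole run, in bytes per second. */
            *rate = elapsed_sec ? scrubbed / elapsed_sec : 0;

            /* Remaining bytes divided by the rate, clamped at zero so a
             * shrinking filesystem never yields a negative time left. */
            if (*rate && total > scrubbed)
                    *sec_left = (total - scrubbed) / *rate;
            else
                    *sec_left = 0;
    }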

kdave added this to the v5.1.2 milestone Jun 13, 2019
kdave (Owner) commented Jun 13, 2019

I think the numbers match; scrub needs to go through all physical copies, and this is stored in the total_bytes and used_bytes in dev info.

Naturally the I/O performance can vary, so the average speed can be off if it's based on just one measurement, but as it is now I think it's OK.

kdave (Owner) commented Jun 14, 2019

So the device info value 'total_used' cannot be used, because it's the chunk size and not the amount of data used in the chunks, i.e. the same as what's printed in 'fi df' as total, while we're interested in 'used'. This can be obtained by get_df.
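
As a rough sketch of where those 'used' numbers come from, this is roughly what a get_df-style helper boils down to, using the BTRFS_IOC_SPACE_INFO ioctl that 'fi df' is built on; the helper name is made up and error handling is minimal:

    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/btrfs.h>

    /* Hypothetical helper: sum used_bytes over all space infos, i.e. the
     * values printed as 'used=' by 'btrfs fi df'. Real code may want to
     * skip the GlobalReserve entry. */
    static int sum_used_bytes(int fd, uint64_t *used_ret)
    {
            struct btrfs_ioctl_space_args args = { .space_slots = 0 };
            struct btrfs_ioctl_space_args *sargs;
            uint64_t used = 0;
            uint64_t i;

            /* First call with zero slots only reports how many there are. */
            if (ioctl(fd, BTRFS_IOC_SPACE_INFO, &args) < 0)
                    return -1;

            sargs = calloc(1, sizeof(*sargs) + args.total_spaces *
                           sizeof(struct btrfs_ioctl_space_info));
            if (!sargs)
                    return -1;
            sargs->space_slots = args.total_spaces;

            if (ioctl(fd, BTRFS_IOC_SPACE_INFO, sargs) < 0) {
                    free(sargs);
                    return -1;
            }
            for (i = 0; i < sargs->total_spaces; i++)
                    used += sargs->spaces[i].used_bytes;

            free(sargs);
            *used_ret = used;
            return 0;
    }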

I've reworked the output too, so now it looks a bit more readable and adds some fancy stuff:

UUID:             bf8720e0-606b-4065-8320-b48df2e8e669
Scrub started:    Fri Jun 14 19:49:47 2019
Status:           running
Duration:         0:14:11
Time left:        0:04:04
ETA:              Fri Jun 14 19:53:51 2019
Total to scrub:   182.55GiB
Bytes scrubbed:   141.80GiB
Rate:             170.63MiB/s
Error summary:    csum=7
  Corrected:      0
  Uncorrectable:  7
  Unverified:     0

For reference, this is 'fi df':

Data, single: total=261.00GiB, used=179.91GiB
System, single: total=32.00MiB, used=48.00KiB
Metadata, single: total=5.00GiB, used=2.64GiB
GlobalReserve, single: total=375.23MiB, used=0.00B

I've run it several times; the average time is 18 minutes and the estimate is not bad, usually off by one or two minutes.
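
For illustration, the 'ETA' line above is simply the remaining seconds added to the current time; a minimal sketch, not the actual btrfs-progs code:

    #include <stdio.h>
    #include <time.h>

    /* Illustration: print "now + seconds left" as a local timestamp. */
    static void print_eta(unsigned long long sec_left)
    {
            char buf[64];
            time_t eta = time(NULL) + (time_t)sec_left;

            strftime(buf, sizeof(buf), "%c", localtime(&eta));
            printf("ETA:              %s\n", buf);
    }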

gkowal (Contributor, Author) commented Jun 15, 2019

I thought that even an overestimated time left based on the size of the allocated chunks is better than nothing, and was planning to find out later how to get the used space.

Anyway, on an active filesystem the info about the used space does not guarantee a more precise estimate of the time left: not only can the current I/O activity affect the estimate, but some already scrubbed data might also be removed during the scrubbing process. That could result in a negative time left at the end of scrubbing, since the total bytes to scrub would decrease because of the removed space while they would still be counted in the scrubbed bytes. I am not an expert, so maybe this is not completely true. Still, any estimate, even a rough one, is more useful than no estimate at all.

BTW, the new output is indeed much more readable.

kdave (Owner) commented Jun 19, 2019

I'll merge your patch as it works, so you get the credit, and then add mine that updates the output and calculations.

kdave closed this Jun 19, 2019
kdave pushed a commit that referenced this pull request Jun 19, 2019
The estimation is based on the allocated bytes, so it might be
overestimated.  Scrub reports the size of all bytes scrubbed, taking
into account the replication, so we're comparing that with total sum
over all devices that we get from DEV_INFO, in the same units.

Example output:

scrub status for xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
        scrub started at Fri May 31 15:56:57 2019, running for 0:04:31
        total 62.55GiB scrubbed at rate 236.37MiB/s, time left: 0:12:31
        no errors found

Pull-request: #177
Signed-off-by: Grzegorz Kowal <grzegorz@amuncode.org>
Signed-off-by: David Sterba <dsterba@suse.com>
kdave pushed a commit that referenced this pull request Jul 3, 2019
kdave modified the milestones: v5.1.2, v5.2 Jul 26, 2019