btrfs-progs: scrub: show the scrubbing rate and estimated time to finish #177
Conversation
The estimation is based on the allocated bytes, so it might be overestimated.

Example output:

    scrub status for xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
            scrub started at Fri May 31 15:56:57 2019, running for 0:04:31
            total 62.55GiB scrubbed at rate 236.37MiB/s, time left: 0:12:31
            no errors found

Signed-off-by: Grzegorz Kowal <grzegorz@amuncode.org>
I think the numbers match: scrub needs to go through all physical copies, and this is stored in the total_bytes and used_bytes in dev info. Naturally, the IO performance can vary, so the average speed can be off if it's based on just one measurement, but as it is now I think it's ok.
So the device info value 'total_used' cannot be used, because it's the chunk size and not the amount of data used in the chunks, i.e. the same as what's printed by 'fi df' as total, while we're interested in 'used'. This can be obtained by get_df. I've also reworked the output, so it now looks a bit more readable and adds some fancy stuff:
For reference, this is 'fi df':
I've run it several times; the average time is 18 minutes and the estimate is not bad, usually off by one or two minutes.
I thought that even an overestimated time left, based on the size of allocated chunks, is better than nothing, and I was planning to find out later how to get the used space. Anyway, on an active filesystem the used-space information does not guarantee a more precise estimate of the time left: not only can current I/O activity affect the estimate, but some already scrubbed data might also be removed during the scrubbing process. This could result in a negative time left at the end of the scrub, since the total bytes to scrub would decrease because of the removed space, while they would still be counted in the scrubbed bytes. I am not an expert, so maybe this is not completely true. Still, any estimate, even a rough one, is more useful than no estimate at all.

BTW, the new output is indeed much more readable.
I'll merge your patch as it works, so you get the credit, and then add mine that updates the output and calculations.
The estimation is based on the allocated bytes, so it might be overestimated.

Scrub reports the size of all bytes scrubbed, taking into account the replication, so we're comparing that with total sum over all devices that we get from DEV_INFO, in the same units.

Example output:

    scrub status for xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
            scrub started at Fri May 31 15:56:57 2019, running for 0:04:31
            total 62.55GiB scrubbed at rate 236.37MiB/s, time left: 0:12:31
            no errors found

Pull-request: #177
Signed-off-by: Grzegorz Kowal <grzegorz@amuncode.org>
Signed-off-by: David Sterba <dsterba@suse.com>