Disable LBA weighting on files and SSDs - extend to SMR drives #10182

Closed
Stoatwblr opened this issue Apr 4, 2020 · 11 comments
Labels
Status: Stale (No recent activity for issue) · Type: Performance (Performance improvement or performance problem)

Comments

@Stoatwblr

The original change (disabling LBA weighting on files and SSDs) was commit fb40095.

Owing to the way that SMR drives work, LBA weighting is extremely counterproductive.

Unfortunately there are a lot of SMR drives in CMR clothing out there, and the number is increasing. Making things worse, many of them don't report themselves as zoned devices.

However, there's a ray of sunshine: any rotational device which reports TRIM functionality is definitely SMR (or at least using zones internally) and should not be LBA-weighted. (Example: WDx0EFAX (RED) drives; the EFRX models are CMR.)

Detecting SMR-but-not-trimmable drives is a bit harder (Example: Seagate ST3000DM003)

behlendorf added the Type: Performance (Performance improvement or performance problem) label on Apr 6, 2020
@behlendorf
Contributor

You can disable LBA weighting by setting the module option metaslab_lba_weighting_enabled=0. If you find this to be beneficial for your devices we could consider adding some logic to metaslab_space_weight() which infers that rotational devices which support trim are almost certainly SMR devices.
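
If that route is taken, the check could sit next to the existing rotational test in the metaslab weighting code. A minimal sketch of the inference (a hypothetical helper, not something in the tree today; the field names vdev_nonrot and vdev_has_trim are my assumption of what the vdev structure exposes at that point):

```c
/*
 * Hypothetical helper: decide whether the LBA weighting bonus should
 * apply to a top-level vdev.  A rotational device that advertises TRIM
 * is almost certainly a drive-managed SMR disk, so treat it like an SSD
 * and skip the weighting.
 */
static boolean_t
metaslab_lba_weighting_applies(const vdev_t *vd)
{
	if (!metaslab_lba_weighting_enabled)
		return (B_FALSE);	/* disabled via the module option */
	if (vd->vdev_nonrot)
		return (B_FALSE);	/* SSD or file-backed vdev */
	if (vd->vdev_has_trim)
		return (B_FALSE);	/* rotational + TRIM: likely DM-SMR */
	return (B_TRUE);
}
```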

@Stoatwblr
Author

Stoatwblr commented Apr 16, 2020

Unless/until I can successfully get my drives to resilver and stay in the array (see ticket #10214 and the recent Blocks & Files postings), I can't benchmark this. (I have WD40EFAX drives too.)

However, if you take the time to sit through Manfred Berger's presentation on SMR (OpenZFS forum, Paris 2015: https://www.youtube.com/watch?v=a2lnMxMUxyc) you'll understand why LBA weighting is utterly valueless on SMR drives, and quite likely to be harmful, because it causes the drives to generate massively large indirection tables.

Essentially, an SMR zone is the mechanical equivalent of an SSD block, and the drive translates every LBA you send it to somewhere within the zones. It doesn't matter where you THINK you're putting data on the drive; the drive is making up its own mind about where it ends up. There is a complete disconnect between LBA space and the actual position on the platters.

Adding to the confusion, the CMR (conventional, non-shingled magnetic recording) landing space for writes can be analogised to a TLC/QLC SSD's SLC write cache: data sits there until the drive decides on a final resting place during quiet periods (or until the CMR zone fills up).

FWIW: reports of resilvering on DM-SMR drives without the WD RED firmware bug all indicate that throughput either grinds to a near halt (kilobytes per second) or actually stops for extended periods whilst the CMR zone is flushed out. As that zone can be up to 100GB in size, the flush can take quite a while.

@Stoatwblr
Author

Stoatwblr commented Apr 20, 2020

After some more digging, I think LBA weighting is essentially valueless beyond the first few LBAs even on CMR drives.

The reason is the way sectors are addressed:

Explanation from
https://github.com/westerndigitalcorporation/DiskSim/blob/master/STABLE/DiskSim_Linux_Generic_2.01.016/diskmodel/doc/dm_manual.pdf

1.4.2 G1 Layout
dm layout g1 LBN-to-PBN mapping scheme (int, required)

---
This specifies the type of LBN-to-PBN mapping used by the disk. 0 indicates that the conventional
mapping scheme is used: LBNs advance along the 0th track of the 0th cylinder, then along the
1st track of the 0th cylinder, thru the end of the 0th cylinder, then to the 0th track of the 1st
cylinder, and so forth. 1 indicates that the conventional mapping scheme is modified slightly, such
that cylinder switches do not involve head switches. Thus, after LBNs are assigned to the last track
of the 0th cylinder, they are assigned to the last track of the 1st cylinder, the next-to-last track of
the 1st cylinder, thru the 0th track of the 1st cylinder. LBNs are then assigned to the 0th track of
the 2nd cylinder, and so on (“first cylinder is normal”). 2 is like 1 except that the serpentine pattern
does not reset at the beginning of each zone; rather, even cylinders are always ascending and odd
cylinders are always descending.
---

In other words: LBA-to-platter/head allocation runs along the platters before switching heads, and may (or may not) run in a serpentine manner to avoid the actuator having to seek from one extreme of the platter to the other when changing heads during a sequential write/read.

Which in turn means that LBA weighting is only valuable for the first N tracks of the first platter; beyond that you have no idea what the speed will actually be. Speed may decrease to a minimum with each track step and then increase to a peak before decreasing again (serpentine pattern), or it may decrease to a minimum and then snap back to peak speed as the head switches to track 0 of the next platter (traditional linear pattern).
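
To make the two patterns concrete, here's a toy model (my own illustration with made-up geometry, not DiskSim code, and with zone resets ignored) of which head consecutive track-sized chunks of LBNs land on under the conventional and serpentine schemes:

```c
#include <stdio.h>

#define NTRACKS 8	/* surfaces (heads) per cylinder -- arbitrary for the demo */

/* Return the surface (head) a given track-sized chunk of LBNs lands on. */
static int head_for_chunk(long chunk, int layout_type)
{
	long cyl = chunk / NTRACKS;
	int pos = (int)(chunk % NTRACKS);

	if (layout_type == 0)	/* type 0: conventional, always ascending */
		return pos;
	/* types 1/2: serpentine -- even cylinders ascend, odd cylinders descend */
	return (cyl % 2 == 0) ? pos : NTRACKS - 1 - pos;
}

int main(void)
{
	for (long chunk = 0; chunk < 3 * NTRACKS; chunk++)
		printf("chunk %2ld -> cylinder %ld, head %d (conventional) / head %d (serpentine)\n",
		    chunk, chunk / NTRACKS,
		    head_for_chunk(chunk, 0), head_for_chunk(chunk, 1));
	return 0;
}
```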

We, as end users, do not know how the disk is laid out, are not expected to be able to find out, and are unlikely to be able to do so by any means other than benchmarking the drive from end to end to build a response histogram.

(NB: the terminology "cylinder" in the document likely derives from the ancestral device, which was a rotating magnetic drum (cylinder) rather than a flat platter; a "cylinder" in this context is one face of a platter.)

@amotin
Member

amotin commented Apr 20, 2020

While I have no objection that LBA weighting likely does not make much sense for SMR (the question is how to reliably detect that for device-managed SMR), as I see from the code it has not really been used for 3 years now, since 4e21fd0 switched weighting to SPACEMAP_HISTOGRAM.

@Stoatwblr
Author

In which case disabling or removing it entirely really should be a no-brainer

@amotin
Member

amotin commented Apr 21, 2020

It may still be used for older pools not yet upgraded to the new features. But since it should not affect new pools, I would not focus on it too much. If there is a way to detect SMR drives (like a rotating disk with TRIM support, or something better), I see no problem using it. Otherwise I would leave it as-is.

@ryao
Contributor

ryao commented Jun 4, 2020

Detecting SMR-but-not-trimmable drives is a bit harder (Example: Seagate ST3000DM003)

This is going to need a blacklist. However, even a blacklist is not reliable given that Western Digital and others are putting SMR into models that were previously PMR/CMR. :/

@jdrch

jdrch commented Jun 22, 2020

Seagate ST3000DM003

Are there more examples of such HDDs? Unless I'm mistaken, this seems to be a legacy product no longer made by Seagate. If there aren't more examples, this may just be an edge case worth documenting and writing a honking warning about, rather than something to code support for in the near term.

@Stoatwblr
Author

ALL current Seagate Barracuda and Barracuda Compute drives are undeclared DM-SMR without TRIM; Toshiba's drives don't declare TRIM either.

It really is a mess. Apart from the Hattis law class action in California over WD REDs, there's another case filed in New York over the other drives: https://classactionsreporter.com/wp-content/uploads/Western-Digital-SMR-Hard-Drives-Compl.pdf

@jdrch

jdrch commented Jun 24, 2020

ALL current Seagate Barracuda and Barracuda compute drives are undeclared DM-SMR without TRIM

I don't think this is the case. I recently deployed a 2 TB Seagate Barracuda 2.5 in. HDD that is SMR with TRIM. Here's the output of # hdparm -I for it.
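
For anyone who wants to script that kind of check rather than eyeball hdparm output, the two relevant properties (rotational and discard/TRIM support) are also exposed through the Linux block-queue sysfs attributes. A minimal sketch, assuming the standard /sys/block/<dev>/queue paths:

```c
#include <stdio.h>

/* Read a single integer from /sys/block/<dev>/queue/<attr>; -1 on failure. */
static long read_queue_attr(const char *dev, const char *attr)
{
	char path[256];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
	f = fopen(path, "r");
	if (f != NULL) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(int argc, char **argv)
{
	const char *dev = (argc > 1) ? argv[1] : "sda";
	long rotational = read_queue_attr(dev, "rotational");
	long discard = read_queue_attr(dev, "discard_granularity");

	printf("%s: rotational=%ld discard_granularity=%ld\n",
	    dev, rotational, discard);
	if (rotational == 1 && discard > 0)
		printf("%s: rotational drive that advertises TRIM -- likely DM-SMR\n", dev);
	return 0;
}
```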

@stale

stale bot commented Jun 24, 2021

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale (No recent activity for issue) label on Jun 24, 2021
stale bot closed this as completed on Sep 22, 2021