Ignore BIO_FLUSH command if there is no volatile cache to flush #710
Conversation
When a storage device reports that it does not support cache flush, the GEOM disk layer by default returns EOPNOTSUPP in response to a BIO_FLUSH command. On AWS, local volumes do not advertise themselves as having a write cache enabled. When they are selected for L3 on all-HDD nodes, the L3 subsystem may inadvertently kick out these L3 devices when a BIO_FLUSH command fails with EOPNOTSUPP. The fix is to make GEOM disk return success (0) when this condition occurs, and to add a sysctl to make this error handling config-driven.
What is the L3 subsystem? It kinda sounds like a bug there at first blush... there are several block devices without caches to flush that cope with this situation correctly. I'm not sure I understand here, so please help me fill in the gaps.
This was in the context of a filesystem built on GEOM on FreeBSD, L3 being one path that was submitting the cache flush requests. The general idea is that some consumers might prefer a request to flush the cache of a drive without one to be treated as success, rather than tracking the device capability themselves and using that to handle ENOTSUP (or to avoid issuing flushes). In my recollection, a number of drives in the hardware world would let you issue cache flush commands even if they had no such cache. In this case, though, GEOM has collected that information from the NVMe capability data for the drive in question and rejects cache-flush bios if the drive reports no volatile cache. That is accurate and not wrong, since said flush would be a no-op, but for some consumers treating it as success may simplify their own implementation. If this isn't something upstream would take, we would probably head down the path you suggested of modifying the consumer behavior :)
SYSCTL_ADD_BOOL(&sc->sysctl_ctx,
    SYSCTL_CHILDREN(sc->sysctl_tree), OID_AUTO, "ignore_flush",
    CTLFLAG_RWTUN, &sc->ignore_flush, sizeof(sc->ignore_flush),
    "Do not return EOPNOTSUPP if there is no cache to flush");
Oh, never mind... ENOTSUP == EOPNOTSUPP
Some producers of BIO_FLUSH use ENOTSUP as a hint to never send BIO_FLUSH again. So converting ENOTSUP to 0 feels wrong to me, but the current code in the tree would at most have an optimization bypassed. So it's not terrible to cope with upper layers that assign more malice to this failure than is really there (e.g. L3 dropping the device). It's a terrible name, though. flush_notsup_succeed is better, but still not great. It also needs to be documented... but there's no good man page to document this weird quirk.
Thanks for your comments. It appears to me that the suggestion is to not return 0, since there are other upper layers/consumers using this error code, and for our consumer to instead drop the device in L3 on this error code. Is my understanding right? If yes, maybe close/drop this PR?
No, I think that we should take this PR. While normally this functionality would be counterproductive, a sysctl that you have to enable explicitly is a reasonable compromise for people that need to cope with an upper layer / filesystem not honoring the ENOTSUP protocol correctly. My only real objection is the name.
Got it. flush_notsup_succeed looks good to me. Updated the diff with the new name.
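For reference, since the sysctl is registered with CTLFLAG_RWTUN, it should be settable both at runtime and as a boot-time tunable. The OID path below is an assumption based on the per-disk sysctl tree (`sc->sysctl_tree`) and a hypothetical device name `da0`; the actual node depends on the disk in question.

```
# At runtime (hypothetical OID path):
sysctl kern.geom.disk.da0.flush_notsup_succeed=1

# Or persistently, as a loader tunable in /boot/loader.conf:
kern.geom.disk.da0.flush_notsup_succeed="1"
```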
Please let me know if there is anything else to be done from my end to close this one.
Landed... Sorry for the delay. d3eb9d3 |
When a storage device reports that it does not support cache flush, the GEOM
disk layer by default returns EOPNOTSUPP in response to a BIO_FLUSH command.
On AWS, local volumes do not advertise themselves as having a write cache
enabled. When they are selected for L3 on all-HDD nodes, the L3 subsystem may
inadvertently kick out these L3 devices when a BIO_FLUSH command fails with
EOPNOTSUPP. The fix is to make GEOM disk return success (0) when this
condition occurs and add a sysctl to make this error handling config-driven.

Reviewed by: imp
Pull Request: freebsd/freebsd-src#710