-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use internal error codes instead of UNIX errnos #280
Comments
Yes, we should have had this long time ago. Now is the good time never the less. I am thinking to have similar approach for #273 too. We need our own stat structure which can capture lot more information and also has option for reserved fields. When passing it to higher layers, we can convert it and send. |
|
A patch https://review.gluster.org/18069 has been posted that references this issue. |
Provide option to define both errno and error string (for logging). For all the standard 'errno' use 'strerror()' as the error string. For all the new error codes, developers have to define the error string while initializing. Updates gluster#280 Change-Id: I2e481bc3143c69130a5d3d015814c68b1754dff8 Signed-off-by: Amar Tumballi <amarts@redhat.com>
A patch https://review.gluster.org/18069 has been posted that references this issue. |
A patch https://review.gluster.org/18084 has been posted that references this issue. |
A patch https://review.gluster.org/18069 has been posted that references this issue. |
A patch https://review.gluster.org/18087 has been posted that references this issue. |
A patch https://review.gluster.org/18069 has been posted that references this issue. |
A patch https://review.gluster.org/18087 has been posted that references this issue. |
Provide option to define both errno and error string (for logging). For all the standard 'errno' use 'strerror()' as the error string. For all the new error codes, developers have to define the error string while initializing. Updates #280 Change-Id: I2e481bc3143c69130a5d3d015814c68b1754dff8 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://review.gluster.org/18069 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
A patch https://review.gluster.org/18087 has been posted that references this issue. |
1 similar comment
A patch https://review.gluster.org/18087 has been posted that references this issue. |
Solution proposal.Some basic possibilities:
To do this, we need an interface, where give a errno, we need a corresponding error-code. And given an error-code, we need what errno it is, and possible error message to log for the given error code. Currently, the implementation in 'experimental' branch is to have all this taken care in More possibiities?
Challenges
Calls for a half-day hackathon for all the maintainers/peers and other contributors! Feedbacks welcome! @gluster/gluster-all ☝️ |
Some considerations: I think we could have ranges of error codes assigned to each xlator and module, exactly as we have done for log messages. This way, each xlator will be able to create more detailed errors. This can be easily seen in some higher level xlators that detect issues in the consistency of data, like AFR and EC. They need to generate new errors that are conceptually different from any other error. We could also assign errors to modules (memory management, dict management, inode management, ...). Instead of assigning errors sequentially as they are needed (this would mix errors from multiple sources and meanings) we could reserve ranges of errors to each xlator or module that needs them. This way it would be easier, given an error, to determine what happened and where. Additionally we could also add an 'originator' id inside the error code itself. This way, we won't only know that we have detected an invalid argument error, for example, but we'll also know which xlator has detected it, without needing each xlator to log a message for each error or dig into big logs to locate them. With this approach, a single error code will be very meaningful, irrespective of where it has been logged. Since we are dealing with 32 bits error codes, we could use the following mapping:
The xlator/module code 0 could be reserved for raw errno codes, and use id 1 and higher for modules and xlators. Another thing to consider is that some code makes use of 'errno' variable to pass the error code between function calls. We should avoid that and use regular return of error codes. The main drawback of this approach is that most probably it will require changes in many places outside compat-errno.{c,h} because of the handling of the originator field. |
+1 for the originator id.
…On Tue, Sep 26, 2017 at 3:10 PM, Xavier Hernandez ***@***.***> wrote:
Some considerations:
I think we could have ranges of error codes assigned to each xlator and
module, exactly as we have done for log messages. This way, each xlator
will be able to create more detailed errors. This can be easily seen in
some higher level xlators that detect issues in the consistency of data,
like AFR and EC. They need to generate new errors that are conceptually
different from any other error. We could also assign errors to modules
(memory management, dict management, inode management, ...). Instead of
assigning errors sequentially as they are needed (this would mix errors
from multiple sources and meanings) we could reserve ranges of errors to
each xlator or module that needs them.
This way it would be easier, given an error, to determine what happened
and where.
Additionally we could also add an 'originator' id inside the error code
itself. This way, we won't only know that we have detected an invalid
argument error, for example, but we'll also know which xlator has detected
it, without needing each xlator to log a message for each error or dig into
big logs to locate them.
With this approach, a single error code will be very meaningful,
irrespective of where it has been logged.
Since we are dealing with 32 bits error codes, we could use the following
mapping:
- Lower 12 bits to represent an error code. We have up to 4096
available error codes for each xlator and module.
- Middle 10 bits to represent the module or xlator to which the error
code is associated.
- Upper 10 bits to indicate the xlator where the error has been
detected for the first time.
The xlator/module code 0 could be reserved for raw errno codes, and use id
1 and higher for modules and xlators.
Another thing to consider is that some code makes use of 'errno' variable
to pass the error code between function calls. We should avoid that and use
regular return of error codes.
The main drawback of this approach is that most probably it will require
changes in many places outside compat-errno.{c,h} because of the handling
of the originator field.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#280 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AKV-SvThxqzkzMtQhd-cWfj9ZwVLANwlks5smMaOgaJpZM4Ohaky>
.
--
Thanks and Regards,
Kotresh H R
|
Another thing I would like to consider is the possibility of using negative error codes. This way we can return an error (a negative value), a plain success (0), and, optionally, some extra info in case of success (>0). It would also be interesting to enforce (from now on, not changing old code, at least initially) to always return an error code if the function can fail. For example, it's very common today to have functions that return a pointer. If the pointer is NULL, it implies that the function has failed, but this doesn't give any information about the real cause of the failure. Most of the times ENOMEM is assumed, but there could be other causes that we have lost. Loosing this information is as bad as not using proper error codes. So the idea would be to return an error code from these functions and pass the reference to the pointer as an additional argument instead of returning it. |
I propose this to release-8.0 as this has a potential of helping in debugging exactly where the error originated by just looking at the client log file. We can have range details as @xhernandez suggested. |
I tried to collect thoughts on this one. Happy to hear feedback. Challenges
Suggestions from above comments
Proposed solution
Future improvementsDesign a way to get details of translators running on server-side, and their run-time-id, during client-handshake. That way, we can pin-point the issue to a particular instance of translator in future, instead of the static id. This will be useful for translators designed and written outside of glusterfs too. |
A patch https://review.gluster.org/24747 has been posted that references this issue. cluster/ec: use IS_ERROR/IS_SUCCESS macros Updates: #280 Change-Id: I75354f8b2a82eaabb68eb36eaa27ee9bdd36a000 |
A patch https://review.gluster.org/24746 has been posted that references this issue. cluster/dht: use IS_ERROR/IS_SUCCESS macros Updates: #280 Change-Id: Id1001e93c31da78d580455a9601facb3e6df643c |
A patch https://review.gluster.org/24748 has been posted that references this issue. xlators/features: use IS_ERROR/IS_SUCCESS macros Updates: #280 Change-Id: Iecf2004e1a4951ae8927b9e35249f6689a65c001 |
A patch https://review.gluster.org/24749 has been posted that references this issue. xlators/nfs: use IS_ERROR/IS_SUCCESS macros Updates: #280 Change-Id: I1a67c0268fddfd99b3568592a11fa90c99fcb001 |
A patch https://review.gluster.org/24750 has been posted that references this issue. xlators/{meta,playground,posix-acl}: use IS_ERROR/IS_SUCCESS macros Updates: #280 Change-Id: Idba6a611bccdc4ec4a8892cf31fb32b286125d0e |
A patch https://review.gluster.org/24752 has been posted that references this issue. mount/fuse: use IS_ERROR/IS_SUCCESS macros Updates: #280 Change-Id: I4abadc82d96e8c823a4ddd4671961ce1bf4fb178 |
A patch https://review.gluster.org/24751 has been posted that references this issue. xlators/performance: use IS_ERROR/IS_SUCCESS macros Updates: #280 Change-Id: I4d8a15e772ad28e34a090c0d9c3792b94bf73e88 |
A patch https://review.gluster.org/24753 has been posted that references this issue. libglusterfs: use IS_ERROR() and IS_SUCCESS() macro Also introduce gf_strerror() and gf_errorcode() methods to set Updates: #280 |
A patch https://review.gluster.org/24754 has been posted that references this issue. storage/posix: use IS_ERROR/IS_SUCCESS macros Also use 'SET_ERROR()' macro which allows to set the return value Updates: #280 Change-Id: I034f939cc2fd45200f8c25a5ff4168edb0de6715 |
These macros helps to clearly identify all negetive checks are 'errors', and all 0 and above are success. With this clasification, we can add more error codes to the process / method / function. Updates: #280 Change-Id: I0ebc5c4ad41eb78e4f2c1b98648986be62e7b521 Signed-off-by: Amar Tumballi <amar@kadalu.io>
A patch https://review.gluster.org/24736 has been posted that references this issue. contrib, extras, doc: use IS_ERROR/IS_SUCCESS macro Updates: #280 Change-Id: I783d67c7a1bda1e465adaefc9a05e5f7217ac02c |
@xhernandez / @pranithk / @gluster/gluster-all while I see that you are reviewing things, it would be great if you consider to review series of changes @ https://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:ref-280 There are merge conflicts, but I plan to resolve it once i hear some feedback. While reviewing consider below, as we decided to have a practice like below:
Getting it reviewed and getting it in would help us to mandate the check with IS_ERROR() everywhere, and start returning different errorcodes, which can help debug more thoroughly and also can reduce the logs in glusterfs. |
After a brief discussion with @xhernandez, we both agree it would be great if we can make change to the signature of fop return itself, to keep the changes easier to review (even though it would be more changes). Mainly because fop changes wouldn't change the behavior of other places to start with. I will work on a basic prototype and submit the new method. Happy to hear feedback, and getting volunteers to take it to finish line. |
A patch https://review.gluster.org/25019 has been posted that references this issue. all: return value signature changes
Change-Id: Ib91216afaa89d7e4509757b50dd8a7d60a6bf0a9 |
* return signature of all 'fops' callback changed to `gf_return_t`. * provide IS_ERROR()/IS_SUCCESS() macros to check only this type of argument. * provide 2 global variables `gf_success` and `gf_error` introduced, which can used instead of -1 and 0 return used largely today. Change-Id: Ib91216afaa89d7e4509757b50dd8a7d60a6bf0a9 Updates: gluster#280 Signed-off-by: Amar Tumballi <amar@kadalu.io>
Thank you for your contributions. |
Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it. |
Thank you for your contributions. |
Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it. |
Over the years we've had many problems with the meaning of various UNIX errnos being overloaded, usually because some subsystem uses a particular error code internally. An ESTALE generated internally means something different than an ESTALE from a posix syscall. ENOTCONN has too many different meanings even to keep track of. There are also portability issues when a particular errno value doesn't exist or has a different meaning on NetBSD/FreeBSD than on Linux. Using an internal error type instead of overloading random errno values with random meanings will help avoid both problems.
My proposal is to have our own set of error values, and only use those internally. Whenever we need to convert back and forth - e.g. from a syscall/libc errno to an internal gf_errno or in the opposite direction when reporting errors via FUSE/NFS - we can use a conversion table by default, but there might also be many cases where this default needs to be overridden.
The text was updated successfully, but these errors were encountered: