One of the design goals of encoding/gob is to be fully self-describing. From Gobs of data:
Gob streams must be self-describing. Each gob stream, read from the beginning, contains sufficient information that the entire stream can be parsed by an agent that knows nothing a priori about its contents. This property means that you will always be able to decode a gob stream stored in a file, even long after you’ve forgotten what data it represents.
While this is true in theory - all the information is there - the existing API of encoding/gob falls well short of this goal.
The current Decoder API seems to assume (though it isn't made explicit) that any error returned from Decode is to be considered fatal. But if an unknown type is in the stream, it would be possible to skip that value and resume parsing at the next encoded message. That would be a good step towards taking advantage of gob's self-describing design.
I thus propose to add a new type TypeError to encoding/gob. For now, it can be left as an opaque struct type. I then propose to change the behavior of Decode and DecodeValue to guarantee:
If the next value cannot be decoded because of a lack of type information or a mismatch between the type of e and the encoded value, Decode returns a *TypeError and discards the encoded value. Any other returned error (except io.EOF) is unrecoverable and invalidates the Decoder.
Optionally, we could also change Encode and EncodeValue to guarantee:
If the passed value contains an interface whose dynamic type has not been Registered, a *TypeError is returned and nothing is written to the underlying Writer.
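For context, the situation this guarantee covers can be reproduced with the existing API. The sketch below uses hypothetical types (`Envelope`, `Unregistered`) and a hypothetical helper (`EncodeUnregistered`). It only shows that Encode fails today when an interface field holds an unregistered type; it also prints how many bytes reached the writer, which the optional guarantee would require to be zero.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Envelope carries an arbitrary payload in an interface-typed field.
type Envelope struct {
	Payload any
}

// Unregistered is deliberately never passed to gob.Register.
type Unregistered struct {
	X int
}

// EncodeUnregistered tries to encode a payload whose dynamic type is unknown
// to gob. It returns the number of bytes written to the buffer and the error.
func EncodeUnregistered() (int, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(Envelope{Payload: Unregistered{X: 1}})
	return buf.Len(), err
}

func main() {
	n, err := EncodeUnregistered()
	fmt.Printf("bytes written: %d, error: %v\n", n, err)
}
```

Under the proposal, the returned error would additionally be a *TypeError, so callers could tell it apart from an I/O failure.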
This seems like useful behavior, but it is not as important, as encoding unknown types can always be considered a bug.
Rationale
Use case
I would like to use encoding/gob to collect debugging information which can be attached to a bug report and be used to replay a reproducer. This is very useful for end-user applications which run outside our own infrastructure, so there is no logging or tracing information. Essentially, I want to use encoding/gob as a structured debug-log.
If I provide a tool to inspect these logs and/or if this package is re-used across applications, missing or mismatched types are inevitably going to occur.
I send the actual values as an any. If the decoder encounters an unknown type, I want to report that, giving the user the choice to skip it (which is usually fine for my use case). However, feeding the output of this program into this program shows that this doesn't work: Decode reports two separate errors for one encoded type. What's more, the second error message makes clear that the Decoder bailed out of parsing after the type error, leaving the reader in the middle of the encoded value. So even the observed behavior seems to be a coincidence rather than a safe assumption.
Fields of TypeError
TypeError should be a struct, to be able to report the exact nature of the unknown type (e.g. the type name and maybe the offset) for informational purposes.
Long-term it would also be ideal to have an API to reflect on gob streams without having any type information available at all, fulfilling the original design goal. If we had that API, it would be useful to add the encoded type descriptor and value to the *TypeError, which could then be used to inspect the value using reflection, for example through a function such as func FromTypeError(*TypeError) Value.
This proposal wants to keep open that possibility, while not going down the rabbit hole of deciding what the most useful information to provide to the user is. Thus, it keeps all fields of TypeError unexported for now.
Cost
In terms of API surface, this proposal tries to take the minimum incremental step towards decoding fully unknown streams. As the user needs to be able to distinguish between a recoverable error and a non-recoverable one (i.e. I/O or general data corruption), we need to add an error type.
On the implementation side, encoding/gob is a relatively complicated package, and reliably recovering from type errors this way might require a bit of work and potentially some cleanup. I'm unsure about the general state and priority of encoding/gob maintenance and extension. I'm willing to try to implement this on my own time, but I'm not sure I will be able to.
I assume the runtime cost will be zero or negligible.
I believe this change is backwards compatible, as the current behavior of Decode in case of a type error is not documented, so every error must be assumed to be fatal. Either way, the returned error can't be safely inspected, so no user should rely on the actual returned error value.
Merovius changed the title from "proposal: encoding/gob: reliable recover from type-errors when decoding" to "proposal: encoding/gob: reliably recover from type-errors when decoding" on Sep 9, 2022