I'm writing a serde implementation for the X12 data format, which is a very old, widely used, and enterprise-y format that functions vaguely similarly to a CSV. Delimiters are configurable, but generally speaking, each file will contain a series of segments separated by ~. Each segment begins with a segment code, followed by *, followed by however many fields are contained in that segment, delimited by *. Elements/fields can themselves be containers of multiple values, separated by :. Here's an example (simplified for brevity):
ISA*00*00~
GS*HC*123*1234~
ST*837*000000001*0000~
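To make the layout concrete, here is a minimal sketch of splitting a document like the one above into segments, elements, and components. The names `Delimiters`, `RawSegment`, and `split_segments` are made up for illustration and are not part of serde or of any X12 library; real files declare their delimiters in the ISA header, which this sketch ignores.

```rust
/// Delimiters are configurable in X12; these defaults match the example above.
/// (Hypothetical type, for illustration only.)
#[derive(Clone, Copy)]
pub struct Delimiters {
    pub segment: char,
    pub element: char,
    pub component: char,
}

impl Default for Delimiters {
    fn default() -> Self {
        Delimiters { segment: '~', element: '*', component: ':' }
    }
}

/// One segment: its code plus its elements, each element being a list of components.
#[derive(Debug, PartialEq)]
pub struct RawSegment<'a> {
    pub code: &'a str,
    pub elements: Vec<Vec<&'a str>>,
}

/// Split the input into segments, then each segment into elements and components.
pub fn split_segments<'a>(input: &'a str, d: Delimiters) -> Vec<RawSegment<'a>> {
    input
        .split(d.segment)
        .map(str::trim) // tolerate newlines after the segment terminator
        .filter(|s| !s.is_empty()) // skip the empty piece after the final terminator
        .map(|seg| {
            let mut parts = seg.split(d.element);
            // `split` always yields at least one item, so the code is always present.
            let code = parts.next().unwrap_or("");
            let elements = parts.map(|e| e.split(d.component).collect()).collect();
            RawSegment { code, elements }
        })
        .collect()
}
```

Calling `split_segments` on the three-line example with `Delimiters::default()` gives three `RawSegment` values whose codes are `ISA`, `GS`, and `ST`.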
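One approach is to parse this generically, and while I haven't spent as much time on that approach, it feels fairly straightforward (assume #[derive(Serialize, Deserialize)] on all structs). However, to give these structs semantic meaning, I'd like to parse them into dedicated data structures instead.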
For this to work, there's a notion of a "passthrough" struct, which is just a container and has no name to match on (e.g. Document above has no "name" associated), and "segment" structs, which fail to parse if the first consumed token does not match.
It's a little hacky to "detect" whether a struct is passthrough or not, but so far, the story works out pretty well.
Where this breaks down significantly is that some structs have optional fields. For example:
// note there's no name, so this is a passthrough/container
struct Person {
    name: Nm,
    address: Option<Loc>,
}

#[serde(rename = "NM")]
struct Nm {
    first: String,
    last: String,
}

#[serde(rename = "LOC")]
struct Loc {
    first: String,
    last: String,
}
What needs to happen is the deserializer needs to check the first token, and ask, "does this match what's inside Option?", and if so, give the value. If it doesn't match, then set the field to None. However, it seems like serde's API doesn't support that kind of backtracking, and it only supports looking at the string statically (i.e. without knowledge of what's contained in the option) to decide whether something is Some(T) or None.
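For comparison, here is a sketch of the behaviour described above, written as a hand-rolled parser rather than through serde. It does not solve the serde problem: it works only because each type exposes its expected segment code as an associated constant, which is exactly the knowledge `deserialize_option` lacks. `Cursor`, `Segment`, `parse_required`, and `parse_optional` are hypothetical names, and the two-field layout of `Nm` and `Loc` is taken from the structs above.

```rust
/// Cursor over pre-split segments; each segment is `[code, field, ...]`.
pub struct Cursor<'a> {
    segments: Vec<Vec<&'a str>>,
    pos: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(input: &'a str) -> Self {
        let segments = input
            .split('~')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(|s| s.split('*').collect())
            .collect();
        Cursor { segments, pos: 0 }
    }

    /// Look at the next segment code without consuming anything.
    pub fn peek_code(&self) -> Option<&'a str> {
        self.segments.get(self.pos).and_then(|s| s.first().copied())
    }

    fn next_segment(&mut self) -> Option<Vec<&'a str>> {
        let seg = self.segments.get(self.pos)?.clone();
        self.pos += 1;
        Some(seg)
    }
}

/// A "segment" struct knows its own code, so callers can test for it up front.
pub trait Segment: Sized {
    const CODE: &'static str;
    fn from_fields(fields: &[&str]) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
pub struct Nm { pub first: String, pub last: String }

#[derive(Debug, PartialEq)]
pub struct Loc { pub first: String, pub last: String }

fn two_fields(code: &str, fields: &[&str]) -> Result<(String, String), String> {
    match fields {
        [a, b] => Ok((a.to_string(), b.to_string())),
        _ => Err(format!("{}: expected 2 fields, got {}", code, fields.len())),
    }
}

impl Segment for Nm {
    const CODE: &'static str = "NM";
    fn from_fields(fields: &[&str]) -> Result<Self, String> {
        two_fields(Self::CODE, fields).map(|(first, last)| Nm { first, last })
    }
}

impl Segment for Loc {
    const CODE: &'static str = "LOC";
    fn from_fields(fields: &[&str]) -> Result<Self, String> {
        two_fields(Self::CODE, fields).map(|(first, last)| Loc { first, last })
    }
}

/// Fails if the next segment's code is not `T::CODE`; consumes nothing in that case.
pub fn parse_required<T: Segment>(cur: &mut Cursor<'_>) -> Result<T, String> {
    match cur.peek_code() {
        Some(code) if code == T::CODE => {
            let seg = cur.next_segment().expect("peeked segment exists");
            T::from_fields(&seg[1..])
        }
        other => Err(format!("expected {}, found {:?}", T::CODE, other)),
    }
}

/// "Does the first token match what's inside the Option?" If not, yield None.
pub fn parse_optional<T: Segment>(cur: &mut Cursor<'_>) -> Result<Option<T>, String> {
    if cur.peek_code() == Some(T::CODE) {
        parse_required(cur).map(Some)
    } else {
        Ok(None)
    }
}

/// The passthrough container: no code of its own, just its fields in order.
#[derive(Debug, PartialEq)]
pub struct Person { pub name: Nm, pub address: Option<Loc> }

pub fn parse_person(cur: &mut Cursor<'_>) -> Result<Person, String> {
    Ok(Person { name: parse_required(cur)?, address: parse_optional(cur)? })
}
```

Because the decision is made by peeking, nothing has to be undone when the optional segment is absent: the cursor still points at whatever segment follows.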
In code, the solution I'd like to write would look something like this:
impl<'de, 'a> Deserializer<'de> for &'a mut X12Deserializer<'de> {
    fn deserialize_option<V>(self, visitor: V) -> Result<V::Value>
    where
        V: Visitor<'de>,
    {
        match visitor.visit_some(self) {
            Ok(v) => Ok(v),
            Err(X12DeserializerError::InvalidType { .. }) => visitor.visit_none(),
            // alternatively, there's no way to return V::default() or similar
            Err(e) => Err(e),
        }
    }
}
However, this code doesn't compile, because all the visitor methods take self by value, so only one of them can ever be called on a given visitor. There's also no way to call V::default() or to return None directly, because V is only bounded by Visitor<'de> and V::Value has no bounds at all.
Am I thinking about this problem in the right way? What is a valid solution to doing some kind of backtracking?
(I was able to get the code compiling and working correctly by using unsafe to forcibly copy the visitor. That's ridiculously unsafe voodoo magic, but it demonstrates that the logic works as desired. I think the only reason it doesn't segfault is that visitor.visit_none might be a no-op in many implementations (?).)
Interestingly, backtracking for Vec is supported, because each element of the Vec is a fallible deserialization: match/catch the appropriate error, and backtracking is achieved. I thought I'd be able to call visit_seq from deserialize_option, using a custom SeqAccess implementation that itself validates that the seq has length 0 or 1. After all, Option can be thought of as an iterator of max length 1. However, that fails with an unexpected-type error: a seq was received where an Option was expected.
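The "catch the error and rewind" idea can be sketched independently of serde as a checkpoint on the input position. Everything here (`Tokens`, `ParseError`, `attempt`, `many`, `number_entry`) is a hypothetical stand-in rather than serde's API; `ParseError::InvalidType` plays the role of `X12DeserializerError::InvalidType` from the snippet above, and any other error is propagated unchanged.

```rust
/// A flat token stream with a position that can be saved and restored.
pub struct Tokens<'a> {
    items: Vec<&'a str>,
    pos: usize,
}

impl<'a> Tokens<'a> {
    pub fn new(items: Vec<&'a str>) -> Self {
        Tokens { items, pos: 0 }
    }

    pub fn position(&self) -> usize {
        self.pos
    }

    fn next(&mut self) -> Option<&'a str> {
        let t = self.items.get(self.pos).copied()?;
        self.pos += 1;
        Some(t)
    }
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// "This is not the thing I expected": the recoverable case.
    InvalidType,
    /// Input ended in the middle of a value: not recoverable.
    UnexpectedEnd,
}

/// Run `parse`. On `InvalidType`, rewind to the checkpoint and report "absent";
/// any other error is propagated as-is.
pub fn attempt<'a, T>(
    toks: &mut Tokens<'a>,
    parse: impl FnOnce(&mut Tokens<'a>) -> Result<T, ParseError>,
) -> Result<Option<T>, ParseError> {
    let checkpoint = toks.pos;
    match parse(toks) {
        Ok(v) => Ok(Some(v)),
        Err(ParseError::InvalidType) => {
            toks.pos = checkpoint; // undo whatever the failed attempt consumed
            Ok(None)
        }
        Err(e) => Err(e),
    }
}

/// Vec-style repetition: keep attempting elements until one does not match.
pub fn many<'a, T>(
    toks: &mut Tokens<'a>,
    parse: impl Fn(&mut Tokens<'a>) -> Result<T, ParseError>,
) -> Result<Vec<T>, ParseError> {
    let mut out = Vec::new();
    while let Some(v) = attempt(toks, &parse)? {
        out.push(v);
    }
    Ok(out)
}

/// Example element: the tag "N" followed by an integer. The tag is consumed
/// before the value is checked, so a bad value leaves the position advanced
/// until `attempt` rewinds it.
pub fn number_entry(toks: &mut Tokens<'_>) -> Result<u32, ParseError> {
    match toks.next() {
        Some("N") => {}
        _ => return Err(ParseError::InvalidType),
    }
    let raw = toks.next().ok_or(ParseError::UnexpectedEnd)?;
    raw.parse::<u32>().map_err(|_| ParseError::InvalidType)
}
```

In this model an optional value is just a single `attempt`, which is the "iterator of max length 1" view of Option. The sketch sidesteps the visitor-ownership problem only because `attempt` owns the decision to return `None`; it is not a drop-in answer for `deserialize_option`.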
kurtbuilds changed the title from "Backtracking during deserialization" to "Deserialization of X12 data format / Backtracking during Option<T> deserialization" on Feb 26, 2024