Hi, from what I can tell this may be a bug (rather than me misunderstanding something).
I'm trying to deserialize JSON that contains strings with unpaired ("lone") surrogates. I saw e.g. #828 and so tried making the type of the "target" field a `serde_bytes::ByteBuf`, but deserialization still fails with an "unexpected end of hex escape" error.
I made a test repo at https://github.com/helixbass/test-serde-bytes-unpaired-surrogate-deserialization. It seems to demonstrate that deserializing lone surrogates into a `ByteBuf` field works as expected when the field is at the top level or nested inside an outer struct, but fails when the field is nested inside e.g. a tagged or untagged enum type. I don't understand the inner machinery of `serde`/`serde_json`, so I don't really have a guess as to why.
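For context on why a lone surrogate cannot simply land in a `String`, here is a minimal standard-library-only sketch. It does not use `serde` at all; the helper name `is_valid_utf16` is hypothetical and only there for illustration.

```rust
/// Hypothetical helper: true if the UTF-16 code units decode to a valid `String`.
fn is_valid_utf16(units: &[u16]) -> bool {
    String::from_utf16(units).is_ok()
}

fn main() {
    // 0xD800 is a high surrogate; on its own it is not a Unicode scalar value,
    // so it cannot be represented as a `char`.
    assert!(char::from_u32(0xD800).is_none());

    // A lone surrogate therefore cannot be decoded into a `String`.
    assert!(!is_valid_utf16(&[0x0061, 0xD800]));

    // The three-byte "generalized UTF-8" form of U+D800 is not valid UTF-8 either,
    // which is why such data has to be carried as raw bytes.
    assert!(std::str::from_utf8(&[0xED, 0xA0, 0x80]).is_err());

    // A correctly paired surrogate (U+1F600) decodes without problems.
    assert!(is_valid_utf16(&[0xD83D, 0xDE00]));
}
```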
Looking at the output of `cargo expand`, the basic reason this happens seems to be that the derived `Deserialize` implementations for untagged and internally tagged enums use `.deserialize_any()` (plus `Content`?). That call has to decide "blindly" what to do with a JSON string it encounters, so it understandably defaults to deserializing it as a string rather than as bytes.
So I'm assuming this should not be considered a bug of any kind, but I'm curious whether there are known strategies for using `serde`/`serde_json` to deserialize e.g. internally tagged enums with fields that may contain unpaired surrogates.
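To make the difference between a typed request and a "blind" one concrete, here is a toy model using only the standard library. It is not serde's or serde_json's actual code: `Hint`, `Value`, and `decode` are hypothetical names, and the input is assumed to be the UTF-16 code units of a JSON string after its `\uXXXX` escapes have been resolved.

```rust
use std::char::decode_utf16;

/// Hypothetical stand-in for what the caller asks the format for.
enum Hint {
    /// No type information: the format must guess, and guesses "string".
    Any,
    /// The caller explicitly asked for bytes.
    Bytes,
}

#[derive(Debug, PartialEq)]
enum Value {
    Str(String),
    Bytes(Vec<u8>),
}

/// Toy decoder for the code units of one JSON string.
fn decode(units: &[u16], hint: Hint) -> Result<Value, String> {
    match hint {
        // Guessing "string" forces strict validation, so a lone surrogate is an error.
        Hint::Any => String::from_utf16(units)
            .map(Value::Str)
            .map_err(|e| e.to_string()),
        // With a bytes hint, a lone surrogate can be kept as three raw bytes.
        Hint::Bytes => {
            let mut out = Vec::new();
            for unit in decode_utf16(units.iter().copied()) {
                match unit {
                    Ok(c) => {
                        let mut buf = [0u8; 4];
                        out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
                    }
                    Err(e) => {
                        let u = e.unpaired_surrogate();
                        out.push(0xE0 | (u >> 12) as u8);
                        out.push(0x80 | ((u >> 6) & 0x3F) as u8);
                        out.push(0x80 | (u & 0x3F) as u8);
                    }
                }
            }
            Ok(Value::Bytes(out))
        }
    }
}

fn main() {
    let units = [0x0061, 0xD800]; // "a" followed by a lone high surrogate
    assert!(decode(&units, Hint::Any).is_err());
    assert_eq!(
        decode(&units, Hint::Bytes),
        Ok(Value::Bytes(vec![0x61, 0xED, 0xA0, 0x80]))
    );
}
```

In this model the failure comes purely from the missing hint, which matches the guess above about why the same field works inside a plain struct but not inside an enum that is buffered first.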
The two general ideas that I can picture are:
- Have an alternate version of `serde_json` whose `.deserialize_any()` defaults to treating JSON strings as bytes, not as strings.
- If you know that your "tag" field comes first in the JSON (which I think is a safe assumption/invariant in my use case), you could defer deserializing the rest of the JSON object's keys and values until after you've deserialized and recognized the tag field. At that point you could avoid `.deserialize_any()`, because you'd know which enum variant you were deserializing, similar to how I assume externally tagged enums avoid this issue.