Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RON incorrectly adds extra level of Base64 when roundtripping Bytes #436

Closed
Tracked by #397
GoldsteinE opened this issue Jan 29, 2023 · 2 comments · Fixed by #438
Closed
Tracked by #397

RON incorrectly adds extra level of Base64 when roundtripping Bytes #436

GoldsteinE opened this issue Jan 29, 2023 · 2 comments · Fixed by #438
Labels

Comments

@GoldsteinE
Copy link

This code (playground):

#[derive(Debug, serde::Deserialize, serde::Serialize)]
#[serde(rename = "b")]
struct BytesVal {
    pub b: bytes::Bytes,
}

#[derive(Debug, serde::Deserialize, serde::Serialize)]
#[serde(untagged)]
enum Bad {
    Bytes(BytesVal),
}

fn main() {
    let v: Bad = ron::from_str(r#"(b: "dGVzdA==")"#).unwrap();
    dbg!(&v);
    let s = ron::to_string(&v).unwrap();
    dbg!(&s);
    let v: Bad = ron::from_str(&s).unwrap();
    dbg!(&v);
    println!("---");
    let v: Bad = serde_json::from_str(r#"{"b": "dGVzdA=="}"#).unwrap();
    dbg!(&v);
    let s = serde_json::to_string(&v).unwrap();
    dbg!(&s);
    let v: Bad = serde_json::from_str(&s).unwrap();
    dbg!(&v);
}

has the following output:

[src/main.rs:23] &v = Bytes(
    BytesVal {
        b: b"dGVzdA==",
    },
)
[src/main.rs:25] &s = "(b:\"ZEdWemRBPT0=\")"
[src/main.rs:27] &v = Bytes(
    BytesVal {
        b: b"ZEdWemRBPT0=",
    },
)
---
[src/main.rs:30] &v = Bytes(
    BytesVal {
        b: b"dGVzdA==",
    },
)
[src/main.rs:32] &s = "{\"b\":[100,71,86,122,100,65,61,61]}"
[src/main.rs:34] &v = Bytes(
    BytesVal {
        b: b"dGVzdA==",
    },
)

It fully survives the roundtrip with serde_json, but adds extra level of Base64 with RON.

@juntyr
Copy link
Member

juntyr commented Jan 29, 2023

This really is a tricky one! I think though that I now understand what is going on here. The problem is the #[serde(untagged)] which means that the ron string is deserialised without having any type information available. Since ron encodes bytes as base64 encoded strings, but without declaring that they are bytes specifically, the deserialiser just sees a string when no type information is available. When bytes::Bytes then deserialises, it sees a string and just gets its bytes. I.e. even though they were base64 bytes, at no point was ron asked to turn them back into bytes.

ron in general is a format that does not support deserialising without any type information (i.e. anything that touches deserialize_any). However, I agree that this is a very confusing and unfortunate case. The only way of fixing this (to work with untagged enums) that I can think of is to make a breaking change to ron's format in how bytes are represented. It could be as simple as prefixing any byte string with b, i.e. b"base64" during serialisation. During deserialisation, we could continue to accept a string without the b-prefix if type information is available. However, old ron code would no longer be able to read files produced by new code. Those would be tough decisions to make ...

@torkleyy What are your thoughts on this?

@torkleyy
Copy link
Contributor

I think having special support built-in for bytes / base64 encoding can be an advantage, even if it means breaking some compatibility. I'm not sure if we can ease the transition somehow to not cause any breakage in practice.

@juntyr juntyr mentioned this issue Aug 18, 2023
17 tasks
juntyr added a commit to juntyr/ron that referenced this issue Aug 25, 2023
juntyr added a commit to juntyr/ron that referenced this issue Sep 1, 2023
juntyr added a commit that referenced this issue Sep 1, 2023
* Switch from base64 to rusty byte strings, deprecate base64 support

* Add the Value::Bytes variant

* Extend Value tests for Value::String and Value::Bytes

* Include byte strings in the RON grammar

* Fix ASCII escape decoding for strings and byte strings

* Fix byte string error display for #462 test

* Fix byte string error test

* Add a CHANGELOG entry

* Added a deprecation error test for v0.10

* Add tests for v0.9 optional base64 byte string support

Co-authored-by: Sebastian Dröge <sebastian@centricular.com>

* Add an example for using base64-encoded bytes with ron

* Fix formatting in README

* Remove outdated extension docs

* Add tests for unescaped and raw byte strings

* Fix fuzzer-found issue with serialising invalid UTF-8 byte strings

* Fix fuzzer found issue with `br#` being parsed as the identifier `br`

* Fix parsing of byte escapes in UTF-8 strings to produce proper Unicode characters

* Fix fuzzer-found interaction with unwrap_variant_newtypes

* Add support for strongly typed byte literals

* Add missing Value serialising tests

* Add test to show that #436 is solved with strongly typed base64 user-side types

* Add more coverage tests

---------

Co-authored-by: Sebastian Dröge <sebastian@centricular.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants