Add value (de)serialization #47

georgeclaghorn · 2023-01-16T21:09:03Z

napi is a library that provides Rust-Node.js bindings. It provides the function to_js_value, enabled by a feature flag, which uses Serde to “serialize” any Rust type into a JS value. A complementary function, from_js_value, deserializes properly-formatted JS values back into Rust types. We can do the same thing for Ruby.

Here’s why I’m interested in this feature. I’m working on Classmate, a library of Ruby bindings to the Lightning CSS Rust crate. On Node.js, Lightning CSS offers a powerful custom transforms feature. It traverses a parse tree for a stylesheet and calls user-provided functions for each node. The provided functions can return replacement nodes. Under the hood, the syntax nodes are converted from Rust structs to JS objects using napi to_js_value. The returned substitute nodes are converted from JS objects back to Rust structs using from_js_value.

I would like to implement an equivalent feature in Classmate. I’d very much prefer not to have to create a Typed Data wrapper or build my own Hash for every type of syntax node. There are quite a lot, because CSS is a complex language, and they change frequently as Lightning CSS gains support for new CSS features.

This PR introduces a new Magnus feature flag named serde. When the serde feature is enabled, two new functions are exposed: magnus::value::serialize and magnus::value::deserialize. serialize takes any Rust type that implements serde::Serialize and converts it into a Ruby approximation. deserialize turns a Ruby value back into a Rust type that implements serde::Deserialize. See the included docs for serialize and deserialize to understand how they translate between Rust and Ruby.

matsadler

This is really neat!

I think right now this might be better as a seperate serde_magnus crate. There are a few different ways I can see people wanting serialisation to work (do you serialise to hashes, structs, objects?) and with more people starting to use Magnus it's getting more awkward to experiment with different approaches within Magnus itself without breaking things. If it turns out there are two ways people want serialisation to work then they can chose between two crates, or if it turns out everyone settles on one then I'd be happy to merge it into Magnus.

I think with a few tweaks everything here should be able to be implemented without access to Magnus internals.

I'll make RBignum::is_negative public as Ruby has a public api (as a macro) for this.

I was aiming to make the various Fixnum::from_value(), RArray::from_value(), etc functions the way to assert a Value is of a particular type, rather than exposing the Value::rb_type() method. I think a chain of if let Some(x) = Foo::from_value() { calls should compile down to pretty much the same as a match value.rb_type() {.

If that sounds good, then please let me know when you have a crate available and I'll link to it from the Magnus readme & docs. If not, I'm happy to discuss alternatives.

matsadler · 2023-01-17T05:39:07Z

src/error.rs

@@ -126,6 +128,8 @@ impl fmt::Display for Error {
    }
 }

+impl error::Error for Error {}


My worry with implementing std::error::Error for magnus::Error is that the ? operator can convert any Result<T, E> where E: std::error::Error to Result<T, Box<dyn Error>>. Example:

fn ruby_add(a: i64, b: i64) -> Result<i64, Box<dyn Error>> { eval!("a + b", a, b)? }

magnus::Error may contain a Ruby exception object, and with it in a Box like that the object can't be seen by the garbage collector, so it might get collected, and then when you try and print the error you get a segfault.

For the most part I'm happy to just add the disclaimer to the docs saying "Don't put Ruby objects on the heap (i.e. in a Box/Vec/Hash/etc)" because there isn't really any way to prevent it with the type system/lifetime rules, but this one seems like such an easy to make mistake, and the utility of implementing std::error::Error seems pretty low.

matsadler · 2023-01-17T05:45:59Z

src/value/de.rs

+/// ```
+pub fn deserialize<'i, T, U>(value: T) -> Result<U, Error>
+where
+    T: Into<Value>,


Use T: Deref<Target = Value> to accept any type that is a Value (e.g. RArray, Fixnum, etc).

Into<Value> is anything that converts to a Value, this includes Rust types like Vec and i64. It'd be weird to pass a Rust value to convert into Ruby, to then deserialise back to Rust.

matsadler · 2023-01-17T05:52:31Z

src/value/de.rs

+    where
+        T: Visitor<'i>,
+    {
+        visitor.visit_bytes(self.value.try_convert::<String>()?.as_bytes())


I think you want something like:

if let Some(s) = RString::from_value(self.value) { unsafe { visitor.visit_bytes(s.as_bytes()) } } else { // error }

Converting to a Rust String will do a utf-8 check on the Ruby string and error if it contains non-utf-8 bytes.

matsadler · 2023-01-17T06:00:51Z

src/value/de.rs

+}
+
+struct HashDeserializer<'i> {
+    hash: &'i RHash,


I might be missing something here, but I don't think you need the reference here. All Ruby types (RHash, Value, etc) are basically pointers, so effectively they already are references (with the real data being owned by Ruby), and they all implement Copy, so like

let a = RHash::new(); let b = a; a.aset("foo", "bar").unwrap(); let x: String = b.fetch("foo").unwrap(); assert_eq!(x, "bar");

georgeclaghorn · 2023-01-18T04:26:59Z

Thanks @matsadler! I started on this over at georgeclaghorn/serde-magnus, incorporating your feedback.

However, I’m fleshing out the tests (georgeclaghorn/serde-magnus@d0bc6ba), and I’m getting some sporadic failures that suggest Serde is moving Values to the heap. I haven’t looked more closely yet, but I do see some suspicious Box and Vec usage in Serde (example), so this experiment might be over.

matsadler · 2023-01-18T05:16:58Z

The errors you're seeing might be down to Rust's test runner. It runs tests in parallel in threads, which Ruby really doesn't like. It might mostly work if you put the cleanup object in a mutex, so the tests run one at a time, but the rules Ruby lays out for calling init (once, and in a stack frame above whatever is calling Ruby) are basically impossible to follow in the Rust unit tests. I'm not sure how much Ruby really cares about you following those rules.

In Magnus I've worked around this using Rust's 'integration' style tests, with one test method per file. Each file will be run as a separate process, so if you only have one test per file, then Ruby is happy.

Magnus also makes extensive use of doctests, as each doctest is compiled and run separately.

I think the nextest runner might run each test as a separate process, so that might be worth a look too (I've not seriously investigated for Magnus it as it didn't support doctests when I last looked).

It might be possible to work around Serde putting stuff on the heap.

You could do something like create a new Ruby Array at the start of the serialize function, and make sure this is passed through to all the serialisation functions, then every Ruby object you create you also add to the array. That way even if Serde moves that object to the heap there's still a route through the array for the GC to find it. Then at the end of the serialize function hopefully all the Boxes and Vecs should have gone away, the result is a full tree of Ruby objects, and you just discard the array.

georgeclaghorn · 2023-01-20T02:06:04Z

Switching to integration tests resolved my issues.

Regarding Serde possibly moving values on the heap: I think this is possibly not an issue in practice because each value Serde might temporarily Box ends up back on the stack before we call into Ruby again. Let me know if I’m wrong about that. Tracking intermediate values in a Ruby Array is a great idea if it’s necessary.

I published v0.1.0 of serde_magnus. Thanks again for your help.

Add value (de)serialization

a3a11e2

matsadler reviewed Jan 17, 2023

View reviewed changes

georgeclaghorn closed this Jan 20, 2023

georgeclaghorn deleted the serde branch January 20, 2023 02:06

gintama91 mentioned this pull request Jul 20, 2023

serde_magnus discontinued? #82

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add value (de)serialization #47

Add value (de)serialization #47

georgeclaghorn commented Jan 16, 2023 •

edited

Loading

matsadler left a comment

matsadler Jan 17, 2023

matsadler Jan 17, 2023

matsadler Jan 17, 2023

matsadler Jan 17, 2023

georgeclaghorn commented Jan 18, 2023 •

edited

Loading

matsadler commented Jan 18, 2023

georgeclaghorn commented Jan 20, 2023

Add value (de)serialization #47

Add value (de)serialization #47

Conversation

georgeclaghorn commented Jan 16, 2023 • edited Loading

matsadler left a comment

Choose a reason for hiding this comment

matsadler Jan 17, 2023

Choose a reason for hiding this comment

matsadler Jan 17, 2023

Choose a reason for hiding this comment

matsadler Jan 17, 2023

Choose a reason for hiding this comment

matsadler Jan 17, 2023

Choose a reason for hiding this comment

georgeclaghorn commented Jan 18, 2023 • edited Loading

matsadler commented Jan 18, 2023

georgeclaghorn commented Jan 20, 2023

georgeclaghorn commented Jan 16, 2023 •

edited

Loading

georgeclaghorn commented Jan 18, 2023 •

edited

Loading