Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add value (de)serialization #47

Closed
wants to merge 1 commit into from
Closed

Add value (de)serialization #47

wants to merge 1 commit into from

Conversation

georgeclaghorn
Copy link
Contributor

@georgeclaghorn georgeclaghorn commented Jan 16, 2023

napi is a library that provides Rust-Node.js bindings. It provides the function to_js_value, enabled by a feature flag, which uses Serde to “serialize” any Rust type into a JS value. A complementary function, from_js_value, deserializes properly-formatted JS values back into Rust types. We can do the same thing for Ruby.

Here’s why I’m interested in this feature. I’m working on Classmate, a library of Ruby bindings to the Lightning CSS Rust crate. On Node.js, Lightning CSS offers a powerful custom transforms feature. It traverses a parse tree for a stylesheet and calls user-provided functions for each node. The provided functions can return replacement nodes. Under the hood, the syntax nodes are converted from Rust structs to JS objects using napi to_js_value. The returned substitute nodes are converted from JS objects back to Rust structs using from_js_value.

I would like to implement an equivalent feature in Classmate. I’d very much prefer not to have to create a Typed Data wrapper or build my own Hash for every type of syntax node. There are quite a lot, because CSS is a complex language, and they change frequently as Lightning CSS gains support for new CSS features.

This PR introduces a new Magnus feature flag named serde. When the serde feature is enabled, two new functions are exposed: magnus::value::serialize and magnus::value::deserialize. serialize takes any Rust type that implements serde::Serialize and converts it into a Ruby approximation. deserialize turns a Ruby value back into a Rust type that implements serde::Deserialize. See the included docs for serialize and deserialize to understand how they translate between Rust and Ruby.

Copy link
Owner

@matsadler matsadler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really neat!

I think right now this might be better as a seperate serde_magnus crate. There are a few different ways I can see people wanting serialisation to work (do you serialise to hashes, structs, objects?) and with more people starting to use Magnus it's getting more awkward to experiment with different approaches within Magnus itself without breaking things. If it turns out there are two ways people want serialisation to work then they can chose between two crates, or if it turns out everyone settles on one then I'd be happy to merge it into Magnus.

I think with a few tweaks everything here should be able to be implemented without access to Magnus internals.

I'll make RBignum::is_negative public as Ruby has a public api (as a macro) for this.

I was aiming to make the various Fixnum::from_value(), RArray::from_value(), etc functions the way to assert a Value is of a particular type, rather than exposing the Value::rb_type() method. I think a chain of if let Some(x) = Foo::from_value() { calls should compile down to pretty much the same as a match value.rb_type() {.

If that sounds good, then please let me know when you have a crate available and I'll link to it from the Magnus readme & docs. If not, I'm happy to discuss alternatives.

@@ -126,6 +128,8 @@ impl fmt::Display for Error {
}
}

impl error::Error for Error {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My worry with implementing std::error::Error for magnus::Error is that the ? operator can convert any Result<T, E> where E: std::error::Error to Result<T, Box<dyn Error>>. Example:

fn ruby_add(a: i64, b: i64) -> Result<i64, Box<dyn Error>> {
    eval!("a + b", a, b)?
}

magnus::Error may contain a Ruby exception object, and with it in a Box like that the object can't be seen by the garbage collector, so it might get collected, and then when you try and print the error you get a segfault.

For the most part I'm happy to just add the disclaimer to the docs saying "Don't put Ruby objects on the heap (i.e. in a Box/Vec/Hash/etc)" because there isn't really any way to prevent it with the type system/lifetime rules, but this one seems like such an easy to make mistake, and the utility of implementing std::error::Error seems pretty low.

/// ```
pub fn deserialize<'i, T, U>(value: T) -> Result<U, Error>
where
T: Into<Value>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use T: Deref<Target = Value> to accept any type that is a Value (e.g. RArray, Fixnum, etc).

Into<Value> is anything that converts to a Value, this includes Rust types like Vec and i64. It'd be weird to pass a Rust value to convert into Ruby, to then deserialise back to Rust.

where
T: Visitor<'i>,
{
visitor.visit_bytes(self.value.try_convert::<String>()?.as_bytes())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want something like:

if let Some(s) = RString::from_value(self.value) {
    unsafe {
        visitor.visit_bytes(s.as_bytes())
    }
} else {
    // error
}

Converting to a Rust String will do a utf-8 check on the Ruby string and error if it contains non-utf-8 bytes.

}

struct HashDeserializer<'i> {
hash: &'i RHash,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something here, but I don't think you need the reference here. All Ruby types (RHash, Value, etc) are basically pointers, so effectively they already are references (with the real data being owned by Ruby), and they all implement Copy, so like

let a = RHash::new();
let b = a;
a.aset("foo", "bar").unwrap();
let x: String = b.fetch("foo").unwrap();
assert_eq!(x, "bar");

@georgeclaghorn
Copy link
Contributor Author

georgeclaghorn commented Jan 18, 2023

Thanks @matsadler! I started on this over at georgeclaghorn/serde-magnus, incorporating your feedback.

However, I’m fleshing out the tests (georgeclaghorn/serde-magnus@d0bc6ba), and I’m getting some sporadic failures that suggest Serde is moving Values to the heap. I haven’t looked more closely yet, but I do see some suspicious Box and Vec usage in Serde (example), so this experiment might be over.

@matsadler
Copy link
Owner

The errors you're seeing might be down to Rust's test runner. It runs tests in parallel in threads, which Ruby really doesn't like. It might mostly work if you put the cleanup object in a mutex, so the tests run one at a time, but the rules Ruby lays out for calling init (once, and in a stack frame above whatever is calling Ruby) are basically impossible to follow in the Rust unit tests. I'm not sure how much Ruby really cares about you following those rules.

In Magnus I've worked around this using Rust's 'integration' style tests, with one test method per file. Each file will be run as a separate process, so if you only have one test per file, then Ruby is happy.

Magnus also makes extensive use of doctests, as each doctest is compiled and run separately.

I think the nextest runner might run each test as a separate process, so that might be worth a look too (I've not seriously investigated for Magnus it as it didn't support doctests when I last looked).


It might be possible to work around Serde putting stuff on the heap.

You could do something like create a new Ruby Array at the start of the serialize function, and make sure this is passed through to all the serialisation functions, then every Ruby object you create you also add to the array. That way even if Serde moves that object to the heap there's still a route through the array for the GC to find it. Then at the end of the serialize function hopefully all the Boxes and Vecs should have gone away, the result is a full tree of Ruby objects, and you just discard the array.

@georgeclaghorn
Copy link
Contributor Author

Switching to integration tests resolved my issues.

Regarding Serde possibly moving values on the heap: I think this is possibly not an issue in practice because each value Serde might temporarily Box ends up back on the stack before we call into Ruby again. Let me know if I’m wrong about that. Tracking intermediate values in a Ruby Array is a great idea if it’s necessary.

I published v0.1.0 of serde_magnus. Thanks again for your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants