Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional user-controlled version tag #2

Closed
ecton opened this issue Dec 21, 2021 · 5 comments
Closed

Add optional user-controlled version tag #2

ecton opened this issue Dec 21, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@ecton
Copy link
Member

ecton commented Dec 21, 2021

One common problem with long-lived serialized types is version compatibility. Sometimes, it's best to just start fresh with a structure design. I realized that since Pot has a built-in version header, we could adopt a new version that supports encoding a single integer value at the front of the Pot file.

Parsing a Pot file would use an API like this:

pot::from_slice(&data, |version, deserializer| {
    match version {
        0 => CurrentStruct::from(deserializer.deserialize::<V0Struct>()),
        1 => deserializer.deserialize::<CurrentStruct>(),
        _ => Err(pot::Error::UnknownVersion)
    }
})?

I'm sure an even better pattern can be devised, but that snippet should get the idea across of what I want to accomplish. Parsing a file with an older header would call the callback with 0, allowing someone to upgrade to this new flow from existing data without any breaking changes.

@ecton ecton added the enhancement New feature or request label Dec 21, 2021
@ecton ecton added this to To do in Khonsu Labs Roadmap via automation Dec 21, 2021
@daxpedda
Copy link
Member

I was trying to come up with a solution how to automate this or make it less error-prone and have to write less manual code.

  • Version wrapper enum:
    enum StructVersions {
      V0(V0Struct),
      V1(V1Struct),
      Current(CurrentStruct),
    }
  • The code you wrote could also be automatically generated by a proc macro:
    #[pot_versioned(2)]
    struct CurrentStruct(..);
    The 2 signifies the current version, so it will try to find V0Name and V1Name and generate the code you wrote above. Maybe it would generate a method called deserialize_versioned for example.
  • Using a combination of both could make it easier to provide compatibility with other serializers. It could be used with https://serde.rs/field-attrs.html#with for example.

Just brain-dumping, hope you find any of it useful.

@ecton
Copy link
Member Author

ecton commented Dec 22, 2021

so it will try to find V0Name and V1Name and generate the code you wrote above

My understanding of proc macros is that what you wrote is impossible. I was pretty sure that proc macros only can operate on the type they're being attached to and cannot find any other code outside of the TokenStream its associated with. If my assumption is correct, then CurrentStruct's implementation can't actually find StructVersions, and even if it could, it couldn't look at the contents of StructVersions.

It could be used with https://serde.rs/field-attrs.html#with for example.

I don't think this is possible, because at the point that the with function is being invoked, the only way to retrieve bytes is to call deserialize_bytes() to retrieve the payload. In the current pushed code of dataversion, this isn't an issue because the callback receives a &[u8] but in the new code, it receives a Read generic -- allowing support for streaming dataversion'ed payloads. But, part of that is not knowing how long the input is.

If I proceed with allowing streaming deserialization of payloads, I don't think there's a way to use with.

I do think we can improve dataversion, but I'm also pretty sold on this being separate from Pot at this point. I don't have a good enough reason to leave it in Pot.

@ecton
Copy link
Member Author

ecton commented Dec 22, 2021

If I proceed with allowing streaming deserialization of payloads, I don't think there's a way to use with.

I did -- and this is what I meant by deserializing receiving a Read generic. The complexity arises from needing to be able to peek-read the header, in case we decide it's not actually a dataversion payload. Thus, we have to wrap the original Read in a BufReader to accomplish this, and then that resulting BufReader is what must be used to deserialize the remaining data.

https://github.com/khonsulabs/dataversion/blob/eaf53d44ddc3064493429c3cf0af47b756faf0f5/examples/switching-serializers.rs#L55

One bonus of how dataversion is approaching this currently is that dataversion can support migrating between Serde and non-Serde serialization frameworks. Start with rkyv and want to move to serde? No problem.

I think we can still improve dataversion more without proc macros, but I also think that you're right -- ultimately to make this easy, we will need a proc macro.

@daxpedda
Copy link
Member

so it will try to find V0Name and V1Name and generate the code you wrote above

My understanding of proc macros is that what you wrote is impossible. I was pretty sure that proc macros only can operate on the type they're being attached to and cannot find any other code outside of the TokenStream its associated with. If my assumption is correct, then CurrentStruct's implementation can't actually find StructVersions, and even if it could, it couldn't look at the contents of StructVersions.

It can't find anything. But if you write #[dataversion(2)] it can simply use V0Struct and V1Struct by simply using the name, if it can't find them it will just throw a compiler error. Basically was this does is use the fact that version names are standardized (they are not, we are requiring a specific version-naming-scheme for the proc-macro to work), to implement this.

We also don't need access to the contents, it can't implement the From implementation, the user has to do that anyway. All it does is just build the match statement. Basically, the user only has to write From conversion for all versions and name them correctly, the rest is taken care of.

It could be used with https://serde.rs/field-attrs.html#with for example.

I don't think this is possible, because at the point that the with function is being invoked, the only way to retrieve bytes is to call deserialize_bytes() to retrieve the payload. In the current pushed code of dataversion, this isn't an issue because the callback receives a &[u8] but in the new code, it receives a Read generic -- allowing support for streaming dataversion'ed payloads. But, part of that is not knowing how long the input is.

If I proceed with allowing streaming deserialization of payloads, I don't think there's a way to use with.

I had to read this a couple of times, lol, I'm very much out of the serialization game. The reason I'm proposing the enum and #[serde(with = ...)] is to make compatibility as easy as possible. That way users can use the same API as the always did, they just need to deserialize a different type, which then gets converted to the type they actually want.
Actually, thinking about it, if we generate this enum and use the usual de/serialize, I'm not sure what else we need to do.
The limitation you were describing, I have no clue really, sounded to me like we are trying to handle the version serialization ourselves, which I believe isn't necessary if the enum already contains that information.

I'm sry if I'm not understanding something here properly.

I do think we can improve dataversion, but I'm also pretty sold on this being separate from Pot at this point. I don't have a good enough reason to leave it in Pot.

That sounds great to me.

@ecton
Copy link
Member Author

ecton commented Dec 24, 2021

After discussing on Discord we got on the same page. The process will be continued at khonsulabs/dataversion, although I may rename that crate depending on what all it tackles.

@ecton ecton closed this as completed Dec 24, 2021
Khonsu Labs Roadmap automation moved this from To do to Done Dec 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Archived in project
Development

No branches or pull requests

2 participants