scijava-persist : persisting objects with Gson and scijava discovery mechanism #30

NicoKiaru · 2021-05-03T14:10:09Z

This draft PR brings a scijava service DefaultScijavaAdapterService which provides a way to serialize and deserialize java objects.

Vocabulary : adapter = serializer (object to text) + deserializer (text to object)

The most meaningful demo is in SerializationTests.java. A test is included, which basically makes a loop: object -> serialization -> string1 -> deserialization -> object -> serialization -> string2, and checks that string1.equals(string2)

Scijava persist uses Gson serialization ( with RunTime adapters, which provide a way to serialize/deserialize objects extending abstract class or implements interface, see javadoc of RuntimeTypeAdapterFactory), and discovers and registers adapters at runtime. The adapters which are autodiscovered are of two kinds:

IClassAdapter
IClassRuntimeAdapter for adapters which require runtime adaptation (is this RealTransform object an affine transform ? or a sequence of transform, or a thinplateSplineTransform ? One runtime adapter needs to be registered per class)

A few points to discuss:

I modified the pom of other incubated repo to make it work - it's probably related to the fact that I couldn't make it work for Java 11
The serializer can be built and retrieved from a static class which require a scijava context as an input. I don't know if you like it. If not, we can change that (see ScijavaGsonHelper)
Gson serialization is super convenient, but how to ensure compatibility on the long term ? Versions ? How to put version and keeping this convenient to use ?
Potentially, people can write two different serializers for the same class of objects. How to best deal with potential conflicts ?
To avoid some conflicts, we could use the full class name when writing runtime adapters instead of the simple name (think java.awt.List vs java.util.List)
Maybe, instead of making a big serializer which gather all adapters, we should tag (with annotations) adapters based on some other properties ( their repositories ), in order to be more modular about which adapters you want or which you don't want.

gselzer · 2021-05-21T18:40:23Z

Hey @NicoKiaru! Welcome to the incubator!

This new module looks super cool! Let's see if we can't get this merged 💪

I modified the pom of other incubated repo to make it work - it's probably related to the fact that I couldn't make it work for Java 11

I'd be happy to help you out. I cloned your fork, it seems that your issues are due to version differences in scijava-common; the incubator is a bit behind the version you are using. It might help to use AbstractTestEnvironment as a blueprint for the Context you are using. I'd be happy to try it myself, if you are okay with that.

While we are at it, shall we make this a JPMS module? This is the direction @ctrueden and I are pushing for all of scijava. All you need is a module-info.java; you can just copy and alter SciJava Ops' module-info to your liking.
Is there a reason that you are adding ImageJ as a dependency? Could we get away with a dependency only on SciJava Common? Ideally, we don't want SciJava modules to depend on ImageJ modules.

Vocabulary : adapter = serializer (object to text) + deserializer (text to object)

How tied are you to this vocabulary? "adapter" seems a bit general for this context (maybe it just rubs me wrong because we already use that terminology for op adaptations in SciJava Ops). What do you think about the name Persister? Transcriber?

This commit makes a few changes which are needed in combination, resulting in a incubator ready Maven module 1) Adds scijava-persist as a child module of the incubator 2) Denotes the incubator as the parent of scijava-persist 3) Removes the ImageJ dependency in favor of SciJava Common

gselzer · 2021-05-21T20:37:38Z

@NicoKiaru I added two commits to your PR. Feel free to keep either or both of them:

The first aligns scijava-persist with the incubator model. As per the commit message, I had to:

Declare the incubator as the parent
Add scijava-persist as a module of the incubator
Remove the imagej dependency in favor of SJC. I didn't see any usage of ImageJ beyond obtaining a Context, which we can do on our own with just SJC. If you need ImageJ for anything else, more work should be done. Please let me know if this is the case.

The second is really an in-progress commit that attempts to make a JPMS Module out of scijava-persist. Unfortunately, this creates a few compile errors because the GSON developers did not intend for the com.google.gson.internal package to be used outside of the project. Even if we don't make a JPMS module out of scijava-persist, we should probably avoid this API, yeah?

After reading through the code a couple more times, I think I can answer some more of your discussion points:

The serializer can be built and retrieved from a static class which require a scijava context as an input. I don't know if you like it. If not, we can change that (see ScijavaGsonHelper)

Hmm, I think that ScijavaGsonHelper is definitely needed, however I wonder if we could avoid exposing all of the Gson classes. Do you think it would be worth writing some abstract JSON serializer/deserializer interface (Transcriber? A good name escapes me...) that the IObjectScijavaAdapterService is responsible for creating? Something like how OpService is responsible for returning an OpEnvironment, which handles the usage of Ops? In a similar way, your IObjectScijavaAdapterService could have a method initTranscriber(), which returns a Transcriber. We could then have a GSONAdapterService, with method initTranscriber(), containing relevant logic from ScijavaGSONHelper, returning some wrapped version of Gson? Does this make any sense? It would promote extensibility, and might provide some sort of answer about this question:

Gson serialization is super convenient, but how to ensure compatibility on the long term ? Versions ? How to put version and keeping this convenient to use ?

I might be completely crazy, but it sounds like a nice idea in my head 😁

Potentially, people can write two different serializers for the same class of objects. How to best deal with potential conflicts ?

My first instinct suggests that we should fail/skip both of the relevant serializers, and log an error. I think it is a mistake to do otherwise; if there are multiple serializer implementations, they should be reconciled into one. We do not want any confusion about which is being run. What do you think of this plan?

To avoid some conflicts, we could use the full class name when writing runtime adapters instead of the simple name (think java.awt.List vs java.util.List)

This is with respect to RuntimeTypeAdapterFactory.of, yeah? Can you describe a situation where it would be useful to declare typeFieldName? I am not seeing a use for it, and want to suggest that we don't allow users the option to give a name, but I'm probably missing something 😁. In any case, I definitely like using the full class name.

Maybe, instead of making a big serializer which gather all adapters, we should tag (with annotations) adapters based on some other properties ( their repositories ), in order to be more modular about which adapters you want or which you don't want.

Generally speaking, I like the idea of modularity, can you elaborate a bit more about the idea of annotations? How might the user benefit from a particular scenario?

Do you think it would be useful for the user to specify a list of Classes that they would want to adapt, and only let the serializer gather relevant adapters for that list? I would think that generally the user would know what classes they are going to be dealing with, so I'd think that route might be more convenient? But I am pretty naive here.

NicoKiaru · 2021-05-25T08:32:16Z

Declare the incubator as the parent
Add scijava-persist as a module of the incubator
Remove the imagej dependency in favor of SJC. I didn't see any usage of ImageJ beyond obtaining a Context, which we can do on our own with just SJC. If you need ImageJ for anything else, more work should be done. Please let me know if this is the case.

Thanks, that's perfect. And the context is enough, yes.

The second is really an in-progress commit that attempts to make a JPMS Module out of scijava-persist. Unfortunately, this creates a few compile errors because the GSON developers did not intend for the com.google.gson.internal package to be used outside of the project. Even if we don't make a JPMS module out of scijava-persist, we should probably avoid this API, yeah?

I don't know anything about JPMS, but I looked quickly and I kind of understand.
The thing about RuntimeAdapterFactory is that it's really key to a convenient serialization. This class is not mine, it's a class which is not in the core gson repo, but in 'extras', and many many projects use it because it allows to conveniently serialize and deserialize interfaces or abstract class (cf documentation in the header of RuntimeTypeAdapterFactory).

In QuPath, I believe the 'internal' not being visible is kind of solved by putting this class in its original package:
https://github.com/qupath/qupath/blob/88c7cc45648c1d5b09a840bd1e497ea9a46453aa/qupath-core/src/main/java/com/google/gson/RuntimeTypeAdapterFactory.java

But I guess this won't solve the JPMS issue ?

Do you think it would be worth writing some abstract JSON serializer/deserializer interface

Hum, if I understand correctly, this looks like adding another layer on top of Gson. Why not, but is it worth the effort ? While there's no use case and given all what needs to be maintained in scijava, do you think it's beneficial ?

Would it be an idea to have 'scijava-gson' module which is rather unsafe (no version), and then make a 'scijava-persist' module, which is using scijava-gson or something else.

Also one thing to consider is the compatibility with other software. If we rewrite, (using transcribers) serializers which are deviating too much from the standard serialization, we may lose the benefit or using the same adapters in different software. Quick example : for imglib2 realtransform objects (Affine transform, ThinPlateSplineTransform, WrappedIterativeInvertibleTransform), I could easily use the registration result of BigWarp into QuPath, by using the same adapters. And I think Gson / json is used way beyond Fiji and QuPath.

My first instinct suggests that we should fail/skip both of the relevant serializers, and log an error. I think it is a mistake to do otherwise; if there are multiple serializer implementations, they should be reconciled into one. We do not want any confusion about which is being run. What do you think of this plan?

👍

This is with respect to RuntimeTypeAdapterFactory.of, yeah? Can you describe a situation where it would be useful to declare typeFieldName? I am not seeing a use for it, and want to suggest that we don't allow users the option to give a name, but I'm probably missing something 😁. In any case, I definitely like using the full class name.

Not sure I understand your question, but maybe here's an example:

Let's say I want to serialize a RealTransformSequence object ( which implements RealTransform). This object contains somehow the series of RealTransforms objects to apply, that makes this sequence ( we don't know what kind of transform..., just that they implements RealTransform).

Now let's suppose that this sequence contains:

an AffineTransform3D object which implements RealTransform
followed by a ThinPlateSpline object

Serializing the sequence object will give something like:

{ // serialization of transform list or array
  transform: {
     matrix:{0,0,0, etc.}
  }.
  transform: {
     landmark:{fixed coordinates,moving coordinates},
     landmark:{fixed coordinates,moving coordinates},
     landmark:{fixed coordinates,moving coordinates},
     etc.
  }
}

Fine. Now when deserializing, you will write something like gson.deserialize(myfile, RealTransform.class); but how can gson know which kind of realtransform is contained in the file ? There's no obvious way what kind of RealTransform object gson is supposed to deserialize.

By using RuntimeTypeAdapter, an extra field (named type by default), will be written to identify the class of the object being serialized, and used during deserialization to discriminate what sort of object should be constructed. The json file will look something like:

Serializing the sequence object will give something like:

{ // serialization of transform list or array
  type: "RealTransformSequence",
  transform: {
     type: "AffineTransform3D",
     matrix:{0,0,0, etc.}
  }.
  transform: {
     type:"ThinPlateSplineTransform",
     landmark:{fixed coordinates,moving coordinates},
     landmark:{fixed coordinates,moving coordinates},
     landmark:{fixed coordinates,moving coordinates},
     etc.
  }
}

And it can now be deserialized easily as a RealTransform object.

FYI, I'm already using this and the adapters are in https://github.com/bigdataviewer/bigdataviewer-playground/tree/master/src/main/java/net/imglib2/realtransform

Generally speaking, I like the idea of modularity, can you elaborate a bit more about the idea of annotations? How might the user benefit from a particular scenario?

Not sure yet, but the example of transform above could be such a scenario.

Do you think it would be useful for the user to specify a list of Classes that they would want to adapt, and only let the serializer gather relevant adapters for that list? I would think that generally the user would know what classes they are going to be dealing with, so I'd think that route might be more convenient? But I am pretty naive here.

Yes, so if you do that I'd say you can use gson directly, no need for extensibility. The problem arises when you have adapters scattered in different repos. For instance, for the transform objects, some adapters are located in a 'core' repo (bigdataviewer-playground), but I also define other kinds of transforms and their respective adapters in extra repositories (like in this one: https://github.com/BIOP/bigdataviewer-biop-tools/tree/20391be90a54a26529e614dc084e4a2e3effd572/src/main/java/bdv/util).

I want the serialization mechanism to be able to serialize any transform, and potentially one which I did not define myself.

TODO: Scijava common is unable to find the Adapter Service.

gselzer · 2021-05-25T16:40:15Z

I don't know anything about JPMS, but I looked quickly and I kind of understand.
The thing about RuntimeAdapterFactory is that it's really key to a convenient serialization. This class is not mine, it's a class which is not in the core gson repo, but in 'extras', and many many projects use it because it allows to conveniently serialize and deserialize interfaces or abstract class (cf documentation in the header of RuntimeTypeAdapterFactory).

Hmm, I think the easiest thing to do here then is to fork the needed classes from com.google.gson.internal (I made a commit to do this). It seems the "least bad" option to me, as we now have no compiler errors and we are no longer subject to changes made to these classes (which seem likely at some point due to the package-info for that package). Let me know what you think of this; it likely needs some restructuring. We might want to put those three files in some internal package that is not exported.

Why not, but is it worth the effort ? While there's no use case and given all what needs to be maintained in scijava, do you think it's beneficial ?

Good point 😁. I still think this is a good idea if we could envision any other serialization protocol being useful within the SciJava world, but I don't want that to hold up the merging of this PR. The nice thing about the incubator is that it leaves us free to make these changes. So let's ignore this for now?

Would it be an idea to have 'scijava-gson' module which is rather unsafe (no version), and then make a 'scijava-persist' module, which is using scijava-gson or something else.

I could see an argument for that if we abstract the gson code into a Transcriber. Then scijava-persist could contain those Transcriber interfaces, scijava-gson could just be the implementation, and there could be other modules akin to scijava-gson that provide other serialization implementations. Then users can decide which implementations they want and only import those modules. My YAGNI sense is tingling a bit, however; I'll defer to you on whether or not this would be useful or just too complicated.

Not sure I understand your question, but maybe here's an example:

Let's say I want to serialize a RealTransformSequence object ( which implements RealTransform)...

I think I understand this, my question is more about why this method exists. I figured that it had something to do with this question you had:

To avoid some conflicts, we could use the full class name when writing runtime adapters instead of the simple name (think java.awt.List vs java.util.List)

If we are forking this class, it seems to me like this String is a liability; it would be awesome if we could automate the String to be the full class name, we'd avoid conflicts. Am I going crazy?

Yes, so if you do that I'd say you can use gson directly, no need for extensibility.

Aha. Makes sense to me 😁

NicoKiaru · 2021-05-26T11:25:12Z

Thanks a lot!!

I could see an argument for that if we abstract the gson code into a Transcriber. Then scijava-persist could contain those Transcriber interfaces, scijava-gson could just be the implementation, and there could be other modules akin to scijava-gson that provide other serialization implementations. Then users can decide which implementations they want and only import those modules. My YAGNI sense is tingling a bit, however; I'll defer to you on whether or not this would be useful or just too complicated.

I think such a decision should not be taken by me, because I don't know enough about good software practice...

Here's what I have in mind:

I would tend to use gson as it is now, the standard way - improved by a little bit of scijava extensibility. We know it won't cover every use case (simple name instead of full name, no version, we cannot have a field name 'type', circular references lead to stack overflow), but the strength is in it ease of use, and its compatibility with other software. Trying to overcome gson limitations is probably counter productive.
Using transcriber sounds great, but could we make sure that a transcriber can be written in such a way that:
- a transcriber is easy to write
- a transcriber will not need customisation because of specific downstream implementation (gson or any other one)
- it leads, when using scijava-gson implementations, to a 'standard' serialisation

Trying to fix gson will cause complexity, plus we probably won't be able to cover every use case, and we will lose the 'default' serialisation procedure.

Regarding the other comments:

If we are forking this class, it seems to me like this String is a liability

To be honest, I didn't look at all at how this RuntimeTypeAdapterFactory class works 😁. This method is not used currently, we can probably remove it.

If we want to put the full class name, we could just replace return registerSubtype(type, type.getSimpleName()); with return registerSubtype(type, type.getName());, right ?

One last point regarding conflicts: there is still the confilct of the name used to specify the type. For instance a class like this:

class SerializeMeIfYouCan {
    String type = "this field can't be serialize with gson runtime adapters";
}

Cannot be serialized with runtime adapters.

@NicoKiaru

From @NicoKiaru: This PR brings a scijava service DefaultScijavaAdapterService which provides a way to serialize and deserialize java objects. Vocabulary : adapter = serializer (object to text) + deserializer (text to object) The most meaningful demo is in SerializationTests.java. A test is included, which basically makes a loop: object -> serialization -> string1 -> deserialization -> object -> serialization -> string2, and checks that string1.equals(string2) Scijava persist uses Gson serialization ( with RunTime adapters, which provide a way to serialize/deserialize objects extending abstract class or implements interface, see javadoc of RuntimeTypeAdapterFactory), and discovers and registers adapters at runtime. The adapters which are autodiscovered are of two kinds: IClassAdapter IClassRuntimeAdapter for adapters which require runtime adaptation (is this RealTransform object an affine transform ? or a sequence of transform, or a thinplateSplineTransform ? One runtime adapter needs to be registered per class) See #30

@NicoKiaru

From @NicoKiaru: This PR brings a scijava service DefaultScijavaAdapterService which provides a way to serialize and deserialize java objects. Vocabulary : adapter = serializer (object to text) + deserializer (text to object) The most meaningful demo is in SerializationTests.java. A test is included, which basically makes a loop: object -> serialization -> string1 -> deserialization -> object -> serialization -> string2, and checks that string1.equals(string2) Scijava persist uses Gson serialization ( with RunTime adapters, which provide a way to serialize/deserialize objects extending abstract class or implements interface, see javadoc of RuntimeTypeAdapterFactory), and discovers and registers adapters at runtime. The adapters which are autodiscovered are of two kinds: IClassAdapter IClassRuntimeAdapter for adapters which require runtime adaptation (is this RealTransform object an affine transform ? or a sequence of transform, or a thinplateSplineTransform ? One runtime adapter needs to be registered per class) See #30

initial commit of scijava-persist

43bdb5f

gselzer force-pushed the scijava/scijava-persist/init branch from 0afe0db to 61947b3 Compare May 21, 2021 19:43

gselzer added 2 commits May 25, 2021 11:10

Fork non-exported gson classes

b575376

WIP: Create the module org.scijava.persist

e7f2c3b

TODO: Scijava common is unable to find the Adapter Service.

gselzer force-pushed the scijava/scijava-persist/init branch from 03b282b to e7f2c3b Compare May 25, 2021 16:11

hinerm closed this Jun 15, 2021

gselzer mentioned this pull request Jun 18, 2021

Data provenance mechanism #35

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

scijava-persist : persisting objects with Gson and scijava discovery mechanism #30

scijava-persist : persisting objects with Gson and scijava discovery mechanism #30

Uh oh!

NicoKiaru commented May 3, 2021 •

edited

Loading

Uh oh!

gselzer commented May 21, 2021

Uh oh!

gselzer commented May 21, 2021

Uh oh!

NicoKiaru commented May 25, 2021

Uh oh!

gselzer commented May 25, 2021

Uh oh!

NicoKiaru commented May 26, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

scijava-persist : persisting objects with Gson and scijava discovery mechanism #30

scijava-persist : persisting objects with Gson and scijava discovery mechanism #30

Uh oh!

Conversation

NicoKiaru commented May 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gselzer commented May 21, 2021

Uh oh!

gselzer commented May 21, 2021

Uh oh!

NicoKiaru commented May 25, 2021

Uh oh!

gselzer commented May 25, 2021

Uh oh!

NicoKiaru commented May 26, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

NicoKiaru commented May 3, 2021 •

edited

Loading