Migration Path for Immutability #9047

Open
lukaseder opened this issue Aug 16, 2019 · 22 comments

@lukaseder
Member

lukaseder commented Aug 16, 2019

This issue collects various discussions about our projected path to immutability of the query object model. Immutability will help us achieve a variety of things much more easily, including:

  • Caching of SQL strings per query part (see the sketch after this list)
  • Caching of query parts by the user
  • Simpler expression tree transformation than what is currently done in VisitListener. An immutable expression tree representation would be easier to expose as a public API.
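As a small illustration of the first point above, here is a minimal sketch (hypothetical class, not jOOQ API) of how an immutable query part could cache its rendered SQL, since nothing can change after construction:

import java.util.Objects;

// Hypothetical, simplified query part - not actual jOOQ API
final class Eq {
    private final String field;
    private final Object value;

    // Lazily cached rendered SQL. This is safe only because the part is immutable;
    // the benign race at worst renders the string twice.
    private volatile String sql;

    Eq(String field, Object value) {
        this.field = Objects.requireNonNull(field);
        this.value = Objects.requireNonNull(value);
    }

    String sql() {
        String s = sql;
        if (s == null)
            sql = s = field + " = " + value;
        return s;
    }

    public static void main(String[] args) {
        Eq eq = new Eq("S.T.C", 2);
        System.out.println(eq.sql()); // S.T.C = 2, rendered once and then reused
    }
}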

Types that will require immutability

  • Settings: This will be relatively simple, as only a very limited amount of user code touches settings. Breaking backwards compatibility here could be forgivable in a minor release if we produce compilation errors, i.e. when we remove the setXY() and withXY() methods and replace them with a new immutable version (see the sketch after this list). Caveat: Lists are currently mutable.
  • QueryPart implementations: Some query parts are already immutable (e.g. Condition), but they may reference mutable query parts (e.g. Select) internally. The entire query object model must be made immutable. The exact implementation using ADTs, the future java.lang.Record, etc. is out of scope for this issue.
  • DSL API: This API also extends QueryPart, but is "special", as it is widely understood to be mutable right now. It includes all the types suffixed by Step.
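A minimal sketch of the immutable Settings style, assuming Java records and purely illustrative setting names (this is not the actual org.jooq.conf.Settings API):

import java.util.List;

// Illustrative subset of settings - withXY() returns a modified copy instead of mutating
record ImmutableSettings(boolean renderFormatted, List<String> searchPath) {

    ImmutableSettings {
        // Defensive copy, addressing the "Lists are currently mutable" caveat
        searchPath = List.copyOf(searchPath);
    }

    ImmutableSettings withRenderFormatted(boolean newRenderFormatted) {
        return new ImmutableSettings(newRenderFormatted, searchPath);
    }

    ImmutableSettings withSearchPath(List<String> newSearchPath) {
        return new ImmutableSettings(renderFormatted, newSearchPath);
    }

    public static void main(String[] args) {
        ImmutableSettings s1 = new ImmutableSettings(false, List.of("PUBLIC"));
        ImmutableSettings s2 = s1.withRenderFormatted(true);
        System.out.println(s1.renderFormatted() + " / " + s2.renderFormatted()); // false / true
    }
}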

Types that will remain mutable

  • Model API: The point of this API is mutable manipulation of the query object model. The API must remain mutable, even if it internally operates on immutable objects after these changes. It will be very useful for the relationship to be inverted first, i.e. for the model API to wrap the DSL API, instead of the current status quo: Reverse relationship between model and DSL APIs #11241

Possible paths

Keep the DSL API and add a feature flag (system property) to turn immutability on or off

Pros:

  • Less API maintenance
  • New users can use the mutable API without extra effort

Cons:

  • Existing users might not notice the change until production
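A rough sketch of the feature flag idea, assuming a made-up system property name; nothing here is an existing jOOQ flag. It also illustrates the con above: because the switch is global and invisible at compile time, existing users might only notice it in production.

// Hypothetical flag - not an existing jOOQ system property
final class ImmutabilityFlag {

    static final boolean IMMUTABLE_DSL =
        Boolean.getBoolean("org.jooq.experimental.immutable-dsl");

    public static void main(String[] args) {
        System.out.println("Immutable DSL enabled: " + IMMUTABLE_DSL);
    }
}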

Add a new DSL API

Pros:

  • No regressions in user code

Cons:

  • A lot of effort to maintain and document the two parallel APIs

More discussions will follow

@lukaseder
Member Author

lukaseder commented Sep 18, 2019

In #9167, I have started evaluating a few annotation processors that help create immutable types. These include:

Obviously, in all cases, we should generate code and check it into version control, and there must be no runtime dependency or magic imposed by these libraries. I think all of them satisfy these conditions. Immutables seems promising and highly configurable at first sight.

@lukaseder
Member Author

Whatever code generation we're using here for the immutable query object model, it will need to provide some sort of generated visitor, which allows for exhaustive pattern matching.

Other languages support these things using algebraic data types / sum types / sealed types. Java might as well, soon: https://cr.openjdk.java.net/~briangoetz/amber/pattern-match.html. Until Java does support these things, we might work around the limitations using code generation.

Java's proposed record types might also help. They do support final members: https://cr.openjdk.java.net/~briangoetz/amber/datum.html
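For illustration, a tiny sketch of what exhaustive matching over a sealed query part hierarchy looks like with the sealed types and record/switch patterns that later shipped in Java (hypothetical mini model, not the jOOQ object model):

// Requires sealed types (Java 17) and pattern matching for switch (Java 21)
sealed interface Expr permits Val, Add {}
record Val(int value) implements Expr {}
record Add(Expr lhs, Expr rhs) implements Expr {}

final class Eval {

    // No default branch: the compiler enforces exhaustiveness, so adding a new
    // Expr subtype immediately turns this switch into a compilation error.
    static int eval(Expr e) {
        return switch (e) {
            case Val v -> v.value();
            case Add(Expr l, Expr r) -> eval(l) + eval(r);
        };
    }

    public static void main(String[] args) {
        System.out.println(eval(new Add(new Val(1), new Val(2)))); // 3
    }
}

Until those language features are available, a generated visitor can emulate the same exhaustiveness guarantee.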

@knutwannheden
Contributor

I think it might also be a good idea to have a central factory (defined by an interface) through which the objects get instantiated.

@lukaseder
Member Author

Yes, there will be such a factory. Why would it be defined by an interface rather than a class with static methods?
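For illustration, a factory as a plain class with static methods could look like the following sketch (hypothetical names, not a proposed jOOQ API). Funnelling all construction through one place leaves room for validation, interning, or swapping implementations later, without touching call sites.

final class Trees {

    // Hypothetical immutable model types
    record Ident(String name) {}
    record TableRef(Ident schema, Ident table) {}

    // Static factory methods as the only way to construct model objects
    static Ident ident(String name) {
        return new Ident(name);
    }

    static TableRef table(String schema, String table) {
        return new TableRef(ident(schema), ident(table));
    }

    public static void main(String[] args) {
        System.out.println(table("S", "T"));
        // TableRef[schema=Ident[name=S], table=Ident[name=T]]
    }
}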

@knutwannheden
Contributor

I suppose an interface isn't really required. I had some idea that the interface could be used by the parser to return implementation objects which have offset and length information for all objects. But now I don't really see how that would help / work.

@lukaseder
Member Author

I think that we could use an IdentityHashMap somewhere to map each object to such additional meta information externally... Of course, a factory could allow for swapping implementations, but right now, I don't see the use case(s).
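A minimal sketch of that idea: positional metadata from the parser lives in an external IdentityHashMap, so structurally equal (and immutable) query parts parsed at different locations can still be told apart. Types are hypothetical.

import java.util.IdentityHashMap;
import java.util.Map;

final class Positions {

    // Hypothetical model and metadata types
    record Ident(String name) {}
    record Position(int offset, int length) {}

    public static void main(String[] args) {
        Map<Ident, Position> positions = new IdentityHashMap<>();

        // Two structurally equal idents, parsed at different offsets
        Ident a = new Ident("T");
        Ident b = new Ident("T");
        positions.put(a, new Position(14, 1));
        positions.put(b, new Position(32, 1));

        // Identity based lookup keeps them apart, even though a.equals(b)
        System.out.println(positions.get(a)); // Position[offset=14, length=1]
        System.out.println(positions.get(b)); // Position[offset=32, length=1]
    }
}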

@lukaseder
Member Author

Note, I'm still experimenting also with Scala and Kotlin for this API. I think we could have much better expression tree transformation capabilities in these languages.

@knutwannheden
Contributor

One more interesting aspect of this is that some jOOQ ultralite edition would then no longer have to contain the DSL API (except for parts of DSL, I assume), as long as at least the parser is still included :-)

@lukaseder
Member Author

Hah... Perhaps! Nice thinking. For those who don't need the API itself, but only the SQL transformation capabilities, this could be nice.

A simple prototype (which I might publish tonight) already shows how extremely easy the existing runtime render mapping feature is to implement with pattern matching. This new object model will really add a lot of power to jOOQ.

@lukaseder
Member Author

Here are two simple proofs of concept:

Neither claims to be using the relevant language features / library features maximally - there is obviously room for improvement.

Both show that using ADTs, we can very easily write SQL transformation logic that is located externally from the relevant QueryPart types, which will allow for pipelining transformations and replacing a variety of existing features by more generic ones - for example the existing schema/table mapping feature.

Both gists do the same thing. They try to map the T table to the X table in a simple predicate:

// Expression tree
CAnd(CEq(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(T)),CIdent(C)),CVal(2)),CEq(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CIdent(C)),CVal(2)))

// Output
// S.T.C = 2 AND S.U.C = 2

// Transformed expression tree
CAnd(CEq(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(X)),CIdent(C)),CVal(2)),CEq(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CIdent(C)),CVal(2)))

// Output
// S.X.C = 2 AND S.U.C = 2
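For comparison, roughly the same table mapping transformation sketched in Java, using records and switch patterns instead of Scala case classes. The type names mirror the gist; none of this is actual jOOQ API.

// Requires sealed types, record patterns and pattern matching for switch (Java 21)
sealed interface C permits CIdent, CSchemaRef, CTableRef, CFieldRef, CVal, CEq, CAnd {}
record CIdent(String name) implements C {}
record CSchemaRef(CIdent name) implements C {}
record CTableRef(CSchemaRef schema, CIdent name) implements C {}
record CFieldRef(CTableRef table, CIdent name) implements C {}
record CVal(Object value) implements C {}
record CEq(C lhs, C rhs) implements C {}
record CAnd(C lhs, C rhs) implements C {}

final class TableMapping {

    // Rename table T to X, leave any other table reference untouched
    static CTableRef mapTable(CTableRef t) {
        return switch (t) {
            case CTableRef(CSchemaRef s, CIdent(String n)) when n.equals("T") ->
                new CTableRef(s, new CIdent("X"));
            default -> t;
        };
    }

    // Externally defined, recursive transformation over the immutable expression tree
    static C map(C part) {
        return switch (part) {
            case CFieldRef(CTableRef t, CIdent c) -> new CFieldRef(mapTable(t), c);
            case CEq(C l, C r) -> new CEq(map(l), map(r));
            case CAnd(C l, C r) -> new CAnd(map(l), map(r));
            default -> part;
        };
    }

    public static void main(String[] args) {
        CSchemaRef s = new CSchemaRef(new CIdent("S"));
        C predicate = new CAnd(
            new CEq(new CFieldRef(new CTableRef(s, new CIdent("T")), new CIdent("C")), new CVal(2)),
            new CEq(new CFieldRef(new CTableRef(s, new CIdent("U")), new CIdent("C")), new CVal(2)));

        // Prints the transformed tree: the T reference is now X, the U reference is untouched
        System.out.println(map(predicate));
    }
}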

@lukaseder
Member Author

The second revision of the Scala version implements a simple example of row level security: in the presence of table access to S.U, we generate an additional predicate on S.U.C in the WHERE clause:

// Expression tree
CSelect(List(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(T)),CIdent(C)), CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CIdent(C))),CJoin(CTableRef(CSchemaRef(CIdent(S)),CIdent(T)),CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CEq(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(T)),CIdent(C)),CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CIdent(C)))),CAnd(CEq(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(T)),CIdent(C)),CVal(2)),CEq(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CIdent(C)),CVal(2))))

// Output
// SELECT S.T.C, S.U.C FROM S.T JOIN S.U ON S.T.C = S.U.C WHERE S.T.C = 2 AND S.U.C = 2

// Transformed expression tree
CSelect(List(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(X)),CIdent(C)), CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CIdent(C))),CJoin(CTableRef(CSchemaRef(CIdent(S)),CIdent(X)),CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CEq(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(X)),CIdent(C)),CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CIdent(C)))),CAnd(CAnd(CEq(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(X)),CIdent(C)),CVal(2)),CEq(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CIdent(C)),CVal(2))),CIn(CFieldRef(CTableRef(CSchemaRef(CIdent(S)),CIdent(U)),CIdent(C)),List(CVal(1), CVal(2), CVal(3)))))

// Output
// SELECT S.X.C, S.U.C FROM S.X JOIN S.U ON S.X.C = S.U.C WHERE S.X.C = 2 AND S.U.C = 2 AND S.U.C IN (1, 2, 3)

See: https://gist.github.com/lukaseder/ebc7c9d9a2eb178939b2149014e75581
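A compact Java counterpart of the row level security idea, modelling only the pieces needed for the WHERE clause augmentation (hypothetical types, not jOOQ API):

import java.util.List;

// Just enough of a model to show the WHERE clause augmentation (requires Java 17 records/sealed types)
sealed interface Q permits QTable, QField, QIn, QAnd, QRaw {}
record QTable(String schema, String name) implements Q {}
record QField(QTable table, String name) implements Q {}
record QIn(Q field, List<Q> values) implements Q {}
record QAnd(Q lhs, Q rhs) implements Q {}
record QRaw(String sql) implements Q {}

record QSelect(List<QTable> from, Q where) {

    // If the protected table is accessed, AND the extra predicate into the WHERE clause
    QSelect withRowLevelSecurity(QTable protectedTable, Q extraPredicate) {
        return from.contains(protectedTable)
            ? new QSelect(from, new QAnd(where, extraPredicate))
            : this;
    }
}

final class RowLevelSecurity {
    public static void main(String[] args) {
        QTable u = new QTable("S", "U");
        QSelect select = new QSelect(
            List.of(new QTable("S", "T"), u),
            new QRaw("S.T.C = 2 AND S.U.C = 2"));

        QSelect secured = select.withRowLevelSecurity(
            u, new QIn(new QField(u, "C"), List.of(new QRaw("1"), new QRaw("2"), new QRaw("3"))));

        // The extra IN predicate has been ANDed into the WHERE clause
        System.out.println(secured.where());
    }
}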

@lukaseder
Member Author

The exposed transformation does both things in one go. Sequential application would be simple to achieve as well.

@knutwannheden
Contributor

To be future proof I think we will want to define Java interfaces for all object model types. That should for instance allow us to use Java record types as the implementation, once they are available.

I suppose this would then be a good reason to use some kind of metamodel. Does Derive4J fit that bill or should it be something Java independent?

@lukaseder
Member Author

To be future proof I think we will want to define Java interfaces for all object model types. That should for instance allow us to use Java record types as the implementation, once they are available.

Yes, I agree. Forward compatibility will be non-trivial. I've ignored this topic for now as it seems secondary, but it will be important once we settle on a specific API.

I suppose this would then be a good reason to use some kind of metamodel. Does Derive4J fit that bill or should it be something Java independent?

I don't think any out-of-the-box solution will fit the bill for everything we want here. I'm just using them for now to get some rapid prototyping feedback and to take some inspiration for our own code generation. Writing a generator will not be the difficult part here. And personally, I don't think it will be annotation based, but explicitly programmatic, using templating, similar to what we already have with our various Xtend generators. One reason is that Java independence (i.e. templating) might be a big plus in the future.

@lukaseder
Member Author

The more I think about this, the less I think we can depend on an external utility. These utilities are great for quick-and-dirty API design, which is very valuable when a big set of data structures needs to be designed in an ordinary business application and the specific API look-and-feel is secondary.

However, we will likely need much more than what these tools can offer.

@lukaseder
Member Author

... I'm almost inclined to explore XSD as a source of truth and write a code generator based on XSD introspection :)

@knutwannheden
Contributor

It all comes down to background and experience.
Ecore (EMF) would have been my first choice 😉

@lukaseder
Member Author

EMF did cross my mind, but then I thought it might be overkill...

@lukaseder
Member Author

Currently experimenting with a Java based model and template based code generation using Xtend, reflecting on the Java model. This is much more usable than anything else I've tried so far, and much more convenient than annotation processing, IMO.

Generating a visitor API and some default implementations is effortless. I doubt I will find a better solution right now :)
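To illustrate the general idea only (the actual experiment uses Xtend templates; this is a stripped-down, hypothetical Java equivalent): reflect on a Java metamodel and emit a visitor interface from it.

// Hypothetical metamodel and a toy generator - illustrative only
final class VisitorGenerator {

    record CVal(Object value) {}
    record CEq(Object lhs, Object rhs) {}
    record CAnd(Object lhs, Object rhs) {}

    public static void main(String[] args) {
        Class<?>[] model = { CVal.class, CEq.class, CAnd.class };

        // Emit one visit method per model type
        StringBuilder sb = new StringBuilder("public interface Visitor {\n");
        for (Class<?> type : model)
            sb.append("    void visit").append(type.getSimpleName())
              .append("(").append(type.getSimpleName()).append(" part);\n");
        sb.append("}\n");

        System.out.println(sb);
    }
}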

@knutwannheden
Contributor

Cool! Xtend's active annotations can also be quite nice and convenient at times. But for me Xtend is also the best when it comes to generating text.

@lukaseder
Member Author

Oh interesting. Of course, Xtend can implement annotations quite differently. Will investigate their utility. You probably mean that I could design the meta model in Xtend rather than in Java, and then reflect on that - or rather than reflect on it, implement code generation in the Xtend annotation processor?

@lukaseder
Member Author

... personally, I like Xtend's templating, but I'm not sure if we want to add a tighter dependency on the language itself.
