Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offer Result.intoMap(), intoGroups(), intoSet() functionality as Collector #9288

Closed
11 tasks done
lukaseder opened this issue Sep 27, 2019 · 11 comments
Closed
11 tasks done

Comments

@lukaseder
Copy link
Member

lukaseder commented Sep 27, 2019

We should have a composable, possibly Collector based API to collect complex results into nested DTO projections, etc. An example is this support request here:
#8597

The goal (among others) would be to produce nested POJOs, such as for example:

class Author {
  String firstName;
  String lastName;
  List<Book> books;
}

class Book {
  String title;
}

Perhaps, some inspiration can be found in SimpleFlatMapper: https://simpleflatmapper.org/0106-getting-started-jooq.html

This issue will be split into a set of subtasks as we evaluate what steps are necessary to implement what users find most missing in this area, right now. It is also used as a host for discussions around the issue.

Tasks

  • Add Result<R>.collect(Collector<? super R, A, X>): X
  • Add <E, R extends Record1<E>> Records.intoArray(): Collector<R, ?, E[]>
  • Add <K, V, R extends Record2<K, V>> Records.intoGroups(): Collector<R, ?, Map<K, List<V>>>
  • Add <E, R extends Record1<E>> Records.intoList(): Collector<R, ?, List<E>>
  • Add <K, V, R extends Record2<K, V>> Records.intoMap(): Collector<R, ?, Map<K, V>>
  • Add <E, R extends Record1<E>> Records.intoSet(): Collector<R, ?, Set<E>>
  • Refactor Result implementations to use Result::collect
  • Refactor ResultQuery implementation to user ResultQuery::collect
  • Refactor Result.getValues(int, Class) and related methods

To avoid conflicts with other statically imported collectors from java.util.stream.Collectors, such as toList(), ours will be prefixed into, e.g. intoMap(), just like the Result.intoMap() methods they represent / replace.

Interactions with ExecuteListener

While the Result refactorings are backwards compatible, refactoring existing ResultQuery implementations are not, if we skip firing the ExecuteListener::resultStart and ExecuteListener::resultEnd events. To allow for reverting to the current behaviour, let's offer a new Settings that governs whether we create the intermediate Result data structure: #11892, the default being not to create it if not explicitly requested

@knutwannheden
Copy link
Contributor

Let me first see if I got this straight:

While some people regard jOOQ as an ORM, it is certainly foremost a query builder. And since ORMs (or FRMs) have their place, it would also make sense for jOOQ to provide more value in this area, as currently jOOQ more or less only makes it easy to map a single tuple to a single Java object. (Mapping a tuple representing a chain of one-to-one relations to a corresponding chain of connected Java objects also works quite well.)

AFAICT what customers most want is a simple way to load a given set of rows / tuples into a "corresponding" collection of "connected" Java objects (e.g. an aggregate with a one-to-many relation like customer-address). And for simple use cases (e.g. only one-to-many relations (with foreign keys) and one-to-one mappings between tables and classes), jOOQ should make this dead simple, while it should also be more or less easy to achieve this goal for more complex scenarios.

In jOOQ we want to implement this functionality on top of the Collector API (and ResultQuery#collect(Collector)) and thus provide some kind of DSL API to create the required Collector.

Since the new API will somehow need to capture how the rows and columns get mapped to objects and properties (also references), I am wondering if we should also design for the use case of allowing the actual SELECT to be derived from this "specification", rather than just a Collector, which processes the results of a user-defined query. Or would we be tapping too far into ORM-land with this? If not, could this then in the future even allow deriving INSERT, UPDATE (and UPSERT) queries for such aggregates?

@lukaseder
Copy link
Member Author

While some people regard jOOQ as an ORM, it is certainly foremost a query builder

The paradigm I think suits most here is: jOOQ is SQL. The fact that there's a query builder is secondary, even if it does appear to be the primary feature. But I think it's better to think of jOOQ = SQL, and derive all other discussion from that paradigm.

And since ORMs (or FRMs) have their place, it would also make sense for jOOQ to provide more value in this area

jOOQ does add more value in this area where we do not oppose the philosophies of SQL. I personally think the "third manifesto" didn't get enough traction outside of Oracle and PostgreSQL yet, and even there, it didn't find the adoption it deserves.

The ideal way to implement what I believe most customers want is to support MULTISET and related operators: #3884, that would allow for nesting collections in SQL. Related paradigms that have seen broader adoption are JSON and XML for nesting data structures.

Apart from that, additional client side convenience that works in similar fashions is definitely desireable as well, as long as it does not oppose SQL. By that, I mean we're mainly working on sets of tuples (values), not graphs of entities (objects). As long as we avoid introducing the notion of a record or object "identity", we will be able to "keep it simple".

Having said so, yes, we definitely need much more convenience to produce hierarchical data structures from queries.

AFAICT what customers most want is a simple way to load a given set of rows / tuples into a "corresponding" collection of "connected" Java objects (e.g. an aggregate with a one-to-many relation like customer-address). And for simple use cases (e.g. only one-to-many relations (with foreign keys) and one-to-one mappings between tables and classes), jOOQ should make this dead simple, while it should also be more or less easy to achieve this goal for more complex scenarios.

Yes, of course. But the difficulty will always be the mismatch between the denormalising join operator, and what customers actually wanted, the multiset operator. I don't think it is possible to re-connect denormalised tuples to what they were intended to be. I would really like to avoid any cleverness around that, because it breaks apart very easily.

I think it is important to make sure everyone using jOOQ (or SQL) knows that a join is a filtered cartesian product, not a way to nest collections.

However, as the popularity of fetchGroups() and third party mappers has shown, providing convenience around the nesting of single one-to-many relationships can be useful, as long as users don't think this is a good approach in general (e.g. when there are more than 1 one-to-many relationships, or several levels of nested collections)

In jOOQ we want to implement this functionality on top of the Collector API (and ResultQuery#collect(Collector)) and thus provide some kind of DSL API to create the required Collector.

Yes, Collector is a very good abstraction for this task. jOOλ provides a variety of useful collectors, and we can push this even further, maybe move some of jOOλ into jOOQ:
https://www.jooq.org/products/jOOλ/javadoc/latest/org/jooq/lambda/Agg.html

I think that we should still also offer convenience on ResultQuery and Result for the most common collection scenarios, as such API is much more discoverable than collectors. But the strategic goal is to embrace collectors.

This will limit new API features to the Java 8+ distributions, which I think is OK.

Since the new API will somehow need to capture how the rows and columns get mapped to objects and properties (also references), I am wondering if we should also design for the use case of allowing the actual SELECT to be derived from this "specification", rather than just a Collector, which processes the results of a user-defined query. Or would we be tapping too far into ORM-land with this? If not, could this then in the future even allow deriving INSERT, UPDATE (and UPSERT) queries for such aggregates?

You just re-invented my MULTISET operator idea :) Yes, we should be able to derive some collections from the projection automatically. Ideally, using operators that can also be implemented in SQL.

But some emulations are possible as well. E.g. embeddable types (#2530), which are going to be worked on as well in jOOQ 3.13 are nothing but an emulation of a nested record.

Another recent idea was this one: #9126. Another way to declare a nested record.

This is not opposed to the ideas of the SQL language. In PostgreSQL and Oracle, we can create "object types" / "record types", which then allow for things like:

-- Oracle
SELECT my_object_type(a, b, c) FROM t

-- PostgreSQL
SELECT (a, b, c)::my_object_type FROM t

It is perfectly reasonable to 1:1 map Java DTO / POJO types onto such nested record types in SQL, and provide convenience around that, using reasonable API.

The main risk here is that we'll be doing this in the SELECT clause, which is not the last clause in the logical order of operations in a SELECT:
https://blog.jooq.org/2016/12/09/a-beginners-guide-to-the-true-order-of-sql-operations/

After SELECT, there's also DISTINCT, UNION (and the others), and ORDER BY, which all interact with the projection from SELECT, so we must make sure that we're not limiting the possible SQL constructs too much by whatever features we introduce.

Summary

This issue was originally about more powerful client side data collection constructs (does not affect or interact with the SQL query, but operates on the org.jooq.Result, without knowledge on how it came to be)

There are other issues that talk about deriving such collection algorithms from a more powerful way to express the SELECT clause. These mainly include:

  • Nesting collections (e.g. using MULTISET)
  • Nesting records (e.g. using object types / embeddable types / etc)

@knutwannheden
Copy link
Contributor

I have been thinking if it could make sense to define the projection as part of the jOOQ DSL API, using some kind of mini DSL extension. Possibly something along the lines of this:

project(BOOK.ID, BOOK.TITLE)
    .using(jpaProjection(Book.class))
    .from(BOOK);

Possibly even allow projecting nested records:

project(BOOK.ID, BOOK.TITLE,
    project(AUTHOR.FIRST_NAME, AUTHOR.LAST_NAME)
        .using(jpaProjection(Author.class)))
    .using(jpaProjection(Book.class))
    .from(BOOK)
        .join(AUTHOR).on(BOOK.AUTHOR_ID.eq(AUTHOR.ID));

The project() clause would basically just render the same SQL as a select(), but would then know how to project the tuples client-side into Java objects.

@lukaseder
Copy link
Member Author

People like jOOQ because it mimicks SQL almost perfectly (unless the Java language prevents it, e.g. with CTE or derived tables). Replacing the SELECT clause by something "clever" is going to drastically increase the learning curve and feed the arguments that jOOQ is not exactly SQL.

This is just me criticising the suggested execution here. We could definitely profit from supporting additional projection capabilities. See also the suggestion here: #9126

In other words, we're not gaining anything from avoiding the SQL keywords select() and row(), by replacing them by project(). Nested projections are already possible today using row or rowField() (the Field suffix being a workaround for an IDE / compiler performance problem, see #5233)

I'm sure we can implement this functionality entirely by enhancing the existing select() or from() clause, without introducing alternative, synthetic clauses

Besides, I really think we should not get distracted too much by what JPA is offering here. JPA has a major flaw:

  • Defaulting to eager fetching is silly, because we're always fetching too much data when we don't need it
  • Defaulting to lazy fetching is silly, because we're always producing N+1 queries when we do need it

The workaround is to use JOIN FETCH in JPQL or HQL queries, but those syntaxes are not available as annotations (luckily!) in the JPA meta model. Hence, we should not reproduce the mess produced by JPA's defaulting to produce queries from 2 equally bad default choices of eager/lazy fetching nested collections.

While I'm not excluding adding value in this area in the future, our main focus here should not be on JPA, but on the SQL view of how collections can be nested.

@knutwannheden
Copy link
Contributor

People like jOOQ because it mimicks SQL almost perfectly (unless the Java language prevents it, e.g. with CTE or derived tables). Replacing the SELECT clause by something "clever" is going to drastically increase the learning curve and feed the arguments that jOOQ is not exactly SQL.

Makes sense. Of course some kind of project() expression could also be passed as an argument to select().

This is just me criticising the suggested execution here. We could definitely profit from supporting additional projection capabilities. See also the suggestion here: #9126

I have seen that feature request, which is of course related. One of the proposals there was to add a project() method to Table.

In other words, we're not gaining anything from avoiding the SQL keywords select() and row(), by replacing them by project(). Nested projections are already possible today using row or rowField().

Right. Using RVEs to help map to nested Java objects sounds interesting. I have never seen this done before, but it could be a good fit.

Besides, I really think we should not get distracted too much by what JPA is offering here.

The jpaProjection(MyPojo.class) was only meant as an example. In #9126 the proposal is to allow something like select(MyPojo.class) and then internally register / do record-mapping and record-mapping as appropriate. I am however wondering if it wouldn't make sense to make this more transparent and easier for the user to control by adding some kind of Projection type / SPI. The user would thus write something like select(jpa(MyEntity.class)) or select(pojo(MyPojo.class)) or similar. Maybe tools like JOLO could also profit from this.

It however seems like my comment should instead be on #9126.

@lukaseder
Copy link
Member Author

Makes sense. Of course some kind of project() expression could also be passed as an argument to select().

Yes, exactly. I think that's where we might be heading. But the more cleverness we try to implement, the more brittle this will be in edge cases that depend on the actual SELECT clause (DISTINCT, UNION, ORDER BY, OFFSET .. FETCH, FOR UPDATE), so it's really a very slippery slope :)

I have seen that feature request, which is of course related. One of the proposals there was to add a project() method to Table.

Yes, I already regret that API proposition...

Right. Using RVEs to help map to nested Java objects sounds interesting. I have never seen this done before, but it could be a good fit.

There's native support in Oracle (with named types) and PostgreSQL. I've done this quite a few times in Oracle, myself...

The jpaProjection(MyPojo.class) was only meant as an example. In #9126 the proposal is to allow something like select(MyPojo.class) and then internally register / do record-mapping and record-mapping as appropriate. I am however wondering if it wouldn't make sense to make this more transparent and easier for the user to control by adding some kind of Projection type / SPI. The user would thus write something like select(jpa(MyEntity.class)) or select(pojo(MyPojo.class)) or similar. Maybe tools like JOLO could also profit from this.

All potential cleverness here will ultimately run against a wall (again), when users expect nested collections to be supported. Both of these APIs, as well as your reference to JOLO, suggest that such nested collections would be supported, just from reverse engineering some client data structure. I think that's just not going to be good enough. The same is true for #9126, btw.

If we do support nested collections, we should go "all in" ans support MULTISET: #3884, because that's the only way we can implement the full potential of the desired feature. Deriving "80%" usefulness from a client model is not good enough.

Irrespective of this discussion...

It however seems like my comment should instead be on #9126.

... yes indeed. This issue here can be implemented regardless of how we start supporting nested records and collections in the jOOQ API.

This issue here is only about post-processing whatever the SQL query and jOOQ's built in mapping algorithms produce.

@lukaseder
Copy link
Member Author

The scope of this issue here becomes more clear in the context of numerous improvements that have been done for jOOQ 3.15, including the most important ones:

It is now trivial to see that all of the current convenience mapping on ResultQuery (fetchXYZ()) and Result (intoXYZ()) can be achieved via a new set of composable Collector utilities, which could be placed in the new org.jooq.Records, which also hosts some of the ad-hoc conversion utilities.

For example:

public static final <K, V, R extends Record2<K, V>> Collector<R, ?, Map<K, V>> toMap() {
    return toMap(Record2::value1, Record2::value2);
}

public static final <K, R extends Record> Collector<R, ?, Map<K, R>> toMap(Function<? super R, ? extends K> keyMapper) {
    return toMap(keyMapper, r -> r);
}

public static final <K, V, R extends Record> Collector<R, ?, Map<K, V>> toMap(
    Function<? super R, ? extends K> keyMapper,
    Function<? super R, ? extends V> valueMapper
) {
    return Collectors.toMap(
        keyMapper,
        valueMapper,
        (k1, k2) -> {
            throw new InvalidResultException("Key " + k1 + " is not unique in Result");
        },
        LinkedHashMap::new
    );
}

These are just simple utilities built on top of JDK Collectors::toMap with the following properties:

  • Some capture <K, V, R extends Record2<K, V>> types to avoid repetition of field expressions (see example below)
  • The InvalidResultException semantics is maintained (similar to the JDK's default Collectors::toMap option of disallowing merging conflicting values)
  • The result is a LinkedHashMap, which preserves any SQL ORDER BY semantics

It's not more than that. An example:

record Book(int id, String title) {}
record Author(int id, String firstName, String lastName) {}

Map<Book, Author> books =
create().select(
            row(
                T_BOOK.ID,
                T_BOOK.TITLE
            ).mapping(Book::new),
            row(
                T_BOOK.author().ID,
                T_BOOK.author().FIRST_NAME,
                T_BOOK.author().LAST_NAME
            ).mapping(Author::new)
        )
        .from(T_BOOK)
        .orderBy(T_BOOK.ID)
        .collect(Records.toMap());

All of the types and mappings are declared only once. There is no more need to re-declare columns as Map keys and values outside of the SELECT clause.

One additional benefit of using collect() is that we can very easily skip creating a ResultImpl as an intermediate container of records.

All of these ideas also work very well with the current working drafts of creating nested collections (ARRAY or MULTISET)

@lukaseder lukaseder added this to To do in 3.15 Other improvements via automation May 10, 2021
@lukaseder lukaseder changed the title Implement more reusable ResultQuery.fetch() methods Offer Result.intoMap(), intoGroups(), intoSet() functionality as Collector May 10, 2021
lukaseder added a commit that referenced this issue May 10, 2021
@lukaseder
Copy link
Member Author

The naming conflicts with Collectors.toList etc. are unfortunate. As it is right now, users can static import only one or the other.

@lukaseder
Copy link
Member Author

Also, when someone reads the code that has collect(toList()), it may be confusing to immediately understand whether the result is a List<Record1<T>> or a List<T>

@lukaseder
Copy link
Member Author

I'll mark this as an incompatible change (of behaviour), because if we re-use the new Collector API via ResultQuery.collect(), then we will no longer go via the ExecuteListener::resultStart and ::resultEnd lifecycle events (see #11871)

@lukaseder
Copy link
Member Author

Or even better, we can be backwards compatible by creating the intermediary Result only in the presence of ExecuteListener instances, and not otherwise, making this configurable: #11892

New users will then have the performance benefit, and power users who require an ExecuteListener will have the current penalty, but they can opt out of creating intermediary Result instances explicitly.

lukaseder added a commit that referenced this issue May 17, 2021
- In Result.intoSet()
- In Result.getValues()
- In ResultQuery.fetch()
- In ResultQuery.fetchSet()
lukaseder added a commit that referenced this issue May 19, 2021
This includes:

- [#3619] Result.intoGroups() and intoMap() do not run through
RecordListener
3.15 Other improvements automation moved this from To do to Done May 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

2 participants