Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RenderTable.WHEN_MULTIPLE_TABLES only works with table lists, not with joins #15993

Closed
rszyler opened this issue Dec 27, 2023 · 10 comments
Closed

Comments

@rszyler
Copy link

rszyler commented Dec 27, 2023

Expected behavior

org.jooq.impl.SelectQueryImpl#accept0 will detect if there are joins in query if WHEN_MULTIPLE_TABLES is set and force adding table names in column names in such case

Place to change in code - org.jooq.impl.SelectQueryImpl#accept0:

            case WHEN_MULTIPLE_TABLES: {
                if (knownTableSource() && getFrom().size() < 2)
                    context.data(DATA_RENDER_TABLE, false);

                break;
            }

getFrom().size() need to be replaced as in case of join in query it only contains one org.jooq.impl.Join object and only then this object contains inside 2 tables so just size() on getFrom() doesn't cover all cases of multiple tables in query.

Simple solution for join case should be adding to condition && !(getFrom().get(0) instanceof JoinTable) but maybe there is more cases to cover that I'm also not aware.

Actual behavior

When WHEN_MULTIPLE_TABLES is set in settings table name is not added to column name when query contains multiple tables in join.
Example code for query:

        final AdGroups child = AD_GROUPS.as("CHILD");
        final AdGroups parent = AD_GROUPS.as("PARENT");
        return getJooq(sourceId)
                .select(child.SID, parent.SID)
                .from(AD_GROUP_MEMBERS)
                .join(parent)
                .on(AD_GROUP_MEMBERS.SOURCE_ID.eq(parent.SOURCE_ID)
                        .and(AD_GROUP_MEMBERS.SNAPSHOT_ID.eq(parent.SNAPSHOT_ID))
                        .and(AD_GROUP_MEMBERS.PARENT_GUID.eq(parent.GUID)))
                .join(child)
                .on(AD_GROUP_MEMBERS.SOURCE_ID.eq(child.SOURCE_ID)
                        .and(AD_GROUP_MEMBERS.SNAPSHOT_ID.eq(child.SNAPSHOT_ID))
                        .and(AD_GROUP_MEMBERS.CHILD_GUID.eq(child.GUID)))
                .where(AD_GROUP_MEMBERS.SOURCE_ID.eq(sourceId)
                        .and(AD_GROUP_MEMBERS.SNAPSHOT_ID.eq(snapshotId))
                        .and(AD_GROUP_MEMBERS.IS_CHILD_GROUP.eq(true))
                        .and(child.SID.in(sids)))
                .fetch(record -> new Edge<>(record.value1(), record.value2()));

Error with generated query:

org.jooq.exception.IntegrityConstraintViolationException: SQL [select `sid`, `sid` from `fsmeta_1`.`ad_group_members` join `fsmeta_1`.`ad_groups` as `PARENT` on (`source_id` = `source_id` and `snapshot_id` = `snapshot_id` and `parent_guid` = `guid`) join `fsmeta_1`.`ad_groups` as `CHILD` on (`source_id` = `source_id` and `snapshot_id` = `snapshot_id` and `child_guid` = `guid`) where (`source_id` = ? and `snapshot_id` = ? and `is_child_group` = ? and `sid` in (?))]; Column 'sid' in field list is ambiguous

Root cause:
Condition in org.jooq.impl.SelectQueryImpl#accept0 doesn't recognize multiple tables in Join objects in getFrom()
Screenshot from 2023-12-27 11-30-34

Steps to reproduce the problem

Set renderTable in Settings to WHEN_MULTIPLE_TABLES.
Create code with query containing join with columns with the same name in on like select(...).from(x).join(y).on(x.z.eq(y.z))

jOOQ Version

3.19.1

Database product and version

8.0.33-google

Java Version

openjdk 17.0.9 2023-10-17

OS Version

No response

JDBC driver name and version (include name if unofficial driver)

No response

@lukaseder
Copy link
Member

Huh, yes, thanks for your report. Interesting, we should check if the single element in getFrom() is a JoinTable, in case of which the table count is also bigger than 1 (exact counts aren't needed).

Should be an easy fix

@lukaseder lukaseder changed the title org.jooq.conf.RenderTable#WHEN_MULTIPLE_TABLES doesn't work if there are joins in query RenderTable.WHEN_MULTIPLE_TABLES only works with table lists, not with joins Dec 27, 2023
@lukaseder
Copy link
Member

@lukaseder
Copy link
Member

There are other cases when the setting currently fails, including:

@lukaseder
Copy link
Member

What's the reason for using this flag, if I may ask?

@rszyler
Copy link
Author

rszyler commented Dec 27, 2023

Wow. thanks for very quick fix @lukaseder

We wanted to make our queries shorter. With always adding schema and table name to column names our queries were way too long to track them in GoogleCloud SQL query insights. Queries displayed there are cut to 1000 chars so usually we're not able to see conditions for most of our queries. We're using sharding so we have multiple schemas per db host and in result removing just schema name is no go for us.

I'm currently working on optimizing our db queries and seeing query stats but not being able to see whole query make this task harder. This render table feature is perfect for that by making queries more compact

To sum up we don't have any must have use case case for that feature but it's quality of life feature if you analyze queries from db side by making queries shorter and more compact.

@lukaseder
Copy link
Member

lukaseder commented Dec 27, 2023

We wanted to make our queries shorter. With always adding schema and table name to column names our queries were way too long to track them in GoogleCloud SQL query insights. Queries displayed there are cut to 1000 chars so usually we're not able to see conditions for most of our queries. We're using sharding so we have multiple schemas per db host and in result removing just schema name is no go for us.

I see. Well, the simplest way to shorten your queries' SQL string length is to always alias your tables. That way, the fully qualified table identifier remains in the FROM clause, but all references will be of the form a.COL1, b.COL2, etc. You can also turn off quoting if you're sure there are no special characters or identifiers conflicting with keywords.

At least, this works as workarounds until all known issues of this flag are fixed.

@rszyler
Copy link
Author

rszyler commented Dec 27, 2023

Thanks for the tip. We already thought about aliases. The main problem is how to enforce it in automatic way and that it will be transparent for other engineers. We already have couple hundreds if not more than thousands of existed queries and few dozens engineers working on the project.

Like I said it's quality of life change for us not something essential so we can wait for proper fix. For sure then I can test it on huge number of existed queries and leave feedback again if needed.

@lukaseder
Copy link
Member

I see, in that case, a flag would indeed work best. I've often wondered if yet another flag that removes schema qualification only in references not in declarations would be helpful. I.e. this:

select t.col
from schema.t.col

As long as there are no ambiguous tables schema1.t and schema2.t, this should be sufficient, plus it would even work with joins.

@rszyler
Copy link
Author

rszyler commented Dec 27, 2023

I think yes if WHEN_MULTIPLE_TABLES will not be fully functional or if it will be very slow. Removing both schema and table name is just way more superior over just removing schema if you want compact and readable queries. If I would have both fully functional option I would always chose removing schema and table name except only one case - if removing both is so inefficient that removing just schemas is x100 faster

@lukaseder
Copy link
Member

I think yes if WHEN_MULTIPLE_TABLES will not be fully functional

There's always an edge case...

or if it will be very slow.

I don't think it will be, at least compared to WHEN_AMBIGUOUS_COLUMNS

Removing both schema and table name is just way more superior over just removing schema if you want compact and readable queries.

I don't know. It seems like one of those "quick wins" for trivial queries, but as soon as the query gets mildly complex (a single join or subquery being "complex"), then it's completely useless.

I thought about your other comments again:

We're using sharding so we have multiple schemas per db host and in result removing just schema name is no go for us.

Do you ever need multiple schemas per query? Otherwise, why not just set the CURRENT_SCHEMA? Most jOOQ users who want shorter queries do this. RenderTable is not at all a very popular feature. It was implemented very late (compared to schema mapping, which has been around since 1.x versions), and given the significance of this particular bug that has been reported only just now, I don't think many people use it.

Thanks for the tip. We already thought about aliases. The main problem is how to enforce it in automatic way and that it will be transparent for other engineers. We already have couple hundreds if not more than thousands of existed queries and few dozens engineers working on the project.

How about patching generated code such that all table references in the generated Tables.java file are always auto-aliased to themselves, e.g.:

public class Tables {
    public static Book BOOK = com.example.Book.BOOK.as("book");
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants