Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sql): ASOF and LT JOINS no longer convert SYMBOL keys to STRING #3087

Merged
merged 32 commits into from
Mar 30, 2023

Conversation

jerrinot
Copy link
Contributor

@jerrinot jerrinot commented Mar 22, 2023

fixes #2976

The issue is best demonstrated with multiple levels of LT JOINS.
It goes like this:

  1. LT JOIN physical tables. This causes no problem because physical tables support random access -> QuestDB uses light join. Light join does not copy columns from a slave cursor to a Map-backed record. However the JOIN result set no longer supports random access.
  2. Join another table with the result of ☝️ and use a symbol column as a joining key. The slave cursor does not support random access -> QuestDB has to copy columns from the slave record to a map. But because we are joining on symbol keys, it has to convert symbols to a string and use the string as a key.
  3. The result of ☝️ has symbols from the slave table, used as keys, converted to strings :-(

The idea behind this fix: Join keys are present in both master and slave records. So we can still convert key symbols from the slave to strings to join with the master. But when a downstream code asks for a key column from the slave we can cheat and return the symbol ID from the master cursor. Because the symbol (string value) must be the same; otherwise, a join would not match.

I call this: "wrapping over from slave columns to master columns". Why? Columns in a join record as organized like this:

master_column0 | master_column1 | .... | master_columnN | slave_value_column0 | slave_value_column1 |... | slave_value_columnN | slave_key_column0 | slave_key_column1 | ... | slave_key_columnN

So when a downstream code asks for a symbol column which is after the slave key-value divide then we can "wrap-over" to master columns and fetch the symbol ID from a master record. Because the symbol has to have the same string value in both master and slave records. (assuming the slave record is non-null). And obviously, the join cursor has to return the right symbol table on get/newSymbolTable()

AN ALTERNATIVE SOLUTION:
An alternative could be to sink symbol IDs from keys into a map value. So when a JOIN key uses symbol then the Join would sink a string representation of the symbol to map key (for joining with a master record) and int representation of the symbol would also go to a map value. So when a downstream operator asks for a symbol ID which is used as a key the JOIN record would return the ID from a value portion of a map entry.

@jerrinot jerrinot added Bug Incorrect or unexpected behavior SQL Issues or changes relating to SQL execution labels Mar 22, 2023
@jerrinot jerrinot marked this pull request as ready for review March 23, 2023 16:16
Copy link
Collaborator

@ideoma ideoma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there tests with LT join on symbol columns between different tables with unmatching symbol sets?

So that if table T1 has symbols
1 = A
2 = B
3 = C

And table T2 has symbols
1 = C
2 = D
3 = A

and then join them

core/src/test/java/io/questdb/griffin/AsOfJoinTest.java Outdated Show resolved Hide resolved
@jerrinot jerrinot requested a review from ideoma March 27, 2023 09:46
Copy link
Collaborator

@ideoma ideoma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure I can fully understand the mechanics.
Perhaps @bziobrowski you can have a look here too?

@ideoma
Copy link
Collaborator

ideoma commented Mar 28, 2023

One more question, what if the Master table does not have a NULL value in the symbol columns but the NULL column value is the result of the ASOF join? Can you please add tests with such asof join and IS NULL filter in outer select?

…ction

All classes implementing RecordMetadata were also subclasses of
AbstractRecordMetadata and thus implementing ColumnMetadataCollection.

Having `TableColumnMetadata getColumnMetadata(int columnIndex);`
directly in `RecordMetadata` makes sense. So no need to keep the
hierarchy separated and have code branches which are never taken.
@jerrinot
Copy link
Contributor Author

@ideoma can you please elaborate this:

One more question, what if the Master table does not have a NULL value in the symbol columns but the NULL column value is the result of the ASOF join? Can you please add tests with such asof join and IS NULL filter in outer select?

Do you mean joining on a symbol key where a slave cursor has no matching record?

@ideoma
Copy link
Collaborator

ideoma commented Mar 30, 2023

[PR Coverage check]

😍 pass : 91 / 99 (91.92%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/engine/groupby/GroupByUtils.java 0 1 00.00%
🔵 io/questdb/griffin/engine/join/SymbolWrapOverJoinRecord.java 20 26 76.92%
🔵 io/questdb/griffin/SqlCodeGenerator.java 25 26 96.15%
🔵 io/questdb/griffin/engine/join/AsOfJoinRecordCursorFactory.java 3 3 100.00%
🔵 io/questdb/griffin/engine/join/AbstractSymbolWrapOverCursor.java 27 27 100.00%
🔵 io/questdb/griffin/engine/join/JoinRecordMetadata.java 2 2 100.00%
🔵 io/questdb/griffin/engine/join/LtJoinRecordCursorFactory.java 5 5 100.00%
🔵 io/questdb/cairo/AbstractRecordMetadata.java 1 1 100.00%
🔵 io/questdb/cairo/GenericRecordMetadata.java 8 8 100.00%

@jerrinot
Copy link
Contributor Author

@bziobrowski can you please have another look? thanks!

@jerrinot
Copy link
Contributor Author

a failure on windows-other due to the #2941
re-running

Copy link
Contributor

@bziobrowski bziobrowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now 👍

@bluestreak01 bluestreak01 merged commit 372fdfe into master Mar 30, 2023
@bluestreak01 bluestreak01 deleted the jh_lt_join_slave_keys_wrapping_to_master branch March 30, 2023 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Incorrect or unexpected behavior SQL Issues or changes relating to SQL execution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple LT JOINs error on SYMBOL column
4 participants