Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SQL-92-style join queries #7

Closed
ianmcook opened this issue Jan 4, 2020 · 0 comments
Closed

Support SQL-92-style join queries #7

ianmcook opened this issue Jan 4, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

@ianmcook
Copy link
Owner

ianmcook commented Jan 4, 2020

queryparser now supports queries that use SQL-92-style (explicit) join syntax (ianmcook/queryparser#20). Add support for this to tidyquery.

Limit the initial implementation in tidyquery to two-table joins; implement joins of three or more tables later. With three or more tables, there are major challenges caused by differences in how SQL engines and dplyr perform joins: SQL engines process joins by looking at all the source tables at once, whereas dplyr can only process one join at a time, and in each successive join, the left table is all the tables previously joined. dplyr does not have any built-in mechanism for tracking which columns in a join result came from which tables, and in joins of three or more tables, tidyquery would need to keep track of that to ensure that column references in the join conditions and in other clauses point to the correct columns in the left table.

The dplyr join functions coalesce the join key columns from the left and right tables and can return only these coalesced values, not the separate join key columns from the left and right tables. For the initial implementation in tidyquery, return only the coalesced join key column(s) and disallow qualified references to the join key column(s) from the left and/or right table in outer joins, to prevent users from thinking they're running a query to check for unmatched rows. In a later version of tidyquery, before the join, use mutate() to add new columns to the left and right tables containing the join keys, then join by those columns and remove them after the join; this would preserve the separate join key columns from the left and right tables.

@ianmcook ianmcook added the enhancement New feature or request label Jan 11, 2020
ianmcook pushed a commit that referenced this issue Jan 13, 2020
ianmcook pushed a commit that referenced this issue Jan 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant