-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track unique ids for each scope #54
Conversation
@@ -485,6 +503,21 @@ | |||
(components (query-fixture :snowflakelet)) | |||
(components (query-fixture :snowflake)))) | |||
|
|||
(defn sorted-vec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, doesn't return a vec anymore. All the usage sites compare it to vectors, which works fine, and is why I dropped the explicit conversion. In the next PR I simply rename it to sorted
, so think this is OK to slip through.
SELECT | ||
b.x, | ||
c.x | ||
FROM b, c; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add a trailing line-break in a later PR.
(is (=? [{:component {:column "x"}, :scope ["SELECT" (=?/same :subselect-1)]} | ||
{:component {:column "x"}, :scope ["SELECT" (=?/same :subselect-2)]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that Hawk will care if these two values are actually the same as each other, or as :top-level
, but we certainly do! Maybe we need something new like (=?/unique :blah)
which enforces uniqueness across all the captures?
(collect/query->components statement (merge {:preserve-identifiers? true} opts))) | ||
|
||
(defn replace-names | ||
"Given a SQL query, apply the given table, column, and schema renames. | ||
"Given an SQL query, apply the given table, column, and schema renames. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And now we know that I say "sequel" and you say "ess queue ell" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a lot to chew on, but it looks good to me
Refs metabase/metabase#42586
In order to track the provenance of query outputs we will need to follow the flow of data from the source table columns, through the various scopes, to the final top-level expression.
In essence we replace each string label in the in the
:context
vector with a[label, integer-id]
pair, so that we can tell when two scopes are the same, and in future use these references to build a DAG of the data dependencies.The test here is very basic, it just checks that our set representation does not de-duplicate the contents of the identical sub-selects. This satisfies the TDD gods1, but isn't that useful on its own.
It'll be superseded when we get to more complex queries, but we're quite a way off from supporting the examples I have in mind.
Footnotes
Actually I lied, and didn't write the test first, and then when I checked I was telling the truth I discovered that they already wouldn't be de-duplicated as Macaw tracks the second CTE's scope as nested within the first one. Well, I still think it's a useful minimal test for establishing which IDs should match and which shouldn't. ↩