internal/rule: fix squashjoins rule to squash projections properly too #338

Merged
merged 2 commits into from
Jun 28, 2018

Conversation

@mcarmonaa mcarmonaa commented Jun 26, 2018

Fixes #322

The bug occurred because the Project nodes over inner joins weren't squashed, so they stayed chained and pointed to the wrong field indexes after the tables were squashed:

GroupBy
 ├─ Aggregate(COUNT(1), refs.repository_id)
 ├─ Grouping(refs.repository_id)
 └─ Project(refs.repository_id, refs.commit_hash, refs.ref_name, commits.commit_author_name, commits.commit_author_email, commits.commit_author_when, commits.committer_name, commits.committer_email, commits.committer_when, commits.commit_message, commits.tree_hash, commits.commit_parents, commit_blobs.blob_hash)
     └─ Project(refs.repository_id, refs.commit_hash, refs.ref_name, commits.commit_author_name, commits.commit_author_email, commits.commit_author_when, commits.committer_name, commits.committer_email, commits.committer_when, commits.commit_message, commits.tree_hash, commits.commit_parents)
         └─ Filter(refs.repository_id = commit_blobs.repository_id AND refs.commit_hash = commit_blobs.commit_hash)
             └─ SquashedTable(refs, commits, commit_blobs)
                 ├─ Columns
                 │   ├─ Column(repository_id, TEXT, nullable=false)
                 │   ├─ Column(ref_name, TEXT, nullable=false)
                 │   ├─ Column(commit_hash, TEXT, nullable=false)
                 │   ├─ Column(repository_id, TEXT, nullable=false)
                 │   ├─ Column(commit_hash, TEXT, nullable=false)
                 │   ├─ Column(commit_author_name, TEXT, nullable=false)
                 │   ├─ Column(commit_author_email, TEXT, nullable=false)
                 │   ├─ Column(commit_author_when, TIMESTAMP, nullable=false)
                 │   ├─ Column(committer_name, TEXT, nullable=false)
                 │   ├─ Column(committer_email, TEXT, nullable=false)
                 │   ├─ Column(committer_when, TIMESTAMP, nullable=false)
                 │   ├─ Column(commit_message, TEXT, nullable=false)
                 │   ├─ Column(tree_hash, TEXT, nullable=false)
                 │   ├─ Column(commit_parents, JSON, nullable=false)
                 │   ├─ Column(repository_id, TEXT, nullable=false)
                 │   ├─ Column(commit_hash, TEXT, nullable=false)
                 │   └─ Column(blob_hash, TEXT, nullable=false)
                 └─ Filters
                     ├─ refs.repository_id = commit_blobs.repository_id
                     ├─ refs.commit_hash = commit_blobs.commit_hash
                     ├─ refs.repository_id = commits.repository_id
                     ├─ refs.commit_hash = commits.commit_hash
                     └─ refs.ref_name = "HEAD"

Now the project nodes are squashed too, so it works properly:

GroupBy
 ├─ Aggregate(COUNT(1), refs.repository_id)
 ├─ Grouping(refs.repository_id)
 └─ Project(refs.repository_id, refs.commit_hash, refs.ref_name, commits.commit_author_name, commits.commit_author_email, commits.commit_author_when, commits.committer_name, commits.committer_email, commits.committer_when, commits.commit_message, commits.tree_hash, commits.commit_parents, commit_blobs.blob_hash)
     └─ Filter(refs.repository_id = commit_blobs.repository_id AND refs.commit_hash = commit_blobs.commit_hash)
         └─ SquashedTable(refs, commits, commit_blobs)
             ├─ Columns
             │   ├─ Column(repository_id, TEXT, nullable=false)
             │   ├─ Column(ref_name, TEXT, nullable=false)
             │   ├─ Column(commit_hash, TEXT, nullable=false)
             │   ├─ Column(repository_id, TEXT, nullable=false)
             │   ├─ Column(commit_hash, TEXT, nullable=false)
             │   ├─ Column(commit_author_name, TEXT, nullable=false)
             │   ├─ Column(commit_author_email, TEXT, nullable=false)
             │   ├─ Column(commit_author_when, TIMESTAMP, nullable=false)
             │   ├─ Column(committer_name, TEXT, nullable=false)
             │   ├─ Column(committer_email, TEXT, nullable=false)
             │   ├─ Column(committer_when, TIMESTAMP, nullable=false)
             │   ├─ Column(commit_message, TEXT, nullable=false)
             │   ├─ Column(tree_hash, TEXT, nullable=false)
             │   ├─ Column(commit_parents, JSON, nullable=false)
             │   ├─ Column(repository_id, TEXT, nullable=false)
             │   ├─ Column(commit_hash, TEXT, nullable=false)
             │   └─ Column(blob_hash, TEXT, nullable=false)
             └─ Filters
                 ├─ refs.repository_id = commit_blobs.repository_id
                 ├─ refs.commit_hash = commit_blobs.commit_hash
                 ├─ refs.repository_id = commits.repository_id
                 ├─ refs.commit_hash = commits.commit_hash
                 └─ refs.ref_name = "HEAD"
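To make the index problem concrete, here is a minimal sketch of collapsing chained projections over a squashed table. It is not the code from this PR: the `column`, `field`, and `project` types and the `squashChain` function are hypothetical stand-ins, the schema is a trimmed version of the one shown above, and it assumes every projected expression is a plain column reference that can be matched by table and column name.

```go
package main

import "fmt"

// column identifies one column of the (hypothetical) squashed table schema.
type column struct {
	table string
	name  string
}

// field is a toy stand-in for a GetField-style expression: a column
// reference plus the row position it reads from.
type field struct {
	column
	index int
}

// project is a toy stand-in for a projection node.
type project struct {
	fields []field
}

// squashChain collapses a chain of projections (outermost first) into a
// single projection. Only the outermost projection determines the output
// columns, so its fields are kept and their indexes are recomputed against
// the squashed table's schema. It reports false when the chain is empty or
// a field has no matching column.
func squashChain(chain []project, schema []column) (project, bool) {
	if len(chain) == 0 {
		return project{}, false
	}

	// Position of every column in the squashed table, keyed by table and name.
	positions := make(map[column]int, len(schema))
	for i, col := range schema {
		positions[col] = i
	}

	squashed := project{}
	for _, f := range chain[0].fields {
		idx, ok := positions[f.column]
		if !ok {
			return project{}, false
		}
		squashed.fields = append(squashed.fields, field{column: f.column, index: idx})
	}
	return squashed, true
}

func main() {
	// Trimmed schema: three refs columns followed by three commit_blobs columns.
	schema := []column{
		{"refs", "repository_id"},
		{"refs", "ref_name"},
		{"refs", "commit_hash"},
		{"commit_blobs", "repository_id"},
		{"commit_blobs", "commit_hash"},
		{"commit_blobs", "blob_hash"},
	}

	// The outer projection carries stale indexes from before the squash.
	outer := project{fields: []field{
		{column{"refs", "repository_id"}, 0},
		{column{"refs", "commit_hash"}, 1},
		{column{"refs", "ref_name"}, 2},
		{column{"commit_blobs", "blob_hash"}, 3},
	}}
	inner := project{fields: outer.fields[:3]}

	squashed, ok := squashChain([]project{outer, inner}, schema)
	if !ok {
		fmt.Println("chain could not be squashed")
		return
	}
	for _, f := range squashed.fields {
		fmt.Printf("%s.%s -> %d\n", f.table, f.name, f.index)
	}
	// Prints indexes 0, 2, 1 and 5 for the four fields, in that order.
}
```

Matching on table and name rather than name alone matters here because `repository_id` and `commit_hash` appear once per table in the squashed schema.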

Signed-off-by: Manuel Carmona manu.carmona90@gmail.com

@mcarmonaa mcarmonaa requested a review from a team June 26, 2018 10:38
func squashProjects(parent, child *plan.Project) (sql.Node, error) {
	projections := []sql.Expression{}
	for _, expr := range parent.Expressions() {
		parentField, ok := expr.(*expression.GetField)

@ajnavarro ajnavarro Jun 26, 2018

If an expression is a literal, will it fail? Example:

SELECT count(1), 1, refs.repository_id
FROM refs
NATURAL JOIN commits
NATURAL JOIN commit_blobs
WHERE refs.ref_name = 'HEAD'
GROUP BY refs.repository_id

Contributor Author

That query doesn't work because of how GroupBy is currently implemented, not because of the changes in this PR. GroupBy only allows select expressions that are aggregations or that also appear in the GROUP BY clause.

The following query, for example, does work:

MySQL [(none)]> SELECT 1,refs.repository_id FROM refs NATURAL JOIN commits NATURAL JOIN commit_blobs WHERE refs.ref_name = 'HEAD' limit 10;
+------+---------------+
| 1    | repository_id |
+------+---------------+
|    1 | enry          |
|    1 | enry          |
|    1 | enry          |
|    1 | enry          |
|    1 | enry          |
|    1 | enry          |
|    1 | enry          |
|    1 | enry          |
|    1 | enry          |
|    1 | enry          |
+------+---------------+
10 rows in set (0.02 sec)

Contributor

Then how can we reach that error?

Contributor Author

Sorry @ajnavarro, I don't think I understand. Do you mean that the query should fail if a literal appears in the projection? In any case, I guess that's not related to this PR.

Should I open an issue for it?

Contributor

I'm trying to understand when this error will be returned.

As far as I can see, squashProjects takes the expressions in the projection and checks whether each one is a GetField expression. If one is not, it returns an error.

There are cases where a projection can contain expressions other than GetField (for example, a Function or a Literal), and that is not necessarily an error.

Contributor Author

🤓 Now I see what you mean. The projections to squash come from the replacement of natural joins by inner joins, and those projections only contain GetFields. That replacement is why the chained projections appear, so functions and literals written in a query aren't going to be squashed.
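As a rough illustration of the check being discussed, the sketch below uses Go's comma-ok type assertion to tell apart a projection made only of column references from one that contains a user-written literal. The `expression`, `getField`, and `literal` types and the `onlyGetFields` helper are hypothetical and only mimic the shape of the real expression types; the point is that a non-GetField expression can simply mean "leave this projection alone" rather than signal an error.

```go
package main

import "fmt"

// expression is a toy stand-in for a SQL expression interface.
type expression interface {
	String() string
}

// getField is a toy column reference; literal is a toy constant.
type getField struct{ table, name string }
type literal struct{ value string }

func (g getField) String() string { return g.table + "." + g.name }
func (l literal) String() string  { return l.value }

// onlyGetFields reports whether every expression is a plain column
// reference, using the comma-ok form of a type assertion.
func onlyGetFields(exprs []expression) bool {
	for _, e := range exprs {
		if _, ok := e.(getField); !ok {
			return false
		}
	}
	return true
}

func main() {
	// Shape of a projection generated when a natural join is replaced.
	generated := []expression{
		getField{"refs", "repository_id"},
		getField{"commits", "tree_hash"},
	}
	// Shape of a projection written by the user, e.g. SELECT 1, refs.repository_id.
	userWritten := []expression{
		literal{"1"},
		getField{"refs", "repository_id"},
	}

	fmt.Println(onlyGetFields(generated))   // true: candidate for squashing
	fmt.Println(onlyGetFields(userWritten)) // false: skip, not an error
}
```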

@mcarmonaa
Contributor Author

@ajnavarro I've already committed the change so that it no longer returns an error.

}

squashedProject, err := squashProjects(project, child)
if err != nil {
Contributor

Errors other than errWrongProjection can happen here.

Contributor

Actually, they cannot, but this code is a little misleading. Maybe instead of returning squashedProject and err we should return squashedProject and ok.

Contributor Author

Done
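For readers following the thread, the suggested shape looks roughly like the sketch below. The `node` type and `trySquash` function are hypothetical and far simpler than real plan nodes; the sketch only shows the calling pattern where "nothing to squash" is reported through a boolean and the caller keeps the original plan.

```go
package main

import "fmt"

// node is a toy plan node: a name plus an optional child.
type node struct {
	name  string
	child *node
}

// trySquash merges a node with its child when both are "Project" nodes.
// The boolean says whether a squash happened. "Not applicable" is an
// expected outcome here, so it is not modeled as an error.
func trySquash(n *node) (*node, bool) {
	if n == nil || n.child == nil || n.name != "Project" || n.child.name != "Project" {
		return nil, false
	}
	return &node{name: "Project", child: n.child.child}, true
}

func main() {
	plan := &node{"Project", &node{"Project", &node{"SquashedTable", nil}}}

	// The caller falls back to the original plan when ok is false.
	if squashed, ok := trySquash(plan); ok {
		plan = squashed
	}
	fmt.Println(plan.name, "->", plan.child.name) // Project -> SquashedTable
}
```

With this signature the caller has no error value to inspect, so it cannot be misread as handling only one specific failure.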

@mcarmonaa mcarmonaa force-pushed the fix/squash-natural-joins branch from 2917d49 to ceb33f3 Compare June 28, 2018 10:00
@ajnavarro
Contributor

@mcarmonaa can you rebase and merge please?

Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
@mcarmonaa mcarmonaa force-pushed the fix/squash-natural-joins branch from ceb33f3 to 015bbfa Compare June 28, 2018 10:28
@mcarmonaa mcarmonaa merged commit f69fe09 into src-d:master Jun 28, 2018