Natural join seems to eliminate rows which it shouldn't #977

alexpdp7 · 2019-10-15T08:35:45Z

MySQL [gitbase]> select blob_hash, repository_id from blobs natural join repositories where blob_hash in ('93ec5b4525363844ddb1981adf1586ebddbc21c1', 'aad34590345310fe813fd1d9eff868afc4cea10c', 'ed82eb69daf806e521840f4320ea80d4fe0af435');
+------------------------------------------+-------------------------------------+
| blob_hash                                | repository_id                       |
+------------------------------------------+-------------------------------------+
| aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/javascript-driver |
| ed82eb69daf806e521840f4320ea80d4fe0af435 | github.com/src-d/enry               |
| aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/python-driver     |
| 93ec5b4525363844ddb1981adf1586ebddbc21c1 | github.com/src-d/go-mysql-server    |
| aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/ruby-driver       |
| ed82eb69daf806e521840f4320ea80d4fe0af435 | github.com/src-d/gitbase            |
+------------------------------------------+-------------------------------------+
6 rows in set (14.90 sec)

MySQL [gitbase]> select blob_hash, repository_id from blobs where blob_hash in ('93ec5b4525363844ddb1981adf1586ebddbc21c1', 'aad34590345310fe813fd1d9eff868afc4cea10c', 'ed82eb69daf806e521840f4320ea80d4fe0af435');
+------------------------------------------+-------------------------------------+
| blob_hash                                | repository_id                       |
+------------------------------------------+-------------------------------------+
| aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/python-driver     |
| aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/javascript-driver |
| ed82eb69daf806e521840f4320ea80d4fe0af435 | github.com/src-d/enry               |
| aad34590345310fe813fd1d9eff868afc4cea10c | github.com/bblfsh/ruby-driver       |
| 93ec5b4525363844ddb1981adf1586ebddbc21c1 | github.com/src-d/gitbase            |
| ed82eb69daf806e521840f4320ea80d4fe0af435 | github.com/src-d/gitbase            |
| 93ec5b4525363844ddb1981adf1586ebddbc21c1 | github.com/src-d/go-mysql-server    |
| ed82eb69daf806e521840f4320ea80d4fe0af435 | github.com/src-d/go-mysql-server    |
+------------------------------------------+-------------------------------------+
8 rows in set (0.13 sec)

Also note that removing the natural join makes the query much faster (0.13 sec versus 14.90 sec). It was my understanding that we normally want to join with repositories to benefit from some specific optimizations, although I'm guessing that filtering by blob_hash makes those optimizations moot.
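For reference, here is a minimal sketch that diffs the two outputs above and checks what a plain SQL engine returns for the same join. It uses Python's sqlite3 rather than gitbase. It assumes that repository_id is the only column shared by blobs and repositories, and that repositories contains every repository seen in the blobs output. The table layouts are simplified stand-ins, not the real gitbase schema.

```python
import sqlite3

N = "93ec5b4525363844ddb1981adf1586ebddbc21c1"
A = "aad34590345310fe813fd1d9eff868afc4cea10c"
E = "ed82eb69daf806e521840f4320ea80d4fe0af435"

# Rows copied from the output of the query without the join (8 rows).
WITHOUT_JOIN = [
    (A, "github.com/bblfsh/python-driver"),
    (A, "github.com/bblfsh/javascript-driver"),
    (E, "github.com/src-d/enry"),
    (A, "github.com/bblfsh/ruby-driver"),
    (N, "github.com/src-d/gitbase"),
    (E, "github.com/src-d/gitbase"),
    (N, "github.com/src-d/go-mysql-server"),
    (E, "github.com/src-d/go-mysql-server"),
]

# Rows copied from the output of the natural join query (6 rows).
WITH_JOIN = [
    (A, "github.com/bblfsh/javascript-driver"),
    (E, "github.com/src-d/enry"),
    (A, "github.com/bblfsh/python-driver"),
    (N, "github.com/src-d/go-mysql-server"),
    (A, "github.com/bblfsh/ruby-driver"),
    (E, "github.com/src-d/gitbase"),
]


def natural_join_in_sqlite(blob_rows):
    """Run the same natural join in SQLite on simplified stand-in tables."""
    conn = sqlite3.connect(":memory:")
    # Assumption: repository_id is the only column the two tables share.
    conn.execute("CREATE TABLE repositories (repository_id TEXT)")
    conn.execute("CREATE TABLE blobs (blob_hash TEXT, repository_id TEXT)")
    repo_ids = sorted({repo for _, repo in blob_rows})
    conn.executemany("INSERT INTO repositories VALUES (?)", [(r,) for r in repo_ids])
    conn.executemany("INSERT INTO blobs VALUES (?, ?)", blob_rows)
    rows = conn.execute(
        "SELECT blob_hash, repository_id FROM blobs NATURAL JOIN repositories"
    ).fetchall()
    conn.close()
    return rows


# Rows present without the join but absent from the natural join output.
missing = sorted(set(WITHOUT_JOIN) - set(WITH_JOIN))
# What SQLite returns for the join when every repository is present.
expected = natural_join_in_sqlite(WITHOUT_JOIN)

if __name__ == "__main__":
    print(len(expected), "rows from the SQLite natural join")
    for row in missing:
        print("differs:", row)
```

Under those assumptions SQLite keeps all 8 rows for the join, and the two rows that differ between the outputs above are 93ec5b4525363844ddb1981adf1586ebddbc21c1 in github.com/src-d/gitbase and ed82eb69daf806e521840f4320ea80d4fe0af435 in github.com/src-d/go-mysql-server.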


erizocosmico · 2019-10-15T09:54:52Z

Normally we don't want to join with repositories unless there are already joins involved. Single tables like blobs usually have other optimizations in place when queried on their own.

For example, blobs with a filter like blob_hash IN list only reads the given blobs in each repository. That's why the query without the join is faster.

As with everything, it depends on the query: depending on what you want, some optimizations may perform better than others.

In any case, I reproduced the bug and there's actually an issue. It seems to not return the repeated rows for some reason.

alexpdp7 · 2019-10-15T09:57:45Z

Yeah, I suspected something like that. Anyway, for my use case the lack of duplicated rows is not an issue, so for me this is not high priority.

erizocosmico · 2019-10-15T10:19:59Z

This bug is really weird. The natural join is the one returning the correct result. If you remove the optimization in the blobs table, it returns the same result. So there is something going on, because repo.BlobObjects() doesn't return these blobs, but accessing them directly does.
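To illustrate how the two code paths can disagree, here is a toy model in Python. It is not gitbase or go-git code. `ToyRepo`, `iter_blobs`, and `get_blob` are hypothetical names standing in for a repository whose enumeration path and direct-lookup path are backed by data that has drifted out of sync.

```python
class ToyRepo:
    """Hypothetical repository with two independent access paths."""

    def __init__(self, name, objects, listed):
        self.name = name
        self._objects = dict(objects)  # hash -> content, used by direct lookup
        self._listed = list(listed)    # hashes the enumeration path knows about

    def iter_blobs(self):
        # Enumeration path: only yields what the listing contains.
        for blob_hash in self._listed:
            yield blob_hash

    def get_blob(self, blob_hash):
        # Direct path: looks the object up by hash, bypassing the listing.
        return self._objects.get(blob_hash)


def scan_and_filter(repos, wanted):
    """Enumerate every blob in every repository and keep the wanted ones."""
    wanted = set(wanted)
    return [(h, r.name) for r in repos for h in r.iter_blobs() if h in wanted]


def direct_lookup(repos, wanted):
    """Read only the wanted hashes in each repository."""
    return [(h, r.name) for r in repos for h in wanted if r.get_blob(h) is not None]


# A repository whose listing is missing one object that is still stored.
repo = ToyRepo("toy/repo", {"h1": b"one", "h2": b"two"}, listed=["h1"])
scanned = scan_and_filter([repo], ["h1", "h2"])
looked_up = direct_lookup([repo], ["h1", "h2"])

if __name__ == "__main__":
    print("scan:", scanned)
    print("direct:", looked_up)
```

In this toy setup the scan returns only h1 while the direct lookup returns both h1 and h2, which is the same shape of mismatch as the 6-row and 8-row results above.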

erizocosmico · 2019-10-15T12:13:17Z

@alexpdp7 are you using siva files obtained with gitcollector?

I tried with regular repositories and it didn't happen.

alexpdp7 · 2019-10-15T13:08:16Z

Yup, it's using siva

erizocosmico · 2019-10-15T14:09:10Z

Narrowed it down to a siva issue and reported it to go-borges: src-d/go-borges#90, so leaving this as blocked until it's solved on their side.

erizocosmico added the "bug" label · Oct 15, 2019

erizocosmico self-assigned this · Oct 15, 2019

erizocosmico mentioned this issue in "Problem with distinct and order by" #976 (closed) · Oct 15, 2019

erizocosmico added the "blocked" label · Oct 15, 2019

erizocosmico removed their assignment · Oct 15, 2019
