Natural join seems to eliminate rows which it shouldn't #977
Normally we don't want to join with repositories unless other joins are already involved. When you query a single table like blobs, that table usually has its own optimizations in place. For example, with a filter such as `blob_hash IN` a list of hashes, blobs reads only the given blobs in each repository. That's why the query without the join is faster. As with everything, it depends on the query: depending on what you want, some optimizations may perform better than others. In any case, I reproduced the bug, and there is indeed an issue: for some reason, the repeated rows are not returned.
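As a rough illustration of why the `blob_hash` filter makes the direct query cheap, and why a join on the repository should still return one row per repository that contains a blob, here is a minimal Python sketch. It is not gitbase's implementation: the in-memory repository model and the functions `scan_blobs`, `lookup_blobs` and `natural_join` are hypothetical, assumed only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory model (not gitbase's storage): each repository maps
# blob hashes to blob contents.
Repos = Dict[str, Dict[str, bytes]]
Row = Tuple[str, str]  # (repository_id, blob_hash)


def scan_blobs(repos: Repos, wanted: Set[str]) -> List[Row]:
    """Unoptimized plan: read every blob of every repository, then filter."""
    rows = []
    for repo_id, blobs in repos.items():
        for blob_hash in blobs:  # touches every blob in the repository
            if blob_hash in wanted:
                rows.append((repo_id, blob_hash))
    return rows


def lookup_blobs(repos: Repos, wanted: Set[str]) -> List[Row]:
    """Pushed-down plan: per repository, probe only the requested hashes."""
    rows = []
    for repo_id, blobs in repos.items():
        for blob_hash in sorted(wanted):  # touches only the requested hashes
            if blob_hash in blobs:
                rows.append((repo_id, blob_hash))
    return rows


def natural_join(repositories: List[str], blob_rows: List[Row]) -> List[Row]:
    """Join on repository_id: one output row per matching (repository, blob)
    pair, so a blob present in several repositories appears several times."""
    return [
        (repo_id, blob_hash)
        for repo_id in repositories
        for (blob_repo_id, blob_hash) in blob_rows
        if repo_id == blob_repo_id
    ]


if __name__ == "__main__":
    repos = {
        "repo-a": {"h1": b"shared", "h2": b"only in a"},
        "repo-b": {"h1": b"shared", "h3": b"only in b"},
    }
    direct = lookup_blobs(repos, {"h1"})
    # Both plans must produce the same rows, duplicates included.
    assert sorted(direct) == sorted(scan_blobs(repos, {"h1"}))
    print(natural_join(list(repos), direct))  # [('repo-a', 'h1'), ('repo-b', 'h1')]
```

Whatever plan the engine picks, the result should be the same multiset of rows: a blob shared by two repositories (`h1` in the sketch) should come back twice, which is the kind of repeated row discussed in this issue.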
Yeah, I suspected something like that. Anyway, for my use case the lack of duplicated rows is not an issue, so for me this is not high priority.
This bug is really weird. The natural join is the one returning the correct result. If you remove the optimization in the blobs table, it returns the same. So, there's something going on because
@alexpdp7 are you using siva files obtained from gitcollector? I tried with regular repositories and it didn't happen.
Yup, it's using siva.
Narrowed it down to a siva issue and reported it to go-borges: src-d/go-borges#90, so I'm leaving this as blocked until it's solved on their side.
Also note that removing the natural join makes things go much faster. It was my understanding that we normally want to join with repositories to benefit from some specific optimizations (although I'm guessing that filtering with blob_hash makes those optimizations moot).