Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rule: parallelize projects calling UAST not under exchanges #767

Closed
wants to merge 1 commit into from
Closed

rule: parallelize projects calling UAST not under exchanges #767

wants to merge 1 commit into from

Conversation

erizocosmico
Copy link
Contributor

Fixes #766

Since rows are iterated serially (except for exchange nodes), when
a Project node calls UAST, it's processed one at a time, which
underutilizes the resources in the machine.

When a Project calls UAST or UAST_MODE and is not under an Exchange
node, which already parallelizes its children, replaces the project
with a special node called parallelProject, which is essentially a
project that keeps up to N goroutines processing rows, where N is
the parallelism value of gitbase.

Signed-off-by: Miguel Molina miguel@erizocosmi.co

@erizocosmico erizocosmico requested a review from a team March 28, 2019 17:04
Copy link
Contributor

@kuba-- kuba-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are failing (generally after upgrading mysql), because in Gopkg.toml we still have:

[[override]]
  name = "github.com/moovweb/rubex"
  branch = "go1"

where mysql and enry uses go-oniguruma:

github.com/src-d/go-oniguruma v1.0.0

return nil, io.EOF
}

if i.rows == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to wrap this if by mutex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, Next is never used in parallel

Fixes #766

Since rows are iterated serially (except for exchange nodes), when
a Project node calls UAST, it's processed one at a time, which
underutilizes the resources in the machine.

When a Project calls UAST or UAST_MODE and is not under an Exchange
node, which already parallelizes its children, replaces the project
with a special node called parallelProject, which is essentially a
project that keeps up to N goroutines processing rows, where N is
the parallelism value of gitbase.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
@erizocosmico
Copy link
Contributor Author

Should be passing now

@kuba-- kuba-- self-requested a review April 2, 2019 10:14
@ajnavarro
Copy link
Contributor

ajnavarro commented Apr 2, 2019

Let's keep this on pause after the new fixed RC

@smola
Copy link
Contributor

smola commented Oct 16, 2019

@erizocosmico Does the parallel project alter row order? If it does, it can give wrong results if ran after an ORDER BY.

@erizocosmico
Copy link
Contributor Author

@smola yes, you're right. It needs to check if it has an order by underneath and not execute it in parallel in that case or return the rows in the same order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallelize expensive projects that are not under an exchange
4 participants