[hail] Better scaling on RVD.union #6943

Merged
merged 1 commit into hail-is:master on Aug 26, 2019
Conversation

@tpoterba (Collaborator) commented Aug 26, 2019

Do a tree reduce instead of a linear reduce. This means that the Java
stack depth is log2(N) instead of N, which prevents stack overflow errors
when unioning hundreds of tables together.
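
For intuition, here is a minimal sketch of the pairwise (tree) reduction idea, using a hypothetical generic treeReduce helper rather than Hail's actual RVD code:

  // Hypothetical helper, not Hail's API: reduce by combining adjacent
  // pairs level by level. Each input ends up wrapped in at most
  // ceil(log2(N)) applications of `op`, versus N - 1 for reduceLeft.
  def treeReduce[A](xs: IndexedSeq[A])(op: (A, A) => A): A = {
    require(xs.nonEmpty, "empty.treeReduce")
    var level = xs
    while (level.length > 1) {
      level = level.grouped(2).map {
        case Seq(a, b) => op(a, b)
        case Seq(a)    => a // odd element carries over to the next level
      }.toIndexedSeq
    }
    level.head
  }

With 200 inputs, op nests at most 8 deep (2^8 = 256) rather than 199 deep.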

@patrick-schultz (Collaborator) commented Aug 26, 2019

I'm confused by the stack depth problem. reduce isn't recursive; it forwards to reduceLeft:

  def reduceLeft[B >: A](op: (B, A) => B): B = {
    if (isEmpty)
      throw new UnsupportedOperationException("empty.reduceLeft")

    var first = true
    var acc: B = 0.asInstanceOf[B]

    for (x <- self) {
      if (first) {
        acc = x
        first = false
      }
      else acc = op(acc, x)
    }
    acc
  }


@tpoterba (Collaborator, Author) commented Aug 26, 2019

The problem is that in the ordered-merge usage, the Spark DAG builds up a stack of 200 RDDs / iterators.
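
To spell that out with a toy model (not Hail's actual RVD code): each merge wraps its children's iterators, so a linear reduce over N inputs produces an iterator nested N deep:

  // Toy model: a union iterator that wraps its two children. Reducing
  // N inputs linearly builds a wrapper nested N deep, so every
  // hasNext/next call recurses through roughly N stack frames; with
  // hundreds of tables that is enough to overflow the Java stack.
  def union(a: Iterator[Int], b: Iterator[Int]): Iterator[Int] =
    new Iterator[Int] {
      def hasNext: Boolean = a.hasNext || b.hasNext // calls into both children
      def next(): Int = if (a.hasNext) a.next() else b.next()
    }

  val inputs = Seq.fill(200)(Iterator(1))
  val nested = inputs.reduceLeft(union)
  nested.next() // descends through all 200 union frames before reaching data

A tree reduce over the same inputs nests only ceil(log2(200)) = 8 wrappers deep, so each call touches a bounded number of frames.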


@patrick-schultz (Collaborator) commented Aug 26, 2019

Ah, right, that stack.


@danking merged commit 990e875 into hail-is:master on Aug 26, 2019. 1 check passed.