Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler] Lots of compiler and code size performance improvements #10746

Merged
merged 3 commits into from Aug 10, 2021

Conversation

tpoterba
Copy link
Contributor

@tpoterba tpoterba commented Aug 4, 2021

Necessary for the big aggregate benchmark to pass (albeit more slowly) in benchmarks.

case Begin(xs) =>
xs.foreach(x => emitVoid(x))
case x@Begin(xs) =>
if (!ctx.inLoopCriticalPath.contains(x) && xs.forall(x => !ctx.inLoopCriticalPath.contains(x))) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method splitter doesn't do well with lots of flat children -- splitting manually does better.

@@ -172,17 +173,77 @@ object PruneDeadFields {
t match {
case ts: TStruct =>
val subStructs = children.map(_.asInstanceOf[TStruct])
val subFields = ts.fields.map { f =>
f -> subStructs.flatMap(s => s.fieldOption(f.name))
val fieldArrays = Array.fill(ts.fields.length)(new BoxedArrayBuilder[Type])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stuff was a major performance hotspot.

@@ -36,14 +36,14 @@ trait AggregatorState {
def deserialize(codec: BufferSpec): (EmitCodeBuilder, Value[InputBuffer]) => Unit

def deserializeFromBytes(cb: EmitCodeBuilder, bytes: SBinaryCode): Unit = {
val lazyBuffer = kb.getOrDefineLazyField[MemoryBufferWrapper](Code.newInstance[MemoryBufferWrapper](), (this, "bufferWrapper"))
val lazyBuffer = kb.getOrDefineLazyField[MemoryBufferWrapper](Code.newInstance[MemoryBufferWrapper](), ("AggregatorStateBufferWrapper"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we weren't caching the way we thought.

val (initOps, pAggSigs) = ab.result().unzip
val rt = TTuple(initOps.map(_.aggSig.resultType): _*)
ref._typ = rt

Aggs(postAgg, Begin(initOps), addLets(Begin(seq.result()), let.result()), pAggSigs)
}

private def extract(ir: IR, ab: BoxedArrayBuilder[(InitOp, PhysicalAggSig)], seqBuilder: BoxedArrayBuilder[IR], letBuilder: BoxedArrayBuilder[AggLet], result: IR, r: RequirednessAnalysis, isScan: Boolean): IR = {
def extract(node: IR): IR = this.extract(node, ab, seqBuilder, letBuilder, result, r, isScan)
private def extract(ir: IR, ab: BoxedArrayBuilder[(InitOp, PhysicalAggSig)], seqBuilder: BoxedArrayBuilder[IR], letBuilder: BoxedArrayBuilder[AggLet], memo: mutable.Map[IR, Int], result: IR, r: RequirednessAnalysis, isScan: Boolean): IR = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we cache agg IRs by identity, which deduplicates counts.

@@ -3,13 +3,17 @@ package is.hail.lir
class Blocks(
blocks: Array[Block]
) extends IndexedSeq[Block] {
private val blockIdx = blocks.iterator.zipWithIndex.toMap
private val blockIdx = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java map is faster than scala.

@@ -93,7 +93,7 @@ sealed abstract class BaseTypeWithRequiredness {
throw new AssertionError(
s"children lengths differed ${children.length} ${newChildren.length}. ${children} ${newChildren} ${this}")
}
children.zip(newChildren).foreach { case (r1, r2) =>
(children, newChildren).zipped.foreach { case (r1, r2) =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes in this file avoid allocation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool, I hadn't realized how zipped was implemented. The argument to foreach is actually a 2-ary function, so you don't even need the case. That may be a performance improvement, but also makes it immediately clear that we're not allocating tuples. (Likewise for the other uses of zipped in this file.)

if (!ctx.inLoopCriticalPath.contains(x) && xs.forall(x => !ctx.inLoopCriticalPath.contains(x))) {
xs.grouped(16).zipWithIndex.foreach { case (group, idx) =>
val mb = cb.emb.genEmitMethod(s"begin_group_$idx", FastIndexedSeq[ParamType](), UnitInfo)
val r = cb.newField[Region]("begin_separate_region", region)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Could you just pass the region as a method argument, and avoid creating multiple fields filling up the constant pool?

@@ -93,7 +93,7 @@ sealed abstract class BaseTypeWithRequiredness {
throw new AssertionError(
s"children lengths differed ${children.length} ${newChildren.length}. ${children} ${newChildren} ${this}")
}
children.zip(newChildren).foreach { case (r1, r2) =>
(children, newChildren).zipped.foreach { case (r1, r2) =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool, I hadn't realized how zipped was implemented. The argument to foreach is actually a 2-ary function, so you don't even need the case. That may be a performance improvement, but also makes it immediately clear that we're not allocating tuples. (Likewise for the other uses of zipped in this file.)

@@ -361,7 +362,7 @@ sealed abstract class RBaseStruct extends TypeWithRequiredness {
def size: Int = fields.length
val children: Seq[TypeWithRequiredness] = fields.map(_.typ)
def _unionLiteral(a: Annotation): Unit =
children.zip(a.asInstanceOf[Row].toSeq).foreach { case (r, f) => r.unionLiteral(f) }
children.iterator.zip(rowIterator(a.asInstanceOf[Row])).foreach { case (r, f) => r.unionLiteral(f) }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zipped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, missed one, thanks.

@@ -979,15 +1040,15 @@ object PruneDeadFields {
val bodyEnv = memoizeValueIR(body,
requestedType.asInstanceOf[TStream].elementType,
memo)
val valueTypes = names.zip(as).map { case (name, a) =>
val valueTypes = (names, as).zipped.map { case (name, a) =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in person, function literals with case still force a tuple to be allocated.

@danking danking merged commit 0261f17 into hail-is:main Aug 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants