Skip to content

Conversation

@tpoterba
Copy link
Contributor

@tpoterba tpoterba commented Aug 4, 2021

Necessary for the big aggregate benchmark to pass (albeit more slowly) in benchmarks.

case Begin(xs) =>
xs.foreach(x => emitVoid(x))
case x@Begin(xs) =>
if (!ctx.inLoopCriticalPath.contains(x) && xs.forall(x => !ctx.inLoopCriticalPath.contains(x))) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method splitter doesn't do well with lots of flat children -- splitting manually does better.

val subStructs = children.map(_.asInstanceOf[TStruct])
val subFields = ts.fields.map { f =>
f -> subStructs.flatMap(s => s.fieldOption(f.name))
val fieldArrays = Array.fill(ts.fields.length)(new BoxedArrayBuilder[Type])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stuff was a major performance hotspot.


def deserializeFromBytes(cb: EmitCodeBuilder, bytes: SBinaryCode): Unit = {
val lazyBuffer = kb.getOrDefineLazyField[MemoryBufferWrapper](Code.newInstance[MemoryBufferWrapper](), (this, "bufferWrapper"))
val lazyBuffer = kb.getOrDefineLazyField[MemoryBufferWrapper](Code.newInstance[MemoryBufferWrapper](), ("AggregatorStateBufferWrapper"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we weren't caching the way we thought.


private def extract(ir: IR, ab: BoxedArrayBuilder[(InitOp, PhysicalAggSig)], seqBuilder: BoxedArrayBuilder[IR], letBuilder: BoxedArrayBuilder[AggLet], result: IR, r: RequirednessAnalysis, isScan: Boolean): IR = {
def extract(node: IR): IR = this.extract(node, ab, seqBuilder, letBuilder, result, r, isScan)
private def extract(ir: IR, ab: BoxedArrayBuilder[(InitOp, PhysicalAggSig)], seqBuilder: BoxedArrayBuilder[IR], letBuilder: BoxedArrayBuilder[AggLet], memo: mutable.Map[IR, Int], result: IR, r: RequirednessAnalysis, isScan: Boolean): IR = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we cache agg IRs by identity, which deduplicates counts.

blocks: Array[Block]
) extends IndexedSeq[Block] {
private val blockIdx = blocks.iterator.zipWithIndex.toMap
private val blockIdx = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java map is faster than scala.

s"children lengths differed ${children.length} ${newChildren.length}. ${children} ${newChildren} ${this}")
}
children.zip(newChildren).foreach { case (r1, r2) =>
(children, newChildren).zipped.foreach { case (r1, r2) =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes in this file avoid allocation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool, I hadn't realized how zipped was implemented. The argument to foreach is actually a 2-ary function, so you don't even need the case. That may be a performance improvement, but also makes it immediately clear that we're not allocating tuples. (Likewise for the other uses of zipped in this file.)

if (!ctx.inLoopCriticalPath.contains(x) && xs.forall(x => !ctx.inLoopCriticalPath.contains(x))) {
xs.grouped(16).zipWithIndex.foreach { case (group, idx) =>
val mb = cb.emb.genEmitMethod(s"begin_group_$idx", FastIndexedSeq[ParamType](), UnitInfo)
val r = cb.newField[Region]("begin_separate_region", region)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Could you just pass the region as a method argument, and avoid creating multiple fields filling up the constant pool?

s"children lengths differed ${children.length} ${newChildren.length}. ${children} ${newChildren} ${this}")
}
children.zip(newChildren).foreach { case (r1, r2) =>
(children, newChildren).zipped.foreach { case (r1, r2) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool, I hadn't realized how zipped was implemented. The argument to foreach is actually a 2-ary function, so you don't even need the case. That may be a performance improvement, but also makes it immediately clear that we're not allocating tuples. (Likewise for the other uses of zipped in this file.)

val children: Seq[TypeWithRequiredness] = fields.map(_.typ)
def _unionLiteral(a: Annotation): Unit =
children.zip(a.asInstanceOf[Row].toSeq).foreach { case (r, f) => r.unionLiteral(f) }
children.iterator.zip(rowIterator(a.asInstanceOf[Row])).foreach { case (r, f) => r.unionLiteral(f) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zipped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, missed one, thanks.

requestedType.asInstanceOf[TStream].elementType,
memo)
val valueTypes = names.zip(as).map { case (name, a) =>
val valueTypes = (names, as).zipped.map { case (name, a) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in person, function literals with case still force a tuple to be allocated.

@danking danking merged commit 0261f17 into hail-is:main Aug 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants