-
Notifications
You must be signed in to change notification settings - Fork 256
[compiler] Lots of compiler and code size performance improvements #10746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| case Begin(xs) => | ||
| xs.foreach(x => emitVoid(x)) | ||
| case x@Begin(xs) => | ||
| if (!ctx.inLoopCriticalPath.contains(x) && xs.forall(x => !ctx.inLoopCriticalPath.contains(x))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
method splitter doesn't do well with lots of flat children -- splitting manually does better.
| val subStructs = children.map(_.asInstanceOf[TStruct]) | ||
| val subFields = ts.fields.map { f => | ||
| f -> subStructs.flatMap(s => s.fieldOption(f.name)) | ||
| val fieldArrays = Array.fill(ts.fields.length)(new BoxedArrayBuilder[Type]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This stuff was a major performance hotspot.
|
|
||
| def deserializeFromBytes(cb: EmitCodeBuilder, bytes: SBinaryCode): Unit = { | ||
| val lazyBuffer = kb.getOrDefineLazyField[MemoryBufferWrapper](Code.newInstance[MemoryBufferWrapper](), (this, "bufferWrapper")) | ||
| val lazyBuffer = kb.getOrDefineLazyField[MemoryBufferWrapper](Code.newInstance[MemoryBufferWrapper](), ("AggregatorStateBufferWrapper")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we weren't caching the way we thought.
|
|
||
| private def extract(ir: IR, ab: BoxedArrayBuilder[(InitOp, PhysicalAggSig)], seqBuilder: BoxedArrayBuilder[IR], letBuilder: BoxedArrayBuilder[AggLet], result: IR, r: RequirednessAnalysis, isScan: Boolean): IR = { | ||
| def extract(node: IR): IR = this.extract(node, ab, seqBuilder, letBuilder, result, r, isScan) | ||
| private def extract(ir: IR, ab: BoxedArrayBuilder[(InitOp, PhysicalAggSig)], seqBuilder: BoxedArrayBuilder[IR], letBuilder: BoxedArrayBuilder[AggLet], memo: mutable.Map[IR, Int], result: IR, r: RequirednessAnalysis, isScan: Boolean): IR = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we cache agg IRs by identity, which deduplicates counts.
| blocks: Array[Block] | ||
| ) extends IndexedSeq[Block] { | ||
| private val blockIdx = blocks.iterator.zipWithIndex.toMap | ||
| private val blockIdx = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java map is faster than scala.
| s"children lengths differed ${children.length} ${newChildren.length}. ${children} ${newChildren} ${this}") | ||
| } | ||
| children.zip(newChildren).foreach { case (r1, r2) => | ||
| (children, newChildren).zipped.foreach { case (r1, r2) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the changes in this file avoid allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh cool, I hadn't realized how zipped was implemented. The argument to foreach is actually a 2-ary function, so you don't even need the case. That may be a performance improvement, but also makes it immediately clear that we're not allocating tuples. (Likewise for the other uses of zipped in this file.)
| if (!ctx.inLoopCriticalPath.contains(x) && xs.forall(x => !ctx.inLoopCriticalPath.contains(x))) { | ||
| xs.grouped(16).zipWithIndex.foreach { case (group, idx) => | ||
| val mb = cb.emb.genEmitMethod(s"begin_group_$idx", FastIndexedSeq[ParamType](), UnitInfo) | ||
| val r = cb.newField[Region]("begin_separate_region", region) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? Could you just pass the region as a method argument, and avoid creating multiple fields filling up the constant pool?
| s"children lengths differed ${children.length} ${newChildren.length}. ${children} ${newChildren} ${this}") | ||
| } | ||
| children.zip(newChildren).foreach { case (r1, r2) => | ||
| (children, newChildren).zipped.foreach { case (r1, r2) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh cool, I hadn't realized how zipped was implemented. The argument to foreach is actually a 2-ary function, so you don't even need the case. That may be a performance improvement, but also makes it immediately clear that we're not allocating tuples. (Likewise for the other uses of zipped in this file.)
| val children: Seq[TypeWithRequiredness] = fields.map(_.typ) | ||
| def _unionLiteral(a: Annotation): Unit = | ||
| children.zip(a.asInstanceOf[Row].toSeq).foreach { case (r, f) => r.unionLiteral(f) } | ||
| children.iterator.zip(rowIterator(a.asInstanceOf[Row])).foreach { case (r, f) => r.unionLiteral(f) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zipped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, missed one, thanks.
| requestedType.asInstanceOf[TStream].elementType, | ||
| memo) | ||
| val valueTypes = names.zip(as).map { case (name, a) => | ||
| val valueTypes = (names, as).zipped.map { case (name, a) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in person, function literals with case still force a tuple to be allocated.
Necessary for the big aggregate benchmark to pass (albeit more slowly) in benchmarks.