-
Notifications
You must be signed in to change notification settings - Fork 251
[compiler] Code-Generated TableIntervalJoin(..., product = true) #13617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2d18dd9
to
f3bca68
Compare
4d5a538
to
d13efa8
Compare
streamintervaljoin works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great work! Thanks for pushing on the infrastucture for generating multiple classes, I think that's super valuable. I have some comments, but they're mostly small things.
val expectedArgs = | ||
if (callee.mb.isStatic) callee.emitParamTypes | ||
else CodeParamType(callee.ecb.cb.ti) +: callee.emitParamTypes | ||
|
||
val args = _args.toArray | ||
|
||
if (expectedArgs.size != args.length) | ||
throw new RuntimeException(s"invoke ${ callee.mb.methodName }: wrong number of parameters: " + | ||
s"expected ${ expectedArgs.size }, found ${ args.length }") | ||
val codeArgs = args.indices.flatMap { i => | ||
val arg = args(i) | ||
val pt = expectedArgs(i) | ||
throw new RuntimeException(s"invoke ${callee.mb.methodName}: wrong number of parameters: " + | ||
s"expected ${expectedArgs.size}, found ${args.length}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should do this check in CodeBuilder.invoke[Code]
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the type-checking was done in EmitCodeBuilder. Perhaps some generic version should be in CodeBuilderLike.
I don't think this is essential for this change. We can revist this later for a proper tidy-up (which perhaps would be nice to do at some point).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I tried to add an assert. This caused way too many failures. In order to get this to work properly I think we need to track more information about class hierarchy in the TypeInfo
object.
This code fails:
val F = FunctionBuilder[Int]("F")
...
val a = cb.newLocal("a", Code.newInstance(F.cb, F.cb.ctor, FastSeq())
cb.invoke(F.mb, a) // typecheck failure
a
has type LocalRef[AsmFunction0[Int]]
because the compiler infers it from the FunctionBilder (type FunctionBuilder[AsmFunction0[Int]]
). The TypeInfo
on F
, however, is L_C1F;
, which is not the same as the TypeInfo
on a
: Lis/hail/asm4s/AsmFunction0;
Worthwhile doing, but I think it may be a larger change to get this to work properly.
hail/src/main/scala/is/hail/expr/ir/streams/StagedMinHeap.scala
Outdated
Show resolved
Hide resolved
Thank you for your review, Patrick - it has significantly improved this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work!
Add staged code generation capabilities by lowering to and emitting
StreamLeftIntervalJoin
.The
EmitStream
rule for this node uses a code-generated min heap* ofintervals that contain the current key in the left stream, ordered by
right endpoint.
The row type of the join result is the left row with an array of values**
from the right table, ordered by their associated interval's right endpoint.
Note that this implementation is by no means optimised. There are a number of
opportunities that I'd like to consider in subsequent changes, including:
management per element is required.
* This min heap is code generated instead of using an off-the-shelf
implementation as:
SType
to a java class or interface toparameterise the heap with
** The value is a row with the key fields omitted.
Additional Changes:
this
an explicit argument toCodeBuilder.invoke()
referenceGenomes
fromExecuteContext
Backend
; in practice these come from the backend anyway.