Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler] don't force TableRead to produce a TableValue #13229

Merged
merged 5 commits into from Jul 11, 2023

Conversation

patrick-schultz
Copy link
Collaborator

@patrick-schultz patrick-schultz commented Jul 7, 2023

Currently TableRead.execute always produces a TableValueIntermediate, even though almost all TableReaders are lowerable, so could produce a TableStageIntermediate.

This pr refactors TableReader to allow producing a TableStageIntermediate in most cases, and to make it clearer which readers still need to be lowered (only TableFromBlockMatrixNativeReader, MatrixVCFReader, and MatrixPLINKReader). It also deletes some now dead code.

@patrick-schultz patrick-schultz changed the title [compiler] remove TableReader RDD implementations where possible [compiler] don't force TableRead to produce a TableValue Jul 10, 2023
@patrick-schultz
Copy link
Collaborator Author

Put WIP back on, noticed something I still want to look into.

@patrick-schultz
Copy link
Collaborator Author

Alright, should be ready now.

Comment on lines 512 to 525
final def lower(ctx: ExecuteContext, requestedType: TableType, dropRows: Boolean): TableStage =
if (dropRows) {
val globals = lowerGlobals(ctx, requestedType.globalType)

TableStage(
globals,
RVDPartitioner.empty(ctx, requestedType.keyType),
TableStageDependency.none,
MakeStream(FastIndexedSeq(), TStream(TStruct.empty)),
(_: Ref) => MakeStream(FastIndexedSeq(), TStream(requestedType.rowType)))
} else
_lower(ctx, requestedType)

def _lower(ctx: ExecuteContext, requestedType: TableType): TableStage
Copy link
Collaborator

@ehigham ehigham Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this change adds other than a template pattern which I generally avoid whenever possible. I preferred keeping this logic in LowerTableIr.

How is _lower different to lower? Who else uses lower with dropRows other than TableRead? _lower seems like something that shouldn't be public and calling it directly seems like a mistake.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pattern we use occasionally. _lower should really be protected, I'll make that change, and it is only called directly in lower, which is final. So _lower is the customization point for subclasses.

Copy link
Collaborator

@ehigham ehigham Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was trying to understand is why do we need _lower at all when TableRead seems to be the only place that uses dropRows?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I understand now. That's fair. Addressed.

Comment on lines 484 to 486
def toExecuteIntermediate(ctx: ExecuteContext, requestedType: TableType, dropRows: Boolean): TableExecuteIntermediate = {
assert(!dropRows)
TableExecuteIntermediate(_lower(ctx, requestedType))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this interface change. One thing I'd suggest is to remove the TableExecuteIntermediate companion object and construct TableStageIntermediate and TableValueIntermediate directly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a bad change, but given that it will all be going away when everything is lowered, I don't think there's much benefit.

@patrick-schultz patrick-schultz dismissed ehigham’s stale review July 10, 2023 18:50

addressed/responded

Copy link
Collaborator

@ehigham ehigham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit: - this is a great change

@danking danking merged commit 479ef56 into hail-is:main Jul 11, 2023
8 checks passed
danking pushed a commit that referenced this pull request Jul 27, 2023
#13313)

On the current release (0.2.119), `is.hail.rvd.RVD.localSort` attempts
to serialize an entire `is.hail.rvd.RVD` object when called, which
causes some operations, like reading a set of VCF files, to fail. That
particular operation no longer fails due to some combination of the
changes made to reading VCF files in
#13206 and
#13229, but the serialization could
still cause issues if `is.hail.rvd.RVD.localSort` is called in another
context.

See
https://hail.zulipchat.com/#narrow/stream/123011-Hail-Query-Dev/topic/Possible.20bug.20in.20Hail.200.2E2.2E119/near/378610151
for more information.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants