[hail] use val for IndexedRVDSpec partitioner #6629

Merged
merged 2 commits into from Jul 12, 2019

@chrisvittal
Collaborator

commented Jul 12, 2019

When using indexed reads, we were creating a new partitioner for every
partition in tmpPartitioner. It was not good.
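
As a rough illustration of why `val` matters here, this minimal Scala sketch contrasts a `def` member, which is re-evaluated on every access, with a `val` member, which is evaluated once. All names (`PartitionerSketch`, `Partitioner`, `SpecWithDef`, `SpecWithVal`, `buildsFor`) are hypothetical stand-ins, not Hail's actual classes, and the sketch assumes the partitioner was previously exposed as a `def`, as the PR title suggests.

```scala
// Hypothetical stand-ins; not Hail's real IndexedRVDSpec or RVDPartitioner.
object PartitionerSketch {
  // Counts how many Partitioner instances have been constructed.
  var buildCount = 0

  final class Partitioner(val rangeBounds: IndexedSeq[(Int, Int)]) {
    buildCount += 1
  }

  trait Spec {
    def bounds: IndexedSeq[(Int, Int)]
    def partitioner: Partitioner
  }

  // `def`: the body runs on every access, so each access builds a new partitioner.
  final class SpecWithDef(val bounds: IndexedSeq[(Int, Int)]) extends Spec {
    def partitioner: Partitioner = new Partitioner(bounds)
  }

  // `val`: the body runs once at construction and the same instance is reused.
  final class SpecWithVal(val bounds: IndexedSeq[(Int, Int)]) extends Spec {
    val partitioner: Partitioner = new Partitioner(bounds)
  }

  // Builds a spec, reads spec.partitioner once per partition,
  // and returns how many partitioners were constructed in total.
  def buildsFor(mkSpec: () => Spec): Int = {
    buildCount = 0
    val spec = mkSpec()
    spec.bounds.indices.foreach(i => spec.partitioner.rangeBounds(i))
    buildCount
  }
}
```

With 100 partitions, the `def` variant constructs 100 partitioners while the `val` variant constructs one. If building a partitioner is itself proportional to the number of partitions, the `def` variant's total cost grows quadratically.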

@tpoterba

Collaborator

commented Jul 12, 2019

oof, this is bad enough we might want to do yet another release

@tpoterba
Collaborator

left a comment

add a benchmark that will catch this performance problem

@chrisvittal

Collaborator Author

commented Jul 12, 2019

You have no idea. I went from waiting over an hour for a stage to begin to having it start fast enough.

@chrisvittal

Collaborator Author

commented Jul 12, 2019

This uses an undocumented feature. It can wait for a little while.

@tpoterba

Collaborator

commented Jul 12, 2019

oh, phew, it's only on readIndexed

@chrisvittal

Collaborator Author

commented Jul 12, 2019

Yeah.

@chrisvittal chrisvittal force-pushed the chrisvittal:indexedrvdspec-val-partitioner branch from 7a44673 to bfd521a Jul 12, 2019

@chrisvittal

Collaborator Author

commented Jul 12, 2019

This benchmark is in some ways bad. The real problem is in the compiler/orchestration, not in any execution. I feel like we need a _do_nothing() that doesn't execute any code, but runs a pipeline through everything in the compiler and stops short of anything that would submit a Spark job.

@tpoterba

Collaborator

commented Jul 12, 2019

yeah, I agree. But we can also get that effect by having tiny data but big pipelines
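
One way to picture the "tiny data but big pipelines" idea is the toy Scala model below. It is only a sketch under stated assumptions: `PlanOnlySketch`, `Stage`, `compile`, `timed`, `bigPipeline`, and `tinyData` are all hypothetical names, and nothing here uses Hail's compiler or benchmark harness. It separates a "compile" step that never touches data from the execution step, so the compile step can be timed on its own.

```scala
// Toy model only; none of these names come from Hail.
object PlanOnlySketch {
  // One pipeline stage: a named transformation of a single value.
  final case class Stage(name: String, f: Int => Int)

  // "Compile" step: fuses all stages into one function without reading any data.
  def compile(stages: Seq[Stage]): Int => Int =
    stages.foldLeft((x: Int) => x)((acc, stage) => acc.andThen(stage.f))

  // Runs `body` once and returns its result with the elapsed wall-clock nanoseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }

  // A big pipeline (n stages, each adding 1) paired with tiny data (three values).
  def bigPipeline(n: Int): Seq[Stage] = (1 to n).map(i => Stage(s"add$i", _ + 1))
  val tinyData: Seq[Int] = Seq(1, 2, 3)
}
```

Timing `PlanOnlySketch.timed(PlanOnlySketch.compile(PlanOnlySketch.bigPipeline(1000)))` measures only the compile step. Applying the compiled function to `tinyData` afterwards adds little work, so in a benchmark shaped this way a regression in the compile step shows up in the total time.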

@danking danking merged commit 84ec5c1 into hail-is:master Jul 12, 2019

1 check passed

ci-test success

Xophmeister added a commit to wtsi-hgi/hgi-cloud that referenced this pull request Aug 5, 2019
