[hail] use val for IndexedRVDSpec partitioner #6629

Merged
merged 2 commits into from Jul 12, 2019

@chrisvittal (Collaborator) commented Jul 12, 2019

When using indexed reads, we were creating a new partitioner for every
partition in tmpPartitioner. It was not good.
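
The Scala diff itself is not shown in this thread, so the following is only an analogy in Python with hypothetical names (`Partitioner`, `SpecWithDef`, `SpecWithVal`). It illustrates why switching from a `def` to a `val` matters: a plain `property` behaves like a Scala `def` and rebuilds the object on every access, while `functools.cached_property` builds it once and reuses it (closer to a Scala `lazy val`, since it is computed on first access).

```python
from functools import cached_property


class Partitioner:
    """Hypothetical stand-in for an expensive-to-build partitioner."""

    builds = 0  # class-wide count of constructions

    def __init__(self, n_partitions):
        Partitioner.builds += 1
        self.bounds = list(range(n_partitions + 1))


class SpecWithDef:
    """Like a Scala `def`: every access constructs a new Partitioner."""

    def __init__(self, n_partitions):
        self.n_partitions = n_partitions

    @property
    def partitioner(self):
        return Partitioner(self.n_partitions)


class SpecWithVal:
    """Like a Scala `val`/`lazy val`: constructed once, then reused."""

    def __init__(self, n_partitions):
        self.n_partitions = n_partitions

    @cached_property
    def partitioner(self):
        return Partitioner(self.n_partitions)


def count_builds(spec_cls, n_partitions):
    """Access spec.partitioner once per partition; return how many were built."""
    Partitioner.builds = 0
    spec = spec_cls(n_partitions)
    for i in range(n_partitions):
        _ = spec.partitioner.bounds[i]
    return Partitioner.builds
```

With this sketch, `count_builds(SpecWithDef, n)` grows with the number of partitions, while `count_builds(SpecWithVal, n)` stays at one regardless of `n`.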

@tpoterba (Collaborator) commented Jul 12, 2019

oof, this is bad enough we might want to do yet another release

@tpoterba (Collaborator) left a comment

add a benchmark that will catch this performance problem

@chrisvittal (Collaborator, Author) commented Jul 12, 2019

You have no idea. I went from waiting over an hour for a stage to begin to it being fast enough.

@chrisvittal (Collaborator, Author) commented Jul 12, 2019

This uses an undocumented feature. It can wait for a little while.

@tpoterba (Collaborator) commented Jul 12, 2019

oh, phew, it's only on readIndexed

@chrisvittal (Collaborator, Author) commented Jul 12, 2019

Yeah.

@chrisvittal chrisvittal force-pushed the indexedrvdspec-val-partitioner branch from 7a44673 to bfd521a Jul 12, 2019

@chrisvittal (Collaborator, Author) commented Jul 12, 2019

This benchmark is in some ways bad. The real problem is in the compiler/orchestration, not in any execution. I feel like we need a _do_nothing() that doesn't execute any code, but runs a pipeline through everything in the compiler and stops short before anything that would submit a spark job.

@tpoterba (Collaborator) commented Jul 12, 2019

yeah, I agree. But we can also get that effect by having tiny data but big pipelines
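
As a rough illustration of the "tiny data but big pipelines" idea, here is a toy model in plain Python. It does not use Hail's benchmark suite or any Hail API; `plan_pipeline` and `time_plan` are hypothetical names. No rows are processed at all, so any cost that shows up is pure per-stage, per-partition setup, which is where a repeatedly rebuilt partitioner would hurt.

```python
import time


def plan_pipeline(n_stages, n_partitions, reuse_partitioner):
    """Pretend to plan a pipeline; return how many partitioners were built.

    Only setup work is modelled. Each stage needs a partitioner for every
    partition it touches; it is either rebuilt each time or built once
    per stage and reused.
    """
    built = 0
    for _ in range(n_stages):
        cached = None
        for _ in range(n_partitions):
            if reuse_partitioner and cached is not None:
                continue  # reuse the partitioner already built for this stage
            cached = list(range(n_partitions + 1))  # stand-in for the build
            built += 1
    return built


def time_plan(n_stages, n_partitions, reuse_partitioner):
    """Return (partitioners built, elapsed seconds) for one planning run."""
    start = time.perf_counter()
    built = plan_pipeline(n_stages, n_partitions, reuse_partitioner)
    return built, time.perf_counter() - start
```

Comparing `time_plan(50, 2000, False)` with `time_plan(50, 2000, True)` shows the gap widening as stages and partitions grow, even though the "data" stays empty.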

@danking danking merged commit 84ec5c1 into hail-is:master Jul 12, 2019
1 check passed
Xophmeister added a commit to wtsi-hgi/hgi-cloud that referenced this issue Aug 5, 2019