make ld_prune fast again #5078
This gets ld_prune on the Big Data Test.
I'm running a test on profile225 right now.
I let ld_prune run for a while on a 30 GB BGEN file. There are known issues here, but I made some changes to avoid useless work, and those included fusing the variant filtering with ld_prune. It's a single-stage Spark job.
It took 1.6 hours to get through a bit more than a third of this file. That's about 5700 seconds for ~10 GB, compared to 160 s for 0.7 GB: roughly 35x the time for about 14x the data, so the runtime is growing faster than the input. I hope to rerun this after the BGEN fixes to see what the scaling looks like.
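To make the scaling comparison explicit, here is a small sketch that computes the time ratio, the data ratio, and the exponent k you would get if runtime followed a power law, time ∝ size^k. The power-law model and the `scaling_summary` helper are assumptions for illustration only. With just two data points, one of which is a partial run, the exponent is a rough indicator, not a measurement.

```python
import math


def scaling_summary(small_seconds, small_gb, large_seconds, large_gb):
    """Compare runtime growth with input-size growth (hypothetical helper)."""
    time_ratio = large_seconds / small_seconds
    data_ratio = large_gb / small_gb
    # Assumes time ~ size**k; k == 1 would mean linear scaling.
    exponent = math.log(time_ratio) / math.log(data_ratio)
    return time_ratio, data_ratio, exponent


# Figures from the comment above: ~5700 s for ~10 GB vs 160 s for 0.7 GB.
time_ratio, data_ratio, k = scaling_summary(160, 0.7, 5700, 10)
print(f"time x{time_ratio:.1f}, data x{data_ratio:.1f}, exponent ~{k:.2f}")
```

An exponent noticeably above 1 is consistent with the observation that the larger run is slower than linear scaling would predict.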
Anyway, as part of the BlockMatrix IR changes that Daniel will work on, we'll look into why ld_prune isn't within 8x of PLINK.
Jan 17, 2019
I made three more improvements last week (and I don't plan to do anything else now):