Consider parallelizing sampling inside of the LatentDistributionTest #520
Comments
I'd be interested in this issue.
DoD:
Sim 1: time vs. n_verts for a single iteration of LDT
Sim 2: single vs. multiple iterations of LDT
@alyakin314 could you clarify what you mean by this?
Yes. What I meant is this: because of how hyppo parallelizes things via joblib, if it takes X time to run the test once, running it 1000 times will take far less than 1000X, because the first iteration (maybe the first several? not sure...) is the slowest one due to joblib setting some things up. I would assume @sampan501 would know more. My point was that it'd be nice to know that the test becomes faster regardless of whether you run it a single time or 1000 times. Does that clarify?
I think this tutorial may help: https://joblib.readthedocs.io/en/latest/parallel.html. I'm not really sure how the run time of serial Python compares with running a single thread through joblib (I would expect the difference to be negligible), but the performance improvements really show up when running over multiple threads. You should see real improvement if this line of code is the bottleneck (which I don't think it is).
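One way to check the two claims above (first-call overhead, and serial vs. single-job joblib) is to time repeated calls directly. This is only a minimal sketch: `work` is a hypothetical stand-in for one unit of test work, not anything from hyppo or graspologic, and the task count and repeat count are arbitrary. As far as I know, joblib's default process backend keeps its workers alive between calls, which would explain a slow first call, but the numbers will depend on the machine.

```python
import time

from joblib import Parallel, delayed


def work(x):
    # Hypothetical stand-in for one unit of test work; deliberately cheap.
    return x * x


def run_serial(n_tasks):
    # Plain Python loop, no joblib involved.
    return [work(i) for i in range(n_tasks)]


def run_joblib(n_tasks, n_jobs):
    # Same tasks dispatched through joblib with the given number of jobs.
    return Parallel(n_jobs=n_jobs)(delayed(work)(i) for i in range(n_tasks))


def time_calls(fn, repeats):
    # Wall-clock duration of each of `repeats` consecutive calls to fn.
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


serial_times = time_calls(lambda: run_serial(200), repeats=5)
one_job_times = time_calls(lambda: run_joblib(200, n_jobs=1), repeats=5)
two_job_times = time_calls(lambda: run_joblib(200, n_jobs=2), repeats=5)

for label, times in [
    ("serial", serial_times),
    ("joblib n_jobs=1", one_job_times),
    ("joblib n_jobs=2", two_job_times),
]:
    later_mean = sum(times[1:]) / len(times[1:])
    print(f"{label}: first={times[0]:.4f}s, later mean={later_mean:.4f}s")
```

Comparing the first duration against the mean of the later ones for each setting shows whether startup cost dominates a single run. For a real comparison, `work` would need to be replaced with an actual LDT call.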
Expected Behavior
The LDT with size correction is a little slower than the uncorrected one. This is expected, since it has an extra step, but it could be made faster. In particular, sampling from multivariate Gaussians (which is probably done via rejection sampling or some kind of transformation, and is likely not very fast) might be sped up by parallelizing here: https://github.com/microsoft/graspologic/blob/bbbc68a24ca9e2097575e7f92f809916ea3eeb46/graspologic/inference/latent_distribution_test.py#L346-L350
However, there are two caveats:
Of course, this would reuse the workers kwarg that LDT already takes.
Actual Behavior
Sampling is not parallelized. :(
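A rough sketch of what parallelized sampling could look like, assuming the step in question draws one Gaussian sample per row with a row-specific covariance. The names `sample_rows`, `_sample_chunk`, `X`, and `sigmas` are hypothetical and are not taken from the graspologic source. Rows are split into one chunk per worker, and each row gets its own seed so the output does not depend on the number of workers.

```python
import numpy as np
from joblib import Parallel, delayed, effective_n_jobs


def _sample_chunk(X_chunk, sigma_chunk, seeds):
    # Draw one sample per row; each row uses its own generator so the
    # result is independent of how rows are grouped into chunks.
    out = np.empty_like(X_chunk)
    for i, (x, s, seed) in enumerate(zip(X_chunk, sigma_chunk, seeds)):
        out[i] = np.random.default_rng(seed).multivariate_normal(x, s)
    return out


def sample_rows(X, sigmas, workers=1, random_state=0):
    # X: (n, d) means; sigmas: (n, d, d) covariances (hypothetical layout).
    n = len(X)
    seeds = np.random.SeedSequence(random_state).spawn(n)
    # One chunk of row indices per worker keeps dispatch overhead low.
    chunks = np.array_split(np.arange(n), effective_n_jobs(workers))
    results = Parallel(n_jobs=workers)(
        delayed(_sample_chunk)(X[idx], sigmas[idx], [seeds[i] for i in idx])
        for idx in chunks
    )
    return np.concatenate(results)


# Small synthetic example with random symmetric positive definite covariances.
rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.normal(size=(n, d))
A = rng.normal(size=(n, d, d))
sigmas = A @ A.transpose(0, 2, 1) + 1e-3 * np.eye(d)

serial = sample_rows(X, sigmas, workers=1)
parallel = sample_rows(X, sigmas, workers=2)
print(np.allclose(serial, parallel))
```

The `workers` argument here mirrors the idea of sharing the existing kwarg. Whether this is actually faster depends on n and d: for small inputs, the cost of dispatching to workers may outweigh the sampling itself.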