
Consider parallelizing sampling inside of the LatentDistributionTest #520

Closed
alyakin314 opened this issue Oct 7, 2020 · 5 comments · Fixed by #744
Labels: enhancement (New feature or request), low priority

Comments

@alyakin314 (Contributor)

Expected Behavior

The LDT that uses size correction is a little slower than the uncorrected one. This is expected since it has an extra step, but it may be possible to make it faster. In particular, sampling from multivariate Gaussians (which is probably done via rejection sampling or some kind of transformation, and is likely not very fast) could be sped up by parallelizing here: https://github.com/microsoft/graspologic/blob/bbbc68a24ca9e2097575e7f92f809916ea3eeb46/graspologic/inference/latent_distribution_test.py#L346-L350
However, there are two caveats:

  1. The results of each iteration of this loop are random, and parallelization + randomness should be treated with caution. If using joblib, consider this resource: https://bdpedigo.github.io/posts/2020/02/demo-parallel/ :)
  2. Each separate iteration by itself might be quite fast, so it is not clear whether parallelizing, whether via joblib or multiprocessing, will actually speed up or slow down the overall test. Thus it is imperative to run an experiment verifying that parallelizing yields better performance than not parallelizing. It is also important to ensure that this holds both when running only a single LDT and when running a simulation that requires repeated use of the LDT (because joblib has interesting performance differences depending on whether you call it once or many times).

Of course, this would reuse the existing workers kwarg of the LDT.
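A minimal sketch of what the parallelized loop could look like, addressing caveat 1 by spawning one child seed per iteration from a single `SeedSequence` (the pattern NumPy recommends for parallel RNG). `_sample_statistic` and `bootstrap` are hypothetical stand-ins for the loop body in `latent_distribution_test.py`, not the actual graspologic code:

```python
import numpy as np
from joblib import Parallel, delayed

def _sample_statistic(seed, mean, cov, n):
    # Each iteration gets its own Generator, so draws are independent and
    # reproducible regardless of which worker runs them.
    rng = np.random.default_rng(seed)
    sample = rng.multivariate_normal(mean, cov, size=n)
    return sample.mean(axis=0)  # placeholder for the real test statistic

def bootstrap(mean, cov, n, n_bootstraps, workers=-1, seed=0):
    # One child seed per iteration from a single SeedSequence; Parallel
    # returns results in input order, so the output is deterministic.
    seeds = np.random.SeedSequence(seed).spawn(n_bootstraps)
    return Parallel(n_jobs=workers)(
        delayed(_sample_statistic)(s, mean, cov, n) for s in seeds
    )
```

With this seeding scheme, two runs with the same `seed` give identical results no matter how the iterations are scheduled across workers.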

Actual Behavior

Sampling is not parallelized. :(

@alyakin314 alyakin314 added low priority enhancement New feature or request labels Oct 7, 2020
@bdpedigo bdpedigo added this to Code improvements in Neuro Data Design Oct 14, 2020
@bdpedigo bdpedigo removed this from Code improvements in Neuro Data Design Oct 14, 2020
@bdpedigo bdpedigo added this to Code improvements in Neuro Data Design Oct 16, 2020
@kareef928 (Contributor)

I'd be interested in this issue

@kareef928 (Contributor) commented Nov 11, 2020

DoD:

  • Write LDT using Joblib parallel

  • Run a simulation to compare the speeds of the original LDT and the parallelized LDT across several settings (different n_verts, single vs. multiple iterations of LDT)

1st sim (time vs n_verts for a single iteration of LDT):

  1. n_verts = [10, 100, 1000, 10000, 100000]
  2. Generate two ER graphs - 1st with n_vert vertices and second with 2*n_vert vertices, and p=0.5 for both
  3. Obtain times for running LDT unparallelized (workers=1) and parallelized (workers=-1) on the two graphs
  4. Repeat steps 2-3 for 30 trials per n_vert in n_verts
  5. Plot stripplots of times vs n_verts (of graph 1) for LDT unparallelized and parallelized

Sim 2 - single vs multiple iterations of LDT

  1. n_iters = [1, 5, 25, 50, 100]
  2. Generate two ER graphs - 1st with 250 vertices and second with 500 vertices, and p=0.5 for both
  3. Obtain times for running LDT unparallelized (workers=1) and parallelized (workers=-1) on the two graphs for n_iters iterations
  4. Repeat steps 2-3 for 15 trials per n_iter in n_iters
  5. Plot stripplots of times vs n_iters for LDT unparallelized and parallelized
  • If LDT parallelized is consistently faster than original LDT, then push the parallelized version of LDT
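The trial structure of the two simulations above could be sketched as the harness below. The LDT call is abstracted as `run_ldt`, a hypothetical stand-in for graspologic's `LatentDistributionTest`, so this shows only the timing and trial loop, not the real test:

```python
import time
import numpy as np

def time_trials(run_ldt, n_verts_list, n_trials, workers):
    """Return {n_verts: [elapsed seconds per trial]} for a given worker count."""
    results = {}
    for n in n_verts_list:
        times = []
        for _ in range(n_trials):
            t0 = time.perf_counter()
            run_ldt(n, workers=workers)  # e.g. LDT on ER(n, 0.5) vs ER(2n, 0.5)
            times.append(time.perf_counter() - t0)
        results[n] = times
    return results

# Dummy stand-in so the harness runs without graspologic installed.
def fake_ldt(n, workers=1):
    rng = np.random.default_rng(0)
    rng.multivariate_normal(np.zeros(2), np.eye(2), size=n).mean(axis=0)
```

Running `time_trials` once with `workers=1` and once with `workers=-1` over the same `n_verts_list` gives the paired timings to stripplot.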

@kareef928 (Contributor)

2. It is also important to ensure that this holds both when running only a single LDT and when running a simulation that requires repeated use of the LDT

@alyakin314 could you clarify what you mean by this?

@alyakin314 (Contributor, Author)

  2. It is also important to ensure that this holds both when running only a single LDT and when running a simulation that requires repeated use of the LDT

@alyakin314 could you clarify what you mean by this?

Yes, so what I meant is: because of how hyppo parallelizes things via joblib, if it takes X time to run the test once, running the test 1000 times will take way less than 1000X, because the first iteration (maybe the first several? not sure...) is the slowest due to joblib setting some things up. I would assume @sampan501 would know more.

And what I was saying - it'd be nice to know that the test becomes faster regardless of whether you were to run it a single time or 1000. Does that clarify?

@sampan501 commented Apr 1, 2021

I think this tutorial may help: https://joblib.readthedocs.io/en/latest/parallel.html.

Though I'm not really sure of the runtime comparison between serial Python and a single thread through joblib (I would expect the difference to be negligible), the performance improvements really show when running over multiple workers. You should see real improvement if this line of code is the bottleneck (which I don't think it is).
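The distinction above can be seen in a minimal example: `Parallel(n_jobs=1)` runs in the calling process, effectively a plain loop with negligible overhead, while the first call with multiple workers pays the cost of starting the worker pool, which later calls can amortize. A small sketch:

```python
from joblib import Parallel, delayed

def square(x):
    return x * x

# n_jobs=1 executes sequentially in the calling process, like a plain loop.
sequential = Parallel(n_jobs=1)(delayed(square)(i) for i in range(5))

# n_jobs=2 dispatches to worker processes; the first such call pays the
# pool start-up cost that subsequent calls can reuse.
parallel = Parallel(n_jobs=2)(delayed(square)(i) for i in range(5))
```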
