
Integration with batchJobs #46

Open
stephens999 opened this issue May 26, 2015 · 4 comments

@stephens999
Owner

We've had various discussions about how to provide better support for long-running jobs. It seems to me that by making use of the BatchJobs package, and particularly its waitForJobs function, we should be able to get something that works with relatively little code.

Currently we have, in run_dsc, the code:

runScenarios(dsc, scenariosubset, seedsubset)
runMethods(dsc, scenariosubset, methodsubset, seedsubset)
runOutputParsers(dsc)
runScores(dsc, scenariosubset, methodsubset)

The simplest approach that I can see would involve submitting jobs to do each
of these functions, and using waitForJobs to wait between each job set.

runScenarios (submitted through BatchJobs)
waitForJobs()
runMethods (again through BatchJobs, using a second registry of jobs this time)
waitForJobs()
runOutputParsers (again through BatchJobs, a third registry)
waitForJobs()
runScores (BatchJobs, a fourth registry)
waitForJobs()
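The pattern above could be sketched roughly as follows. This is a minimal sketch, not a working implementation: the per-item worker functions and their argument lists are assumptions (run_dsc's real internals may differ), while the registry/map/submit/wait calls are BatchJobs' standard workflow.

```r
# Sketch of one pipeline stage as a BatchJobs registry plus a barrier.
# ASSUMPTION: the per-item workers (runScenario etc.) are hypothetical
# stand-ins for whatever run_dsc actually calls per scenario/method.

run_stage <- function(id, fun, items) {
  # One registry per stage, as proposed above.
  reg <- BatchJobs::makeRegistry(id = id, file.dir = file.path("bj-files", id))
  BatchJobs::batchMap(reg, fun, items)  # one cluster job per element of items
  BatchJobs::submitJobs(reg)            # hand the jobs to the configured backend
  BatchJobs::waitForJobs(reg)           # global barrier: block until all finish
  invisible(reg)
}

# Usage sketch (hypothetical worker names and subsets):
# run_stage("scenarios", function(s) runScenario(dsc, s),     scenariosubset)
# run_stage("methods",   function(m) runMethod(dsc, m),       methodsubset)
# run_stage("parsers",   function(p) runOutputParser(dsc, p), dsc$parsers)
# run_stage("scores",    function(s) runScore(dsc, s),        scoresubset)
```

Each stage gets its own registry directory, so failed jobs in one stage can be inspected or resubmitted without touching the others; the waitForJobs call is what serializes the four stages.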

@ramanshah is there a reason you can see that this would not work?

@ramanshah
Contributor

I've thought about this strategy. Here are my reasons for hesitation:

  1. Some clusters have enormous latency between job submission and job execution. Much of my past research was on clusters where that wait tends to run in the 1-4 day range; with four barriers, quadrupling such latency would be painful.
  2. I know that in some fields (e.g. quantitative finance; Rick's experience seems to agree with this), a really slow "brute force" methodology involving some pedantic Monte Carlo simulation is often the benchmark for the cleverer methodology. This might also be true for our genomic work, but I'm not sure. A single extremely slow method in a dsc would hold up all of the faster methods.
  3. As you have mentioned in other issues, input parsers and the use of multiple pre/post-processing steps could make the maximum number of global barriers (waitForJobs() calls) even larger.

If you feel these aren't important, we can definitely do it this way. Your suggestion is probably the simplest implementation.

@stephens999
Owner Author

I think 1 will presumably depend on the cluster environment, but it isn't a problem I have come across in practice with the clusters we are using.

For 2, this scenario is indeed not out of the question, but it is easily dealt with: first run your dsc with all the fast methods, then add the slow method and run again.

For 3, I agree, but I actually suspect that in most use cases the methods will be the rate-determining step, not the waitForJobs() calls on parsers etc.

I think the issue is urgent enough, and this approach simple enough, that we would be best off implementing it first, and seeing what our next bottleneck turns out to be.

@ramanshah
Contributor

Sounds good.

@ramanshah
Contributor

Probably addresses #23 as well.
