XSEDE 2012 Tutorial

pradeepmantha edited this page Jun 29, 2012 · 25 revisions

Do you want to run large-scale data- and compute-intensive applications on distributed, heterogeneous HPC clusters with minimal queue waiting time? Then the following tutorial is for you.

  1. What software tools can help me do that?
  2. How does the software solve these problems, and why does it work?
  3. Great! Any simple examples?
  4. Good! Any real science examples?
  5. I want to try it on my own cluster. Is there support?

To-dos for the team:

We will also demo a multi-machine run. Because of the logistics involved, we will need to submit this job either:

  1. In the morning, so that it runs for the entire day, or
  2. In a reserved queue.

For the queue reservation, Yaakoub is the person to contact. I will need the demo size, the username from which the demo will be submitted, and so on.

Other things to consider include queue wait times. I can and will add the training accounts to an allocation with escalated privileges and a reserved chassis. This means people using the training accounts will have hardware reserved for their runs.

Which versions of Bliss and the Pilot-API need to be used? Current roadblocks:

The latest version of BigJob has not been released, because:
      AndreL - Not enough testing has been done to release the package.
      Ole, Melissa, Pradeep - Waiting for the new package to be released before testing it (deadlock?).
                            - Somewhat reluctant to test directly from source.

 Solution - AndreM installs the released alpha package in a separate directory, then:
          - Ole, Melissa, and Pradeep test the alpha package.
              - If problems are found, report all the bugs; a new production version will be released.
              - If there are no bugs, great: that's our tutorial version of the Pilot-API and Bliss.