Canada Compute
michaeljteng edited this page Apr 8, 2019
This doc explains how to use the PLAI group's resources at Compute Canada. If you've never used them, first register at https://ccdb.computecanada.ca/ with jav-224-01 (Frank Wood: Faculty, Computer Science, University of British Columbia) as your sponsor, under the RAPI jav-224-aa. The clusters are managed by Slurm, not the Torque/Maui commands some of you might be used to.
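If you're coming from Torque/Maui, the rough command equivalents are collected below as a small shell helper. `torque2slurm` is hypothetical (not part of Slurm or Compute Canada), just a runnable cheat sheet; only the command names map one-to-one, and the flags differ between the two systems, so check the Slurm man pages before translating old job scripts.

```shell
#!/usr/bin/env bash
# Hypothetical cheat-sheet helper: print the Slurm command that roughly
# corresponds to a common Torque/Maui command.
torque2slurm() {
  case "$*" in
    "qsub -I") echo "salloc" ;;             # interactive job
    qsub)      echo "sbatch" ;;             # submit a batch script
    qstat)     echo "squeue" ;;             # list queued/running jobs
    showq)     echo "squeue" ;;             # Maui's queue view
    qdel)      echo "scancel" ;;            # cancel a job
    checkjob)  echo "scontrol show job" ;;  # details of a single job
    *)         echo "no mapping for: $*" >&2; return 1 ;;
  esac
}

torque2slurm qsub -I   # prints: salloc
```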
- Once you make an account, you can log in to any of the resources below:
  - cedar.computecanada.ca (58,416 CPU cores and 584 GPUs)
  - graham.computecanada.ca (36,160 CPU cores and 320 GPUs)
  - beluga.computecanada.ca (probably don't use this?)
  - niagara.computecanada.ca (60,000 CPU cores, but we don't seem to have access to this at the moment)
- On all of the above resources, job submission is managed by Slurm. For example, to log in and list your jobs:
ssh uname@someresource.computecanada.ca
squeue -u uname
- The following are guides for quickly getting started on a particular cluster; for more detail, that cluster's own page on the Compute Canada wiki is a lot more helpful.
TODO: figure out how to run multi-GPU jobs. These clusters seem to support using hundreds of networked GPUs.
ssh uname@cedar.computecanada.ca
- Clone your code into scratch.
- The first time you run something, make a Python virtualenv for it (this needs to be done on the head node):
module load python/3.7
virtualenv --no-download /home/uname/some_env
source /home/uname/some_env/bin/activate
pip install torch ... etc
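- To get an interactive job (choose either --gres=gpu:1 for one GPU or --ntasks=32 for 32 tasks), then load Python, activate your env, and run your code: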
salloc --time=1:0:0 --account=def-fwood [--gres=gpu:1 | --ntasks=32] --mem=2G
module load python/3.7
source /home/uname/some_env/bin/activate   # the env with the stuff you just installed
python agi.py
Once you understand what to do in the interactive session, you should start using sbatch to submit jobs.
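A minimal sketch of a batch version of the interactive steps above, assuming the GPU variant, the env at /home/uname/some_env, and agi.py as the entry point. The file name job.sh and the 1-hour limit are placeholders; the snippet only writes the job script, which you then submit from the head node with `sbatch job.sh` and monitor with `squeue -u uname`.

```shell
#!/usr/bin/env bash
# Write a Slurm batch script (job.sh is a placeholder name) that repeats the
# interactive steps: load Python, activate the env, run the program.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-fwood
#SBATCH --time=1:00:00      # wall-clock limit, hours:minutes:seconds
#SBATCH --gres=gpu:1        # or, for CPUs instead: #SBATCH --ntasks=32
#SBATCH --mem=2G

module load python/3.7
source /home/uname/some_env/bin/activate
python agi.py
EOF

# On the cluster:
#   sbatch job.sh
#   squeue -u uname
```

Writing the #SBATCH directives into the script keeps the resource request next to the code it runs, so resubmitting is just `sbatch job.sh` again.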
- See Cedar; as far as I can tell, the steps are the same.