Canada Compute

Canada compute cluster docs

This doc details how to use the PLAI group's available resources at Canada Compute. If you've never used these, first register at https://ccdb.computecanada.ca/ with sponsor jav-224-01 (Frank Wood: Faculty, Computer Science, University of British Columbia) under the RAPI jav-224-aa. The clusters are managed by Slurm, so the commands differ from the torque/maui commands some of you might be used to.

Quick Start

Once you have an account, you can log in to any of the resources below:

cedar.computecanada.ca (58,416 CPU cores and 584 GPUs)
graham.computecanada.ca (36,160 CPU cores and 320 GPUs)
beluga.computecanada.ca (probably don't use this?)
niagara.computecanada.ca (60,000 CPU cores, but we do not seem to have access to this at the moment)

On all the above resources, job submission is managed by Slurm. To log in and list your own queued jobs:
ssh uname@someresource.computecanada.ca
squeue -u uname
The following are guides for getting started quickly on each cluster. For more information, the cluster's own wiki is a lot more helpful.

TODO: figure out how to run multi-GPU jobs. These clusters seem to support using hundreds of networked GPUs.

Cedar Quickstart

ssh uname@cedar.computecanada.ca
Clone your code to scratch.
If it is your first time running something, make a Python environment for it (you need to do this on the head node):

module load python/3.7
virtualenv --no-download /home/uname/some_env
source /home/uname/some_env/bin/activate
pip install torch ... etc

To get an interactive job and run your code in the environment you just set up:

salloc --time=1:0:0 --account=def-fwood [--gres=gpu:1 | --ntasks=32] --mem=2G
module load python/3.7
source /home/uname/some_env/bin/activate
python agi.py
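
The bracketed part of the salloc line means "pick one": either a single GPU or 32 CPU tasks. As a small sketch, the helper below (salloc_cmd is a made-up name, not a Slurm command) prints the full command for each choice so you can copy the one you need. All flag values are taken from the line above.

```shell
# Print the salloc command for a given resource mode.
# "gpu" requests one GPU; "cpu" requests 32 CPU tasks.
# salloc_cmd is a hypothetical helper for illustration only.
salloc_cmd() {
    case "$1" in
        gpu) resources="--gres=gpu:1" ;;
        cpu) resources="--ntasks=32" ;;
        *)   echo "usage: salloc_cmd gpu|cpu" >&2; return 1 ;;
    esac
    echo "salloc --time=1:0:0 --account=def-fwood $resources --mem=2G"
}

salloc_cmd gpu
salloc_cmd cpu
```

Run the printed command on a login node to start the interactive session.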

Once you understand what to do in the interactive session, you should start using sbatch to submit jobs.
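
A minimal sketch of what moving from salloc to sbatch might look like, assuming the same account, time limit, GPU request, and virtualenv as the interactive example. The file name job.sh is a placeholder; adjust the resource requests to your job.

```shell
# Write a minimal batch script. The #SBATCH lines carry the same
# options that were passed to salloc in the interactive example.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-fwood
#SBATCH --time=1:0:0
#SBATCH --gres=gpu:1
#SBATCH --mem=2G

module load python/3.7
source /home/uname/some_env/bin/activate
python agi.py
EOF

# Submit only if sbatch is available (i.e. on a cluster login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch job.sh
else
    echo "sbatch not found; run this on a cluster login node"
fi
```

After submitting, squeue -u uname shows the job's state. By default Slurm writes the job's output to slurm-<jobid>.out in the directory you submitted from.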

Graham Quickstart

As far as I can tell, the Cedar instructions above apply to Graham as well.
