
Canada Compute

michaeljteng edited this page Apr 8, 2019 · 3 revisions
Canada compute cluster docs

This doc describes how to use the PLAI group's resources on Compute Canada. If you have never used them, first register at https://ccdb.computecanada.ca/ with sponsor jav-224-01 (Frank Wood: Faculty, Computer Science, University of British Columbia), under the RAPI jav-224-aa. The clusters are managed by Slurm, unlike the Torque/Maui commands some of you might be used to.
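If you are coming from Torque/Maui, the everyday commands map roughly as follows. This is a sketch written as a small lookup function (the name `torque_to_slurm` is hypothetical). It maps command names only; the flags differ between the two systems, so check the Slurm documentation before translating a whole script.

```shell
# Hypothetical cheat sheet: print the closest Slurm equivalent of a
# common Torque/Maui command. Only command names are mapped; flags differ.
torque_to_slurm() {
  case "$1" in
    "qsub")     echo "sbatch" ;;             # submit a batch script
    "qsub -I")  echo "salloc" ;;             # interactive job
    "qstat")    echo "squeue" ;;             # list queued/running jobs
    "showq")    echo "squeue" ;;             # Maui's queue view
    "qdel")     echo "scancel" ;;            # cancel a job
    "checkjob") echo "scontrol show job" ;;  # details of a single job
    "qhold")    echo "scontrol hold" ;;      # hold a pending job
    "qrls")     echo "scontrol release" ;;   # release a held job
    "pbsnodes") echo "sinfo" ;;              # node/partition status (roughly)
    *)
      echo "no mapping for '$1'" >&2
      return 1
      ;;
  esac
}
```

For example, `torque_to_slurm qdel` prints `scancel`.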

Quick Start

  • Once you have an account, you can log in to any of the resources below:
  • cedar.computecanada.ca (58,416 CPU cores and 584 GPUs)
  • graham.computecanada.ca (36,160 CPU cores and 320 GPUs)
  • beluga.computecanada.ca (probably don't use this?)
  • niagara.computecanada.ca (60,000 CPU cores, but we don't seem to have access to it at the moment)
  • On all of the above resources, job submission is managed by Slurm.
  • Log in: ssh uname@someresource.computecanada.ca
  • Check your jobs in the queue: squeue -u uname
  • The following are guides for quickly getting started on some of the clusters; for more detail, each cluster's own wiki page is a lot more helpful.
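Typing the full hostname every time gets tedious. One general OpenSSH convenience (not specific to Compute Canada) is a `Host` alias in your ssh config. Below is a minimal sketch of a hypothetical helper, `add_cc_host`, that appends one, assuming hostnames of the form `<cluster>.computecanada.ca` as listed above and with `uname` standing in for your username.

```shell
# Hypothetical helper: append an ssh Host alias for a Compute Canada cluster,
# so that "ssh cedar" works instead of "ssh uname@cedar.computecanada.ca".
# Usage: add_cc_host <cluster> <user> [config_file]
add_cc_host() {
  local cluster="$1"
  local user="$2"
  local config="${3:-$HOME/.ssh/config}"
  # Do nothing if an alias for this cluster already exists.
  if [ -f "$config" ] && grep -q "^Host ${cluster}\$" "$config"; then
    return 0
  fi
  mkdir -p "$(dirname "$config")"
  cat >> "$config" <<EOF
Host ${cluster}
    HostName ${cluster}.computecanada.ca
    User ${user}
EOF
}
```

After `add_cc_host cedar uname` (and likewise for graham), `ssh cedar` logs you in. Running it twice for the same cluster leaves only one entry.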

TODO: figure out how to run multi-GPU jobs. These clusters seem to support jobs that use hundreds of networked GPUs.

Cedar Quickstart

  • ssh uname@cedar.computecanada.ca
  • Clone your code into your scratch space.
  • If this is your first time running something, make a Python virtual environment for it (this has to be done on the login node):
  1. module load python/3.7
  2. virtualenv --no-download /home/uname/some_env
  3. source /home/uname/some_env/bin/activate
  4. pip install torch ... (plus anything else you need)
  • To get an interactive job: salloc --time=1:0:0 --account=def-fwood [--gres=gpu:1 | --ntasks=32] --mem=2G (choose either one GPU or 32 tasks; --time is hours:minutes:seconds)
  • module load python/3.7
  • source /home/uname/some_env/bin/activate (the environment you created above)
  • python agi.py
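The `[--gres=gpu:1 | --ntasks=32]` part of the salloc line means "pick one". A minimal sketch of a hypothetical helper, `salloc_cmd`, that prints the full command for either choice; the account, memory, GPU, and task values are copied from the line above, while the helper's name and arguments are made up for illustration.

```shell
# Hypothetical helper: print the quick-start salloc command with the
# "[--gres=gpu:1 | --ntasks=32]" choice filled in.
# Usage: salloc_cmd gpu|cpu [hours]
salloc_cmd() {
  local kind="$1"
  local hours="${2:-1}"   # --time is H:M:S, so "1:0:0" means one hour
  local resources
  case "$kind" in
    gpu) resources="--gres=gpu:1" ;;  # one GPU
    cpu) resources="--ntasks=32" ;;   # 32 tasks, no GPU
    *)
      echo "usage: salloc_cmd gpu|cpu [hours]" >&2
      return 1
      ;;
  esac
  echo "salloc --time=${hours}:0:0 --account=def-fwood ${resources} --mem=2G"
}
```

On the login node, `eval "$(salloc_cmd gpu)"` would then request a one-hour, one-GPU allocation.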

Once you understand what to do in the interactive session, you should start using sbatch to submit jobs.
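A batch job runs the same steps as the interactive session, with the salloc options moved into `#SBATCH` lines. The sketch below writes such a script; the environment path and `agi.py` come from the quick start above, while the helper name `write_job_script`, the one-GPU/one-hour request, and the `%x-%j.out` log name are illustrative assumptions.

```shell
# Hypothetical helper: write a minimal Slurm batch script that mirrors the
# interactive steps above (load Python, activate the env, run the code).
# Usage: write_job_script <output_file> [env_dir]
write_job_script() {
  local out="$1"
  local env_dir="${2:-/home/uname/some_env}"
  # Unquoted EOF: ${env_dir} is expanded; %x and %j are left for Slurm.
  cat > "$out" <<EOF
#!/bin/bash
#SBATCH --account=def-fwood
#SBATCH --time=1:0:0
#SBATCH --gres=gpu:1
#SBATCH --mem=2G
#SBATCH --output=%x-%j.out
module load python/3.7
source ${env_dir}/bin/activate
python agi.py
EOF
}
```

Then `write_job_script job.sh && sbatch job.sh` submits it, and `squeue -u uname` shows its status. In the output filename, Slurm replaces `%x` with the job name and `%j` with the job ID.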

Graham Quickstart

  • Same as Cedar, as far as I can tell; just log in to graham.computecanada.ca instead.