- 22-JAN 2024
- Replaced `anaconda3` with `miniforge3` in `job.pbs`, as NSCC no longer supports `anaconda3`.
- 18-OCT 2023
- Simplified the tutorial so that it uses only conda and no longer relies on Singularity.
- Corrected the project ID for our group.
- 3-APR 2023
- Added instructions for using Anaconda3.
- 10-MAR 2023
- Revised `job.pbs` to include examples for requesting more than one GPU.
- 3-MAR 2023
- Revised `job.pbs` so that the container can recognize the GPU index. This prevents the error `AssertionError: Invalid device id` when loading a checkpoint. The error was caused by PBS Pro assigning a GPU index to `CUDA_VISIBLE_DEVICES` that the container could not recognize, which resulted in `torch.cuda.is_available() = True` yet `torch.cuda.device_count() = 0`. The revised `job.pbs` assigns the GPU index manually instead.
- 1-MAR 2023
- Uploaded the correct container file (for option 2).
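The GPU-index fix from the 3-MAR 2023 entry lives in `job.pbs`, whose exact commands are not reproduced here. As a rough Python-side analogue (a swapped-in technique, not the author's method), the sketch below rewrites `CUDA_VISIBLE_DEVICES` to plain local indices `0..n-1` at the very top of `main.py`, before `torch` is imported. It assumes the job can only see the GPUs PBS Pro assigned to it, so those local indices map onto exactly those GPUs. The helper name `reindex_visible_devices` is hypothetical.

```python
import os


def reindex_visible_devices(environ=None):
    """Rewrite CUDA_VISIBLE_DEVICES as local indices "0,1,...,n-1".

    Hypothetical helper. Assumption: the job can only see the GPUs that
    PBS Pro assigned to it, so local indices 0..n-1 refer to exactly those
    GPUs, whatever identifiers the scheduler originally wrote.
    """
    env = os.environ if environ is None else environ
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    # One comma-separated entry per assigned GPU (UUID-style IDs or indices).
    entries = [item for item in raw.split(",") if item.strip()]
    if entries:
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in range(len(entries)))
    return env.get("CUDA_VISIBLE_DEVICES", "")


# Must run before CUDA is initialized, i.e. before `import torch`.
reindex_visible_devices()
# import torch
# print(torch.cuda.is_available(), torch.cuda.device_count())
```

If `CUDA_VISIBLE_DEVICES` is unset, the helper leaves it untouched, so the same `main.py` still runs on a machine without PBS.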
First, we need to prepare the Python environment for our code. Log in to NSCC and type `module avail` to see the list of all available modules. Find Miniforge3 (a minimal conda installer that uses the conda-forge channel by default; it is used exactly like Anaconda). It should be something like `miniforge3/23.10`. Load it by typing

`module load miniforge3/23.10`

Now the `conda` command is available, and the rest works as usual: you can `conda create` your environment and then `conda install` or `pip install` your packages. This option is available on the new NSCC system, and with it you no longer need to bother with the messy Singularity container.
Next, edit your job definition. See `job.pbs` for details! The examples and comments there cover everything!
Then, upload your dataset, code, and `job.pbs` to NSCC. I always put `job.pbs` and `main.py` in the same directory for convenience. Also, following NSCC's guidelines, large files such as datasets should be stored in the `~/scratch` directory.
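Because the dataset lives in `~/scratch` while `main.py` sits elsewhere, `main.py` needs to know where to look. One minimal way, sketched under assumptions: read a root directory from a hypothetical `DATA_ROOT` environment variable (which could be set with `qsub -v`) and fall back to `~/scratch`. Both `resolve_data_dir` and `DATA_ROOT` are illustrative names, not part of NSCC's setup.

```python
import os
from pathlib import Path


def resolve_data_dir(dataset_name, environ=None):
    """Return <DATA_ROOT>/<dataset_name>, defaulting DATA_ROOT to ~/scratch.

    DATA_ROOT is a hypothetical variable; "~" is expanded to the user's home.
    """
    env = os.environ if environ is None else environ
    root = Path(env.get("DATA_ROOT", "~/scratch")).expanduser()
    return root / dataset_name
```

For example, `resolve_data_dir("my_dataset")` points at `~/scratch/my_dataset` unless `DATA_ROOT` overrides it, so the same code works whether or not the variable is passed.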
Finally, in the NSCC terminal, `cd` to the directory containing `main.py` and type

`qsub job.pbs`

to submit your job. If your `main.py` needs arguments and you have already edited `job.pbs` accordingly (see `job.pbs` for an example), simply pass them with the `-v` flag, separated by commas:

`qsub -v bs=32,e=100 job.pbs`

To pass a list, quote its space-separated values:

`qsub -v bs=32,e=100,modal="visual audio" job.pbs`
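On the receiving side, `qsub -v` places `bs`, `e`, and `modal` in the job's environment, and `job.pbs` then hands them to `main.py`. Assuming `job.pbs` forwards them as `python main.py --bs $bs --e $e --modal $modal` (with `$modal` left unquoted so the shell splits `visual audio` into two words), a matching parser could look like this. The option names simply mirror the variables above; everything else is an assumption.

```python
import argparse


def build_parser():
    """Illustrative parser whose options mirror the qsub -v variables."""
    parser = argparse.ArgumentParser(description="Training entry point")
    parser.add_argument("--bs", type=int, default=32, help="batch size")
    parser.add_argument("--e", type=int, default=100, help="number of epochs")
    # nargs="+" collects one or more words, e.g. --modal visual audio
    parser.add_argument("--modal", nargs="+", default=["visual"],
                        help="one or more modalities")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    print(f"batch size={args.bs}, epochs={args.e}, modalities={args.modal}")
    return args


if __name__ == "__main__":
    main()
```

Since the variables are already in the job's environment, `main.py` could instead read `os.environ["bs"]` directly; passing them as command-line arguments keeps `main.py` runnable outside PBS as well.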
Useful commands for managing your jobs:

- `qstat`: shows the IDs and status of your submitted jobs, but not the variables you fed to them.
- `qstat -x -f`: shows a detailed summary of your recently submitted jobs (including finished ones), including any variables fed to each job.
- `qdel <jobid>`: kills a job.
- `qdel -W force <jobid>`: force-kills a job; use this when a normal `qdel` does not work.
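The variables you passed are buried in the long `Variable_List` attribute of the `qstat -x -f` output. A rough sketch that extracts them, assuming the usual PBS Pro text layout (a `Job Id:` line, indented `name = value` attributes, and wrapped values continued on lines that start with a tab) and treating every `PBS_*` entry as scheduler-provided. Check the layout against your own output; values that themselves contain commas are not handled. If your PBS Pro version supports `qstat -f -F json`, parsing that output would be more robust. All function names here are hypothetical.

```python
import subprocess


def parse_qstat_full(text):
    """Parse `qstat -f`-style text into {job_id: {attribute: value}}.

    Assumes attributes look like "    name = value" and that wrapped values
    continue on lines starting with a tab.
    """
    jobs, job_id, key = {}, None, None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            job_id = line.split(":", 1)[1].strip()
            jobs[job_id] = {}
            key = None
        elif job_id and key and line.startswith("\t"):
            # Continuation of a value that qstat wrapped onto the next line.
            jobs[job_id][key] += line.strip()
        elif job_id and " = " in line:
            key, value = line.strip().split(" = ", 1)
            jobs[job_id][key] = value
    return jobs


def user_variables(variable_list):
    """Keep only the non-PBS_* NAME=VALUE entries of a Variable_List value."""
    found = {}
    for item in variable_list.split(","):
        name, sep, value = item.partition("=")
        if sep and not name.startswith("PBS_"):
            found[name] = value
    return found


def fetch_qstat():
    """Run `qstat -x -f` and return its text output (needs PBS on PATH)."""
    return subprocess.run(["qstat", "-x", "-f"], capture_output=True,
                          text=True, check=True).stdout


def show_user_variables():
    """Print each job ID with the variables that were passed to it."""
    for job_id, attrs in parse_qstat_full(fetch_qstat()).items():
        print(job_id, user_variables(attrs.get("Variable_List", "")))
```

Calling `show_user_variables()` on the login node would print one line per job, e.g. the job ID followed by something like `{'bs': '32', 'e': '100'}`.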
:)