Hello,
It seems that jbsub is a custom scheduler that we don't have access to. On my cluster, we use srun.
So I tried to replace the first line in train_propositional.sh that calls jbsub (l.47) with srun. Here is my file so far:
#!/bin/bash

set -e
trap exit SIGINT
ulimit -v 16000000000
export PYTHONUNBUFFERED=1

# sokoban problem 2 has the same small screen size as problem 0, and has more than 20000 states unlike problem 0.
# ('sokoban_image-20000-global-global-0-train.npz', array([56, 56, 3]), (3613, 1, 9408)) --- probelm 0 has only 3613 states!
# ('sokoban_image-20000-global-global-2-train.npz', array([56, 56, 3]), (19999, 1, 9408))
export skb_train=sokoban_image-20000-global-global-2-train
export SHELL=/bin/bash
export common

task (){
    script=$1 ; shift
    mode=$1
    # main training experiments. results are used for planning experiments
    $common $script $mode hanoi 4 4 {} $comment ::: 5000 ::: CubeSpaceAE_AMA{3,4}Conv
    $common $script $mode hanoi 3 9 {} $comment ::: 5000 ::: CubeSpaceAE_AMA{3,4}Conv
    $common $script $mode hanoi 4 9 {} $comment ::: 5000 ::: CubeSpaceAE_AMA{3,4}Conv
    $common $script $mode hanoi 5 9 {} $comment ::: 5000 ::: CubeSpaceAE_AMA{3,4}Conv
    $common $script $mode puzzle mnist 3 3 {} $comment ::: 5000 ::: CubeSpaceAE_AMA{3,4}Conv
    $common $script $mode lightsout digital 5 {} $comment ::: 5000 ::: CubeSpaceAE_AMA{3,4}Conv
    $common $script $mode lightsout twisted 5 {} $comment ::: 5000 ::: CubeSpaceAE_AMA{3,4}Conv
    $common -queue x86_12h $script $mode puzzle mandrill 4 4 {} $comment ::: 20000 ::: CubeSpaceAE_AMA3Conv
    $common -queue x86_24h $script $mode puzzle mandrill 4 4 {} $comment ::: 20000 ::: CubeSpaceAE_AMA4Conv
    $common -queue x86_6h $script $mode sokoban $skb_train {} $comment ::: 20000 ::: CubeSpaceAE_AMA3Conv
    $common -queue x86_12h $script $mode sokoban $skb_train {} $comment ::: 20000 ::: CubeSpaceAE_AMA4Conv
    $common -queue x86_12h $script $mode blocks cylinders-4-flat {} $comment ::: 20000 ::: CubeSpaceAE_AMA3Conv
    $common -queue x86_24h $script $mode blocks cylinders-4-flat {} $comment ::: 20000 ::: CubeSpaceAE_AMA4Conv
}
export -f task

proj=$(date +%Y%m%d%H%M)sae-planning
number=2

################################################################
## Train the network, and run plot, summary, dump for as the job finishes

#common="parallel -j 1 --keep-order jbsub -mem 16g -cores 1+1 -queue x86_6h -proj $proj -require 'v100||a100'"
common="parallel -j 1 --keep-order srun -N 1 -p g100_usr_interactive --gres=gpu:1 -proj $proj -require 'v100||a100'"
export comment=kltune$number
parallel -j 1 --keep-order task ./train_kltune.py learn_summary_plot_dump ::: {1..30}
exit
This produces the following error:
srun: fatal: Can not execute 202205230755sae-planning
I have a hard time understanding what the "202205230755sae-planning" executable corresponds to, as well as what the "-proj" argument of jbsub does.
Best regards
Aymeric
-proj is just a tag to assign to jobs. In my experience, both LSF and Torque had this feature; surely Slurm has one too.
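For anyone else porting this script: the error is consistent with srun reading `-proj` as its short option `-p` (partition) with the argument `roj`, which would leave the expanded value of `$proj` as the first positional argument, i.e. the command to run. That reading is an inference from the error message, not something confirmed in this thread. Below is a minimal sketch of a Slurm version of the `common` line, assuming that `--job-name` is an acceptable substitute for jbsub's `-proj` tag. jbsub's `-require 'v100||a100'` is simply dropped here; whether a GPU type can be requested instead (for example through `--constraint`) depends on the features your cluster defines. The partition name and `--gres` value are copied from the script in the question.

```shell
#!/bin/bash
# Sketch only: build the srun prefix that would replace the jbsub one.
proj=$(date +%Y%m%d%H%M)sae-planning

# Assumption: --job-name (srun's job label) stands in for jbsub's -proj tag.
# jbsub's -require option is dropped; srun has no option of that name.
common="parallel -j 1 --keep-order srun -N 1 -p g100_usr_interactive --gres=gpu:1 --job-name $proj"

# Print the prefix so it can be inspected before launching any jobs.
echo "$common"
```

Note that the `-queue x86_…` arguments inside `task` are also jbsub options. They are appended right after `$common`, so with this prefix they would be handed to srun and would need the same kind of replacement or removal.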