grid #317

Closed
sunnycqcn opened this issue Dec 26, 2016 · 11 comments

sunnycqcn commented Dec 26, 2016

I submitted the following job script:

#PBS -l walltime=336:00:00
#PBS -q gcore
#PBS -l naccesspolicy=shared,nodes=1:ppn=20
cd $PBS_O_WORKDIR
module purge
module load bioinfo
module load canu
/home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/canu \
 -p asm -d strigaA \
 genomeSize=1638.1m \
 errorRate=0.035 \
  -pacbio-raw /scratch/snyder/f/fu115/Genome_assembly/fastq/seq/filtered_subreads.fastq

Then I get the error below. Could you help me check what is wrong?
Thanks,
Fuyou

/var/spool/torque/mom_priv/jobs/469384.snyder-adm.rcac.purdue.edu.SC: line 4: fg: no job control
make: *** No targets specified and no makefile found.  Stop.
/var/spool/torque/mom_priv/jobs/469384.snyder-adm.rcac.purdue.edu.SC: line 6: make-dedicated: command not found
/var/spool/torque/mom_priv/jobs/469384.snyder-adm.rcac.purdue.edu.SC: line 7: thread: command not found
-- Detected Java(TM) Runtime Environment '1.8.0_111' (from '/group/bioinfo/apps/apps/jdk1.8.0_111/bin/java').
-- Detected gnuplot version '4.6 patchlevel 6' (from 'gnuplot') and image format 'png'.
-- Detected 20 CPUs and 505 gigabytes of memory.
-- Detected PBS/Torque '5.0.1' with 'pbsnodes' binary in /usr/pbs/bin/pbsnodes.
-- Detecting PBS/Torque resources.
-- 
-- Found  50 hosts with  20 cores and  252 GB memory under PBS/Torque control.
-- Found  16 hosts with  20 cores and  504 GB memory under PBS/Torque control.
--
-- Allowed to run under grid control, and use up to   4 compute threads and   16 GB memory for stage 'bogart (unitigger)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to   4 compute threads and    2 GB memory for stage 'read error detection (overlap error adjustment)'.
-- Allowed to run under grid control, and use up to   1 compute thread  and    1 GB memory for stage 'overlap error adjustment'.
-- Allowed to run under grid control, and use up to   4 compute threads and   32 GB memory for stage 'utgcns (consensus)'.
-- Allowed to run under grid control, and use up to   1 compute thread  and    4 GB memory for stage 'overlap store parallel bucketizer'.
-- Allowed to run under grid control, and use up to   1 compute thread  and    8 GB memory for stage 'overlap store parallel sorting'.
-- Allowed to run under grid control, and use up to   1 compute thread  and    6 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to   5 compute threads and    8 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to   5 compute threads and    8 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to   4 compute threads and    8 GB memory for stage 'meryl (k-mer counting)'.
-- Allowed to run under grid control, and use up to   2 compute threads and   16 GB memory for stage 'falcon_sense (read correction)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'minimap (overlapper)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'minimap (overlapper)'.
-- Allowed to run under grid control, and use up to  10 compute threads and    6 GB memory for stage 'minimap (overlapper)'.
--
-- This is canu parallel iteration #1, out of a maximum of 2 attempts.
--
-- Final error rates before starting pipeline:
--   
--   genomeSize          -- 4800000
--   errorRate           -- 0.015
--   
--   corOvlErrorRate     -- 0.045
--   obtOvlErrorRate     -- 0.045
--   utgOvlErrorRate     -- 0.045
--   
--   obtErrorRate        -- 0.045
--   
--   cnsErrorRate        -- 0.045
--
--
-- BEGIN CORRECTION
--
----------------------------------------
-- Starting command on Mon Dec 26 13:08:30 2016 with 739766.226 GB free disk space

    /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/gatekeeperCreate \
      -minlength 1000 \
      -o /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/ecoli.gkpStore.BUILDING \
      /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/ecoli.gkpStore.gkp \
    > /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/ecoli.gkpStore.BUILDING.err 2>&1

-- Finished on Mon Dec 26 13:08:31 2016 (1 second) with 739766.22 GB free disk space
----------------------------------------
--
-- In gatekeeper store '/scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/ecoli.gkpStore':
--   Found 12528 reads.
--   Found 115899341 bases (24.14 times coverage).
--
--   Read length histogram (one '*' equals 20.62 reads):
--        0    999      0 
--     1000   1999   1444 **********************************************************************
--     2000   2999   1328 ****************************************************************
--     3000   3999   1065 ***************************************************
--     4000   4999    774 *************************************
--     5000   5999    668 ********************************
--     6000   6999    619 ******************************
--     7000   7999    618 *****************************
--     8000   8999    607 *****************************
--     9000   9999    560 ***************************
--    10000  10999    523 *************************
--    11000  11999    478 ***********************
--    12000  12999    429 ********************
--    13000  13999    379 ******************
--    14000  14999    366 *****************
--    15000  15999    353 *****************
--    16000  16999    329 ***************
--    17000  17999    297 **************
--    18000  18999    294 **************
--    19000  19999    283 *************
--    20000  20999    251 ************
--    21000  21999    195 *********
--    22000  22999    152 *******
--    23000  23999    132 ******
--    24000  24999     75 ***
--    25000  25999     66 ***
--    26000  26999     56 **
--    27000  27999     44 **
--    28000  28999     35 *
--    29000  29999     16 
--    30000  30999     21 *
--    31000  31999     18 
--    32000  32999     11 
--    33000  33999      8 
--    34000  34999      6 
--    35000  35999      6 
--    36000  36999     10 
--    37000  37999      2 
--    38000  38999      3 
--    39000  39999      2 
--    40000  40999      2 
--    41000  41999      2 
--    42000  42999      1 
-- Meryl attempt 1 begins.
----------------------------------------
-- Starting command on Mon Dec 26 13:08:32 2016 with 739766.22 GB free disk space

      qsub \
        -l mem=8g -l nodes=1:ppn=4 \
        -d `pwd` -N "meryl_ecoli" \
        -t 1-1 \
        -j oe -o /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.\$PBS_ARRAYID.out \
        /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.sh 
    
Array jobs are currently not supported.
See https://www.rcac.purdue.edu/news/detail.cfm?NewsID=616 for information
on converting your array job to a supported workflow.

qsub: Your job has been administratively rejected by the queueing system.
qsub: There may be a more detailed explanation prior to this notice.

-- Finished on Mon Dec 26 13:08:32 2016 (lickety-split) with 739766.22 GB free disk space
----------------------------------------
ERROR:
ERROR:  Failed with exit code 1.  (rc=256)
ERROR:
================================================================================
Please panic.  canu failed, and it shouldn't have.

Stack trace:

 at /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/lib/canu/Execution.pm line 1305.
	canu::Execution::caFailure("Failed to submit batch jobs", undef) called at /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/lib/canu/Execution.pm line 1010
	canu::Execution::submitOrRunParallelJob("/scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli"..., "ecoli", "meryl", "/scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli"..., "meryl", 1) called at /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/lib/canu/Meryl.pm line 373
	canu::Meryl::merylCheck("/scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli"..., "ecoli", "cor") called at /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/canu line 470


canu failed with 'Failed to submit batch jobs'.
skoren (Member) commented Dec 26, 2016

Your grid does not support array jobs, which Canu requires to run. Array jobs are normally a standard feature of grid systems; yours is the first we've seen where they are administratively disabled.

I would ask the admins whether they can enable array jobs for you, as it would be non-trivial for you to modify Canu to run without array support. Otherwise, you would have to run Canu with useGrid=remote. Every time Canu reaches a submit command like the following:

      qsub \
        -l mem=8g -l nodes=1:ppn=4 \
        -d `pwd` -N "meryl_ecoli" \
        -t 1-1 \
        -j oe -o /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.\$PBS_ARRAYID.out \
        /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.sh 

it will stop, and you will need to manually edit and submit the command as directed on your grid's support page:
https://www.rcac.purdue.edu/news/detail.cfm?NewsID=616
Then, once that job is done, re-run your Canu command; it will pick up at the next step and stop again when it reaches the next array job to submit. Otherwise, you can submit the Canu command to a single node and run it with useGrid=false, which means it will run on only a single machine; that is OK for smaller genomes (<500 Mbp).
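
For illustration, a minimal sketch of what such a manual edit could look like for the meryl step in your log, assuming meryl.sh picks up its task index from $PBS_ARRAYID (the exact conversion recommended on the RCAC page may differ):

      # Hypothetical conversion of the rejected array submission (-t 1-1) into a
      # plain job.  This array has only one task, so the index is exported by hand;
      # a larger array would need one such submission (or a loop) per index.
      qsub \
        -l mem=8g -l nodes=1:ppn=4 \
        -d `pwd` -N "meryl_ecoli_1" \
        -v PBS_ARRAYID=1 \
        -j oe -o /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.1.out \
        /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts/meryl.sh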

skoren added the wontfix label Dec 26, 2016
sunnycqcn (Author) commented Dec 26, 2016 via email

skoren (Member) commented Dec 26, 2016

No. Without array jobs, or without following my suggestion above, you can only run on a single node and have to use useGrid=false.

sunnycqcn (Author) commented Dec 26, 2016 via email

sunnycqcn (Author) commented Dec 26, 2016 via email

skoren (Member) commented Dec 26, 2016

What is in your canuS.sh script? It looks like you're setting both a grid engine and useGrid=false, and the machine you are running on is not reporting the grid configuration. Set both useGrid=0 gridEngine=undefined to make sure Canu won't poll your grid.

sunnycqcn (Author) commented Dec 26, 2016 via email

skoren (Member) commented Dec 26, 2016

You are still setting useGrid=true gridEngine="pbs".

You want to submit the above script to your grid and let Canu run only on the single scheduled node, so you want useGrid=false gridEngine=undefined, as I said above.
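
For concreteness, here is a hypothetical single-node version of the job script from the top of this issue, with the two options added (paths, queue, and modules are yours; this is a sketch, not a tested configuration):

    #PBS -l walltime=336:00:00
    #PBS -q gcore
    #PBS -l naccesspolicy=shared,nodes=1:ppn=20
    cd $PBS_O_WORKDIR
    module purge
    module load bioinfo
    module load canu
    # run Canu entirely on the one node this job is scheduled on
    /home/fu115/DIRECTORY/canu/canu-1.4/Linux-amd64/bin/canu \
      -p asm -d strigaA \
      genomeSize=1638.1m \
      errorRate=0.035 \
      useGrid=false gridEngine=undefined \
      -pacbio-raw /scratch/snyder/f/fu115/Genome_assembly/fastq/seq/filtered_subreads.fastq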

sunnycqcn (Author) commented Dec 26, 2016 via email

sunnycqcn (Author) commented Dec 26, 2016 via email

skoren (Member) commented Dec 26, 2016

You can't simply continue the run: running meryl.jobSubmit.sh will just fail to submit the job again, since it relies on arrays. You can run meryl.sh by hand, which will take a while; after it finishes, Canu will continue to the next step, but you have to wait for it to complete before resuming. It would be easiest to start from scratch off the grid.
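
If you do run it by hand, a rough sketch (assuming your version of the generated meryl.sh accepts the task index as its first argument; check the header of the script to confirm how it reads the index):

    # Hypothetical manual run of the single meryl task, from the assembly directory.
    cd /scratch/snyder/f/fu115/Genome_assembly/PBonly/canutest/ecoli-autoa/correction/0-mercounts
    sh meryl.sh 1 > meryl.1.out 2>&1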

skoren closed this as completed Jan 5, 2017