Gatekeeper write error #1015

Closed
stubrown opened this issue Jul 19, 2018 · 3 comments

stubrown commented Jul 19, 2018

I am trying to install and run Canu on a new Linux cluster running Red Hat Enterprise Linux Server release 7.4.

My problem seems to be a gatekeeper write error. My sysadmin assures me that the current /scratch volume has no disk quota and 500 TB of free space.

safeWrite()-- Write failure on writeBuffer: Disk quota exceeded
safeWrite()-- Wanted to write 1048236 objects (size=1), wrote 5000.

My OS details:

[browns02@bigpurple-ln3 Skate]$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 7.4 (Maipo)
[browns02@bigpurple-ln3 Skate]$  uname -a
Linux bigpurple-ln3 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
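
To double-check the quota claim, the free space and any per-user limit on the scratch mount can be inspected directly. A minimal sketch (assuming the scratch volume is the GPFS mount at /gpfs/scratch seen in the logs below; the mmlsquota install path varies by site):

    df -h /gpfs/scratch                    # free space on the scratch mount
    quota -s                               # standard per-user quotas, if any are set
    /usr/lpp/mmfs/bin/mmlsquota            # GPFS quota report, if this is a GPFS/Spectrum Scale mount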

My canu command is fairly simple; I am trying to assemble some error-corrected PacBio data that did not assemble well with the PacBio HGAP program.

canu-1.7/Linux-amd64/bin/canu -p skateBP2 -d SkateCanu2  genomeSize=2.5g correctedErrorRate=0.065 corMhapSensitivity=normal gnuplotTested=true -pacbio-corrected skate_PR_corr.fasta

It starts to run, but fails almost immediately (lickety-split). Here is the error part of the output file:

ERROR:  Failed with exit code 134.  (rc=34304)
ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT:   gatekeeper failed.
ABORT:
ABORT: Disk space available:  3816996.203 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (./skateCan.gkpStore.BUILDING.err):

Here is the skateCan.gkpStore.BUILDING.err file:

[browns02@bigpurple-ln3 Skate]$ more ./skateCan.gkpStore.BUILDING.err

Starting file './skateCan.gkpStore.gkp'.

  Loading reads from '/gpfs/home/browns02/skate_PR_corr.fasta'
safeWrite()-- Write failure on writeBuffer: Disk quota exceeded
safeWrite()-- Wanted to write 1048236 objects (size=1), wrote 5000.
gatekeeperCreate: AS_UTL/AS_UTL_fileIO.C:107: void AS_UTL_safeWrite(FILE*, const void*, const char*, size_t, size_t): Assertion `(*__errno_location ()) == 0' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
AS_UTL/AS_UTL_fileIO.C::107 in _Z16AS_UTL_safeWriteP8_IO_FILEPKvPKcmm()
AS_UTL/writeBuffer.H::82 in _ZN11writeBuffer5flushEv()
AS_UTL/writeBuffer.H::66 in _ZN11writeBuffer5writeEPvm()
stores/gkStore.C::150 in _ZN7gkStore21gkStore_stashReadDataEP10gkReadData()
stores/gatekeeperCreate.C::459 in _Z9loadReadsP7gkStoreP9gkLibraryjjP8_IO_FILES4_S4_PcRjS6_RmS6_S7_()
stores/gatekeeperCreate.C::711 in main()
(null)::0 in (null)()
(null)::0 in (null)()

Here is what it looks like as it starts up:

-- Canu 1.7
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_131' (from 'java').
-- Detected 40 CPUs and 377 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /cm/shared/apps/slurm/17.11.7/bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 1000 jobs.
--
-- Found   7 hosts with  40 cores and  754 GB memory under Slurm control.
-- Found   1 host  with  40 cores and 1510 GB memory under Slurm control.
-- Found   1 host  with  40 cores and  377 GB memory under Slurm control.
-- Found  53 hosts with  40 cores and  376 GB memory under Slurm control.
-- Found  25 hosts with  40 cores and  376 GB memory under Slurm control.
-- Found   3 hosts with  40 cores and 1510 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl    256 GB   32 CPUs  (k-mer counting)
-- Grid:  cormhap   48 GB   10 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    24 GB   10 CPUs  (overlap detection)
-- Grid:  utgovl    24 GB   10 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       32 GB    1 CPU   (overlap store sorting)
-- Grid:  red       16 GB    4 CPUs  (read error detection)
-- Grid:  oea        8 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      512 GB   32 CPUs  (contig construction)
-- Grid:  gfa       32 GB   32 CPUs  (GFA alignment and processing)
--
-- Found PacBio corrected reads in the input files.
--
-- Generating assembly 'skateBP2' in '/gpfs/scratch/browns02/Skate/SkateCanu2'
--
-- Parameters:
--  genomeSize        2500000000
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0650 (  6.50%)
--    utgOvlErrorRate 0.0650 (  6.50%)
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0650 (  6.50%)
--    utgErrorRate    0.0650 (  6.50%)
--    cnsErrorRate    0.0650 (  6.50%)
----------------------------------------
-- Starting command on Thu Jul 19 15:25:50 2018 with 3449607.179 GB free disk space
    cd /gpfs/scratch/browns02/Skate/SkateCanu2
    sbatch \
      --mem-per-cpu=4g \
      --cpus-per-task=1   \
      -D `pwd` \
      -J 'canu_skateBP2' \
      -o canu-scripts/canu.01.out canu-scripts/canu.01.sh
Submitted batch job 1049
-- Finished on Thu Jul 19 15:25:50 2018 (lickety-split) with 3449607.179 GB free disk space

skoren commented Jul 19, 2018

Nothing special in that code; it just calls fwrite and checks errno. The "Disk quota exceeded" message is the error returned by fwrite, so writes to that disk really are failing. The amount it managed to write (5000) looks suspiciously like some kind of system limit.

One thing I notice is that the job is being submitted to the grid via Slurm. Is this folder mounted and writable from all cluster nodes? If you don't want it to run on the cluster, add useGrid=false and everything will stay on the current node. I'd also suggest trying the sample data on your system, and on another mount point, to see whether it runs correctly.
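
For example, a purely local run pointed at a different mount point might look like this (a sketch only; the output directory under /gpfs/home is an illustration, not a specific recommendation):

    canu-1.7/Linux-amd64/bin/canu -p skateBP2 -d /gpfs/home/browns02/SkateCanu2-local \
      genomeSize=2.5g correctedErrorRate=0.065 corMhapSensitivity=normal \
      gnuplotTested=true useGrid=false \
      -pacbio-corrected skate_PR_corr.fasta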

As a side note, I would not use HGAP-corrected reads; I would run Canu from scratch on the raw reads. Based on our recent experience with a similar genome, those corrected reads are likely part of the reason your skate assembly didn't work. Here are the parameters we used: 'corMhapMerSize=18' 'correctedErrorRate=0.105' 'corMinCoverage=0' 'ovlMerThreshold=500'
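
Put together, a rerun from the raw (uncorrected) PacBio reads might look something like this (a sketch; skate_raw_subreads.fasta is a placeholder, since the raw-read file isn't named in this thread):

    canu-1.7/Linux-amd64/bin/canu -p skateBP2 -d SkateCanu3 \
      genomeSize=2.5g \
      corMhapMerSize=18 correctedErrorRate=0.105 corMinCoverage=0 ovlMerThreshold=500 \
      gnuplotTested=true \
      -pacbio-raw skate_raw_subreads.fasta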


skoren commented Aug 2, 2018

Any update on this?

brianwalenz commented

Nope, no update.
