fasterq-dump err #214

Closed
drZhan opened this issue Aug 10, 2019 · 16 comments

@drZhan

drZhan commented Aug 10, 2019

Hi,
I have a problem with fasterq-dump on this file. It works properly on other SRA files.
Here is the error output:

$ fasterq-dump -p -3 SRR316212.sra -O fastq
join :|------------------------- 49.56%
2019-08-10T14:55:09 fasterq-dump.2.9.6 err: cmn_iter.c cmn_read_String( #92778497 ).VCursorCellDataDirect() -> RC(rcVDB,rcFunction,rcExecuting,rcConstraint,rcViolated)
--------------------- 91.55%

@groverj3

I just had this error today as well.

@aboshkin
Contributor

aboshkin commented Sep 3, 2019

@groverj3, was it on SRR316212 or a different accession? Did you prefetch the file or use it directly? Do you get the same error every time you run it? If the file is on your disk, try running vdb-validate on it.
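
For anyone following along, the check suggested above might look like the sketch below. The function name `check_local_run`, the `VALIDATE_CMD` override, and the example path are hypothetical; `vdb-validate` is the tool named in the comment, and it is assumed here (not confirmed in this thread) to exit non-zero when it finds a problem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a run that is already on disk.
# ASSUMPTION: vdb-validate is on PATH and exits non-zero on a bad file.
# VALIDATE_CMD is an illustrative override, not a toolkit setting.
check_local_run() {
  local sra_file="$1"
  if [ ! -e "$sra_file" ]; then
    echo "not found on disk: $sra_file" >&2
    return 2
  fi
  if "${VALIDATE_CMD:-vdb-validate}" "$sra_file"; then
    echo "OK: $sra_file"
  else
    echo "FAILED validation: $sra_file" >&2
    return 1
  fi
}

# Example (path is illustrative):
#   check_local_run SRR316212.sra
```

A return code of 2 separates "the file was never downloaded" from "the file is there but did not validate", which are the two cases the questions above try to tell apart.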

@groverj3

groverj3 commented Sep 3, 2019

Actually, at least two different accessions. I was getting all the runs from a couple of different BioProjects and got errors with both fastq-dump and fasterq-dump.

There was also an error with prefetch, but obviously a different one. I, unfortunately, don't have the exact SRR# or those other error messages handy at the moment. But if I get a spare moment I can do some digging.

I was able to get them directly as .fastq.gz files from the ENA instead, which was super slow.

@groverj3

groverj3 commented Sep 6, 2019

Error happened again with SRR6294776. I let it sit there until it exited, and no file was downloaded. I tried it again, by itself (other files downloaded without issue) and the exact error was:

2019-09-06T21:48:09 fasterq-dump.2.10.0 err: cmn_iter.c cmn_read_uint8_array( #23265281 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)
2019-09-06T21:48:09 fasterq-dump.2.10.0 err: row #23265281 : READ.len(40) != QUALITY.len(0) (A)
2019-09-06T21:48:25 fasterq-dump.2.10.0 err: cmn_iter.c cmn_read_String( #28508161 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)
2019-09-06T21:48:27 fasterq-dump.2.10.0 err: cmn_iter.c cmn_read_String( #786433 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)
2019-09-06T21:48:31 fasterq-dump.2.10.0 err: cmn_iter.c cmn_read_String( #17104897 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)
2019-09-06T21:49:41 fasterq-dump.2.10.0 err: cmn_iter.c cmn_read_String( #6488065 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)
2019-09-06T21:49:51 fasterq-dump.2.10.0 err: cmn_iter.c cmn_read_uint8_array( #23265282 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)
2019-09-06T21:49:51 fasterq-dump.2.10.0 err: row #23265282 : READ.len(40) != QUALITY.len(0) (A)
2019-09-06T21:49:56 fasterq-dump.2.10.0 err: cmn_iter.c cmn_read_String( #11534337 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)

Before I killed it.

Trying prefetch now.

@groverj3

groverj3 commented Sep 6, 2019

prefetch also fails with:

2019-09-06T21:51:03 prefetch.2.10.0: 1) Downloading 'SRR6294776'...
2019-09-06T21:51:03 prefetch.2.10.0:  Downloading via https...
2019-09-06T22:04:16 prefetch.2.10.0 int: transfer incomplete while reading file within network system module - Cannot KStreamRead: https://sra-download.ncbi.nlm.nih.gov/sos/sra-pub-run-1/SRR6294776/SRR6294776.1
2019-09-06T22:04:16 prefetch.2.10.0:  https download failed
2019-09-06T22:04:16 prefetch.2.10.0: 1) failed to download SRR6294776

@groverj3

groverj3 commented Sep 6, 2019

A few more potentially helpful details:

Navigating to https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR6294776 results in waiting a very long time, but it does eventually come up. Once there, downloading using wget with the https link https://sra-download.ncbi.nlm.nih.gov/sos/sra-pub-run-1/SRR6294776/SRR6294776.1 takes a long time to respond and is very slow, but it does eventually start.

Meanwhile, navigating to https://www.ebi.ac.uk/ena/data/view/SRR6294776 works fine. Also, downloading from ENA's ftp link to the SRA file is fast, though their direct .fastq.gz file link is slow.

fasterq-dump works just fine on the manually downloaded SRA file.
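
The manual route described above could be scripted roughly as follows. This is only a sketch: the function name `manual_fetch_and_dump`, the `WGET` and `FASTERQ_DUMP` overrides, and the directory arguments are hypothetical. It assumes the last path component of the URL is a usable local file name; whether fasterq-dump accepts a file under that exact name is not confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download a run file with wget, then convert it locally.
# WGET and FASTERQ_DUMP are illustrative overrides, not toolkit settings.
manual_fetch_and_dump() {
  local url="$1" sra_dir="$2" fastq_dir="$3"
  local sra_file
  sra_file="$sra_dir/$(basename "$url")"
  mkdir -p "$sra_dir" "$fastq_dir"
  # -c resumes a partial file; -P sets the download directory.
  "${WGET:-wget}" -c -P "$sra_dir" "$url" || return 1
  # Convert the local copy so the dump does not read over the network.
  "${FASTERQ_DUMP:-fasterq-dump}" "$sra_file" -O "$fastq_dir"
}

# Example, using the link quoted above:
#   manual_fetch_and_dump \
#     https://sra-download.ncbi.nlm.nih.gov/sos/sra-pub-run-1/SRR6294776/SRR6294776.1 \
#     sra fastq
```

Because wget is called with `-c`, re-running the function after an interrupted transfer picks the download up again instead of starting over.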

@osris

osris commented Sep 6, 2019 via email

@FatihSarigol

Hello,
Here is a similar error I ran into, even though the exact same command worked on two other files without a problem.

fasterq-dump SRR8371669 --split-3 -e 6 -O . -t TEMP

Version fasterq-dump-orig.2.10.0

Error message after downloading 12 files that add up to 48GB in the TEMP directory:

2020-02-05T18:36:14 fasterq-dump.2.10.0 err: cmn_iter.c cmn_read_uint8_array( #127244309 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)
2020-02-05T18:36:14 fasterq-dump.2.10.0 err: row #127244309 : READ.len(300) != QUALITY.len(0) (F)
2020-02-05T18:36:14 fasterq-dump.2.10.0 fatal: SIGNAL - Segmentation fault

@colin986

colin986 commented Mar 9, 2020

Hi,

Same error for me for

sratoolkit.2.10.4-ubuntu64/bin/fasterq-dump SRR10572657 -e 32 -o data/sra -t data/tmp
2020-03-09T22:11:44 fasterq-dump.2.10.4 err: cmn_iter.c cmn_read_String( #48847424 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)
---- 34.08%
2020-03-09T22:12:59 fasterq-dump.2.10.4 err: cmn_iter.c cmn_read_uint8_array( #18261357 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)
2020-03-09T22:12:59 fasterq-dump.2.10.4 err: row #18261357 : READ.len(300) != QUALITY.len(0) (F)
2020-03-09T22:12:59 fasterq-dump.2.10.4 fatal: SIGNAL - Segmentation fault

@FatihSarigol

My problem only occurred on very large files, and here is how I solved it:

Instead of using fastq-dump or fasterq-dump to download a sample directly, which writes paired fastq files as it goes, I first downloaded the sample as a single file in the non-readable .sra format. That file takes less space, so the download is faster and can be resumed if it stops. I then converted that file to a pair of fastq files. Here is an example:

module load sratoolkit
prefetch --max-size 100000000 SRR8371669

The above creates a folder containing an .sra file. I then extract the fastq files from it:

fastq-dump SRR8371669/SRR8371669.sra --split-3

You need to define --max-size because normally it is set to 20GB for some reason, and the download wouldn't work without it.
Hope this helps.
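
As a rough aid for picking the value, a small helper can do the unit conversion. This rests on an assumption that is not stated in this thread: that a bare number passed to `--max-size` is read as kilobytes, so that 100000000 is on the order of 100 GB. The function name `gb_to_max_size_kb` is hypothetical, and `prefetch --help` for your version is the authority on units.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a size in GB to a KB value for
# `prefetch --max-size`.
# ASSUMPTION: a bare number is interpreted as KB, and 1 GB is treated as
# 1,000,000 KB here. Check `prefetch --help` before relying on this.
gb_to_max_size_kb() {
  local gb="$1"
  echo $((gb * 1000000))
}

# Example: gb_to_max_size_kb 100   prints 100000000
```

Under that assumption, `prefetch --max-size "$(gb_to_max_size_kb 100)" SRR8371669` passes the same value as the command above.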

@klymenko
Contributor

Please try our latest 2.10.6 toolkit.

@thomasjtaylor

thomasjtaylor commented Jun 18, 2020

I have a similar problem with sratoolkit-2.10.7 (CentOS Linux x64) for phs000260:
fasterq-dump --ngc $NGC --split-files --include-technical -O ./fastq SRR946491

Will try to prefetch the failing reads
(edit)
prefetch followed by fasterq-dump picked up the failing runs.
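
A pattern that several people in this thread ended up with is to try the direct dump first and fall back to prefetch plus a second dump for the runs that fail. A minimal sketch, assuming one accession per line in a hypothetical list file and that both tools exit non-zero on failure. The function name `dump_with_fallback` and the `PREFETCH` and `FASTERQ_DUMP` overrides are hypothetical, and access-specific options such as `--ngc` are left out and would need to be added.

```shell
#!/usr/bin/env bash
# Hypothetical helper: direct dump first, prefetch + dump for failures.
# PREFETCH and FASTERQ_DUMP are illustrative overrides, not toolkit settings.
dump_with_fallback() {
  local list_file="$1" outdir="$2" acc failed=0
  mkdir -p "$outdir"
  while IFS= read -r acc; do
    [ -n "$acc" ] || continue   # skip blank lines
    # </dev/null keeps the tools from consuming the accession list on stdin.
    if "${FASTERQ_DUMP:-fasterq-dump}" "$acc" -O "$outdir" </dev/null; then
      continue
    fi
    echo "direct dump failed, prefetching: $acc" >&2
    if "${PREFETCH:-prefetch}" "$acc" </dev/null &&
       "${FASTERQ_DUMP:-fasterq-dump}" "$acc" -O "$outdir" </dev/null; then
      continue
    fi
    echo "still failing: $acc" >&2
    failed=$((failed + 1))
  done < "$list_file"
  # Succeed only if every accession was dumped.
  [ "$failed" -eq 0 ]
}

# Example (file and directory names are illustrative):
#   dump_with_fallback accessions.txt ./fastq
```

Partial output left behind by the failed first attempt may need to be removed before the second dump, depending on how the tool treats existing files.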

@mtinti

mtinti commented Jul 16, 2020

Same issue here fasterq-dump.2.10.7

fasterq-dump --split-files SRR11848434 [fails]

2020-07-16T09:29:36 fasterq-dump.2.10.7 err: cmn_iter.c cmn_read_uint8_array( #4308993 ).VCursorCellDataDirect() -> RC(rcPS,rcCondition,rcWaiting,rcTimeout,rcExhausted)
2020-07-16T09:29:36 fasterq-dump.2.10.7 err: row #4308993 : READ.len(200) != QUALITY.len(0) (D)
2020-07-16T09:29:36 fasterq-dump.2.10.7 fatal: SIGNAL - Segmentation fault
fasterq-dump quit with error code 1

prefetch --max-size 100000000 SRR11848434 [works]
fastq-dump SRR11848434/SRR11848434.sra --split-files [works]

@kwrodarmer
Contributor

"because normally it is set to 20GB for some reason"

Most people don't know in advance the size of what they will be downloading. Many people have high bandwidth connections, but this varies by installation and geography. prefetch prioritizes downloads by size and it asks for user input for a go-ahead on very large runs.

@kwrodarmer
Contributor

"Same issue here fasterq-dump.2.10.7"

Sorry to give a similar answer, but we'd like to ask you to try once again with 2.10.8.

Most of the issues we've been chasing for some time have nothing to do with the SRA Toolkit itself, but are attempts to overcome issues with some backend networking infrastructure. 2.10.8 addresses an issue with the SRA Toolkit, where on some systems the stack sizes were too small for all runs (this is data dependent) and had been built with an inadequate stack guard. The latter problem led to cases where we could exhaust a thread's stack without detection, resulting in some data corruptions on output (e.g. READ.len(200) != QUALITY.len(0) (D)) and inevitably a segfault at some point.

NB - I have no proof that the error above was due to this issue, but it would be consistent with errors we've observed.
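
To see which of the messages discussed above appear in a failed run, the tool's stderr can be saved to a file and scanned. A minimal sketch: the function name `summarize_dump_log` and the log file name are hypothetical, and the patterns are taken only from the messages quoted in this thread. The counts show which messages occurred, not what caused them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the error signatures quoted in this thread
# in a saved stderr log.
summarize_dump_log() {
  local log_file="$1"
  # grep -c prints the number of matching lines (0 if there are none).
  echo "timeouts: $(grep -c 'rcTimeout,rcExhausted' "$log_file")"
  echo "read_quality_mismatches: $(grep -c 'READ\.len([0-9]*) != QUALITY\.len([0-9]*)' "$log_file")"
  echo "segfaults: $(grep -c 'SIGNAL - Segmentation fault' "$log_file")"
}

# Example (log file name is illustrative):
#   fasterq-dump --split-files SRR11848434 2> dump.log
#   summarize_dump_log dump.log
```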

@klymenko
Contributor

Please try our new 2.10.9 release.
We improved tolerance against network failures.

Then do the following:

First, run prefetch <accession>.
If prefetch fails, run the same command again; the download will continue.

Then run fasterq-dump <accession>.
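
The steps above can be sketched as a small retry loop. The function name `fetch_then_dump`, the `PREFETCH`, `FASTERQ_DUMP`, and `RETRY_DELAY` variables, the attempt limit, and the delay are hypothetical choices, not toolkit defaults. The sketch assumes prefetch exits non-zero when a download fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run prefetch until it succeeds, then convert.
# PREFETCH, FASTERQ_DUMP and RETRY_DELAY are illustrative overrides.
fetch_then_dump() {
  local acc="$1" max_attempts="${2:-5}" attempt=1
  # Re-running the same prefetch command continues an interrupted download.
  until "${PREFETCH:-prefetch}" "$acc"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "prefetch gave up after $attempt attempts: $acc" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-30}"
  done
  # Only reached once prefetch has succeeded.
  "${FASTERQ_DUMP:-fasterq-dump}" "$acc"
}

# Example: fetch_then_dump SRR6294776 5
```

Capping the attempts keeps a persistent server-side problem from looping forever, and the pause between attempts gives a flaky connection time to recover.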
