fastq-dump.2.8.2 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -76 ( NET - Reading information from the socket failed ) #139

Closed
Homap opened this issue Jul 17, 2018 · 9 comments

@Homap

Homap commented Jul 17, 2018

Hello,

I get the following error when using fastq-dump:
fastq-dump.2.8.2 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -76 ( NET - Reading information from the socket failed )

However, the files seem to be downloaded successfully and there is a small report of the number of reads:
Read 90716435 spots for SRR955798
Written 90716435 spots for SRR955798

I saw a similar error report on the GitHub page, but I am still concerned. Can I trust the downloaded files?

Thank you,
Homa

Homap closed this as completed Jul 17, 2018
@kwrodarmer
Contributor

Many people would be surprised that a command line tool like this can output error text and still complete successfully. We are aware of the confusion this creates, and are working on changing this interaction in our tools. But for today, it's what we have.

The issue is that the tool does in fact have network-related errors as it executes. When it encounters these errors, it reports them because the error may result in a failure to perform the task of converting the run to fastq. At the same time, the tool works hard to overcome these errors by trying again and again until it eventually gives up or succeeds. If it eventually gives up, then the important error(s) are the first ones that it encounters, and that's why we emit them when they occur.

Network errors are beyond the control of our software, and can be introduced by any of the dozens or hundreds of devices between your computer and our servers. The best we can do is to report them when detected and try again in the hope that the error is transient, as so many of them are.
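The report-then-retry behavior described above can be sketched in a few lines. This is only an illustration of the general pattern, not the tool's actual implementation: `TransientNetworkError`, `fetch_with_retries`, and the attempt limit are all hypothetical names and values chosen for the example.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical stand-in for a failed socket read or TLS handshake."""


def fetch_with_retries(fetch, max_attempts=5, delay_seconds=0.0, log=print):
    """Call fetch() until it succeeds or max_attempts is exhausted.

    Every failure is logged as soon as it happens, so error text can
    appear even when a later attempt succeeds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    first_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TransientNetworkError as err:
            log(f"attempt {attempt} failed: {err}")
            if first_error is None:
                first_error = err
            time.sleep(delay_seconds)
    # Giving up: re-raise the earliest error, which was already reported
    # at the moment it occurred.
    raise first_error
```

A caller that fails twice and then succeeds would see two logged errors and still get a valid result, which mirrors the situation in this issue.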

The Unix exit status tells you whether the tool succeeded: zero means success and non-zero means failure. The appearance of the final result tally also tells you that the tool completed successfully.
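For scripted runs, the exit status is easy to check programmatically. The sketch below is a minimal example, assuming the tool is launched through Python's `subprocess` module; `run_tool` is a hypothetical helper, and the demo uses a stand-in command rather than fastq-dump so that it runs anywhere.

```python
import subprocess
import sys


def run_tool(command):
    """Run a command and judge success by its exit status only."""
    completed = subprocess.run(command, capture_output=True, text=True)
    # Exit status 0 means success by Unix convention, regardless of
    # anything the command wrote to stderr along the way.
    return completed.returncode == 0, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in command (not fastq-dump): writes an error message to
    # stderr, then exits with status 0.
    demo = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('sys: connection failed'); sys.exit(0)",
    ]
    ok, out, err = run_tool(demo)
    print("succeeded:", ok, "| stderr:", err)
```

With the real tool, the call would look like `run_tool(["fastq-dump", "SRR955798"])`, and any error text in the third return value is informational when the first value is true.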

@Homap
Author

Homap commented Jul 17, 2018

Thank you for your reply! I see. This error appears in most runs, but they all report the number of reads at the end and appear to have completed successfully.

@kwrodarmer
Contributor

If you are converting a large number of runs, you may want to get the latest software and look at fasterq-dump. Depending upon your environment, it could be useful and it is less prone to the types of spurious errors you observe. That said, it can still run into errors, but its access patterns tend to reduce their likelihood.

@Homap
Author

Homap commented Jul 17, 2018

This can be really useful for me. From the manual, I understand that if I just give the SRR accession, it will download SRR_1.fastq and SRR_2.fastq without the need to specify "split" as with fastq-dump. Is that correct?

Thanks a lot again!

@Homap
Author

Homap commented Jul 17, 2018

Just one more question. How can I formally check if the download of the fastq files has been done correctly? Is there a way to check that?

Thank you again!

@kwrodarmer
Contributor

kwrodarmer commented Jul 17, 2018

There are very few ways of absolutely proving that a fastq file set represents the identical information contained in an SRA object. One common validation is a count of the number of reads, but this has to be done with care, since filters are traditionally applied on output and can cause reads to be dropped.
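A read count of this kind can be taken directly from a fastq file. The sketch below assumes the common layout of exactly four lines per record (header starting with `@`, sequence, separator starting with `+`, qualities) with no line wrapping; `count_fastq_records` is a hypothetical helper, and a file that uses a different layout would need a real parser.

```python
def count_fastq_records(lines):
    """Count records in an iterable of FASTQ lines (four lines per record).

    Raises ValueError if a header or separator line is malformed, or if
    the input ends in the middle of a record.
    """
    records = 0
    line_number = 0
    for line_number, line in enumerate(lines, start=1):
        position = (line_number - 1) % 4
        if position == 0:
            if not line.startswith("@"):
                raise ValueError(f"line {line_number}: expected '@' header")
            records += 1
        elif position == 2 and not line.startswith("+"):
            raise ValueError(f"line {line_number}: expected '+' separator")
    if line_number % 4 != 0:
        raise ValueError("input ends in the middle of a record")
    return records
```

An open file handle can be passed directly, for example `with open("SRR955798_1.fastq") as handle: count_fastq_records(handle)`, where the file name is only an example. As noted above, the result is only comparable to the spot count when no output filters dropped reads.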

With fastq-dump, the number of spots read should equal the total number in the object, which can be viewed with the tool vdb-dump. The total spots emitted plus the spots dropped due to filtering should equal the number of spots read. Base counts do not necessarily match, and are not easy to compare, because of the effects of clipping. If the tool is asked to perform no clipping and performs no filtering, then the base counts should match what is in the SRA object.
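The two spot-count relationships above amount to simple arithmetic. As a minimal sketch, with `check_spot_accounting` and its parameter names being hypothetical and the counts supplied by the caller:

```python
def check_spot_accounting(total_in_object, spots_read, spots_written, spots_dropped=0):
    """Return a list of problems found; an empty list means the counts agree."""
    problems = []
    # Rule 1: every spot in the SRA object should have been read.
    if spots_read != total_in_object:
        problems.append(
            f"read {spots_read} spots, but the object holds {total_in_object}"
        )
    # Rule 2: spots written plus spots dropped by filters should equal
    # the number of spots read.
    if spots_written + spots_dropped != spots_read:
        problems.append(
            f"written ({spots_written}) + dropped ({spots_dropped}) "
            f"does not equal read ({spots_read})"
        )
    return problems
```

For the run in this issue, `check_spot_accounting(90716435, 90716435, 90716435)` returns an empty list, since nothing was filtered out.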

To see a summary of what an SRA object contains, run the following command:

vdb-dump --info SRR955798

To see the number of spots, you can execute

vdb-dump --id_range SRR955798

which will show that it occupies 90,716,435 rows (spots). Alternatively, look for the line tagged SEQ in the output of vdb-dump --info, which gives the number of rows in the sequence table.
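To use the row count in a script, the output of `vdb-dump --id_range` has to be parsed. The sketch below assumes the output contains a field of the form `row-count = 90,716,435`; that exact format is an assumption and should be checked against the installed version. `parse_row_count` and `spots_in_object` are hypothetical helper names.

```python
import re
import subprocess

# Assumed output format: "... row-count = 90,716,435" (commas optional).
ROW_COUNT_PATTERN = re.compile(r"row-count\s*=\s*([\d,]+)")


def parse_row_count(id_range_output):
    """Extract the row count from vdb-dump --id_range output text."""
    match = ROW_COUNT_PATTERN.search(id_range_output)
    if match is None:
        raise ValueError("no row-count field found; the output format may differ")
    return int(match.group(1).replace(",", ""))


def spots_in_object(accession):
    """Run vdb-dump --id_range for an accession and return its row count."""
    completed = subprocess.run(
        ["vdb-dump", "--id_range", accession],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_row_count(completed.stdout)
```

Keeping the parsing separate from the subprocess call means the pattern can be adjusted and tested without the toolkit installed.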

@Homap
Author

Homap commented Jul 18, 2018

This is amazing, thank you so much!

So in my example, I have a report that:
Read 90716435 spots for SRR955798
Written 90716435 spots for SRR955798

and using vdb-dump --id_range, we see that there are 90,716,435 spots in the SRA object. This means the download has been done successfully, correct?

Thank you very much again for your great help!

@kwrodarmer
Contributor

Yes, exactly. The "Read N spots for SRRxxxxxx" line should always match the total number of spots reported by vdb-dump, and when the "Written N spots for SRRxxxxxx" line agrees with it, you have the proof you need.
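This comparison can be automated by parsing the two tally lines from the captured tool output. The sketch below is a minimal example that assumes the lines appear exactly as quoted in this thread; `parse_tally` and `download_verified` are hypothetical helpers, and the check covers only the unfiltered case, where the written count is expected to equal the read count.

```python
import re

# Matches lines such as "Read 90716435 spots for SRR955798".
TALLY_PATTERN = re.compile(r"^(Read|Written) (\d+) spots for (\S+)$", re.MULTILINE)


def parse_tally(log_text, accession):
    """Return {"Read": n, "Written": m} for the given accession, if present."""
    counts = {}
    for kind, number, name in TALLY_PATTERN.findall(log_text):
        if name == accession:
            counts[kind] = int(number)
    return counts


def download_verified(log_text, accession, total_spots):
    """True when both tally lines exist and equal the object's spot count."""
    counts = parse_tally(log_text, accession)
    return counts.get("Read") == total_spots and counts.get("Written") == total_spots
```

Here `total_spots` is the number reported by vdb-dump for the same accession; a missing tally line makes the check fail rather than pass silently.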

@Homap
Author

Homap commented Jul 18, 2018

Perfect! Thank you so so much for all your help!
