Add functionality to recognise and download SRA/ENA/GEO ids #68

drpatelh · 2020-04-22T14:06:58Z

nf-core/viralrecon pull request

Many thanks for contributing to nf-core/viralrecon!

Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).

PR checklist

This comment contains a description of changes (with reason)
If you've fixed a bug or added code that should be tested, add tests!
If necessary, also make a PR on the nf-core/viralrecon branch on the nf-core/test-datasets repo
Ensure the test suite passes (nextflow run . -profile test,docker).
Make sure your code lints (nf-core lint .).
Documentation in docs is updated
CHANGELOG.md is updated
README.md is updated

Learn more about contributing: CONTRIBUTING.md

heuermh · 2020-04-22T16:30:15Z

Might you be able to add a commit with this usage.md doc patch to this PR?

$ git diff .
diff --git a/docs/usage.md b/docs/usage.md
index 79580c3..324c1d4 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -235,15 +235,15 @@ AAGGTGTCTGCAATTCATAGCTCT

 ## SRA download

-## `--ignore_sra_errors`
+### `--ignore_sra_errors`

 Ignore validation errors when checking SRA identifiers that would otherwise cause the pipeline to fail (Default: false).

-## `--save_sra_fastq`
+### `--save_sra_fastq`

 Save FastQ files created from SRA identifiers in the results directory (Default: false).

-## `--skip_sra`
+### `--skip_sra`

 Skip steps involving the download and validation of FastQ files using SRA identifiers (Default: false).

drpatelh · 2020-04-22T16:35:33Z

Done 53b5e86
👍 Good spot.

JoseEspinosa

It looks very nice @drpatelh !
The only thing I would suggest: if we want to get rid of the parallel-fastq-dump process completely, we could eventually query the US DB (the NCBI) when the FTP link is not available at the EBI, as we did before ( https://github.com/JoseEspinosa/viralrecon/blob/460a29f962edc31d7818d63b26b6fad5b98d0cbc/bin/check_samplesheet.py#L34-L35 ). One of the fields returned by that query was download_path, and we could use this path to download the data with the SRA_FASTQ_FTP process. The only problem is how to retrieve the md5 checksums: the query does not return them, and I have not yet found a way to get them from the NCBI.
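
To illustrate the checksum concern, here is a minimal Python sketch of how a download step could verify a file when an md5 is known and flag it as unverified when it is not (as would be the case for a path obtained from the NCBI query). The function names `md5_of_file` and `verify_download` are hypothetical and are not taken from the pipeline.

```python
import hashlib
from pathlib import Path
from typing import Optional


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_md5: Optional[str]) -> str:
    """Hypothetical helper: classify a downloaded file.

    Returns "verified" or "mismatch" when a checksum is available,
    and "unverified" when no checksum was returned by the query.
    """
    if not expected_md5:
        return "unverified"
    if md5_of_file(path) == expected_md5.strip().lower():
        return "verified"
    return "mismatch"
```

With this shape, files fetched via a path that comes without a checksum would still download, but the missing verification stays visible instead of being silently skipped.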

drpatelh · 2020-04-23T09:12:00Z

Thanks @JoseEspinosa

Querying the NCBI for the fastq files if they don't exist in the ENA doesn't work for this sample 😏

I think it returns a path to the SRA file 🤔, so it would still have to be put through parallel-fastq-dump. I think it would be good to keep that process in the pipeline anyway, in case FTP ends up being too slow and we need another way to fetch the files.
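
As a rough sketch of why both processes could coexist, the routing decision might look like the following. This is an assumption for illustration only: the function name, the suffix list, and the returned labels are hypothetical and do not reflect how the pipeline actually decides.

```python
def choose_download_route(url: str) -> str:
    """Hypothetical routing: direct FTP for FastQ files, fastq-dump otherwise.

    A path that already points at a FastQ file can be fetched as-is;
    anything else (for example an SRA archive) still needs conversion.
    """
    fastq_suffixes = (".fastq.gz", ".fq.gz", ".fastq", ".fq")
    # Only inspect the final path component, ignoring case.
    name = url.rsplit("/", 1)[-1].lower()
    if name.endswith(fastq_suffixes):
        return "ftp"
    return "fastq_dump"
```

For example, a made-up URL ending in `SRR000001_1.fastq.gz` would be routed to `"ftp"`, while one ending in a bare run accession would be routed to `"fastq_dump"`.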

drpatelh added 5 commits April 22, 2020 15:02

Add ENA and GEO download capability

8495fba

Add aspera_openssh_file param

c0feea9

Add aspera_openssh_file param

cea2402

Add aspera_openssh_file param

d089b77

Strip out anything to do with aspera before I get arrested

efa910d

drpatelh requested a review from JoseEspinosa April 22, 2020 15:05

drpatelh mentioned this pull request Apr 22, 2020

Auto-detect multiple runs of the same sample on SRA #56

Closed

Fix header levels

53b5e86

drpatelh added 2 commits April 22, 2020 20:02

Merge branch 'dev' of https://github.com/nf-core/viralrecon into dev

5c84e49

Merge branch 'dev' of https://github.com/nf-core/viralrecon into dev

487df03

JoseEspinosa approved these changes Apr 23, 2020

JoseEspinosa merged commit 3f0d1e4 into nf-core:dev Apr 23, 2020
