Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to recognise and download SRA/ENA/GEO ids #68

Merged
merged 8 commits into from
Apr 23, 2020
Merged

Add functionality to recognise and download SRA/ENA/GEO ids #68

merged 8 commits into from
Apr 23, 2020

Conversation

drpatelh
Copy link
Member

nf-core/viralrecon pull request

Many thanks for contributing to nf-core/viralrecon!

Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/viralrecon branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Learn more about contributing: CONTRIBUTING.md

@heuermh
Copy link
Contributor

heuermh commented Apr 22, 2020

Might you be able to add a commit with this usage.md doc patch to this pr?

$ git diff .
diff --git a/docs/usage.md b/docs/usage.md
index 79580c3..324c1d4 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -235,15 +235,15 @@ AAGGTGTCTGCAATTCATAGCTCT

 ## SRA download

-## `--ignore_sra_errors`
+### `--ignore_sra_errors`

 Ignore validation errors when checking SRA identifiers that would otherwise cause the pipeline to fail (Default: false).

-## `--save_sra_fastq`
+### `--save_sra_fastq`

 Save FastQ files created from SRA identifiers in the results directory (Default: false).

-## `--skip_sra`
+### `--skip_sra`

 Skip steps involving the download and validation of FastQ files using SRA identifiers (Default: false).

@drpatelh
Copy link
Member Author

drpatelh commented Apr 22, 2020

Done 53b5e86
👍 Good spot.

Copy link
Member

@JoseEspinosa JoseEspinosa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks very nice @drpatelh !
The only thing that I would suggest is that if we want to completely get rid of the parallel-fastq-dump process, we could eventually query the US DB when the ftp is not available at the EBI, as we did before ( https://github.com/JoseEspinosa/viralrecon/blob/460a29f962edc31d7818d63b26b6fad5b98d0cbc/bin/check_samplesheet.py#L34-L35 ). One of the fields returned by the query was download_path. We could use this path to download the data using process SRA_FASTQ_FTP. The only problem could be how to retrieve the md5 checksums since this information is not returned by the query and I was trying to find a way to get it from the NCBI but I couldn't find a way yet.

@drpatelh
Copy link
Member Author

Thanks @JoseEspinosa

Querying the NCBI for the fastq files if they dont exist in the ENA doesnt work for this sample 😏

It returns a path to the SRA file I think 🤔 so would still have to be put through parallel-fastq-dump. Think it would be good to keep that process in the pipeline anyway just in case FTP ends up being too slow and if we need to find another way to fetch the files.

@JoseEspinosa JoseEspinosa merged commit 3f0d1e4 into nf-core:dev Apr 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants