Add support to download data from EBI #53

Open
ElDeveloper opened this issue May 20, 2020 · 3 comments

@ElDeveloper

It would be nice if grabseqs supported downloading data from EBI.

@louiejtaylor
Owner

louiejtaylor commented May 20, 2020

Hi @ElDeveloper, thanks for the issue! I haven't used EBI before, but I think much or all of the data might already be available through SRA.

For example, on the metagenome search page I see sample IDs that look like SRR#####, DRR#####, or ERR######, and project IDs that look like PRJEB######, PRJDB#####, or PRJNA######. All of those can be passed to, and found by, grabseqs sra like so (here using the -l option to list samples only, without downloading):

$ grabseqs sra -l PRJDB5400
DRR082486_1.fastq.gz,DRR082486_2.fastq.gz
DRR082487_1.fastq.gz,DRR082487_2.fastq.gz
DRR082488_1.fastq.gz,DRR082488_2.fastq.gz

$ grabseqs sra -l ERS2665588
ERR2750450_1.fastq.gz,ERR2750450_2.fastq.gz

Is that functionality what you're looking for, or have I misunderstood?
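
The accession prefixes mentioned above follow the shared INSDC naming convention, in which the first letter indicates the archive that issued the ID (S for NCBI, E for EBI/ENA, D for DDBJ). As a rough illustration only, and not part of grabseqs, a minimal Python sketch that classifies an accession by its prefix could look like this. The function name classify_accession is hypothetical.

```python
import re

# First letter of the accession -> archive that issued it (INSDC convention).
ARCHIVES = {"S": "NCBI", "E": "EBI/ENA", "D": "DDBJ"}

# Fourth letter of a xRy-style accession -> record type.
RECORD_TYPES = {"R": "run", "S": "sample", "P": "study", "X": "experiment"}

# Two-letter code in a BioProject accession -> archive.
PROJECT_ARCHIVES = {"NA": "NCBI", "EB": "EBI/ENA", "DB": "DDBJ"}


def classify_accession(accession):
    """Return (record_type, archive) for an accession, or ("unknown", None).

    Hypothetical helper for illustration; it only inspects the prefix
    and does not check that the accession actually exists.
    """
    acc = accession.strip().upper()

    # BioProject IDs such as PRJNA..., PRJEB..., PRJDB...
    match = re.fullmatch(r"PRJ(NA|EB|DB)\d+", acc)
    if match:
        return ("project", PROJECT_ARCHIVES[match.group(1)])

    # Run, sample, study, and experiment IDs such as SRR..., ERS..., DRP...
    match = re.fullmatch(r"([SED])R([RSPX])\d+", acc)
    if match:
        return (RECORD_TYPES[match.group(2)], ARCHIVES[match.group(1)])

    return ("unknown", None)


if __name__ == "__main__":
    for acc in ["PRJDB5400", "ERS2665588", "DRR082486"]:
        print(acc, classify_accession(acc))
```

By this convention, ERS2665588 from the example above is an EBI/ENA sample accession and PRJDB5400 is a DDBJ project accession, which is consistent with both being resolvable through grabseqs sra.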

@ElDeveloper
Author

@louiejtaylor thanks for the suggestion. SRA and EBI/ENA used to be fairly out of sync, but that may have changed recently. I tried the command you suggested and was able to download all sequences for a small study, but I couldn't figure out how to download the sample metadata.

For example, when I run:

grabseqs sra -m metadata.csv -o proj/ ERP020591

The file metadata.csv includes some technical information about the sequences. However, the sample metadata that describes covariates for the study isn't there. For that accession, the metadata can be found here. Does grabseqs support downloading that data?
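
For context, one possible way to look up sample-level fields outside grabseqs is the ENA Portal API. The sketch below is an assumption-laden illustration, not a grabseqs feature: it assumes the filereport endpoint at https://www.ebi.ac.uk/ena/portal/api/filereport with the accession, result, fields, and format query parameters, and the helper names build_filereport_url and parse_tsv are hypothetical. It only builds the query URL and parses a TSV response; the report may still lack study-specific covariates.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint for per-study file reports.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession, fields):
    """Build a filereport query URL for a study accession (hypothetical helper)."""
    query = urlencode(
        {
            "accession": accession,
            "result": "read_run",       # one row per sequencing run
            "fields": ",".join(fields),  # comma-separated column names
            "format": "tsv",
        }
    )
    return f"{ENA_FILEREPORT}?{query}"


def parse_tsv(text):
    """Parse a tab-separated report into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


if __name__ == "__main__":
    url = build_filereport_url(
        "ERP020591", ["run_accession", "sample_accession", "sample_alias"]
    )
    print(url)

    # Placeholder response text for illustration; these rows are made up.
    example = "run_accession\tsample_accession\nERR0000001\tERS0000001\n"
    print(parse_tsv(example))
```

The resulting URL can be fetched with any HTTP client, and the parsed rows could then be joined to the grabseqs metadata.csv on the run accession, assuming both tables carry that column.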

@ElDeveloper
Author

@louiejtaylor friendly ping - is there any way grabseqs could support downloading that additional metadata?
