getfastq --id is currently broken #10

Closed
Hego-CCTB opened this issue Mar 24, 2020 · 6 comments

@Hego-CCTB
Collaborator

When I pass a single SRA ID with --id instead of a metadata.tsv, I get this error:

amalgkit getfastq --threads 8 --id SRR7699519 -e abc@abc.com
amalgkit getfastq: start
pigz found. It will be used for compression/decompression in read name formatting.
--id is specified. Downloading SRA metadata from Entrez.
Traceback (most recent call last):
  File "/Users/s229181/anaconda/anaconda3/envs/dev/bin/amalgkit", line 254, in <module>
    args.handler(args)
  File "/Users/s229181/anaconda/anaconda3/envs/dev/bin/amalgkit", line 31, in command_getfastq
    getfastq_main(args)
  File "/Users/s229181/anaconda/anaconda3/envs/dev/lib/python3.7/site-packages/amalgkit/getfastq.py", line 517, in getfastq_main
    metadata = getfastq_metadata(args)
  File "/Users/s229181/anaconda/anaconda3/envs/dev/lib/python3.7/site-packages/amalgkit/getfastq.py", line 477, in getfastq_metadata
    search_term = getfastq_search_term(sra_id, args.entrez_additional_search_term)
NameError: name 'sra_id' is not defined

Looking through the code, "sra_id" is only assigned when the metadata.tsv is read, which is mutually exclusive with --id, so the name is never defined on the --id path. I think the easiest solution would be to do what we did before the "metadata update": create a new metadata.tsv with just a single entry and let the rest of the code run as it does right now.
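A minimal sketch of that idea, not amalgkit's actual code: the single ID from --id is wrapped in a one-row table so that everything downstream always iterates over rows and never relies on a variable that only one input mode defines. The function names, the "run" column name, and the way the extra search term is combined are all assumptions made for illustration.

```python
# Hypothetical sketch; names and table layout do not mirror amalgkit's real code.

def build_search_term(sra_id, additional_term=None):
    """Combine a run accession with an optional extra search filter."""
    term = sra_id
    if additional_term:
        term = f"({term}) AND ({additional_term})"
    return term


def resolve_run_ids(cli_id=None, metadata_rows=None):
    """Return the run IDs to process, whichever input mode was used."""
    if cli_id is not None and metadata_rows is not None:
        raise ValueError("--id and a metadata table are mutually exclusive")
    if cli_id is not None:
        # Wrap the single ID in a one-row table so the downstream code
        # sees the same shape as it would for a metadata file.
        metadata_rows = [{"run": cli_id}]
    if not metadata_rows:
        raise ValueError("no run IDs given")
    return [row["run"] for row in metadata_rows]


run_ids = resolve_run_ids(cli_id="SRR7699519")
search_terms = [build_search_term(run_id) for run_id in run_ids]
```

Because the search term is built inside a loop over the resolved IDs, the --id path and the metadata path share one code path, which is the property the proposed fix is after.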

Ah, we also need to add pigz to the dependencies.

@kfuku52
Owner

kfuku52 commented Mar 24, 2020

That sounds like the right solution. Could you fix it?

@Hego-CCTB
Collaborator Author

sure!

@takaW496

I got the same error message when I tried to run the getfastq step with a BioProject ID in the gfe pipeline:

Traceback (most recent call last):
  File "/opt/conda/envs/biotools/bin/amalgkit", line 254, in <module>
    args.handler(args)
  File "/opt/conda/envs/biotools/bin/amalgkit", line 31, in command_getfastq
    getfastq_main(args)
  File "/opt/conda/envs/biotools/lib/python3.7/site-packages/amalgkit/getfastq.py", line 517, in getfastq_main
    metadata = getfastq_metadata(args)
  File "/opt/conda/envs/biotools/lib/python3.7/site-packages/amalgkit/getfastq.py", line 477, in getfastq_metadata
    search_term = getfastq_search_term(sra_id, args.entrez_additional_search_term)
NameError: name 'sra_id' is not defined

@Hego-CCTB did you fix the problem? Could you share the fixed script?

@kfuku52
Owner

kfuku52 commented Sep 21, 2020

@Hego-CCTB Are you aware of Taka's question?

@Hego-CCTB
Collaborator Author

yes!
I'm looking into it, but haven't made progress so far. My "fix" created a host of other problems, but I hope I can get a working update out soon.

@Hego-CCTB
Collaborator Author

@takaW496 The problem should be fixed now. I've also added an --id_list option, which can process multiple SRA runs, while --id is reserved for a single run.

--id_list takes the path to a plain text file with one ID per line.
--id_list currently only queues the runs and downloads them one after another, not in parallel (this is what I'm looking into next).
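For anyone preparing such a file, here is a rough sketch of how a one-ID-per-line list could be read and worked through sequentially. It is only an illustration under assumptions: read_id_list and queue_downloads are hypothetical helpers rather than amalgkit functions, the download step is a stand-in callback, and SRR0000001 is a made-up placeholder accession.

```python
import tempfile
from pathlib import Path


def read_id_list(path):
    """Read one run ID per line, ignoring blank lines and surrounding whitespace."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def queue_downloads(run_ids, download):
    """Call download() on each ID in order, one at a time (no parallelism)."""
    return [download(run_id) for run_id in run_ids]


with tempfile.TemporaryDirectory() as tmp:
    id_file = Path(tmp) / "ids.txt"
    # SRR0000001 is a placeholder, not a real accession.
    id_file.write_text("SRR7699519\n\nSRR0000001\n")
    ids = read_id_list(id_file)
    # Stand-in for the real download step.
    done = queue_downloads(ids, download=lambda run_id: f"fetched {run_id}")
```

Skipping blank lines keeps a trailing newline or an accidental empty row from turning into an empty ID.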

I'll close this for now, but feel free to reopen this issue if you encounter any other problems regarding this.


3 participants