getfastq --id is currently broken #10

Hego-CCTB · 2020-03-24T11:45:22Z

When I pass just an SRA ID with --id instead of a metadata.tsv, I get this error:

amalgkit getfastq --threads 8 --id SRR7699519 -e abc@abc.com
amalgkit getfastq: start
pigz found. It will be used for compression/decompression in read name formatting.
--id is specified. Downloading SRA metadata from Entrez.
Traceback (most recent call last):
  File "/Users/s229181/anaconda/anaconda3/envs/dev/bin/amalgkit", line 254, in <module>
    args.handler(args)
  File "/Users/s229181/anaconda/anaconda3/envs/dev/bin/amalgkit", line 31, in command_getfastq
    getfastq_main(args)
  File "/Users/s229181/anaconda/anaconda3/envs/dev/lib/python3.7/site-packages/amalgkit/getfastq.py", line 517, in getfastq_main
    metadata = getfastq_metadata(args)
  File "/Users/s229181/anaconda/anaconda3/envs/dev/lib/python3.7/site-packages/amalgkit/getfastq.py", line 477, in getfastq_metadata
    search_term = getfastq_search_term(sra_id, args.entrez_additional_search_term)
NameError: name 'sra_id' is not defined

Looking through the code, "sra_id" is only assigned when the metadata.tsv is read, which is mutually exclusive with --id. I think the easiest solution would be to do what we did before the "metadata update": create a new metadata.tsv with just a single entry and have the rest of the code run as it does right now.
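A minimal sketch of that single-entry idea, not the actual amalgkit code. The helper names (build_single_entry_metadata, load_metadata, run_ids) and the "run" column name are assumptions made for illustration. The point is only that both input paths end up producing the same table shape, so downstream code never relies on a variable that one path leaves unset.

```python
import argparse
import csv


def build_single_entry_metadata(run_id):
    # Hypothetical helper: wrap one run ID in the same row structure
    # that reading a metadata.tsv would produce ("run" column assumed).
    return [{"run": run_id}]


def load_metadata(args):
    # --id and the metadata file are treated as mutually exclusive,
    # but both branches return a list of row dictionaries.
    if args.id is not None:
        return build_single_entry_metadata(args.id)
    with open(args.metadata, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def run_ids(metadata):
    # Downstream code reads IDs from the table only, so no branch
    # can leave a name like sra_id undefined.
    return [row["run"] for row in metadata]


if __name__ == "__main__":
    args = argparse.Namespace(id="SRR7699519", metadata=None)
    print(run_ids(load_metadata(args)))
```

With this shape, the rest of the pipeline can loop over the table rows regardless of which option was given.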

Ah, also we need to add pigz to the dependencies.

kfuku52 · 2020-03-24T12:10:14Z

That sounds like the right solution. Could you fix it?

Hego-CCTB · 2020-03-24T12:50:13Z

sure!

takaW496 · 2020-09-18T11:35:50Z

I got the same error message when I tried to run the getfastq step with a BioProject ID in the gfe pipeline:

Traceback (most recent call last):
  File "/opt/conda/envs/biotools/bin/amalgkit", line 254, in <module>
    args.handler(args)
  File "/opt/conda/envs/biotools/bin/amalgkit", line 31, in command_getfastq
    getfastq_main(args)
  File "/opt/conda/envs/biotools/lib/python3.7/site-packages/amalgkit/getfastq.py", line 517, in getfastq_main
    metadata = getfastq_metadata(args)
  File "/opt/conda/envs/biotools/lib/python3.7/site-packages/amalgkit/getfastq.py", line 477, in getfastq_metadata
    search_term = getfastq_search_term(sra_id, args.entrez_additional_search_term)
NameError: name 'sra_id' is not defined

@Hego-CCTB did you fix the problem? Could you share the fixed script?

kfuku52 · 2020-09-21T14:01:30Z

@Hego-CCTB Are you aware of Taka's question?

Hego-CCTB · 2020-09-21T14:38:05Z

yes!
I'm looking into it, but I have failed to make progress so far. My "fix" created a host of other problems, but I hope I can get a working update out soon.

Hego-CCTB · 2020-09-24T13:51:38Z

@takaW496 The problem should be fixed now. I've also included an --id_list option, which can process multiple SRA runs, while --id is reserved for a single run.

--id_list needs a path to a simple text file with one ID per row.
--id_list currently only queues the download of each run and doesn't download them in parallel (this is what I'm looking into next).
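
As a rough illustration of the --id_list behaviour described above (one ID per row, runs handled one after another), here is a small sketch. It is not the amalgkit implementation: read_id_list, process_sequentially, and the download_one callback are hypothetical names, and skipping blank rows is an assumption about how the file might be parsed.

```python
from pathlib import Path


def read_id_list(path):
    # Read one run ID per row; surrounding whitespace is stripped and
    # blank rows are skipped (assumed behaviour, not confirmed).
    ids = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            ids.append(line)
    return ids


def process_sequentially(ids, download_one):
    # Queue-like behaviour: each run is handled only after the
    # previous one has finished, with no parallelism.
    results = []
    for run_id in ids:
        results.append(download_one(run_id))
    return results
```

Here download_one stands in for whatever function fetches a single run; a parallel version would replace the loop while keeping the same file format.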

I'll close this for now, but feel free to reopen this issue if you encounter any other problems regarding this.

Hego-CCTB closed this as completed Sep 24, 2020
