convert rule spades_assembly to a python script #201
Do the integration tests all succeed?
Yes, the integration tests all succeed except one: test_batch_recovery, where a long read file is given but the long read type is "none". In my script (aviary/modules/assembly/scripts/spades_assembly.py, lines 63-65), I raise a ValueError for this case, whereas the earlier rule would default to pacbio. Is there a reason to make pacbio the default? I believe nanopore is more common.
I also have a concern about the tmpdir. In test_short_read_recovery_queue_submission, the tmpdir I got from the qsub logs is still /tmp, which is the $TMPDIR of the local environment where I ran aviary. Is this correct? If I were to run bin chicken, would the aviary command run on the remote node?
This doesn't make much sense. The default long read type in the CLI is … If you don't have long reads, this rule shouldn't even be run.
tmp_dir_arg = ""

if os.path.exists("data/spades_assembly/tmp"):
    subprocess.call("rm -rf data/spades_assembly/tmp", shell=True)
We shouldn't be using subprocess.call with shell=True; it isn't safe. I know it is what snakemake does when shell commands are run, but we can do better. Refer to the other Python scripts for how to avoid shell=True. subprocess.run is a safe alternative, and run_flye.py has a proper implementation.
# run cmd
with open(log, 'a') as logf:
    logf.write(f"Queueing command {command}\n")
    subprocess.run(command.split(), stdout=logf, stderr=subprocess.STDOUT)
You've done it correctly here actually, so you should use this method elsewhere
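One way to reuse this pattern elsewhere in the script is a small helper. This is a sketch under assumptions: `run_logged` is a hypothetical name, and it takes an argument list rather than calling `command.split()`, which would break file paths that contain spaces. The explicit flush keeps the "Queueing command" line ahead of the child process's output in the log.

```python
import subprocess
from typing import List


def run_logged(args: List[str], log_path: str) -> int:
    """Append the command to a log file, then run it without a shell.

    Hypothetical helper generalising the reviewed snippet.
    """
    with open(log_path, "a") as logf:
        logf.write(f"Queueing command {' '.join(args)}\n")
        # Flush Python's buffer before the child writes to the same file.
        logf.flush()
        result = subprocess.run(args, stdout=logf, stderr=subprocess.STDOUT)
    return result.returncode
```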
if long_read_type in ["ont", "ont_hq"]:
    command = f"spades.py --checkpoints all --memory {max_memory} --meta --nanopore {input_long_reads} --12 {input_fastq} "\
              f"-o data/spades_assembly -t {threads} -k {kmer_sizes} {tmp_dir_arg} "
elif long_read_type == "pacbio":
Why are you checking for "pacbio"? We should use PacBio mode when the user supplies "rs", "sq", "ccs", or "hifi"; "pacbio" isn't even an option.
This should just be an else branch, as it was in the original script.
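A sketch of the suggested branching, building the SPAdes command as an argument list so it can go straight to `subprocess.run`. Assumptions: `build_spades_args` is a hypothetical name, `kmer_sizes` is taken to be an already comma-joined string, and the tmp dir is passed via SPAdes' `--tmp-dir` option. Only "ont" and "ont_hq" select `--nanopore`; every other value falls through to `--pacbio`, as the reviewer asks.

```python
from typing import List, Optional


def build_spades_args(
    long_read_type: str,
    input_long_reads: str,
    input_fastq: str,
    max_memory: int,
    threads: int,
    kmer_sizes: str,
    tmp_dir: Optional[str] = None,
) -> List[str]:
    # Nanopore types get --nanopore; anything else falls back to PacBio mode.
    if long_read_type in ("ont", "ont_hq"):
        long_read_flag = "--nanopore"
    else:
        long_read_flag = "--pacbio"
    args = [
        "spades.py", "--checkpoints", "all", "--memory", str(max_memory), "--meta",
        long_read_flag, input_long_reads, "--12", input_fastq,
        "-o", "data/spades_assembly", "-t", str(threads), "-k", kmer_sizes,
    ]
    if tmp_dir:
        args += ["--tmp-dir", tmp_dir]
    return args
```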
              f"-o data/spades_assembly -t {threads} -k {kmer_sizes} {tmp_dir_arg} "
else:
    # raise error
    raise ValueError(f"Invalid long read type: {long_read_type}")
We shouldn't raise a ValueError here; just fall back to pacbio.
Sorry, I didn't explain this well. I just thought it would be nice to have more sanity checks. A good example is the test case for batch submission (see test/data/example_batch.tsv). I can resubmit the pull request without the sanity check if you think it's unnecessary, or I can correct the current version with the correct PacBio codes. Thanks for all the comments; they are very helpful.
The choices for long-read-type are restricted by the CLI (line 481 in 7832126).
I think I understand what Yibi is saying, but I don't exactly understand how the …
But batch mode doesn't seem to have this restriction.
@rhysnewell, maybe you can try running the test case and figure it out?
My vote would be to add checks for batch mode rather than worrying at each stage whether the variable is garbage.
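A sketch of what an up-front batch-mode check could look like. Everything here is an assumption: `check_batch_long_read_types` is hypothetical, each batch row is modelled as a `(sample, long_reads, long_read_type)` tuple rather than parsing the real example_batch.tsv columns, and the accepted set is just the types named in this thread, not read from the CLI definition. Rows without long reads are skipped, so the only failure is the test_batch_recovery situation: long reads supplied with an unusable type.

```python
from typing import Iterable, Optional, Tuple

# Assumed accepted values, taken from the long read types named in this thread.
VALID_LONG_READ_TYPES = {"ont", "ont_hq", "rs", "sq", "ccs", "hifi"}


def check_batch_long_read_types(
    rows: Iterable[Tuple[str, Optional[str], str]],
) -> None:
    """Fail once, before any rule runs, if a row pairs long reads with an unknown type."""
    bad = [
        f"{sample}: {read_type!r}"
        for sample, long_reads, read_type in rows
        if long_reads and read_type not in VALID_LONG_READ_TYPES
    ]
    if bad:
        raise ValueError(
            "Invalid long read type for samples with long reads: " + ", ".join(bad)
        )
```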
@YibiChen I've pinpointed the issue here and it's an easy fix, but it will go in a separate PR. Just remove the pacbio check from your script and respond to the other comments, thanks.
The integration tests all succeed. Let me know if you spot anything else. |