trycycler reconcile could continue evaluating contigs after one fails #47

Open
d4straub opened this issue Nov 11, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@d4straub

Hi there,

First of all, Trycycler seems like a great tool, thanks for this! Disclaimer: I am using it for the first time.

What I found extremely annoying is that trycycler reconcile stops after the first contig that doesn't meet the requirements.

For example, I run trycycler reconcile and it stops with this complaint:

#Error: failed to circularise sequence A_tig00000003 for multiple reasons. You
#must either repair this sequence or exclude it and then try running trycycler
#reconcile again.

That's alright, I remove it and try again; however, I get the same problem with the second contig. I get skeptical, but try a third round after removing the second contig. The third contig also cannot be circularized: again an error. I realize that this part of the genome might be linear (a quick literature search confirms that this might be true), so I re-add all contigs, add the appropriate option (--linear), run it a fourth time, and it passes. However:

#Error: some pairwise identities are below the minimum allowed value of 98.0%.
#Please remove offending sequences or lower the --min_identity threshold and try
#again.

Alright, so I remove those bad contigs and restart it a 5th time, but:

#Error: some pairwise indels are greater than the maximum allowed value of 1000.
#Please remove offending sequences or raise the --max_indel_size threshold and
#try again.

Again, I remove those contigs and restart it a 6th time.

Essentially, I am just wondering whether it would be more effective to have trycycler reconcile continue after it encounters the first "error" and then stop with all the error reports from one run. I could have seen immediately that none of the contigs are circular and used --linear instead of running it three times. I could also have immediately removed the contigs with bad pairwise identities and pairwise indels.

Maybe circularisation is required to calculate identities and indels, so it has to stop when circularisation fails, but at least it could report all contigs that fail to circularise? And maybe the pairwise identity and pairwise indel checks could be another block that fails in one go? That would have left me with 3 runs instead of 6, much better imho.

Best,
Daniel
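The two-stage reporting suggested above could look roughly like the sketch below. It is a hypothetical illustration of the "collect all errors, then stop" pattern, not Trycycler's actual code: `reconcile_checks`, `can_circularise` and the `pairwise_stats` mapping are invented names, and it assumes (as the comment above guesses) that pairwise comparisons need circularised sequences, so stage 2 only runs once stage 1 passes. The default thresholds mirror the 98.0% and 1000 values from the error messages quoted above.

```python
from typing import Callable, Iterable


def reconcile_checks(
    contigs: Iterable[str],
    can_circularise: Callable[[str], bool],
    pairwise_stats: dict[tuple[str, str], tuple[float, int]],
    min_identity: float = 98.0,
    max_indel_size: int = 1000,
) -> list[str]:
    """Hypothetical sketch: return every problem found instead of quitting on the first."""
    # Stage 1: attempt circularisation for *all* contigs and collect every failure.
    # `can_circularise` is a stand-in for the real circularisation logic.
    errors = [f"failed to circularise {name}"
              for name in contigs if not can_circularise(name)]
    if errors:
        # Assumed: pairwise checks need circularised sequences, so stop here,
        # but only after every contig has been tried.
        return errors

    # Stage 2: check every pair against both thresholds in a single pass.
    # `pairwise_stats` maps (contig_a, contig_b) -> (identity %, largest indel in bp).
    for (a, b), (identity, indel) in pairwise_stats.items():
        if identity < min_identity:
            errors.append(f"{a} vs {b}: identity {identity:.2f}% is below {min_identity}%")
        if indel > max_indel_size:
            errors.append(f"{a} vs {b}: indel of {indel} bp exceeds {max_indel_size}")
    return errors


if __name__ == "__main__":
    # Made-up example values: every contig circularises, one pair fails both thresholds.
    stats = {("A", "B"): (99.5, 20), ("A", "C"): (97.1, 1500)}
    for problem in reconcile_checks(["A", "B", "C"], lambda name: True, stats):
        print(problem)
```

With this grouping, the scenario described above would take one run to learn that all three contigs fail circularisation, and one more (after adding --linear) to see every identity and indel problem together.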

rrwick added the enhancement New feature or request label Mar 13, 2023
@rrwick
Owner

rrwick commented Mar 13, 2023

I agree! Re-running Trycycler reconcile over and over can be time-consuming, especially for the cluster of chromosomal contigs (larger sequences take longer to run).

I hope to re-engineer Trycycler's high-friction parts (like this) at some point. So I'll keep this issue open as an enhancement request until then.

Ryan

@rrwick
Owner

rrwick commented Oct 20, 2023

I've just pushed (52b8c1f) a small improvement to this issue: trycycler reconcile now attempts to circularise all the contigs before it quits with an error message. So if 3 contigs can't be circularised, it will tell you about all three at once.

I'll leave this issue open because trycycler reconcile is still not as efficient as it could be.

Ryan
