
MOB-typer failed to detect plasmid replicons and relaxases for fasta files with certain headers. #87

Closed
ryotag opened this issue Jun 24, 2021 · 1 comment


ryotag commented Jun 24, 2021

This issue is related to #85, but I opened this new issue because the title of #85 does not reflect the true problem.

I found that MOB-typer failed to detect plasmid replicons and relaxases for FASTA files with certain headers.
Here is the file I analyzed (the file extension is .txt because GitHub does not allow .fasta attachments):
KX912253_RC.txt

`mob_typer --multi --infile KX912253_RC.fasta --out_file results.tsv` produced the following TSV output:

| Field | Value |
|---|---|
| sample_id | FRI-2_plasmid_KX912253-RC_Enterobacter_asburiae_strain_H162620587_plasmid_pJF-587__complete_sequence. |
| num_contigs | 1 |
| size | 108672 |
| gc | 51.1842977 |
| md5 | 5bd1577e5eae2824bbb7eb4e9ed6c126 |
| rep_type(s) | - |
| rep_type_accession(s) | - |
| relaxase_type(s) | - |
| relaxase_type_accession(s) | - |
| mpf_type | - |
| mpf_type_accession(s) | - |
| orit_type(s) | - |
| orit_accession(s) | - |
| predicted_mobility | non-mobilizable |
| mash_nearest_neighbor | KX912253 |
| mash_neighbor_distance | 0 |
| mash_neighbor_identification | Enterobacter asburiae |
| primary_cluster_id | AA414 |
| secondary_cluster_id | AI467 |
| predicted_host_range_overall_rank | genus |
| predicted_host_range_overall_name | Enterobacter |
| observed_host_range_ncbi_rank | genus |
| observed_host_range_ncbi_name | Enterobacter |
| reported_host_range_lit_rank | - |
| reported_host_range_lit_name | - |
| associated_pmid(s) | - |

I then changed the header of the FASTA file from
`>FRI-2_plasmid_KX912253-RC Enterobacter asburiae strain H162620587 plasmid pJF-587, complete sequence.`
to
`>FRI-2_plasmid_KX912253-RC`
With the shortened header, I got the following results:

| Field | Value |
|---|---|
| sample_id | FRI-2_plasmid_KX912253-RC |
| num_contigs | 1 |
| size | 108672 |
| gc | 51.1842977 |
| md5 | 5bd1577e5eae2824bbb7eb4e9ed6c126 |
| rep_type(s) | IncFII,IncR |
| rep_type_accession(s) | CP019890_00139,000207__CP025517 |
| relaxase_type(s) | MOBF |
| relaxase_type_accession(s) | NC_014107_00160 |
| mpf_type | MPF_F |
| mpf_type_accession(s) | NC_014107_00125,NC_014107_00126,NC_014107_00127,NC_009425_00108,NC_014107_00135,NC_014107_00139,NC_014107_00145,NC_014107_00146,NC_014107_00154,NC_014107_00155,NC_014107_00137,NC_014107_00159 |
| orit_type(s) | - |
| orit_accession(s) | - |
| predicted_mobility | conjugative |
| mash_nearest_neighbor | KX912253 |
| mash_neighbor_distance | 0 |
| mash_neighbor_identification | Enterobacter asburiae |
| primary_cluster_id | AA414 |
| secondary_cluster_id | AI467 |
| predicted_host_range_overall_rank | order |
| predicted_host_range_overall_name | Enterobacterales |
| observed_host_range_ncbi_rank | order |
| observed_host_range_ncbi_name | Enterobacterales |
| reported_host_range_lit_rank | family |
| reported_host_range_lit_name | Enterobacteriaceae |
| associated_pmid(s) | 20851899; 23711894 |

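The results differ substantially: with the original header, no replicons or relaxases were detected and the plasmid was predicted to be non-mobilizable, whereas with the shortened header, IncFII and IncR replicons, a MOBF relaxase, and an MPF_F type were detected and the plasmid was predicted to be conjugative.
I'm not sure whether this is a bug, but any help or thoughts would be appreciated.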

Thank you for your time,
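For anyone on an affected version, the header edit described above can be automated. The sketch below is a hypothetical helper (not part of MOB-suite; the function names are chosen here for illustration) that truncates each FASTA header to its first whitespace-delimited token, which reproduces the `>FRI-2_plasmid_KX912253-RC` header used in the second run. It assumes the first tokens are already unique across records; if they are not, truncation would produce duplicate IDs.

```python
from typing import Iterable, Iterator


def shorten_fasta_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines, truncating each '>' header to its first
    whitespace-delimited token (the sequence ID). Sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            # Keep only the ID; leave an empty header ('>' alone) unchanged.
            yield (">" + parts[0] + "\n") if parts else line
        else:
            yield line


def shorten_fasta_file(in_path: str, out_path: str) -> None:
    """Hypothetical convenience wrapper: rewrite in_path to out_path with short headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(shorten_fasta_headers(src))
```

For example, `shorten_fasta_file("KX912253_RC.fasta", "KX912253_RC.short.fasta")` writes a copy with shortened headers, which can then be passed to `mob_typer --multi --infile KX912253_RC.short.fasta --out_file results.tsv`. Note that the sample_id in the report will then be the shortened ID rather than the full description.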

@jrober84
Collaborator

There seem to be some issues with BLAST and the length of FASTA headers. I have implemented a fix in 3.1.0 in which all sequences are renamed internally for all of the BLAST and search calls, and the results are then reported back under the original sequence identifiers.
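The rename-then-restore approach can be illustrated with a minimal sketch. This is not MOB-suite's actual code; the function names and the `seq0`, `seq1`, ... internal IDs are hypothetical. Every record receives a short, whitespace-free internal ID before any search, and the query IDs in tabular hits (for example BLAST `-outfmt 6` rows, whose first column is the query ID) are mapped back to the original headers afterwards.

```python
from typing import Dict, List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (full_header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def rename_for_search(records: List[Tuple[str, str]]) -> Tuple[str, Dict[str, str]]:
    """Replace every header with a short, unique internal ID (hypothetical 'seqN' scheme).
    Returns the renamed FASTA text and a map from internal ID to original header."""
    id_map: Dict[str, str] = {}
    out_lines: List[str] = []
    for i, (header, seq) in enumerate(records):
        internal_id = f"seq{i}"
        id_map[internal_id] = header
        out_lines.append(f">{internal_id}")
        out_lines.append(seq)
    return "\n".join(out_lines) + "\n", id_map


def restore_ids(hits: List[List[str]], id_map: Dict[str, str]) -> List[List[str]]:
    """Map the query ID (first column) of each tabular hit back to its original header."""
    return [[id_map.get(row[0], row[0])] + row[1:] for row in hits]
```

Because the search tools only ever see the short internal IDs, the length or content of the original header cannot change which hits are found; the original identifiers reappear only in the final report.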
