Adding custom sequences to MOB-SUIT database #20

Closed
ajkarloss opened this issue Mar 25, 2019 · 1 comment

Comments

@ajkarloss

Hi, we want to add our own sequences to the MOB-suite database. How can we do that?

@kbessonov1984
Collaborator

kbessonov1984 commented Mar 25, 2019

Hi, Jeevan,

New sequences can be added to the existing database with the mob_cluster utility. It has two modes; for the fastest results we recommend the build mode. The overall procedure is to add the new sequences to the existing reference sequences, run mob_cluster, and then either replace the previous reference file ncbi_plasmid_full_seqs.fas with the new version or pass custom paths to the new plasmid reference database. Note that the reference plasmid sequences are located in the mob_suite installation folder (e.g. I have a mob_suite_test conda environment, and my reference plasmid sequences are located in the /Users/kirill/miniconda/envs/mob_suite_test/lib/python3.6/site-packages/mob_suite/databases/ directory).

Here is the list of sample commands to run:

  1. Copy the existing database into a temporary working directory:
    cp <...>/databases/ncbi_plasmid_full_seqs.fas .
  2. Append your new plasmid sequences to ncbi_plasmid_full_seqs.fas, or to your own FASTA file of plasmid references. Each sequence must have a unique title line (e.g. >HQ451074.1).
  3. Run mob_cluster on the new reference plasmid sequences file (e.g. mob_cluster -m build -i ncbi_plasmid_full_seqs.fas --num_threads 3 -o newdb).
  4. Overwrite the existing database at the default location, or provide the path to the custom plasmid database directory.
    • Option 1: Copy the new plasmid database references_updated.fasta and its mash sketch file to the databases folder (e.g. cp references_updated.fasta <...>/databases/ncbi_plasmid_full_seqs.fas and cp ncbi_plasmid_full_seqs.fas.msh <...>/databases/ncbi_plasmid_full_seqs.fas.msh).
    • Option 2: Do not alter the original database. Instead, point every mob_typer or mob_recon run at the updated custom database via either the -d parameter or the --plasmid_db and --plasmid_mash_db parameters.
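
Step 2 requires every title line to be unique, so it may be worth checking the combined FASTA file before running mob_cluster. The following is a minimal sketch, not part of MOB-suite: duplicate_fasta_ids is a hypothetical helper, my_plasmids.fasta is a hypothetical file name, and the ID is assumed to be the first word after ">" on each title line.

```shell
# Hypothetical helper (not part of MOB-suite): print each sequence ID that
# appears on more than one FASTA title line. Prints nothing if all are unique.
# Assumption: the ID is the first whitespace-delimited word after ">".
duplicate_fasta_ids() {
    awk '/^>/ { id = substr($1, 2); if (seen[id]++ == 1) print id }' "$1"
}

# Sketch of steps 1-3 from the list above, left commented out because it
# needs a MOB-suite installation. <...> remains a placeholder for your
# installation directory and my_plasmids.fasta is a made-up file name.
# cp <...>/databases/ncbi_plasmid_full_seqs.fas .
# cat my_plasmids.fasta >> ncbi_plasmid_full_seqs.fas
# if [ -z "$(duplicate_fasta_ids ncbi_plasmid_full_seqs.fas)" ]; then
#     mob_cluster -m build -i ncbi_plasmid_full_seqs.fas --num_threads 3 -o newdb
# fi
```

If the helper prints any IDs, rename or remove the repeated entries before building the database.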

Note 1: <...> refers to the mob-suite installation directory, which can be inferred by running the which mob_cluster command.
Note 2: In case of an error, the reference databases can always be downloaded again from the https://share.corefacility.ca/index.php/s/oeufkw5HyKz0X5I/download or https://ndownloader.figshare.com/articles/5841882/versions/1 mirrors.
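
As a rough illustration of Note 1, the databases directory can be derived from the location of the mob_cluster executable. This sketch assumes a conda-style layout like the one in the example path above (the executable in <env>/bin and the package under <env>/lib/python*/site-packages/mob_suite/databases); mob_db_dir is a hypothetical helper, and other installation methods may place the files elsewhere.

```shell
# Hypothetical helper: given the path to the mob_cluster executable, print the
# matching databases directory, or return non-zero if none is found.
# Assumption: conda-style layout, i.e. <env>/bin/mob_cluster and
# <env>/lib/python*/site-packages/mob_suite/databases.
mob_db_dir() {
    exe_path=$1
    env_prefix=$(dirname "$(dirname "$exe_path")")
    for candidate in "$env_prefix"/lib/python*/site-packages/mob_suite/databases; do
        if [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage, only meaningful where mob_cluster is installed:
# mob_db_dir "$(which mob_cluster)"
```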

The mob_cluster command produces a newly created database (references_updated.fasta) in which each sequence entry is labelled with its accession number and cluster id (>accession|clusterid). Please note that cluster reference numbers are volatile and depend on your input dataset. E.g. for a set of 3 plasmid sequences, the clusters.txt file contained a total of 3 clusters at both the 0.05 and 0.0001 mash distance thresholds:

id            0.05     0.0001
HQ451074.1      1       1
AB093554        2       2
AB109805        3       3

Please let us know if you have any further questions or issues.
