Home
g-e-kenney edited this page Nov 15, 2022
I wrote prettyClusters because there weren't any tools that did quite what I needed them to do:
- Export diagrams of gene clusters as a vector (not bitmap!) file suitable for figure layout.
- Export diagrams with multiple gene clusters with the same relative scale.
- Import gene metadata from the JGI's IMG database, which has many genomes, metagenomes, and so on that are absent from NCBI's NR database (and UniProt), or that are present but poorly annotated in databases like NCBI/WGS.
- Handle hypothetical and predicted proteins helpfully (i.e. by identifying new groups of hypothetical proteins that frequently appear in the genomic neighborhoods of interest).
- Integrate into workflows using sequence similarity networks generated via the EFI-EST toolset.
- Interrogate the similarity of genomic neighborhoods without relying on the sequence similarity of the genes of interest. I did not want to assume that sequence similarity and gene cluster similarity necessarily track, since they don't always.
- Interrogate genomic neighborhoods without relying on antiSMASH predictions or similarity to known genome neighborhoods - a point of distinction between this tool and BiG-SCAPE.
The wiki entries contain a more detailed description of the use of specific functions.
- A basic installation guide is the best starting point.
- A standard run for the prettyClusters toolset
- This is very much a work in progress, and doing this is using prettyClusters in Difficult mode, but I've got a rough workflow for getting this data and doing some pre-processing to get it into standardized GenBank files, and from there, using those GenBank files as input to prettyClusters.