g-e-kenney edited this page Nov 15, 2022 · 12 revisions

Why prettyClusters?

Because there weren't any tools that did quite what I needed them to do:

  • Export diagrams of gene clusters as a vector (not bitmap!) file suitable for figure layout.
  • Export diagrams of multiple gene clusters drawn at the same relative scale.
  • Import gene metadata from the JGI's IMG database, which has many genomes, metagenomes, and other datasets that are absent from NCBI's NR database (and UniProt), or that are present but poorly annotated in databases like NCBI/WGS.
  • Handle hypothetical and predicted proteins helpfully (i.e. by identifying new groups of hypothetical proteins that frequently appear in the genomic neighborhoods of interest).
  • Integrate into workflows using sequence similarity networks generated via the EFI-EST toolset.
  • Interrogate similarity of genomic neighborhoods without relying on the sequence similarity of the genes of interest. I did not want to assume that sequence similarity and gene cluster similarity necessarily track, since they do not always do so.
  • Interrogate genomic neighborhoods without relying on antiSMASH predictions or similarity to known genome neighborhoods - a point of distinction between this tool and BiG-SCAPE.

The prettyClusters toolset

The wiki entries contain a more detailed description of the use of specific functions.

The core toolset

Accessory components

Workflows for prettyClusters

Preparing for use of prettyClusters

Basic workflow

Incorporation of data from other sources (EMBL, GenBank, private data)

  • This is very much a work in progress, and it amounts to using prettyClusters in Difficult mode. That said, I have a rough workflow for obtaining this data, pre-processing it into standardized GenBank files, and then using those GenBank files as input to prettyClusters.