Non-redundant TF motif matches genome-wide

The following describes the general workflow for clustering motif models to remove redundancy, generating an "archetype" motif for each cluster, and finally performing genome-wide motif scans and removing redundant matches. Please note that this documentation is incomplete and should be used as a rough guide only.

If you are looking for the final results (motif clusters, genome-wide scans and a browser shot) please see the following website:

Note: The above link is for version 1.0 of the motif clusters. While we have yet to update the website, version 2.0-beta (latest) can be found at

Contact me at jvierstra (at) with any questions/requests/comments.



Quick start

Step 0: Download and preprocess motifs

See runall script in each motif database directory (databases/*)

Step 1: Compute pair-wise motif similarity

Here we use TOMTOM to determine the similarity between all motif models (all pairwise) with the following code:

meme2meme databases/*/*.meme > tomtom/

tomtom \
	-bfile /net/seq/data/projects/motifs/hg19.K36.mappable_only.5-order.markov \
	-dist kullback \
	-motif-pseudo 0.1 \
	-text \
	-min-overlap 1 \
	tomtom/ tomtom/ \
> tomtom/tomtom.all.txt

I have provided a script that runs this operation on a SLURM parallel compute cluster (see bin/runall.tomtom.v2.0beta-human for an example).
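For orientation, here is a minimal sketch of how the tab-delimited TOMTOM output could be read into a square matrix before clustering. It is not the code from the notebook. The function names (`load_tomtom_evalues`, `to_matrix`), the column positions (query ID, target ID and E-value in columns 0, 1 and 4) and the `missing` fill value are assumptions for illustration; check the column positions against the header of your own output, since the layout differs between MEME Suite versions.

```python
import csv
import io


def load_tomtom_evalues(handle, query_col=0, target_col=1, evalue_col=4):
    """Collect E-values from TOMTOM text output into {query: {target: E-value}}.

    Column positions are assumptions; adjust them to match your TOMTOM version.
    """
    evalues = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, short lines and '#'-prefixed header/comment lines.
        if len(row) <= evalue_col or row[0].startswith("#"):
            continue
        try:
            evalue = float(row[evalue_col])
        except ValueError:
            continue  # a header row without a leading '#'
        evalues.setdefault(row[query_col], {})[row[target_col]] = evalue
    return evalues


def to_matrix(evalues, missing=float("nan")):
    """Return (motif_ids, matrix); pairs TOMTOM did not report get `missing`."""
    ids = sorted(set(evalues) | {t for targets in evalues.values() for t in targets})
    matrix = [[evalues.get(q, {}).get(t, missing) for t in ids] for q in ids]
    return ids, matrix


# Tiny made-up example (not real TOMTOM results).
demo = io.StringIO(
    "Query_ID\tTarget_ID\tOptimal_offset\tp-value\tE-value\n"
    "M1\tM1\t0\t1e-12\t2e-10\n"
    "M1\tM2\t1\t1e-3\t0.5\n"
)
motif_ids, matrix = to_matrix(load_tomtom_evalues(demo), missing=100.0)
print(motif_ids)
print(matrix)
```

TOMTOM only reports pairs that pass its significance threshold, so some pairs will be absent from the output. How to fill them (and whether to transform the E-values, for example with a negative log) is a choice this sketch leaves open.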

Step 2: Hierarchically cluster motifs

After running TOMTOM, open the provided Jupyter Notebook to perform the clustering and visualization.

We perform hierarchical clustering (distance: correlation, complete linkage) on the TOMTOM similarity E-values. Below is a heatmap representation of motifs clustered by similarity, with clusters identified by cutting the dendrogram at height 0.7.
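As a rough illustration of this step (the notebook remains the reference), the sketch below clusters the rows of a small similarity matrix with SciPy using correlation distance and complete linkage, then cuts the tree at height 0.7. The function name `cluster_motifs` and the toy matrix are made up for the example; a real run would use the full pairwise matrix derived from the TOMTOM output.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_motifs(similarity, height=0.7):
    """Cluster rows by correlation distance with complete linkage.

    Returns the linkage matrix and one integer cluster label per row,
    obtained by cutting the dendrogram at `height`.
    """
    tree = linkage(similarity, method="complete", metric="correlation")
    labels = fcluster(tree, t=height, criterion="distance")
    return tree, labels


# Toy similarity matrix (made-up values): motifs 0-2 resemble each other,
# as do motifs 3-4.
similarity = np.array([
    [10.0, 8.0, 7.0, 0.0, 0.5],
    [8.0, 10.0, 9.0, 0.2, 0.0],
    [7.0, 9.0, 10.0, 0.0, 0.1],
    [0.0, 0.2, 0.0, 10.0, 8.0],
    [0.5, 0.0, 0.1, 8.0, 10.0],
])
tree, labels = cluster_motifs(similarity)
print(labels)
```

Correlation distance is undefined for constant rows, so a matrix containing such rows (or NaN entries) would need to be cleaned before calling `linkage`.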

Clustered heatmap cut at height 0.7

Step 3: Process each cluster to build a motif archetype

Again, inside the notebook there is code that will process and visualize each motif cluster.

Example archetypes: AC0002 (homeodomain) and AC0240 (CCAAT-box)

Step 4: Make HTML output

Run the Bash script bin/runall.make-html to generate an HTML webpage (index.html) in the results directory.

Step 5: Scan genome using all motif models (individually and archetype)

I use the software package MOODS to find motif matches genome-wide. It's a great tool that I highly recommend. See bin/runall.scan_models for an example of how to do this on a SLURM cluster.

Step 6: Create working and browser tracks

To create a bigBed file from a bed9+4, we need to include an AutoSql file (

table hg38_motifs_collapsed
"Collapsed motifs matches in hg38 (see:"
string  chrom;        "Reference sequence chromosome or scaffold"
uint    chromStart;    "Start position of feature on chromosome"
uint    chromEnd;    "End position of feature on chromosome"
string  name;        "Name of motif"
uint    score;        "Score"
char[1] strand;        "+ or - for strand"
uint    thickStart;    "Coding region start"
uint    thickEnd;    "Coding region end"
uint      reserved;    "itemRgb"

Make the tracks for the archetypes

bedToBigBed -type=bed9+4 -tab moods.combined.all.bed chrom.sizes
awk -v OFS="\t" '{ print $1, $2, $3, $4, $11, $6, $10, $13}' moods.combined.all.bed | bgzip -c > moods.combined.all.bed.gz
tabix -p bed moods.combined.all.bed.gz

Make the tracks for the full motif scans.

fetchChromSizes hg38 > /tmp/chrom.sizes
awk -v OFS="\t" '{ print $1, 0, $2; }' /tmp/chrom.sizes | sort-bed - > /tmp/chrom.sizes.bed
bedops -e 100% moods.combined.all.bed /tmp/chrom.sizes.bed \
| awk -v OFS="\t" '{ print $1, $2, $3, $4, 0, $6, $2, $3, "0,0,0", $5, $7 }' > /tmp/moods
bedToBigBed -type=bed9+2 -tab /tmp/moods chrom.sizes
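To make the bedops and awk steps above easier to follow, here is a small Python sketch of the same two operations: keep only matches that lie entirely within their chromosome, then rearrange the columns into the bed9+2 layout. The function names `in_bounds` and `to_bed9_plus2` and the sample record are hypothetical, and the sketch is only meant to document the column mapping, not to replace the command-line tools.

```python
def in_bounds(fields, chrom_sizes):
    """True if the interval lies fully inside a known chromosome
    (the role played by `bedops -e 100%` against the chrom.sizes BED)."""
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    return chrom in chrom_sizes and 0 <= start and end <= chrom_sizes[chrom]


def to_bed9_plus2(fields):
    """Mirror the awk column mapping; awk columns are 1-based, Python 0-based."""
    chrom, start, end, name, col5, strand, col7 = fields[:7]
    # score is set to 0, thickStart/thickEnd reuse start/end, itemRgb is black,
    # and the original columns 5 and 7 become the two extra fields.
    return [chrom, start, end, name, "0", strand, start, end, "0,0,0", col5, col7]


# Made-up chromosome sizes and records for illustration only.
chrom_sizes = {"chr1": 1000}
records = [
    ["chr1", "10", "20", "motifA", "7.5", "+", "ACGTACGTAC"],
    ["chr1", "995", "1005", "motifB", "6.1", "-", "TTGACGTCAA"],  # runs off the end
]
for fields in records:
    if in_bounds(fields, chrom_sizes):
        print("\t".join(to_bed9_plus2(fields)))
```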