
Meme output format

Christian Roth edited this page Jan 3, 2018 · 1 revision

PEnG-motif's MEME output

When using the option -o, PEnG-motif will write out the found motifs in MEME Motif Format.

After a short header, motifs are represented as PWMs and sorted by estimated p-value. The first motif has the highest significance (= lowest log p-value).
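Because log(Pval) is the logarithm of a p-value, a more negative value means a more significant motif. The following minimal sketch illustrates that ordering. Only the RRGATGASTCAT entry is taken from this page; the other two names and values are made up for illustration.

```python
# (motif ID, log(Pval)) pairs. Only the RRGATGASTCAT entry comes from this
# page; the other two entries are hypothetical values for illustration.
motifs = [
    ("HYPOTHETICAL_B", -12.5),
    ("RRGATGASTCAT", -5292.73730469),
    ("HYPOTHETICAL_A", -310.0),
]

# Sorting by ascending log p-value puts the most significant motif first.
ranked = sorted(motifs, key=lambda entry: entry[1])

for name, log_pval in ranked:
    print(name, log_pval)
```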

Example of a motif in MEME format

MOTIF RRGATGASTCAT
letter-probability matrix: alength= 4 w= 12 nsites= 41824 bg_prob= 0.00000000 opt_bg_order= 2 log(Pval)= -5292.73730469
0.33215398 0.06993207 0.34203890 0.25587505
0.31801489 0.08812791 0.34426886 0.24958830
0.26473501 0.09318515 0.47870308 0.16337676
0.58294821 0.11465897 0.30030057 0.00209219
0.00000202 0.00001670 0.00000204 0.99997920
0.00002015 0.00000858 0.99172789 0.00824337
0.99998569 0.00000308 0.00000253 0.00000867
0.01724115 0.51645988 0.42893052 0.03736848
0.00001135 0.00000201 0.00001879 0.99996781
0.03279394 0.96710050 0.00002284 0.00008276
0.99999911 0.00000012 0.00000017 0.00000061
0.00390841 0.29186615 0.18110542 0.52311999

Detailed description of the motif representation in PEnG-motif

In MEME format, each motif consists of three parts, which are discussed below.

1) Motif ID

The keyword MOTIF starts a new motif entry. In PEnG-motif it is always followed by the IUPAC representation of the PWM.

MOTIF RRGATGASTCAT
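As a minimal sketch, the ID can be read from the MOTIF line and expanded with the standard IUPAC nucleotide codes to see which bases each position allows. The function names `read_motif_id` and `expand_iupac` are hypothetical helpers for this example and are not part of PEnG-motif.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def read_motif_id(line):
    """Return the ID that follows the MOTIF keyword (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "MOTIF":
        raise ValueError("not a MOTIF line: %r" % line)
    return fields[1]

def expand_iupac(motif_id):
    """Return, per position, the bases allowed by the IUPAC symbol."""
    return [IUPAC[symbol] for symbol in motif_id]

motif_id = read_motif_id("MOTIF RRGATGASTCAT")
allowed = expand_iupac(motif_id)
print(motif_id, len(allowed))   # ID and number of positions
print(allowed[0], allowed[7])   # bases allowed at positions 1 and 8
```

In the example motif, R at the first position stands for A or G, and S at the eighth position stands for C or G.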

2) Additional motif annotation

The second line gives additional information about the PWM:

letter-probability matrix: alength= 4 w= 12 nsites= 41824 bg_prob= 0.00000000 opt_bg_order= 2 log(Pval)= -5292.73730469

Depending on how you ran PEnG-motif, the annotation line can contain the following attributes:

  • alength - size of the base alphabet, always 4
  • w - length of the motif
  • nsites - estimated number of sites in the data
  • bg_prob - estimated background probability to find the motif by chance
  • opt_bg_order - order of the homogeneous Markov model used for estimating the background probabilities
  • log(Pval) - estimated log p-value of the motif, derived from a Poisson statistic
  • zoops_score - score between 0 and 1, indicating how well the PWM can distinguish true sequences from randomly generated sequences. A low zoops_score indicates poor model performance.

Please note: For long merged motifs, please take the nsites and log(Pval) attributes with a grain of salt. PEnG-motif infers these values from the base patterns, which may not always be accurate.
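The annotation line can be split into its `key= value` pairs with a short parser. This is a sketch that assumes every attribute follows that pattern, as in the example line above; `parse_annotation` is a hypothetical helper name. Attributes that were not written, such as zoops_score here, are simply absent from the result.

```python
import re

PREFIX = "letter-probability matrix:"

def _to_number(text):
    """Convert to int where possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)

def parse_annotation(line):
    """Parse 'key= value' pairs from the annotation line into a dict."""
    if not line.startswith(PREFIX):
        raise ValueError("not a letter-probability matrix line")
    # Assumption: each attribute is written as 'key= value'.
    pairs = re.findall(r"(\S+)=\s*(\S+)", line[len(PREFIX):])
    return {key: _to_number(value) for key, value in pairs}

line = ("letter-probability matrix: alength= 4 w= 12 nsites= 41824 "
        "bg_prob= 0.00000000 opt_bg_order= 2 log(Pval)= -5292.73730469")
attrs = parse_annotation(line)
print(attrs["w"], attrs["nsites"], attrs["log(Pval)"])
print("zoops_score" in attrs)
```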

3) PWM

The lines following the annotation give the PWM for the motif.

Each row is one position in the PWM. The four columns represent A, C, G and T, respectively.

0.33215398 0.06993207 0.34203890 0.25587505
0.31801489 0.08812791 0.34426886 0.24958830
0.26473501 0.09318515 0.47870308 0.16337676
0.58294821 0.11465897 0.30030057 0.00209219
0.00000202 0.00001670 0.00000204 0.99997920
0.00002015 0.00000858 0.99172789 0.00824337
0.99998569 0.00000308 0.00000253 0.00000867
0.01724115 0.51645988 0.42893052 0.03736848
0.00001135 0.00000201 0.00001879 0.99996781
0.03279394 0.96710050 0.00002284 0.00008276
0.99999911 0.00000012 0.00000017 0.00000061
0.00390841 0.29186615 0.18110542 0.52311999
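To illustrate the row and column layout, the sketch below reads the rows above into a list of probability rows, checks that each row sums to about 1, and reports the most probable base per position. The helper names are hypothetical. The most-probable-base string is not the same as the IUPAC ID: it ignores ambiguity, and this sketch makes no claim about how PEnG-motif derives the IUPAC representation.

```python
ALPHABET = "ACGT"  # column order: A, C, G, T

def parse_pwm(rows):
    """Turn whitespace-separated probability rows into a list of lists."""
    pwm = []
    for row in rows:
        probs = [float(value) for value in row.split()]
        if len(probs) != len(ALPHABET):
            raise ValueError("expected 4 columns, got %d" % len(probs))
        pwm.append(probs)
    return pwm

def most_likely_bases(pwm):
    """Return the highest-probability base at each position."""
    return "".join(ALPHABET[row.index(max(row))] for row in pwm)

rows = """\
0.33215398 0.06993207 0.34203890 0.25587505
0.31801489 0.08812791 0.34426886 0.24958830
0.26473501 0.09318515 0.47870308 0.16337676
0.58294821 0.11465897 0.30030057 0.00209219
0.00000202 0.00001670 0.00000204 0.99997920
0.00002015 0.00000858 0.99172789 0.00824337
0.99998569 0.00000308 0.00000253 0.00000867
0.01724115 0.51645988 0.42893052 0.03736848
0.00001135 0.00000201 0.00001879 0.99996781
0.03279394 0.96710050 0.00002284 0.00008276
0.99999911 0.00000012 0.00000017 0.00000061
0.00390841 0.29186615 0.18110542 0.52311999
""".splitlines()

pwm = parse_pwm(rows)
# Each row is a probability distribution over A, C, G and T.
row_sums_ok = all(abs(sum(row) - 1.0) < 1e-4 for row in pwm)
print(len(pwm), row_sums_ok)
print(most_likely_bases(pwm))
```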