MEME output format
When the option -o is used, PEnG-motif writes the motifs it finds in MEME Motif Format.
After a short header, motifs are represented as PWMs and sorted by estimated p-value. The first motif has the highest significance (i.e. the lowest log p-value).
MOTIF RRGATGASTCAT
letter-probability matrix: alength= 4 w= 12 nsites= 41824 bg_prob= 0.00000000 opt_bg_order= 2 log(Pval)= -5292.73730469
0.33215398 0.06993207 0.34203890 0.25587505
0.31801489 0.08812791 0.34426886 0.24958830
0.26473501 0.09318515 0.47870308 0.16337676
0.58294821 0.11465897 0.30030057 0.00209219
0.00000202 0.00001670 0.00000204 0.99997920
0.00002015 0.00000858 0.99172789 0.00824337
0.99998569 0.00000308 0.00000253 0.00000867
0.01724115 0.51645988 0.42893052 0.03736848
0.00001135 0.00000201 0.00001879 0.99996781
0.03279394 0.96710050 0.00002284 0.00008276
0.99999911 0.00000012 0.00000017 0.00000061
0.00390841 0.29186615 0.18110542 0.52311999
In MEME format, each motif consists of three parts, which are discussed below.
The keyword MOTIF starts a new motif entry and, in PEnG-motif, is always followed by the IUPAC representation of the PWM.
MOTIF RRGATGASTCAT
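As an illustration of how the IUPAC string on the MOTIF line can be used, the sketch below turns it into a regular expression for scanning sequences. The helper names `motif_name` and `iupac_to_regex` are hypothetical and not part of PEnG-motif; the mapping is the standard IUPAC nucleotide code table. Keep in mind that the IUPAC string is only a coarse summary of the PWM and carries no probabilities.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_name(line):
    """Return the IUPAC string from a 'MOTIF <name>' line (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "MOTIF":
        raise ValueError("not a MOTIF line: %r" % line)
    return fields[1]


def iupac_to_regex(iupac):
    """Translate an IUPAC string into a regular expression (hypothetical helper)."""
    parts = []
    for symbol in iupac.upper():
        bases = IUPAC[symbol]
        # Unambiguous bases stay as they are; ambiguous ones become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


pattern = iupac_to_regex(motif_name("MOTIF RRGATGASTCAT"))
print(pattern)  # [AG][AG]GATGA[CG]TCAT
print(bool(re.fullmatch(pattern, "AGGATGACTCAT")))  # True
```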
The second line gives additional information about the PWM:
letter-probability matrix: alength= 4 w= 12 nsites= 41824 bg_prob= 0.00000000 opt_bg_order= 2 log(Pval)= -5292.73730469
0.33215398 0.06993207 0.34203890 0.25587505
Depending on how you ran PEnG-motif, this line can contain the following attributes:
- alength: size of the base alphabet, always 4
- w: length of the motif
- nsites: estimated number of sites in the data
- bg_prob: estimated background probability of finding the motif by chance
- opt_bg_order: order of the homogeneous Markov model used for estimating the background probabilities
- log(Pval): estimated log p-value of the motif, derived from a Poisson statistic
- zoops_score: score between 0 and 1 indicating how well the PWM distinguishes true sequences from randomly generated sequences. A low zoops_score indicates poor model performance.
Please note: For long merged motifs, take the nsites and log(Pval) attributes with a grain of salt. PEnG-motif infers these values from the base patterns, which may not always be accurate.
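The annotation line can be read into a dictionary with a few lines of Python. This is a minimal sketch that assumes every attribute is written as `key= value`, as in the example above; the function name `parse_matrix_header` is hypothetical and not part of PEnG-motif. Attributes that are absent from a given run (for example zoops_score) are simply missing from the result.

```python
import re

PREFIX = "letter-probability matrix:"


def parse_matrix_header(line):
    """Parse the annotation line of a motif into a dict of attribute -> number."""
    line = line.strip()
    if not line.startswith(PREFIX):
        raise ValueError("not a letter-probability matrix line")
    # Keys consist of word characters and parentheses, e.g. 'log(Pval)'.
    pairs = re.findall(r"([\w()]+)=\s*(\S+)", line[len(PREFIX):])
    attributes = {}
    for key, value in pairs:
        try:
            attributes[key] = int(value)    # e.g. alength, w, nsites
        except ValueError:
            attributes[key] = float(value)  # e.g. bg_prob, log(Pval)
    return attributes


header = ("letter-probability matrix: alength= 4 w= 12 nsites= 41824 "
          "bg_prob= 0.00000000 opt_bg_order= 2 log(Pval)= -5292.73730469")
info = parse_matrix_header(header)
print(info["w"], info["nsites"], info["log(Pval)"])
```

Using `info.get("zoops_score")` instead of direct indexing avoids a KeyError for runs that did not produce that attribute.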
The lines following the annotation give the PWM for the motif.
Each row is one position in the PWM; the four columns represent A, C, G and T, respectively.
0.33215398 0.06993207 0.34203890 0.25587505
0.31801489 0.08812791 0.34426886 0.24958830
0.26473501 0.09318515 0.47870308 0.16337676
0.58294821 0.11465897 0.30030057 0.00209219
0.00000202 0.00001670 0.00000204 0.99997920
0.00002015 0.00000858 0.99172789 0.00824337
0.99998569 0.00000308 0.00000253 0.00000867
0.01724115 0.51645988 0.42893052 0.03736848
0.00001135 0.00000201 0.00001879 0.99996781
0.03279394 0.96710050 0.00002284 0.00008276
0.99999911 0.00000012 0.00000017 0.00000061
0.00390841 0.29186615 0.18110542 0.52311999
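To make the row and column layout concrete, the following sketch reads the matrix rows shown above, checks that each row is a probability distribution, and reports the most probable base at each position. The function names `parse_pwm` and `most_likely_sequence` and the rounding tolerance of 1e-3 are assumptions made for this example, not part of PEnG-motif.

```python
ALPHABET = "ACGT"  # column order of the matrix

ROWS = """\
0.33215398 0.06993207 0.34203890 0.25587505
0.31801489 0.08812791 0.34426886 0.24958830
0.26473501 0.09318515 0.47870308 0.16337676
0.58294821 0.11465897 0.30030057 0.00209219
0.00000202 0.00001670 0.00000204 0.99997920
0.00002015 0.00000858 0.99172789 0.00824337
0.99998569 0.00000308 0.00000253 0.00000867
0.01724115 0.51645988 0.42893052 0.03736848
0.00001135 0.00000201 0.00001879 0.99996781
0.03279394 0.96710050 0.00002284 0.00008276
0.99999911 0.00000012 0.00000017 0.00000061
0.00390841 0.29186615 0.18110542 0.52311999
""".splitlines()


def parse_pwm(rows):
    """Turn matrix rows into a list of {base: probability} dicts, one per position."""
    pwm = []
    for row in rows:
        probs = [float(field) for field in row.split()]
        if len(probs) != len(ALPHABET):
            raise ValueError("expected %d columns: %r" % (len(ALPHABET), row))
        # Each position should sum to 1 up to rounding (assumed tolerance).
        if abs(sum(probs) - 1.0) > 1e-3:
            raise ValueError("row does not sum to 1: %r" % row)
        pwm.append(dict(zip(ALPHABET, probs)))
    return pwm


def most_likely_sequence(pwm):
    """Return the sequence made of the most probable base at each position."""
    return "".join(max(position, key=position.get) for position in pwm)


pwm = parse_pwm(ROWS)
print(len(pwm))                   # 12, matching w= 12 in the annotation line
print(most_likely_sequence(pwm))  # GGGATGACTCAT
```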