eead-csic-compbio/EEADannot

EEADannot

This repository holds the scripts and data files used for the manual curation of DNA motifs and cis-regulatory sites, mostly from plants. The curated motifs are eventually added as a separate library/collection to the footprintDB database, which is also available at RSAT::Plants; both are part of the INB/ELIXIR-ES resources portfolio.

Citation

Contreras-Moreira B, Sebastian A. FootprintDB: Analysis of Plant Cis-Regulatory Elements, Transcription Factors, and Binding Interfaces. Methods Mol Biol. 2016; 1482:259-77. doi: 10.1007/978-1-4939-6396-6_17. PMID: 27557773.

Example motif in TRANSFAC format

MO  EREBP1
NA  Os02g54160.1
P0      A      C      G      T
01   1.00   0.00   0.00   0.00
02   0.00   0.00   1.00   0.00
03   0.00   1.00   0.00   0.00
04   0.50   0.50   0.00   0.00
05   0.00   0.00   1.00   0.00
06   0.00   0.50   0.50   0.00
07   0.00   1.00   0.00   0.00
RX  PUBMED:23703395
RL  Serra TS et al (2013) OsRMC, a negative regulator of salt stress response in rice...Plant Mol Biol 82(4-5): 439-455
RX  PUBMED:12913152
RL  Cheong YH et al (2003) BWMK1, a rice mitogen-activated protein kinase... Plant Physiol 132(4):1961-72
XX
FA  EREBP1
NA  Ethylene-responsive transcription factor 1; EREBP1; OsEREBP1; ERF1_ORYSJ; Q6K7E6; Q9SE28
SQ  MCGGAIIHHLKGHPEGSRRATEGLLWPEKKKPRWGGGGRRHFGGFVEEDDEDFEADFEEFEVDSGDSDLELGEEDDDDVVEI...
OS  Oryza sativa
CC  family:AP2/ERF
XX
SI  EREBP1_1
SQ  gAGCCGCCa
XX
SI  EREBP1_2
SQ  gAGCAGGCa
XX
//
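As a reading aid for the record above, here is a minimal Python sketch that pulls the frequency matrix out of a TRANSFAC-style record and summarizes the preferred base(s) at each position. It is not part of this repository: the function names `parse_matrix` and `summarize` are hypothetical, and the sketch assumes that matrix rows start with a numeric position label and follow a `P0` header line, as in the example.

```python
# Hypothetical helper, not part of EEADannot: reads the matrix of a
# TRANSFAC-style record such as the EREBP1 example above.
RECORD = """\
MO  EREBP1
NA  Os02g54160.1
P0      A      C      G      T
01   1.00   0.00   0.00   0.00
02   0.00   0.00   1.00   0.00
03   0.00   1.00   0.00   0.00
04   0.50   0.50   0.00   0.00
05   0.00   0.00   1.00   0.00
06   0.00   0.50   0.50   0.00
07   0.00   1.00   0.00   0.00
XX
"""

def parse_matrix(record):
    """Return one {base: frequency} dict per matrix row after the P0 line."""
    columns, rows = [], []
    for line in record.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "P0":
            # Header line names the columns (A, C, G, T in the example)
            columns = fields[1:]
        elif columns and fields[0].isdigit():
            # Assumption: matrix rows begin with a numeric position label
            rows.append(dict(zip(columns, map(float, fields[1:]))))
    return rows

def summarize(rows):
    """Write the top-scoring base per position; ties go in brackets."""
    parts = []
    for row in rows:
        best = max(row.values())
        bases = [base for base, freq in row.items() if freq == best]
        parts.append(bases[0] if len(bases) == 1 else "[" + "".join(bases) + "]")
    return "".join(parts)

rows = parse_matrix(RECORD)
print(summarize(rows))  # AGC[AC]G[CG]C
```

The bracketed positions correspond to the two annotated sites, whose cores read AGCCGCC and AGCAGGC.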

Production steps

  • Add new TFs to TFsequences.faa.

    • Make sure 'FullName' has no blanks.
  • Add new motifs to PWM.tab.

    • In the name line, the first word is the motif name (no spaces) and the second is one or more TF names separated by commas [,].

    • Make sure separators among weights/columns are TABs.

    • To convert MEME/HOMER motifs you can use a one-liner such as:

      perl -ane 'next if(/^>/ || /^#/); $f++; for $c (1 .. @F){ $data[$f][$c]=$F[$c-1] }; $maxc=@F if(@F>$maxc); END{ for $c (1 .. $maxc){ for $ff (1 .. $f){ printf("%1.3f\t",$data[$ff][$c]) } print "\n"} }' motif.meme

    • If you need to convert a PNG logo, see https://www.biostars.org/p/9537076/#9607522 and then run:

      perl -lne 'if(/^([ACGT])/){ s/^([ACGT])/$1:/; tr/AT/TA/; s/(\d+\.\d{3})\d+/$1/g; push(@pwm,"$_\n")} END{ print sort(@pwm)}' enoLogo.txt

  • Add individual sites, if any, to sites.tab.

  • Add new papers to references.tab.

    • Make sure 1st field matches a motif name in PWM.tab.
    • Second field is one or more TF names/primary keys separated by commas [,].
    • Third field is PubMed id.
  • Finally, format the library in footprintDB format (the script will try to use Plant-TFClass to assign TFs to families):

    $ perl create_library4footprintdb.pl
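For readers who prefer Python, the MEME/HOMER conversion step above can be sketched as follows. This is an illustrative equivalent of the Perl one-liner, not a script shipped with the repository: `transpose_motif` is a hypothetical name, and the sketch assumes the input holds only header lines starting with `>` or `#` plus whitespace-separated numeric rows, one row per motif position. Unlike the one-liner, it skips blank lines and does not emit a trailing TAB.

```python
# Hypothetical Python counterpart of the Perl one-liner: it transposes a
# positions-by-nucleotides matrix into one TAB-separated row per nucleotide.
def transpose_motif(lines):
    rows = []
    for line in lines:
        line = line.strip()
        # Skip header/comment lines (">" or "#"); blank lines are skipped
        # too, which is an assumption of this sketch.
        if not line or line.startswith((">", "#")):
            continue
        rows.append([float(value) for value in line.split()])
    if not rows:
        return []
    width = max(len(row) for row in rows)
    out = []
    for c in range(width):
        # Cells missing from short rows are written as 0.000
        column = [row[c] if c < len(row) else 0.0 for row in rows]
        out.append("\t".join(f"{value:.3f}" for value in column))
    return out

# Made-up three-position motif, for illustration only
example = """>demo_motif
1.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0
0.0 1.0 0.0 0.0
""".splitlines()

for row in transpose_motif(example):
    print(row)
```

Each printed line holds the weights of one input column across all positions, separated by TABs, which is the separator the PWM.tab step asks for.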
