protVec_100d_3grams.csv: ProtVec, word2vec for proteins, trained over Swiss-Prot (committed Jul 26, 2016)

Continuous Distributed Representation of Biological Sequences for Deep Genomics and Deep Proteomics

We introduce a new representation for biological sequences. We call the general representation bio-vectors (BioVec), with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences. This representation can be widely used in applications of deep learning in proteomics and genomics. Bio-vectors are skip-gram word vectors trained on character n-grams of biological sequences (DNA, RNA, and protein). In this work, we explore the biophysical and biochemical meaning of the resulting embedding space and demonstrate the strength of this sequence representation on a variety of bioinformatics tasks.
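As a rough illustration of the n-gram idea, the sketch below splits a sequence into 3-gram "words" and combines their vectors. It makes several assumptions that are not stated in this README: the helper names `shifted_ngrams` and `embed_sequence` are hypothetical, the sequence is read at each of the n offsets to produce non-overlapping n-grams, a sequence embedding is taken to be the sum of its n-gram vectors, and the two-dimensional toy vectors are made up (they are not values from `protVec_100d_3grams.csv`).

```python
from typing import Dict, List


def shifted_ngrams(sequence: str, n: int = 3) -> List[List[str]]:
    """Return n lists of non-overlapping n-grams, one list per reading offset."""
    return [
        [sequence[i:i + n] for i in range(offset, len(sequence) - n + 1, n)]
        for offset in range(n)
    ]


def embed_sequence(sequence: str, vectors: Dict[str, List[float]], n: int = 3) -> List[float]:
    """Sum the vectors of all overlapping n-grams; unknown n-grams are skipped."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for i in range(len(sequence) - n + 1):
        vec = vectors.get(sequence[i:i + n])
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


# Toy lookup table with invented 2-d vectors (hypothetical, for illustration only).
toy_vectors = {"MKV": [1.0, 2.0], "KVL": [0.5, 0.5]}

print(shifted_ngrams("MKVLAAG"))
# [['MKV', 'LAA'], ['KVL', 'AAG'], ['VLA']]
print(embed_sequence("MKVL", toy_vectors))
# [1.5, 2.5]
```

With real embeddings, `toy_vectors` would be replaced by a lookup built from the embedding file; the file's exact column layout should be checked before parsing it.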

  title={Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics},
  author={Asgari, Ehsaneddin and Mofrad, Mohammad RK},
  journal={PloS one},
  publisher={Public Library of Science}

[Figure: journal.pone.0141287.g002]
