Continuous Distributed Representation of Biological Sequences for Deep Genomics and Deep Proteomics

Update (July 2020): A more recent model trained on UniRef50 can be downloaded from the following link.

wget http://deepbio.info/uniref_embeddings.zip

We introduce a new representation for biological sequences. We call it bio-vectors (BioVec) when referring to biological sequences in general, protein-vectors (ProtVec) for proteins (amino-acid sequences), and gene-vectors (GeneVec) for gene sequences. This representation can be widely used in applications of deep learning in proteomics and genomics. Bio-vectors are essentially skip-gram word vectors trained on character n-grams of biological sequences (DNA, RNA, and protein). In this work, we explore the biophysical and biochemical meaning of this embedding space. In addition, we show the strength of this sequence representation in a variety of bioinformatics tasks.
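As a rough illustration of the "character n-gram" step, the sketch below splits a sequence into lists of non-overlapping n-grams, one list per starting offset, so that each list can be treated as a "sentence" of n-gram "words". The function name `shifted_ngrams`, the choice of n = 3, the offset scheme, and the example sequence are assumptions made for illustration; they are not taken from this repository's code.

```python
from typing import List


def shifted_ngrams(sequence: str, n: int = 3) -> List[List[str]]:
    """Split a sequence into n lists of non-overlapping n-grams.

    Hypothetical helper: one list is produced per starting offset
    (0 .. n-1). A trailing fragment shorter than n is dropped.
    """
    sentences = []
    for offset in range(n):
        # Non-overlapping windows of length n, starting at this offset.
        words = [
            sequence[i:i + n]
            for i in range(offset, len(sequence) - n + 1, n)
        ]
        sentences.append(words)
    return sentences


# Illustrative amino-acid string (made up for this example).
example = shifted_ngrams("MKTAYIAKQR", n=3)
for words in example:
    print(words)
```

Each resulting list could then be passed as one sentence to a skip-gram trainer, for example gensim's `Word2Vec` with `sg=1`, to obtain one vector per n-gram. Whether this matches the exact preprocessing behind the released embeddings is not stated here, so treat it as a starting point only.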

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141287

@article{asgari2015continuous,
  title={Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics},
  author={Asgari, Ehsaneddin and Mofrad, Mohammad RK},
  journal={PloS one},
  volume={10},
  number={11},
  pages={e0141287},
  year={2015},
  publisher={Public Library of Science}
}

[Figure: journal.pone.0141287.g002, from the PLOS ONE article]
