
A comprehensive evaluation of embedding models for RNAs with analysis.


semolnahali/BioSeqVec


BioSeqVec

RNA Sequence Representation Learning

With the continuous advancement of high-throughput sequencing technology, the volume of biological sequence data carrying vital biological information has increased substantially. This data has become an indispensable resource for disease detection, mechanistic analysis, and drug discovery, a role that was especially evident during the COVID-19 pandemic. At the same time, recent advances in Natural Language Processing (NLP) have sparked substantial interest in representation learning across diverse domains.

Efforts are underway to improve downstream performance while reducing training costs. To make sense of biological sequence data at scale, representation learning has emerged as a key approach for embedding and extracting features from large collections of biological sequences. This README gives an overview of the code accompanying our survey paper on representation learning techniques and models designed for RNA sequence data.

Key Features of the Code

Embedding Task: The code addresses the task of embedding RNA sequences, outlining challenges such as scalability, the choice of embedding dimensionality, and the preservation of biologically relevant features, along with potential solutions.

Model Categorization: We systematically categorize and summarize popular models employed for handling biological sequences.

State-of-the-Art Evaluation: The code includes methods for evaluating state-of-the-art models, providing insights into their performance.

Applications and Future Directions: We discuss potential applications and future research directions in the field of biological sequence representation learning.

BioSeqVec Library: To support further research in this domain, we introduce an open-source Python library, BioSeqVec. This library consolidates all the algorithms discussed in our survey paper, offering a unified interface for researchers.
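To make the embedding task concrete, here is a minimal sketch of one of the simplest RNA representations, a normalized k-mer frequency vector. It is an illustration only, not the BioSeqVec API and not necessarily one of the models evaluated in the survey; the function name `kmer_embedding` and its parameters are hypothetical. It also shows the dimensionality trade-off mentioned above: the vector length grows as 4^k with the k-mer size k.

```python
from itertools import product


def kmer_embedding(seq: str, k: int = 3) -> list[float]:
    """Hypothetical helper: map an RNA sequence to a normalized k-mer
    frequency vector of length 4**k (illustrative, not the BioSeqVec API)."""
    alphabet = "ACGU"
    # Fixed vocabulary: all 4**k k-mers in lexicographic order (AA..A, AA..C, ...)
    vocab = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0.0] * len(vocab)

    # Accept DNA-style input by mapping T to U
    seq = seq.upper().replace("T", "U")

    total = 0
    for i in range(len(seq) - k + 1):
        idx = vocab.get(seq[i:i + k])
        if idx is not None:  # skip k-mers containing ambiguous bases such as N
            counts[idx] += 1
            total += 1

    # Normalize to frequencies so sequences of different lengths are comparable
    return [c / total for c in counts] if total else counts


vec = kmer_embedding("AUGGCUAGC", k=3)
print(len(vec))  # 64 = 4**3 dimensions
```

Raising k preserves more local sequence context but makes the vector exponentially longer and sparser, which is one concrete form of the scalability and dimensionality challenges that learned embeddings aim to address.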

For detailed information and instructions on using this code, please refer to the documentation in the respective sections of the repository. Feel free to explore and use this codebase for your research and practical applications in RNA sequence representation learning.

Note: If you use this code or find it useful in your work, please consider citing our survey paper.

Paper Reference:
