Load FASTA files that contain DNA strings and process them for downstream tasks

kchu25/FastaLoader.jl


FastaLoader

This package provides routines that load the DNA sequences in a specified FASTA file and transform them into representations useful for downstream machine learning tasks, for example one-hot/WYK encoded vectors, k-mer-frequency-preserving shuffled sequences, Markov background estimates, and partitioned datasets for K-fold cross-validation (for FASTA files with labels). Currently, all sequences in the FASTA file must have the same length, and every sequence must be a string over the DNA alphabet {A,C,G,T}.
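To make the description above concrete, here is a minimal, language-neutral sketch in Python of three of the ideas it mentions: reading FASTA records, enforcing the equal-length and {A,C,G,T} constraints, and producing a one-hot encoding plus a zeroth-order background estimate. It is not the FastaLoader.jl API. Every function name here (`read_fasta`, `validate`, `one_hot`, `background_frequencies`) is hypothetical, and the A, C, G, T row order is an assumption made for illustration.

```python
# Illustrative sketch only: all names below are hypothetical and are NOT
# the FastaLoader.jl API.
ALPHABET = "ACGT"  # assumed column order for the one-hot encoding


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate(records):
    """Check equal lengths and the DNA alphabet; return the common length."""
    if not records:
        raise ValueError("no sequences found")
    length = len(records[0][1])
    for header, seq in records:
        if len(seq) != length:
            raise ValueError(f"{header}: length {len(seq)}, expected {length}")
        bad = set(seq) - set(ALPHABET)
        if bad:
            raise ValueError(f"{header}: invalid symbols {sorted(bad)}")
    return length


def one_hot(seq):
    """Encode a sequence as one 4-element row per base (order A, C, G, T)."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def background_frequencies(records):
    """Zeroth-order background: overall frequency of each nucleotide."""
    counts = {letter: 0 for letter in ALPHABET}
    for _, seq in records:
        for base in seq:
            counts[base] += 1
    total = sum(counts.values())
    return {letter: counts[letter] / total for letter in ALPHABET}
```

In this sketch, `validate` would be called once after `read_fasta` and before any encoding, so that later steps can rely on a fixed sequence length and a four-letter alphabet.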

Usage

Coming Soon
