New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roadmap / Todos #3

Open
johanneskoester opened this Issue Feb 22, 2015 · 58 comments

Comments

Projects
None yet
@johanneskoester
Contributor

johanneskoester commented Feb 22, 2015

This is a continuously updated list of Todos. If you have a suggestion, comment to the issue. I will update the list if the suggestion fits in.
Please note that none of the items will be guaranteed to be implemented. If you want to make sure something is done, consider to contribute the code to Rust-Bio, any help is welcome! Of course you will be listed as one of the authors.

If you want to implement an item, please post it in this thread and I will mark it in the list so that we don't duplicate work.

Changes to current code

  • make SAIS implementation generic over the used alphabet data type and dynamically choose the right type depending on the number of characters. This will further reduce memory usage.
  • investigate no_mangle flag to allow easy integration of Rust-Bio into e.g. Python via ctypes.

New code

  • add indexed FASTA reader based on the seek method of current fasta reader and https://github.com/BurntSushi/rust-csv
  • allow to call current pairwise alignment implementation without traceback for distance computation (hamming, edit, affine)
  • add common score matrices (e.g. PAM250, BLOSUM62 for amino acids) _[@rilut]*
  • add (gapped) q-gram/kmer implementation for counting and index building
  • add bed reader
  • add gff3 reader
  • add gtf/gff !=3 reader based on https://github.com/BurntSushi/rust-csv [@natir]
  • Hidden-Markov-Models (Viterby, Baum-Welch, ...)
  • add Myers Algorithm for approximate pattern matching
  • basic NGS statistics (qual distribution, nucleotide distribution)?
  • add amino acid alphabet to bio::alphabets
  • add multiple sequence alignment computation (partial order alignment for now) [@bnbowman]
  • provide an interval tree, segment tree or clustertree implementation
  • add parallelization examples
  • Add speed benchmarks for all algorithms (right now we have them for pattern matching and SAIS).
  • Add mate-pair FASTQ reader (using the zip method).
  • SDP algorithms for sparse alignment [@bnbowman]
  • Add tabix interface to Rust-Htslib
  • de Bruijn Graph implementation and related algorithms [@rob-p]
  • Overlap/String graphs
  • graph genome support (see work by Veli Mäkkinen, Richard Durbin et al.)
  • BCALM algorithm (@bryceperkins)
  • rewrite gff/gtf parser (and other parsers) using nom in order to support comments (see #115). Look at the ideas of BioJulia.

@johanneskoester johanneskoester changed the title from TODO to Roadmap / Todos Feb 22, 2015

@vadimnazarov

This comment has been minimized.

Show comment
Hide comment
@vadimnazarov

vadimnazarov Feb 22, 2015

Member

Hello! I've been thinking what could be useful for generic library for bioinformatics tasks. This is what I thought:

  1. Add to your "HMM" point Baum-Welch and Viterbi algorithms implementation just to clarify what should be done in particular.
  2. Add simple hamming and levenshtein distance, so they could be effortlessly called with just one function like "hamming(seq1, seq2);".
    2.1. Or/and more general - add a data structure for fast search of sequences with specified number of indels and mismatches like this one http://mitcr.milaboratory.com/site/apidocs/index.html?com/milaboratory/core/sequence/tree/SequenceTreeMap.html
    2.2. Add efficient by space implementation of pairwise alignment.
  3. Add various BLOSUM??? and other matrices for alignments.
  4. Add wrappers and parsers of output for various command line tools (like bwa or clustalw), on-line services (BLAST) and databases (NCBI).
  5. Various phylogenetic stuff, but, sadly, I know nothing about it.
  6. Add functions for basic NGS data processing - read trimming, quality filtering, parsing of NGS .fastq mate files (and mate pairing). Everything for such grinding tasks, just to give user ability to concentrate on research not only on processing huge NGS data by hands.
    6.1. Basic statistics for NGS data like quality distributions, nucleotide-by-position distributions and others.
  7. Add functions for kmer generating, statistics like most frequent kmers.
  8. Add functions for sequence profiles like simple profiles (by position, like in [6]), probably HMM profiles.
  9. Parallelisation for all. Because Rust. (:

I think that it would be a good idea to see what other general bioinformatics libraries like BioPython have implemented. I hope I will be able to check this in a couple of days and add more suggestions and/or specify what I wrote higher.

Also, probably the project's name "Rust-Bio, a rusty bioinformatics library" is a bit un-easy to understand at first glance. Googling "rust for bioinformatics" don't give you a link to this repo, and I think that that search query is a more frequent one than other possible queries. May I suggest as a title something like "Rust-Bio, Rust for Bioinformatics", and after that, in description, "Rust module for general bioinformatics tasks"? Probably, "BioRust" would be the best possible name because of various other libraries like BioPython, BioJava and BioRuby, but it has been already claimed: https://github.com/hirokai/BioRust and https://github.com/biorust .

Feel free to comment and criticise my suggestions, I'd be glad to discuss todos.

Member

vadimnazarov commented Feb 22, 2015

Hello! I've been thinking what could be useful for generic library for bioinformatics tasks. This is what I thought:

  1. Add to your "HMM" point Baum-Welch and Viterbi algorithms implementation just to clarify what should be done in particular.
  2. Add simple hamming and levenshtein distance, so they could be effortlessly called with just one function like "hamming(seq1, seq2);".
    2.1. Or/and more general - add a data structure for fast search of sequences with specified number of indels and mismatches like this one http://mitcr.milaboratory.com/site/apidocs/index.html?com/milaboratory/core/sequence/tree/SequenceTreeMap.html
    2.2. Add efficient by space implementation of pairwise alignment.
  3. Add various BLOSUM??? and other matrices for alignments.
  4. Add wrappers and parsers of output for various command line tools (like bwa or clustalw), on-line services (BLAST) and databases (NCBI).
  5. Various phylogenetic stuff, but, sadly, I know nothing about it.
  6. Add functions for basic NGS data processing - read trimming, quality filtering, parsing of NGS .fastq mate files (and mate pairing). Everything for such grinding tasks, just to give user ability to concentrate on research not only on processing huge NGS data by hands.
    6.1. Basic statistics for NGS data like quality distributions, nucleotide-by-position distributions and others.
  7. Add functions for kmer generating, statistics like most frequent kmers.
  8. Add functions for sequence profiles like simple profiles (by position, like in [6]), probably HMM profiles.
  9. Parallelisation for all. Because Rust. (:

I think that it would be a good idea to see what other general bioinformatics libraries like BioPython have implemented. I hope I will be able to check this in a couple of days and add more suggestions and/or specify what I wrote higher.

Also, probably the project's name "Rust-Bio, a rusty bioinformatics library" is a bit un-easy to understand at first glance. Googling "rust for bioinformatics" don't give you a link to this repo, and I think that that search query is a more frequent one than other possible queries. May I suggest as a title something like "Rust-Bio, Rust for Bioinformatics", and after that, in description, "Rust module for general bioinformatics tasks"? Probably, "BioRust" would be the best possible name because of various other libraries like BioPython, BioJava and BioRuby, but it has been already claimed: https://github.com/hirokai/BioRust and https://github.com/biorust .

Feel free to comment and criticise my suggestions, I'd be glad to discuss todos.

@parir

This comment has been minimized.

Show comment
Hide comment
@parir

parir Feb 22, 2015

Contributor

Some more suggestions:

  • Add multiple alignments to bio::alignment.
  • Add the amino acid alphabet to bio::alphabets.
  • Some common AA substitution matrices like PAM250 or BLOSUM62 would be nice! :)
Contributor

parir commented Feb 22, 2015

Some more suggestions:

  • Add multiple alignments to bio::alignment.
  • Add the amino acid alphabet to bio::alphabets.
  • Some common AA substitution matrices like PAM250 or BLOSUM62 would be nice! :)
@vadimnazarov

This comment has been minimized.

Show comment
Hide comment
@vadimnazarov

vadimnazarov Feb 22, 2015

Member

@parir Thanks! (: What algorithms for multiple alignments do you think would be the most useful?

Member

vadimnazarov commented Feb 22, 2015

@parir Thanks! (: What algorithms for multiple alignments do you think would be the most useful?

@parir

This comment has been minimized.

Show comment
Hide comment
@parir

parir Feb 22, 2015

Contributor

I agree with @vd-n. It would be good to look into the other Bio[programming language of your choice] frameworks to figure out common API design principles.

Contributor

parir commented Feb 22, 2015

I agree with @vd-n. It would be good to look into the other Bio[programming language of your choice] frameworks to figure out common API design principles.

@parir

This comment has been minimized.

Show comment
Hide comment
@parir

parir Feb 22, 2015

Contributor

@vd-n Tough question! I haven't worked with sequence alignments in the last 3-4 years, so I don't know the current state of the art algorithms. Back then Muscle, MAFFT and ProbCons where supposed to be pretty good. ClustalW could also be included, because it's the classic one.

Contributor

parir commented Feb 22, 2015

@vd-n Tough question! I haven't worked with sequence alignments in the last 3-4 years, so I don't know the current state of the art algorithms. Back then Muscle, MAFFT and ProbCons where supposed to be pretty good. ClustalW could also be included, because it's the classic one.

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Feb 22, 2015

Contributor

Thank you, guys!
I have included most of your suggestions above.
Regarding parallelization, I would like to leave it out of the API right now, and let the user decide by invoking functions within threads. However, there might be cases where this does not make sense, like multiple alignments. We can see how to update the paradigm later then.

Contributor

johanneskoester commented Feb 22, 2015

Thank you, guys!
I have included most of your suggestions above.
Regarding parallelization, I would like to leave it out of the API right now, and let the user decide by invoking functions within threads. However, there might be cases where this does not make sense, like multiple alignments. We can see how to update the paradigm later then.

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Jul 14, 2015

Something similar to bx-pythons clustertrees and intervaltrees, perhaps? Simple, but indispensable...

ghost commented Jul 14, 2015

Something similar to bx-pythons clustertrees and intervaltrees, perhaps? Simple, but indispensable...

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Jul 15, 2015

Contributor

Very good idea, I have added it to the list.

Contributor

johanneskoester commented Jul 15, 2015

Very good idea, I have added it to the list.

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Jul 16, 2015

An extremely common use case I had in Python was turning a bed file into a dict of cluster or intervaltrees, where the keys were chromosomes or chromosome/strand tuples. Might be nice to have.

Of course this is trivial to code up oneself - is Rust-bio more about implementing the difficult stuff or also having convenience functions?

ghost commented Jul 16, 2015

An extremely common use case I had in Python was turning a bed file into a dict of cluster or intervaltrees, where the keys were chromosomes or chromosome/strand tuples. Might be nice to have.

Of course this is trivial to code up oneself - is Rust-bio more about implementing the difficult stuff or also having convenience functions?

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Jul 16, 2015

Contributor

The key idea is to have a toolbox for anything that reoccurs when doing bioinformatics in Rust. In this sense, your suggestion fits perfectly.

Contributor

johanneskoester commented Jul 16, 2015

The key idea is to have a toolbox for anything that reoccurs when doing bioinformatics in Rust. In this sense, your suggestion fits perfectly.

@vadimnazarov

This comment has been minimized.

Show comment
Hide comment
@vadimnazarov

vadimnazarov Aug 4, 2015

Member

Hi everyone,
i have a couple of questions and suggestions regarding Rust-Bio.

  1. do I need to somehow "take" a task and say it somewhere so no one else will take it and therefore no parallel work will be done?
  2. after careful thinking I think that implementing multiple alignment algorithms, clones of bowtie, bwa-mem, BLAST or other widely used software isn't very practical. I think the best way would be is to have an interface to run this software tools with passed arguments and parse the output of them. What is your opinion?
  3. GHMM seems to be dead. https://github.com/dichodaemon/ghmm Also it doesn't have profile-HMM, which are also very useful. But profile-HMM has been implemented here: http://hmmer.janelia.org .
  4. suggestion: speed benchmarks in comparing to other Bio* software packages. It would be great if Rust-Bio would have a greater speed than most or even all of them.
  5. suggestion: a point by point comparison of Rust-Bio to other Bio* software packages.
  6. suggestion: a couple of clustering algorithms, e.g., hierarchical clustering, means.
  7. I think if Rust-Bio is a general bioinformatics library, then it would great if it would have a rust-htslib as a dependency, so user will get all necessary stuff at once, and won't be loading endless additional packages.
  8. suggestion: transform alphabets into singletons and / or give them simple aliases (macro), because this:
let alphabet = alphabets::dna::iupac_alphabet();
let pos = suffix_array(text);
let bwt = bwt(text, &pos);
let fmindex = FMIndex::new(&bwt, 3, &alphabet);

is much complicated for a medium user and takes more time to write and analyse than this:

let pos = suffix_array(text);
let bwt = bwt(text, &pos);
let fmindex = FMIndex::new(&bwt, 3, dna_iupac!);
Member

vadimnazarov commented Aug 4, 2015

Hi everyone,
i have a couple of questions and suggestions regarding Rust-Bio.

  1. do I need to somehow "take" a task and say it somewhere so no one else will take it and therefore no parallel work will be done?
  2. after careful thinking I think that implementing multiple alignment algorithms, clones of bowtie, bwa-mem, BLAST or other widely used software isn't very practical. I think the best way would be is to have an interface to run this software tools with passed arguments and parse the output of them. What is your opinion?
  3. GHMM seems to be dead. https://github.com/dichodaemon/ghmm Also it doesn't have profile-HMM, which are also very useful. But profile-HMM has been implemented here: http://hmmer.janelia.org .
  4. suggestion: speed benchmarks in comparing to other Bio* software packages. It would be great if Rust-Bio would have a greater speed than most or even all of them.
  5. suggestion: a point by point comparison of Rust-Bio to other Bio* software packages.
  6. suggestion: a couple of clustering algorithms, e.g., hierarchical clustering, means.
  7. I think if Rust-Bio is a general bioinformatics library, then it would great if it would have a rust-htslib as a dependency, so user will get all necessary stuff at once, and won't be loading endless additional packages.
  8. suggestion: transform alphabets into singletons and / or give them simple aliases (macro), because this:
let alphabet = alphabets::dna::iupac_alphabet();
let pos = suffix_array(text);
let bwt = bwt(text, &pos);
let fmindex = FMIndex::new(&bwt, 3, &alphabet);

is much complicated for a medium user and takes more time to write and analyse than this:

let pos = suffix_array(text);
let bwt = bwt(text, &pos);
let fmindex = FMIndex::new(&bwt, 3, dna_iupac!);
@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Aug 4, 2015

Contributor

Hi,

thanks for the useful suggestions! Let me answer piece by piece:

  1. Good question. I think it would be best to post it here (maybe together with a link to your fork) and I will format to TODO list at the top accordingly.
  2. The alignment algorithms provided in Rust-Bio are intended for tasks where you need more control (or have different questions) than with e.g. a traditional read mapper and want to use plain rust. If someone just wants to do read mapping or multiple alignment, he can just use the proper tool. From within rust, there is already the very convenient command API, so no need to provide any special interface in Rust-Bio.
  3. I actually meant this library: http://sourceforge.net/projects/ghmm/, which is a different one I think. It is just a plain lib. Not so much activity, but it seems to be just a feature complete very general project.
  4. Yes, we definetly need some speed benchmarks.
  5. Clustering maybe is better done in a separate statistics/machine learning library which will surely show up for Rust, soon. Further, these kinds of analyses are better done in a scripting language like Python in my opinion. Rust-Bio is more for the low-level stuff where performance matters.
  6. This is an interesting point. On the other hand, one could say that a user who uses Rust-Bio for a project which has nothing to do with BAM should not be required to have that being pulled into his dependencies by Rust-Bio. Especially, since Rust-Htslib relies on compiling C code, which is an additional source of failure.
  7. Interesting idea. So far I had the impression that singletons are not really encouraged in Rust. Do you have a real code example for implementing singletons from other Rust projects?
Contributor

johanneskoester commented Aug 4, 2015

Hi,

thanks for the useful suggestions! Let me answer piece by piece:

  1. Good question. I think it would be best to post it here (maybe together with a link to your fork) and I will format to TODO list at the top accordingly.
  2. The alignment algorithms provided in Rust-Bio are intended for tasks where you need more control (or have different questions) than with e.g. a traditional read mapper and want to use plain rust. If someone just wants to do read mapping or multiple alignment, he can just use the proper tool. From within rust, there is already the very convenient command API, so no need to provide any special interface in Rust-Bio.
  3. I actually meant this library: http://sourceforge.net/projects/ghmm/, which is a different one I think. It is just a plain lib. Not so much activity, but it seems to be just a feature complete very general project.
  4. Yes, we definetly need some speed benchmarks.
  5. Clustering maybe is better done in a separate statistics/machine learning library which will surely show up for Rust, soon. Further, these kinds of analyses are better done in a scripting language like Python in my opinion. Rust-Bio is more for the low-level stuff where performance matters.
  6. This is an interesting point. On the other hand, one could say that a user who uses Rust-Bio for a project which has nothing to do with BAM should not be required to have that being pulled into his dependencies by Rust-Bio. Especially, since Rust-Htslib relies on compiling C code, which is an additional source of failure.
  7. Interesting idea. So far I had the impression that singletons are not really encouraged in Rust. Do you have a real code example for implementing singletons from other Rust projects?
@vadimnazarov

This comment has been minimized.

Show comment
Hide comment
@vadimnazarov

vadimnazarov Aug 13, 2015

Member

Oops, please pardon me for the late response!

  1. Ok, that's nice! I think it would be good to make a very clear note on that on the very top of the first message in this issue.
  2. Sorry, I think i didn't get it, could you point me to this API?
  3. Ah, I thought that on github it was just a clone of this repo.
  4. Ok!
  5. (at this point you have responded to my point "6" of the my original message)
    5*. Hierarchical clustering and k-means are quite simple algorithms (to implement) yet very useful. I think I totally understand your point, but from the point of view of a user I would really like to simply call something like hclust(gene_mat_dist) within Rust-Bio, e.g., if I'm doing gene expression analysis, and don't bother with external packages. Or if I want to cluster NGS reads with k-means as a preprocessing step for the further analysis of clusters. For the more complicated analysis and algorithms for sure external packages are needed.
  6. Ok, I agree, also I didn't know about dependencies. A bit off-topic: is it possible to make Rust-only htslib? Or are there some major limitations / drawbacks of doing this?
  7. I haven't seen singletons in Rust but I think they could be implemented using "unsafe code". Honestly, here I talked about singletons just as an example. I think it could be done just by copying the alphabet object every time you call it, because I don't think that it will generate a lot of overhead.
Member

vadimnazarov commented Aug 13, 2015

Oops, please pardon me for the late response!

  1. Ok, that's nice! I think it would be good to make a very clear note on that on the very top of the first message in this issue.
  2. Sorry, I think i didn't get it, could you point me to this API?
  3. Ah, I thought that on github it was just a clone of this repo.
  4. Ok!
  5. (at this point you have responded to my point "6" of the my original message)
    5*. Hierarchical clustering and k-means are quite simple algorithms (to implement) yet very useful. I think I totally understand your point, but from the point of view of a user I would really like to simply call something like hclust(gene_mat_dist) within Rust-Bio, e.g., if I'm doing gene expression analysis, and don't bother with external packages. Or if I want to cluster NGS reads with k-means as a preprocessing step for the further analysis of clusters. For the more complicated analysis and algorithms for sure external packages are needed.
  6. Ok, I agree, also I didn't know about dependencies. A bit off-topic: is it possible to make Rust-only htslib? Or are there some major limitations / drawbacks of doing this?
  7. I haven't seen singletons in Rust but I think they could be implemented using "unsafe code". Honestly, here I talked about singletons just as an example. I think it could be done just by copying the alphabet object every time you call it, because I don't think that it will generate a lot of overhead.
@vadimnazarov

This comment has been minimized.

Show comment
Hide comment
@vadimnazarov

vadimnazarov Aug 13, 2015

Member

Two more suggestions and one clarification to the previous one:

  • a FASTQ reader to read pair-end / mate-pair files. This could be a simple wrapper around two FASTQ readers, that will match reads' accessions because sometimes orders of reads are different between two FASTQ files.
  • add a SAM file reader.
  • add Rust speed benchmarks to the code (cargo bench)

Also I would like to take this task: allow to call current pairwise alignment implementation without traceback for distance computation (hamming, edit, affine). I think I will also try to implement Rust benchmarks of this and of the existed alignment algorithms.

Member

vadimnazarov commented Aug 13, 2015

Two more suggestions and one clarification to the previous one:

  • a FASTQ reader to read pair-end / mate-pair files. This could be a simple wrapper around two FASTQ readers, that will match reads' accessions because sometimes orders of reads are different between two FASTQ files.
  • add a SAM file reader.
  • add Rust speed benchmarks to the code (cargo bench)

Also I would like to take this task: allow to call current pairwise alignment implementation without traceback for distance computation (hamming, edit, affine). I think I will also try to implement Rust benchmarks of this and of the existed alignment algorithms.

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Aug 14, 2015

Contributor

Regarding 2:

The API I mean is this one:
http://doc.rust-lang.org/nightly/std/process/struct.Command.html

We have benchmarks now in the README.

Regarding 5:

I agree, that should be easy to do. But if anybody wants to implement that, I would rather suggest him to do that in a separate library. Most importantly, these clustering methods are useful for other fields as well, so why hiding them in a bioinformatics library. Depending on an additional library with Cargo is extremely easy, and does not make it more complicated for the user.

Regarding 6:

Yes, implementing BAM/CRAM/BCF in Rust only is surely possible. However, these custom binary formats are really complicated stuff. It would require quite some work. If anyboday wants to try that, I would be happy to include the code or have it as an auxilliary project within Rust-Bio. But at the moment, I would prefer to focus on stuff that is not already possible by using htslib.

Regarding 7:

There certainly is some overhead, so that you don't want to reinitialize e.g. in an inner loop.
Actually I have seen a macro based approach for static non-primitive variables somewhere. We can always provide that method at a later stage, delegating to the current object oriented implementation.

Regarding your second post

  • interesting idea. So in principle, this is just the zip method you could use. Some error checking with the read names is also nice.
  • SAM format should be supported by rust-htslib already. If not, it is certainly only one or two lines that need to be changed in the BAM reader, because htslib has a unified object for reading SAM/BAM/CRAM I think
  • We have benchmarks for suffix arrays and pattern matching, but I agree it would be nice to have them for most of the other stuff as well.

I have added your suggestions to the TODO list.

Contributor

johanneskoester commented Aug 14, 2015

Regarding 2:

The API I mean is this one:
http://doc.rust-lang.org/nightly/std/process/struct.Command.html

We have benchmarks now in the README.

Regarding 5:

I agree, that should be easy to do. But if anybody wants to implement that, I would rather suggest him to do that in a separate library. Most importantly, these clustering methods are useful for other fields as well, so why hiding them in a bioinformatics library. Depending on an additional library with Cargo is extremely easy, and does not make it more complicated for the user.

Regarding 6:

Yes, implementing BAM/CRAM/BCF in Rust only is surely possible. However, these custom binary formats are really complicated stuff. It would require quite some work. If anyboday wants to try that, I would be happy to include the code or have it as an auxilliary project within Rust-Bio. But at the moment, I would prefer to focus on stuff that is not already possible by using htslib.

Regarding 7:

There certainly is some overhead, so that you don't want to reinitialize e.g. in an inner loop.
Actually I have seen a macro based approach for static non-primitive variables somewhere. We can always provide that method at a later stage, delegating to the current object oriented implementation.

Regarding your second post

  • interesting idea. So in principle, this is just the zip method you could use. Some error checking with the read names is also nice.
  • SAM format should be supported by rust-htslib already. If not, it is certainly only one or two lines that need to be changed in the BAM reader, because htslib has a unified object for reading SAM/BAM/CRAM I think
  • We have benchmarks for suffix arrays and pattern matching, but I agree it would be nice to have them for most of the other stuff as well.

I have added your suggestions to the TODO list.

@vadimnazarov

This comment has been minimized.

Show comment
Hide comment
@vadimnazarov

vadimnazarov Aug 15, 2015

Member

Ok, thank you for your response! One more little suggestion: make rust-bio.github.io repository with all documentation so it can be available at "https://rust-bio.github.io".

Member

vadimnazarov commented Aug 15, 2015

Ok, thank you for your response! One more little suggestion: make rust-bio.github.io repository with all documentation so it can be available at "https://rust-bio.github.io".

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Aug 15, 2015

Contributor

Good idea. I have done that as suggested.

Contributor

johanneskoester commented Aug 15, 2015

Good idea. I have done that as suggested.

@rilut

This comment has been minimized.

Show comment
Hide comment
@rilut

rilut Oct 22, 2015

Contributor

Hello, I'm interested in rust-bio, but I have no experiences with Rust. I only experienced with Biopython. Any suggestion?
Can I work on the BLOSUM?

Contributor

rilut commented Oct 22, 2015

Hello, I'm interested in rust-bio, but I have no experiences with Rust. I only experienced with Biopython. Any suggestion?
Can I work on the BLOSUM?

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Oct 22, 2015

Contributor

@rilut sure feel free to fork and try it out. I will mark it in the list.

Contributor

johanneskoester commented Oct 22, 2015

@rilut sure feel free to fork and try it out. I will mark it in the list.

@bnbowman

This comment has been minimized.

Show comment
Hide comment
@bnbowman

bnbowman Nov 28, 2015

I'd also suggest SDP algorithms for sparse alignment - generally handy things tools, especially when working with genomic assemblies or noisy long-read technologies.

And much like @rilut , I'm fairly new to Rust and looking to learn more and help out. Will poke around a bit and see if anything jumps out at me.

bnbowman commented Nov 28, 2015

I'd also suggest SDP algorithms for sparse alignment - generally handy things tools, especially when working with genomic assemblies or noisy long-read technologies.

And much like @rilut , I'm fairly new to Rust and looking to learn more and help out. Will poke around a bit and see if anything jumps out at me.

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Nov 29, 2015

Contributor

Thanks! I have added sparse alignment to the list! Looking forward to any contribution!

Contributor

johanneskoester commented Nov 29, 2015

Thanks! I have added sparse alignment to the list! Looking forward to any contribution!

@brentp

This comment has been minimized.

Show comment
Hide comment
@brentp

brentp Dec 17, 2015

This is more for rust-htslib, but it'd be nice to have an interface to the tabix stuff and potentially .csi.

brentp commented Dec 17, 2015

This is more for rust-htslib, but it'd be nice to have an interface to the tabix stuff and potentially .csi.

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Dec 18, 2015

Contributor

That's a good point, thanks!

Contributor

johanneskoester commented Dec 18, 2015

That's a good point, thanks!

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Feb 22, 2016

Greetings!

I have just finished the basic workings of a transcriptome translator. The repo is available here: https://github.com/rundrop1/transcriptome_translator

It currently generates all possible Amino Acid sequences from Nucleotide sequences. Support for RNA is coming today. Currently getting output is hacky as the way I did it yesterday was to capture stdout by executing through vim which requires a bit of trimming at the top of the file but otherwise works temporarily. (Note that if you do it this way you will get a valid fasta file)

I didn't look before making the parser so I've implemented a FASTA parser with nom. The parser currently is only able to deserialize FASTA for further use. It also requires rust-phf which currently requires a specific nightly. I also utilize memmap for loading the file. The file is not mutated during processing and while the code is unsafe by memory safety standards but in practice it is usable if you are sure your file will not be modified before the translator has a chance to complete.

I'd like to know if the translator is of any use to your crate and how I could get it into a more usable state for acceptance into this toolkit. I'd also like to add features like automatically BLASTing results and am open to implementing others ideas as well. I plan on adding features and otherwise maintaining 'transcriptome_translation' well into the future.

ghost commented Feb 22, 2016

Greetings!

I have just finished the basic workings of a transcriptome translator. The repo is available here: https://github.com/rundrop1/transcriptome_translator

It currently generates all possible Amino Acid sequences from Nucleotide sequences. Support for RNA is coming today. Currently getting output is hacky as the way I did it yesterday was to capture stdout by executing through vim which requires a bit of trimming at the top of the file but otherwise works temporarily. (Note that if you do it this way you will get a valid fasta file)

I didn't look before making the parser so I've implemented a FASTA parser with nom. The parser currently is only able to deserialize FASTA for further use. It also requires rust-phf which currently requires a specific nightly. I also utilize memmap for loading the file. The file is not mutated during processing and while the code is unsafe by memory safety standards but in practice it is usable if you are sure your file will not be modified before the translator has a chance to complete.

I'd like to know if the translator is of any use to your crate and how I could get it into a more usable state for acceptance into this toolkit. I'd also like to add features like automatically BLASTing results and am open to implementing others ideas as well. I plan on adding features and otherwise maintaining 'transcriptome_translation' well into the future.

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Feb 23, 2016

Contributor

Hi!
Well, thanks for considering Rust-Bio. Your project sounds very interesting. Since Rust-Bio is an algorithm library and not a command line utility, it cannot go into it directly. However, I invite you to contribute any code that is of general use as a data structure or algorithm. This way, we can maintain it as a community and you can use it in your tool without having duplicate effort. So, can you maybe elaborate a bit on decomposable parts of your code base? E.g. I could immagine that the plain amino acid translation could be a function in Rust-Bio.

Contributor

johanneskoester commented Feb 23, 2016

Hi!
Well, thanks for considering Rust-Bio. Your project sounds very interesting. Since Rust-Bio is an algorithm library and not a command line utility, it cannot go into it directly. However, I invite you to contribute any code that is of general use as a data structure or algorithm. This way, we can maintain it as a community and you can use it in your tool without having duplicate effort. So, can you maybe elaborate a bit on decomposable parts of your code base? E.g. I could immagine that the plain amino acid translation could be a function in Rust-Bio.

@anderspitman

This comment has been minimized.

Show comment
Hide comment
@anderspitman

anderspitman Feb 25, 2016

Hey @johanneskoester, I'd be happy to implement a gtf/gff reader.

anderspitman commented Feb 25, 2016

Hey @johanneskoester, I'd be happy to implement a gtf/gff reader.

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Feb 25, 2016

Contributor

Great, thank you. I have added you to the list!

Contributor

johanneskoester commented Feb 25, 2016

Great, thank you. I have added you to the list!

@anderspitman

This comment has been minimized.

Show comment
Hide comment
@anderspitman

anderspitman Feb 26, 2016

Great! Did you have any specific preferences for the implementation? I was just planning on starting with making it as similar as possible to what you have for FASTA. I'm still relatively new to Rust so It'll probably be fairly iterative if you don't mind giving input while I learn best practices.

anderspitman commented Feb 26, 2016

Great! Did you have any specific preferences for the implementation? I was just planning on starting with making it as similar as possible to what you have for FASTA. I'm still relatively new to Rust so It'll probably be fairly iterative if you don't mind giving input while I learn best practices.

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Feb 26, 2016

@anderspitman I've just created #rust-bio on irc.mozilla.org if you care to join me I can help answer your questions regarding rust. It seems that the reader is not too complicated and I can walk you through how to do some of the Rusty things. My implementation is verbose but easily readable with quite a few comments to explain the exact steps taken. I am not a bio-informatics professional but I essentially freelance for labs at a local university when computation can be applied to the questions they are trying to answer.

@johanneskoester Would parallelizing the serializing and deserialzing of various formats be enough of a "killer feature" to potentially convert the existing parsers to utilize the external crate nom? There are a lot of good things about staying within std but nom and rayon could allow us to support machines with many cores. Can be done with std::mpsc::channels; I will do it this way instead so it will be more inline with how things are already done.

ghost commented Feb 26, 2016

@anderspitman I've just created #rust-bio on irc.mozilla.org if you care to join me I can help answer your questions regarding rust. It seems that the reader is not too complicated and I can walk you through how to do some of the Rusty things. My implementation is verbose but easily readable with quite a few comments to explain the exact steps taken. I am not a bio-informatics professional but I essentially freelance for labs at a local university when computation can be applied to the questions they are trying to answer.

@johanneskoester Would parallelizing the serializing and deserialzing of various formats be enough of a "killer feature" to potentially convert the existing parsers to utilize the external crate nom? There are a lot of good things about staying within std but nom and rayon could allow us to support machines with many cores. Can be done with std::mpsc::channels; I will do it this way instead so it will be more inline with how things are already done.

@natir

This comment has been minimized.

Show comment
Hide comment
@natir

natir May 20, 2016

Contributor

Hi, after stolen gff parser to @anderspitman, code just support gff3 actually, I need produce more little bit of code for gff2 and gtf2 support.

But maybe I can work one the basic NGS statistics, in same time.

Contributor

natir commented May 20, 2016

Hi, after stolen gff parser to @anderspitman, code just support gff3 actually, I need produce more little bit of code for gff2 and gtf2 support.

But maybe I can work one the basic NGS statistics, in same time.

@anderspitman

This comment has been minimized.

Show comment
Hide comment
@anderspitman

anderspitman May 20, 2016

@natir no problem on stealing ;) I haven't had time to start on this. Hopefully I'll be able to jump in soon

anderspitman commented May 20, 2016

@natir no problem on stealing ;) I haven't had time to start on this. Hopefully I'll be able to jump in soon

@bnbowman

This comment has been minimized.

Show comment
Hide comment
@bnbowman

bnbowman May 24, 2016

Just left a pull request with an IntervalTree implementation, and I'm starting to mess around with some SDP algorithms...

bnbowman commented May 24, 2016

Just left a pull request with an IntervalTree implementation, and I'm starting to mess around with some SDP algorithms...

@rob-p

This comment has been minimized.

Show comment
Hide comment
@rob-p

rob-p May 31, 2016

Hi all,

I'm very excited by this project! I currently have almost no experience in Rust (I do almost all of my development in C++11/14), but I'm very interested in learning Rust, and potentially using it in some future projects. One data structure I'd be very interested in seeing added to this crate is a de Bruijn graph. This is obviously a widely used data structure in Bioinformatics, and I don't know of an existing implementation in Rust. It would be very cool to have, in this package, a "canonical" implementation that people interested in doing dBG stuff with Rust could use. Is this something that others might be interested in?

rob-p commented May 31, 2016

Hi all,

I'm very excited by this project! I currently have almost no experience in Rust (I do almost all of my development in C++11/14), but I'm very interested in learning Rust, and potentially using it in some future projects. One data structure I'd be very interested in seeing added to this crate is a de Bruijn graph. This is obviously a widely used data structure in Bioinformatics, and I don't know of an existing implementation in Rust. It would be very cool to have, in this package, a "canonical" implementation that people interested in doing dBG stuff with Rust could use. Is this something that others might be interested in?

@natir

This comment has been minimized.

Show comment
Hide comment
@natir

natir Jun 1, 2016

Contributor

Hi @rob-p,

I make the same reflection yesterday, a de Brujin graph could be a very good feature for rust-bio. Maybe you can see the a very efficient de Brujin graph implementation in GATB library.

Contributor

natir commented Jun 1, 2016

Hi @rob-p,

I make the same reflection yesterday, a de Brujin graph could be a very good feature for rust-bio. Maybe you can see the a very efficient de Brujin graph implementation in GATB library.

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Jun 1, 2016

Contributor

Indeed DBG would be nice. Feel free to contribute an implementation!

Contributor

johanneskoester commented Jun 1, 2016

Indeed DBG would be nice. Feel free to contribute an implementation!

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Jul 21, 2016

Contributor

@rust-bio/contributors I have started a selection of command line utilities based on Rust-Bio: https://github.com/rust-bio/rust-bio-tools. This is not intended as a replacement to other tools. It should only fill gaps, where efficient or convenient tools for certain tasks in bioinformatics are not yet available. Feel free to contribute if you have some small tasks that can be nicely handled by Rust-Bio.

Contributor

johanneskoester commented Jul 21, 2016

@rust-bio/contributors I have started a selection of command line utilities based on Rust-Bio: https://github.com/rust-bio/rust-bio-tools. This is not intended as a replacement to other tools. It should only fill gaps, where efficient or convenient tools for certain tasks in bioinformatics are not yet available. Feel free to contribute if you have some small tasks that can be nicely handled by Rust-Bio.

@natir

This comment has been minimized.

Show comment
Hide comment
@natir

natir Nov 15, 2016

Contributor

Hello all,

Maybe I can work on Overlap/String Graph but what is the exact mind of this goal.

I think rewrite a graph API isn't good idea, crates petgraph do the job.

So I can create Node, Edge struct and parser for each assembly graph format. I can begin this job with GFA format. It's ok for all ?

Contributor

natir commented Nov 15, 2016

Hello all,

Maybe I can work on Overlap/String Graph but what is the exact mind of this goal.

I think rewrite a graph API isn't good idea, crates petgraph do the job.

So I can create Node, Edge struct and parser for each assembly graph format. I can begin this job with GFA format. It's ok for all ?

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Nov 15, 2016

Contributor

@natir, that sounds perfect, thanks!

Contributor

johanneskoester commented Nov 15, 2016

@natir, that sounds perfect, thanks!

@bryceperkins

This comment has been minimized.

Show comment
Hide comment
@bryceperkins

bryceperkins Nov 29, 2016

Hey all, I will be working on Rust implementation of the BCALM algorithm using these two repos as guidance:
https://github.com/rchikhi/python-bcalm
https://github.com/Malfoy/bcalm

For experienced rust users: Do you see any roadblocks or issues that I should watch for? Or do you have any tips or references that can help speed up this process?

bryceperkins commented Nov 29, 2016

Hey all, I will be working on Rust implementation of the BCALM algorithm using these two repos as guidance:
https://github.com/rchikhi/python-bcalm
https://github.com/Malfoy/bcalm

For experienced rust users: Do you see any roadblocks or issues that I should watch for? Or do you have any tips or references that can help speed up this process?

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Nov 30, 2016

Contributor

Thanks @bryceperkins! The API is quite stable, and there are no major issues at the moment.

Contributor

johanneskoester commented Nov 30, 2016

Thanks @bryceperkins! The API is quite stable, and there are no major issues at the moment.

@RHagenson

This comment has been minimized.

Show comment
Hide comment
@RHagenson

RHagenson Jun 5, 2017

Contributor

What are the thoughts on adding an AppVeyor build to check the library's build status on Windows systems? A template for doing this already exists at: https://github.com/japaric/trust This repo also includes a .travis.yml template for building on both Linux and MacOS, but that may muddy the waters on a TravisCI build failure being on Linux or MacOS.

Contributor

RHagenson commented Jun 5, 2017

What are the thoughts on adding an AppVeyor build to check the library's build status on Windows systems? A template for doing this already exists at: https://github.com/japaric/trust This repo also includes a .travis.yml template for building on both Linux and MacOS, but that may muddy the waters on a TravisCI build failure being on Linux or MacOS.

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Jun 6, 2017

Contributor

@RHagenson I don't think this is needed. Rust-Bio is pure Rust, hence it should work the same on all OSes.

Contributor

johanneskoester commented Jun 6, 2017

@RHagenson I don't think this is needed. Rust-Bio is pure Rust, hence it should work the same on all OSes.

@RHagenson

This comment has been minimized.

Show comment
Hide comment
@RHagenson

RHagenson Jun 6, 2017

Contributor
Contributor

RHagenson commented Jun 6, 2017

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Jun 7, 2017

Contributor

So, if there is a difference, it would be a bug in Rust I think. I am hesitating to activate Windows builds, because, to be fair we would also have to activate osx builds. This is 3x the build load we have now, for almost no benefit, since it is very unlikely to be a difference. I am also thinking about handling natures resources responsibly here. What we could do is a weekly test of the master branch.

Contributor

johanneskoester commented Jun 7, 2017

So, if there is a difference, it would be a bug in Rust I think. I am hesitating to activate Windows builds, because, to be fair we would also have to activate osx builds. This is 3x the build load we have now, for almost no benefit, since it is very unlikely to be a difference. I am also thinking about handling natures resources responsibly here. What we could do is a weekly test of the master branch.

@RHagenson

This comment has been minimized.

Show comment
Hide comment
@RHagenson

RHagenson Jun 7, 2017

Contributor

I can appreciate the conservative approach and it certainly is a bit excessive to launch three builds with each update to master. I am not sure how we would automate a weekly Windows build test though. Do you know of a way to do this? I am not seeing anything to space builds out by a week with AppVeyor.

Contributor

RHagenson commented Jun 7, 2017

I can appreciate the conservative approach and it certainly is a bit excessive to launch three builds with each update to master. I am not sure how we would automate a weekly Windows build test though. Do you know of a way to do this? I am not seeing anything to space builds out by a week with AppVeyor.

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Jun 9, 2017

Contributor

If it is not yet possible, we could ask for a weekly build feature. No need to rush, windows builds can wait.

Contributor

johanneskoester commented Jun 9, 2017

If it is not yet possible, we could ask for a weekly build feature. No need to rush, windows builds can wait.

@ghuls

This comment has been minimized.

Show comment
Hide comment
@ghuls

ghuls Jul 27, 2017

BioJulia project seems to have good ideas that could be copied.

Search for BioJulia on https://julialang.org/blog/ for some interesting idea (like how to store FASTA sequences efficiently in memory).

BioJulia also has proper parsers for different bioinformatic formats which could be useful to look at (when reimplementing parsers with nom):

The parsing framework in Bio.jl is unique amongst bioinformatics libraries. Rather than hand-written parsers, we use grammar specifications that are compiled into parser code using a tool called ragel.

https://homes.cs.washington.edu/~dcjones/biojl/parsing.html

Some examples: https://github.com/BioJulia/GenomicFeatures.jl/tree/master/src/

ghuls commented Jul 27, 2017

BioJulia project seems to have good ideas that could be copied.

Search for BioJulia on https://julialang.org/blog/ for some interesting idea (like how to store FASTA sequences efficiently in memory).

BioJulia also has proper parsers for different bioinformatic formats which could be useful to look at (when reimplementing parsers with nom):

The parsing framework in Bio.jl is unique amongst bioinformatics libraries. Rather than hand-written parsers, we use grammar specifications that are compiled into parser code using a tool called ragel.

https://homes.cs.washington.edu/~dcjones/biojl/parsing.html

Some examples: https://github.com/BioJulia/GenomicFeatures.jl/tree/master/src/

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Jul 28, 2017

Contributor

Thanks! This is certainly something to consider!

Contributor

johanneskoester commented Jul 28, 2017

Thanks! This is certainly something to consider!

@bnbowman

This comment has been minimized.

Show comment
Hide comment
@bnbowman

bnbowman Dec 22, 2017

Has anyone written a Partial-Order Aligner in Rust? I was thinking of trying my hand at it over the break, but I'd hate to duplicate effort if someone has already started

bnbowman commented Dec 22, 2017

Has anyone written a Partial-Order Aligner in Rust? I was thinking of trying my hand at it over the break, but I'd hate to duplicate effort if someone has already started

@bnbowman

This comment has been minimized.

Show comment
Hide comment
@bnbowman

bnbowman Dec 31, 2017

In fact, I want a partial-order aligner so much I, err, already wrote one:
https://github.com/rust-bio/rust-bio/tree/partial-order-aligner

So I guess I volunteer for that?

bnbowman commented Dec 31, 2017

In fact, I want a partial-order aligner so much I, err, already wrote one:
https://github.com/rust-bio/rust-bio/tree/partial-order-aligner

So I guess I volunteer for that?

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Jan 3, 2018

Contributor

Great!!

Contributor

johanneskoester commented Jan 3, 2018

Great!!

@wizofe

This comment has been minimized.

Show comment
Hide comment
@wizofe

wizofe May 13, 2018

@johanneskoester I just introduced myself here #27 :) Do you have any recommendations for a short and sweet project? That should take a day or couple, just to discover your code base and see what's going on!

wizofe commented May 13, 2018

@johanneskoester I just introduced myself here #27 :) Do you have any recommendations for a short and sweet project? That should take a day or couple, just to discover your code base and see what's going on!

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester May 14, 2018

Contributor

Hi and welcome @wizofe! Unfortunately, all the currently open todos are rather big. However, feel free to propose any textbook algorithm that is still missing. Another possibility is to write a file parser for a format that is not yet available in rust-bio or rust-htslib.

Contributor

johanneskoester commented May 14, 2018

Hi and welcome @wizofe! Unfortunately, all the currently open todos are rather big. However, feel free to propose any textbook algorithm that is still missing. Another possibility is to write a file parser for a format that is not yet available in rust-bio or rust-htslib.

@brianjimenez

This comment has been minimized.

Show comment
Hide comment
@brianjimenez

brianjimenez Aug 16, 2018

Hi! Just introduced myself in the separate thread, I'd like to know which are the plans to include PDB format parsing, especially for proteins.

brianjimenez commented Aug 16, 2018

Hi! Just introduced myself in the separate thread, I'd like to know which are the plans to include PDB format parsing, especially for proteins.

@kanishka-azimi

This comment has been minimized.

Show comment
Hide comment
@kanishka-azimi

kanishka-azimi Aug 22, 2018

Hi. I love dynamic programming algorithms, graph algorithms, and anything involving the concept of an approximation algorithm (TSP approximation). I have a few years of Haskell experience, but I have not touched Rust yet. I am kind of interested in working at the intersection of algorithms, Rust, and bioinformatics, especially if I find something that brings together distributed algorithms and sequential algorithms to scale horizontally and vertically in combination.

I also tend to enjoy short programming exercises like HackerRank or Codility, if helps in what tasks you suggest.

I'll try to look through the list of todo's, but feel free to suggest some "new to rust" tasks, for me to get my feet wet.

kanishka-azimi commented Aug 22, 2018

Hi. I love dynamic programming algorithms, graph algorithms, and anything involving the concept of an approximation algorithm (TSP approximation). I have a few years of Haskell experience, but I have not touched Rust yet. I am kind of interested in working at the intersection of algorithms, Rust, and bioinformatics, especially if I find something that brings together distributed algorithms and sequential algorithms to scale horizontally and vertically in combination.

I also tend to enjoy short programming exercises like HackerRank or Codility, if helps in what tasks you suggest.

I'll try to look through the list of todo's, but feel free to suggest some "new to rust" tasks, for me to get my feet wet.

@johanneskoester

This comment has been minimized.

Show comment
Hide comment
@johanneskoester

johanneskoester Sep 5, 2018

Contributor

@brianjimenez welcome! The plan is: if there is somebody who wants to do the work, we are happy to include it. In general the dev model is that things are added as needed and whenever a volunteer does the coding.
@kanishka-azimi welcome! In the current todos, there is no simple stuff left I am afraid. However, in general every single bioinformatics text-book algorithm is suitable for starting. My suggestion is to take one of the classic books, like this and check whether there is a nice algorithm that is not yet part of Rust-Bio. I am looking forward to your contribution and I will be happy to review it!

Sorry guys for the late response, I have been on vacation.

Contributor

johanneskoester commented Sep 5, 2018

@brianjimenez welcome! The plan is: if there is somebody who wants to do the work, we are happy to include it. In general the dev model is that things are added as needed and whenever a volunteer does the coding.
@kanishka-azimi welcome! In the current todos, there is no simple stuff left I am afraid. However, in general every single bioinformatics text-book algorithm is suitable for starting. My suggestion is to take one of the classic books, like this and check whether there is a nice algorithm that is not yet part of Rust-Bio. I am looking forward to your contribution and I will be happy to review it!

Sorry guys for the late response, I have been on vacation.

@brianjimenez

This comment has been minimized.

Show comment
Hide comment
@brianjimenez

brianjimenez Sep 5, 2018

@johanneskoester thanks a lot! I've started reading the 2018 edition Rust book a few weeks ago and messing around with a dirty approach to PDB parsing here. Still a lot of Rust to learn, but I'll happy to help with the PDB parts.

brianjimenez commented Sep 5, 2018

@johanneskoester thanks a lot! I've started reading the 2018 edition Rust book a few weeks ago and messing around with a dirty approach to PDB parsing here. Still a lot of Rust to learn, but I'll happy to help with the PDB parts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment