BioClojure is (the seed of) a collection of functions that brings bioinformatics functionality to the Clojure programming language. Clojure is a young functional programming language that runs on the JVM. BioClojure uses Incanter to provide R-like statistical functionality.
The initial focus is next-generation sequencing (NGS) and sequence variation. This includes:
- handling FASTQ and BAM files (todo; probably interfacing with Picard)
- handling VCF files (done)
Either just download the latest jar file, or compile the jar yourself:
lein deps
lein compile
lein uberjar
“lein deps” will automatically download all dependencies.
This library includes a function to convert VCF files to tab-delimited format. This conversion includes all fields that are defined in the file; no information is omitted (i.e. all INFO tags are included).
Caution: this does not work for VCFv4 yet, because that version allows genotype fields to be dropped entirely. (Support is simple to implement, but I have not had the time.)
./scripts/vcf2tsv NA12878.vcf > NA12878.tsv
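To illustrate what flattening a VCF record into tab-delimited columns involves, here is a minimal sketch in plain Clojure. It is not the code behind `vcf2tsv`: the function names `parse-info` and `vcf-line->tsv` are hypothetical, the list of INFO tags is passed in by hand instead of being read from the file header, and genotype columns are ignored.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of BioClojure.
(defn parse-info
  "Turn an INFO string such as \"DP=14;AF=0.5;DB\" into a map.
  Flag tags without a value (e.g. DB) are mapped to \"1\"."
  [info]
  (into {}
        (for [entry (str/split info #";")
              :when (not (#{"" "."} entry))]
          (let [[k v] (str/split entry #"=" 2)]
            [k (or v "1")]))))

;; Hypothetical helper, not part of BioClojure.
(defn vcf-line->tsv
  "Convert one VCF data line into a tab-delimited row. `info-tags` is the
  ordered list of INFO tags to emit as columns; tags missing from the
  record are written as \".\". Genotype columns are not handled here."
  [info-tags line]
  (let [fields (str/split line #"\t")
        fixed  (take 7 fields)              ; CHROM POS ID REF ALT QUAL FILTER
        info   (parse-info (nth fields 7 "."))]
    (str/join "\t" (concat fixed (map #(get info % ".") info-tags)))))
```

Giving every INFO tag its own column, with a placeholder where a record lacks the tag, keeps all rows the same width, so no information is omitted and the output loads cleanly into tabular tools.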
cljr repl
> (use 'bioclojure)
> (ns bioclojure)
> (def a (load-vcf "./data/sample.vcf"))
> (with-data a
> (view (histogram ($ :QUAL))))
At the moment Incanter charts do not provide logarithmic scales. As a workaround, log-transform the values yourself:
> (with-data a
> (view (histogram (map #(log %) ($ :QUAL)))))
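One thing to watch with this workaround is that the logarithm of zero or a missing value is not a usable number. The sketch below shows one possible way to guard against that in plain Clojure. The name `log10-quals` is hypothetical, and the choice of base 10 (via Java's `Math/log10`) is an assumption made for readability of the axis, not something the library prescribes.

```clojure
;; Hypothetical helper, not part of BioClojure or Incanter.
(defn log10-quals
  "Return the base-10 logarithm of each positive quality score,
  dropping nil, non-numeric and non-positive values."
  [quals]
  (->> quals
       (filter #(and (number? %) (pos? %)))
       (map #(Math/log10 (double %)))))
```

The resulting sequence could then be passed to `histogram` in place of the mapped expression used above.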
Dependencies are automatically installed with “lein deps”:
- clojure-contrib
- incanter
- As I’m completely new to Clojure and don’t use it (much) in real-life work (yet), this library will be slow to grow, at least initially.
- As I have to focus on real work, the focus will be on what I need: NGS.
Copyright © 2010 Jan Aerts
Licensed under the same terms as Clojure.