APIGenome - Big data genomics analysis libraries & tools
APIGenome consists of a collection of Perl libraries and command-line utilities for big data genomics analysis. This repository started as a personal repository of small genomic analysis tools developed by Hyun Min Kang, but some of the utilities developed in this repository may be shared more widely.
APIGenome provides useful infrastructure for Perl software development, including command-line argument handling, automated self-documentation, and APIs for genomic analysis.
Because APIGenome serves as a sandbox for many "in-progress" software tools (as the name of the default repository indicates), many of its tools have not been fully described or documented. As APIGenome will be continuously updated with new tools, it will always be UNDER CONSTRUCTION. Many preliminary utilities are only partially documented and may contain bugs, so use the software tools at your own risk.
Note that some of the software tools in this repository MAY MIGRATE OUT TO A STANDALONE REPOSITORY if a tool receives enough attention to warrant its own brand name.
Requirements before installation
Currently, APIGenome installation has been tested on Ubuntu and Mac OS X. If you encounter installation problems on other operating systems, please let the authors know.
First, you need standard UNIX tools installed, including `grep`, `mv`, `rm`, `make`, `cat`, `cut`, `dirname`, `head`, `mkdir`, `sort`, and `zcat`.
Next, you will need a number of additional tools installed, including `autoconf`, `automake`, `libtool`, `perl`, `R`, and `Rscript`.
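As a convenience, the following bash sketch reports which of the tools listed above cannot be found on `PATH`. The names `check_tools`, `UNIX_TOOLS`, and `BUILD_TOOLS` are hypothetical and exist only for this example. One assumption to note: `autoreconf` invokes `libtoolize` from the libtool package, so the sketch checks for `libtoolize` rather than `libtool`; Homebrew on macOS may install it as `glibtoolize`, in which case a "missing" report for it needs a manual check.

```shell
#!/usr/bin/env bash
# Sketch: report which APIGenome prerequisites are not found on PATH.
# check_tools, UNIX_TOOLS and BUILD_TOOLS are hypothetical names for this example.

# check_tools NAME...: print "missing: NAME" to stderr for each command that
# cannot be found; return 0 only if every command was found.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Standard UNIX tools listed in this README.
UNIX_TOOLS="grep mv rm make cat cut dirname head mkdir sort zcat"
# Additional tools listed in this README. libtool is checked through its
# libtoolize command (an assumption; Homebrew may name it glibtoolize).
BUILD_TOOLS="autoconf automake libtoolize perl R Rscript"

# The unquoted expansions are intentional: each list is split into words.
if check_tools $UNIX_TOOLS $BUILD_TOOLS; then
  echo "All prerequisites were found."
else
  echo "Install the missing tools listed above before building APIGenome." >&2
fi
```

The check only confirms that each command is on `PATH`; it does not verify versions.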
Cloning from the GitHub repository
From the parent directory of `htslib/`, you can also clone the current snapshot of this repository and install it:
```
$ git clone https://github.com/hyunminkang/apigenome.git
$ cd apigenome/
$ autoreconf -vfi
$ ./configure --prefix [/path/to/install]
$ make
$ make install
```
It is recommended to specify `--prefix` explicitly, as installation without the `--prefix` argument has not been extensively tested.
How to use APIGenome utilities
APIGenome contains many self-documented command-line utilities. To learn how to use each of them, run the utility with the `-man` or `-help` option to see its command-line usage:
```
$ [path/to/apigenome]/bin/[utility-name] -man
$ [path/to/apigenome]/bin/[utility-name] -help
```
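The commands above spell out the full path to each utility. As a convenience, here is a minimal bash sketch that puts the installed utilities on `PATH` and lists them, assuming `make install` placed them under `<prefix>/bin` as those paths suggest. `APIGENOME_PREFIX`, its default value, and `list_utilities` are hypothetical names introduced only for this example; set `APIGENOME_PREFIX` to the directory you passed to `--prefix`.

```shell
#!/usr/bin/env bash
# Sketch: make installed APIGenome utilities callable by name.
# APIGENOME_PREFIX is a hypothetical variable; set it to the directory given
# to ./configure --prefix. The default below is only a placeholder.
APIGENOME_PREFIX="${APIGENOME_PREFIX:-$HOME/apigenome}"

# Prepend <prefix>/bin to PATH unless it is already present.
case ":$PATH:" in
  *":$APIGENOME_PREFIX/bin:"*) ;;
  *) export PATH="$APIGENOME_PREFIX/bin:$PATH" ;;
esac

# list_utilities (hypothetical helper): print the name of every executable
# regular file in <prefix>/bin, one per line.
list_utilities() {
  local f
  for f in "$APIGENOME_PREFIX"/bin/*; do
    if [ -f "$f" ] && [ -x "$f" ]; then
      basename "$f"
    fi
  done
  return 0
}
```

After sourcing these lines (for example from a shell startup file), `list_utilities` shows what was installed, and each utility can be invoked by name, e.g. `vcf-summary -help`.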
List of available APIGenome utilities
The complete list of APIGenome utilities can be found at
Some of these utilities are unfinished and may be under development for internal use. You may use them at your own risk, but the software tools listed below should be mature enough that the developers can provide assistance when help is needed.
Each software tool is self-documented, and you can see its detailed documentation by typing
C++ programs such as `cramore` contain multiple programs inside them, and the documentation for each program can be found by typing
```
$(INSTALL_DIR)/bin/cramore $(COMMAND_NAME) -help
```
The development status of each software tool is classified into one of four stages:
- alpha : The software tool is at an early stage of development, or it is not yet ready for sharing. It may be buggy, and its documentation is not comprehensive. The developers may not respond promptly to bug reports or questions. Use at your own (high) risk.
- beta : The software tool is in relatively good shape to share with others. It is either (a) in the testing stage or (b) considered a narrowly shared (in-house) tool. It is self-documented, but it is unlikely to be comprehensively documented and interfaced, because the software has not received wide attention or is still in the test stage of development. For software tools that are expected to move to release status, the documentation and interface will continue to be improved. Bugs may exist, but bug reports will receive attention and appreciation from the authors.
- release : The software tool is released, and possibly already published. Help with polishing the documentation, as well as bug reports, will be much appreciated. Questions regarding released software tools will be answered more promptly.
- migrating : The software tool has received enough attention to justify migrating it to its own branded repository. During the migration period, there may be two separate repositories for the tool; the new repository may be more up to date, while the version in this repository is likely to stay at a more stable version.
Below is the list of APIGenome utilities, excluding the `cramore` software tool. Note that the list is only occasionally updated, so it may not reflect recent changes. See above if you need more detailed documentation.
| Category | Utility Name | Development Status | Brief Description |
|---|---|---|---|
| Sequence Reads | `align-dropseq` | alpha | All-in-one alignment of DropSeq sequence reads |
| | `align-pro` | alpha | Align PROcap sequence data |
| | `bam-quick-peek-batch` | alpha | Produce simple summary statistics for a list of BAM files |
| | `dropseq-resolve-barcode-trimming` | alpha | Resolve barcode trimming issue for Dropseq |
| | `dropseq-kallisto` | alpha | Sequence alignment of Dropseq data using the kallisto software tool |
| | `demux-fastq` | alpha | Demultiplex barcoded FASTQ for single-ended sequence data in DropSeq format |
| | `now-seq-batch` | alpha | Produce a summary of QC metrics from outputs of the GotCloud alignment pipeline |
| | `pileup-pro` | alpha | Produce pileups for PROseq/PROcap data |
| | `rev-trim` | alpha | Reverse complement and trim sequence reads |
| Variant Calls | `bed-diff` | beta | Compare genotype concordance and discordance between callsets |
| | `draw-afs` | alpha | Draw allele frequency spectrum (AFS) from VCF sites |
| | `inspect-sv` | alpha | Examine the reads within a structural variant |
| | `vcf-lookup-rsid` | beta | Look up a variant from VCF based on rsIDs |
| | `vcf-issac-sumary` | alpha | Summarize VCF files produced by Illumina's Isaac pipeline |
| | `vcf-add-rsid` | beta | Add rsIDs to VCF files |
| | `vcf-delta-svm` | beta | DeltaSVM method (Lee et al. Nat Genet 2015) implemented for VCF files |
| | `vcf-extract-gt-only` | alpha | Extract GT field only from a VCF |
| | `vcf-f2-sharing` | alpha | Extract GT field only from a VCF |
| | `vcf-liftover` | beta | Software for lifting over VCF files |
| | `vcf-lookup-rsid` | beta | Look up rsID from a VCF |
| | `vcf-milk-filter` | alpha | Mendelian-inheritance and likelihood-based variant filtering software |
| | `vcf-resolve-chrX-hets` | alpha | Resolve heterozygous genotypes on chrX based on likelihoods |
| | `vcf-summary` | beta | Produce basic summary of VCF files such as Ts/Tv, %dbSNP |
| | `vcf-summary-merge` | alpha | Merge multiple `vcf-summary` output files |
| | `vcf-summary-v2` | alpha | A newer version of `vcf-summary` that includes indels |
| | `vcfast` | beta | Fast command-line utility for processing VCF files (in C++) |
| Expression Array | `cel-extract-intensity` | alpha | Extract probe-level intensity information from CEL files |
| Other Genomics Tools | `bed-tss-match` | alpha | Identify transcription start sites from PROcap pileups |
| | `count-dropseq` | alpha | Produce digital expression matrix from Dropseq |
| | `gene-conv-name` | alpha | Convert between gene names and symbols |
| | `merge-tsv-generic` | beta | Sum the values across multiple tab-delimited files formatted in the same way |
| | `tsv-join` | beta | Join multiple tab-delimited files pivoted by a shared column |
| Parallelization | `run-make` | beta | Fault-tolerant parallelization utility based on Makefile |
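The general idea behind fault-tolerant, Makefile-based parallelization can be illustrated with plain GNU make. This sketch does not use or reproduce the interface of `run-make`, whose actual options are documented through its own `-help`/`-man` output; `write_job_makefile` is a hypothetical helper. Each command becomes its own target that writes a `.done` marker on success: `make -j` runs independent targets in parallel, `-k` keeps going after a failed job, and rerunning `make` retries only the jobs whose markers are missing.

```shell
#!/usr/bin/env bash
# Illustration with plain GNU make only; this is NOT run-make's interface.

# write_job_makefile MAKEFILE CMD...  (hypothetical helper)
# Writes a Makefile with one target per command. A jobN.done marker is
# created only when command N succeeds, so rerunning make retries just the
# jobs that failed or never ran. Commands are pasted into the recipe verbatim,
# so any literal '$' must be written as '$$' for make.
write_job_makefile() {
  local makefile=$1
  shift
  local i cmd
  {
    # The default target depends on every job marker.
    printf 'all:'
    for ((i = 1; i <= $#; i++)); do
      printf ' job%d.done' "$i"
    done
    printf '\n\n'
    i=0
    for cmd in "$@"; do
      i=$((i + 1))
      # Recipe lines must start with a tab; '$@' is make's target name.
      printf 'job%d.done:\n\t%s\n\ttouch $@\n\n' "$i" "$cmd"
    done
  } > "$makefile"
}

# Example: three jobs, the second of which fails.
jobdir=$(mktemp -d)
write_job_makefile "$jobdir/Makefile" "echo job one" "false" "echo job three"

# -j 3: run up to three jobs at once; -k: keep going after a failure.
if make -C "$jobdir" -k -j 3; then
  echo "all jobs succeeded"
else
  echo "some jobs failed; rerun make to retry only those"
fi
```

Because completion is recorded on disk rather than in memory, an interrupted or partially failed batch can be resumed simply by running `make` again in the same directory.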