A set of tools for working with genomic and high throughput sequencing data.
Latest commit 0f0307b Feb 22, 2017 @tfenne tfenne committed on GitHub Support for cross-building scala 2.11 and 2.12. (#177)


fgbio

A set of tools to analyze genomic data.

Goals

There are many toolkits available for analyzing genomic data; fgbio does not aim to be all things to all people but is specifically focused on providing:

  • Robust, well-tested tools.
  • An easy-to-use command-line interface.
  • Documentation for each tool.
  • Tools not found anywhere else.
  • Open source development for the benefit of the community and our clients.

Building

Cloning the Repository

Git LFS is used to store the large files used in testing fgbio, so git lfs must be installed in order to compile and run the tests. To retrieve the large files, either:

  1. Clone the repository after installing git lfs, or
  2. In a previously cloned repository, run git lfs pull once.

After this initial setup, regular git commands (e.g. pull, fetch, push) also operate on the large files, and no special handling is needed.

To clone the repository: git clone https://github.com/fulcrumgenomics/fgbio.git
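
The two options above can be sketched as shell functions. This is a minimal sketch that assumes git and the git-lfs extension are already on your PATH; the function names clone_fgbio and fetch_lfs_files are made up for illustration and are not part of fgbio.

```shell
# Hypothetical helper names, for illustration only; not part of fgbio.

# Option 1: fresh clone. `git lfs install` registers the LFS filters in your
# git configuration (needed once), so the clone also downloads the large files.
clone_fgbio() {
  git lfs install &&
  git clone https://github.com/fulcrumgenomics/fgbio.git
}

# Option 2: the repository was cloned before git lfs was installed.
# Run this from inside the existing working copy.
fetch_lfs_files() {
  git lfs install &&
  git lfs pull
}
```

Call clone_fgbio for a new checkout, or change into an existing checkout and call fetch_lfs_files once.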

Running the build

fgbio is built using sbt.

Use sbt assembly to build an executable jar in target/scala-2.11/.
Tests may be run with sbt test. Java SE 8 is required.

Command line

Run java -jar target/scala-2.11/fgbio-0.1.3-SNAPSHOT.jar with no arguments to see the supported commands. Use java -jar target/scala-2.11/fgbio-0.1.3-SNAPSHOT.jar <command> to see the help message for a particular command.
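
As a sketch of these invocations, the snippet below assembles the jar path from the Scala and fgbio versions mentioned in this README and runs the jar only if it has been built. The variable names are illustrative, and FilterBam is used purely as an example command name taken from the list of tools.

```shell
# Versions as they appear in this README; adjust them to match your build.
SCALA_VERSION="2.11"
FGBIO_VERSION="0.1.3-SNAPSHOT"
FGBIO_JAR="target/scala-${SCALA_VERSION}/fgbio-${FGBIO_VERSION}.jar"

if [ -f "${FGBIO_JAR}" ]; then
  # With no arguments, fgbio lists the supported commands.
  java -jar "${FGBIO_JAR}"
  # Naming a command shows the help message for that command.
  java -jar "${FGBIO_JAR}" FilterBam
else
  echo "Jar not found at ${FGBIO_JAR}; run 'sbt assembly' first."
fi
```

Keeping the versions in variables means only two lines change when the Scala or fgbio version is bumped.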

Include fgbio in your project

You can include fgbio in your sbt project by adding the following to libraryDependencies:

"com.fulcrumgenomics" %% "fgbio" % "0.1.3-SNAPSHOT"

Overview

fgbio is a command-line toolkit for bioinformatic analysis of genomic data. Our customers use the tools in fgbio both for ad hoc data analysis and within their production pipelines. The tools typically operate on read-level data (e.g. FASTQ, SAM, or BAM) or variant-level data (e.g. VCF or BCF). They range from simple tools that filter reads in a BAM file to tools that compute consensus reads from reads sharing the same molecular index/tag. See the list of tools for more detail.

List of tools

Below we highlight, in no particular order, a few tools that you may find useful. Please see the help message for the full list of available tools.

  • Tools to work with unique molecular tags/indexes (UMIs):
    • Annotate or extract UMIs from read-level data: AnnotateBamWithUmis and ExtractUmisFromBam.
    • Manipulate read-level data containing UMIs: CallMolecularConsensusReads and GroupReadsByUmi.
  • Tools to manipulate read-level data:
    • Filter read-level data: FilterBam.
    • Randomize the order of read-level data: RandomizeBam.
    • Update read-level metadata: SetMateInformation and UpdateReadGroups.
  • Miscellaneous tools:
    • Pick molecular indices (e.g. sample barcodes or molecular indexes): PickIlluminaIndices.
    • Convert the output of HAPCUT (a tool for phasing variants): HapCutToVcf.
    • Find technical or synthetic sequences in read-level data: FindTechnicalReads.
    • Assess phased variant calls: AssessPhasing.

Contributing

Contributions are welcome and encouraged. We will do our best to provide an initial response to any pull request or issue within one week. For urgent matters, please contact us directly.

Authors

License

fgbio is open source software released under the MIT License.