bustools is a program for manipulating BUS files for single cell RNA-Seq datasets.
The design and motivation for the BUS format and BUStools are described in detail in
P Melsted, V Ntranos, L Pachter, The Barcode, UMI, Set format and BUStools, Bioinformatics, btz279, 2019.
bustools works with BUS files, which can be generated efficiently from raw sequencing data, e.g. using kallisto.
Binaries can be downloaded from the bustools website.
To compile bustools, download the source code with:
git clone https://github.com/BUStools/bustools.git
Navigate to the bustools directory:
cd bustools
Make a build directory and move there:
mkdir build
cd build
Run cmake:
cmake ..
Build the code:
make
The bustools executable is now located in build/src. To install bustools into the CMake install prefix, type:
make install
To see a list of available commands, type bustools in the terminal:
> bustools
Usage: bustools <CMD> [arguments] ..
Where <CMD> can be one of:
sort Sort bus file by barcodes and UMI
text Output as tab separated text file
Running bustools <CMD> without arguments prints usage information for <CMD>
Raw BUS output from pseudoalignment programs may be unsorted. To simplify and accelerate downstream processing, BUS files can be sorted using bustools sort:
> bustools sort
Usage: bustools sort [options] bus-files
Options:
-t, --threads Number of threads to use
-o, --output File for sorted output
This will create a new BUS file where the BUS records are sorted by barcode first, UMI second, and equivalence class third.
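The three-level ordering can be illustrated on plain text without bustools itself. The sketch below is only a mimic of the ordering, not the bustools implementation: it assumes hypothetical records written as tab-separated lines of barcode, UMI, equivalence class, and count, and orders them with the standard sort utility. The file names in the comment are placeholders.

```shell
# With bustools, sorting would be invoked along the lines of (file names are hypothetical):
#   bustools sort -t 4 -o output.sorted.bus output.bus

# Hypothetical records: barcode, UMI, equivalence class, count (tab-separated).
records=$(printf 'TTGA\tACGT\t7\t1\nAACC\tGGTA\t12\t2\nAACC\tGGTA\t3\t1\nAACC\tACCA\t5\t1\n')

# Order by barcode first, UMI second (both as plain strings),
# and equivalence class third (as a number).
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 -k3,3n)
printf '%s\n' "$sorted"
```

After sorting, all records that share a barcode are adjacent, and within a barcode all records that share a UMI are adjacent, which is what lets downstream steps process a file in a single pass.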
BUS files can be converted to a tab-separated text format for easy inspection and processing with shell scripts or high-level languages using bustools text:
> bustools text
Usage: bustools text [options] bus-files
Options:
-o, --output File for text output
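As an example of processing the text output with a shell script, the sketch below counts distinct UMIs per barcode using awk. It assumes each line holds four tab-separated columns (barcode, UMI, equivalence class, count); check this against your own output before relying on it. The sample data and file names are hypothetical.

```shell
# In practice the input would come from something like (file names are hypothetical):
#   bustools text -o output.txt output.sorted.bus

# Hypothetical sample of text output: barcode, UMI, equivalence class, count.
text=$(printf 'AACC\tACCA\t5\t1\nAACC\tGGTA\t3\t1\nAACC\tGGTA\t12\t2\nTTGA\tACGT\t7\t1\n')

# Count each (barcode, UMI) pair once, then report the number of UMIs per barcode.
umi_counts=$(printf '%s\n' "$text" |
  awk -F '\t' '!seen[$1 FS $2]++ { n[$1]++ } END { for (b in n) print b "\t" n[b] }' |
  LC_ALL=C sort)
printf '%s\n' "$umi_counts"
```

The final sort is there only because awk does not guarantee the iteration order of its arrays.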