Pangenome graphs built from raw sets of alignments may have complex local structures generated by common patterns of genome variation. These local nonlinearities can introduce difficulty in downstream analyses, visualization, and interpretation of variation graphs.
smoothxg finds blocks of paths that are collinear within a variation graph.
It applies partial order alignment to each block, yielding an acyclic variation graph.
Then, to yield a "smoothed" graph, it walks the original paths to lace these subgraphs together.
The resulting graph only contains cyclic or inverting structures larger than the chosen block size, and is otherwise manifold linear.
In addition to providing a linear structure to the graph, smoothxg can be used to extract the consensus pangenome graph by applying the heaviest bundle algorithm to each chain.
To find blocks, smoothxg applies a greedy algorithm that assumes that the graph nodes are sorted according to their occurrence in the graph's embedded paths.
The path-guided stochastic gradient descent based 1D sort implemented in odgi sort -Y is designed to provide this kind of sort.
This sort is similar to a 1-dimensional graph layout.
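To make the sortedness assumption concrete, the sketch below counts, for each path in a toy GFA, how many consecutive steps move to a higher node id. It is only an illustration, not a check that smoothxg performs: the file contents and path names are made up, and it assumes segment names are integers assigned in sort order.

```shell
# Illustration only: a made-up GFA with one path that follows the node order
# and one that does not. Assumes segment names are integers in sort order.
gfa="$(mktemp)"
printf 'S\t1\tA\nS\t2\tC\nS\t3\tG\nS\t4\tT\n' >  "$gfa"
printf 'P\tsorted_path\t1+,2+,3+,4+\t*\n'     >> "$gfa"
printf 'P\tshuffled_path\t3+,1+,4+,2+\t*\n'   >> "$gfa"

# For each P record, count consecutive steps whose node id increases.
# Output columns: path name, increasing steps, total steps.
report="$(awk -F'\t' '
$1 == "P" {
    n = split($3, steps, ",")
    up = 0
    for (i = 2; i <= n; i++) {
        prev = steps[i - 1]; cur = steps[i]
        sub(/[+-]$/, "", prev)   # drop the orientation sign
        sub(/[+-]$/, "", cur)
        if (cur + 0 > prev + 0) up++
    }
    printf "%s\t%d\t%d\n", $2, up, n - 1
}' "$gfa")"

printf '%s\n' "$report"
rm -f "$gfa"
```

A path whose steps mostly increase is consistent with the ordering the block finder expects; a path with few increasing steps suggests the graph should be sorted first.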
After finding blocks
smoothxg can operate on any input variation graph in GFA format.
The graph must have sequences represented as paths in P records, while the topology of the graph is in S and L records.
Path names should be unique.
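The input requirements above can be checked before running smoothxg. The following is a minimal sketch that uses only awk, sort, and uniq on a made-up GFA file; the variable names and the example records are assumptions for illustration.

```shell
# Sketch: sanity-check a GFA file against the input requirements listed above.
# The file and its contents are made up for illustration.
gfa="$(mktemp)"
printf 'H\tVN:Z:1.0\n'          >  "$gfa"
printf 'S\t1\tACGT\n'           >> "$gfa"
printf 'S\t2\tTTG\n'            >> "$gfa"
printf 'L\t1\t+\t2\t+\t0M\n'    >> "$gfa"
printf 'P\tsampleA\t1+,2+\t*\n' >> "$gfa"
printf 'P\tsampleB\t1+,2+\t*\n' >> "$gfa"

# Count records of each type by their first tab-separated field.
n_segments="$(awk -F'\t' '$1 == "S" { n++ } END { print n + 0 }' "$gfa")"
n_links="$(awk -F'\t' '$1 == "L" { n++ } END { print n + 0 }' "$gfa")"
n_paths="$(awk -F'\t' '$1 == "P" { n++ } END { print n + 0 }' "$gfa")"

# List any path name that appears more than once.
dup_paths="$(awk -F'\t' '$1 == "P" { print $2 }' "$gfa" | sort | uniq -d)"

echo "S records: $n_segments, L records: $n_links, P records: $n_paths"
if [ -n "$dup_paths" ]; then
    echo "duplicate path names found:"
    printf '%s\n' "$dup_paths"
else
    echo "all path names are unique"
fi
rm -f "$gfa"
```

To check a real graph, point `gfa` at that file instead of the temporary example and drop the `printf` and `rm` lines.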
seqwish is a standard way to make such a graph.
smoothxg uses cmake to build itself and its dependencies. At least GCC version 9.3.0 is required for compilation.
You can check your version via:
gcc --version
g++ --version
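If you prefer a scripted check over reading the version by eye, a comparison along these lines may help. It is a sketch: `version_ge` is a hypothetical helper, not part of smoothxg, and it relies on the version sort (`sort -V`) from GNU coreutils.

```shell
# Sketch: compare the installed GCC against the 9.3.0 minimum stated above.
# version_ge is a hypothetical helper; it needs `sort -V` from GNU coreutils.

# Return 0 when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="9.3.0"

if command -v gcc >/dev/null 2>&1; then
    # -dumpfullversion prints the full version; fall back to -dumpversion
    # for compilers that do not support it.
    found="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
    if version_ge "$found" "$required"; then
        echo "gcc $found satisfies >= $required"
    else
        echo "gcc $found is older than $required"
    fi
else
    echo "gcc not found on PATH"
fi
```

The same helper can be applied to the output of `g++ -dumpfullversion`.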
Clone the smoothxg git repository and build with:
sudo apt-get update && sudo apt-get install -y libatomic-ops-dev libgsl-dev zlib1g-dev libzstd-dev libjemalloc-dev
git clone --recursive https://github.com/pangenome/smoothxg.git
cd smoothxg
cmake -H. -Bbuild && cmake --build build -- -j 4
To optimize for the architecture, run from within the build directory:
cmake -DCMAKE_BUILD_TYPE=Release .. && make -j 16 VERBOSE=1 && ctest . --verbose
libzstd-dev must be of version 1.4 or higher.
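One way to confirm the installed zstd development package meets this minimum is to ask pkg-config. This sketch assumes pkg-config is installed and that the library registers itself under the module name `libzstd`; the `zstd_status` variable is introduced here for illustration only.

```shell
# Sketch: query pkg-config for the libzstd version (assumes the module
# name "libzstd"). zstd_status is a variable made up for this example.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libzstd; then
    if pkg-config --atleast-version=1.4 libzstd; then
        zstd_status="ok"
    else
        zstd_status="too-old"
    fi
    echo "libzstd $(pkg-config --modversion libzstd): $zstd_status"
else
    # Either pkg-config is missing or it has no record of libzstd.
    zstd_status="unknown"
    echo "could not determine the libzstd version via pkg-config"
fi
```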
Run tests:
ctest . --verbose
Note that smoothxg depends on git submodules:
git submodule update --init --recursive
In your source dir make sure git submodules are up-to-date and follow the instructions in guix.scm.
If you need to avoid machine-specific optimizations, use the CMAKE_BUILD_TYPE=Generic build type:
cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Generic && cmake --build build -- -j 3
To build for a specific architecture, you can use EXTRA_FLAGS:
cmake -DCMAKE_BUILD_TYPE=Release -DEXTRA_FLAGS="-Ofast -march=znver1" .. && make -j 16 VERBOSE=1
To make a static build, add the -DBUILD_STATIC=ON switch.
smoothxg recipes for Bioconda are available at https://anaconda.org/bioconda/smoothxg.
To install the latest version using Conda, execute:
conda install -c bioconda smoothxg
First, clone the guix-genomics repository:
git clone https://github.com/ekg/guix-genomics
Then install the smoothxg package into your default Guix environment:
GUIX_PACKAGE_PATH=. guix package -i smoothxg
Now smoothxg is available as a global binary installation.
Add the following to your ~/.config/guix/channels.scm:
(cons*
(channel
(name 'guix-genomics)
(url "https://github.com/ekg/guix-genomics.git")
(branch "master"))
%default-channels)
First, pull all the packages, then install smoothxg into your default Guix environment:
guix pull
guix package -i smoothxg
If you want to build an environment consisting only of the smoothxg binary, you can do:
guix environment --ad-hoc smoothxg
For more details about how to handle Guix channels, go to https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics.git.
To make the -S/--write-split-block-fastas and -B/--write-poa-block-fastas options available, and emit a table with POA block statistics, add the -DPOA_DEBUG=ON option:
cmake -H. -Bbuild -D CMAKE_BUILD_TYPE=Release -DPOA_DEBUG=ON && cmake --build build -- -j 3