Releases · jltsiren/gcsa2

23 Jan 03:52

jltsiren

v1.3

ee3cc0b

GCSA2 v1.3 (latest)

Uses C++14 and the vgteam fork of SDSL.
Support for Clang.
Deterministic shuffle in locate(range, max_positions, results) to avoid platform-specific behavior.
Construction, serialization, and statistics fixes for empty indexes.
Installation script.
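The notes do not say which platform-specific behavior the deterministic shuffle replaces. One common source of such differences is that the C++ standard fixes the output sequence of `std::mt19937_64` but leaves the algorithms of `std::shuffle` and `std::uniform_int_distribution` up to the library, so libstdc++ and libc++ can pick different subsets from the same seed. The sketch below illustrates that idea with a hand-written partial Fisher-Yates shuffle. `sample_positions` and its seed are hypothetical names; this is not GCSA2's `locate()` implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper, not GCSA2's locate(): return at most max_positions
// entries of `positions`, chosen pseudo-randomly but identically on every platform.
std::vector<std::uint64_t> sample_positions(std::vector<std::uint64_t> positions,
                                            std::size_t max_positions,
                                            std::uint64_t seed = 0x5EED)
{
  if (positions.size() <= max_positions) { return positions; }

  // The output sequence of std::mt19937_64 is fixed by the C++ standard,
  // whereas the algorithms behind std::shuffle and std::uniform_int_distribution are not.
  std::mt19937_64 rng(seed);

  // Partial Fisher-Yates shuffle: only the first max_positions slots are drawn.
  for (std::size_t i = 0; i < max_positions; i++) {
    std::uint64_t remaining = positions.size() - i;
    // Plain modulo reduction has a slight bias; acceptable in a sketch.
    std::size_t j = i + static_cast<std::size_t>(rng() % remaining);
    std::swap(positions[i], positions[j]);
  }
  positions.resize(max_positions);
  return positions;
}
```

With a fixed seed, the same query range always yields the same subset, whichever compiler and standard library built the program.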

11 May 03:59

jltsiren

v1.2

19f429d

GCSA2 v1.2

Various improvements to index construction, including fixes for some bottlenecks that appear when the temporary files are on a fast SSD.

New functionality: locate(range, max_positions, results) returns a random subset of the matching positions in the query range.
Read and write data in smaller blocks to avoid problems with reads larger than 2 GB when using GCC on macOS.
Delete temporary files when std::exit() is called.
Size limit is now the total for all temporary files. The default was increased to 2048 GB.
Faster index construction: faster preprocessing, better scheduling in PathGraph::extend().
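Reading and writing in smaller blocks means splitting one large stream operation into a loop of bounded ones, so no single call exceeds the 2 GB limit. Below is a minimal sketch using standard streams. The function names and the 1 GiB block size are assumptions; the block size GCSA2 actually uses is not given in these notes.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // for in-memory testing with std::stringstream
#include <vector>

// Hypothetical block size, kept well below the 2 GB threshold.
constexpr std::size_t DEFAULT_BLOCK_BYTES = std::size_t(1) << 30;  // 1 GiB

// Write `bytes` bytes as a sequence of writes of at most block_bytes each.
bool write_in_blocks(std::ostream& out, const char* data, std::size_t bytes,
                     std::size_t block_bytes = DEFAULT_BLOCK_BYTES)
{
  for (std::size_t offset = 0; offset < bytes; offset += block_bytes) {
    std::size_t len = std::min(block_bytes, bytes - offset);
    out.write(data + offset, static_cast<std::streamsize>(len));
    if (!out) { return false; }
  }
  return true;
}

// Read exactly `bytes` bytes in blocks; fail if the stream ends early.
bool read_in_blocks(std::istream& in, char* data, std::size_t bytes,
                    std::size_t block_bytes = DEFAULT_BLOCK_BYTES)
{
  for (std::size_t offset = 0; offset < bytes; offset += block_bytes) {
    std::size_t len = std::min(block_bytes, bytes - offset);
    in.read(data + offset, static_cast<std::streamsize>(len));
    if (in.gcount() != static_cast<std::streamsize>(len)) { return false; }
  }
  return true;
}
```

Both functions report failure instead of silently truncating, so a caller can abort construction when a temporary file cannot be read or written.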

23 Feb 22:46

jltsiren

v1.1

11ee0ba

GCSA2 v1.1

Support for haplotype-aware indexing and higher-order indexes in VG.

Node mappings for using separate sets of node identifiers for graph transformations and locate() queries.
- Intended for graphs pruned with vg prune --unfold-paths, which replaces complex subgraphs with subgraphs that only contain specific paths.
- Build a path graph for the unfolded graph but create an index that maps to the original graph.
Support for 4 doubling steps (paths of length up to 256).
Optionally specify a sample period instead of always sampling the initial offsets of the original nodes.
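Each doubling step joins two paths of length L into a path of length 2L, so the reachable path length is the initial length times 2^steps. The pairing 256 = 16 × 2^4 suggests an initial length of 16, but that is an inference from the numbers above. The sketch applies the same prefix-doubling idea to the suffixes of a plain string: each suffix is ranked by the pair (rank of its first half, rank of its second half). This is only an analogy; GCSA2 doubles paths in a graph (see `PathGraph::extend()` above), not string suffixes, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Path length reached after `steps` doublings starting from `initial_len`.
std::size_t max_path_length(std::size_t initial_len, unsigned steps)
{
  return initial_len << steps;
}

// Dense ranks: positions with equal keys share a rank, smaller keys get smaller ranks.
template<class Key>
std::vector<int> dense_ranks(const std::vector<Key>& keys)
{
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
  std::vector<int> rank(keys.size());
  int current = 0;
  for (std::size_t k = 0; k < order.size(); k++) {
    if (k > 0 && keys[order[k - 1]] < keys[order[k]]) { current++; }
    rank[order[k]] = current;
  }
  return rank;
}

// Rank every suffix of `text` by its prefix of length initial_len * 2^steps.
std::vector<int> doubling_ranks(const std::string& text, std::size_t initial_len, unsigned steps)
{
  std::vector<std::string> initial(text.size());
  for (std::size_t i = 0; i < text.size(); i++) { initial[i] = text.substr(i, initial_len); }
  std::vector<int> rank = dense_ranks(initial);

  std::size_t len = initial_len;
  for (unsigned s = 0; s < steps; s++) {
    // Key of the length-2*len prefix: ranks of its two halves; -1 if the second half is empty.
    std::vector<std::pair<int, int>> keys(text.size());
    for (std::size_t i = 0; i < text.size(); i++) {
      keys[i] = { rank[i], i + len < text.size() ? rank[i + len] : -1 };
    }
    rank = dense_ranks(keys);
    len *= 2;
  }
  return rank;
}
```

For `"banana"`, three doublings from length 1 reach length 8, which already covers every suffix, so the ranks equal the full suffix order.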

22 May 13:49

jltsiren

v1.0

086a627

GCSA2 v1.0

This version contains bug fixes and other minor improvements. As there have been no major changes since September 2016, it seems appropriate to call this version 1.0.

The prefix of temporary files is now gcsa_ instead of .gcsa_, so large temporary files left behind by a failed construction are no longer hidden.
Construction is aborted if reading/writing temporary files fails.
Tools now display the version of GCSA2.
Fixed a buffer overflow in LCP construction.
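The temporary-file changes above (visible `gcsa_` prefix, aborting on I/O failure) and the later v1.2 change (deleting temporary files when `std::exit()` is called) fit together roughly as in the sketch below. It is a minimal illustration under assumptions: the registry, the `<dir>/gcsa_<n>.tmp` naming scheme, and exception-based aborting are hypothetical and not GCSA2's own temporary-file API.

```cpp
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical registry of temporary files created so far.
std::vector<std::string>& temp_registry()
{
  static std::vector<std::string> files;
  return files;
}

// Delete every registered temporary file.
void remove_temp_files()
{
  for (const std::string& name : temp_registry()) { std::remove(name.c_str()); }
  temp_registry().clear();
}

// Create a visible name like "<dir>/gcsa_<n>.tmp" and register it for cleanup.
std::string new_temp_file(const std::string& dir)
{
  static bool cleanup_registered = false;
  static unsigned counter = 0;
  temp_registry();  // construct the registry first so it outlives the exit handler
  if (!cleanup_registered) {
    std::atexit(remove_temp_files);  // runs on normal return from main() and on std::exit()
    cleanup_registered = true;
  }
  std::string name = dir + "/gcsa_" + std::to_string(counter++) + ".tmp";
  temp_registry().push_back(name);
  return name;
}

// Write a buffer to a temporary file; any failure aborts construction.
void write_temp_or_abort(const std::string& name, const std::string& data)
{
  std::ofstream out(name, std::ios::binary);
  out.write(data.data(), static_cast<std::streamsize>(data.size()));
  out.close();
  if (!out) { throw std::runtime_error("construction aborted: cannot write " + name); }
}
```

Handlers registered with `std::atexit` do not run on `std::abort()` or on an uncaught signal, so a crash can still leave files behind; the visible prefix makes such leftovers easy to find.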

01 Oct 18:46

jltsiren

v0.8

b652fc8

GCSA2 v0.8

This is a performance update.

New encoding for the FM-index that makes it as fast as the FM-index in BWA.
Kmer comparison: report the size of the symmetric difference between two indexes, and optionally output the kmers specific to one of the indexes.
The LCP array file now has a proper header.
Due to changes in file formats, old indexes must be rebuilt.
Headers are now located under include/gcsa.
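The kmer comparison reports the size of the symmetric difference, that is, the number of kmers present in exactly one of the two indexes. On sorted, deduplicated kmer lists this reduces to two set differences. The sketch below stores kmers as plain strings, which is an assumption for readability; GCSA2 uses its own encoding, and `compare_kmers` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

struct KmerComparison {
  std::size_t symmetric_difference;     // kmers found in exactly one input
  std::vector<std::string> only_in_first;
};

// Sort, remove duplicates, and compare two kmer collections.
KmerComparison compare_kmers(std::vector<std::string> a, std::vector<std::string> b)
{
  std::sort(a.begin(), a.end());
  a.erase(std::unique(a.begin(), a.end()), a.end());
  std::sort(b.begin(), b.end());
  b.erase(std::unique(b.begin(), b.end()), b.end());

  KmerComparison result{0, {}};
  std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                      std::back_inserter(result.only_in_first));
  std::vector<std::string> only_in_second;
  std::set_difference(b.begin(), b.end(), a.begin(), a.end(),
                      std::back_inserter(only_in_second));
  result.symmetric_difference = result.only_in_first.size() + only_in_second.size();
  return result;
}
```

Keeping `only_in_first` mirrors the option to output the kmers specific to one of the indexes.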

16 Aug 13:40

jltsiren

v0.7

a2b377f

GCSA2 v0.7

This is a major construction update.

Faster index construction due to simplified disk I/O.
The index is now based on maximally pruned de Bruijn graphs, which are more intuitive and slightly smaller than the non-maximally pruned graphs in the earlier versions.
Overlapping subgraphs (e.g. a pruned variation graph and the reference path) can be indexed in separate files without excessive memory usage.
The Alphabet object is now a property of InputGraph, not a GCSA construction parameter.
Verbosity level can be changed at runtime with Verbosity::set().
GCSA2 now compiles with an OpenMP-enabled Clang compiler. Index construction is slower than with g++ due to the lack of multi-threaded sorting.

16 Mar 13:40

jltsiren

v0.6.1

b433cc2

GCSA2 v0.6.1

This is a quick bug fix.

STNode::lcp() now returns the string depth of the node itself, not of its parent.
String depths can also be determined by using LCPArray::depth().
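The difference between a node's string depth and its parent's can be read off the LCP array with the standard LCP-interval relations: if a suffix tree node covers the suffix-array range [sp, ep] with sp < ep, its string depth is the minimum of LCP[sp+1..ep], and its parent's depth is the larger of LCP[sp] and LCP[ep+1]. The sketch below uses linear scans in place of the range minimum tree mentioned in v0.6, and the function names are hypothetical rather than GCSA2's `STNode` or `LCPArray` interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// lcp[i] = length of the longest common prefix of the (i-1)-th and i-th
// suffixes in sorted order; lcp[0] = 0.

// String depth of the internal node covering suffix-array range [sp, ep], sp < ep.
std::size_t node_depth(const std::vector<std::size_t>& lcp, std::size_t sp, std::size_t ep)
{
  return *std::min_element(lcp.begin() + sp + 1, lcp.begin() + ep + 1);
}

// String depth of that node's parent: the larger LCP value just outside the range.
std::size_t parent_depth(const std::vector<std::size_t>& lcp, std::size_t sp, std::size_t ep)
{
  std::size_t left = lcp[sp];
  std::size_t right = (ep + 1 < lcp.size() ? lcp[ep + 1] : 0);
  return std::max(left, right);
}
```

For `banana$` (suffix order `$`, `a$`, `ana$`, `anana$`, `banana$`, `na$`, `nana$`), the LCP array is {0, 0, 1, 3, 0, 0, 2}. The node `ana` covers [2, 3] and has depth 3, while its parent `a` has depth 1.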

14 Mar 15:02

jltsiren

v0.6

d334980

GCSA2 v0.6

This is a major functionality update. It adds support for the following operations:

Counting queries that determine the number of distinct start nodes in a lexicographic range of path nodes. The solution is based on a generalization of Sadakane's document counting structure.
Parent queries in the suffix tree in order to find maximal exact matches quickly. The solution is based on a range minimum tree over the LCP array, which can also be used to add support for other suffix tree operations. (The lack of inverse suffix array functionality in GCSA prevents us from making the index fully equivalent to a suffix tree.)
Counting the number of distinct kmers in the index. This is primarily useful for determining how much space is saved by pruning the de Bruijn graph. The same approach can be used for e.g. comparing two indexes based on the kmers they contain.
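Counting distinct start nodes in a range is an instance of document counting. The sketch below does not implement Sadakane's structure. It uses a simpler, related technique (a previous-occurrence array, as in Muthukrishnan's document listing) to show what such a query returns: a value is counted at the first position where it appears inside the range, which is exactly when its previous occurrence lies before the range. The flat layout with one start node per position is an assumption; in GCSA2 a path node can have several start nodes.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// docs[i] = start node associated with position i (hypothetical flat layout).
// prev[i] = largest j < i with docs[j] == docs[i], or -1 if there is none.
std::vector<long> previous_occurrence(const std::vector<int>& docs)
{
  std::vector<long> prev(docs.size(), -1);
  std::unordered_map<int, long> last;
  for (std::size_t i = 0; i < docs.size(); i++) {
    auto it = last.find(docs[i]);
    if (it != last.end()) { prev[i] = it->second; }
    last[docs[i]] = static_cast<long>(i);
  }
  return prev;
}

// Number of distinct values in docs[sp..ep]: positions whose previous occurrence is before sp.
std::size_t count_distinct(const std::vector<long>& prev, std::size_t sp, std::size_t ep)
{
  std::size_t count = 0;
  for (std::size_t i = sp; i <= ep; i++) {
    if (prev[i] < static_cast<long>(sp)) { count++; }
  }
  return count;
}
```

The linear scan is only for illustration. Compressed structures answer the same question without visiting every position, which is what makes them practical for large indexes.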

Other things to consider:

Index construction uses somewhat more time and memory due to the new structures.
Index size has increased by 10-15% (without the LCP array) or by about 50% (with the LCP array).
Index file format has changed. Old indexes cannot be used anymore, as a conversion tool would not be much more efficient than rebuilding the indexes.

14 Dec 10:55

jltsiren

v0.5

fe8ccc9

GCSA2 v0.5

This is the first actual release of GCSA2. The indexes are smaller than in the earlier releases, and the interfaces have been frozen and documented.

The final index is typically 25% to 30% smaller than before. This is caused by more aggressive pruning of the de Bruijn graph.
Index construction is slightly faster than before due to asynchronous reading of temporary files and a faster de Bruijn graph implementation.
The construction interface, the high-level query interface, and the low-level query interface have been frozen and documented.
The index file format has changed. Old indexes can be converted to the new format by using the convert_gcsa tool.

22 Nov 14:36

jltsiren

v0.4

f3386ec

GCSA2 v0.4 (pre-release)

This is the fourth pre-release of GCSA2. The memory usage of index construction is now significantly lower, at the price of 2x longer construction time.

The construction algorithm is now disk-based. It keeps the graphs on disk and loads at most one chromosome at a time into memory.
Graph order is no longer a hard limit for query length. Longer queries may still result in false positives, however.
Headers have been reorganized into public headers (gcsa.h, files.h, support.h, and utils.h) and internal headers (path_graph.h, dbg.h, internal.h).
