Merge pull request #2510 from eseiler/doc/changelog
[DOC] Update Changelog, Readme, and links
eseiler committed Jan 29, 2024
2 parents 1b711f4 + 49b2dcb commit 77c2180
Showing 278 changed files with 4,553 additions and 4,464 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.rst
@@ -13,6 +13,8 @@ Release 2.5.0 might work with older compilers, but there is no support.
Library Features
^^^^^^^^^^^^^^^^

+- **Namespace**:
+- The namespace was changed from ``seqan`` to ``seqan2`` to allow for interoperability with other SeqAn libraries.
- Sequence I/O:
- Accepting files that end in ``.fas``.
- Indexing:
@@ -391,7 +393,7 @@ Library Updates and Selected Bugfixes

- Journaled String Tree (new module):
- reference compressed string set structure
-- for more details see the `publication <http://bioinformatics.oxfordjournals.org/content/30/24/3499.short>`_
+- for more details see the `publication <https://bioinformatics.oxfordjournals.org/content/30/24/3499.short>`_

- STL containers:
- added a completely new adaptation to SeqAn interfaces that supports all STL containers, also ``std::array`` and ``std::forward_list``
8 changes: 4 additions & 4 deletions README.md
@@ -37,12 +37,12 @@ The licenses for the applications themselves can be found in the LICENSE files.

## Prerequisites

-Older compiler versions might work but are not supported.
+Older compiler versions might work but are neither supported nor tested.

### Linux, macOS, FreeBSD
-* GCC ≥ 10
-* Clang/LLVM ≥ 11
-* Intel Compiler ≥ 2022.1.0 (Intel OneAPI)
+* GCC ≥ 11
+* Clang/LLVM ≥ 15
+* Intel oneAPI C++ Compiler 2024.0.2 (IntelLLVM)

### Windows
* Visual C++ ≥ 17.0 / Visual Studio ≥ 2022
8 changes: 4 additions & 4 deletions apps/alf/README
@@ -1,5 +1,5 @@
*** ALF - Alignment Free Sequence Comparison ***
-http://www.seqan.de/projects/alf
+https://www.seqan.de/apps/alf
January, 2012

---------------------------------------------------------------------------
@@ -20,15 +20,15 @@ Table of Contents
ALF can be used to calculate the pairwise similarity of sequences using
alignment-free methods. All methods which are implemented are based on
k-mer counts. More details can be found in the online documentation of the
-alignment-free methods (www.seqan.de). By default, ALF uses the
+alignment-free methods (www.seqan.de). By default, ALF uses the
N2 similarity measure.

---------------------------------------------------------------------------
2. Installation
---------------------------------------------------------------------------

ALF is distributed with SeqAn - The C++ Sequence Analysis Library (see
-http://www.seqan.de). To build ALF from Git do the following:
+https://www.seqan.de). To build ALF from Git do the following:

1) git clone https://github.com/seqan/seqan.git
2) mkdir -p build/Release
@@ -130,7 +130,7 @@ sequences from the input fasta file, for example:
---------------------------------------------------------------------------

These examples use the fasta file "small.fasta" which can be found in
-seqan/apps/alf/example/. Copy this file to the directory where you
+seqan/apps/alf/example/. Copy this file to the directory where you
execute alf.

(1) Run ALF with default settings on two sequences:
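The ALF README states that all methods ALF implements are based on k-mer counts. As a rough illustration of that idea only, the sketch below counts overlapping k-mers and computes the well-known D2 statistic (the number of shared k-mer occurrences between two sequences). D2 is a stand-in chosen for simplicity, not the N2 measure ALF uses by default, and `countKmers` and `d2Score` are hypothetical names that are not part of ALF or SeqAn.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: count every overlapping k-mer (substring of length k) of seq.
std::map<std::string, std::size_t> countKmers(std::string const & seq, std::size_t k)
{
    assert(k > 0);
    std::map<std::string, std::size_t> counts;
    for (std::size_t i = 0; i + k <= seq.size(); ++i)
        ++counts[seq.substr(i, k)];
    return counts;
}

// Hypothetical helper: D2 statistic, i.e. the sum over all words w of
// count_a(w) * count_b(w). This is NOT ALF's N2 measure.
std::size_t d2Score(std::string const & a, std::string const & b, std::size_t k)
{
    std::map<std::string, std::size_t> const countsA = countKmers(a, k);
    std::map<std::string, std::size_t> const countsB = countKmers(b, k);
    std::size_t score = 0;
    for (auto const & entry : countsA)
    {
        auto const it = countsB.find(entry.first);
        if (it != countsB.end())
            score += entry.second * it->second;
    }
    return score;
}
```

For example, `d2Score("ACGT", "ACGT", 2)` counts the three shared 2-mers AC, CG and GT. Keying a `std::map` by the k-mer string keeps the sketch short; an efficient implementation would more likely encode k-mers as integers.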
2 changes: 1 addition & 1 deletion apps/alf/alf.cpp
@@ -103,7 +103,7 @@ int main(int argc, const char * argv[])
addTextSection(parser, "Contact and References");
addListItem(parser, "For questions or comments, contact:", "Jonathan Goeke <goeke@molgen.mpg.de>");
addListItem(parser, "Please reference the following publication if you used ALF or the N2 method for your analysis:", "Jonathan Goeke, Marcel H. Schulz, Julia Lasserre, and Martin Vingron. Estimation of Pairwise Sequence Similarity of Mammalian Enhancers with Word Neighbourhood Counts. Bioinformatics (2012).");
-addListItem(parser, "Project Homepage:", "http://www.seqan.de/projects/alf");
+addListItem(parser, "Project Homepage:", "https://www.seqan.de/apps/alf");

// Parse command line.
seqan2::ArgumentParser::ParseResult res = seqan2::parse(parser, argc, argv);
20 changes: 10 additions & 10 deletions apps/bs_tools/README
@@ -15,22 +15,22 @@ Table of Contents
1. Overview
---------------------------------------------------------------------------

-BS tools are designed for the analysis of BS-Seq data, from bisulfite read
+BS tools are designed for the analysis of BS-Seq data, from bisulfite read
mapping to SNP and methylation level calling at single-nucleotide resolution.
It consists out of two main tools: Bisar and Casbar.

-Bisar reads three-letter mappings of bisulfite reads and computes
-local pairwise four-letter realignments using an advanced statistical
+Bisar reads three-letter mappings of bisulfite reads and computes
+local pairwise four-letter realignments using an advanced statistical
alignment model.

-These alignments can be then used in the subsequent tool Casbar for a simultaneous
-SNP and methylation level calling. A Bayesian model is used to compute the
-posterior probability for each possible genotype under the observed data
-and a given methylation level. The methylation level maximizing the
-posterior probability is determined and the genotype with the highest
+These alignments can be then used in the subsequent tool Casbar for a simultaneous
+SNP and methylation level calling. A Bayesian model is used to compute the
+posterior probability for each possible genotype under the observed data
+and a given methylation level. The methylation level maximizing the
+posterior probability is determined and the genotype with the highest
probability is chosen.

-The files README.bisar and README.bisar contain detailed documentation of
+The files README.bisar and README.bisar contain detailed documentation of
the respective tools.

---------------------------------------------------------------------------
@@ -60,6 +60,6 @@ or
---------------------------------------------------------------------------

More detailed information you can find in the following document:
-http://www.mi.fu-berlin.de/en/inf/groups/abi/theses/master_dipl/krakau/
+https://www.mi.fu-berlin.de/en/inf/groups/abi/theses/master_dipl/krakau/
msc_thesis_krakau.pdf?1394119375

32 changes: 16 additions & 16 deletions apps/bs_tools/README.bisar
@@ -18,19 +18,19 @@ Table of Contents
1. Overview
---------------------------------------------------------------------------

-Bisar reads three-letter mappings of bisulfite reads and computes
-local pairwise four-letter realignments using an advanced statistical
-alignment model. The alignment scoring scheme incorporates the global
-methylation rate, the bisulfite conversion rate and base qualities combined
+Bisar reads three-letter mappings of bisulfite reads and computes
+local pairwise four-letter realignments using an advanced statistical
+alignment model. The alignment scoring scheme incorporates the global
+methylation rate, the bisulfite conversion rate and base qualities combined
with base dependent sequencing error frequencies.
-Mapping qualities are computed and after the final verification step only
+Mapping qualities are computed and after the final verification step only
unique four-letter alignments are given out in a SAM file.

---------------------------------------------------------------------------
3. Usage
---------------------------------------------------------------------------

-To get a short usage description of Bisar, you can execute bisar -h or
+To get a short usage description of Bisar, you can execute bisar -h or
bisar --help.

Usage: bisar [OPTION]... <MAPPED READ FILE> <GENOME FILE> <READS FILE>
@@ -45,7 +45,7 @@ Usage: bisar [OPTION]... <MAPPED READ FILE> <GENOME FILE> <READS FILE>
[ -o FILE ], [--output-file FILE ]

Mapping output file. Valid filetype is: .sam.

[ -e3 NUM ], [ --max3-error NUM ]

Max. error rate in 3-letter alphabet. In range [0..100]. Default: 3.
@@ -62,15 +62,15 @@ Usage: bisar [OPTION]... <MAPPED READ FILE> <GENOME FILE> <READS FILE>

Use empirical substitution error frequencies of Illumina sequencing
data for alignment scoring scheme (corresponding to Dohm et al. 2008).

[ -nsi ], [ --ns-ins-errors ]

-Use empirical insertion error frequencies of Illumina sequencing data for
+Use empirical insertion error frequencies of Illumina sequencing data for
alignment scoring scheme (corresponding to Minoche et al. 2011).

[ -nsd ], [ --ns-del-errors ]

-Use empirical deletion error frequencies of Illumina sequencing data for
+Use empirical deletion error frequencies of Illumina sequencing data for
alignment scoring scheme (corresponding to Minoche et al. 2011).

[ -der NUM ], [ --del-error-rate NUM ]
@@ -118,7 +118,7 @@ Usage: bisar [OPTION]... <MAPPED READ FILE> <GENOME FILE> <READS FILE>
[ -h ], [ --help ]

Displays this help message.

[ --version ]

Display version information.
@@ -130,14 +130,14 @@ Usage: bisar [OPTION]... <MAPPED READ FILE> <GENOME FILE> <READS FILE>

The verified four-letter pairwise alignments are given out in SAM format.

-See http://samtools.sourceforge.net/ for more details.
+See https://samtools.sourceforge.net/ for more details.

---------------------------------------------------------------------------
5. Example
---------------------------------------------------------------------------

-In order to compute realignments for all reads with up to 4% errors in their
-three-letter alignment, while allowing up to 5% errors in four-letter
+In order to compute realignments for all reads with up to 4% errors in their
+three-letter alignment, while allowing up to 5% errors in four-letter
alignments call:

bisar -e3 4 -e4 5 -o mapped_reads_verified.sam mapped_reads.sam \
@@ -161,8 +161,8 @@ or
7. References
---------------------------------------------------------------------------

-More detailed information about the underlying method you can find in the
+More detailed information about the underlying method you can find in the
following document:
-http://www.mi.fu-berlin.de/en/inf/groups/abi/theses/master_dipl/krakau/
+https://www.mi.fu-berlin.de/en/inf/groups/abi/theses/master_dipl/krakau/
msc_thesis_krakau.pdf?1394119375
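README.bisar says the Bisar alignment scoring scheme incorporates the global methylation rate, the bisulfite conversion rate and base qualities. The following is a minimal sketch of how those three quantities can be combined into a per-position score, assuming only the forward-strand C-to-T conversion and a uniform sequencing error derived from the Phred quality. Bisar additionally uses base-dependent error frequencies, which this sketch omits; all names here are hypothetical and this is not Bisar's actual scoring code.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Nucleotides; the underlying value is used as an array index.
enum class Base { A = 0, C = 1, G = 2, T = 3 };

// Hypothetical model: distribution of the base present in a bisulfite-treated
// fragment given the reference base. Only forward-strand C->T conversion is
// modelled: a methylated C (probability methRate) is protected, an
// unmethylated C is converted to T with probability convRate.
std::array<double, 4> convertedBaseDistribution(Base ref, double methRate, double convRate)
{
    assert(methRate >= 0.0 && methRate <= 1.0);
    assert(convRate >= 0.0 && convRate <= 1.0);
    std::array<double, 4> dist{};  // all zeros
    if (ref == Base::C)
    {
        double const pConverted = (1.0 - methRate) * convRate;
        dist[static_cast<std::size_t>(Base::T)] = pConverted;
        dist[static_cast<std::size_t>(Base::C)] = 1.0 - pConverted;
    }
    else
    {
        dist[static_cast<std::size_t>(ref)] = 1.0;
    }
    return dist;
}

// P(observed read base | reference base): the converted-base distribution
// combined with a uniform sequencing error derived from the Phred quality.
double observationProbability(Base ref, Base observed, int phred, double methRate, double convRate)
{
    double const err = std::pow(10.0, -phred / 10.0);
    std::array<double, 4> const truth = convertedBaseDistribution(ref, methRate, convRate);
    double prob = 0.0;
    for (std::size_t b = 0; b < truth.size(); ++b)
    {
        bool const same = b == static_cast<std::size_t>(observed);
        prob += truth[b] * (same ? 1.0 - err : err / 3.0);
    }
    return prob;
}

// Log-odds score of aligning `observed` to `ref` against a uniform
// background frequency of 1/4 per base.
double logOddsScore(Base ref, Base observed, int phred, double methRate, double convRate)
{
    return std::log(observationProbability(ref, observed, phred, methRate, convRate) / 0.25);
}
```

In this model, with `methRate = 0` and `convRate = 1`, a reference C observed as T scores like a match rather than a mismatch, while an A observed as G still receives a negative score.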

76 changes: 38 additions & 38 deletions apps/bs_tools/README.casbar
@@ -19,22 +19,22 @@ Table of Contents
1. Overview
---------------------------------------------------------------------------

-Casbar uses pairwise four-letter alignments given in SAM format for
-simultaneous SNP and methylation level calling. A Bayesian model is used to
-compute the posterior probability for each possible genotype under the
-observed data and a given methylation level. The methylation level maximizing
-the posterior probability is determined and the genotype with the highest
-probability is chosen. Therefore the bisulfite conversion rate and base
-qualities, if chosen combined with base dependent sequencing error frequencies,
-are incorporated.
-All called SNPs and methylation levels that pass the verification criteria
-are given out in a VCF and BED file respectively.
+Casbar uses pairwise four-letter alignments given in SAM format for
+simultaneous SNP and methylation level calling. A Bayesian model is used to
+compute the posterior probability for each possible genotype under the
+observed data and a given methylation level. The methylation level maximizing
+the posterior probability is determined and the genotype with the highest
+probability is chosen. Therefore the bisulfite conversion rate and base
+qualities, if chosen combined with base dependent sequencing error frequencies,
+are incorporated.
+All called SNPs and methylation levels that pass the verification criteria
+are given out in a VCF and BED file respectively.

---------------------------------------------------------------------------
3. Usage
---------------------------------------------------------------------------

-To get a short usage description of Casbar, you can execute casbar -h or
+To get a short usage description of Casbar, you can execute casbar -h or
casbar --help.

Usage: casbar [OPTIONS] <GENOME FILE> <ALIGNMENT FILE> -o <SNP FILE> -b <METH-LEVEL FILE>
@@ -57,31 +57,31 @@ Usage: casbar [OPTIONS] <GENOME FILE> <ALIGNMENT FILE> -o <SNP FILE> -b <METH-L

[ -mp NUM], [ --max-pile NUM ]

-Maximal number of matches allowed to pile up at the same genome position.
+Maximal number of matches allowed to pile up at the same genome position.
In range [0..inf]. Default: 0.

[ -mmp ], [ --merged-max-pile ]

Do pile up correction on merged lanes.

[ -mc NUM ], [ --min-coverage NUM ]

-Minimal required number of reads covering a candidate position.
+Minimal required number of reads covering a candidate position.
In range [1..inf]. Default: 6.

[ -eb NUM ], [ --exclude-border NUM ]

-Exclude read positions within eb base pairs of read borders for
+Exclude read positions within eb base pairs of read borders for
SNP calling. Default: 0.

[ -su ], [ --suboptimal ]

Keep suboptimal reads.

[ -pws NUM ], [ --parse-window-size NUM ]

-Genomic window size for parsing reads (concerns memory consumption,
-choose smaller windows for higher coverage). In range [1..100000].
+Genomic window size for parsing reads (concerns memory consumption,
+choose smaller windows for higher coverage). In range [1..100000].
Default: 100000.

[ -I TEXT], [ --intervals TEXT ]
@@ -94,7 +94,7 @@ Usage: casbar [OPTIONS] <GENOME FILE> <ALIGNMENT FILE> -o <SNP FILE> -b <METH-L

[ -mm NUM ], [ --min-mutations NUM ]

-Minimal number of deviating bases for calling. In range [1..inf].
+Minimal number of deviating bases for calling. In range [1..inf].
Default: 3.

[ -mq NUM], [ --min-quality NUM ]
@@ -103,17 +103,17 @@ Usage: casbar [OPTIONS] <GENOME FILE> <ALIGNMENT FILE> -o <SNP FILE> -b <METH-L

[ -mmq NUM], [ --min-map-quality NUM ]

-Minimum base call quality for a match to be considered. In range [0..inf].
+Minimum base call quality for a match to be considered. In range [0..inf].
Default: 1.

[ -hes NUM ], [ --prob-hetero-snp NUM ]

-Heterozygous SNP probability to compute genotype prior probabilities.
+Heterozygous SNP probability to compute genotype prior probabilities.
In range [0..1]. Default: 0.005.

[ -hos NUM], [ --prob-homo-snp NUM ]

-Homozygous SNP probability to compute genotype prior probabilities.
+Homozygous SNP probability to compute genotype prior probabilities.
In range [0..1]. Default: 0.0005.

[ -msc NUM], [ --min-score NUM ]
@@ -126,7 +126,7 @@ Usage: casbar [OPTIONS] <GENOME FILE> <ALIGNMENT FILE> -o <SNP FILE> -b <METH-L

[ -nec ], [ --ns-errors-calling ]

-Use empirical error frequencies of Illumina sequencing data to compute
+Use empirical error frequencies of Illumina sequencing data to compute
likelihoods in bayesian model (corresponding to Dohm et al. 2008).

[ -v ], [ --verbose ]
@@ -136,15 +136,15 @@ Usage: casbar [OPTIONS] <GENOME FILE> <ALIGNMENT FILE> -o <SNP FILE> -b <METH-L
[ -vv ], [ --very-verbose ]

Enable very verbose output.

[ -q ], [ --quiet ]

Set verbosity to a minimum.

[ -h ], [ --help ]

Displays this help message.

[ --version ]

Display version information.
@@ -154,30 +154,30 @@ Usage: casbar [OPTIONS] <GENOME FILE> <ALIGNMENT FILE> -o <SNP FILE> -b <METH-L
4. Output Formats
---------------------------------------------------------------------------

-All called SNPs that pass the verification criteria are given out in a VCF
-file.
+All called SNPs that pass the verification criteria are given out in a VCF
+file.

-All called methylation levels that pass the verification criteria
-are given out in a BED file
-(BED6 format: chrom, start, end, name, score, strand and methylation level).
+All called methylation levels that pass the verification criteria
+are given out in a BED file
+(BED6 format: chrom, start, end, name, score, strand and methylation level).

---------------------------------------------------------------------------
5. Example
---------------------------------------------------------------------------

-In order to perform SNP and methylation level calling by taking nonuniform
-sequencing error probabilities into account call:
+In order to perform SNP and methylation level calling by taking nonuniform
+sequencing error probabilities into account call:

casbar -nec -o snps.vcf -b meth_levels.bed genome.fa \
mapped_bisulfite_reads.sam

-To perform SNP and methylation level calling by taking nonuniform sequencing
+To perform SNP and methylation level calling by taking nonuniform sequencing
error probabilities into account for genomic positions with a coverage >= 6
call:

casbar -nec -mc 6 -msc 5 -o snps.vcf -b meth_levels.bed genome.fa \
mapped_bisulfite_reads.sam

The minimum score in this case a SNP or methylation level to be called is 5.

---------------------------------------------------------------------------
@@ -193,8 +193,8 @@ or
7. References
---------------------------------------------------------------------------

-More detailed information about the underlying method you can find in the
+More detailed information about the underlying method you can find in the
following document:
-http://www.mi.fu-berlin.de/en/inf/groups/abi/theses/master_dipl/krakau/
+https://www.mi.fu-berlin.de/en/inf/groups/abi/theses/master_dipl/krakau/
msc_thesis_krakau.pdf?1394119375
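README.casbar describes a Bayesian model that computes the posterior probability of each possible genotype and calls the most probable one. The sketch below illustrates only that posterior step, assuming a diploid genome, a uniform error model from Phred qualities and a hypothetical split of the prior mass. It ignores bisulfite conversion and the methylation-level optimisation Casbar performs. The default prior values come from the documented defaults of `--prob-hetero-snp` (0.005) and `--prob-homo-snp` (0.0005); how they are distributed over genotypes, and all function names, are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One read base covering the position: base index (A=0, C=1, G=2, T=3) and Phred quality.
struct Observation
{
    int base;
    int phred;
};

// Unordered diploid genotype {first, second} with first <= second.
using Genotype = std::pair<int, int>;

struct GenotypeCall
{
    Genotype genotype;
    double posterior;
};

// P(observed base | true allele) under a uniform error model.
double baseLikelihood(int observed, int allele, int phred)
{
    double const err = std::pow(10.0, -phred / 10.0);
    return observed == allele ? 1.0 - err : err / 3.0;
}

// Hypothetical prior: heterozygous genotypes carrying the reference allele share
// probHetero, homozygous non-reference genotypes share probHomo, the reference
// homozygote gets the rest, and all other genotypes get 0.
double genotypePrior(Genotype const & g, int ref, double probHetero, double probHomo)
{
    bool const homozygous = g.first == g.second;
    if (homozygous && g.first == ref)
        return 1.0 - probHetero - probHomo;
    if (homozygous)
        return probHomo / 3.0;
    if (g.first == ref || g.second == ref)
        return probHetero / 3.0;
    return 0.0;
}

// Maximum-a-posteriori genotype and its posterior probability.
GenotypeCall callGenotype(std::vector<Observation> const & obs, int ref,
                          double probHetero = 0.005, double probHomo = 0.0005)
{
    assert(ref >= 0 && ref < 4);
    std::vector<Genotype> genotypes;
    std::vector<double> logPosterior;  // unnormalised log posterior
    for (int a = 0; a < 4; ++a)
    {
        for (int b = a; b < 4; ++b)
        {
            double const prior = genotypePrior({a, b}, ref, probHetero, probHomo);
            if (prior <= 0.0)
                continue;  // excluded by the prior
            double lp = std::log(prior);
            // Each read samples one of the two alleles with probability 1/2.
            for (Observation const & o : obs)
                lp += std::log(0.5 * baseLikelihood(o.base, a, o.phred) +
                               0.5 * baseLikelihood(o.base, b, o.phred));
            genotypes.push_back({a, b});
            logPosterior.push_back(lp);
        }
    }
    // Normalise with the log-sum-exp trick to avoid underflow.
    auto const best = std::max_element(logPosterior.begin(), logPosterior.end());
    double sum = 0.0;
    for (double lp : logPosterior)
        sum += std::exp(lp - *best);
    std::size_t const idx = static_cast<std::size_t>(best - logPosterior.begin());
    return {genotypes[idx], 1.0 / sum};
}
```

For instance, ten reads showing A plus ten showing C at a position whose reference base is A yield the heterozygous call {0, 1}, i.e. A/C. Working in log space keeps the product of many per-read likelihoods from underflowing.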
