
Coding guideline

Enrico Seiler edited this page Mar 22, 2022 · 6 revisions

Library structure

There are three structures or hierarchies in the library:

  1. Filesystem hierarchy
  2. Module hierarchy (almost identical to filesystem hierarchy)
  3. Namespace hierarchy

Filesystem and module hierarchy

  • The library is structured into modules and submodules, represented by directories.
  • Modules are subdirectories of the top-level include directory.
  • Submodules are all subdirectories of module directories, except subfolders called detail (see below).
  • Modules and submodules may contain a detail folder. It does not constitute a submodule; it belongs to the (sub-)module it is in and contains implementation detail (headers inside are not part of the API and may change at any time).
  • There are no sub-sub-modules; a third-level folder is only allowed if it is a detail folder.
  • All (sub-)modules must provide a "meta-header", called all.hpp.
    • The meta-header includes all headers of the module (except those with _detail in their name) and all of its submodules' meta-headers (but not explicitly any content from a detail subdirectory).
    • Unless otherwise required, the order of includes shall be alphabetical with submodules before files [this is also the filesystem's order]
    • There is a top-level "meta-header" that includes all modules.
    • The meta-header defines a Doxygen module of its name for the documentation.
    • Library headers should never include all.hpp from other (sub-)modules, always include exactly the header required.
  • All documented entities that are not members (files, free functions, classes...) shall set the documentation property \ingroup MODULENAME to their respective (sub-)module, i.e., the name of the folder they are in, except where this is called detail (in that case it is its parent).
  • Headers should include as few other headers as possible.
    • This applies to both std:: and seqan3:: headers.
    • In snippets, we can be more verbose, e.g., include <vector> even though it is already included by some seqan3 header.
    • In projects: an hpp file should only include what the header itself needs; the cpp file additionally includes what the implementation needs, but should also keep its includes to a minimum.
    • The reasoning for hpp files is that multiple translation units might include the header, whereas the "implementation" is provided by a cpp file that is compiled once and then linked.

Example

seqan3/alphabet/aminoacid/all.hpp   // meta-header of the aminoacid sub-module of the alphabet module
seqan3/alphabet/aminoacid/...       // aminoacid sub-module of the alphabet module
seqan3/alphabet/.../...             // other sub-modules
seqan3/alphabet/detail/...          // implementation detail of the alphabet module
seqan3/alphabet/.../...             // other sub-modules
seqan3/alphabet/all.hpp             // meta-header of the alphabet module
seqan3/alphabet/...                 // alphabet module
seqan3/.../...                      // other modules and sub-modules
seqan3/all.hpp                      // global meta-header that provides all of the API

Namespaces

SeqAn namespaces

  • namespace seqan3: everything, except the following exceptions
  • namespace seqan3::detail: all free/global functions, metafunctions, static variables and class definitions that are considered implementation detail and not part of the API (and not a private/protected member of a class)
  • namespace seqan3::view: seqan-defined views (usually in seqan3/range/view)
  • namespace std: overloads of std:: functions like std::begin (rarely). (We write "a std::..." rather than "an std::..." because we read it as [stood]::....)
  • nothing should be in the global namespace
  • never have using namespace ... in a header file, except inside the test framework

Syntax

  • braces go on newline
  • closing braces shall be documented
  • there is no indentation inside namespaces!
  • for better readability and because we don't indent, nested namespaces are not declared inside parent, but separately and with full name

Example

namespace seqan3::detail
{

void my_non_public_function()
{
    // ...
}

} // namespace seqan3::detail

namespace seqan3
{

void my_public_function()
{
    detail::my_non_public_function();
    // ...
}

} // namespace seqan3

File structure and naming

General

  • header-only: SeqAn is a header-only library and all functionality is implemented in header files.
  • extension: Header files have the extension .hpp.
  • file-names:
    • all-lower snake_case
    • only lower case standard characters [a-z] and underscore!
    • generalised singular (concept.hpp instead of concepts.hpp)
    • if all the content of a file is inside the namespace seqan3::detail, the filename shall end in _detail.hpp or the file shall be placed in a detail subdirectory
  • UTF-8: All files are expected to be UTF-8 encoded. Special characters are allowed in documentation and string literals, but not in regular code (function names etc.) and file names.
  • self-contained: every header shall include every other header that it needs.
  • seqan3/core/platform.hpp: every header that doesn't include another SeqAn3 header shall include seqan3/core/platform.hpp.
  • visibility: every header that is not in a detail subfolder or contains _detail in its name is considered part of the API and may be directly included by users of the library.

Contents

  1. Copyright notice:

    1. Copy from the license file or another header.
    2. Update the year if necessary.
    3. Don't add an Author line, instead add it to doc (see below).
  2. File-Documentation:

    1. \file This tells Doxygen that the documentation block belongs to this file
    2. \brief + one-line description starting with upper case and ending in .
    3. \author your name <your AT mail.com>
    4. \ingroup MODULENAME (files don't get this!)
    5. optionally a longer description
  3. Single-inclusion: #pragma once (more information); we don't use header guards.

  4. Names and order of includes: (an empty line between each block, sorted alphabetically within)

    1. C system library (rarely!)
    2. C++ Standard Library, e.g.
      • #include <vector> and
      • #include <seqan3/std/ranges>; in the future this header will be the same as #include <ranges>, so it is ordered as if the prefix seqan3/std were not there.
    3. SDSL
    4. Ranges-V3 (Always #include <seqan3/std/ranges> before including any Ranges-V3 header)
    5. SeqAn3
    6. Cereal
    7. Lemon
    • All headers are always included with <header>, not with "header", even SeqAn3!
    • The reasoning for this order is System → required Dependencies → SeqAn → optional Dependencies (because the inclusion of optional dependencies might depend on values/macros from other headers, especially platform.hpp)
    • Of course there are exceptions to this rule, but they should be very well argued!
  5. rest of file (likely starts with a namespace opening)

Example

// -----------------------------------------------------------------------------------------------------
// Copyright (c) 2006-2020, Knut Reinert & Freie Universität Berlin
// Copyright (c) 2016-2020, Knut Reinert & MPI für molekulare Genetik
// This file may be used, modified and/or redistributed under the terms of the 3-clause BSD-License
// shipped with this file and also available at: https://github.com/seqan/seqan3/blob/master/LICENSE.md
// -----------------------------------------------------------------------------------------------------

/*!\file
 * \brief Contains many nice things.
 * \author Hannes Hauswedell <hannes.hauswedell AT fu-berlin.de>
 * \ingroup alphabet
 */

#pragma once

#include <any>
#include <type_traits>
#include <vector>

#include <range/v3/view/drop.hpp>
#include <range/v3/view/take.hpp>

#include <seqan3/range/container/concept.hpp>

// here comes the code

Exceptions

Headers only in Debug

In some cases, you need headers only in DEBUG mode, e.g., when static_asserting a concept. In these cases, you may include the header inside the DEBUG block, but only if this actually improves readability of the file.

10 different DEBUG blocks which each include different headers do not improve readability. In this case, please move everything to an extra _detail.hpp.
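
As a rough illustration, a header that is only needed for a debug-time check could be included like this. The sketch assumes that "DEBUG mode" means that NDEBUG is not defined (the guideline does not name the macro), and to_position is a hypothetical function that is not part of SeqAn3.

```cpp
#include <cstddef>

#ifndef NDEBUG // assumption: "DEBUG mode" means that NDEBUG is not defined
#include <type_traits> // only needed for the static_assert below
#endif

// Hypothetical function; the debug block checks a requirement on the template argument.
template <typename index_type>
constexpr std::size_t to_position(index_type const index)
{
#ifndef NDEBUG
    static_assert(std::is_integral_v<index_type>, "to_position expects an integral index.");
#endif

    return static_cast<std::size_t>(index);
}
```

With a single block like this the file stays readable; once several such blocks accumulate, the checks and their includes are better moved to a _detail.hpp as described above.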

Spacing, Indention, Alignment, and Naming

Indention, general formatting

  • always indent by four spaces, no tabs allowed
    • indention larger than four may be used to achieve alignment (see below)
  • indent every scope, except namespaces
  • always place opening brace on newline
  • no trailing whitespace. EVER.
  • maximum line length is 120.

Operators and parentheses

Some basic rules:

  • semicolon, comma ;, , – never a space before, always a newline or space after
  • arithmetic +, -, *, / – always a space before, always a newline or space after
  • logical && || – always a space before, always a newline or space after (do not use alternative and, or)
  • comparison ==, !=, <, >, <=, >= always a space before, always a newline or space after
  • bitshift/stream <<, >> – always a space before, always a newline or space after
  • subscript [, ] never a space before either, never a space after [
  • references &, && – always a space before, always a space after (unless variable omitted in function declaration)

Parenthesis ( and ):

  • do not use for casts, use C++-style casts instead (usually static_cast<>())
  • do not use for initialization, use brace-initialization instead {}
  • for, if, while
    • space between keyword and (, no space after (
    • no space before ), newline after )
  • function declarations, definitions, and call – no space before, no space after (
  • if you don't close an open parenthesis, align the next lines after opening (
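
The operator and parenthesis rules above might look as follows in practice. This is only a sketch; count_greater is a hypothetical helper that is not part of SeqAn3.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, only meant to illustrate the spacing rules.
constexpr std::size_t count_greater(std::array<int, 4> const & values, double const threshold)
{
    std::size_t count{0}; // brace-initialization instead of () or =

    for (std::size_t i = 0; i < values.size(); ++i) // space after the keyword, none after ( or before )
    {
        if (static_cast<double>(values[i]) > threshold) // C++-style cast, no space before [
            ++count;
    }

    return count;
}
```

The function call values.size() has no space before or after the opening parenthesis, whereas for and if are followed by one space.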

Splitting code over multiple lines:

  • in general, put operators at the end of the line and begin a word on the next line aligned with the corresponding word of the current line, e.g.:
if (foo &&
    bar &&
    bax)
//...

func(foo,
     bar,
     bax);

my_enum e = my_enum::FOO |
            my_enum::BAR |
            my_enum::BAX;
  • An exception to this rule is the pipe-symbol in the context of range and views where it is put on the beginning of the line, either aligned with a pipe-symbol on the previous line, or with the assignment operator:
auto v = foo | view::bar
             | view::bax;

auto v = view::myvee(foo)
       | view::myvee2
       | view::myvee3;

Braces

General:

  • opening braces always go to the beginning of a new line
    • only exception: tiny lambdas that completely fit into one line (including, e.g., surrounding function)
  • always lead to indention of contents, except for namespaces
  • empty bodies can be closed on the same line as opening
  • otherwise, the closing brace goes on a newline as well
  • always balance braces, i.e., if you have if and else and one body has braces, the other must, too.

Variable

General:

  • for all types that are not built-in arithmetic types, use brace initialization if at all possible (not () or =)
  • initialise all variables upon declaration, unless you really know what you are doing (if in doubt, initialise with empty {})

const-ness:

  • when possible, make variable constexpr or const (in that order)
  • always use "east-const", i.e., put const on the right of the type that is modified; see http://slashslash.info/eastconst/ for more info
  • if a variable is constexpr, put the constexpr on the left of the type (west-constexpr)

Global variables:

  • should be inline and constexpr

Examples I

// brace-initialize, don't use =
int i{7};
int & k{i};
float f{4.5};

// assignment
i = 8;
f = 3.4;

// loops
for (size_t j = 0; j < i; ++j)
    std::cout << j << '\n';

// linebreaks and alignment for readability
// in this case add braces, even for one-line body
for (size_t j = 0;
     (j < i) && some_very_long_condition_or_call();
     ++j)
{
    std::cout << j << '\n';
}

// always balance braces
if (i < 7)
{
    i = 21; // just one line
}
else
{
    i = 9;
    f = 13.3;
}

// tiny lambda may go in one line
auto increment = [] (int & i) { ++i; };

// long lambda must not
std::for_each(v.begin(), v.end(), [] (int & i)
{
    i += 17;
    // ...
});

Functions

General:

  • use return type auto only if it actually improves readability
  • use trailing return type only when strictly necessary

Spacing (independent of declaration, definition, or invocation):

  • no space before, no space after (
  • always a newline/; after ) except when using trailing return type

Line breaks:

  • you may always put each argument on its own line for improved readability, especially in function definitions
  • if you put one argument on its own line, put every argument on its own line
  • if you do, also do this for the template parameters of a function template
  • if all arguments are on individual lines but you still exceed the line length of 120, move constexpr or inline and the return type to a separate line

Alignment:

  • if you have line breaks, align lines after opening (

  • place inline or constexpr before the return type of the function (note that in contrast to variable definitions, the constexpr keyword does not influence the return type of the function, so it doesn't go to the right of the type)

Empty lines in function body:

  • No double-newlines, ever.
  • Always empty newline before new scopes if they are not part of if or a loop.
  • Always empty newline after } that closes a scope.
  • Newline after for-loop body that doesn't have {} is highly recommended
  • No strict rules otherwise; attempt to improve readability.
  • e.g., if a function only has three statements, it may be OK to not have any empty lines. If it is longer, group statements.
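
The line break, alignment and empty line rules for functions could be applied as in the following sketch; sum_in_range is a hypothetical function that is not part of SeqAn3.

```cpp
#include <array>

// Hypothetical function. One argument is on its own line, so every argument is,
// and the continuation lines are aligned after the opening parenthesis.
constexpr int sum_in_range(std::array<int, 5> const & values,
                           int const lower,
                           int const upper)
{
    int sum{0};

    for (int const value : values)
    {
        if ((value >= lower) && (value <= upper))
            sum += value;
    }

    return sum;
}
```

The body groups its statements into initialisation, loop and return, with exactly one empty line between the groups and one after the closing brace of the loop.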

Classes and structs

TODO (old guide: Classes and Structs)

Templates

General:

  • use typename, not class
  • don't use short concept forms
  • always put the requires clause on its separate line
  • indent requires clause by four spaces
  • if one part of the function header is aligned, align also the rest

Spacing:

  • no space after opening < and no space before closing >
  • template declaration has space before opening
  • template usage or specialization has no space before opening
  • no space between multiple closing >

Line breaks:

  • you may always put template parameters on individual lines for improved readability
  • you are required to do this for function templates where the function parameters are also on individual lines

Alignment:

  • if you have line breaks, align lines after opening <

Examples

Functions and function templates

// small function, readable
inline void my_free_function(int const i, float const f)
{
    // ...
}

// larger header -> introduce linebreaks and alignment for readability
template <typename my_type,
          typename my_type2>
    requires std::is_integral_v<my_type> &&
             std::is_floating_point_v<my_type2>
inline void my_free_function(my_type const i,
                             my_type2 const f)
{
    // ...
}

Functions

We distinguish between

  • member functions of a class or struct, also called methods
  • free functions in the scope of namespace (not class/struct), also called global functions

First read the Chapter on Functions in the CoreGuidelines!

Function arguments and return types

Basics:

  • in-parameters are parameters that are only read from in the function
    1. shall be type variable (copy) if you want a copy inside function
    2. shall be type const variable for small built-ins, i.e., mainly arithmetic types
    3. shall be type const & variable for specific class types
    4. shall be type && variable for parameters with type deduction (templated parameters)
    5. shall be type const & /**/ for type-only parameters / tags
  • in-out-parameters are parameters that are both read from, and written to
    • shall be & in all cases
  • out-parameters are parameters that are only written to
    • shall be return values (if multiple return values, put them in std::tuple<>)
    • in case you need to specialize over the type of the out-parameter, treat it as in-out
  • ordering shall be 1. in-out, 2. in, 3. in (tags and pure type parameters)
void foobar(type3 & in_out, type4 const & in, tag_type const & /**/)

Reasons:

  • in-parameters:
    1. If you plan on copying the argument inside your function, you should instead copy it in the signature because this enables usage of the move-constructor, saving the copy operation if the function is passed a temporary. It also eases writing exception-safe code. [See also the copy and swap idiom]
    2. The specified types are smaller than references.
    3. Don't copy, since you don't have to; use const protection because you can.
    4. Since the type before the && is subject to type deduction, the type is not an rvalue reference, but a forwarding reference. This implies that it can resolve to &, && and also const & so it is more generic than only const &. This is especially important for objects that are not const-iterable like certain ranges.
    5. If you are not going to use an argument's value, omit the variable to enable compiler optimization.
  • out-parameters
    1. Since C++17 there is guaranteed copy elision on return values, so we don't need to worry about it and just return. There are also so-called structured bindings to easily access the return values.
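
The parameter categories can be sketched as follows. All names (values_t, example_tag, add_scaled, min_max, scaled_demo) are hypothetical and only serve the illustration; a small std::array is used so that everything can be checked at compile time.

```cpp
#include <array>
#include <cstddef>
#include <tuple>

using values_t = std::array<int, 3>; // hypothetical alias, small fixed-size data for the sketch

struct example_tag // hypothetical tag type (type-only parameter)
{};

// Order: 1. in-out, 2. in, 3. tag. The tag has no name because its value is never used.
constexpr void add_scaled(values_t & in_out, values_t const & in, int const factor, example_tag const & /**/)
{
    for (std::size_t i = 0; i < in_out.size(); ++i)
        in_out[i] += in[i] * factor;
}

// Out-parameters are return values; several of them are bundled in a std::tuple.
constexpr std::tuple<int, int> min_max(values_t const & in)
{
    int smallest{in[0]};
    int largest{in[0]};

    for (int const value : in)
    {
        if (value < smallest)
            smallest = value;

        if (value > largest)
            largest = value;
    }

    return std::tuple<int, int>{smallest, largest};
}

// Usage: the in-out argument is modified in place.
constexpr values_t scaled_demo()
{
    values_t data{1, 2, 3};
    add_scaled(data, values_t{10, 20, 30}, 2, example_tag{});
    return data;
}
```

At the call site, the result of min_max can be unpacked with structured bindings, e.g. auto [smallest, largest] = min_max(data);.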

Function argument number and order

Number of arguments:

  • should be ≤ 5
  • use ranges instead of individual begin + end iterators (if applicable)
  • use std::pairs and std::tuples instead of individual value parameters (if applicable)
  • use traits instead of individual type parameters (if applicable)

There is no strict policy on the order of arguments (e.g., "output before input"), use the following guideline:

  1. "data arguments" (e.g., a string that is being processed)
    • "input data arguments" come before "output data arguments" (but often the latter are return values anyway)
  2. "option arguments" (e.g., how the string shall be processed)
  3. type-only "option arguments" (e.g., tags or traits)

Also keep in mind that if you want to default some parameters, they need to be at the end. Execution policies are a special case and always come first.

Design your function signature so that there aren't too many possible interfaces, ideally 1-2, but not more than 3-5 (with and without defaults).

Function templates

  1. always constrain your template parameters!
  2. choose the least-constrained concept that works for your algorithm
  3. but enforce all the requirements that you actually have!
  4. use forwarding reference && instead of const &, also for read only parameters (see above)

Examples for 2.:

  • do not require a container_concept if a forward_range_concept is sufficient
  • do not require a random_access_sequence_concept if a sequence_concept is sufficient

Examples for 3.:

  • if you do require random access, make sure that you include the corresponding requirements!

Member functions

TODO?

Free functions

TODO?

Metafunctions - "functions" always evaluated at compile time

We distinguish

  • value metafunctions are functions or other constructs that return a value at compile time
  • type metafunctions are constructs that return a type

Value metafunctions

There are different ways to implement value metafunctions in C++:

  1. struct templates with enum definitions
  2. struct templates with static const or static constexpr data members
  3. free constexpr functions (only meta if evaluated in constexpr context)
  4. global constexpr variable templates

In SeqAn3 we use the style of the STL which is:

  • a (possibly constrained) struct template with static constexpr value member; and
  • a shortcut of the same name, suffixed with _v as a constexpr variable template

Example:

template <typename alphabet_type>
    requires detail::internal_alphabet_concept<alphabet_type>
struct alphabet_size
{
    static constexpr underlying_integral_t<alphabet_type> value = alphabet_type::value_size;
};

template <typename alphabet_type>
constexpr underlying_integral_t<alphabet_type> alphabet_size_v = alphabet_size<alphabet_type>::value;

Note that internally, you may of course use constexpr functions or other forms of metaprogramming, but the public interfaces shall be as specified here.

Type metafunctions

There are different ways to implement type metafunctions in C++:

  1. struct templates with typedef or using declarations
  2. global templatised using declarations
  3. calling decltype() on (constexpr) functions

In SeqAn3 we use the style of the STL which is:

  • a (possibly constrained) struct template with a local type alias; and
  • a global shortcut of the same name, suffixed with _t as a templatised using declaration

Example:

template <typename alphabet_type>
    requires detail::internal_alphabet_concept<alphabet_type>
struct underlying_integral
{
    using type = typename alphabet_type::integral_type;
};

template <typename alphabet_type>
using underlying_integral_t = typename underlying_integral<alphabet_type>::type;
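
To try both patterns out in isolation, a self-contained sketch might look like this. The type toy_alphabet is hypothetical, and the constraints are omitted on the assumption that the snippet should also compile without concepts support.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical alphabet type, only here to make the sketch self-contained.
struct toy_alphabet
{
    using integral_type = std::uint8_t;
    static constexpr integral_type value_size{4};
};

// Type metafunction: struct template with a local type alias ...
template <typename alphabet_type>
struct underlying_integral
{
    using type = typename alphabet_type::integral_type;
};

// ... and the shortcut of the same name, suffixed with _t.
template <typename alphabet_type>
using underlying_integral_t = typename underlying_integral<alphabet_type>::type;

// Value metafunction: struct template with a static constexpr value member ...
template <typename alphabet_type>
struct alphabet_size
{
    static constexpr underlying_integral_t<alphabet_type> value = alphabet_type::value_size;
};

// ... and the shortcut of the same name, suffixed with _v.
template <typename alphabet_type>
inline constexpr underlying_integral_t<alphabet_type> alphabet_size_v = alphabet_size<alphabet_type>::value;

static_assert(std::is_same_v<underlying_integral_t<toy_alphabet>, std::uint8_t>);
static_assert(alphabet_size_v<toy_alphabet> == 4);
```

The variable template is declared inline constexpr here, in line with the rule for global variables.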

Specializing type metafunctions

There are different ways to specialize type metafunctions:

The first case is especially handy for template subclassing, but it has the drawback that it does not work if regular inheritance is used (which is now more often the case because we rely on concepts in other places):

template <typename type>
struct is_foo : std::false_type
{};

template <typename ...types>
struct is_foo<foo_impl<types...>> : std::true_type
{};
//is_foo<foo_impl<int>> == true_type

template <typename t>
struct my_type : foo_impl<t>
{
//...
};
//is_foo<my_type<int>> == false_type

The desired behaviour can be achieved with a template template and constraints:

template <typename type>
struct is_foo : std::false_type
{};

template <template <typename...> typename type, typename ...types>
    requires std::is_base_of_v<foo_impl<types...>, type<types...>>
struct is_foo<type<types...>> : std::true_type
{};
//is_foo<foo_impl<int>> == true_type

template <typename t>
struct my_type : foo_impl<t>
{
//...
};

//is_foo<my_type<int>> == true_type

Exceptions

Rules for SeqAn3 on Exception-Safety

  • Always guarantee at least the basic exception guarantee (2)!
  • If you can, enforce the strong exception guarantee (3)
  • move construction, move assignment and swap should always be no-throw

See section Exception-Safety for details on exception safety.

Rules for SeqAn3 for the noexcept specifier

When do we use noexcept:

  • If we can ensure that everything within the function body can never throw
  • If it is critical that the function does not throw (move semantics)
    • Attempt to always make move construction, move assignment and swap noexcept!
    • Use the noexcept()-operator if necessary
  • If there is a measurable performance gain (tests!)

Note: Since explicitly defaulted constructors are noexcept if they can, do not explicitly declare them noexcept, except if you want to enforce this.

See section The noexcept specifier (C++11) for details on noexcept.

Related issue: #45. Related design discussion: 2020-03-30.

Exception-Safety

Safety-Guarantee

  1. none or unknown
  2. basic (invariants of the component are preserved, and no resources are leaked)
  3. strong (if an exception is thrown there are no effects)
  4. no-throw (the code will never ever throw)
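
One common way to obtain the strong guarantee is to do all work that may throw on a copy and then commit the result with an operation that cannot throw. The following is a minimal sketch of that idea; int_buffer is a hypothetical class that is not part of SeqAn3.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical class, only meant to illustrate the guarantees.
class int_buffer
{
public:
    // Strong guarantee: everything that may throw happens on a copy;
    // the result is committed with a swap that cannot throw.
    void append(std::vector<int> const & values)
    {
        std::vector<int> tmp{data};                          // may throw, data is untouched
        tmp.insert(tmp.end(), values.begin(), values.end()); // may throw, data is untouched
        data.swap(tmp);                                      // no-throw commit
    }

    // No-throw guarantee.
    std::size_t size() const noexcept
    {
        return data.size();
    }

private:
    std::vector<int> data{};
};

// The implicitly generated move operations are noexcept because those of std::vector<int> are.
static_assert(std::is_nothrow_move_constructible_v<int_buffer>);
static_assert(std::is_nothrow_move_assignable_v<int_buffer>);
static_assert(std::is_nothrow_swappable_v<int_buffer>);
```

The extra copy is the price of the strong guarantee; where it is too expensive, the basic guarantee remains the required minimum.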

The noexcept specifier

Adding noexcept to your function declaration will tell the compiler: This function never throws!

void my_function() noexcept // "will never throw"
{
    // ...
}

Benefits:

  • Ensures the no-throw exception guarantee (see above)

    -> can be used accordingly (e.g., when using it in another function to ensure a strong exception guarantee)

  • The compiler may optimize your code (e.g., efficient move with std::move_if_noexcept)

Pitfalls:

What happens if you throw from a noexcept function? Your program terminates, usually without unwinding the stack.

  • Terminating the program isn't the nicest way to report an error to your user.
  • Terminating prevents any error handling
  • Removing noexcept can break the API

Take home message: Use noexcept if you are confident, avoid if in doubt.

The noexcept operator

If you are uncertain if something throws, you can use a conditional noexcept:

template <typename t>
int my_function(t & v) noexcept(noexcept(std::declval<t &>().size()))
{
    return v.size();
}

Which code is noexcept?

  • Functions that are declared noexcept
  • Construction of trivial types (e.g., int)
  • Explicitly defaulted constructors and assignment operators, e.g., foo() = default; (C++11), are implicitly noexcept and constexpr if they can be (see stack overflow which references the standard)
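
These cases can be checked at compile time with the noexcept operator and the standard type traits. In the sketch below, record, may_throw and never_throws are hypothetical names; the two functions are only declared because they appear solely in unevaluated operands.

```cpp
#include <string>
#include <type_traits>

// Hypothetical type: all special member functions are explicitly defaulted, none is marked noexcept.
struct record
{
    std::string name{};

    record() = default;
    record(record const &) = default;
    record(record &&) = default;
    record & operator=(record const &) = default;
    record & operator=(record &&) = default;
    ~record() = default;
};

void may_throw();             // not declared noexcept
void never_throws() noexcept; // declared noexcept

static_assert(noexcept(never_throws()));
static_assert(!noexcept(may_throw()));
static_assert(noexcept(int{})); // construction of a trivial type

// Defaulted members are noexcept "if they can": moving a std::string cannot throw, copying it can.
static_assert(std::is_nothrow_move_constructible_v<record>);
static_assert(std::is_nothrow_move_assignable_v<record>);
static_assert(!std::is_nothrow_copy_constructible_v<record>);
```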

Recommended reading

Thread Safety

There are 4 categories for a function:

  1. not thread-safe or unknown
  2. does not modify data (safe to be called from multiple threads, as long as no other function modifies the data)
  3. modifies, but re-entrant (safe to be called from multiple threads, as long as the data is different – different parameters or member function on different object)
  4. thread-safe (always safe)

Some rules-of-thumb:

  • All const member functions should be 2 or 4.
  • All non-const member functions should be 3 or 4.
  • All free functions shall be 2 or 4 (if they take only copy or const & parameters) or 3 or 4 (otherwise)

Every function starts with 1., but should at least guarantee 2.
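
To make categories 2 to 4 more concrete, here is a sketch with two hypothetical classes (histogram and event_counter are not part of SeqAn3). Using std::atomic is just one possible way to reach category 4 and is an assumption of this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classes, only meant to illustrate the categories.
class histogram
{
public:
    // Category 2: does not modify data. Safe to call concurrently unless another thread modifies this object.
    std::size_t bin_count() const noexcept
    {
        return bins.size();
    }

    // Category 3: modifies only this object. Safe to call concurrently on different objects.
    void resize(std::size_t const new_bin_count)
    {
        bins.resize(new_bin_count);
    }

private:
    std::vector<std::size_t> bins{};
};

class event_counter
{
public:
    // Category 4: the atomic makes concurrent calls on the same object safe.
    void record() noexcept
    {
        total.fetch_add(1, std::memory_order_relaxed);
    }

    // Also category 4.
    std::size_t count() const noexcept
    {
        return total.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::size_t> total{0};
};

// Single-threaded usage; the categories only matter once several threads are involved.
inline void usage()
{
    event_counter counter{};
    counter.record();
    counter.record();
    assert(counter.count() == 2);
}
```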

TODO: maybe rename to "data races"? Use other definitions?
