Style Guide

Patrick Schratz edited this page Sep 13, 2021 · 5 revisions

styler "mlr-style"

We have our own "mlr-style" which can be automatically applied to code via the styler package.

Follow these steps to format your code:

  1. Install {styler.mlr}, which builds on {styler}: remotes::install_github("mlr-org/styler.mlr")

  2. Apply the style either

    2.1 to the whole package: styler.mlr::style_pkg()

    2.2 to a specific file: styler.mlr::style_file(<file>)

    2.3 use the RStudio addin to style the "active file"

When using 2.3, make sure you've set the following option in .Rprofile:

options(styler.addins_style_transformer = "styler.mlr::mlr_style()")

You can make this depend on the current project with the following:

if (grepl("mlr", getwd()) || grepl("paradox", getwd())) {
  options(styler.addins_style_transformer = "styler.mlr::mlr_style()")
}

This setting uses the default tidyverse_style for all projects except those whose working directory path contains "mlr" or "paradox".

Automatic styling for every commit

Styling can be automated via pre-commit, a framework that runs hooks before each commit. To use pre-commit, do the following:

  1. Install the package via install.packages("precommit")
  2. Run precommit::use_precommit()
  3. Adjust the created .pre-commit-config.yaml file to your needs. Changing the setting that controls the styling is especially important; otherwise your code is styled with the tidyverse style. You can also copy one of our existing configs, see here for an example.
  4. Every once in a while you should update the hooks. To do so, call precommit::autoupdate().

Theoretical style guide

We mainly use Hadley's Advanced R Style Guide with slight modifications and comments.

1. Language:

Code and documentation are always written in English, never in German, French or any other language. The same holds for file and directory names.

2. One command per line and semicolon:

Put every statement / command on its own line. Do not put a semicolon at the end of a statement. This is R, not C.

Bad:

x = 1;
x = 1; y = 2; z = 3
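For contrast, a minimal sketch of the same assignments in the recommended form, reusing the variable names from the bad example:

```r
# One statement per line, no trailing semicolons
x = 1
y = 2
z = 3
```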

3. Naming style

Name functions, variables and arguments in lowercase with a separating underscore, e.g. my_arg = 1 followed by do_that(my_arg). R6 class names, however, are written in upper camel case, e.g. MyNiceClass.
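A small sketch of the naming rule for functions, arguments and variables. The names scale_values, input_values and scale_factor are made up for illustration:

```r
# Functions, arguments and variables: lowercase with underscores
scale_values = function(input_values, scale_factor = 2) {
  return(input_values * scale_factor)
}

scaled_values = scale_values(c(1, 2, 3))
```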

4. Assignment operator:

Use = instead of <- for assignments.

5. Comments:

Use a single # (not two ##), then one space, on the same level of indentation as the code you comment, to start a comment line. Usually, you should not put a comment on the same line as the code you comment. Combine meaningful identifier names and well written code, which is as self-documenting as possible, with short, precise lines of comments. Complicated stuff needs lengthier comments. No or too few comments are bad, but too verbose or unnecessary comments are also (less) bad. Usually, it is good style to prefix smaller "blocks of code", e.g., half a page of a for loop, where you "do a certain thing" with 1-2 comment lines that explain what is going to happen now.
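One way the comment rules might look in practice. The function normalize_scores is a hypothetical example, not part of any mlr package:

```r
normalize_scores = function(scores) {
  # shift scores so that the smallest value becomes 0
  shifted = scores - min(scores)

  # rescale to [0, 1]; constant input would divide by 0, so return zeros
  score_range = max(shifted)
  if (score_range == 0) {
    return(rep(0, length(scores)))
  }
  return(shifted / score_range)
}
```

Each comment sits on its own line, at the indentation of the code it describes, and introduces a small block rather than restating a single statement.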

6. Strings:

Define strings with double quotes, so "hello" instead of 'hello'.

7. Always use TRUE and FALSE instead of T and F.
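A short illustration of one likely reason for this rule: T and F are ordinary variables that can be reassigned, whereas TRUE and FALSE are reserved words. The variable names below are made up for the example:

```r
# Shadow the built-in T with a different value
T = 0
t_is_true = isTRUE(T)        # FALSE, because T now holds 0
true_is_true = isTRUE(TRUE)  # TRUE cannot be reassigned

# remove the override again so that T refers to the base value
rm(T)
```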

8. Write integers as 1L instead of 1:

Because 1 actually means 1.0, a numeric (double), in R. One notable exception is the sequence constructor :, which creates integers when its start value is a whole number, as in 1:3.
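The difference can be checked with typeof(). This is only a quick sketch, and the variable names are illustrative:

```r
count_double = 1
count_integer = 1L

typeof(count_double)   # "double"
typeof(count_integer)  # "integer"
typeof(1:3)            # "integer"
typeof(1.5:3)          # "double", since the start value is not a whole number
```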

9. Write 1:3 instead of c(1:3):

1:3 is already a vector.
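A one-line check that the extra c() call changes nothing:

```r
# Both expressions produce the same integer vector
same_vector = identical(1:3, c(1:3))  # TRUE
```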

10. if, for, while statements:

Put a single space between if, for, while and the opening parenthesis ( that follows. Do not write if (ok == TRUE) or if (ok == FALSE) if ok already is a boolean value; write if (ok) or if (!ok), respectively. If the body of the statement consists of only one line, the language allows us to omit the curly braces. This can be a good thing if it keeps the code together (less scrolling makes for easier reading and understanding), but you should only use this when the code structure is very simple, not with, e.g., complicated, nested if statements. If you use it, always put the single statement on a separate line and indent it. If in doubt, always use the braces. If you use curly braces with else, the closing curly brace before the else goes on the same line as the else.

Good:

if (condition) {
  ...
}

if (condition) {
  ...
} else {
  ...
}

for (i in 1:10) {
  ...
}

while (not_done) {
  ...
}

# in rare cases OK
if (condition)
  x = 1

Bad:

if(condition) {
  ...
}

if (condition) {
  ...
}
else {
  ...
}

11. Return statement:

Try to explicitly use the return statement and do not rely on R's convention to return the value of the last evaluated expression of the called function, especially if your function is longer and you return in multiple places. Deviate if your function is shorter, e.g., for short anonymous functions. The basic distinction is whether you have used an imperative or a functional coding style for the respective function. R allows both and I mix both styles heavily. If your function is more like a procedure, i.e., it has no meaningful return value, return invisible(NULL).
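The three cases described above could look roughly like this. All function names are hypothetical and only serve as illustration:

```r
# Imperative style with several exit points: use explicit return()
describe_count = function(n) {
  if (n == 0L) {
    return("none")
  }
  if (n == 1L) {
    return("one")
  }
  return("many")
}

# Short anonymous function: relying on the last expression is fine
squares = vapply(1:3, function(i) i^2, numeric(1L))

# Procedure without a meaningful return value
log_message = function(text) {
  message("[info] ", text)
  return(invisible(NULL))
}
```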

12. Use whitespace lines to structure your code:

Do not put arbitrary empty lines in your code, but instead use them sparingly to structure your code into "blocks of actions" that make sense. Usually, you want to put at least a short comment line before such a block that explains its contents, see point 5 (Comments). This structuring guides readers and allows them to catch their breath.

13. Code distribution in files:

Try to put one single function definition into one .R file. Name the file like the function. If you have some very short helper functions you can deviate from this.

14. Function length and abstraction:

Good functions very often cover 1 to 3 screen pages. Of course, some complicated stuff sometimes is longer. If that happens, think about introducing another level of indirection, e.g., more functions or data types. Maybe this is a good time for refactoring? If your function or source file covers 5000 lines of code (Have seen those. Not just once.) you are doing it wrong - and your code will not be maintainable.

15. Local helper functions defined in a parent function:

Can be OK, if the inner function is only used in this context and is pretty simple. Otherwise try to avoid them.
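A sketch of a case where a local helper seems acceptable: it is short and used only inside its parent. Both function names are made up for this example:

```r
summarize_columns = function(data) {
  # local helper: only needed here and simple enough to read at a glance
  format_range = function(values) {
    return(sprintf("[%g, %g]", min(values), max(values)))
  }

  return(vapply(data, format_range, character(1L)))
}
```

Once such a helper grows or is needed elsewhere, moving it to the top level (and possibly its own file) is probably the better choice.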

16. FIXMEs:

If you discover something bad or suspicious, and you really don't have much time and it's a very local thing, comment the problem and add # FIXME:. Be precise in the description and err on the side of verbosity, otherwise other people (possibly including yourself) will not understand what you meant when they read this in the future. If you use a proper editor, it will help you search through these issues later. In many cases it is a lot better to open an issue instead!
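What a sufficiently precise FIXME might look like. The function mean_per_group and the described problem are invented for illustration:

```r
mean_per_group = function(values, groups) {
  # FIXME: groups that contain only NA values yield NaN here, because
  # na.rm = TRUE leaves an empty vector. Decide whether such groups
  # should be dropped or reported as NA.
  return(tapply(values, groups, mean, na.rm = TRUE))
}
```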

17. Imported packages and double-colons

If you import API from a foreign package, do not refer to it with :: every time. Use :: only for suggested packages (where it is required) or in case of name clashes.

Bad (in an extension package importing mlr3):

mlr3::train()
mlr3::predict()
mlr3::resample()

Good (in an extension package importing mlr3):

train()
predict()
resample()
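For the suggested-package case, where :: is required, a common pattern is to check for the package first. The sketch below assumes a package that lists jsonlite under Suggests. Both that choice and the function name export_as_json are purely illustrative:

```r
# Sketch for a function that relies on a package listed under Suggests.
# "jsonlite" is only an illustrative choice.
export_as_json = function(x) {
  if (!requireNamespace("jsonlite", quietly = TRUE)) {
    stop("Package 'jsonlite' is required to export as JSON.")
  }
  return(jsonlite::toJSON(x))
}
```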

Exceptions to the rules:

Intelligent and experienced people stick to their style definition 99% of the time and are able to recognize the 1% of cases where deviations are not only OK, but better. In case of doubt, stick to the law.