# Appendix 1. Stan Program Style Guide {-}
This chapter describes the preferred style for laying out Stan
models. These are not rules of the language, but simply
recommendations for laying out programs in a text editor. Although
these recommendations may seem arbitrary, they are similar to those of
many teams for many programming languages. Like rules for typesetting
text, the goal is to achieve readability without wasting white space
either vertically or horizontally.
## Choose a Consistent Style
The most important point of style is consistency. Consistent coding
style makes it easier to read not only a single program, but multiple
programs. So when departing from this style guide, the number one
recommendation is to do so consistently.
## Line Length
Line lengths should not exceed 80 characters.^[Even 80 characters may be too many for rendering in print; for instance, in this manual, the number of code characters that fit on a line is about 65.]
This is a typical recommendation in programming-language style
guides: it makes it easier to lay out editor windows side by side,
to view code on the web without wrapping, and to read diffs from
version control.  About the only thing that is sacrificed is laying
out long expressions on a single line.
## File Extensions
The recommended file extension for Stan model files is `.stan`.
For Stan data dump files, the recommended extension is `.R`, or
more informatively, `.data.R`.
## Variable Naming
The recommended variable naming is to follow C/C++ naming
conventions, in which variables are lowercase, with the underscore
character (`_`) used as a separator. Thus it is preferred to use
`sigma_y`, rather than the run-together `sigmay`, camel-case
`sigmaY`, or capitalized camel-case `SigmaY`. Even matrix
variables should be lowercased.
The exception to the lowercasing recommendation, which also follows
the C/C++ conventions, is for size constants, for which the
recommended form is a single uppercase letter. The reason for this is
that it allows the loop variables to match. So loops over the indices of
an $M \times N$ matrix $a$ would look as follows.
```
for (m in 1:M)
  for (n in 1:N)
    a[m,n] = ...
```
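As a minimal sketch of these naming conventions in a complete program, the following uses single uppercase letters for sizes and lowercase names with underscores for everything else, including the matrix. The model itself and the names `mu_a` and `sigma_a` are hypothetical, chosen only for illustration.

```stan
data {
  // size constants: single uppercase letters
  int<lower=0> M;
  int<lower=0> N;
  // matrix variables are still lowercase
  matrix[M, N] a;
}
parameters {
  // lowercase with underscore separators
  real mu_a;
  real<lower=0> sigma_a;
}
model {
  // loop variables m and n match the size constants M and N
  for (m in 1:M)
    for (n in 1:N)
      a[m, n] ~ normal(mu_a, sigma_a);
}
```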
## Local Variable Scope
Declaring local variables in the block in which they are used aids in
understanding programs because it cuts down on the amount of text
scanning or memory required to reunite the declaration and definition.
The following Stan program corresponds to a direct translation of a
BUGS model, which uses a different element of `mu` in each
iteration.
```
model {
  real mu[N];
  for (n in 1:N) {
    mu[n] = alpha * x[n] + beta;
    y[n] ~ normal(mu[n],sigma);
  }
}
```
Because variables can be reused in Stan and because they should be
declared locally for clarity, this model should be recoded as follows.
```
model {
  for (n in 1:N) {
    real mu;
    mu = alpha * x[n] + beta;
    y[n] ~ normal(mu,sigma);
  }
}
```
The local variable can be eliminated altogether, as follows.
```
model {
  for (n in 1:N)
    y[n] ~ normal(alpha * x[n] + beta, sigma);
}
```
There is unlikely to be any measurable efficiency difference
between the last two implementations, but both should be a bit
more efficient than the BUGS translation.
### Scope of Compound Structures with Componentwise Assignment {-}
In the case of local variables for compound structures, such as
arrays, vectors, or matrices, if they are built up component by
component rather than in large chunks, it can be more efficient to
declare a local variable for the structure outside of the block
in which it is used. This allows it to be allocated once and then
reused.
```
model {
  vector[K] mu;
  for (n in 1:N) {
    for (k in 1:K)
      mu[k] = ...;
    y[n] ~ multi_normal(mu,Sigma);
  }
}
```
In this case, the vector `mu` will be allocated
outside of both loops, and used a total of `N` times.
## Parentheses and Brackets
### Optional Parentheses for Single-Statement Blocks {-}
Single-statement blocks can be rendered in one of two ways. The fully
explicit bracketed way is as follows.
```
for (n in 1:N) {
  y[n] ~ normal(mu,1);
}
```
The following statement without brackets has the same effect.
```
for (n in 1:N)
  y[n] ~ normal(mu,1);
```
Single-statement blocks can also be written on a single line, as
in the following example.
```
for (n in 1:N) y[n] ~ normal(mu,1);
```
These can be much harder to read than the first example. Only use this
style if the statement is simple, as in this example. Unless
there are many similar cases, it's almost always clearer to put
each sampling statement on its own line.
Conditional and looping statements may also be written without brackets.
The use of for loops without brackets can be dangerous. For instance,
consider this program.
```
for (n in 1:N)
  z[n] ~ normal(nu,1);
  y[n] ~ normal(mu,1);
```
Because Stan ignores whitespace and the parser completes a statement
as eagerly as possible (just as in C++), the previous program is
equivalent to the following program.
```
for (n in 1:N) {
  z[n] ~ normal(nu,1);
}
y[n] ~ normal(mu,1);
```
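If both sampling statements are meant to run inside the loop, explicit brackets make that unambiguous. The following is a minimal sketch of the bracketed form; the data and parameter declarations are assumptions added only to make the fragment a complete program.

```stan
data {
  int<lower=0> N;
  vector[N] y;
  vector[N] z;
}
parameters {
  real mu;
  real nu;
}
model {
  // brackets place both statements in the loop body
  for (n in 1:N) {
    z[n] ~ normal(nu, 1);
    y[n] ~ normal(mu, 1);
  }
}
```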
### Parentheses in Nested Operator Expressions {-}
The preferred style for operators minimizes parentheses. This reduces
clutter in code that can actually make it harder to read expressions.
For example, the expression `a + b * c` is preferred to the
equivalent `a + (b * c)` or `(a + (b * c))`.  The operator
precedences and associativities follow those of pretty much every
programming language including Fortran, C++, R, and Python; full
details are provided in the reference manual.
Similarly, comparison operators can usually be written with minimal
bracketing, with the form `y[n] > 0 || x[n] != 0` preferred to
the bracketed form `(y[n] > 0) || (x[n] != 0)`.
### No Open Brackets on Own Line {-}
Vertical space is valuable as it controls how much of a program you
can see. The preferred Stan style is as shown in the previous
section, not as follows.
```
for (n in 1:N)
{
  y[n] ~ normal(mu,1);
}
```
The same goes for program blocks, such as the parameters and
transformed data blocks, which should look as follows.
```
transformed parameters {
  real sigma;
  ...
}
```
## Conditionals
Stan supports the full C++-style conditional syntax,
allowing real or integer values to act as conditions, as follows.
```
real x;
...
if (x) {
  // executes if x not equal to 0
  ...
}
```
### Explicit Comparisons of Non-Boolean Conditions {-}
The preferred form is to write the condition out explicitly for
integer or real values that are not produced as the result of a
comparison or boolean operation, as follows.
```
if (x != 0) ...
```
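As a sketch of the explicit form in context, the following program tests an integer flag with `!= 0` rather than using the flag itself as the condition. The flag `use_prior`, the prior scale, and the rest of the model are hypothetical.

```stan
data {
  int<lower=0> N;
  vector[N] y;
  // hypothetical 0/1 flag supplied as data
  int<lower=0, upper=1> use_prior;
}
parameters {
  real mu;
}
model {
  // explicit comparison instead of if (use_prior)
  if (use_prior != 0) {
    mu ~ normal(0, 10);
  }
  y ~ normal(mu, 1);
}
```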
## Functions
Functions are laid out the same way as in languages such as Java and
C++. For example,
```
real foo(real x, real y) {
  return sqrt(x * log(y));
}
```
The return type is flush left, the parentheses for the arguments are
adjacent to the arguments and function name, and there is a space
after the comma for arguments after the first. The open curly brace
for the body is on the same line as the function name, following the
layout of loops and conditionals. The body itself is indented; here
we use two spaces. The close curly brace appears on its own line.
If function names or argument lists are long, they can be
written as
```
matrix
function_to_do_some_hairy_algebra(matrix thingamabob,
                                  vector doohickey2) {
  ...body...
}
```
The function starts a new line, under the type. The arguments are
aligned under each other.
Function documentation should follow the Javadoc and Doxygen styles.
Here's an example repeated from the [documenting functions
section](#documenting-functions.section).
```
/**
 * Return a data matrix of specified size with rows
 * corresponding to items and the first column filled
 * with the value 1 to represent the intercept and the
 * remaining columns randomly filled with unit-normal draws.
 *
 * @param N Number of rows corresponding to data items
 * @param K Number of predictors, counting the intercept, per
 *          item.
 * @return Simulated predictor matrix.
 */
matrix predictors_rng(int N, int K) {
  ...
```
The open comment is `/**`, asterisks are aligned below the first
asterisk of the open comment, and the end comment `*/` is also
aligned on the asterisk. The tags `@param` and `@return`
are used to label function arguments (i.e., parameters) and return
values.
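For a complete, compilable instance of the layout and documentation conventions described in this section, the following sketch defines a small function inside a `functions` block. The function name `euclid_dist` and its behavior are hypothetical and are not taken from any other part of this manual.

```stan
functions {
  /**
   * Return the Euclidean distance between two vectors of
   * the same size.
   *
   * @param x First vector.
   * @param y Second vector, assumed to have the same size
   *          as x.
   * @return Euclidean norm of the difference x - y.
   */
  real euclid_dist(vector x, vector y) {
    return sqrt(dot_self(x - y));
  }
}
```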
## White Space
Stan allows spaces between elements of a program. The white space
characters allowed in Stan programs include the space (ASCII
`0x20`), line feed (ASCII `0x0A`), carriage return
(`0x0D`), and tab (`0x09`). Stan treats all whitespace
characters interchangeably, with any sequence of whitespace characters
being syntactically equivalent to a single space character.
Nevertheless, effective use of whitespace is the key to good program
layout.
### Line Breaks Between Statements and Declarations {-}
It is dispreferred to have multiple statements or declarations on the
same line, as in the following example.
```
transformed parameters {
  real mu_centered;  real sigma;
  mu_centered = (mu_raw - mean_mu_raw);  sigma = pow(tau,-2);
}
```
These should be broken into four separate lines.
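A sketch of the preferred four-line layout follows. It assumes the first assignment is meant to target the declared variable `mu_centered`. The surrounding data and parameter declarations and the priors in the model block are placeholders, added only so the program is complete.

```stan
data {
  // assumed to be supplied as data
  real mean_mu_raw;
}
parameters {
  real mu_raw;
  real<lower=0> tau;
}
transformed parameters {
  // one declaration or statement per line
  real mu_centered;
  real sigma;
  mu_centered = mu_raw - mean_mu_raw;
  sigma = pow(tau, -2);
}
model {
  // placeholder priors, not part of the layout point
  mu_raw ~ normal(0, 1);
  tau ~ lognormal(0, 1);
}
```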
### No Tabs {-}
Stan programs should not contain tab characters. They are legal and
may be used anywhere other whitespace occurs.  Using tabs to lay out a
program is highly unportable because the number of spaces
represented by a single tab character varies depending on which
program is doing the rendering and how it is configured.
### Two-Character Indents {-}
Stan has standardized on two space characters of indentation, which is
the standard convention for C/C++ code. Another sensible choice is
four spaces, which is the convention for Java and Python. Just be
consistent.
### Space Between `if`, `{` and Condition {-}
Use a space after `if`s. For instance, use `if (x < y) ...`, not
`if(x < y) ...`.
### No Space For Function Calls {-}
There is no space between a function name and the arguments it applies
to.  For instance, use `normal(0,1)`, not `normal (0,1)`.
### Spaces Around Operators {-}
There should be spaces around binary operators.  For instance, use
`y[1] = x`, not `y[1]=x`, use `(x + y) * z` not
`(x+y)*z`.
### Breaking Expressions across Lines {-}
Sometimes expressions are too long to fit on a single line. In that case, the recommended form is to break *before* an operator,^[This is the usual convention in both typesetting and other programming languages. Neither R nor BUGS allows breaks before an operator because they allow newlines to signal the end of an expression or statement.] aligning the operator to indicate scoping. For example, use the following form (though not the content; inverting matrices is almost always a bad idea).
```
target += (y - mu)' * inv(Sigma)
          * (y - mu);
```
Here, the multiplication operator (`*`) is aligned to clearly
signal the multiplicands in the product.
For function arguments, break after a comma and line the next
argument up underneath as follows.
```
y[n] ~ normal(alpha + beta * x + gamma * y,
              pow(tau,-0.5));
```
### Optional Spaces after Commas {-}
Optionally use spaces after commas in function arguments for clarity.
For example, `normal(alpha * x[n] + beta,sigma)` can also be
written as `normal(alpha * x[n] + beta, sigma)`.
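The spacing recommendations in the preceding subsections can be seen together in one small program. This is only a sketch: the regression model and the condition on `x[n]` are hypothetical, chosen to exercise each convention once.

```stan
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  for (n in 1:N) {
    // space after if and before the open bracket
    if (x[n] != 0) {
      // no space before the argument list, spaces around
      // binary operators, optional space after the comma
      y[n] ~ normal(alpha * x[n] + beta, sigma);
    }
  }
}
```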
### Unix Newlines {-}
Wherever possible, Stan programs should use a single line feed
character to separate lines. All of the Stan developers (so far, at
least) work on Unix-like operating systems and using a standard
newline makes the programs easier for us to read and share.
#### Platform Specificity of Newlines {-}
Newlines are signaled in Unix-like operating systems such as Linux and
Mac OS X with a single line-feed (LF) character (ASCII code point
`0x0A`). Newlines are signaled in Windows using two characters,
a carriage return (CR) character (ASCII code point `0x0D`)
followed by a line-feed (LF) character.