-
Notifications
You must be signed in to change notification settings - Fork 184
/
creating_linters.Rmd
322 lines (255 loc) · 14.1 KB
/
creating_linters.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
---
title: "Creating new linters"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Creating new linters}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
This vignette describes the steps necessary to create a new linter.
See the last section for some details specific to writing new linters for `{lintr}`.
A good example of a simple linter is the `pipe_call_linter`.
```r
#' Pipe call linter
#'
#' Force explicit calls in magrittr pipes, e.g.,
#' `1:3 %>% sum()` instead of `1:3 %>% sum`.
#'
#' @evalRd rd_tags("pipe_call_linter")
#' @seealso [linters] for a complete list of linters available in lintr.
#' @export
pipe_call_linter <- function() {
xpath <- "//expr[preceding-sibling::SPECIAL[text() = '%>%'] and *[1][self::SYMBOL]]"
Linter(linter_level = "expression", function(source_expression) {
if (!is_lint_level(source_expression, "expression")) {
return(list())
}
xml <- source_expression$xml_parsed_content
if (is.null(xml)) return(list())
bad_expr <- xml2::xml_find_all(xml, xpath)
xml_nodes_to_lints(
bad_expr,
source_expression = source_expression,
lint_message = "Use explicit calls in magrittr pipes, i.e., `a %>% foo` should be `a %>% foo()`.",
type = "warning"
)
})
}
```
Let's walk through the parts of the linter individually.
## Writing the linter ##
```r
#' Pipe call linter
#'
#' Force explicit calls in magrittr pipes, e.g.,
#' `1:3 %>% sum()` instead of `1:3 %>% sum`.
```
Describe the linter, giving it a title and briefly covering the usages that are discouraged when the linter is active.
```r
#' @evalRd rd_tags("pipe_call_linter")
#' @seealso [linters] for a complete list of linters available in lintr.
#' @export
```
These lines (1) generate a Tags section in the documentation for the linter^[NB: this is a helper function for generating custom Rd styling. See R/linter_tags.R.]; (2) link to the full table of available linters; and (3) mark the function for export. The most unfamiliar here is probably (1), which can be skipped outside of `lintr` itself.
```r
pipe_call_linter <- function() {
```
Next, we define the name of the new linter. The convention is to suffix all
linter names with `_linter`. All `_linter` functions are function factories that return a
closure that will do the actual linting function. We could define additional parameters
that are useful for the linter in this function declaration (see, e.g. `assignment_linter`), but
`pipe_call_linter` requires no additional arguments.
```r
xpath <- "//expr[preceding-sibling::SPECIAL[text() = '%>%'] and *[1][self::SYMBOL]]"
```
Here is the core linter logic. `xpath` is an XPath expression for expressions matching the discouraged usage.
`xpath` is saved inside the factory code (as opposed to inside the linter itself) for efficiency. Often,
the `xpath` will be somewhat complicated / involve some assembly code using `paste()` or `glue::glue()`[^See
`infix_spaces_linter()` for an example of this], in which case it is preferable to execute this code only
once when creating the linter; the cached XPath is then re-used on each expression in each file where the
linter is run.
Let's examine the XPath a bit more closely:
```xpath
//expr # global search (//) for 'expr' nodes (R expressions), at any nesting level
[ # node[...] looks for any 'node' satisfying conditions in ...
preceding-sibling:: # "siblings" are at the same nesting level in XML
SPECIAL[ # 'SPECIAL' is the parse token for infix operators like %% or %+%
text() = '%>%' # text() returns the string associated with this node
] #
and # combine conditions with 'and'
* # match any node
[1] # match the first such node
[self::SYMBOL] # match if the current node is a 'SYMBOL' (i.e., a 'name' in R)
] #
```
Taken together, that means we want to match `expr` nodes preceded by the `%>%` infix operator whose first child node is a `name`.
That maps pretty closely to the description of what the `pipe_call_linter` is looking for, but there is subtlety in mapping
between the R code you're used to and how they show up in the XML representation. `expr` nodes in particular take some practice
to get accustomed to -- use the plentiful XPath-based linters in `lintr` as a guide to get extra
practice^[The W3schools tutorials are also quite helpful; see https://www.w3schools.com/xml/xpath_intro.asp]. Note: `xml2`
implements XPath 1.0, which lacks some helpful features available in XPath 2.0.
```r
Linter(function(source_expression) {
```
This is the closure. It will be called on the `source_expression` variable that
contains the top level expressions in the file to be linted. The call to
`Linter()` gives this closure the class 'linter' (it also guesses the name of the linter; see `?Linter` for more details).
The raw text of the expression is available from `source_file$content`. However,
it is not generally possible to implement linters from the raw text -- consider
`equals_na_linter`. If we just look for `== NA` in the text of the file, we'll
generate many false positives, e.g. in comments (such as `# note: is.na() is the proper way to check == NA`)
or inside character literals (such as `warning("don't use == NA to check missingness")`).
We're also likely to generate false negatives, for example when `==` and `NA` appear on different lines!
Working around these issues using only the un-parsed text in every situation amounts to re-implementing the parser.
Therefore it is recommended to work with the tokens from `source_file$parsed_content` or `source_file$xml_parsed_content`,
as they are tokenized from the `R` parser. These tokens are obtained
from `parse()` and `utils::getParseData()` calls done prior to calling the new linter.
`getParseData()` returns a `data.frame` with information from the source parse
tree of the file being linted. A list of tokens is available from
[r-source/src/main/gram.y](https://github.com/r-devel/r-svn/blob/master/src/main/gram.y#L395-L412).
`source_file$xml_parsed_content` uses `xmlparsedata::xml_parse_data()` to convert the `getParseData()` output into an XML tree,
which enables writing linter logic in [XPath](https://www.w3schools.com/xml/xpath_intro.asp), a powerful language
for expressing paths within the nested XML data structure. Most linters in `lintr` are built using XPath because it is a
powerful language for computation on the abstract syntax tree / parse tree.
```r
if (!is_lint_level(source_expression, "expression")) {
return(list())
}
```
Here, we return early if `source_expression` is not the expression-level object. `get_source_expression()` returns an
object that parses the input file in two ways -- once is done expression-by-expression, the other contains all of the
expressions in the file. This is done to facilitate caching. Suppose your package has a long source file (e.g., 100s of expressions) --
rather than run linters on every expression every time the file is updated, when caching is activated `lintr` will
only run the linter again on expressions that have changed.
Note that this check is unnecessary because we provided `linter_level = "expression"` which guarantees that `source_expression` will be at the expression level and not at the file level.
Therefore, it is preferable to write expression-level linters whenever possible. Two types of exceptions observed in `lintr` are
(1) when several or all expressions are _required_ to ensure the linter logic applies (e.g., `conjunct_test_linter` looks for
consecutive calls to `stopifnot()`, which will typically appear on consecutive expressions) or (2) when the linter logic
is very simple & fast to compute, so that the overhead of re-running the linter is low (e.g., `single_quotes_linter`). In those cases,
use `is_lint_level(source_expression, "file")`.
```r
xml <- source_expression$xml_parsed_content
bad_expr <- xml2::xml_find_all(xml, xpath)
```
`source_expression$xml_parsed_content` is copied to a local variable
(this is not strictly necessary but facilitates debugging). Then `xml2::xml_find_all()` is used to execute the XPath on
this particular expression. Keep in mind that it is typically possible for a single expression to generate more than one
lint -- for example, `x %>% na.omit %>% sum` will trigger the `pipe_call_linter()` twice^[This is particularly important
if you want the `message` field in the resulting `Lint()` to vary depending on the exact violation that's found. For
`pipe_call_linter()`, the message is always the same. See `assignment_linter()` for an example where the `message` can vary.].
```r
xml_nodes_to_lints(
bad_expr,
source_expression = source_expression,
lint_message = "Use explicit calls in magrittr pipes, i.e., `a %>% foo` should be `a %>% foo()`.",
type = "warning"
)
```
Finally, we pass the matching XML node(s) to `xml_nodes_to_lints()`, which returns `Lint` objects corresponding to any "bad" usages
found in `source_expression`. See `?Lint` and `?xml_nodes_to_lints` for details about the arguments.
Note that here, the `message` for the lint is always the same, but for many linters,
the message is customized to more closely match the observed usage. In such cases, `xml_nodes_to_lint()` can conveniently
accept a function in `lint_message` which takes a node as input and converts it to a customized message.
See, for example, `seq_linter`.
```r
linter_level = "expression"
```
This is a more efficient way to implement the condition `if (!is_lint_level(source_expression, "expression"))`.
## Writing linter tests
(NB: this section uses the `assignment_linter()` which has simpler examples than `pipe_continuation_linter()`.)
`{lintr}` works best inside the `{testthat}` unit testing framework, in particular, `{lintr}`
exports `lintr::expect_lint()` which is designed as a companion to other testthat expectations.
You can define tests inside separate `test_that` calls. Most of the linters use the same default form.
```r
test_that("returns the correct linting", {
```
You then test a series of expectations for the linter using `expect_lint`.
Please see `?expect_lint` for a full description of the parameters.
The main three aspects to test are:
1. Linter returns no lints when there is nothing to lint, e.g.
```r
expect_no_lint("blah", assignment_linter())
```
2. Linter returns a lint when there is something to lint, e.g.
```r
expect_lint("blah=1",
rex("Use <-, not =, for assignment."),
assignment_linter()
)
```
3. As many edge cases as you can think of that might break it, e.g.
```r
expect_lint("fun((blah = fun(1)))",
rex("Use <-, not =, for assignment."),
assignment_linter()
)
```
You may want to test that additional `lint` attributes are correct,
such as the type, line number, column number, e.g.
```r
expect_lint("blah=1",
list(message = "assignment", line_number = 1, column_number = 5, type = "style"),
assignment_linter()
)
```
Finally, it is a good idea to test that your linter reports multiple lints if
needed, e.g.
```r
expect_lint("blah=1; blah=2",
list(
list(line_number = 1, column_number = 5),
list(line_number = 1, column_number = 13),
),
assignment_linter()
)
```
It is always better to write too many tests rather than too few.
## Other utilities for writing custom linters
Besides `is_lint_level()`, `{lintr}` also exports some other helpers generally useful
for writing custom linters; these are used a lot in the internals of our own helpers,
and so they've been tested and demonstrated their utility already.
* `get_r_string()`: Whenever your linter needs to examine the value of a character
literal (e.g., whether an argument value is set to some string), use this to
extract the string exactly as R will see it. This is especially important to make
your logic robust to R-4-style raw strings like `R"-(hello)-"`, which is otherwise
difficult to express, for example as an XPath.
* `xml_find_function_calls()`: Whenever your linter needs to query R function calls,
e.g. via the XPath `//SYMBOL_FUNCTION_CALL[text() = 'myfun']`, use this member of
`source_expression` to obtain the function call nodes more efficiently.
Instead of
```r
xml <- source_expression$xml_parsed_content
xpath <- "//SYMBOL_FUNCTION_CALL[text() = 'myfun']/parent::expr/some/cond"
xml_find_all(xml, xpath)
```
use
```r
xml_calls <- source_expression$xml_find_function_calls("myfun")
call_xpath <- "parent::expr/some/cond"
xml_find_all(xml_calls, call_xpath)
```
* `make_linter_from_xpath()` and `make_linter_from_function_xpath()`: Whenever your
linter can be expressed by a static XPath and a static message, use `make_linter_from_xpath()`
or, if the XPath starts with `//SYMBOL_FUNCTION_CALL`, use `make_linter_from_function_xpath()`.
Instead of `make_linter_from_xpath(xpath = "//SYMBOL_FUNCTION_CALL[text() = 'foo' or text() = 'bar']/cond")`,
use `make_linter_from_function_xpath(function_names = c("foo", "bar"), xpath = "cond")`.
## Contributing to `{lintr}`
### More details about writing tests for new `{lintr}` linters
The `{lintr}` package uses [testthat](https://github.com/r-lib/testthat) for
testing. You can run all of the currently available tests using
`devtools::test()`. If you want to run only the tests in a given file use the
`filter` argument to `devtools::test()`.
Linter tests should be put in the
[tests/testthat/](https://github.com/r-lib/lintr/tree/main/tests/testthat)
folder. The test filename should be the linter name prefixed by `test-`, e.g.
`test-pipe_continuation_linter.R`.
### Adding your linter to the default_linters ##
If your linter implements part of the tidyverse style guide you can add it to `default_linters`.
This object is created in the file `zzz.R` (this name ensures that it will always run after all
the linters are defined). Add your linter name to the `default_linters`
list before the `NULL` at the end, and add a corresponding test case to the test script
`./tests/testthat/default_linter_testcode.R`.
### Submit pull request ##
Push your changes to a branch of your fork of the
[lintr](https://github.com/r-lib/lintr) repository, and submit a pull
request to get your linter merged into lintr!