Baseline lint rules #11
a-frantz
started this conversation in
Rule Proposals
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
WDL Lint Rules
At the time of writing (April 2024), we consider all rules herein to be up for debate. Please open a discussion in this repository if you would like to suggest changes to these rules. Please keep it to one rule per discussion.
If you would like to propose a rule not contained in this document, start a new discussion please.
If you would like to implement a rule contained in this document, please leave a comment saying so and we, the maintainers, will respond with further instruction.
Rules wrapped in 🟢 are rules which we consider uncontroversial and more-or-less ready to be implemented.
Rules wrapped in 🟡 are rules which we think need more nuance added to them. The maintainers are potentially not in agreement on these rules. Gathering more opinions on yellow rules would be valuable.
Rules wrapped in 🔴 are rules which we've decided not to pursue for whatever reason. These are "won't dos".
Table of Contents
version_declaration_placement
import_placement
import_sort
blanks_between_elements
inconsistent_newlines
line_width
expression_spacing
input_spacing
snake_case
pascal_case
indentation
unwanted_whitespace
one_empty_line
newline_eof
comment_whitespace
preamble_comment
double_quotes
trailing_comma
key_value_pairs
section_missing
missing_runtime_block
missing_meta_block
missing_parameter_meta_block
missing_output_block
section_order
description_missing
nonmatching_output
matching_parameter_meta
input_not_sorted
disallowed_input_name
disallowed_output_name
no_curly_commands
mutable_container
immutable_container_not_tagged
file_coercion
runtime_failable_optional_coercion
runtime_failable_nonempty_coercion
incomplete_call
name_collision
unused_import
unused_declaration
select_nonoptional_array
deprecated_unknown_runtime_key
deprecated_placeholder_option
deprecated_object
Rules
🟢
version_declaration_placement
🟢The WDL version declaration is required for any v1 WDL document to be parsed. A missing version declaration will result in a validation error. However an incorrect placement of the decleration will be considered a lint warning. The version decleration should be the very first line in the document, unless there are preamble comments. In which case, there should be a blank line between the comment(s) and the version declaration. There should always be a blank line following the version declaration.
Group:
spacing
Example
Good:
version 1.1 ...
Also good:
🟢
import_placement
&&import_sort
🟢All import statements should follow the WDL version declaration (with one empty line between the version and the first import statement).
Import statements should be sorted by the lexicographical ordering (GNU
sort
withLC_COLLATE=C
) of each line. No extra white space is allowed between symbols or lines.import_placement
group:spacing
import_sort
group:sorting
Example
Good:
🟢
blanks_between_elements
🟢There should be a blank line between each WDL element at the root indentation level (such as the import block and any task/workflow definitions) and between sections of a WDL task or workflow. Never have a blank line when indentation levels are changing (such as between the opening of a workflow definition and the meta section). There should also never be blanks within a meta, parameter meta, input, output, or runtime section. See example for a complete WDL document with proper spacing between elements. Note the blank lines between meta, parameter meta, input, the first call or first private declaration, output, and runtime for the example task. The blank line between the workflow definition and the task definition is also important.
Group:
spacing
Example
Good:
🟢
inconsistent_newlines
🟢Files should not mix
\n
and\r\n
line breaks. Pick one and use it consistently in your project.🟢
line_width
🟢WDL lines should be less than or equal to 90 characters wide whenever possible. This line width restriction applies to embedded code in the
command
block. Exceptions would be long strings that WDL doesn't allow to be broken up within the meta and parameter meta sections. Another exception iscontainer
lines inside theruntime
block of a task. (See the rulesmutable_container
&&immutable_container_not_tagged
for more information about permittedcontainer
lines.)🟢
expression_spacing
🟢The following rules lead to consistently readable expressions. Note that the below rules only apply to places in WDL code where an expression is being evaluated. There are separate rules for whitespace in other code locations.
The following tokens should be surrounded by whitespace when used as an infix in an expression:
=
==
!=
&&
||
<
<=
>
>=
+
-
*
/
%
The following tokens should not be followed by whitespace (including newlines) when used as a prefix in an expression:
+
-
!
Opening brackets (either
(
or[
) should not be followed by a space but may be followed by a newline.Closing brackets (either
)
or]
) should not be preceeded by whitespace unless that whitespace is indentation.Sometimes a long expression will exceed the maximum line width. In these cases, one or more linebreaks must be introduced. Line continuations should be indented one more level than the beginning of the expression. There should never be more than one level of indentation change per-line.
If bracketed content (things between
()
or[]
) must be split onto multiple lines, a newline should follow the opening bracket, the contents should be indented an additional level, then the closing bracket should be de-indented to match the indentation of the opening bracket.If you are line splitting an expression on an infix operator, the operator and at least the beginning of the RHS operand should be on the continued line. (i.e. an operator should not be on a line by itself.)
If you are using the
if...then...else...
construct as part of your expression and it needs to be line split, the entire construct should be wrapped in parentheses (()
). The opening parenthesis should be immediately followed by a newline.if
,then
, andelse
should all start a line one more level of indentation than the wrapping paratheses. The closing parenthesis should be on the same level of indentation as the opening parenthesis.If you are using the
if...then...else...
construct on one line, it does not need to be wrapped in parentheses. However, if any of the 3 clauses are more complex than a single identifier, they should be wrapped in parentheses.Sometimes a developer will choose to line split an expression despite it being able to all fit on one line that is <=90 characters wide. That is perfectly acceptable, though you may notice in the below examples the single line form can be more readable. There is "wiggle" room allowed by the above rules. This is intentional, and allows developers to choose a more compact or a more spaced out expression.
Group:
spacing
Example
Complex example that fits on one line:
Same example with as much line splitting as permissible:
Another complex example written as compactly as permissible:
The same expression with as many newlines inserted as permissible:
🟡
input_spacing
🟡When making calls from a workflow, it is more readable and easier to edit if the supplied inputs are each on their own line. This does inflate the line count of a WDL document, but it is worth it for the consistent readability. An exception can be made (but does not have to be made), for calls with only a single parameter. In those cases, it is permissable to keep the input on the same line as the call.
When newline separating inputs to a call, always follow each input with a trailing comma. See examples below.
The assignment operator (
=
) should always be surrounded by whitespace.The
input:
token should always be on the same line and separated by one space from the opening curly bracket ({
), likeso:call foo_task { input:
. Never newline-separate the opening bracket ({
) from theinput:
token.If there is a singular input and it can fit on that same line while being equal to or under the 90 character width limit, all tokens should be separated by a single space, likeso:
call foo_task { input: bam = sorted_bam }
. Note that there is no trailing comma here. This could optionally be split into 3 lines, likeso:Note the trailing comma.
Group:
spacing
Example
Good (note that these examples take advantage of the input simplifying mechanism made available in the v1.1 specification):
🟡
snake_case
🟡Variables, tasks, and workflows should be in lowercase "snake_case".
Group:
naming
🟢
pascal_case
🟢Declared structs should be in "PascalCase".
Group:
naming
🔴
indentation
🔴Indentation should be 4 spaces and never tab literals.
🟢
unwanted_whitespace
🟢No whitespace on empty lines. No whitespace at the end of lines.
🟢
one_empty_line
🟢At most one empty line in a row. There should never be 2 or more blank lines in a row.
Group:
spacing
🟢
newline_eof
🟢The file must end with a newline.
Group:
spacing
🟡
comment_whitespace
🟡Comments on the same line as code should have 2 spaces before the
#
and one space before the comment text. Comments on their own line should match the indentation level around them and have one space between the#
and the comment text. Keep in mind that even comments must be kept below the 90 character width limit.Group:
spacing
Example
Good:
Bad:
🟢
preamble_comment
🟢Full line comments which start with a double-pound-sign (
##
) are given special treatment. They are called preamble-comments, and should only be used at the very beginning of a file. These special comments are meant as a place for documentation that doesn't belong in any of the language-supported meta sections. This is often whole file documentation or documentation for structs contained in the file.sprocket doc
is under active development at the time of writing. This tool will auto-document any WDL file in Markdown. Preamble-comments found bysprocket doc
will be re-written in the output.md
files (without the##
prefix). Arbitrary and multi-line Markdown constructions will be preserved and rendered.Preamble comments may only precede the version declaration. They are not allowed anywhere else.
🟢
double_quotes
🟢All quotes should be double quotes.
Group:
naming
Example
Good:
Bad:
🟡
trailing_comma
🟡All items in a comma-delimited object or list should be followed by a comma, including the last item. An exception is made for lists for which all items are on the same line, in which case there should not be a trailing comma following the last item. Note that single-line lists are not allowed in the
meta
orparameter_meta
sections. See rulekey_value_pairs
for more information.Example
Good:
Bad:
🟢
key_value_pairs
🟢All lists and objects in the
meta
andparameter_meta
sections should have one element per line (i.e. newline separate elements). A key/value pair are considered one element if the value is atomic (i.e. not a list or an object). Otherwise have the key and opening bracket on the same line; subsequently indent one level; put one value per line; and have the closing bracket on its own line at the same indentation level of the key.Lines with string values in the meta and parameter meta section are allowed to surpass the 90 character line width rule.
Example
Good:
Bad:
🟢
section_missing
&§ion_order
🟢For workflows, the following sections must be present and in this order:
meta
,parameter_meta
,input
, (body),output
. "(body)" represents all calls and declarations.For tasks, the following sections must be present and in this order:
meta
,parameter_meta
,input
, (private declarations),command
,output
,runtime
section_missing
group:completeness
section_order
group:sorting
🟢
description_missing
🟢The meta section should have a
description
of the task or workflow.The contents of the
description
will not be checked by the linter. However, we do have unenforced recommendations for what we believe makes a good description. It should be in active voice, beginning the first sentence with a verb. Each task/workflow is doing something. The first sentence should be a succinct description of what that "something" is. Feel free to use more than one sentence to describe your tasks and workflows. If you would rather keep yourdescription
entry succinct, you may write a more detailed entry under thehelp
key. Additional and arbitrarymeta
entries are permitted (includingexternal_help
,author
, andemail
keys).Group:
completeness
Example
Good:
🟢
nonmatching_output
🟢The
meta
section should have anoutput
key and keys with descriptions for each output of the task/workflow. These must match exactly. i.e. for each named output of a task or workflow, there should be an entry undermeta.output
with that same name. Additionally, these entries should be in the same order (that order is up to the developer to decide). No extraneous output entries are allowed. There should not be any blank lines inside the entiremeta
section.Group:
completeness
Example
Good:
🟢
matching_parameter_meta
&&input_not_sorted
🟢All inputs must have a corresponding parameter meta entry. No extraneous parameter meta entries are allowed. Inputs and parameter meta must be in the same order. No blank lines are allowed within either the input or parameter_meta blocks.
There are 2 levels of ordering for the input sections.
The high level ordering must be in this order:
Within each of the above 3 sections, follow this sort order based on variable type:
File
Array[*]+
Array[*]
struct
Object
Map[*, *]
Pair[*, *]
String
Boolean
Float
Int
For ordering of the same compound type (
Array[*]
,struct
,Map[*, *]
,Pair[*, *]
), drop the outermost type (Array
,Map
, etc.) and recursively apply above sorting on the first inner type*
, with ties broken by the second inner type. Continue this pattern as far as possible. Once this ordering is satisfied, it is up to the developer for final order of inputs of the same type.Does this sort order seem complex and hard to follow? Have no fear! You can rely on
sprocket
's auto-formatting capabilities to handle this for you! (At the time of writing, auto-formatting has not yet been implemented. But it's on the docket forsprocket
.)matching_parameter_meta
group:completeness
inputs_not_sorted
group:sorting
Example
Here is a complex set of inputs in the proper sort order. The corresponding
parameter_meta
has been ommitted due it's great length, but it follows the same order. The full example can be seen here. Note that the linkedstjudecloud/workflows
repository is not fully compliant with this document.🟢
disallowed_input_name
🟢Any input name matching these regular expressions will be flagged:
/^[iI]n[A-Z_]/
,/^input/i
or/^..?$/
. It is redundant and needlessly verbose to use an input's name to specify that it is an input. Input names should be short yet descriptive. Prefixing a name within
orinput
adds length to the name without adding clarity or context.Additionally, names with only 2 characters can lead to confusion and obfuscates the content of an output. Output names should be at least 3 characters long.Group:
naming
Example
All the below input names would be flagged:
🟢
disallowed_output_name
🟢Any output name matching these regular expressions will be flagged:
/^[oO]ut[A-Z_]/
,/^output/i
or/^..?$/
. It is redundant and needlessly verbose to use an output's name to specify that it is an output. Output names should be short yet descriptive. Prefixing a name without
oroutput
adds length to the name without adding clarity or context. Additionally, names with only 2 characters can lead to confusion and obfuscates the content of an output. Output names should be at least 3 characters long.Group:
naming
Example
All the below output names would be flagged:
🟢
no_curly_commands
🟢command
blocks should be wrapped with arrows (<<<
>>>
) instead of curly brackets ({
}
). Certain Bash constructions cause problems with parsing the bracket notation. There are no such problems with the arrow notation.Example
Good:
Bad:
🟡
mutable_container
&&immutable_container_not_tagged
🟡All tasks should run in an immutable container. This ensures reproducibility across time and environments.
wdl-grammar
andwdl-ast
will look for a:SHASUM
tag in yourcontainer
declarations and warn if one is missing. Asha
digest will always point to the exact same image, whereas tags are mutable. This mutability makes even versioned tags problematic when we want to ensure reproducibility.While the confidence in persistence gained by using
sha
digests for pulling container images is valuable, it comes at a cost: lack of human readability. So we enforce followingcontainer
strings with a comment which gives a human readable name to the image being pulled. It is imperative that thesha
digest and the human readable tag are kept in sync. It's extra overhead compared to just using a tag directly, but we consider it well worth it.Note that
container
lines are permitted to exceed the 90 character width limit.Group:
container
Example
Bad:
Good:
🟡
file_coercion
🟡String-to-File coercions should only be used in task output. Anywhere else will be flagged. Hardcoding filepaths like this might work in your local environment, but this is not a portable solution and should be discouraged.
Example
Good:
Bad:
🟢
runtime_failable_optional_coercion
🟢It is always allowed by the specification to coerce an optional type into a required type. However, these coercions may fail at runtime.
wdl-grammar
andwdl-ast
will warn if it's possible for your coercions to fail at runtime. There are a number of ways to prevent runtime failable coercions, and this document will not go through them all. However some of the common methods for preventing these failures are by wrapping the coercion inside anif (defined(X))
block or eliminating the coercion completely with aselect_first([])
statement.Example
Bad:
Better:
Although the above solves the lint warning issue, it's not very well written. It was a contrived example to illustrate one way to silence the lint warning (wrap the coercion in an
if (defined(<>))
block). We will leave it as an exercise for the reader what the "Best" option might be. (there are at least 2 ways: either rewrite thegreet
task to accept an optionalname
, or use the standard library'sselect_first()
function with a default name. Either implementation will eliminate the lint warning).🟢
runtime_failable_nonempty_coercion
🟢This rule is very similar
runtime_failable_optional_coercion
. It concerns coercing a potentially empty array (Array[*]
) into an non-empty arrayArray[*]+
. This may fail at runtime depending on user input. Much likeruntime_failable_optional_coercion
, we will not exhaustively cover solutions at this time (maybe in a future version of this document?).🟢
incomplete_call
🟢A workflow must provide every required input to a task. Technically, this is not a requirement of WDL which is why this rule is a lint warning instead of a validation error. The required inputs could be supplied in the input JSON with the following syntax:
workflow_name.task_name.required_input_name: <value>
. However this is confusing and error-prone when reading the workflow. All required inputs should be declared and supplied in the top-level workflow. Even if that input is only being used in one call. This lint warning will only trigger if the workflow has declaredallowNestedInputs: true
in themeta
block. If this declaration is not there, according to the specification the above method of supplying inputs to nested calls is disbarred. If that is the case,wdl-grammar
andwdl-ast
will throw a validation error instead of a lint warning.🟢
name_collision
🟢Name collisions of the same kind will result in a validation error, as it will be ambiguous to the execution engine which instance of the name you are referring to. However, name collisions between declarations of different kinds, such as a workflow name and a struct type, can be parsed and will not be a validation error. Instead they will fall under this lint warning, as they can be very confusing to readers (if not execution engines).
Example
Bad:
The above example may be difficult to read due to it's repeated use of the name
foo
, but it is perfectly parseable WDL that wouldn't cause an issue for an execution engine. To the engine, it is perfectly clear which instance offoo
is being referred to in each place. However, if you usewdl-grammar
andwdl-ast
, you will get multiple lint warnings over the name collisions onfoo
.The below example disambiguates the workflow from the task by adding suffixes (
_wf
and_task
respectively). It keepsfoo
as the name for the two inputs and takes advantage of a mechanism added by the v1.1 specification that allows us to drop the<input>=
if the task input (foo
) and the declaration in the current scope (foo
) share the same name. (See the "Call Statement" section for more information about this mechanism.)🟢
unused_import
🟢An import which is not used anywhere in the document. While relatively harmless, these unused imports can be confusing to readers and maintainers alike.
🟢
unused_declaration
🟢Nothing references a declaration. Much like unused imports, these can be confusing to readers and maintainers.
🟢
select_nonoptional_array
🟢It is unnecessary to use
select_first()
orselect_all()
on an array which is non-optional. This usage is confusing.Example
Bad:
Good:
🟢
deprecated_unknown_runtime_key
🟢Arbitrary "hints" were at one point allowed in the runtime section. However this has been deprecated. Additionally, the
docker
key has been deprecated in favor ofcontainer
. The list of reserved and allowed keys for the runtime section are:Keys implemented by every execution engine
container
cpu
memory
gpu
disks
maxRetries
returnCodes
"Hints" optionally implemented by execution engines**
maxCpu
maxMemory
shortTask
localizationOptional
inputs
outputs
Group:
deprecated
🟢
deprecated_placeholder_option
🟢The "placeholder options" of
sep
,true/false
, anddefault
have been deprecated in favor of other methods for acheiving similar effect. See the "good" example below for the alternatives.Group:
deprecated
Example
Bad:
Good:
🟢
deprecated_object
🟢The
Object
type has been deprecated in favor of usingStruct
types.Group:
deprecated
Example
Bad:
Good:
Beta Was this translation helpful? Give feedback.
All reactions